Dear B, I went through a similar thing. It
took about five years to get over. So I am going
to say things to you that other people might
not say. Other people may not have gone
through what you and I have gone through. I
am going to say things that may sound harsh
and dogmatic or unforgiving. You talk like an
addict. You concentrate on the feeling that she
gave you and how you've lost it and crave it
and fear you'll never get it back. You're living
without her and have nothing to replace her
with. Nothing else will give you that feeling.
She was your heroine and your heroin. What
you did when you were with her sounds like
what we do when we're high. You talk like an
addict who's white-knuckling it. You abused
her, as it were, the way one abuses a
substance. You took too much of her. You got
drunk on her. When she ran dry you needed
more. So you went out and got more. You
didn't care where you got it or who you got it from
or how it made her feel. What you felt for her
was not as much love for her as love for what
she contained, what she promised, what she
brought to you. You loved her like an addict
loves the bottle the whiskey comes in. You
loved the whiskey that she was. You loved the
high she gave you. Maybe this sounds harsh.
The phenomena you describe -- the
connectedness, the paranormal awareness, the
intoxication -- sound like the romantic love
that poets talk about. She understood you like
no one else. She understood you in a way that
could be magical or could be pathological.
Your love of this understanding of you, this
also could be romantic or could be
pathological. Your love of this one feeling is a
love of feeling understood. Addiction and love
are different. Love is hard. Addiction is a
better high. But addiction will destroy you.
Addiction will leave you wanting more. Just
look at you. You are still reeling. You are still
craving that future of limitless highs,
smothered in the butter of her endless
fascination. To dwell on this will only excite
more hunger, more desire, more self-
interested seeking after conquest and orgasm
and acceptance and adrenaline and intimacy
and release. You saw it yourself: As great as
she was, she was not enough. She was not
enough because it wasn't about her. It was
about your high. You were using her to get
high. To maintain the high, you found the high
wherever you could find it. In doing so, you
hurt her. You professed to not know why you
hurt her, because to admit why you hurt her
would be to admit that she was an object.
Since she was an object, in a sense you tried
to destroy her. We attempt to destroy the
objects of our addiction. In our delusion, we
see them as the cause of our addiction. I
suggest you consider the possibility that you
have a kind of sex and love addiction. There
are groups that deal with this. I suggest that
you consider the possibility that in being the
wonderful person you are, talented and
successful and creative, you are, like so many
of us talented and successful and creative
people, deeply flawed in the classic way,
flawed like Byron and Jim Morrison, flawed
like Shakespeare and Don Juan, flawed like
Richard Burton, flawed like JFK and Bill
Clinton, flawed like a rock star, flawed like
Sinatra. This flaw will not leave you in the
physical gutter the way an alcohol addiction
will. It leaves you in a cultural gutter, despised
by women and men alike, outcast, unable to
live within society's rules, unable to take care of
your family. So it's a tricky thing to figure out,
whether you're just a red-blooded cocksman
or an addict. It's a hard thing to figure out. But
what you're telling me, this is what it sounds
like: Your psychological structure drives you
to seek a high through other people, and this
self-centered seeking of a high prevents you
from genuinely encountering the other. The
other will not make you high. The other will
not always be psychically connected to you.
The other will not always have the same
dreams. The other is genuinely the other. You
say, quite understandably, "I'm well
acquainted with the concepts of accepting
responsibility, grieving, growing and moving
on, I actually understand how to do all of that.
The problem here is that I don't want to do
that last part." Well, my friend, no addict
wants to move on. No addict wants to willingly
give up the only thing that makes him feel OK.
But it doesn't matter what you want. Nor does
it matter what you think. You think you know
how to grieve and move on. You think you
know how to do these things. But grieving and
moving on are not things that you do. They are
things that happen to you. You might know
how to walk in the rain. But you don't decide
when it rains or how long it rains or how hard
it rains. You just carry an umbrella and keep
your head down. Here is the other thing that is
paradoxical but makes it sound like an
addiction to me: You are so great in every
other way. What this says is not, "Gee, then
how could he be an addict?" but, "Gee, so of
course he's an addict!" The essential
characteristic is present: The split. The void.
The discomfort. The space between your
perfection and your craven emptiness, your
public satisfaction and secret craving, your
placid exterior and need for a high, your
intellectual competence and spiritual longing.
After years of Lacanian psychoanalysis, you
might actually be able to be with the woman
you love. But this also may be some kind of
dream akin to the dream alcoholics have of
one day drinking normally. It is more likely
that you will have to learn to live with a new
way of relating to women that is more difficult,
more nuanced and more centered in the give-
and-take between two independent, self-
governing individuals. It doesn't mean you
can't have sex and love. But it means that the
fundamental mechanism that you have
mistaken for love must be altered. She's gone.
She's gone and it's over. What you are left with
is yourself.
Friday, January 28, 2011
Monday, January 17, 2011
FYP Post 3: Finding relations between nouns
I am trying to create a hierarchy tree for my nouns, but somehow it doesn't seem to be going well.
 
>>> import nltk
>>> # cp is a chunk parser (for example an nltk.RegexpParser) created earlier in the session; its grammar is not shown here
>>> sentence = [('The', 'DT'), ('average', 'JJ'), ('compact', 'NN'), ('point-and-shoot', 'NN'), ('camera', 'NN'), ('has', 'VBZ'), ('a', 'DT'), ('4x', 'CD'), ('or', 'CC'), ('5x', 'CD'), ('zoom', 'NN'), ('lens', 'NNS'), ('in', 'IN'), ('it', 'PRP')]
>>> print(cp.parse(sentence))
(S
  The/DT
  average/JJ
  (NP compact/NN point-and-shoot/NN camera/NN)
  has/VBZ
  a/DT
  4x/CD
  or/CC
  5x/CD
  zoom/NN
  lens/NNS
  in/IN
  it/PRP)
>>> tree = nltk.Tree('NP', ['camera'])  # the class is nltk.Tree; nltk.tree is a module and cannot be called
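The parse above groups only "compact point-and-shoot camera" and leaves "zoom" alone, which suggests the chunk rule matches runs of two or more NN tags. The grammar behind cp is not shown, so that is an assumption. Here is a minimal plain-Python sketch of such a rule. It is a stand-in for illustration, not NLTK's RegexpParser, and the helper name group_noun_runs is made up:

```python
def group_noun_runs(tagged, min_len=2):
    """Group consecutive NN-tagged words into ('NP', [...]) chunks.

    Hypothetical stand-in for the chunk rule; runs shorter than min_len
    are left as plain (word, tag) pairs.
    """
    result, run = [], []
    for word, tag in tagged + [(None, None)]:  # sentinel flushes the last run
        if tag == 'NN':
            run.append((word, tag))
            continue
        if len(run) >= min_len:
            result.append(('NP', [w for w, _ in run]))
        else:
            result.extend(run)
        run = []
        if word is not None:
            result.append((word, tag))
    return result


sentence = [('The', 'DT'), ('average', 'JJ'), ('compact', 'NN'),
            ('point-and-shoot', 'NN'), ('camera', 'NN'), ('has', 'VBZ'),
            ('a', 'DT'), ('4x', 'CD'), ('or', 'CC'), ('5x', 'CD'),
            ('zoom', 'NN'), ('lens', 'NNS'), ('in', 'IN'), ('it', 'PRP')]
chunks = group_noun_runs(sentence)
```

Because "lens" is tagged NNS rather than NN, "zoom lens" is not grouped, which matches what the parser printed. Loosening the rule to accept NNS as well would probably be the next thing to try.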
Sunday, January 9, 2011
Probably my last post with regards to the saga
Jessica, Well, I guess forgetting you is not an option... you are too ingrained in my memory. Even though I hardly know you, I know I have put you on a pedestal and ignored all the flaws that you possibly have. I saw you as an angel, though I never touched the heavens through you. I have entombed myself in a cold prison, together with memories of you. Perhaps the feelings will disappear over time. But the memories will remain. I do not want to know whether I am in your good books or not. I will remain a person who wishes you well from the sidelines. I messed up, and I have since learned well. I wish you a wonderful life ahead, and I hope your dreams will be fulfilled in time to come.
Till then
Goodbye
Wednesday, January 5, 2011
FYP Post 3: Chaining of multiple commands
I have just completed a new function that chains several commands together: it segments the raw text into sentences, splits each sentence into words, and tags each word with its part of speech. Here's the code.
>>> document = "The average compact point-and-shoot camera has a 4x or 5x zoom lens in it. Anyone who's ever tried with one of these cameras to get in closer to a player on the field at a sporting event or snap off a shot of a bird without scaring it away knows that such a short range doesn't cut it. The obvious solution is to get something with a longer zoom, but that means a larger camera, too, that might no longer fit in your pocket. Or could it? With more megapixels waning as a marketable feature, increased optical zooms and/or wide-angle lenses are supplanting that spec on compact cameras. If all you want is to get in a little tighter on the action, the 14-megapixel Sony Cyber-shot DSC-W370's 7x optical zoom or 12-megapixel Panasonic Lumix DMC-ZR3's 8x zoom lens should satisfy. Those who are really tired of their short zooms will probably want to step up to a 10x zoom lens like those on the Casio Exilim EX-FH100 and Sony Cyber-shot DSC-H55. They can be a little uncomfortable for smaller pockets, but can definitely fit in a jacket pocket or bag. Reaching a little farther beyond those is the Panasonic DMC-ZS5, which packs a wide-angle lens with a 12x zoom range into what's still a fairly compact body and the 14x Canon PowerShot SX210 IS. There are a couple things to keep in mind about these cameras. Longer lenses, as well as wider ones, typically cause a bit of distortion, which some models correct for when processing photos. Plus, because the lens glass isn't always top quality, the images from these cameras can be soft. Also, generally speaking, the longer the lens, the slower the camera's performance. We've learned not to expect really fast start-up and shot-to-shot times on models with a 10x zoom or greater."
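>>> import nltk
>>> def ie_preprocess(document):
...     sentences = nltk.sent_tokenize(document)  # split the text into sentences
...     sentences = [nltk.word_tokenize(sent) for sent in sentences]  # split each sentence into words
...     sentences = [nltk.pos_tag(sent) for sent in sentences]  # tag each word with its part of speech
...     return sentences
...
>>> ie_preprocess(document)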
[[('The', 'DT'), ('average', 'JJ'), ('compact', 'NN'), ('point-and-shoot', 'NN'), ('camera', 'NN'), ('has', 'VBZ'), ('a', 'DT'), ('4x', 'CD'), ('or', 'CC'), ('5x', 'CD'), ('zoom', 'NN'), ('lens', 'NNS'), ('in', 'IN'), ('it', 'PRP'), ('.', '.')], [('Anyone', 'NN'), ('who', 'WP'), ("'s", 'VBZ'), ('ever', 'RB'), ('tried', 'VBN'), ('with', 'IN'), ('one', 'CD'), ('of', 'IN'), ('these', 'DT'), ('cameras', 'NNS'), ('to', 'TO'), ('get', 'VB'), ('in', 'IN'), ('closer', 'JJR'), ('to', 'TO'), ('a', 'DT'), ('player', 'NN'), ('on', 'IN'), ('the', 'DT'), ('field', 'NN'), ('at', 'IN'), ('a', 'DT'), ('sporting', 'NN'), ('event', 'NN'), ('or', 'CC'), ('snap', 'VB'), ('off', 'RP'), ('a', 'DT'), ('shot', 'NN'), ('of', 'IN'), ('a', 'DT'), ('bird', 'JJ'), ('without', 'IN'), ('scaring', 'NN'), ('it', 'PRP'), ('away', 'RB'), ('knows', 'VBZ'), ('that', 'IN'), ('such', 'JJ'), ('a', 'DT'), ('short', 'JJ'), ('range', 'NN'), ('does', 'VBZ'), ("n't", 'RB'), ('cut', 'VB'), ('it', 'PRP'), ('.', '.')], [('The', 'DT'), ('obvious', 'JJ'), ('solution', 'NN'), ('is', 'VBZ'), ('to', 'TO'), ('get', 'VB'), ('something', 'NN'), ('with', 'IN'), ('a', 'DT'), ('longer', 'JJR'), ('zoom', 'NN'), (',', ','), ('but', 'CC'), ('that', 'IN'), ('means', 'NNS'), ('a', 'DT'), ('larger', 'JJR'), ('camera', 'NN'), (',', ','), ('too', 'RB'), (',', ','), ('that', 'IN'), ('might', 'MD'), ('no', 'RB'), ('longer', 'RBR'), ('fit', 'JJ'), ('in', 'IN'), ('your', 'PRP$'), ('pocket', 'NN'), ('.', '.')], [('Or', 'CC'), ('could', 'MD'), ('it', 'PRP'), ('?', '.')], [('With', 'IN'), ('more', 'JJR'), ('megapixels', 'NNS'), ('waning', 'VBG'), ('as', 'IN'), ('a', 'DT'), ('marketable', 'JJ'), ('feature', 'NN'), (',', ','), ('increased', 'VBD'), ('optical', 'JJ'), ('zooms', 'NNS'), ('and/or', 'JJ'), ('wide-angle', 'JJ'), ('lenses', 'NNS'), ('are', 'VBP'), ('supplanting', 'VBG'), ('that', 'IN'), ('spec', 'NN'), ('on', 'IN'), ('compact', 'NN'), ('cameras', 'NNS'), ('.', '.')], [('If', 'IN'), ('all', 'DT'), ('you', 'PRP'), ('want', 'VBP'), 
('is', 'VBZ'), ('to', 'TO'), ('get', 'VB'), ('in', 'IN'), ('a', 'DT'), ('little', 'RB'), ('tighter', 'NN'), ('on', 'IN'), ('the', 'DT'), ('action', 'NN'), (',', ','), ('the', 'DT'), ('14-megapixel', 'JJ'), ('Sony', 'NNP'), ('Cyber-shot', 'JJ'), ('DSC-W370', '-NONE-'), ("'s", 'VBZ'), ('7x', 'CD'), ('optical', 'JJ'), ('zoom', 'NN'), ('or', 'CC'), ('12-megapixel', 'CD'), ('Panasonic', 'NNP'), ('Lumix', 'NNP'), ('DMC-ZR3', 'NNP'), ("'s", 'POS'), ('8x', 'CD'), ('zoom', 'NN'), ('lens', 'NNS'), ('should', 'MD'), ('satisfy', 'VB'), ('.', '.')], [('Those', 'DT'), ('who', 'WP'), ('are', 'VBP'), ('really', 'RB'), ('tired', 'VBN'), ('of', 'IN'), ('their', 'PRP$'), ('short', 'JJ'), ('zooms', 'NNS'), ('will', 'MD'), ('probably', 'RB'), ('want', 'VB'), ('to', 'TO'), ('step', 'VB'), ('up', 'RP'), ('to', 'TO'), ('a', 'DT'), ('10x', 'CD'), ('zoom', 'NN'), ('lens', 'NNS'), ('like', 'IN'), ('those', 'DT'), ('on', 'IN'), ('the', 'DT'), ('Casio', 'NNP'), ('Exilim', 'NNP'), ('EX-FH100', 'NNP'), ('and', 'CC'), ('Sony', 'NNP'), ('Cyber-shot', 'NNP'), ('DSC-H55', 'NNP'), ('.', '.')], [('They', 'PRP'), ('can', 'MD'), ('be', 'VB'), ('a', 'DT'), ('little', 'RB'), ('uncomfortable', 'JJ'), ('for', 'IN'), ('smaller', 'JJR'), ('pockets', 'NNS'), (',', ','), ('but', 'CC'), ('can', 'MD'), ('definitely', 'RB'), ('fit', 'VB'), ('in', 'IN'), ('a', 'DT'), ('jacket', 'NN'), ('pocket', 'NN'), ('or', 'CC'), ('bag', 'NN'), ('.', '.')], [('Reaching', 'VBG'), ('a', 'DT'), ('little', 'RB'), ('farther', 'RBR'), ('beyond', 'IN'), ('those', 'DT'), ('is', 'VBZ'), ('the', 'DT'), ('Panasonic', 'NNP'), ('DMC-ZS5', 'NNP'), (',', ','), ('which', 'WDT'), ('packs', 'NNS'), ('a', 'DT'), ('wide-angle', 'JJ'), ('lens', 'NN'), ('with', 'IN'), ('a', 'DT'), ('12x', 'CD'), ('zoom', 'NN'), ('range', 'NN'), ('into', 'IN'), ('what', 'WP'), ("'s", 'POS'), ('still', 'RB'), ('a', 'DT'), ('fairly', 'RB'), ('compact', 'JJ'), ('body', 'NN'), ('and', 'CC'), ('the', 'DT'), ('14x', 'CD'), ('Canon', 'NNP'), ('PowerShot', 'NNP'), ('SX210', 
'NNP'), ('IS', 'NNP'), ('.', '.')], [('There', 'EX'), ('are', 'VBP'), ('a', 'DT'), ('couple', 'NN'), ('things', 'NNS'), ('to', 'TO'), ('keep', 'VB'), ('in', 'IN'), ('mind', 'NN'), ('about', 'IN'), ('these', 'DT'), ('cameras', 'NNS'), ('.', '.')], [('Longer', 'NNP'), ('lenses', 'NNS'), (',', ','), ('as', 'IN'), ('well', 'RB'), ('as', 'IN'), ('wider', 'NN'), ('ones', 'NNS'), (',', ','), ('typically', 'RB'), ('cause', 'VB'), ('a', 'DT'), ('bit', 'NN'), ('of', 'IN'), ('distortion', 'NN'), (',', ','), ('which', 'WDT'), ('some', 'DT'), ('models', 'NNS'), ('correct', 'VBP'), ('for', 'IN'), ('when', 'WRB'), ('processing', 'VBG'), ('photos', 'NNS'), ('.', '.')], [('Plus', 'NNP'), (',', ','), ('because', 'IN'), ('the', 'DT'), ('lens', 'NNS'), ('glass', 'NN'), ('is', 'VBZ'), ("n't", 'RB'), ('always', 'RB'), ('top', 'VB'), ('quality', 'NN'), (',', ','), ('the', 'DT'), ('images', 'NNS'), ('from', 'IN'), ('these', 'DT'), ('cameras', 'NNS'), ('can', 'MD'), ('be', 'VB'), ('soft', 'VBN'), ('.', '.')], [('Also', 'RB'), (',', ','), ('generally', 'RB'), ('speaking', 'VBG'), (',', ','), ('the', 'DT'), ('longer', 'NN'), ('the', 'DT'), ('lens', 'NN'), (',', ','), ('the', 'DT'), ('slower', 'NN'), ('the', 'DT'), ('camera', 'NN'), ("'s", 'POS'), ('performance', 'NN'), ('.', '.')], [('We', 'PRP'), ("'ve", 'VBP'), ('learned', 'VBN'), ('not', 'RB'), ('to', 'TO'), ('expect', 'VB'), ('really', 'RB'), ('fast', 'JJ'), ('start-up', 'NN'), ('and', 'CC'), ('shot-to-shot', 'JJ'), ('times', 'NNS'), ('on', 'IN'), ('models', 'NNS'), ('with', 'IN'), ('a', 'DT'), ('10x', 'CD'), ('zoom', 'NN'), ('or', 'CC'), ('greater', 'JJR'), ('.', '.')]]
FYP Post 2: Sentence Segmentation
Yesterday I was saying I couldn't do the sentence segmentation. Well, here's the code for it. Right now I am able to break a passage into individual sentences.
>>> text = "The average compact point-and-shoot camera has a 4x or 5x zoom lens in it. Anyone who's ever tried with one of these cameras to get in closer to a player on the field at a sporting event or snap off a shot of a bird without scaring it away knows that such a short range doesn't cut it. The obvious solution is to get something with a longer zoom, but that means a larger camera, too, that might no longer fit in your pocket. Or could it? With more megapixels waning as a marketable feature, increased optical zooms and/or wide-angle lenses are supplanting that spec on compact cameras. If all you want is to get in a little tighter on the action, the 14-megapixel Sony Cyber-shot DSC-W370's 7x optical zoom or 12-megapixel Panasonic Lumix DMC-ZR3's 8x zoom lens should satisfy. Those who are really tired of their short zooms will probably want to step up to a 10x zoom lens like those on the Casio Exilim EX-FH100 and Sony Cyber-shot DSC-H55. They can be a little uncomfortable for smaller pockets, but can definitely fit in a jacket pocket or bag. Reaching a little farther beyond those is the Panasonic DMC-ZS5, which packs a wide-angle lens with a 12x zoom range into what's still a fairly compact body and the 14x Canon PowerShot SX210 IS. There are a couple things to keep in mind about these cameras. Longer lenses, as well as wider ones, typically cause a bit of distortion, which some models correct for when processing photos. Plus, because the lens glass isn't always top quality, the images from these cameras can be soft. Also, generally speaking, the longer the lens, the slower the camera's performance. We've learned not to expect really fast start-up and shot-to-shot times on models with a 10x zoom or greater."
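>>> import nltk, pprint
>>> # sent_tokenizer is not defined in this post; loading NLTK's pre-trained Punkt model as below is an assumption
>>> sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
>>> sents = sent_tokenizer.tokenize(text)
>>> pprint.pprint(sents)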
['The average compact point-and-shoot camera has a 4x or 5x zoom lens in it.',
"Anyone who's ever tried with one of these cameras to get in closer to a player on the field at a sporting event or snap off a shot of a bird without scaring it away knows that such a short range doesn't cut it.",
'The obvious solution is to get something with a longer zoom, but that means a larger camera, too, that might no longer fit in your pocket.',
'Or could it?',
'With more megapixels waning as a marketable feature, increased optical zooms and/or wide-angle lenses are supplanting that spec on compact cameras.',
"If all you want is to get in a little tighter on the action, the 14-megapixel Sony Cyber-shot DSC-W370's 7x optical zoom or 12-megapixel Panasonic Lumix DMC-ZR3's 8x zoom lens should satisfy.",
'Those who are really tired of their short zooms will probably want to step up to a 10x zoom lens like those on the Casio Exilim EX-FH100 and Sony Cyber-shot DSC-H55.',
'They can be a little uncomfortable for smaller pockets, but can definitely fit in a jacket pocket or bag.',
"Reaching a little farther beyond those is the Panasonic DMC-ZS5, which packs a wide-angle lens with a 12x zoom range into what's still a fairly compact body and the 14x Canon PowerShot SX210 IS.",
'There are a couple things to keep in mind about these cameras.',
'Longer lenses, as well as wider ones, typically cause a bit of distortion, which some models correct for when processing photos.',
"Plus, because the lens glass isn't always top quality, the images from these cameras can be soft.",
"Also, generally speaking, the longer the lens, the slower the camera's performance.",
"We've learned not to expect really fast start-up and shot-to-shot times on models with a 10x zoom or greater."]
>>>
Tuesday, January 4, 2011
FYP post 1
What I have done so far: I have discovered a tool that lets me process sentences of any structure into a usable form. This tool is known as the Natural Language Toolkit (NLTK). So, what's natural language processing?
By "natural language" we mean a language that is used for everyday communication by humans; languages like English, Hindi or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing — or NLP for short — in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.
So a slight introduction to the NLTK, it is a program based on Python, which is a simple yet powerful programming language with excellent functionality for processing linguistic data. NLTK defines an infrastructure that can be used to build NLP programs in Python. It provides basic classes for representing data relevant to natural language processing; standard interfaces for performing tasks such as part-of-speech tagging, syntactic parsing, and text classification; and standard implementations for each task which can be combined to solve complex problems.
So, first things first: installing NLTK. It can be installed on both Linux and Windows, but since Python comes preinstalled on most Linux distributions, I expected it to run more easily there. So all I had to do was download the packages from www.nltk.org and install them. That should do, right?
Unfortunately not. For some reason, Python wasn't too friendly on my version of Ubuntu, and I had to spend the whole of yesterday upgrading Ubuntu from version 9.10 to 10.04 over a crappy internet connection at NTU.
So here's my grand plan. First, I plan to break down each passage into individual sentences, though I have not yet found a way to split them.
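As a stopgap for the sentence-splitting step, here is a minimal sketch in plain Python. It is not NLTK's own tokenizer; it assumes a sentence ends at '.', '!' or '?' followed by whitespace and a capital letter, so abbreviations such as "Dr. Smith" would be split wrongly. The function name split_sentences is made up for illustration.

```python
import re

# Assumed rule: a boundary is sentence-final punctuation, then whitespace,
# then an uppercase letter or an opening quote. This is a naive heuristic.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def split_sentences(passage):
    """Return the sentences in `passage` as a list of stripped strings."""
    parts = _SENTENCE_BOUNDARY.split(passage.strip())
    return [p.strip() for p in parts if p.strip()]


review = (
    "There are a couple things to keep in mind about these cameras. "
    "Also, generally speaking, the longer the lens, the slower the "
    "camera's performance. We've learned not to expect really fast "
    "start-up times."
)

sentences = split_sentences(review)
for sentence in sentences:
    print(sentence)
```

A trained sentence tokenizer should handle abbreviations and decimals far better than this rule, so this is only meant to get the rest of the pipeline moving.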
The next step is to break the sentences down into parts of speech: basically your nouns, verbs, adjectives, adverbs, etc.
After that, I will attempt to chunk the nouns with their adjectives and determine whether each noun is described positively or negatively. This requires working out the semantic orientation of the adjectives. I am not sure whether NLTK can do that; if not, I will try some other tools.
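To make the chunking idea concrete, here is a rough sketch in plain Python. It assumes the tokens are already tagged (the tags below are written by hand in Penn Treebank style), and the POLARITY lexicon is a tiny made-up table, not something NLTK provides. The helper names are hypothetical.

```python
# Hand-tagged tokens; in practice these would come from a part-of-speech tagger.
tagged = [
    ("the", "DT"), ("camera", "NN"), ("has", "VBZ"),
    ("a", "DT"), ("sharp", "JJ"), ("lens", "NN"),
    ("but", "CC"), ("a", "DT"), ("slow", "JJ"), ("autofocus", "NN"),
]

# Hypothetical polarity lexicon: +1 for positive words, -1 for negative ones.
POLARITY = {"sharp": 1, "fast": 1, "slow": -1, "soft": -1, "inappropriate": -1}


def chunk_adjective_noun(tagged_tokens):
    """Group each run of adjectives with the noun that directly follows it."""
    chunks, adjectives = [], []
    for word, tag in tagged_tokens:
        if tag.startswith("JJ"):
            adjectives.append(word)
        elif tag.startswith("NN"):
            chunks.append((word, adjectives))
            adjectives = []
        else:
            # Any other tag breaks the adjective run.
            adjectives = []
    return chunks


def orientation(adjectives):
    """Sum the lexicon scores of the adjectives and name the overall sign."""
    score = sum(POLARITY.get(a.lower(), 0) for a in adjectives)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


results = {noun: orientation(adjs) for noun, adjs in chunk_adjective_noun(tagged)}
print(results)
```

A real semantic orientation step would need a much larger lexicon and some handling of negation ("not sharp"), which this sketch ignores.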
Now this is the interesting part. NLTK can build a hierarchy tree from the nouns. If we look at a review, we can see that the nouns are linked to one another at different levels.
For example, take the sentence "the camera has buttons in inappropriate locations". "Camera" would sit at the top level of the noun hierarchy, followed by "buttons" and "locations". If we can attach a semantic orientation to each noun, we can get a very accurate need statement.
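Here is a small sketch of what such a noun hierarchy could look like for the camera sentence. It uses nested Python dicts rather than NLTK's own tree structures, and it rests on a strong assumption: each noun is treated as a child of the noun that came before it in the sentence. The tags are written by hand and the function names are made up.

```python
# Hand-tagged version of the example sentence (Penn Treebank style tags).
tagged = [
    ("the", "DT"), ("camera", "NN"), ("has", "VBZ"), ("buttons", "NNS"),
    ("in", "IN"), ("inappropriate", "JJ"), ("locations", "NNS"),
]


def noun_hierarchy(tagged_tokens):
    """Nest each noun under the previous noun, in order of appearance."""
    root = {}
    level = root
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            level[word] = {}
            level = level[word]  # the next noun goes one level deeper
    return root


def render(tree, depth=0):
    """Return the hierarchy as indented lines, one noun per line."""
    lines = []
    for noun, children in tree.items():
        lines.append("  " * depth + noun)
        lines.extend(render(children, depth + 1))
    return lines


tree = noun_hierarchy(tagged)
print("\n".join(render(tree)))
```

Word order alone will not give the right levels for every sentence, so a parser-based approach would probably be needed later.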
I am planning to use Ulrich's five guidelines. Two of them are: express the need statements as specifically as the raw data, and do not phrase the need with "should" or "must". I believe the second one requires chinking, which is a means of excluding specified tokens during sentence processing. In this case, I would probably add an if-else condition so that every "must" and "should" is removed from the sentence. There are a few other guidelines, but I am running a bit of a fever, so I will continue tomorrow.
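The must/should condition could be sketched like this in plain Python. It is a simple word filter, not NLTK's chinking, and the name drop_modals is made up for illustration.

```python
# Words that the guideline says should not appear in a need statement.
MODALS = {"must", "should"}


def drop_modals(sentence):
    """Return `sentence` with every 'must' and 'should' removed."""
    kept = []
    for word in sentence.split():
        # Compare without surrounding punctuation and without regard to case.
        if word.strip(".,;:!?").lower() in MODALS:
            continue
        kept.append(word)
    return " ".join(kept)


cleaned = drop_modals("The camera should have buttons that must be easy to reach.")
print(cleaned)
```

Removing the words leaves the grammar broken ("The camera have buttons"), so the sentence would still need rewording before it reads as a proper need statement.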
 
