Tuesday, January 4, 2011

FYP post 1

What I have done so far: I have discovered a tool that lets me process sentences of any structure into something usable. This tool is known as the Natural Language Toolkit (NLTK). So, what's natural language processing?

By "natural language" we mean a language that is used for everyday communication by humans; languages like English, Hindi or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing — or NLP for short — in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.
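To make the "simple extreme" concrete, here is a minimal sketch of counting word frequencies using only the Python standard library. The review text and the function name `word_frequencies` are made up for illustration; they are not from NLTK.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter mapping each lowercase word to how often it occurs."""
    # Very rough tokenization: runs of letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Made-up review text, purely for illustration.
review = "The camera is great. The buttons on the camera are not great."
freqs = word_frequencies(review)
print(freqs.most_common(3))
```

Comparing two writing styles would then come down to comparing two such counters.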

So, a short introduction to the NLTK: it is a library for Python, which is a simple yet powerful programming language with excellent functionality for processing linguistic data. NLTK defines an infrastructure that can be used to build NLP programs in Python. It provides basic classes for representing data relevant to natural language processing; standard interfaces for performing tasks such as part-of-speech tagging, syntactic parsing, and text classification; and standard implementations for each task which can be combined to solve complex problems.

So first things first: installing the NLTK. It can be installed on both Linux and Windows, but since Python comes preinstalled on most Linux distributions, I figured it would probably run more easily there. So all I had to do was download the packages from www.nltk.org and install them. That should do, right?
Unfortunately no. For some reason, Python wasn't too friendly with my version of Ubuntu, and I had to spend the whole of yesterday trying to update Ubuntu from ver 9.1 to 10.04 over a crappy internet connection in NTU.

So here's my grand plan. First, I want to break each passage down into individual sentences, but I have not found a way to split them yet.
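One possible route, as a rough sketch: a naive regex splitter that uses only the standard library. It assumes every sentence ends in ".", "!" or "?" followed by whitespace, so it will trip over abbreviations like "e.g.". NLTK also ships a sentence tokenizer (`nltk.sent_tokenize`), which needs its Punkt model downloaded first and should cope better with such cases. The function name `split_sentences` and the sample passage are my own.

```python
import re

def split_sentences(passage):
    """Naively split a passage after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    # Drop empty strings, which appear when the passage itself is empty.
    return [p for p in parts if p]

# Made-up passage, purely for illustration.
passage = "The camera is great. The buttons are in odd places! Would I buy it again?"
for sentence in split_sentences(passage):
    print(sentence)
```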
The next step is to tag each sentence with its parts of speech: basically your nouns, verbs, adjectives, adverbs, etc.
After that, I will attempt to chunk the nouns with their adjectives and determine whether the nouns are described positively or negatively. This would require me to attempt semantic orientation. I am not too sure if the NLTK can do that; if not, I will try some other tools.
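Here is a minimal sketch of what pairing nouns with their adjectives and scoring them could look like. It assumes the words are already tagged with Penn Treebank style tags (the kind `nltk.pos_tag` returns), and it uses a tiny hand-made polarity lexicon that is purely hypothetical. Real semantic orientation would need a much larger lexicon or a trained model.

```python
# Hypothetical, hand-made polarity lexicon; a real one would be far larger.
POLARITY = {"great": 1, "sharp": 1, "inappropriate": -1, "flimsy": -1}

def noun_orientations(tagged):
    """Pair each noun with the adjectives directly before it and sum their polarity."""
    results = {}
    pending = []  # adjectives seen since the last non-adjective token
    for word, tag in tagged:
        if tag.startswith("JJ"):      # adjective
            pending.append(word.lower())
        elif tag.startswith("NN"):    # noun: score it with the pending adjectives
            score = sum(POLARITY.get(adj, 0) for adj in pending)
            key = word.lower()
            results[key] = results.get(key, 0) + score
            pending = []
        else:                         # any other word breaks the adjective run
            pending = []
    return results

# Hand-tagged example sentence (the tags are assumed, not produced by a tagger).
tagged = [("The", "DT"), ("camera", "NN"), ("has", "VBZ"), ("flimsy", "JJ"),
          ("buttons", "NNS"), ("in", "IN"), ("inappropriate", "JJ"),
          ("locations", "NNS")]
print(noun_orientations(tagged))
```

A negative score means the noun was described negatively, a positive score positively, and zero means no known adjective was attached to it.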
Now this is the interesting part. NLTK has the ability to create a hierarchy tree based on the nouns. If we look at a review, we can see that the nouns are linked to one another at different levels.
For example, take "the camera has buttons in inappropriate locations". Camera would be the highest level of the noun hierarchy, followed by buttons and then locations. If we can attach a semantic orientation to each of these nouns, we should get a much more accurate need statement.
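As a toy illustration of the camera example, the sketch below nests the nouns in their order of appearance, so each noun becomes the child of the one before it. That ordering rule is my own simplifying assumption and is not how NLTK builds its trees; the function names and the hand-tagged sentence are also made up.

```python
def noun_chain(tagged):
    """Nest nouns in order of appearance: each noun is the child of the previous one."""
    nouns = [word.lower() for word, tag in tagged if tag.startswith("NN")]
    tree = None
    # Build from the last noun upwards so the first noun ends up at the top.
    for noun in reversed(nouns):
        tree = {"noun": noun, "child": tree}
    return tree

def show(tree):
    """Return the chain as indented lines, one level per noun."""
    lines = []
    depth = 0
    while tree is not None:
        lines.append("  " * depth + tree["noun"])
        tree = tree["child"]
        depth += 1
    return lines

# Hand-tagged example sentence (the tags are assumed, not produced by a tagger).
tagged = [("The", "DT"), ("camera", "NN"), ("has", "VBZ"), ("buttons", "NNS"),
          ("in", "IN"), ("inappropriate", "JJ"), ("locations", "NNS")]
print("\n".join(show(noun_chain(tagged))))
```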

I am planning to use Ulrich's 5 guidelines. Two of them are: express the need statement as specifically as the raw data, and do not phrase the need with "should" or "must". I believe the second one requires chinking, which in NLTK means excluding certain tokens from a chunk. In this case, I would probably add an if/else condition so that every "must" and "should" is removed from the sentence. There are a few other guidelines, but I am running a bit of a fever, so I will just continue tomorrow.
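The if/else filter for "must" and "should" could look roughly like this. It is a plain Python sketch over a list of tokens, not NLTK's chinking, and the name `drop_modals` is hypothetical.

```python
BANNED = {"must", "should"}

def drop_modals(tokens):
    """Return the tokens with every 'must' and 'should' removed."""
    kept = []
    for token in tokens:
        if token.lower() in BANNED:
            continue            # skip the word we do not want in a need statement
        else:
            kept.append(token)
    return kept

# Made-up raw statement, split on whitespace for simplicity.
tokens = "The buttons should be easy to reach".split()
print(" ".join(drop_modals(tokens)))
```

The leftover sentence will usually still need rewording (for example "be" into "are") before it reads as a proper need statement.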
