Question

I need to write a program that takes a few lines of comments about a product as input and rates the product based on the adjectives that describe it in the reviews. I have just used a POS tagger to tag the parts of speech in every comment. Now I have to pick out the adjectives that describe the nouns, and if a noun appears to be related to the product, I need to consider the corresponding adjective. This is the code I've used for POS tagging. It works fine.

import java.io.*;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class Tagg {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Initialize the tagger
        MaxentTagger tagger = new MaxentTagger("edu/stanford/nlp/models/pos-tagger/wsj-left3words/wsj-0-18-left3words-distsim.tagger");

        // try-with-resources closes both files, even if tagging throws
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("src/input.txt")));
             BufferedWriter out = new BufferedWriter(new FileWriter("src/output.txt", true))) {
            String sample;
            // Read input.txt line by line into the string sample
            while ((sample = br.readLine()) != null) {
                // Tag the string
                String tagged = tagger.tagString(sample);
                System.out.print(tagged + "\n");
                // Append the tagged line to output.txt
                out.write(tagged);
                out.newLine();
            }
        }
    }
}

I need a way to proceed.

Solution

A simple solution that will get you a long way is to use the dependency parser, which is included with Stanford CoreNLP. The algorithm goes like this:

  1. POS-tag and dependency-parse your sentence.
  2. Decide which of the nouns you are interested in. If you are dealing with product reviews, an easy way of doing this is to match all nouns in the text against a list of known product names.
  3. Look for amod relations in the output of the dependency parser that include the noun you are interested in.
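
Step 2 could be sketched roughly as below. This is only an illustration, not part of CoreNLP: the class name ProductNounFinder, the method productNouns and the word list are hypothetical, and the sketch assumes the tagger output is in the word_TAG form (for example car_NN), with Penn Treebank noun tags starting with NN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ProductNounFinder {
    /** Returns the nouns of a word_TAG sentence that occur in the product word list. */
    static List<String> productNouns(String tagged, Set<String> productWords) {
        List<String> hits = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // token is not in word_TAG form (assumed format)
            }
            String word = token.substring(0, sep).toLowerCase(Locale.ROOT);
            String tag = token.substring(sep + 1);
            // Assumption: noun tags start with "NN" (NN, NNS, NNP, NNPS)
            if (tag.startsWith("NN") && productWords.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String tagged = "I_PRP bought_VBD a_DT big_JJ red_JJ car_NN";
        System.out.println(productNouns(tagged, Set.of("car", "truck"))); // prints [car]
    }
}
```

The match is on the exact lowercased word, so a plural such as cars would need its own entry in the list or a lemmatization step.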

Example using the online Stanford demo:

Input:

I own a tall glass and just bought a big red car.

amod dependencies:

amod(glass-5, tall-4)
amod(car-12, big-10)
amod(car-12, red-11)

Suppose the reviews are about cars. The last two dependencies contain the target noun car, and the adjectives you are looking for are therefore big and red.
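
The filtering in step 3 can be sketched as follows. To keep the example self-contained, it works on the textual form of the dependencies shown above rather than on the parser's own objects; the class name AmodFilter and the method adjectivesFor are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmodFilter {
    // Matches lines such as "amod(car-12, big-10)": group 1 is the noun, group 2 the adjective
    private static final Pattern AMOD =
            Pattern.compile("amod\\((\\S+)-\\d+,\\s*(\\S+)-\\d+\\)");

    /** Returns the adjectives attached to targetNoun through an amod relation. */
    static List<String> adjectivesFor(String targetNoun, List<String> dependencies) {
        List<String> adjectives = new ArrayList<>();
        for (String dep : dependencies) {
            Matcher m = AMOD.matcher(dep.trim());
            if (m.matches() && m.group(1).equalsIgnoreCase(targetNoun)) {
                adjectives.add(m.group(2));
            }
        }
        return adjectives;
    }

    public static void main(String[] args) {
        List<String> deps = List.of(
                "amod(glass-5, tall-4)",
                "amod(car-12, big-10)",
                "amod(car-12, red-11)");
        System.out.println(adjectivesFor("car", deps)); // prints [big, red]
    }
}
```

In a real pipeline you would read the relations from the parser's output structures instead of parsing strings; the string form is used here only because it matches the demo output.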

Warning: this is a high-precision search algorithm rather than a high-recall one. Your list of keywords will never be exhaustive, so you are likely to miss some of the adjectives. Also, the parser is not perfect and will sometimes make mistakes. Moreover, the amod relation is only one of many ways an adjective can describe a noun. For example, "The car is red and black" parses as

det(car-2, The-1)
nsubj(red-4, car-2)
nsubj(black-6, car-2)
cop(red-4, is-3)
root(ROOT-0, red-4)
conj_and(red-4, black-6)

As you can see, there are no amod relations here, just a copula and a conjunction. You could try to craft more complex rules that extract the facts that the car is red and that the car is black. Whether you want to do that is up to you. In its current form, when this algorithm returns an adjective, you can be reasonably confident that it is indeed describing the noun. This, in my opinion, is a good characteristic, but it all depends on your use case.
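
One such rule might look like the sketch below, again working on the textual dependencies for the sake of a self-contained example. The rule is my own assumption, not something the parser provides: a word counts as a predicate of the target noun if it governs both an nsubj relation to that noun and a cop relation, and anything joined to it by a conj relation is added as well. The names CopulaRule and predicateAdjectives are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CopulaRule {
    // Matches "rel(governor-i, dependent-j)", e.g. "nsubj(red-4, car-2)"
    private static final Pattern DEP =
            Pattern.compile("(\\w+)\\((\\S+-\\d+),\\s*(\\S+-\\d+)\\)");

    // Strips the position suffix: "red-4" becomes "red"
    private static String word(String token) {
        return token.substring(0, token.lastIndexOf('-'));
    }

    static List<String> predicateAdjectives(String targetNoun, List<String> dependencies) {
        List<String[]> parsed = new ArrayList<>();
        Set<String> copulaHeads = new HashSet<>();
        for (String dep : dependencies) {
            Matcher m = DEP.matcher(dep.trim());
            if (!m.matches()) {
                continue;
            }
            parsed.add(new String[] {m.group(1), m.group(2), m.group(3)});
            if (m.group(1).equals("cop")) {
                copulaHeads.add(m.group(2));
            }
        }
        // Predicates: governors of nsubj(X, noun) that also govern a copula
        Set<String> predicates = new LinkedHashSet<>();
        for (String[] d : parsed) {
            if (d[0].equals("nsubj") && copulaHeads.contains(d[1])
                    && word(d[2]).equalsIgnoreCase(targetNoun)) {
                predicates.add(d[1]);
            }
        }
        // Add words conjoined with a predicate, e.g. conj_and(red-4, black-6)
        Set<String> all = new LinkedHashSet<>(predicates);
        for (String[] d : parsed) {
            if (d[0].startsWith("conj") && predicates.contains(d[1])) {
                all.add(d[2]);
            }
        }
        List<String> words = new ArrayList<>();
        for (String token : all) {
            words.add(word(token));
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> deps = List.of(
                "det(car-2, The-1)", "nsubj(red-4, car-2)", "nsubj(black-6, car-2)",
                "cop(red-4, is-3)", "root(ROOT-0, red-4)", "conj_and(red-4, black-6)");
        System.out.println(predicateAdjectives("car", deps)); // prints [red, black]
    }
}
```

Note that this rule does not check part of speech, so a sentence like "The car is a lemon" would return the noun lemon; combining it with the POS tags to keep only adjectives would be a sensible refinement.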


Edit after comment by OP:

Yes, "I bought a new car." and "It is awesome." are two separate sentences and will be parsed separately. Linking "It" back to "a new car" is a problem known as coreference (anaphora) resolution. It turns out Stanford also supports this; see their webpage. There is also a system by CMU, which is also written in Java. I haven't used either of these systems, but the latter has a very helpful online demo. Putting the above two sentences in, I get

[I] bought [a new car]2 .
[It]2 is awesome .
Licensed under: CC-BY-SA with attribution