Data Structure required to store extracted POS tag text in Java

https://stackoverflow.com/questions/21091808

27-09-2022

Question

Friends, I am doing sentiment analysis using the AANV (adjective-adverb-noun-verb) approach as my BE final-year project. I have completed the project up to POS tagging. I am using the Stanford POS Tagger for this, and it gives me appropriate results. For example, for the following sentences it gives the output below:

Input Sentences:

The camera is worked well.

Camera is very good.

Camera captures photo so slowly.

POS Tagging Output sentences:

The/DT camera/NN is/VBZ worked/VBN well/RB ./.

Camera/NN is/VBZ very/RB good/JJ ./.

Camera/NN captures/VBZ photo/NN so/RB slowly/RB ./.

From the POS-tagged output above, I only need to extract the adjectives, adverbs, nouns and verbs, together with their POS category. To get the AANV words I am using regular expressions, with the following code:

private void btnShowTagActionPerformed(java.awt.event.ActionEvent evt) {                                           
    Pattern NounPat=Pattern.compile("[A-Za-z]+/NN");
    Pattern AdvPat=Pattern.compile("[A-Za-z]+/RB");
    Pattern AdjPat=Pattern.compile("[A-Za-z]+/JJ");
    Pattern VerbPat=Pattern.compile("[A-Za-z]+/VB.");
    String StrToken;
    Matcher mat;
    StringTokenizer PosToken;
    String TempStr;  
    int j;
    for(int line=0;line<SAPosTagging.tagedReview.length;line++)
    {
       try{

       PosToken=new StringTokenizer(SAPosTagging.tagedReview[line]);
       while(PosToken.hasMoreTokens())
       {
           StrToken=PosToken.nextToken();
           mat=NounPat.matcher(StrToken);
           if(mat.matches())
           {
               TempStr=StrToken;
               txtareaExTagText.append("Noun=>"+StrToken);   //textarea to be appended
               j=TempStr.indexOf("/");
               TempStr=TempStr.substring(0,j);
               System.out.print("\tNoun=>"+TempStr);
           }
           mat=VerbPat.matcher(StrToken);
           if(mat.matches())
           {

               txtareaExTagText.append("\tVerb=>"+StrToken);
               TempStr=StrToken;
               j=TempStr.indexOf("/");
               TempStr=TempStr.substring(0,j);
               System.out.print("\tVerb=>"+TempStr);

           }
           mat=AdvPat.matcher(StrToken);
           if(mat.matches())
           {

               txtareaExTagText.append("\tAdverb=>"+StrToken);
               TempStr=StrToken;
               j=TempStr.indexOf("/");
               TempStr=TempStr.substring(0,j);
               System.out.print("\tAdVerb=>"+TempStr);

           }
           mat=AdjPat.matcher(StrToken);
           if(mat.matches())
           {

               txtareaExTagText.append("\tAdjective=>"+StrToken);
               TempStr=StrToken;
               j=TempStr.indexOf("/");
               TempStr=TempStr.substring(0,j);
               System.out.print("\tAdjective=>"+TempStr);

           }  
       }
       System.out.println();
       txtareaExTagText.append("\n\n");
      }catch(Exception e){ e.printStackTrace(); }   // report errors instead of silently ignoring them
    }
}
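A possible tighter version of the extraction step above (a sketch, not the asker's code): one regular expression with two capturing groups yields the word and the tag in a single pass, replacing the indexOf/substring calls. The class name AanvExtractor and the method extract are hypothetical. Accepting NN\w*, VB\w*, RB\w* and JJ\w* is an assumption that you also want the plural, proper-noun, comparative and superlative tags (NNS, NNP, JJR, RBS, ...). Note that, with matches(), the question's "[A-Za-z]+/NN" rejects a token such as Cameras/NNS, "[A-Za-z]+/VB." misses a bare VB because it requires exactly one character after VB, and [A-Za-z]+ rejects tokens such as n't/RB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AanvExtractor {
    // Whole-token match of word/TAG (not preceded or followed by non-whitespace).
    // Group 1 = word, group 2 = tag. NN\w* also accepts NNS/NNP/NNPS,
    // VB\w* accepts VB/VBD/VBG/VBN/VBP/VBZ, RB\w* and JJ\w* their comparative/superlative forms.
    private static final Pattern AANV =
            Pattern.compile("(?<!\\S)([^\\s/]+)/(NN\\w*|VB\\w*|RB\\w*|JJ\\w*)(?!\\S)");

    /** Returns the adjectives, adverbs, nouns and verbs of one tagged sentence as "word/TAG", in order. */
    public static List<String> extract(String taggedSentence) {
        List<String> result = new ArrayList<>();
        Matcher m = AANV.matcher(taggedSentence);
        while (m.find()) {
            String word = m.group(1); // text before the slash
            String tag = m.group(2);  // Penn Treebank tag after the slash
            result.add(word + "/" + tag);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tagged = {
            "The/DT camera/NN is/VBZ worked/VBN well/RB ./.",
            "Camera/NN is/VBZ very/RB good/JJ ./.",
            "Camera/NN captures/VBZ photo/NN so/RB slowly/RB ./."
        };
        for (String sentence : tagged) {
            System.out.println(extract(sentence));
        }
    }
}
```

For the first sample sentence this returns [camera/NN, is/VBZ, worked/VBN, well/RB]; DT and punctuation tokens are skipped because their tags match none of the four alternatives.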

With the help of the above code, I get the following output in the text area (i.e. after extracting the required tags):

Noun=>camera/NN Verb=>is/VBZ Verb=>worked/VBN Adverb=>well/RB

Noun=>Camera/NN Verb=>is/VBZ Adverb=>very/RB Adjective=>good/JJ

Noun=>Camera/NN Verb=>captures/VBZNoun=>photo/NN Adverb=>so/RB Adverb=>slowly/RB

Now I want to form (word, POS category) pairs, for example (camera, n), so that each pair can be passed to SentiWordNet to retrieve its score. Please give me code for storing these pairs without losing the link to the sentence they came from, so that I can pass them to SentiWordNet. The sentence structure must be preserved while forming the pairs, and a single sentence may contain multiple verbs, nouns, adverbs or adjectives.

Solution

I advise you to forget about 'data structures' and model this with OO classes instead. Think about a Sentence class: what do you want to store about a sentence, and how do you want to store a collection of Sentences?

If you insist on using 'general' data structures, you can use a List in which every element represents one sentence and is a Guava Multimap.

The key would be the category (noun, verb, etc.) and the value would be the word. A Multimap allows several values per key (see the Guava Multimap documentation).
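If Guava is not on the classpath, a plain Map<String, List<String>> filled with computeIfAbsent gives the same one-key-many-values behaviour as ArrayListMultimap. A minimal sketch, assuming each element of the input array is one line of tagger output (standing in for SAPosTagging.tagedReview); SentenceMultimap, category and group are hypothetical names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class SentenceMultimap {
    /** Maps a Penn Treebank tag to one of the four AANV keys, or null for any other tag. */
    public static String category(String tag) {
        if (tag.startsWith("NN")) return "noun";
        if (tag.startsWith("VB")) return "verb";
        if (tag.startsWith("RB")) return "adverb";
        if (tag.startsWith("JJ")) return "adjective";
        return null;
    }

    /** One map per sentence; each key holds every word of that category, in sentence order. */
    public static List<Map<String, List<String>>> group(String[] taggedSentences) {
        List<Map<String, List<String>>> sentenceList = new ArrayList<>();
        for (String sentence : taggedSentences) {
            Map<String, List<String>> byCategory = new LinkedHashMap<>();
            StringTokenizer tokens = new StringTokenizer(sentence);
            while (tokens.hasMoreTokens()) {
                String token = tokens.nextToken();
                int slash = token.lastIndexOf('/');
                if (slash <= 0) continue; // not a word/TAG token
                String key = category(token.substring(slash + 1));
                if (key != null) {
                    // Create the value list on first use, then append the word.
                    byCategory.computeIfAbsent(key, k -> new ArrayList<>())
                              .add(token.substring(0, slash));
                }
            }
            sentenceList.add(byCategory);
        }
        return sentenceList;
    }

    public static void main(String[] args) {
        String[] tagged = {
            "The/DT camera/NN is/VBZ worked/VBN well/RB ./.",
            "Camera/NN captures/VBZ photo/NN so/RB slowly/RB ./."
        };
        System.out.println(group(tagged));
    }
}
```

The position in the outer list is the sentence index, so the link back to each sentence is kept. Within a sentence, however, grouping by category loses the relative order of words from different categories; the OO approach below can keep it.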

Guava example (not tested):

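// Requires Guava on the classpath
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

List<Multimap<String, String>> sentenceList = new ArrayList<>();
for (String line : SAPosTagging.tagedReview) {
   Multimap<String, String> aux = ArrayListMultimap.create();
   StringTokenizer posToken = new StringTokenizer(line);
   while (posToken.hasMoreTokens()) {
       String strToken = posToken.nextToken();
       // TODO: test strToken against the noun/verb/adverb/adjective patterns;
       // for now, let's assume it is a noun
       aux.put("noun", strToken);
       // TODO: the other categories, etc.
   }
   sentenceList.add(aux);
}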

OO example (not tested):

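import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class Sentence {
    private final List<String> nouns = new ArrayList<>();
    private final List<String> verbs = new ArrayList<>();
    // TODO adverbs, adjectives, etc.
    public List<String> getNouns() { return nouns; }
    public List<String> getVerbs() { return verbs; }
    // TODO other getters, etc.
}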

List<Sentence> sentenceList = new ArrayList<>();
for (String line : SAPosTagging.tagedReview) {
   Sentence aux = new Sentence();
   StringTokenizer posToken = new StringTokenizer(line);
   while (posToken.hasMoreTokens()) {
       String strToken = posToken.nextToken();
       // TODO: test strToken against the noun/verb/adverb/adjective patterns;
       // for now, let's assume it is a noun
       aux.getNouns().add(strToken);
       // TODO: the other categories, etc.
   }
   sentenceList.add(aux);
}
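Because the question ultimately needs (word, category) pairs such as (camera, n) for each sentence, here is a sketch that combines the two ideas above: each Sentence keeps an ordered list of pairs, so a pair is linked to its sentence by the sentence's index in the outer list, and word order inside the sentence is preserved. The names SentencePairs, TaggedWord, toSwnPos and parse are hypothetical, and records need Java 16 or later. Mapping tags to the single letters n, v, a, r assumes you will look entries up by the POS letters used in the SentiWordNet data file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentencePairs {
    /** One (word, SentiWordNet POS letter) pair, e.g. (camera,n). */
    public record TaggedWord(String word, char pos) {
        @Override
        public String toString() { return "(" + word + "," + pos + ")"; }
    }

    /** One sentence: its AANV pairs, kept in their original order. */
    public record Sentence(List<TaggedWord> words) {}

    /** Maps a Penn Treebank tag to a SentiWordNet POS letter; '\0' means "not an AANV tag". */
    public static char toSwnPos(String tag) {
        if (tag.startsWith("NN")) return 'n';
        if (tag.startsWith("VB")) return 'v';
        if (tag.startsWith("JJ")) return 'a';
        if (tag.startsWith("RB")) return 'r';
        return '\0';
    }

    /** Builds one Sentence from one line of tagger output such as "Camera/NN is/VBZ ...". */
    public static Sentence parse(String taggedSentence) {
        List<TaggedWord> words = new ArrayList<>();
        for (String token : taggedSentence.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) continue; // skip anything that is not word/TAG
            char pos = toSwnPos(token.substring(slash + 1));
            if (pos != '\0') {
                // Lower-casing is an assumption: fine for common words,
                // but WordNet keeps some proper nouns capitalised.
                String word = token.substring(0, slash).toLowerCase(Locale.ROOT);
                words.add(new TaggedWord(word, pos));
            }
        }
        return new Sentence(List.copyOf(words));
    }

    public static void main(String[] args) {
        String[] tagged = {
            "The/DT camera/NN is/VBZ worked/VBN well/RB ./.",
            "Camera/NN is/VBZ very/RB good/JJ ./.",
            "Camera/NN captures/VBZ photo/NN so/RB slowly/RB ./."
        };
        List<Sentence> review = new ArrayList<>();
        for (String line : tagged) {
            review.add(parse(line));
        }
        // review.get(i) is sentence i; its pairs are what you would look up in SentiWordNet.
        for (int i = 0; i < review.size(); i++) {
            System.out.println("Sentence " + i + ": " + review.get(i).words());
        }
    }
}
```

SentiWordNet entries are WordNet lemmas, so inflected forms such as worked or captures will probably need lemmatizing (work, capture) before the lookup; that step is not shown here.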

Licensed under: CC-BY-SA with attribution
