Question

I have a list of words produced by POS tagging in Java. Now I want to remove the words that carry particular tags. How can I use StringTokenizer to remove tagged words such as to-PRP, and all other words tagged PRP?

The input file:

    mike-NNS
    Buses-NNP
    Walk_VRB
    to_PRP
    ... and so on


Solution

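    // imports needed at the top of the file
    import java.util.ArrayList;
    import java.util.List;
    import java.util.StringTokenizer;

    final List<String> result = new ArrayList<String>();

    // StringTokenizer works on a single String, not on a List
    final String text = getText(); // get your tagged text as one String
    final String delimiter = " \n"; // your delimiter(s)

    final StringTokenizer tokenizer = new StringTokenizer(text, delimiter);
    while (tokenizer.hasMoreTokens()) {
      final String token = tokenizer.nextToken();
      if (isValid(token)) { // implement your own isValid method
        result.add(token);
      }
    }
    return result;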
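
The snippet above leaves isValid unimplemented. As a rough sketch of one way to fill it in, the class below drops every token whose tag matches a given one. The names TagFilter, removeTagged and hasTag are made up for illustration, and it assumes each token has the form word-TAG or word_TAG (both separators appear in the sample input) and that tokens are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of any library.
class TagFilter {

    // Returns the tokens of text whose POS tag is NOT unwantedTag.
    static List<String> removeTagged(String text, String unwantedTag) {
        List<String> result = new ArrayList<>();
        // With no explicit delimiter, StringTokenizer splits on whitespace.
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (!hasTag(token, unwantedTag)) {
                result.add(token);
            }
        }
        return result;
    }

    // True if the part after the last '-' or '_' equals tag, ignoring case.
    // Assumption: the tag never contains '-' or '_' itself.
    static boolean hasTag(String token, String tag) {
        int sep = Math.max(token.lastIndexOf('-'), token.lastIndexOf('_'));
        if (sep < 0) {
            return false; // untagged token, keep it
        }
        return token.substring(sep + 1).equalsIgnoreCase(tag);
    }
}
```

For the sample input, TagFilter.removeTagged(text, "PRP") would keep mike-NNS, Buses-NNP and Walk_VRB and drop to_PRP. The comparison ignores case, so "prp" works as well.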
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow