Question

Is it possible to remove the tags from the tagged sentences? One could scan through the file, find each tag, and remove it, but there are many tags (some models have 30+, others have around 48-50; they basically follow the Penn Treebank POS tags). Is there a quick, clean way to remove them more efficiently? I checked the API, but found no method for removing tags.

Solution

There's nothing special built in for this, but since the output includes both the word and its tag, I'm not sure why you need to scan the original document again or know the tag set at all. Can't you just strip the tags by deleting, in each token, everything from the last tagSeparator character ('/' or whatever you have configured) up to the next whitespace? Or, it could be simpler to use

-outputFormat tsv

Then you will get two-column output, with the words in the first column and the tags in the second, and you can just keep the first column when done.
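
Both suggestions can be sketched as a small post-processing step. This is only an illustration in Python, not part of the tagger: the function names `strip_tags` and `first_column` are made up here, the separator must match whatever tagSeparator your run actually used, and the tsv helper assumes one tab-separated word/tag pair per line with blank lines between sentences.

```python
def strip_tags(tagged_line, sep="/"):
    """Remove the tag from each whitespace-separated word<sep>TAG token.

    `sep` is an assumption: pass the tagSeparator your tagger output uses.
    """
    words = []
    for token in tagged_line.split():
        # rpartition splits on the LAST separator, so a word that itself
        # contains the separator (e.g. "1/2/CD") keeps its inner characters.
        word, found, _tag = token.rpartition(sep)
        # If the token has no separator at all, leave it unchanged.
        words.append(word if found else token)
    return " ".join(words)


def first_column(tsv_text):
    """Keep only the word column of two-column, tab-separated output.

    Assumes blank lines separate sentences; returns one string per sentence.
    """
    sentences, current = [], []
    for line in tsv_text.splitlines():
        if not line.strip():
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(" ".join(current))
                current = []
            continue
        # The word is everything before the first tab.
        current.append(line.split("\t", 1)[0])
    if current:
        sentences.append(" ".join(current))
    return sentences
```

Neither function needs to know the tag set, so it does not matter whether the model uses 30 tags or 50.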

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow