Question

I'm building a small summarization utility in Java. I use the Stanford Log-linear Part-Of-Speech Tagger to find the parts of speech in each sentence, score specific tags, and give each sentence a total score. When I summarize, I keep only those lines whose score exceeds a certain threshold. That's the plan.

Here's the sample code I have worked out so far. It scores only adjectives and generates a summary from the lines whose score is greater than, say, 1.

MaxentTagger tagger = new MaxentTagger("taggers/bidirectional-distsim-wsj-0-18.tagger");
BufferedReader reader = new BufferedReader( new FileReader ("C:\\Summarizer\\src\\summarizer\\testing\\testingtext.txt")); 
String line  = null;
int score = 0;
StringBuilder stringBuilder = new StringBuilder();
File tempFile = new File("C:\\Summarizer\\src\\summarizer\\testing\\tempFile.txt");
Writer writerForTempFile = new BufferedWriter(new FileWriter(tempFile));

String ls = System.getProperty("line.separator");
while( ( line = reader.readLine() ) != null )
{
    stringBuilder.append( line );
    stringBuilder.append( ls );
    String tagged = tagger.tagString(line);
    Pattern tagFinder = Pattern.compile("/JJ");
    Matcher tagMatcher = tagFinder.matcher(tagged);
    while(tagMatcher.find())
    {
        score++;
    }
    if(score > 1)
        writerForTempFile.write(stringBuilder.toString());
    score = 0;
}
reader.close();
writerForTempFile.close();

But apparently I'm going wrong somewhere. It does write the required lines into tempFile, but there are many extra lines as well. Kindly help!

Solution

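You need to reset your StringBuilder for every new line you want to write to the file. Currently the StringBuilder is never cleared, so whenever score > 1 you write the current line together with all previously appended lines, including lines that did not pass the threshold themselves. Either call stringBuilder.setLength(0) at the start of each loop iteration, or drop the StringBuilder and write line followed by ls directly.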
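
Here is a minimal sketch of the per-line approach, with no shared buffer carried across iterations. The class name AdjectiveFilter and the methods countAdjectives and selectLines are hypothetical names chosen for illustration. The tagger is passed in as a function so the example runs without the Stanford library; the demo assumes its input is already tagged. In your code you would presumably pass tagger::tagString instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjectiveFilter {
    // Compiled once instead of on every loop iteration.
    // Note that "/JJ" also matches the tags /JJR and /JJS.
    private static final Pattern ADJECTIVE_TAG = Pattern.compile("/JJ");

    /** Counts adjective tags in an already tagged line, e.g. "quick/JJ fox/NN". */
    public static int countAdjectives(String taggedLine) {
        Matcher matcher = ADJECTIVE_TAG.matcher(taggedLine);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /**
     * Returns only the lines whose adjective count exceeds the threshold.
     * Each line is scored and kept on its own, so nothing from earlier
     * lines can leak into the output.
     */
    public static List<String> selectLines(List<String> lines,
                                           UnaryOperator<String> tagger,
                                           int threshold) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int score = countAdjectives(tagger.apply(line)); // fresh score per line
            if (score > threshold) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumption: these sample lines are pre-tagged, so the "tagger"
        // here is just the identity function.
        List<String> lines = Arrays.asList(
                "quick/JJ brown/JJ fox/NN",
                "dog/NN runs/VBZ",
                "lazy/JJ dog/NN");
        for (String line : selectLines(lines, UnaryOperator.identity(), 1)) {
            System.out.println(line);
        }
    }
}
```

Because score is declared inside the loop and each line is added on its own, there is no state to reset between iterations. Writing the kept lines to tempFile is then a simple loop over the returned list.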

Licensed under: CC-BY-SA with attribution