Score each sentence in a line based on a tag and summarize the text. (Java)

https://stackoverflow.com//questions/9702739

13-12-2019

Question

I am trying to create a summarizer in Java. I am using the Stanford Log-linear Part-Of-Speech Tagger to tag the words, and then, for certain tags, I am scoring the sentence. Finally, in the summary, I print the sentences with a high score. Here is the code:

    MaxentTagger tagger = new MaxentTagger("taggers/bidirectional-distsim-wsj-0-18.tagger");

    BufferedReader reader = new BufferedReader( new FileReader ("C:\\Summarizer\\src\\summarizer\\testing\\testingtext.txt"));
    String line  = null;
    int score = 0;
    StringBuilder stringBuilder = new StringBuilder();
    File tempFile = new File("C:\\Summarizer\\src\\summarizer\\testing\\tempFile.txt");
    Writer writerForTempFile = new BufferedWriter(new FileWriter(tempFile));


    String ls = System.getProperty("line.separator");
    while( ( line = reader.readLine() ) != null )
    {
        stringBuilder.append( line );
        stringBuilder.append( ls );
        String tagged = tagger.tagString(line);
        Pattern pattern = Pattern.compile("[.?!]"); // find sentence-ending punctuation
        Matcher matcher = pattern.matcher(tagged);
        while(matcher.find())
        {
            Pattern tagFinder = Pattern.compile("/JJ"); // find adjective tag
            Matcher tagMatcher = tagFinder.matcher(matcher.group());
            while(tagMatcher.find())
            {
                score++; // increase score of sentence for every occurrence of adjective tag
            }
            if(score > 1)
                writerForTempFile.write(stringBuilder.toString());
            score = 0;
            stringBuilder.setLength(0);
        }

    }

    reader.close();
    writerForTempFile.close();

The code above does not work. However, if I cut my work down and generate a score for each line (not each sentence), it works. But summaries are not generated that way, are they? Here is the code for that (all declarations are the same as above):

    while( ( line = reader.readLine() ) != null )
    {
        stringBuilder.append( line );
        stringBuilder.append( ls );
        String tagged = tagger.tagString(line);
        Pattern tagFinder = Pattern.compile("/JJ"); // find adjective tag
        Matcher tagMatcher = tagFinder.matcher(tagged);
        while(tagMatcher.find())
        {
            score++; // increase score of line for every occurrence of adjective tag
        }
        if(score > 1)
            writerForTempFile.write(stringBuilder.toString());
        score = 0;
        stringBuilder.setLength(0);
    }

Edit 1:

Some information about what the MaxentTagger does. Here is sample code to show that it works:

    import java.io.IOException;

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class TagText {
        public static void main(String[] args) throws IOException, ClassNotFoundException {
            // Initialize the tagger
            MaxentTagger tagger = new MaxentTagger("taggers/bidirectional-distsim-wsj-0-18.tagger");

            // The sample string
            String sample = "This is a sample text";

            // The tagged string
            String tagged = tagger.tagString(sample);

            // Output the result
            System.out.println(tagged);
        }
    }

Output:

    This/DT is/VBZ a/DT sample/NN sentence/NN
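
For readers unfamiliar with this output format: each token is written as word/TAG, separated by whitespace. A minimal sketch of pulling the pairs apart, assuming exactly that layout (the class name `TaggedTokenDemo` and the helper `parse` are made up for illustration and are not part of the Stanford library):

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedTokenDemo {

    /** Splits "word/TAG word/TAG ..." into {word, tag} pairs (hypothetical helper). */
    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Assumption: the tag follows the last slash in the token
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip anything that does not look like word/TAG
            }
            pairs.add(new String[] { token.substring(0, slash), token.substring(slash + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parse("This/DT is/VBZ a/DT sample/NN sentence/NN")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Counting occurrences of "/JJ" with a regex, as the code in this question does, is a shortcut over this kind of parsing.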

Edit 2:

I modified the code to use BreakIterator to find sentence breaks. Yet the problem persists.

    while( ( line = reader.readLine() ) != null ) {
        stringBuilder.append( line );
        stringBuilder.append( ls );
        String tagged = tagger.tagString(line);
        BreakIterator bi = BreakIterator.getSentenceInstance();
        bi.setText(tagged);
        int end, start = bi.first();
        while ((end = bi.next()) != BreakIterator.DONE) {
            String sentence = tagged.substring(start, end);
            Pattern tagFinder = Pattern.compile("/JJ");
            Matcher tagMatcher = tagFinder.matcher(sentence);
            while(tagMatcher.find()) {
                score++;
            }
            scoreTracker.add(score);
            if(score > 1)
                writerForTempFile.write(stringBuilder.toString());
            score = 0;
            stringBuilder.setLength(0);
            start = end;
        }
    }

Solution

Finding sentence breaks can be a bit more involved than just looking for [.?!]; consider using BreakIterator.getSentenceInstance().

Its performance is actually similar to LingPipe's (more complex) implementation, and better than the one in OpenNLP (from my own tests, at least).

Sample code

BreakIterator bi = BreakIterator.getSentenceInstance();
bi.setText(text);
int end, start = bi.first();
while ((end = bi.next()) != BreakIterator.DONE) {
    String sentence = text.substring(start, end);
    start = end;
}
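
For anyone who wants to try the snippet above on its own, here is a self-contained sketch of the same loop wrapped in a small class. The class name `SentenceSplitDemo`, the helper `splitSentences`, the fixed `Locale.US`, and the trimming of each sentence are my own choices for illustration, not something the snippet above prescribes:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitDemo {

    /** Returns the sentences of the given text, trimmed (hypothetical helper). */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        // Assumption: English text, so the locale is fixed to US
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox is fast. Is the dog lazy? It is!";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Note that the iterator is given the raw text here, not the tagged text; tagging each sentence afterwards keeps the word/TAG tokens from interfering with the break detection.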

Edit

I think this is what you are looking for:

    Pattern tagFinder = Pattern.compile("/JJ"); // adjective tag, compiled once
    BufferedReader reader = getMyReader();      // placeholder for your own reader
    String line = null;
    while ((line = reader.readLine()) != null) {
        // Split the raw line into sentences first, then tag each sentence
        BreakIterator bi = BreakIterator.getSentenceInstance();
        bi.setText(line);
        int end, start = bi.first();
        while ((end = bi.next()) != BreakIterator.DONE) {
            String sentence = line.substring(start, end);
            String tagged = tagger.tagString(sentence);
            int score = 0;
            Matcher tag = tagFinder.matcher(tagged);
            while (tag.find())
                score++;
            if (score > 1) {
                // java.io.Writer has no println(); write the sentence plus a line separator
                writerForTempFile.write(sentence + System.lineSeparator());
            }
            start = end;
        }
    }
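
If you want to check the scoring logic without loading a tagger model, here is a rough, self-contained sketch of the same idea. Everything named here (`AdjectiveScoreDemo`, `fakeTag`, `countAdjectives`, `summarize`, the "XX" placeholder tag) is invented for illustration. `fakeTag` is only a stand-in that marks a few known words as /JJ; with the real tagger you would presumably pass something like `tagger::tagString` instead.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjectiveScoreDemo {

    // Matches the adjective tag; note that "/JJ" also matches inside /JJR and /JJS
    private static final Pattern TAG_FINDER = Pattern.compile("/JJ");

    // Hypothetical stand-in for a tagger: it only knows these adjectives
    private static final Set<String> ADJECTIVES =
            new HashSet<>(Arrays.asList("quick", "brown", "lazy"));

    /** Fake tagger: emits word/JJ for known adjectives and word/XX otherwise. */
    static String fakeTag(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String word : sentence.split("\\s+")) {
            String bare = word.replaceAll("\\W", "").toLowerCase(Locale.ROOT);
            String tag = ADJECTIVES.contains(bare) ? "JJ" : "XX";
            out.append(word).append('/').append(tag).append(' ');
        }
        return out.toString().trim();
    }

    /** Counts occurrences of the adjective tag in an already tagged string. */
    static int countAdjectives(String tagged) {
        int score = 0;
        Matcher m = TAG_FINDER.matcher(tagged);
        while (m.find()) {
            score++;
        }
        return score;
    }

    /** Keeps every sentence whose tagged form contains more than one adjective tag. */
    static List<String> summarize(String text, UnaryOperator<String> tagger) {
        List<String> kept = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty() && countAdjectives(tagger.apply(sentence)) > 1) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps. The dog sleeps. A lazy dog is a quick learner.";
        summarize(text, AdjectiveScoreDemo::fakeTag).forEach(System.out::println);
    }
}
```

With the fake tagger, the first and third sentences each contain two adjectives and are kept, while the second is dropped. The threshold of more than one adjective is the one used in the question.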

Other suggestions

Without understanding everything, my guess is that your code should look more like this:

    int lastMatch = 0;// Added

    Pattern pattern = Pattern.compile("[.?!]"); // find sentence-ending punctuation
    Matcher matcher = pattern.matcher(tagged);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("/JJ"); // find adjective tag

        // HERE START OF MY CHANGE
        String sentence = tagged.substring(lastMatch, matcher.end());
        lastMatch = matcher.end();
        Matcher tagMatcher = tagFinder.matcher(sentence);
        // HERE END OF MY CHANGE

        while(tagMatcher.find())
        {
            score++; // increase score of sentence for every occurrence of adjective tag
        }
        if(score > 1)
            writerForTempFile.write(sentence);
        score = 0;
    }
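
To spell out why the substring matters: `matcher.group()` returns only the text that the pattern matched, which for `[.?!]` is a single punctuation character, so searching it for /JJ can never succeed. A small sketch comparing the two approaches on a hand-written tagged string (the string and the class and method names are made up for illustration; real tagger output may differ):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupVsSubstringDemo {

    static final Pattern SENTENCE_END = Pattern.compile("[.?!]");
    static final Pattern TAG_FINDER = Pattern.compile("/JJ");

    /** Counts adjective tags in the given piece of tagged text. */
    static int count(String piece) {
        int n = 0;
        Matcher m = TAG_FINDER.matcher(piece);
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Searches only matcher.group(), i.e. the punctuation character itself. */
    static int scoreUsingGroup(String tagged) {
        int total = 0;
        Matcher m = SENTENCE_END.matcher(tagged);
        while (m.find()) {
            total += count(m.group());
        }
        return total;
    }

    /** Searches the text between the previous match and the current one. */
    static int scoreUsingSubstring(String tagged) {
        int total = 0;
        int lastMatch = 0;
        Matcher m = SENTENCE_END.matcher(tagged);
        while (m.find()) {
            total += count(tagged.substring(lastMatch, m.end()));
            lastMatch = m.end();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hand-written example; the period appears twice because it is tagged as "./."
        String tagged = "A/DT quick/JJ brown/JJ fox/NN ./.";
        System.out.println("group():     " + scoreUsingGroup(tagged));
        System.out.println("substring(): " + scoreUsingSubstring(tagged));
    }
}
```

The group-based version finds no adjective tags at all, while the substring-based version finds both.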

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow