Extracting multi word named entities using CoreNLP

Question 1

PrintWriter writer = null;
 try {  
     String inputLine = "Several possible plans emerged from the talks, held at the Federal Reserve Bank of New York" + " and led by Timothy R. Geithner, the president of the New York Fed, and Treasury Secretary Henry M. Paulson Jr.";

     String serializedClassifier = "english.all.3class.distsim.crf.ser.gz";
     AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(serializedClassifier);

     writer = new PrintWriter(new File("output.xml"));
     writer.println("<Sentences>");
     writer.flush();
     String output ="<Sentence>"+classifier.classifyToString(inputLine, "xml", true)+"</Sentence>"; 
     writer.println(output);
     writer.flush();
     writer.println("</Sentences>");
     writer.flush(); 
 } catch (FileNotFoundException ex) {
     ex.printStackTrace();
 } finally {
     writer.close();
 }

I was able to come up with this solution. I am writing the output to an XML file "output.xml". From the obtained output, you can merge continuous nodes in xml with "PERSON" or "ORGANIZATION" or "LOCATION" attributes in to one entity. And this format produces the word count by default.

Here is a snapshot of xml output.

<wi num="11" entity="ORGANIZATION">Federal</wi>
<wi num="12" entity="ORGANIZATION">Reserve</wi>
<wi num="13" entity="ORGANIZATION">Bank</wi>
<wi num="14" entity="ORGANIZATION">of</wi>
<wi num="15" entity="ORGANIZATION">New</wi>
<wi num="16" entity="ORGANIZATION">Yorkand</wi>

From the above output you can see that continuously words are recognized as "ORGANIZATION". So these words could be combined to one entity.

Question 2

I use one temp variable to hold the previous ner tag and check if the current ner tag is equal to the temp, it will combine two words together. and iteration goes by assigning temp to current ner tag.