There is a built-in static method on the Span class, Span.spansToStrings(Span[], String[]), that does what you want. See this answer: OpenNLP Name Entity Recognizer output
Getting string array sections using integers
01-07-2022
Question
First off, I'm working with OpenNLP. Knowledge of it isn't needed to answer this, but it might be useful.
A string is passed to the method FindName:
String input = "Billy Smith the chicken crossed the road to visit Fred Jones";
It is processed by a tokenizer to produce the input for the name finder:
String[] tokenized = {"Billy","Smith","the","chicken","crossed","the","road","to","visit","Fred","Jones"};
This array is searched for names. The results are produced in a "for" loop as two strings:
"[0..2) person","[9..11) person"
Now, how can I put the original names ("Billy Smith" and "Fred Jones") into an ArrayList or a similar string array?
So far I have tried:
for(Span s: nameSpans){
    numbers = s.toString().replace("[", "");
    //is "[0..2) person" and "[9..11) person"
    sect = numbers.split("\\) ");
}
int x;
for(x=0;x<sect.length;x++){
    if(x%2 == 0){
        String[] numb = sect[x].split("..");
        int n;
        int first, second;
        first = Integer.parseInt(numb[0]);
        second = Integer.parseInt(numb[1]);
        for(n=first;n<second;n++){
            if(sentence.hashCode() == n){
                name.add(sentence[n]);
            }
        }
    }
}
but I have had no luck.
Solution
OTHER TIPS
It can be done by parsing the start and end indices out of each span's string output as integers, then splitting the original input string into an array of words. Indexing that array from the start index up to (but not including) the end index gives the full name, including any middle names in between.
The working code:
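// Split the original input into words once; the span indices refer to these positions.
String[] word = input.split("\\s");
for (Span s : nameSpans) {
    // "[0..2) person" -> "0..2 person"
    String a = s.toString().replace("[", "").replace(")", "");
    // Drop the trailing type label, keeping "0..2"
    String[] b = a.split("\\s");
    // Split on the literal ".." to get the start and end indices
    String[] c = b[0].split("\\.\\.");
    int first = Integer.parseInt(c[0]);
    int second = Integer.parseInt(c[1]); // end index is exclusive
    // Add each word of the name separately
    for (int n = first; n < second; n++) {
        names.add(word[n]);
        System.out.println(word[n]);
    }
}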
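The loop above adds each word of a name as a separate list entry. If one entry per name is wanted ("Billy Smith", "Fred Jones"), the words in each index range can be joined instead of parsing the toString() output. The sketch below is only an illustration: TokenSpan and joinSpans are hypothetical stand-ins so the example runs without OpenNLP on the classpath. With the real opennlp.tools.util.Span, the same indices come from getStart() and getEnd(), and the built-in Span.spansToStrings(Span[], String[]) performs this joining for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpanNames {

    // Hypothetical stand-in for OpenNLP's Span: token indices, end exclusive.
    static final class TokenSpan {
        final int start;
        final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Joins the tokens covered by each span into one space-separated string.
    static List<String> joinSpans(TokenSpan[] spans, String[] tokens) {
        List<String> names = new ArrayList<>();
        for (TokenSpan s : spans) {
            String[] covered = Arrays.copyOfRange(tokens, s.start, s.end);
            names.add(String.join(" ", covered));
        }
        return names;
    }

    public static void main(String[] args) {
        String[] tokens = {"Billy", "Smith", "the", "chicken", "crossed", "the",
                           "road", "to", "visit", "Fred", "Jones"};
        // Same ranges as "[0..2) person" and "[9..11) person" from the question.
        TokenSpan[] spans = {new TokenSpan(0, 2), new TokenSpan(9, 11)};
        System.out.println(joinSpans(spans, tokens)); // prints [Billy Smith, Fred Jones]
    }
}
```

Working from the tokenizer's own array rather than input.split("\\s") also keeps the indices aligned when the tokenizer splits off punctuation that a whitespace split would leave attached to a word.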