Question

Now this is a tricky problem for which I'm not able to figure out a good solution. Suppose we have a String in Java:- "He ate 3 apples today." Now the digit 3 can be easily identified in Java using isNumeric function or using regular expressions. But what if I have a String like: "He ate three apples today."? How can I identify that three is actually a number? I used OpenNlp and used its POS tagger but the time it takes to do is really too much! Can anyone suggest a better solution for this? Also among the ".bin" of OpenNlp, there is one file-"num.bin", but I don't know how to use this file. OpenNlp documentation also say nothing about it. Can anyone tell me if this is exactly what I've been looking for, and if yes then how to use it.

/*********************************************************************************************************************************/ I'm actually short of time here, so I've settled on a temporary solution here. Make a file/dictionary and take all the entries in a hashtable. Then I'll tokenize my sentence and check word by word for numbers, similar to what you guys suggested. I'll keep on updating the file as and when required. Thanks for your valuable suggestions guys, and if you have got something better than this I'd be really glad. OpenNlp implements this in a very good way, the only problem with it is time complexity and I want to do this in minimum time possible.

Was it helpful?

Solution

You have to keep all that words in arrays and then use it. Here is an example how to convert number to string. It may help you... I think you have to split your text into words and check if a word is a number (three). If yes check the next word because it can be say "million", then check the next word and so on. It's not easy and seems like a little library.I think you'll spend a lot of time writing this. Or try to search in google for a library like this. Maybe someone have already got this problem, wrote a library and shares it for free )) Good luck.

OTHER TIPS

Create a dictionary of numbers. Search for elements from that dictionary in the text.

Check asympotic complexity, it may be cheaper to sort the text first.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top