As for the exact keyword match:
10^6 * 2*10^3 words = 2 * 10^9 words, i.e. billions of candidate matches. Comparing each of them against 10^5 keywords leads to 10^6 * 2*10^3 * 10^5 = 2 * 10^14 operations in the worst case. The worst case is no match, and the probability of no match is probably large, because 100000 keywords is small compared to all possible words.
"and i want some thing near zero time"
Not possible.
As for the NER, you must drop the keyword list and instead classify the words into the grammatical categories you would like to highlight.
Things like:
- verbs
- adverbs
- nouns
- names
- quantities
- etc.
can be identified. After you have done that, you could define a special list containing special words by category. E.g. "President" might be in such a (noun) list so that it is highlighted with special properties. Because you'll end up with a much smaller special list, split into several categories, you can decrease the number of operations needed.
(I just realized that, since you know all about NER, you already know this.)
So, you could derive NER-like logic (or another algorithm that does not require a 100% match) for the language you're targeting.
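The per-category special lists described above could look roughly like the sketch below. It assumes the words have already been tagged with a category by some NER or part-of-speech step, which is not shown here. The names `SPECIAL_WORDS` and `highlight`, and the example words, are made up for illustration.

```python
# Hypothetical per-category special lists; the contents are illustrative only.
SPECIAL_WORDS = {
    "noun": {"president", "senate"},
    "name": {"lincoln"},
}

def highlight(tagged_words):
    """Return the (word, category) pairs whose word is in that category's list.

    tagged_words is assumed to come from an earlier tagging step as
    (word, category) pairs.
    """
    hits = []
    for word, category in tagged_words:
        # Only the small list for this word's category is consulted.
        if word.lower() in SPECIAL_WORDS.get(category, set()):
            hits.append((word, category))
    return hits

tagged = [("The", "article"), ("President", "noun"),
          ("spoke", "verb"), ("Lincoln", "name")]
print(highlight(tagged))  # [('President', 'noun'), ('Lincoln', 'name')]
```

Each word is checked only against the list for its own category, so the number of comparisons no longer depends on the size of one big keyword list.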
Another approach might be:
Put all your keywords in a hashtable or another indexed dictionary, and check whether the target word exists in that hashtable. Because the lookup is indexed, it will be significantly faster than comparing the word against every keyword in turn. You can also store additional info for each keyword in the hashtable.
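A minimal sketch of the hashtable idea, using a Python `dict` (which is hash based). The keyword table, the extra info stored per keyword, and the very simple tokenization (split on whitespace, strip some punctuation, lowercase) are assumptions made for the example, not something taken from your setup.

```python
# Hypothetical keyword table: keyword -> additional info stored with it.
KEYWORDS = {
    "president": {"category": "noun", "style": "bold"},
    "berlin": {"category": "name", "style": "italic"},
}

def find_keywords(text):
    """Return (position, keyword, info) for every word of text found in KEYWORDS."""
    matches = []
    for position, word in enumerate(text.split()):
        # Normalize the word so that "Berlin." matches the key "berlin".
        key = word.strip(".,;:!?").lower()
        info = KEYWORDS.get(key)  # one hash lookup instead of scanning all keywords
        if info is not None:
            matches.append((position, key, info))
    return matches

for position, key, info in find_keywords("The President visited Berlin."):
    print(position, key, info["style"])
# 1 president bold
# 3 berlin italic
```

With this layout the cost per word is one hash lookup on average, so the total work grows with the number of words in the text rather than with words times keywords.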