Named entity recognition (NER) features

https://datascience.stackexchange.com/questions/16704

16-10-2019

Question

I'm new to Named Entity Recognition and I'm having some trouble understanding what/how features are used for this task.

Some papers I've read so far mention the features used but don't really explain them. For example, "Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition" mentions the following features:

Main features used by the sixteen systems that participated in the CoNLL-2003 shared task sorted by performance on the English test data. Aff: affix information (n-grams); bag: bag of words; cas: global case information; chu: chunk tags; doc: global document information; gaz: gazetteers; lex: lexical features; ort: orthographic information; pat: orthographic patterns (like Aa0); pos: part-of-speech tags; pre: previously predicted NE tags; quo: flag signing that the word is between quotes; tri: trigger words.

I'm a bit confused by some of these, however. For example:

- Isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
- How can a gazetteer be a feature?
- How exactly can POS tags be used as features? Don't we have a POS tag for each word? Isn't each object/instance a "text"?
- What is global document information?
- What is the trigger words feature?

I think all I need here is just to look at an example table with each of these features as columns and see their values to understand how they really work, but so far I've failed to find an easy-to-read dataset.

Could someone please clarify or point me to some explanation or example of these features being used?

Solution

The features for a token in an NER algorithm are usually binary, i.e., the feature is either present or absent. For example, the token 'hello' is all lowercase, so that is a feature of that word.

You could name the feature 'IS_ALL_LOWERCASE'.
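As a rough sketch of what such orthographic features might look like in code (the feature names other than IS_ALL_LOWERCASE and the `word_shape` helper are my own illustrative choices, not taken from any particular NER system):

```python
def orthographic_features(token: str) -> dict:
    """Return binary orthographic features for a single token."""
    return {
        "IS_ALL_LOWERCASE": token.islower(),
        "IS_ALL_UPPERCASE": token.isupper(),
        # First letter uppercase, the rest lowercase (e.g. 'John').
        "IS_CAPITALIZED": token[:1].isupper() and token[1:].islower(),
        "HAS_DIGIT": any(ch.isdigit() for ch in token),
    }


def word_shape(token: str) -> str:
    """Collapse a token to an orthographic pattern such as 'Aa0'.

    Uppercase -> 'A', lowercase -> 'a', digit -> '0'; other characters are
    kept as-is, and runs of the same symbol are merged into one.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            symbol = "A"
        elif ch.islower():
            symbol = "a"
        elif ch.isdigit():
            symbol = "0"
        else:
            symbol = ch
        if not shape or shape[-1] != symbol:
            shape.append(symbol)
    return "".join(shape)


for token in ["hello", "John", "CoNLL-2003"]:
    print(token, orthographic_features(token), word_shape(token))
```

Each key in the dictionary would be one column of the feature table, and the pattern string is one way to read the "pat" entry (patterns like Aa0) in the list from the question.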

Now, for POS tags, let's take the word 'make'. It is a verb, so "IS_VERB" is a feature for that word.
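To make this concrete: each instance is a token, not a whole text, so every token gets its own row of POS features. Here is a minimal sketch, assuming the POS tags are already available (I assign them by hand here; in practice they would come from a POS tagger, and the three-tag set is only illustrative):

```python
# Hand-assigned (token, POS tag) pairs for one sentence.
# Assumption: a real pipeline would obtain these tags from a POS tagger.
tagged_sentence = [("They", "PRON"), ("make", "VERB"), ("cars", "NOUN")]

# Illustrative tag set, not a standard inventory.
POS_TAGS = ["NOUN", "VERB", "PRON"]


def pos_features(tag: str) -> dict:
    """Turn one POS tag into binary indicator features (one per known tag)."""
    return {f"IS_{t}": tag == t for t in POS_TAGS}


# One row of features per token.
rows = [(token, pos_features(tag)) for token, tag in tagged_sentence]
for token, feats in rows:
    print(token, feats)
```

So the single POS tag of a token becomes a set of yes/no columns, of which exactly one is true for that token.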

A gazetteer can be used to generate features: the presence (or absence) of a word in the gazetteer is a valid feature. For example, the word 'John' is present in a gazetteer of person names, so "IS_PERSON_NAME" can be a feature.
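Since the question asks for an example table, here is a small sketch of gazetteer lookups laid out as one row per token. The two gazetteers are tiny made-up sets, and IS_LOCATION_NAME is a hypothetical second feature added only to show more than one column:

```python
# Toy gazetteers (hypothetical contents); real ones are large curated lists.
PERSON_GAZETTEER = {"john", "mary", "alice"}
LOCATION_GAZETTEER = {"paris", "london"}


def gazetteer_features(token: str) -> dict:
    """Binary features: is the (lowercased) token listed in each gazetteer?"""
    key = token.lower()
    return {
        "IS_PERSON_NAME": key in PERSON_GAZETTEER,
        "IS_LOCATION_NAME": key in LOCATION_GAZETTEER,
    }


def feature_table(tokens: list) -> list:
    """Build one feature row (a dict) per token."""
    return [{"token": t, **gazetteer_features(t)} for t in tokens]


table = feature_table(["John", "lives", "in", "Paris"])

# Print the rows as a tab-separated table with 0/1 feature values.
columns = ["token", "IS_PERSON_NAME", "IS_LOCATION_NAME"]
print("\t".join(columns))
for row in table:
    print("\t".join(row[c] if c == "token" else str(int(row[c])) for c in columns))
```

The other feature types (case, POS, and so on) would simply be further columns in the same per-token table.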

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange