I think this is quite well documented in the Stanford NER FAQ: http://nlp.stanford.edu/software/crf-faq.shtml#a.
Here are the steps:
- In your properties file, change the map to specify how your training data is annotated (or structured):
map = word=0,myfeature=1,answer=2
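To make the column mapping concrete, here is a minimal sketch of how a map value like the one above lines up with one line of training data. It is not CoreNLP code: `ColumnMapSketch`, `parseMap`, `readToken`, and the sample line are hypothetical, and it assumes the training file is tab-separated with one token per line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: shows how a "map" property value lines up with
// the columns of a tab-separated training file. Not CoreNLP code.
final class ColumnMapSketch {

    // Parses a spec such as "word=0,myfeature=1,answer=2" into name -> column index.
    static Map<String, Integer> parseMap(String spec) {
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] kv = entry.trim().split("=");
            columns.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
        return columns;
    }

    // Splits one training line on tabs and labels each cell by its column name.
    static Map<String, String> readToken(String line, Map<String, Integer> columns) {
        String[] cells = line.split("\t");
        Map<String, String> token = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : columns.entrySet()) {
            token.put(e.getKey(), cells[e.getValue()]);
        }
        return token;
    }

    public static void main(String[] args) {
        Map<String, Integer> columns = parseMap("word=0,myfeature=1,answer=2");
        // Made-up training line: token, custom feature value, gold label.
        System.out.println(readToken("Stanford\tNNP\tORGANIZATION", columns));
    }
}
```

The point is only that the middle column of each training line becomes the value stored under the name `myfeature`, which the later steps then expose to the feature factory.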
- In src\edu\stanford\nlp\sequences\SeqClassifierFlags.java, add a flag stating that you want to use your new feature; let's call it useMyFeature. Below
public boolean useLabelSource = false;
add
public boolean useMyFeature = true;
In the same file, in the method
setProperties(Properties props, boolean printProps)
after
else if (key.equalsIgnoreCase("useTrainLexicon")) { ..}
tell the tool whether this flag is on or off:
else if (key.equalsIgnoreCase("useMyFeature")) { useMyFeature = Boolean.parseBoolean(val); }
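The flag-handling pattern from this step can be tried in isolation. The sketch below is a stripped-down stand-in, not the real SeqClassifierFlags: `FlagsSketch` is a hypothetical class, its `setProperties` takes only a `Properties` argument, and the existing `useTrainLexicon` branch is left empty.

```java
import java.util.Properties;

// Hypothetical stand-in for the flag-handling pattern; the real class is
// SeqClassifierFlags, which has many more fields and branches.
final class FlagsSketch {

    // Default value, used when the property is absent from the file.
    boolean useMyFeature = true;

    void setProperties(Properties props) {
        for (String key : props.stringPropertyNames()) {
            String val = props.getProperty(key);
            if (key.equalsIgnoreCase("useTrainLexicon")) {
                // Existing branch, elided in this sketch.
            } else if (key.equalsIgnoreCase("useMyFeature")) {
                // Property keys are matched case-insensitively.
                useMyFeature = Boolean.parseBoolean(val);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("useMyFeature", "false");
        FlagsSketch flags = new FlagsSketch();
        flags.setProperties(props);
        System.out.println("useMyFeature = " + flags.useMyFeature);
    }
}
```

Note that `Boolean.parseBoolean` returns false for anything other than "true" (ignoring case), so a typo in the properties file silently turns the feature off.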
- In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following section:
public static class myfeature implements CoreAnnotation<String> {
  public Class<String> getType() { return String.class; }
}
- In src/edu/stanford/nlp/ling/AnnotationLookup.java, inside
public enum KeyLookup {..}
add at the bottom:
MY_TAG(CoreAnnotations.myfeature.class, "myfeature")
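Why the enum entry is needed may be easier to see in a toy version: the column name from the map has to resolve to an annotation key class. The sketch below only imitates that idea. `KeyLookupSketch`, the marker classes, and the `lookup` helper are hypothetical, and the real AnnotationLookup may resolve names differently.

```java
// Hypothetical stand-in for the name-to-key lookup; the real enum lives in
// AnnotationLookup and refers to CoreAnnotation classes.
final class KeyLookupSketch {

    // Marker types standing in for annotation key classes.
    static final class WordKey {}
    static final class AnswerKey {}
    static final class MyFeatureKey {}

    enum KeyLookup {
        WORD(WordKey.class, "word"),
        ANSWER(AnswerKey.class, "answer"),
        MY_TAG(MyFeatureKey.class, "myfeature"); // the newly added entry

        final Class<?> coreKey;
        final String name;

        KeyLookup(Class<?> coreKey, String name) {
            this.coreKey = coreKey;
            this.name = name;
        }
    }

    // Returns the entry whose name matches a column name, or null if none does.
    static KeyLookup lookup(String columnName) {
        for (KeyLookup k : KeyLookup.values()) {
            if (k.name.equals(columnName)) {
                return k;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lookup("myfeature"));
        System.out.println(lookup("notRegistered"));
    }
}
```

In this toy version, a column name without an enum entry simply resolves to nothing, which is the situation the MY_TAG line is meant to avoid.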
- In src\edu\stanford\nlp\ie\NERFeatureFactory.java, depending on the "type" of feature it is, add it to the appropriate method, e.g. in
protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)
add
if (flags.useMyFeature) { featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag"); }
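To see what the added line contributes, here is a self-contained sketch of the same pattern without any CoreNLP types. Everything in it is an assumption for illustration: tokens are plain maps instead of CoreLabel objects, `FeatureSketch` is a hypothetical class, and the `-WORD` feature is only a placeholder for the features the factory already emits.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the feature-generation step; tokens are plain
// maps here rather than CoreLabel objects inside a PaddedList.
final class FeatureSketch {

    // Builds the feature strings for the token at position loc.
    static Collection<String> featuresC(List<Map<String, String>> tokens,
                                        int loc, boolean useMyFeature) {
        Map<String, String> c = tokens.get(loc);
        Collection<String> features = new ArrayList<>();
        // Placeholder for the features that already exist.
        features.add(c.get("word") + "-WORD");
        if (useMyFeature) {
            // The suffix keeps this feature distinct from others that
            // might share the same value.
            features.add(c.get("myfeature") + "-my_tag");
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, String> token = new HashMap<>();
        token.put("word", "Stanford");
        token.put("myfeature", "NNP");
        List<Map<String, String>> tokens = new ArrayList<>();
        tokens.add(token);
        System.out.println(featuresC(tokens, 0, true));
    }
}
```

With the flag on, each token gains one extra string feature built from its custom column; with the flag off, the feature list is unchanged.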
Debugging: in addition to this, there are methods that dump the features to a file; use them to see how things are done under the hood. Also, I think you will have to spend some time with the debugger too :P