Question

I'm trying to use MITIE to extract named entities from short text. I'm interested in entities such as dates, times, names, and locations. Out of the box, MITIE only recognises names, locations, and organisations. I'd like to train it to recognise dates, times and other categories as well. From looking at the structure of MITIE's directories and from the dlib website, I gather that this is done via an SVM. Is this correct?

With regard to adding new categories to the named entity recogniser, I have several questions:

  1. Can this be done in an augmentative fashion? That is, given an existing NER system, can I just add new categories and examples and train it to recognise those as well? Or do I need to train models from scratch?
  2. If I do need to train models from scratch, what datasets can I use to do this?
  3. Related to adding new examples, is there an online method that I can use, feeding the system new examples and categories as and when I need to?

Solution

After having used MITIE for a few weeks, I feel like I at least have enough to answer my basic questions:

  1. (and 3.) All models need to be trained from scratch; there is no online method for adding new samples to the model as they come in. This is unfortunate because MITIE takes at least 45 minutes, and up to an hour, to train on a ~20k-sized dataset.
  2. The datasets I used were ATIS, CoNLL 2003, and DBpedia.

I've found MITIE to be quite good as far as classification accuracy goes, although it takes a bit of work to prepare datasets for it.
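
To give an idea of the dataset preparation involved, here is a minimal sketch, not a definitive recipe. It assumes your data is already tokenised and tagged in a CoNLL-style BIO scheme (for example `B-LOC`, `I-LOC`, `O`), and converts those tags into the token ranges that MITIE's Python bindings expect. The helper names `bio_to_spans` and `train_mitie_ner` are my own, as are the parameters `extractor_path` and `out_path`. The MITIE calls follow the pattern used in the Python training example shipped with MITIE, so check them against your installed version.

```python
def bio_to_spans(tags):
    """Convert BIO tags into (start, end, label) spans, with end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label as the open entity just extends it.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the entity that is currently open, if any.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # A B- tag (or an I- tag with no matching open entity) opens a new one.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


def train_mitie_ner(sentences, extractor_path, out_path):
    """Train a MITIE NER model from scratch (hypothetical helper).

    sentences: iterable of (tokens, tags) pairs.
    extractor_path: path to a MITIE total_word_feature_extractor.dat file.
    out_path: where to save the trained model.
    """
    # Imported lazily so bio_to_spans works without MITIE installed.
    from mitie import ner_trainer, ner_training_instance

    trainer = ner_trainer(extractor_path)
    for tokens, tags in sentences:
        sample = ner_training_instance(tokens)
        for start, end, label in bio_to_spans(tags):
            sample.add_entity(range(start, end), label)
        trainer.add(sample)
    ner = trainer.train()  # the slow step described above
    ner.save_to_disk(out_path)
    return ner


tokens = ["Fly", "from", "Boston", "to", "New", "York", "on", "Monday"]
tags = ["O", "O", "B-LOC", "O", "B-LOC", "I-LOC", "O", "B-DATE"]
spans = bio_to_spans(tags)
```

Here `spans` comes out as `[(2, 3, 'LOC'), (4, 6, 'LOC'), (7, 8, 'DATE')]`. Since there is no incremental training, adding a new category such as `DATE` means adding tagged examples for it to the full training set and calling something like `train_mitie_ner` again on everything.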

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange