Question

Are there any existing C++ NLP APIs out there? The closest thing I have found is CLucene, a C++ port of Lucene. However, it seems somewhat obsolete, and its documentation is far from complete.

Ideally, such an API would support tokenization, stemming, and part-of-speech (PoS) tagging.


Solution

FreeLing is written in C++, too, although most people just use its binaries to run the tools: http://devel.cpl.upc.edu/freeling/downloads?order=time&desc=1

Try something like DyNet: it is a generic neural network framework, but most of its applications focus on NLP because its maintainers come from the NLP research community.

Or perhaps Marian-NMT: it was designed for sequence-to-sequence machine translation, but many NLP tasks can potentially be framed as sequence-to-sequence tasks.


Outdated

Maybe you can try Ellogon (http://www.ellogon.org/); it has GUI support and also a C/C++ API for NLP.

OTHER TIPS

If you remove the restriction to C++, you get the perfect NLTK (Python).

The remaining effort is then interfacing between Python and C++.

Apache Lucy would get you part of the way there. It was under active development when this answer was written.

Maybe you can use Weka-C++. It's the very popular Weka library for machine learning and data mining (including NLP) ported from Java to C++.

Weka supports tokenization and stemming; you'll probably need to train a classifier for PoS tagging.

I have only used Weka with Java, though, so I'm afraid I can't give you more details on this version.

There is TurboParser by André Martins at CMU, which also has a Python wrapper. There is also an online demo for it.

The MITIE project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection, as well as tools for training custom extractors and relation detectors.

MITIE is built on top of dlib, a high-performance machine-learning library. It makes use of several state-of-the-art techniques, including distributional word embeddings and Structural Support Vector Machines[3]. MITIE offers several pre-trained models providing varying levels of support for both English and Spanish, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other programming languages, including Python, R, Java, C, and MATLAB, allow users to quickly integrate MITIE into their own applications.

https://github.com/mit-nlp/MITIE

Licensed under: CC-BY-SA with attribution