Question

I was very impressed with the OpenCalais system. It is a web service: you send it your text, it analyzes it, and it returns a series of categorized (RDF-enabled) tags that apply to your document.

At the moment, however, English is the only supported language.

Do you know of similar systems that handle documents in other languages? (I'm interested in Italian, but multi-language support is a plus, of course.)

Solution

Apache Stanbol can analyze texts in many different languages. So far the following languages are supported (precision and recall values may vary according to the language):

  • English,
  • 中文 (Chinese),
  • Español (Spanish),
  • Русский (Russian),
  • Português (Portuguese),
  • Deutsch (German),
  • Italiano (Italian),
  • Nederlands (Dutch),
  • Svenska (Swedish),
  • Dansk (Danish),
  • العربية (Arabic),
  • עברית (Hebrew),
  • 日本語 (Japanese).

The analysis will return the discovered entities. The analysis output format can be:

  • JSON-LD,
  • RDF/XML,
  • RDF/JSON,
  • Turtle,
  • N-Triples.
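
As a rough illustration of how the output format is chosen, here is a minimal Python sketch that prepares a request for Stanbol's enhancer REST service and selects the serialization through the HTTP Accept header. The endpoint URL (a local Stanbol launcher at http://localhost:8080/enhancer), the function names and the exact media types are assumptions, not something stated above; check them against your own installation, since older Stanbol versions may expect different media types.

```python
import urllib.request

# Assumed mapping from the formats listed above to HTTP media types.
# These are the standard registrations; a given Stanbol version may
# expect other values (for example for JSON-LD or N-Triples).
MEDIA_TYPES = {
    "JSON-LD": "application/ld+json",
    "RDF/XML": "application/rdf+xml",
    "RDF/JSON": "application/rdf+json",
    "Turtle": "text/turtle",
    "N-Triples": "application/n-triples",
}

# Assumed default address of a locally running Stanbol instance.
DEFAULT_ENDPOINT = "http://localhost:8080/enhancer"


def build_enhancer_request(text, output_format="Turtle", endpoint=DEFAULT_ENDPOINT):
    """Build (but do not send) a POST request carrying plain text to analyze."""
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Accept": MEDIA_TYPES[output_format],
    }
    return urllib.request.Request(
        endpoint, data=text.encode("utf-8"), headers=headers, method="POST"
    )


def analyze(text, output_format="Turtle", endpoint=DEFAULT_ENDPOINT):
    """Send the text to the enhancer and return the raw serialized RDF."""
    request = build_enhancer_request(text, output_format, endpoint)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a Stanbol instance running, a call such as analyze("Roma è la capitale d'Italia.", "JSON-LD") should return the enhancement results as a string in the requested format.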

The entities extracted from a text (that is, its tags) can be further tailored through the system configuration. In principle, any custom vocabulary can be plugged into the system.

There are a couple of demo end-points:

I am not sure whether all of the languages listed above are supported by these demo end-points.

RedLink GmbH is going to provide cloud services based on Apache Stanbol and related software.

The WordLift plugin for WordPress already provides text analysis within WordPress for all the languages listed above (it is currently in the testing stage). You can try it out by installing the plugin in WordPress and entering text in the post body.

You can also subscribe and write to the Apache Stanbol mailing list for specific requests or information.

OTHER TIPS

OpenCalais supports both French and Spanish metadata tagging for entities. The set of entities will be extended in future releases. See our online documentation at http://www.opencalais.com/documentation/calais-web-service-api

Licensed under: CC-BY-SA with attribution