Wiktionary API - meaning of words

https://stackoverflow.com/questions/4175533

09-10-2019
Question

I would like to get the meaning of a selected word using the Wiktionary API. The retrieved content should be the same as what is presented in "Word of the day": only the basic meaning, without etymology, synonyms, etc. For example:

"postiche n Any item of false hair worn on the head or face, such as a false beard or wig."

I tried using the documentation, but I can't find a similar example. Can anybody help with this problem?

Solution

Although MediaWiki has an API (api.php), it might be easiest for your purposes to use the action=raw parameter of index.php if all you need is the source code of one revision. Unlike the API, it returns the wikitext directly, not wrapped in XML, JSON, etc.

For example, this is the raw word of the day page for November 14:

http://en.wiktionary.org/w/index.php?title=Wiktionary:Word_of_the_day/November_14&action=raw
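As a rough illustration of the action=raw approach, here is a minimal Python sketch using only the standard library. The function names raw_url and fetch_raw, the User-Agent string, and the use of https instead of http are my own choices for this example, not part of any official client.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same index.php endpoint as in the links above (https is an assumption here).
INDEX_URL = "https://en.wiktionary.org/w/index.php"


def raw_url(title: str) -> str:
    """Build the action=raw URL for a page title (hypothetical helper)."""
    return INDEX_URL + "?" + urlencode({"title": title, "action": "raw"})


def fetch_raw(title: str, timeout: float = 10.0) -> str:
    """Download the wikitext of the page's current revision (needs network access)."""
    request = Request(
        raw_url(title),
        # Illustrative User-Agent; pick one that identifies your own script.
        headers={"User-Agent": "wiktionary-raw-example/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_raw("Wiktionary:Word_of_the_day/November_14"))
```

Note that urlencode percent-encodes the ":" and "/" in the title; MediaWiki should decode them back to the same page name.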

What's unfortunate is that the format of wiki pages focuses on presentation (for the human reader) rather than on semantics (for the machine), so you should not be surprised that there is no "get word definition" API command. Instead, your script will have to make sense of the numerous text formatting templates that Wiktionary editors have created and used, as well as complex presentational formatting syntax, including headings, unordered lists, and others. For example, here is the source code for the page "overflow":

http://en.wiktionary.org/w/index.php?title=overflow&action=raw

There is a "generate XML parse tree" option in the API, but it doesn't break much of the presentational formatting into XML. Just see for yourself:

http://en.wiktionary.org/w/api.php?action=query&titles=overflow&prop=revisions&rvprop=content&rvgeneratexml=&format=jsonfm

In case you are wondering whether there exists a parser for MediaWiki-format pages other than MediaWiki, no, there isn't. At least not anything written in JavaScript that's currently maintained (see list of alternative parsers, and check the web sites of the two listed ones). And even then, supporting most/all of the common templates will be a big challenge. Good luck.

OTHER TIPS

OK, I admit defeat.

There are some files relating to Wiktionary in Pywikipediabot and, looking at the code, it does look like you should be able to get it to parse meaning/definition fields for you.

However the last half an hour has convinced me otherwise. The code is not well written and I wonder if it has ever worked.

So I defer to idealmachine's answer, but I thought I would post this to save anyone else from making the same mistakes. :)

MediaWiki does have an API, but it's low-level and has no support for anything specific to each wiki. For instance, it has no encyclopedia support for Wikipedia and no dictionary support for Wiktionary. You can retrieve the raw wikitext markup of a page or a section using the API, but you will have to parse it yourself.

The first caveat is that each Wiktionary has evolved its own format, but I assume you are only interested in the English Wiktionary. One cheap trick many tools use is to get the first line which begins with the '#' character. This will usually be the text of the definition of the first sense of the first homonym.
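The '#' trick can be sketched in a few lines of Python. The name first_definition is hypothetical, and skipping "#:" and "#*" lines (which the English Wiktionary uses for usage examples and quotations) is an extra precaution of mine, not something the trick itself requires.

```python
from typing import Optional


def first_definition(wikitext: str) -> Optional[str]:
    """Return the text of the first '#' line, or None if there is none."""
    for line in wikitext.splitlines():
        # "#:" and "#*" lines are examples/quotations, "##" lines are sub-senses;
        # ignoring them is an assumption of this sketch.
        if line.startswith("#") and not line.startswith(("#:", "#*", "##")):
            return line[1:].strip()
    return None
```

The returned string is still wikitext, so it may contain link markup such as [[wig]] or templates such as {{...}} that you would need to strip or expand separately.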

Another caveat is that every Wiktionary uses many wiki templates so if you are looking at the raw text you will see plenty of these. The only way to reliably expand these templates is by calling the API with action=parse.
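A minimal sketch of calling action=parse from Python follows. The helper names parse_url and fetch_rendered_html are made up for this example, and I am assuming the default JSON layout, where the rendered HTML sits under parse.text["*"]; check the response yourself before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wiktionary.org/w/api.php"


def parse_url(title: str) -> str:
    """Build an action=parse request URL for a page title (hypothetical helper)."""
    params = {"action": "parse", "page": title, "prop": "text", "format": "json"}
    return API_URL + "?" + urlencode(params)


def fetch_rendered_html(title: str, timeout: float = 10.0) -> str:
    """Return the page rendered to HTML, with templates expanded (needs network)."""
    request = Request(
        parse_url(title),
        # Illustrative User-Agent; pick one that identifies your own script.
        headers={"User-Agent": "wiktionary-parse-example/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # Assumed layout of the default JSON format: {"parse": {"text": {"*": "<html>"}}}
    return data["parse"]["text"]["*"]


if __name__ == "__main__":
    print(fetch_rendered_html("overflow")[:500])
```

The result is HTML rather than wikitext, so the definitions then have to be pulled out with an HTML parser instead of by matching '#' lines.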

As mentioned earlier, the content of Wiktionary pages is wikitext, a human-readable format, so the MediaWiki API cannot return a word's meaning directly because the data is not structured.

However, each page follows specific conventions, so it's not that hard to extract the meanings from the wikitext. Also, there are some APIs, like Wordnik or Lingua Robot, that parse Wiktionary content and provide it in JSON format.
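To give an idea of what relying on those conventions might look like, here is a rough Python sketch that collects the definition lines under each heading of the English section. It assumes the usual English Wiktionary layout (a level-2 "==English==" heading, deeper headings such as "===Noun===", and definitions on lines starting with "#"); the function name english_definitions is hypothetical and real pages will have edge cases this does not handle.

```python
import re
from typing import Dict, List, Optional

# Matches wikitext headings such as "==English==" or "===Noun===".
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")
# A definition line: "#" followed by anything except "#", ":" or "*".
DEFINITION = re.compile(r"^#[^#:*]")


def english_definitions(wikitext: str) -> Dict[str, List[str]]:
    """Group '#' definition lines of the English section by their heading."""
    result: Dict[str, List[str]] = {}
    in_english = False
    current: Optional[str] = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            if level == 2:
                # A level-2 heading starts a new language section.
                in_english = title == "English"
                current = None
            elif in_english:
                current = title
            continue
        if in_english and current and DEFINITION.match(line):
            result.setdefault(current, []).append(line[1:].strip())
    return result
```

Headings without any '#' lines (for example Etymology) simply do not appear in the result, and the collected strings are still raw wikitext.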

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow