Question

Is there a Wikipedia API that returns page content as plain JSON, ideally without BBCode or Wikipedia's own markup? Something similar to YouTube's JSON API, like this.

Solution

Please take a look at the MediaWiki API help, which documents everything you need. You can choose the response format from the following list:

json, jsonfm, php, phpfm, wddx, wddxfm, xml, xmlfm, yaml, yamlfm
rawfm, txt, txtfm, dbg, dbgfm, dump, dumpfm, none
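
As a rough illustration of how the format is selected, here is a minimal Python sketch that builds a query URL with a `format` parameter. The function name `build_query_url` is hypothetical, and the other parameters are borrowed from the curl example further down this page; treat it as a sketch under those assumptions, not as the definitive way to call the API.

```python
from urllib.parse import urlencode

# English Wikipedia endpoint, as used in the curl example on this page.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, fmt="json"):
    """Build a MediaWiki API query URL (hypothetical helper, not part of any library)."""
    params = {
        "action": "query",
        "format": fmt,          # one of the format names listed above
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
    }
    # urlencode escapes the values, e.g. spaces become "+".
    return API_ENDPOINT + "?" + urlencode(params)


print(build_query_url("Albert Einstein"))
print(build_query_url("Albert Einstein", fmt="xml"))
```

Only the value of `format` changes between the two calls; the rest of the query stays the same.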

OTHER TIPS

You can also consume Wikipedia pages through a wrapper API such as JSONpedia. It works in two modes: live (request the current JSON representation of a Wikipedia page) and storage-based (query multiple pages previously ingested into Elasticsearch and MongoDB).

Here's a curl command (written for Windows) that returns a JSON response for a Wikipedia entry (Albert Einstein). Most of the HTML markup is removed, although <ref> tags remain, and the content still contains some Wikipedia markup.

curl "https://en.wikipedia.org/w/api.php?origin=*&action=query&format=json&formatversion=2&redirects&prop=revisions&rvprop=content&titles=Albert+Einstein" -o curl-wiktionary-result.json

Use this jq command to drill down into the "content" property:

jq ".query.pages[].revisions[].content" < curl-wiktionary-result.json
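
For anyone without curl and jq, the same two steps could be sketched in Python using only the standard library. The function names `fetch_page_json` and `extract_contents` and the User-Agent string are made up for illustration; the query parameters mirror the curl command above (minus `origin=*`, which is assumed to matter only for browser requests), and the extraction assumes the `formatversion=2` layout, where `query.pages` is a list.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def fetch_page_json(title):
    """Fetch the latest revision of a page as parsed JSON (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "redirects": "1",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
    }
    url = API_ENDPOINT + "?" + urlencode(params)
    # Placeholder User-Agent; replace it with one that identifies your client.
    request = Request(url, headers={"User-Agent": "wiki-json-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def extract_contents(data):
    """Rough equivalent of: jq ".query.pages[].revisions[].content"."""
    contents = []
    for page in data.get("query", {}).get("pages", []):
        # Missing pages carry no "revisions" key, so they are skipped.
        for revision in page.get("revisions", []):
            if "content" in revision:
                contents.append(revision["content"])
    return contents


# Example usage (performs a network request):
#   data = fetch_page_json("Albert Einstein")
#   print(extract_contents(data)[0][:200])
```

As with the jq command, the extracted text still contains Wikipedia markup.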
Licensed under: CC-BY-SA with attribution