Question

I'm working on a wiki search engine based on MediaWiki.

Currently, this is my query:

/external/wikiPublic/api.php?action=query&list=search&srsearch=".$search."&srprop=snippet&format=xml

It works, but the results are terrible. For example, it returns redirects like this:

<p ns="0" title="Imprimantes" snippet="#REDIRECTION [[<span class='searchmatch'>Imprimantes</span> Enseignement]] "/>

I tried adding the parameter &redirects=0 to the URL, but it has no effect and these results still show up. The snippet also contains raw wiki markup, as you can see. Sometimes it is awful, like this one:

<p ns="0" title="Wifi" snippet="== Le <span class='searchmatch'>Wifi</span> ici == [[Fichier:Wi-Fi_Logo.png|right|250px|Logo <span class='searchmatch'>Wifi</span>]] "/>

I also tried changing snippet to sectionsnippet, but that doesn't work either: the sectionsnippet XML attribute comes back empty.

So, do you know how I could resolve these issues?

  • Exclude redirects from the search results
  • Remove the wiki markup, return plain text, or return only the matching text (I don't know which is best)
  • Show results for partial terms, e.g. "imprimante" returning pages that contain "imprimantes"

Solution

The snippets returned by the MediaWiki search API are generated by whichever search backend MediaWiki is configured to use.

By default, this is the built-in database search, which indeed returns unparsed snippets. To get nicer parsed snippets, you need to install a custom search extension, such as the Lucene-based MWSearch used by Wikipedia and other Wikimedia wikis.
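
If installing a search extension is not an option, one possible workaround is to post-process the results on the client side. The following is only a rough sketch, not part of MediaWiki: the helper names are hypothetical, each result is assumed to be already decoded into an array with title and snippet keys, and the regexes cover just the markup shown in the question (snippets that are cut off in the middle of a link will not be cleaned fully).

```php
<?php
// Sketch only: hypothetical helpers, not a MediaWiki API.

/** True if a raw snippet is the body of a redirect page. */
function isRedirectSnippet(string $snippet): bool
{
    // Matches "#REDIRECT" and the French "#REDIRECTION" seen in the question;
    // other localized keywords would have to be added.
    return preg_match('/^\s*#REDIRECT/i', strip_tags($snippet)) === 1;
}

/** Rough conversion of a raw wikitext snippet to plain text. */
function cleanSnippet(string $snippet): string
{
    // Drop the <span class='searchmatch'> highlighting.
    $text = strip_tags($snippet);
    // Drop file/image links entirely: [[Fichier:...]], [[File:...]], [[Image:...]].
    $text = preg_replace('/\[\[(?:Fichier|File|Image):[^\]]*\]\]/i', '', $text);
    // [[Target|Label]] becomes Label, [[Target]] becomes Target.
    $text = preg_replace('/\[\[(?:[^|\]]*\|)?([^\]]*)\]\]/', '$1', $text);
    // Remove heading markers (==) and bold/italic quote runs ('' or ''').
    $text = preg_replace("/={2,}|'{2,}/", '', $text);
    // Collapse the whitespace left behind.
    return trim(preg_replace('/\s+/', ' ', $text));
}

/** Drops redirect hits and cleans the snippets of the remaining ones. */
function tidyResults(array $results): array
{
    $tidy = [];
    foreach ($results as $hit) {
        if (isRedirectSnippet($hit['snippet'])) {
            continue;
        }
        $hit['snippet'] = cleanSnippet($hit['snippet']);
        $tidy[] = $hit;
    }
    return $tidy;
}

// The two results quoted in the question.
$hits = tidyResults([
    ['title' => 'Imprimantes',
     'snippet' => "#REDIRECTION [[<span class='searchmatch'>Imprimantes</span> Enseignement]] "],
    ['title' => 'Wifi',
     'snippet' => "== Le <span class='searchmatch'>Wifi</span> ici == [[Fichier:Wi-Fi_Logo.png|right|250px|Logo <span class='searchmatch'>Wifi</span>]] "],
]);
// $hits now holds one entry: title "Wifi", snippet "Le Wifi ici".
```

This also removes the search-match highlighting. If you want to keep it, skip the strip_tags() call in cleanSnippet() and escape the rest of the text yourself.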

OTHER TIPS

Have you tried a simple file_get_contents call?

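// Phrase to search for.
$phrase = 'World War';

// urlencode() escapes spaces as well as any other reserved characters.
$search = urlencode( $phrase );
$search_string = 'http://en.wikipedia.org/wiki/Special:Search?go=Go&search=' . $search;

// Fetches the HTML page that Special:Search responds with (needs allow_url_fopen).
$result = file_get_contents( $search_string );

echo $result;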

Works fine for me.
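
Note that this returns a full HTML page. If you would rather stay with api.php as in the question, here is a minimal sketch of building the query with http_build_query so the search term is always encoded. buildSearchUrl() and extractHits() are hypothetical helper names, the endpoint path is the one from the question, and format=json is my assumption (the question uses format=xml).

```php
<?php
// Sketch only: hypothetical helpers around the list=search query from the question.

/** Builds the api.php search URL with every parameter URL-encoded. */
function buildSearchUrl(string $endpoint, string $term): string
{
    $params = [
        'action'   => 'query',
        'list'     => 'search',
        'srsearch' => $term,
        'srprop'   => 'snippet',
        'format'   => 'json',
    ];
    // Explicit '&' separator so the result does not depend on php.ini settings.
    return $endpoint . '?' . http_build_query($params, '', '&');
}

/** Returns the list of hits from a JSON response body, or [] if there are none. */
function extractHits(string $json): array
{
    $data = json_decode($json, true);
    return $data['query']['search'] ?? [];
}

$url = buildSearchUrl('/external/wikiPublic/api.php', 'World War');
// $url is "/external/wikiPublic/api.php?action=query&list=search&srsearch=World+War&srprop=snippet&format=json"
```

You would then prefix your host name, fetch the URL (for example with file_get_contents) and pass the body to extractHits().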

Licensed under: CC-BY-SA with attribution