Question

If I have downloaded Wikipedia XML dumps, is there any way of removing all of the internal links from within an XML file?

Thanks

Solution

If you are importing the dumps into a local wiki, one option is to import all the files you want and then run a bot (e.g. pywikipediabot, which is easy to use) to remove the internal links.

OTHER TIPS

Wikipedia database dumps and information about using them are located here: Wikipedia:Database download. You should use these dumps instead of writing a script to scrape Wikipedia.

I would try to use XSLT to transform the XML file into another XML file.

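You could do a search and replace in your favorite text editor, replacing [[ and ]] with nothing. Note that a piped link such as [[target|label]] would then be left as target|label, so the text before the pipe needs to be removed as well.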
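The same search-and-replace idea can be sketched with a regular expression in Python, which also lets you keep only the visible label of a piped link. This is a minimal sketch, not a full wikitext parser: the names LINK_RE and strip_internal_links are made up for illustration, and it assumes links are not nested and that File: and Category: links can be treated like any other link.

```python
import re

# Matches [[target]] and [[target|label]].
# Assumption: no nested brackets inside a link (e.g. an image caption that
# itself contains a link is not handled).
LINK_RE = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def strip_internal_links(wikitext: str) -> str:
    """Replace each internal link with its visible text."""

    def replace(match: re.Match) -> str:
        target, label = match.group(1), match.group(2)
        # Use the label when one is given, otherwise fall back to the target.
        return label if label else target

    return LINK_RE.sub(replace, wikitext)


sample = "[[Paris]] is the capital of [[France|the French Republic]]."
print(strip_internal_links(sample))
```

Because [[ and ]] are not escaped in the dump's XML, a function like this could be applied to the text of each page as you read the file. Full dumps are large, so processing them as a stream is likely a better fit than loading the whole file into memory.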

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow