Question

I want to get a random Wikipedia article, but I want to skip any article whose title or category is in a specific list I have (for bad-word filtering).

I am currently using JavaScript and I'm not too familiar with the Wikipedia API. I have the query string to generate a random article and grab the extract, but I'm not sure how to do the excluding. I didn't find anything on how to do that in the documentation or by searching Google.

The code is working and is fetching random articles but I need to filter them.

My actual JavaScript code that does the fetching:

if (tempscript) return;
if (!isRetry) {
    attempts = 0;
    minchars = minimumCharacters;
    maxchars = maximumCharacters;
    button.disabled = true;
    button.style.cursor = "wait";
}
tempscript = document.createElement("script");
tempscript.type = "text/javascript";
tempscript.id = "tempscript";
tempscript.src = "http://en.wikipedia.org/w/api.php" + "?action=query&generator=random&prop=extracts" + "&exchars=" + maxchars + "&format=json&callback=onComplete&requestid=" + Math.floor(Math.random() * 999999).toString();
document.body.appendChild(tempscript);

Solution

You should change your URL to also include categories in your prop, and then set cllimit to its maximum of 500:

tempscript.src = "http://en.wikipedia.org/w/api.php" + "?action=query&generator=random&prop=categories|extracts&cllimit=500&exchars=" + maxchars + "&format=json&callback=onComplete&requestid=" + Math.floor(Math.random() * 999999).toString();

Then, if the page has categories, the returned JSON object will list them in that page's categories array.
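To make the response shape concrete, here is a minimal sketch. The sample object is hypothetical (the page ID, title, extract, and categories are made up), but it follows the default JSON layout of an action=query response, where pages is an object keyed by page ID and each category title carries the "Category:" namespace prefix. The firstPage helper is an illustrative name, not part of the API.

```javascript
// Hypothetical sample shaped like an action=query response
// with prop=categories|extracts. All values are made up.
var sampleResponse = {
    query: {
        pages: {
            "12345": {
                pageid: 12345,
                ns: 0,
                title: "Example article",
                extract: "Example extract text...",
                categories: [
                    { ns: 14, title: "Category:Example category" },
                    { ns: 14, title: "Category:Another category" }
                ]
            }
        }
    }
};

// Hypothetical helper: return the first (here, only) page object,
// or null when the response has no pages.
function firstPage(response) {
    var pages = response && response.query && response.query.pages;
    if (!pages) return null;
    var ids = Object.keys(pages);
    return ids.length ? pages[ids[0]] : null;
}

var page = firstPage(sampleResponse);
console.log(page.title);               // "Example article"
console.log(page.categories[0].title); // "Category:Example category"
```

Note that a page with no categories simply has no categories property, so any code reading it should check for that first.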

In your callback function, you will then need the following:

var badArticles = ['Poop', 'Pee', 'Underpants'],
    badCategories = ['Images of poop', 'Images of pee', 'Images of underpants'],
    page = response.query.pages;
for (var i in page) {
    page = page[i]; // `i` will be the pageid in this loop
    break; // you don't want the loop to continue within the new `page` object
}

//exit callback when pagename is in bad articles list 
if (badArticles.indexOf(page.title) !== -1) return false;

if (page.categories) {
    for (var i=0;i<page.categories.length;i++) {

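        //exit callback when the page has a category in bad categories list
        //(indexOf returns -1 when not found; API titles carry a "Category:" prefix)
        if (badCategories.indexOf(page.categories[i].title.replace(/^Category:/, '')) !== -1) return false;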

    }
}

I did not personally test this, but I would expect it to work, based on the format of the MediaWiki API's response. If it doesn't, please leave a comment.
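If it helps, the same check can be pulled out into a standalone function that is easy to try outside the browser. This is only a sketch: isBlockedPage and the sample pages are hypothetical names and data, and it assumes category titles arrive with the "Category:" prefix, which it strips before comparing against the list.

```javascript
// Hypothetical helper: true when the page's title or any of its
// categories appears in the corresponding block list.
function isBlockedPage(page, badTitles, badCategories) {
    if (badTitles.indexOf(page.title) !== -1) return true;

    // Pages without categories have no `categories` property.
    var categories = page.categories || [];
    for (var i = 0; i < categories.length; i++) {
        // Strip the "Category:" namespace prefix before comparing.
        var name = categories[i].title.replace(/^Category:/, "");
        if (badCategories.indexOf(name) !== -1) return true;
    }
    return false;
}

var badArticles = ["Poop", "Pee", "Underpants"];
var badCategories = ["Images of poop", "Images of pee", "Images of underpants"];

// Made-up page objects for illustration.
var cleanPage = { title: "Example article", categories: [{ title: "Category:Examples" }] };
var badTitlePage = { title: "Pee" };
var badCategoryPage = { title: "Some gallery", categories: [{ title: "Category:Images of poop" }] };

console.log(isBlockedPage(cleanPage, badArticles, badCategories));       // false
console.log(isBlockedPage(badTitlePage, badArticles, badCategories));    // true
console.log(isBlockedPage(badCategoryPage, badArticles, badCategories)); // true
```

In the callback you could then return early (or trigger your retry) whenever isBlockedPage returns true. The match is exact and case-sensitive, so the lists need to use the same spelling and capitalization as the wiki.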

Licensed under: CC-BY-SA with attribution