Question

This is the question I asked yesterday. I was able to get the required data; the final result looks like this (please follow this link).

I tried the following code to get all the infobox data:

    // Split the page content into chunks at every "}}" followed by a newline
    content = content.split("}}\n");
    for (k in content)
    {
        // A chunk that starts with "{{Infobox" has "Infobox" at index 2
        if (content[k].search("Infobox") == 2)
        {
            var infobox = content[k];
            alert(infobox);
            infobox = infobox.replace("{{", "");
            alert(infobox);
            // Each parameter starts on a new line with "|"
            infobox = infobox.split("\n|");
            //alert(infobox[0]);
            var infohtml = "";
            for (l in infobox)
            {
                if (infobox[l].search("=") > 0)
                {
                    var line = infobox[l].split("=");
                    infohtml = infohtml + "<tr><td>" + line[0] + "</td><td>" + line[1] + "</td></tr>";
                }
            }
            infohtml = "<table>" + infohtml + "</table>";
            $('#con').html(infohtml);
            break;
        }
    }

I initially thought each element was enclosed in {{ }}, so I wrote this code. However, it does not capture the entire infobox. An element such as

{{Sfn|National Informatics Centre|2005}}

occurs inside the infobox, and its closing }} cuts my infobox data short.

It seems to be far simpler without using JSON. Please help me.


Solution

Have you tried DBpedia? As far as I know, they provide template usage information. There is also a Toolserver tool named Templatetiger, which extracts templates from the static dumps (not live).

However, I once wrote a tiny snippet to extract templates from wikitext in JavaScript:

var title; // of the template
var wikitext; // of the page
var templateRegexp = new RegExp("{{\\s*"+(title.indexOf(":")>-1?"(?:Vorlage:|Template:)?"+title:title)+"([^[\\]{}]*(?:{{[^{}]*}}|\\[?\\[[^[\\]]*\\]?\\])?[^[\\]{}]*)+}}", "g");
var paramRegexp = /\s*\|[^{}|]*?((?:{{[^{}]*}}|\[?\[[^[\]]*\]?\])?[^[\]{}|]*)*/g;
wikitext.replace(templateRegexp, function(template){
    // logabout(template, "input ");
    var parameters = template.match(paramRegexp);
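    if (!parameters) {
        // No "|" found: the template was used without parameters.
        // (The original logged page.title, but no "page" object is defined in this snippet.)
        console.log(title + " without parameters:\n" + template);
        parameters = [];
    }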
    var unnamed = 1;
    var p = parameters.reduce(function(map, line) {
        line = line.replace(/^\s*\|/,"");
        var i = line.indexOf("=");
        map[line.substr(0,i).trim() || unnamed++] = line.substr(i+1).trim();
        return map;
    }, {});
    // you have an object "p" in here containing the template parameters
});

It handles templates nested one level deep, but it is still very error-prone. Parsing wikitext with regular expressions is as evil as trying to do it on HTML :-)
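As an alternative to the regular expressions, nesting of any depth can be followed by counting braces. This is only a minimal sketch of a different technique, not the snippet above: the function names extractTemplate, splitParameters and parseTemplate are made up for illustration, triple-brace parameters ({{{x}}}), comments and nowiki sections are ignored, and an "=" inside a nested template in an unnamed parameter would be misread as a name separator.

```javascript
// Return the full text of the first template whose name starts with `name`,
// including any nested {{...}} templates, or null if none is found.
function extractTemplate(wikitext, name) {
    var start = wikitext.indexOf("{{" + name);
    if (start === -1) return null;
    var depth = 0;
    for (var i = start; i < wikitext.length - 1; i++) {
        var pair = wikitext.slice(i, i + 2);
        if (pair === "{{") {
            depth++;
            i++; // skip the second brace
        } else if (pair === "}}") {
            depth--;
            i++;
            if (depth === 0) return wikitext.slice(start, i + 1);
        }
    }
    return null; // unbalanced braces
}

// Split a template at top-level "|" only, so that pipes inside nested
// templates and [[links]] stay with their parameter.
function splitParameters(template) {
    var inner = template.slice(2, -2); // drop the outer {{ and }}
    var parts = [], current = "", depth = 0;
    for (var i = 0; i < inner.length; i++) {
        var pair = inner.slice(i, i + 2);
        if (pair === "{{" || pair === "[[") {
            depth++; current += pair; i++;
        } else if (pair === "}}" || pair === "]]") {
            depth--; current += pair; i++;
        } else if (inner[i] === "|" && depth === 0) {
            parts.push(current); current = "";
        } else {
            current += inner[i];
        }
    }
    parts.push(current);
    return parts; // parts[0] is the template name
}

// Build { name, params } where unnamed parameters get the keys 1, 2, ...
function parseTemplate(template) {
    var parts = splitParameters(template);
    var result = { name: parts[0].trim(), params: {} };
    var unnamed = 1;
    parts.slice(1).forEach(function (part) {
        var eq = part.indexOf("=");
        if (eq === -1) {
            result.params[unnamed++] = part.trim();
        } else {
            result.params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
        }
    });
    return result;
}
```

Calling parseTemplate(extractTemplate(wikitext, "Infobox")) on a page whose infobox contains {{Sfn|National Informatics Centre|2005}} keeps that reference inside the value of the parameter it belongs to instead of ending the infobox there.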

It may be easier to query the parse tree from the API: api.php?action=query&prop=revisions&rvprop=content&rvgeneratexml=1&titles=.... From that parse tree you will be able to extract the templates easily.
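The query string above can be assembled with URLSearchParams so that the page title is encoded correctly. This is a small sketch: the helper name buildParseTreeUrl, the example API base URL and the added format=json parameter are assumptions for illustration, and whether rvgeneratexml is accepted depends on the MediaWiki version the wiki runs.

```javascript
// Build the parse-tree query URL for one page title.
// apiBase is the wiki's api.php endpoint; format=json is an added assumption.
function buildParseTreeUrl(apiBase, pageTitle) {
    var params = new URLSearchParams({
        action: "query",
        prop: "revisions",
        rvprop: "content",
        rvgeneratexml: "1",
        titles: pageTitle,
        format: "json"
    });
    return apiBase + "?" + params.toString();
}

// Hypothetical usage:
// buildParseTreeUrl("https://en.wikipedia.org/w/api.php", "Example page");
```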

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow