Question

Is there a better way to fetch the text content of particular sections from Wikipedia? I have the code below to skip some sections, but the process is taking too long to fetch the data I'm looking for.

    for($i=0;$i>10;$i++){
      if($i != 2 || $i != 4){
          $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section='.$i;
          $ch = curl_init($url);
          curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
          curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript"); 
          $c = curl_exec($ch);
          $json = json_decode($c);

          $content = $json->{'parse'}->{'text'}->{'*'};
          print preg_replace('/<\/?a[^>]*>/','',$content);
       }
    }

Solution

For starters, your loop condition is $i>10, which is false from the very first iteration ($i starts at 0), so the loop body never runs at all. Change it to $i<10. Also note that the condition $i != 2 || $i != 4 is always true, since every number differs from at least one of 2 and 4; you'd want && there. If you only need a handful of sections, sidestep both problems by listing them explicitly:

    foreach (array(1, 3, 5, 6, 7) as $i) {
        // your code
    }
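Putting the fixes together, a minimal sketch might look like the following (the section numbers 1, 3, 5, 6, 7 are illustrative; pick whichever sections you actually need):

```php
<?php
// Fetch only the sections we care about, instead of looping 0..9 and skipping.
$sections = array(1, 3, 5, 6, 7);

foreach ($sections as $i) {
    $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara'
         . '&format=json&prop=text&section=' . $i;

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'TestScript');
    $c = curl_exec($ch);
    curl_close($ch);

    // Decode to an associative array and guard against missing keys
    // (e.g. a section number that doesn't exist on the page).
    $json = json_decode($c, true);
    if (isset($json['parse']['text']['*'])) {
        print strip_tags($json['parse']['text']['*']);
    }
}
```

Each section still costs one HTTP round trip; if that is the real bottleneck, requesting fewer sections (as above) is the main lever this API gives you.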

Second, decoding JSON into an associative array like this:

$json = json_decode($c, true);

and then referencing it as $json['parse']['text']['*'] is easier to work with, but that's up to you.
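For example, with a trimmed-down stand-in for the response body (the JSON below is illustrative, not a real API reply):

```php
<?php
// A fake response shaped like the parse API's output.
$c = '{"parse":{"title":"Ramanagara","text":{"*":"<p>Some text.</p>"}}}';

// true => decode into nested associative arrays rather than stdClass objects.
$json = json_decode($c, true);
$content = $json['parse']['text']['*'];
// versus the object form: json_decode($c)->{'parse'}->{'text'}->{'*'}

print $content;  // <p>Some text.</p>
```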

And third, you'll find that strip_tags() is likely to be both faster and more accurate than stripping tags with a regular expression.
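For instance, the question's regex only removes the anchor tags, while strip_tags() removes every tag in one pass (sample HTML is made up for illustration):

```php
<?php
$content = '<p>See the <a href="/wiki/Bangalore">Bangalore</a> district.</p>';

// The regex from the question: strips only <a> / </a> tags.
print preg_replace('/<\/?a[^>]*>/', '', $content);
// <p>See the Bangalore district.</p>

// strip_tags(): removes all HTML tags.
print strip_tags($content);
// See the Bangalore district.
```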

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow