Question

On the MediaWiki::DumpFile documentation page, the following code is present:

  use MediaWiki::DumpFile;

  $mw = MediaWiki::DumpFile->new;

  $sql = $mw->sql($filename);
  $sql = $mw->sql(\*FH);

  $pages = $mw->pages($filename);
  $pages = $mw->pages(\*FH);

  $fastpages = $mw->fastpages($filename);
  $fastpages = $mw->fastpages(\*FH);

  use MediaWiki::DumpFile::Compat;

  $pmwd = Parse::MediaWikiDump->new;

I'm completely new to Perl and don't know what to do with $fastpages in order to save all the pages (as HTML or plain text, it doesn't matter) from an XML dump. Can you help me? And what is *FH?


Solution

I haven't used it, but the documentation for MediaWiki::DumpFile::FastPages has the following example for printing the title and text of each article in a dump file:

use MediaWiki::DumpFile::FastPages;

$pages = MediaWiki::DumpFile::FastPages->new($file);
$pages = MediaWiki::DumpFile::FastPages->new(\*FH);

while(($title, $text) = $pages->next) {
  print "Title: $title\n";
  print "Text: $text\n";
}

This will write everything to stdout. When you create the MediaWiki::DumpFile::FastPages object, you can pass either a file name, e.g.

$file = "/path/to/dump/file";
$pages = MediaWiki::DumpFile::FastPages->new($file);

or a reference to a file handle, e.g.

open FH, "<", "/path/to/dump/file" or die "Failed to open file: $!";
$pages = MediaWiki::DumpFile::FastPages->new(\*FH);
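
For the original goal of saving each page to its own file rather than printing to stdout, a minimal sketch along these lines should work. The dump path, output directory name, and filename scheme below are assumptions for illustration, not part of the module's documentation:

use strict;
use warnings;
use MediaWiki::DumpFile::FastPages;
use File::Path qw(make_path);

my $dump   = "/path/to/dump/file";   # assumed location of the XML dump
my $outdir = "pages";                # assumed output directory
make_path($outdir);

my $pages = MediaWiki::DumpFile::FastPages->new($dump);

while (my ($title, $text) = $pages->next) {
    # Turn the page title into a filesystem-safe name (simple scheme, an assumption)
    (my $name = $title) =~ s/[^A-Za-z0-9_-]+/_/g;

    open my $out, ">:encoding(UTF-8)", "$outdir/$name.txt"
        or die "Failed to open $outdir/$name.txt: $!";
    print {$out} $text;   # $text is the page's wikitext, not rendered HTML
    close $out;
}

Each page's text then ends up in its own .txt file under the output directory; adjust the filename scheme and output format to whatever suits you.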
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow