Question

OK guys, I've downloaded the Wikipedia XML dump and it's a whopping 12 GB of data :\ for one table. I wanted to import it into a MySQL database on my localhost. However, it's a humongous 12 GB file and obviously Navicat is taking its sweet time importing it, or more likely it has hung :(.

Is there a way to import this dump, or at least to import it partially, bit by bit?


Let me correct that: it's 21 GB of data, not that it helps :\ Does anyone have any idea how to import humongous files like this into a MySQL database?


Solution

Use the command line instead. Navicat is horrible for importing large files and will likely take 20x longer than using the CLI.
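As a rough sketch of what a CLI import could look like, the function below feeds one SQL dump to the `mysql` client. The function name `import_dump`, the `DB_USER` variable and the `wikidb` database name are made up for illustration, and it assumes the dump is already in SQL form rather than XML.

```shell
#!/bin/sh
# import_dump DB DUMP_FILE: feed one SQL dump to the mysql client.
# DB_USER is a hypothetical variable for this sketch; it defaults to root.
import_dump() {
    db=$1
    dump=$2
    # Fail early if the dump file is missing or unreadable.
    [ -r "$dump" ] || { echo "cannot read $dump" >&2; return 1; }
    # -p with no value makes the client prompt for the password.
    mysql -u "${DB_USER:-root}" -p "$db" < "$dump"
}
```

Called as `import_dump wikidb enwiki-20140811-page.sql`, it prompts for the password and then streams the file to the server.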

OTHER TIPS

Take a look at a SAX parser. It lets you read the corpus piece by piece rather than reading the whole 12 GB into memory. I'm not too sure how you would interface it with MySQL, though.

This is quite an old question, FWIW, but I'm refreshing it with a new answer. I've encountered the same issues: sitting for hours while a single massive SQL file runs can be risky, and running into any issue basically means you start all over again. Here is what I did to reduce the risk and gain some performance via the CLI.

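  1. Split the massive SQL file into smaller, more manageable chunks. For example, split 'enwiki-20140811-page.sql' into files of about 75 MB each. Note that -l counts lines, not bytes, so the command below puts 75 lines in each file and the resulting size depends on how long the INSERT lines are.

    split -l 75 enwiki-20140811-page.sql split_

    This will produce a fair number of files prefixed with 'split_'.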

  2. Iterate over this file list and import the files one at a time. A simple shell script such as this will do:

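    # HOST, USER, PSWD and DB must be set before running this
    FILES=split_*   # the chunks produced in step 1
    for f in $FILES
    do
      echo "Processing $f file..."
      mysql -h "$HOST" -u "$USER" -p"$PSWD" "$DB" < "$f"
    done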
    

If this ever breaks for some reason, you can easily resume where you left off.
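One way to make the resume step less manual, as a sketch only: record each chunk in a log file once it has imported cleanly, and skip logged chunks on the next run. The names `import_chunks` and `DONE_LOG` and the `DB`/`DB_USER` defaults are hypothetical, and the sketch assumes the password comes from an option file such as `~/.my.cnf` rather than the command line.

```shell
#!/bin/sh
# Resumable import loop. DONE_LOG, DB and DB_USER are hypothetical names
# chosen for this sketch; credentials are assumed to live in ~/.my.cnf.
DONE_LOG="${DONE_LOG:-imported.log}"

import_chunks() {
    touch "$DONE_LOG"
    for f in "$@"; do
        # Skip chunks already recorded as imported (exact line match).
        if grep -qxF -- "$f" "$DONE_LOG"; then
            echo "Skipping $f (already imported)"
            continue
        fi
        echo "Processing $f file..."
        # Stop at the first failure so the log only lists good chunks.
        mysql -u "${DB_USER:-root}" "${DB:-wikidb}" < "$f" || return 1
        echo "$f" >> "$DONE_LOG"
    done
}
```

Running `import_chunks split_*` again after a failure picks up at the first chunk that is not yet in the log. A chunk that failed halfway may have inserted some rows already, so that one is still worth checking by hand.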

Splitting the SQL file by line count prevents breaking any large INSERT statements. However, if you drop the line count too low, you could split the DROP and CREATE statements at the beginning of the SQL file. This is easily fixed by opening the first few split files and resolving it by hand.
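To find which split files need that manual fix, a heuristic check along these lines might help. It assumes the dump has mysqldump-style layout, where every statement or comment starts at the beginning of a line with a keyword such as INSERT, DROP or CREATE, `--`, or `/*`. The function name `check_chunks` is made up for this sketch.

```shell
#!/bin/sh
# check_chunks FILE...: print the chunks whose first non-blank line does
# not look like the start of a statement or comment. This is a heuristic
# that assumes mysqldump-style formatting.
check_chunks() {
    for f in "$@"; do
        # First line that contains a non-whitespace character.
        first=$(sed -n '/[^[:space:]]/{p;q;}' "$f")
        case "$first" in
            INSERT*|DROP*|CREATE*|LOCK*|UNLOCK*|ALTER*|SET*|--*|/\**) ;;
            *) echo "$f" ;;
        esac
    done
}
```

`check_chunks split_*` lists the files that probably start in the middle of a statement, for example partway through a CREATE TABLE column list, and should be merged with the file before them.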

Licensed under: CC-BY-SA with attribution