Question

I have a couple of terabytes of CSV data that I am trying to import into a PostgreSQL 8.4 database (on a RedHat 6.2 server), whose data directory is initialized on a multipath hardware RAID. There are four folders of CSV data that need to be imported, and the import script acts according to what it finds in those directories, so right now it's simplest for me to run the import script separately for each folder.

I have run these scripts serially on a Debian server (without multipath) before, waiting for each script to finish, and that worked. However, when I had to re-import later on this RedHat system, I decided to fire up four screen sessions and blast away. Unfortunately, something I'm doing here is destroying the filesystem (asterisks are omitted names):

[trevor@***** ~]$ ls -lah /var/datastore/
ls: cannot access /var/datastore/usersnapshot: Input/output error
ls: cannot access /var/datastore/****: Input/output error
ls: cannot access /var/datastore/localdb: Input/output error
ls: cannot access /var/datastore/*****_DATA: Input/output error
ls: cannot access /var/datastore/*****: Input/output error
total 48K
drwxr-xr-x  10 root   root   4.0K May  3 23:16 .
drwxr-xr-x. 31 root   root   4.0K Apr 22 20:45 ..
d??????????  ? ?      ?         ?            ? *****
d??????????  ? ?      ?         ?            ? *****_DATA
d??????????  ? ?      ?         ?            ? ****
drwxrwx---   2 root   root   4.0K Feb 26 11:47 *******
drwxrwx---   5 trevor coders 4.0K Apr 24 14:07 codez
drwxrwx---   3 root   root   4.0K Mar 14 11:28 *****
d??????????  ? ?      ?         ?            ? localdb
drwx------   2 root   root    16K Feb 26 11:17 lost+found
drwxrwx---   2 root   root   4.0K Feb 26 11:47 ********
drwxr-xr-x   2   1000   1000 4.0K May  4 00:00 trexdata
drwxr-xr-x   2   1000   1000 4.0K May  4 00:00 trexdata_snapshot
d??????????  ? ?      ?         ?            ? usersnapshot

There should be a postgres data directory here with ownership postgres.postgres named pgsqldb, but it's now gone. Worse, when I drop into a psql prompt to look at the database, the tables are listed, but only data from the first import script has been imported properly. If I stop the postmaster, unmount, and run fsck, I don't get that directory back either.

What's going on here? I was assured that the multipath drivers and mounts for the RAID volume in question are working, so I don't think it's the hardware at this point. For reference, each script adds about 105,000 points to a table in the database every couple of seconds.

Here's the import script code:

#!/bin/bash

# run as: /pathtoshfile/cell_import.sh $(pwd)/data_files in IMPORT_DATA dir

# Quote "$@" and "$csv_file" so paths containing spaces are not split.
for csv_file in "$@"
do
    myfilename=$(basename "$csv_file")
#    echo $myfilename
    # Fixed-offset fields in the file name: i, j, and the grid letter.
    i=${myfilename:0:4}
    j=${myfilename:5:4}
    grid=${myfilename:17:1}

    echo "loading grid $grid, i=$i, j=$j from file $csv_file"

    psql db <<SQLCOMMANDS

CREATE TEMPORARY TABLE timport (LIKE data10min);                                                                
COPY timport                                                                                                    
 (point_date,gmt_time,surface_skin_temp_k,surface_pressure_mb,accum_precip_kg_per_m2,agl_2m_humid_g_per_kg,down_shortwave_rad_flux_w_per_m2,down_longwave_rad_flux_w_per_m2,agl_10m_temp_k,agl_10m_windspd_m_per_s,agl_10m_winddir_deg,agl_50m_temp_k,agl_50m_windspd_m_per_s,agl_50m_winddir_deg,agl_temp_k,agl_80m_windspd_m_per_s,agl_80m_winddir_deg,agl_100m_temp_k,agl_100m_windspd_m_per_s,agl_100m_winddir_deg,agl_200m_temp_k,agl_200m_windspd_m_per_s,agl_200m_winddir_deg) FROM '$csv_file' WITH CSV;
UPDATE timport SET grid_id = '$grid', grid_i=$i, grid_j=$j;                                                     
INSERT INTO data10min SELECT * FROM timport;                                                                    

SQLCOMMANDS

done

Sample script output:

COPY 105408
UPDATE 105408
INSERT 0 105408
loading grid E, i=0135, j=0130 from file /media/backup1/****_DATA/E/0135_0130.****.E.txt
CREATE TABLE
COPY 105408
UPDATE 105408
INSERT 0 105408
loading grid E, i=0135, j=0131 from file /media/backup1/****_DATA/E/0135_0131.****.E.txt
CREATE TABLE
COPY 105408
UPDATE 105408
INSERT 0 105408
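
For anyone checking the file-name parsing, here is a minimal sketch of how the script derives grid, i, and j from a file name, run in isolation. The function name parse_cell_name, the /tmp path, and ABCDEF (a six-character stand-in for the omitted name, chosen so the grid letter lands at offset 17) are hypothetical and only for illustration.

```bash
#!/bin/bash

# Hypothetical helper: applies the same substring offsets as the import script.
parse_cell_name() {
    local name
    name=$(basename "$1")
    # chars 0-3 = i, chars 5-8 = j, char 17 = grid letter
    echo "grid=${name:17:1} i=${name:0:4} j=${name:5:4}"
}

# ABCDEF is a made-up stand-in for the omitted part of the real file name.
parse_cell_name "/tmp/E/0135_0130.ABCDEF.E.txt"
```

With a name of that shape, the values should match the first "loading grid" line in the sample output.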

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange