Using GNU Parallel With Split

Question 1

You could let parallel do the splitting:

<2011.psv parallel --pipe -N 50000000 ./carga_postgres.sh

Note that the man page recommends --block over -N. The input is still split only at record separators (\n by default), e.g.:

<2011.psv parallel --pipe --block 250M ./carga_postgres.sh
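
With `--pipe`, parallel hands each chunk to the command on standard input and does not pass a file name, so carga_postgres.sh has to read its rows from stdin. Below is a minimal sketch of that contract. It assumes a hypothetical shell function `carga_chunk` that stands in for the real script and only counts the rows it receives:

```sh
# Hypothetical stand-in for carga_postgres.sh under `parallel --pipe`:
# the chunk arrives on stdin, so the loader reads stdin, not "$1".
carga_chunk() {
  # Replace this line count with the real load command reading stdin
  rows=$(wc -l | tr -d ' ')
  echo "loaded $rows rows"
}

# Simulate one 23-record chunk
seq 23 | carga_chunk
```

In bash, parallel can run a function like this after `export -f carga_chunk`, e.g. `<2011.psv parallel --pipe --block 250M carga_chunk`. As a standalone script, the function body simply becomes the script.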

Testing `--pipe` and `-N`

Here's a test that splits a sequence of 100 numbers into 5 files:

seq 100 | parallel --pipe -N23 'cat > /tmp/parallel_test_{#}'

Check the result:

wc -l /tmp/parallel_test_[1-5]

Output:

 23 /tmp/parallel_test_1
 23 /tmp/parallel_test_2
 23 /tmp/parallel_test_3
 23 /tmp/parallel_test_4
  8 /tmp/parallel_test_5
100 total
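
The five files and the 8-line remainder follow from integer arithmetic. -N23 cuts 100 records into ceil(100/23) = 5 chunks, and the last chunk gets whatever is left. Here is a small sketch that reproduces those numbers (the variable names are only illustrative):

```sh
# Reproduce the chunking of `seq 100 | parallel --pipe -N23 ...`
total=100   # records produced by seq 100
n=23        # records per chunk (-N23)

chunks=$(( (total + n - 1) / n ))      # ceiling division
last=$(( total - (chunks - 1) * n ))   # records left for the final chunk

echo "$chunks files, last one has $last lines"   # prints: 5 files, last one has 8 lines
```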

Question 2

If you use GNU split, you can do this with the --filter option:

‘--filter=command’
With this option, rather than simply writing to each output file, write through a pipe to the specified shell command for each output file. command should use the $FILE environment variable, which is set to a different output file name for each invocation of the command.
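
To see what $FILE holds, you can use a throwaway filter that reports the name split chose and the number of lines piped to it. This sketch reuses the 100-number sample from the parallel test. Without a prefix argument, split names its pieces xaa, xab, and so on:

```sh
# Throwaway filter: print $FILE and the number of lines split piped in.
# The filter never writes to "$FILE", so no files are created.
seq 100 | split -l 23 --filter='echo "$FILE: $(wc -l) lines"' -
```

The line counts should match the parallel test: four chunks of 23 lines (xaa to xad) and one of 8 (xae).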

You can create a shell script that writes its input to $FILE and then starts carga_postgres.sh on that file in the background:

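#!/bin/sh

# Save this chunk (split pipes it to stdin) under the name split chose
cat > "$FILE"
# Start the loader on that file in the background so split can move on
./carga_postgres.sh "$FILE" &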

Save it as filter.sh, make it executable (chmod +x filter.sh), and use it as the filter:

split -l 50000000 --filter=./filter.sh 2011.psv
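
Before you run this on the full 2011.psv, you can dry-run the filter on a small sample. The sketch below works in a temporary directory. It writes a hypothetical stand-in for carga_postgres.sh that only logs each chunk's name and line count, so it is not the real loader. The sketch also shows one consequence of the `&` in filter.sh: split returns once the last filter.sh exits, but the backgrounded loaders may still be running. For that reason the script polls the log before printing it.

```sh
#!/bin/sh
# Dry run of the split --filter approach on a 100-line sample.
# Assumption: carga_postgres.sh is replaced by a hypothetical stand-in.
set -e
work=$(mktemp -d)
cd "$work"

cat > carga_postgres.sh <<'EOF'
#!/bin/sh
# Stand-in loader: record "<chunk file> <line count>"
echo "$1 $(wc -l < "$1")" >> loaded.log
EOF

cat > filter.sh <<'EOF'
#!/bin/sh
# Save this chunk to $FILE, then load it in the background
cat > "$FILE"
./carga_postgres.sh "$FILE" &
EOF

chmod +x carga_postgres.sh filter.sh
seq 100 > sample.psv
touch loaded.log
split -l 23 --filter=./filter.sh sample.psv

# split may return before the background loaders finish, so wait
# (up to about 50 seconds) until all 5 chunks have been reported.
i=0
while [ "$(wc -l < loaded.log)" -lt 5 ] && [ "$i" -lt 50 ]; do
  sleep 1
  i=$((i + 1))
done
sort loaded.log
```

The same applies to the real file. split finishing does not mean the loads have finished. Each chunk's loader starts as soon as that chunk is written, and there is no limit on how many loaders run at once.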