Question

I'm currently generating around 100 files from my script. I would like to iterate through these files in batches of twenty, run each of them through another script, and then remove the files when I am done (cleanup). I believe GNU Parallel can do this, but I am not sure exactly how.

# test if files exist and run
if [ "$(ls -A ${base_dir}/schedule)" ]; then

    while [ "$(ls -A ${base_dir}/schedule)" ]; do

        # current run of 20 files
        batch=`ls ${base_dir}/schedule | head -n 20`

        # parallel run on 4 processors
        parallel -j4 ./script.sh ${batch} ::: {1..20}

        # cleanup
        for file in "${batch}"; do
            rm "${base_dir}/schedule/${file}"
        done

    done
fi

The expected output would be something like:

# running first batch of twenty
 ./script.sh 1466-10389-data.nfo # after file has finished, rm 1466-10389-data.nfo
 ./script.sh 1466-10709-data.nfo # etc
 ./script.sh 1466-11230-data.nfo # etc
 ./script.sh 1466-11739-data.nfo
 ./script.sh 1466-11752-data.nfo
 ./script.sh 1466-13074-data.nfo
 ./script.sh 1466-14009-data.nfo
 ./script.sh 1466-1402-data.nfo
 ./script.sh 1466-14401-data.nfo
 ./script.sh 1466-14535-data.nfo
 ./script.sh 1466-1588-data.nfo
 ./script.sh 1466-17012-data.nfo
 ./script.sh 1466-17611-data.nfo
 ./script.sh 1466-18688-data.nfo
 ./script.sh 1466-19469-data.nfo
 ./script.sh 1466-19503-data.nfo
 ./script.sh 1466-21044-data.nfo
 ./script.sh 1466-21819-data.nfo
 ./script.sh 1466-22325-data.nfo
 ./script.sh 1466-23437-data.nfo

# wait until all are finished, OR queue up the next file so that at all times
# twenty files are running, until the directory is empty

Solution

If I understand correctly what you want to do, and if the files in schedule are not continuously being created, the script could be replaced with these two lines (not tested):

ls -A "${base_dir}/schedule" | xargs -n 1 -P 4 ./script.sh
rm "${base_dir}/schedule/"*
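If you would rather delete each file as soon as it has been processed, and only when script.sh succeeded on it, xargs can hand every file to a small `sh -c` wrapper. This is a minimal sketch, not tested against your setup: `run_and_clean` is a hypothetical helper name, it assumes script.sh takes a single file path and exits non-zero on failure, and `-P` is supported by GNU and BSD xargs but is not POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_and_clean DIR SCRIPT [JOBS]
# Runs SCRIPT on every regular file directly inside DIR, up to JOBS at a time,
# and removes a file only if SCRIPT exited with status 0 for it.
# Assumes SCRIPT takes one file path as its only argument.
run_and_clean() {
  local dir=$1 script=$2 jobs=${3:-20}
  # -print0 / -0 keep names with spaces or newlines intact;
  # -n 1 gives each invocation one file, -P keeps up to $jobs running at once.
  # Inside sh -c, $0 is the script and $1 is the current file.
  find "$dir" -mindepth 1 -maxdepth 1 -type f -print0 |
    xargs -0 -n 1 -P "$jobs" sh -c '"$0" "$1" && rm -- "$1"' "$script"
}
```

Usage: `run_and_clean "${base_dir}/schedule" ./script.sh 20`. Because a new job starts whenever one finishes, twenty files are in flight at all times rather than in fixed batches.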

OTHER TIPS

My guess is that you want 20 scripts running constantly in parallel:

find "${base_dir}/schedule" -maxdepth 1 -type f | parallel -j20 ./script.sh {/}\; rm {}
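With `\;` the rm runs even when script.sh fails. If a file should only be removed after a successful run, the two steps can be joined with `&&` instead. A minimal sketch wrapped in a hypothetical helper, `parallel_clean`, assuming GNU Parallel (not the moreutils `parallel`) is installed, script.sh signals failure with a non-zero exit status, and the script path contains no spaces, since it is pasted into the command string:

```shell
#!/usr/bin/env bash
# Hypothetical helper: parallel_clean DIR SCRIPT [JOBS]
# Requires GNU Parallel. {} is replaced by the full path, {/} by its basename.
# "&&" means a file is removed only if SCRIPT exited with status 0.
parallel_clean() {
  local dir=$1 script=$2 jobs=${3:-20}
  # find -print0 together with parallel -0 (--null) keeps unusual filenames intact;
  # an empty directory produces no input, so no jobs are started.
  find "$dir" -maxdepth 1 -type f -print0 |
    parallel -0 -j "$jobs" "$script {/} && rm {}"
}
```

Usage: `parallel_clean "${base_dir}/schedule" ./script.sh 20`.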

Your while loop confuses me a bit: is it needed because more files may be added while you run? If so, you need to keep the while loop around the command:

while [ "$(ls -A "${base_dir}/schedule")" ]; do
  find "${base_dir}/schedule" -maxdepth 1 -type f | parallel -j20 ./script.sh {/}\; rm {}
done
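The other behaviour described in the question (start twenty files, wait until all of them have finished, and only then start the next twenty) needs no GNU Parallel at all: bash background jobs plus `wait` are enough. A minimal sketch; `batch_run` is a hypothetical name, and it assumes script.sh takes one file path and exits non-zero on failure. Each batch is only as fast as its slowest file, which is why keeping twenty jobs running constantly is usually quicker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: batch_run DIR SCRIPT [SIZE]
# Starts SIZE jobs, waits for the whole batch to exit, then starts the next
# batch. Each file is removed only if SCRIPT succeeded on it.
# Note: the * glob skips dotfiles.
batch_run() {
  local dir=$1 script=$2 size=${3:-20}
  local files=( "$dir"/* ) f i
  # An unmatched glob stays literal, so stop if the directory is empty.
  [ -e "${files[0]}" ] || return 0
  for (( i = 0; i < ${#files[@]}; i += size )); do
    # Launch at most $size files from the array slice as background jobs.
    for f in "${files[@]:i:size}"; do
      { "$script" "$f" && rm -- "$f"; } &
    done
    # Block until every background job of this shell has exited,
    # including any started before batch_run was called.
    wait
  done
}
```

Usage: `batch_run "${base_dir}/schedule" ./script.sh 20`.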

Walk through the tutorial at http://www.gnu.org/software/parallel/parallel_tutorial.html (your command line will love you for it).

Licensed under: CC-BY-SA with attribution