You might want to consider using GNU Parallel. By default, each job's output is buffered until the job has finished running. From the manual:
When running jobs that output data, you often do not want the output of multiple jobs to run together. GNU parallel defaults to grouping the output of each job, so the output is printed when the job finishes. If you want the output to be printed while the job is running you can use -u.
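As a quick illustration of that grouping behavior (using `echo` and `sleep` as stand-ins for a real job), the default keeps each job's lines together, while -u prints lines as soon as they appear and may interleave them across jobs:

```shell
# Default (grouped): the "start X" and "end X" lines of one job are
# never interleaved with lines from another job.
parallel 'echo start {}; sleep 0.1; echo end {}' ::: A B C

# Ungrouped (-u): lines are printed the moment they are produced,
# so output from different jobs can mix.
parallel -u 'echo start {}; sleep 0.1; echo end {}' ::: A B C
```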
I believe the best way to run your script is via:
find /path/to/logfiles/*.gz | parallel python logparser.py
or
parallel python logparser.py ::: /path/to/logfiles/*.gz
You can specify the number of processes to run using the -j flag, e.g., -j4.
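A sketch of how -j composes with the commands above (the log path and `logparser.py` are the placeholders from the question):

```shell
# Run at most 4 jobs concurrently
find /path/to/logfiles/*.gz | parallel -j4 python logparser.py

# -j also accepts a percentage of the available CPU cores
parallel -j50% python logparser.py ::: /path/to/logfiles/*.gz
```

With no -j flag, GNU Parallel defaults to one job per CPU core.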
The nice thing about Parallel is that it supports Cartesian products of input arguments. For example, if you have additional arguments that you want to iterate through for each file, you can use:
parallel python logparser.py ::: /path/to/logfiles/*.gz ::: 1 2 3
This will result in running the following across multiple processes:
python logparser.py /path/to/logfiles/A.gz 1
python logparser.py /path/to/logfiles/A.gz 2
python logparser.py /path/to/logfiles/A.gz 3
python logparser.py /path/to/logfiles/B.gz 1
python logparser.py /path/to/logfiles/B.gz 2
python logparser.py /path/to/logfiles/B.gz 3
...
Good luck!