I'm using exec instead of simply calling bash on the file because if I use exec on a large list of wget calls I get more than one wget process spawned.
No. When exec is called, it does not spawn a new process. It replaces the existing process. See man bash for details.
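A minimal sketch of this, runnable at any prompt: the script below prints its PID, then uses exec to replace itself with another shell, which prints its PID again. The two PIDs match because no new process was spawned. (The /tmp/exec_demo.sh filename is just a throwaway name for this illustration.)

```shell
# Write a tiny script that prints its PID before and after exec.
cat > /tmp/exec_demo.sh <<'EOF'
echo "before exec: PID $$"
exec sh -c 'echo "after exec: PID $$"'
echo "this line is never reached"   # exec replaced the shell above
EOF
sh /tmp/exec_demo.sh
```

Note that the line after exec never runs: the original shell process is gone, replaced in place by the new command.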
Simply calling bash on a file with a large list of urls is slow as it waits for one wget operation to complete before moving onto the next one.
True. Fortunately, there is a solution. To run many processes in parallel, run them in the background. For example, to run many wget processes in parallel, use:
while IFS= read -r url
do
    wget "$url" -O - >> output.html &
done <list_of_urls
The ampersand at the end of the line causes that command to run in the background, in parallel with everything else. The above code will start new wget processes as fast as it can. Those processes will continue until they complete.
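One caution: with a very long url list, the loop above will happily start thousands of wget processes at once. A simple batching variant waits for each group of jobs to finish before starting the next; the sketch below uses sleep as a stand-in for wget so it is self-contained, and max_jobs is an assumed name.

```shell
# Start background jobs in batches of max_jobs, waiting for each
# batch to drain before launching the next one.
max_jobs=4
count=0
for i in 1 2 3 4 5 6 7 8
do
    sleep 0.2 &                 # stand-in for: wget "$url" -O - >> output.html &
    count=$((count + 1))
    if [ "$count" -ge "$max_jobs" ]
    then
        wait                    # block until this batch of jobs finishes
        count=0
    fi
done
wait                            # collect any jobs left in a partial batch
```

Batching is cruder than a rolling window of jobs (a slow download holds up its whole batch), but it is portable POSIX shell and keeps the process count bounded.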
You can experiment with this idea very simply at the command prompt. Run
sleep 10s
and your command prompt will disappear for 10 seconds. However, run:
sleep 10s &
and your command prompt will return immediately while sleep runs in the background.
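The same idea shows why the wget loop above is fast: background jobs run concurrently, so the total time is roughly that of the longest job, not the sum. A small sketch, using date to measure elapsed wall-clock time:

```shell
# Three 1-second sleeps in the background finish in about 1 second
# total, because the shell starts all of them before waiting.
start=$(date +%s)
sleep 1 &
sleep 1 &
sleep 1 &
wait                            # block until all background jobs finish
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"
```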
man bash explains:
If a command is terminated by the control operator &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0.
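That "return status is 0" detail is easy to verify: the status you see right after the ampersand is the shell's immediate 0, not the command's real exit status, which only becomes available once you collect the job with wait.

```shell
# false always exits with status 1, but backgrounding it reports 0
# immediately; wait later retrieves the real status.
false &
status_now=$?        # 0: the shell did not wait for false to finish
pid=$!               # PID of the background job
wait "$pid"
status_later=$?      # 1: false's actual exit status, collected by wait
echo "now=$status_now later=$status_later"
```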