Question

I've seen a lot of answers and comments on Stack Overflow that mention doing something to avoid a subshell. In some cases, a functional reason is given (most often, the potential need to read, outside the subshell, a variable that was assigned inside it), but in other cases, the avoidance seems to be viewed as an end in itself.

Why is this? Is it for style/elegance/beauty? For performance (avoiding a fork)? For preventing likely bugs? Something else?


Solution

There are a few things going on.

First, forking a subshell might be unnoticeable when it happens only once, but if you do it in a loop, the cost adds up to a measurable performance impact. The impact is also greater on platforms such as Windows, where forking is not as cheap as it is on modern Unix-likes.

Second, forking a subshell means you have more than one context, and information is lost in switching between them: if you change your code to set a variable in a subshell, that variable is lost when the subshell exits. So the more subshells your code has in it, the more careful you have to be when modifying it later, to be sure that any state changes you make will actually persist.

See BashFAQ #24 for some examples of surprising behavior caused by subshells.
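
A minimal sketch of the kind of surprise described there (in bash's default configuration, every element of a pipeline, including the last, runs in a subshell):

count=0
printf 'a\nb\nc\n' | while read -r line; do
  ((count++))    # increments a copy of count inside the pipeline's subshell
done
echo "$count"    # prints 0, not 3: the subshell's changes were discarded

Feeding the loop from process substitution instead (done < <(printf 'a\nb\nc\n')) keeps the loop in the current shell, so the count survives.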

OTHER TIPS

Sometimes examples are helpful.

f='fred';y=0;time for ((i=0;i<1000;i++));do if [[ -n "$( grep 're' <<< $f )" ]];then ((y++));fi;done;echo $y

real    0m3.878s
user    0m0.794s
sys 0m2.346s
1000

f='fred';y=0;time for ((i=0;i<1000;i++));do if [[ -z "${f/*re*/}" ]];then ((y++));fi;done;echo $y

real    0m0.041s
user    0m0.027s
sys 0m0.001s
1000

f='fred';y=0;time for ((i=0;i<1000;i++));do if grep -q 're' <<< $f ;then ((y++));fi;done >/dev/null;echo $y

real    0m2.709s
user    0m0.661s
sys 0m1.731s
1000

As you can see, in this case, the difference between using grep in a subshell and using parameter expansion for the same basic test is close to 100x in overall time.
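
For completeness, bash's pattern matching inside [[ ]] does the same test even more directly, still without a fork; a sketch in the same style (untimed here, but I'd expect results close to the parameter-expansion version, since nothing leaves the shell):

f='fred';y=0;time for ((i=0;i<1000;i++));do if [[ $f == *re* ]];then ((y++));fi;done;echo $y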

Following the question further, and taking into account the comments below (which fail to demonstrate what they are trying to demonstrate), I tested the code from this question: https://unix.stackexchange.com/questions/284268/what-is-the-overhead-of-using-subshells

time for((i=0;i<10000;i++)); do echo "$(echo hello)"; done >/dev/null 
real    0m12.375s
user    0m1.048s
sys 0m2.822s

time for((i=0;i<10000;i++)); do echo hello; done >/dev/null 
real    0m0.174s
user    0m0.165s
sys 0m0.004s

This is actually far, far worse than I expected: almost two orders of magnitude slower in overall time, and almost THREE orders of magnitude slower in sys call time, which is absolutely incredible. (See the Bash builtins manual: https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html)

Note that the point of demonstrating this is the following: if you use a testing method that's easy to fall into the habit of using (a subshell running grep, or sed, or gawk, or even a bash builtin like echo), which for me is a bad habit I tend to fall into when hacking fast, it's worth realizing that this carries a significant performance hit, and it's probably worth the time to avoid those subshells whenever bash builtins can handle the job natively.
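
As a concrete illustration of swapping habitual subshells for shell-native expansions, a few common substitutions (a sketch: for these inputs the results are identical, though the expansions and the external tools can differ on edge cases, and ${name,,} needs bash 4+):

path='/usr/local/bin'
echo "$(basename "$path")"   # command substitution forks a subshell and runs basename; prints: bin
echo "${path##*/}"           # pure parameter expansion, no fork; prints: bin
echo "$(dirname "$path")"    # forks; prints: /usr/local
echo "${path%/*}"            # no fork; prints: /usr/local
name='Fred'
echo "${name,,}"             # lowercases without forking tr (bash 4+); prints: fred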

By carefully reviewing a large program's use of subshells and replacing them with other methods where possible, I was able to cut about 10% of the overall execution time in a just-completed set of optimizations (not the first, and not the last, time I have done this; the program had already been optimized several times, so gaining another 10% is actually quite significant).

So it's worth being aware of.

Because I was curious, I wanted to confirm what 'time' is telling us here: https://en.wikipedia.org/wiki/Time_(Unix)

The total CPU time is the combination of the amount of time the CPU or CPUs spent performing some action for a program and the amount of time they spent performing system calls for the kernel on the program's behalf. When a program loops through an array, it is accumulating user CPU time. Conversely, when a program executes a system call such as exec or fork, it is accumulating system CPU time.

As you can see, particularly in the echo loop test, the cost of the forks is very high in terms of system calls to the kernel; those forks really add up (roughly 700x more time spent on sys calls!).
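
If you want to confirm where that sys time goes, strace can count the underlying syscalls directly (a sketch, assuming Linux with strace installed; modern bash creates subshells via the clone syscall):

strace -c -e trace=clone bash -c 'for ((i=0;i<100;i++)); do x=$(echo hi); done'

I'd expect the summary table to show roughly one clone per iteration: one forked subshell for each command substitution, even though echo itself is a builtin.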

I'm in an ongoing process of resolving some of these issues, so these questions are actually quite relevant to me and to the global community of users who like the program in question. That is, this is not an arcane academic point for me; it's real-world, with real impacts.

Well, here's my interpretation of why this is important: it's answer #2!

No performance gain is too small to matter, even when it's just about avoiding one subshell… Call me Mr. Obvious, but the concept behind that thinking is the same one that's behind avoiding useless use of <insert tool here>, like cat|grep, sort|uniq, or even cat|sort|uniq, etc.
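
To make that concrete (a sketch; file and pattern are placeholders):

cat file | grep 'pattern'    # useless use of cat: two processes and a pipe
grep 'pattern' file          # same output, one process
cat file | sort | uniq       # three processes
sort -u file                 # same output, one process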

That concept is the Unix philosophy, which ESR summed up well with a reference to KISS: Keep It Simple, Stupid!

What I mean is that when you write a script, you never know how it may get used in the end, so every little byte or cycle you can spare is important: if your script ends up eating billions of lines of input, it will be more optimized by that many saved forks/bytes/…

I think the general idea is that it makes sense to avoid creating an extra shell process unless it's otherwise required.

However, there are too many situations where either can be used, and where one makes more sense than the other, to say that one way is better overall. It seems to me to be purely situational.
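
One sketch of the situational side: sometimes the extra process is exactly what you want, because a subshell confines state changes such as cd (archive.tar here is a hypothetical file):

( cd /tmp && tar -xf archive.tar )   # the cd happens only inside the subshell
pwd                                  # the parent shell is still where it was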

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow