Why should the shebang line always be the first line?

https://stackoverflow.com/questions/12910744

07-07-2021

Question

I have a simple Perl script as below:

#!/usr/bin/perl

use strict;
use warnings;

print "hello world!\n";

I can execute this script as below:

>temp.pl
hello world!
>

If I add some comments like this:

#this script is just for test
#the shebang
#!/usr/bin/perl

use strict;
use warnings;

print "hello world!\n";

and then try to execute it, I get the output below:

> temp.pl
use: Command not found.
use: Command not found.
print: Command not found.
>

The point here is that the shebang line should always be at the top, no matter what. Can anybody explain why?

Solution

The shebang must be the first line because it is interpreted by the kernel, which looks at the two bytes at the start of an executable file. If these are #!, the rest of the line is interpreted as the executable to run, and the path of the script file is passed to that program as an argument. (Details vary slightly, but that is the picture.)

Since the kernel only looks at the first two bytes and has no notion of further lines, you must place the hashbang on line 1.
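
If it helps to see the rule in action, here is a rough shell sketch of the same check. It only imitates the test from user space and says nothing about how any kernel implements it; the helper name starts_with_shebang and the two sample files are made up for the illustration.

```shell
# Illustration only: mimic the "first two bytes" test in shell.
# starts_with_shebang is a made-up helper name, not a standard tool.
starts_with_shebang() {
    first_line=
    # Read just the first line; only its first two characters matter.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case $first_line in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

tmpdir=$(mktemp -d)

# Shebang on line 1: recognised.
printf '#!/usr/bin/perl\nprint "hello world!\\n";\n' > "$tmpdir/good.pl"

# Shebang on line 3, as in the question: just another comment.
printf '#this script is just for test\n#the shebang\n#!/usr/bin/perl\n' > "$tmpdir/bad.pl"

for f in "$tmpdir/good.pl" "$tmpdir/bad.pl"; do
    if starts_with_shebang "$f"; then
        printf '%s: starts with #!\n' "${f##*/}"
    else
        printf '%s: no #! at the start\n' "${f##*/}"
    fi
done

rm -rf "$tmpdir"
```

This should report that good.pl starts with #! and that bad.pl does not, even though bad.pl contains a perfectly good shebang two lines further down.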

Now what happens if the kernel can't execute a file, for example because it doesn't begin with #!whatever? The shell, having asked the kernel to execute the file and been told that it can't, attempts as a last resort to interpret the file contents as a shell script. Since the shell is not perl, you get a bunch of errors, much the same as if you attempted to run

 sh temp.pl
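
You can reproduce this without having Perl installed. The sketch below writes the broken script from the question to a temporary file (the variable names are arbitrary) and hands it to sh. The exact wording of the errors depends on which shell sh is, so treat the messages as indicative only.

```shell
# Recreate the script from the question, with the shebang on line 3.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
#this script is just for test
#the shebang
#!/usr/bin/perl

use strict;
use warnings;

print "hello world!\n";
EOF

# Let sh interpret it. Keep stderr, discard stdout.
errors=$(sh "$tmpfile" 2>&1 >/dev/null)
status=$?

printf '%s\n' "$errors"
echo "exit status: $status"

rm -f "$tmpfile"
```

On the shells I would expect to find behind sh, every Perl statement is reported as an unknown command, because to the shell all three # lines are just comments.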

OTHER TIPS

It's not just that it has to be the first line: the characters #! have to be the first two bytes in the file. On current Unix-like systems it is the kernel, not the scripting language, that acts on these bytes when it is asked to execute the file, so the mechanism is not specific to any particular scripting language.

When the system is told to execute the contents of a file, either with something like .../path/to/bin/program, or via the analogous route through the PATH, it examines the first few bytes of the file to look for the 'magic numbers' which reveal what type of file it is (you can peek at that process using the file(1) command). If it's a compiled binary, then it'll load and execute it in an appropriate manner, and if those first two bytes are #! it'll do the 'shebang-hack'.

The 'shebang-hack' is a special case supported by essentially every Unix-like system (it's convention rather than a requirement of the standard), in which the remaining bytes up to a newline are read and interpreted as the path of an interpreter, optionally followed by an argument. That interpreter is then executed with the path of the current file added to its arguments, so that it can read the script itself. Plus some details you can probably read about elsewhere.

Some systems allow quite long first lines, some allow only short ones; some split the rest of the line into multiple arguments, some pass it along as a single one.
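
To make the single-argument behaviour concrete, here is a rough shell sketch that splits a #! line into an interpreter path and one trailing argument. The helper name parse_shebang is made up, only spaces (not tabs) are treated as separators, and real systems differ in exactly how they split and truncate the line, so this is an approximation rather than a description of any particular one.

```shell
# Illustration only: split a #! line into "interpreter" and "argument".
# parse_shebang is a hypothetical helper; only spaces are treated as blanks.
parse_shebang() {
    line=
    IFS= read -r line < "$1" || [ -n "$line" ] || return 1
    case $line in
        '#!'*) ;;
        *) return 1 ;;              # no shebang at the very start
    esac
    rest=${line#??}                 # drop the leading "#!"
    rest=${rest#"${rest%%[! ]*}"}   # drop spaces directly after "#!"
    interpreter=${rest%% *}         # everything up to the first space
    if [ "$interpreter" = "$rest" ]; then
        argument=
    else
        argument=${rest#* }         # the remainder, kept as ONE string
    fi
    printf 'interpreter=%s\n' "$interpreter"
    printf 'argument=%s\n' "$argument"
}

tmpfile=$(mktemp)
printf '#!/usr/bin/perl -w\nprint "hi\\n";\n' > "$tmpfile"
parse_shebang "$tmpfile"
rm -f "$tmpfile"
```

For the sample file this prints interpreter=/usr/bin/perl and argument=-w. With a line such as #!/usr/bin/env perl -w, the sketch reports "perl -w" as one argument, which is the single-argument behaviour mentioned above.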

If the file doesn't start with #!, but does appear to be text, some shells will heuristically try to execute it anyway. Csh (if I recall correctly) takes a punt on it being a csh-script, and there's some complicated and arcane case to do with some shells' behaviour if the first line is blank, which life is too short to remember.

There are interesting and extensive details (and accurate ones, in the sense that they match my recollections!) at Sven Mascheck's #! page.

In addition to the explanations above, which are covered in more detail elsewhere, there are some special things about #! and Perl which haven't been mentioned yet.

Perl reads the #! line and does two things. First, if the path doesn't look like perl, it will re-execute the program using that interpreter! For example...

#!/bin/sh

echo "Hello world!"

This will run correctly if executed as perl /path/to/that/program. I don't know for what historical reason Perl does this, but it comes in handy when you're testing multiple languages with Test::Harness.

The second thing is that Perl finds any switches on the #! line and applies them just as if they were on the command line. This is why #!/usr/bin/perl -w works to turn on warnings.

It's worth mentioning that, unlike the other parts of the shebang processing, this is all done inside Perl rather than by Unix, and so it is portable to Windows.

Another Perl + shebang note is this madness you might find at the top of many Perl programs.

#!/usr/bin/perl

eval 'exec /usr/bin/perl -w -S $0 ${1+"$@"}'
    if 0; # not running under some shell

Sometimes, on very, very, very old systems, #! does not work and the Perl program is executed by the shell. The eval forces the shell to re-execute the file with Perl as the very first thing it does. Since shell statements end at a newline, the shell never sees the if 0. Perl does see the if 0, because for Perl the statement continues onto the next line, so it doesn't execute the eval. Both Perl and the shell have an eval that accepts this syntax, which is what makes the hack work.
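
The shell half of that trick can be tried without Perl. In the sketch below, exec echo stands in for exec /usr/bin/perl -w -S $0 ${1+"$@"} (my substitution, purely so the example runs anywhere), and the file is handed straight to sh to imitate a system where #! does not work.

```shell
# Demonstration only: "exec echo" replaces the real "exec /usr/bin/perl ...".
script=$(mktemp)
cat > "$script" <<'EOF'
eval 'exec echo "re-executed by the shell: $0"'
    if 0; # not running under some shell
echo "never reached"
EOF

# Imitate the "no working #!" case by giving the file directly to sh.
output=$(sh "$script" 2>&1)
printf '%s\n' "$output"

rm -f "$script"
```

The shell runs the eval line as a complete command, exec replaces the shell process, and so neither the if 0 line nor anything after it is ever read.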

At least on POSIX compliant systems, the shebang is used to tell the executable loader what to do with text files having the executable bit set.

The loader knows what to do with binary files: they start with a "magic number", usually the ELF one these days.

On the other hand, text files that do not have a shebang are executed by the POSIX-compliant shell available on the machine. This is why you get these shell error messages:

use: Command not found.
use: Command not found.
print: Command not found.

When your executable is not meant to be interpreted by the POSIX-compliant shell, you need to tell the loader which interpreter to use. Other OSes like Windows use the file extension to figure it out, but Unix doesn't use or care about extensions in this specific case. What it uses is the shebang on the first line, which states what command interpreter to use. The only drawback is that the scripting language has to ignore this first line. Fortunately this is usually the case, since # starts a comment in most scripting languages.

Despite popular belief, strictly portable shell scripts should not have a shebang at all. In particular, #!/bin/sh is not recommended for them, because POSIX does not guarantee where the standard shell is installed.

Licensed under: CC-BY-SA with attribution