The Whole Shebang

#!/bin/sh
#!/usr/bin/perl
#!/usr/bin/env python

If you've been using POSIX systems for any serious length of time, chances are you've encountered scripts that start with lines like the ones above. It's even possible that you've written some yourself. The magic two bytes #! at the beginning of the file signal to the kernel that this is a script that requires an interpreter to run, and that the interpreter is specified by the rest of the first line.

Or does it?

In fact, this isn't specified by POSIX. In even stronger terms, this is explicitly unspecified behavior. From the specification for the exec() family of functions:

There are two distinct ways in which the contents of the process image file may cause the execution to fail, distinguished by the setting of errno to either [ENOEXEC] or [EINVAL] (see the ERRORS section). In the cases where the other members of the exec family of functions would fail and set errno to [ENOEXEC], the execlp() and execvp() functions shall execute a command interpreter and the environment of the executed command shall be as if the process invoked the sh utility using execl() as follows:

execl(<shell path>, arg0, file, arg1, ..., (char *)0);

where <shell path> is an unspecified pathname for the sh utility, file is the process image file, and for execvp(), where arg0, arg1, and so on correspond to the values passed to execvp() in argv[0], argv[1], and so on.
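The fallback quoted above is easy to observe from a shell, which POSIX requires to behave in an equivalent way when execve() fails with [ENOEXEC] (see "Command Search and Execution" in the Shell Command Language). What follows is only a minimal sketch, not a definitive test: the names demo_dir and no_shebang are made up for illustration, and it assumes the temporary directory is writable and not mounted noexec.

```sh
# Sketch: run an executable text file that has no "#!" line.
# demo_dir and no_shebang are hypothetical names used only here.

demo_dir=${TMPDIR:-/tmp}/no_shebang_demo.$$
mkdir "$demo_dir" || exit 1
trap 'rm -rf "$demo_dir"' EXIT

# A file of shell commands with no "#!" line at all.
printf '%s\n' 'echo "run by a shell"' > "$demo_dir/no_shebang"
chmod +x "$demo_dir/no_shebang"

# The system cannot execute this file as a process image ([ENOEXEC]),
# so the invoking shell falls back to interpreting it as shell commands.
output=$("$demo_dir/no_shebang")
printf '%s\n' "$output"
```

Which shell ends up interpreting the file is up to the implementation, so the only thing that can be safely assumed about its contents is that they will be read as sh commands.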

In summary, whenever the implementation doesn't recognize a file as an executable, execlp() and execvp() are required to pass it to sh for execution. And if we take a look at the Shell Command Language, we see (emphasis added):

The shell reads its input from a file (see sh), from the -c option or from the system() and popen() functions defined in the System Interfaces volume of POSIX.1-2017. If the first line of a file of shell commands starts with the characters "#!", the results are unspecified.

What does this mean, really? Unlike everyone's favorite topic, undefined behavior, unspecified behavior essentially means that the implementation should attempt to do something useful, but it isn't required to document what that is (in contrast with implementation-defined behavior). This is really a compromise on the part of the standards committee between existing implementations, some of which treated #! as a magic byte sequence and some of which did not. By leaving the behavior explicitly unspecified, the standard allows both interpretations.

This is important for writing portable scripts. Since the behavior is unspecified, it cannot be relied on. Essentially, every script should be treated as a sh script. This post sets out the motivation for a series of posts on how to write scripts in a portable manner without relying on #!.

Copyright © 2019 Jakob Kaivo <jakob@kaivo.net>