On program configuration

In a perfect world, programs would require no configuration. Nor would they require user input. They would simply do whatever the user required. Sadly, we don't live in a perfect world, and programs need configuration.

I'm focusing on command line utilities here. GUI applications bring their own set of complications that are not as easily addressed by the solutions I will present. On the other hand, most GUI toolkits include configuration APIs, so if you're doing a GUI program, you should probably use your toolkit's preferred solution.

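When faced with configuring a program, there are several ways to do it: at compile time, through environment variables, through configuration files, or through command-line parameters.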

Let's look at some of the pros and cons of each:

Compile-Time Configuration

Pros: Your program doesn't have to parse any options, reducing the amount of code required, and perhaps reducing the risk of invalid values.

Cons: You hate your users. Your users hate you. Seriously, if you force your users to recompile your program to do something like change the color used for highlighting, your program is bad and you should feel bad.

Environment Variables

Pros: The system provides easy access to key-value pairs via getenv() or its equivalents. Users can put all their configuration in their shell startup script. Easy to back up.
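For illustration, here is a minimal C sketch of that getenv() access, with a fallback default when the variable is missing. The variable name MYPROG_COLOR is hypothetical, and treating an empty value the same as an unset one is a design choice on my part, not a rule.

```c
#include <stdlib.h>

/* Return the value of environment variable `name`, or `fallback`
 * if it is unset or empty. */
const char *env_or_default(const char *name, const char *fallback)
{
	const char *value = getenv(name);

	if (value == NULL || value[0] == '\0') {
		return fallback;
	}
	return value;
}
```

A caller would write something like `const char *color = env_or_default("MYPROG_COLOR", "red");`, where both the variable name and the default are placeholders.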

Cons: The environment is shared state, and a limited resource. Every byte used in the environment counts against {ARG_MAX}, which is allowed to be as small as 4096. Imagine a dozen programs with, say, 100 bytes of environment each. That's 1200 bytes, more than 1/4 of the space used already, and if your program is one of those dozen, at least 11/12 of that space is useless to your program.
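If you want to see how much of that budget your own environment is eating, a rough C sketch like the following compares the environment's size against {ARG_MAX} as reported by sysconf(). It counts each "NAME=value" string plus its terminating NUL; POSIX leaves it implementation-defined whether terminators, pointers, or alignment bytes count toward the limit, so treat the result as an approximation. The fallback to 4096 when sysconf() reports no determinate limit is my own conservative assumption.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Approximate bytes used by the environment: the length of every
 * "NAME=value" string plus one byte for its NUL terminator. */
size_t environment_bytes(void)
{
	size_t total = 0;

	if (environ == NULL) {
		return 0;
	}
	for (char **ep = environ; *ep != NULL; ep++) {
		total += strlen(*ep) + 1;
	}
	return total;
}

/* The system's {ARG_MAX}, or the POSIX minimum of 4096 if sysconf()
 * cannot determine it. */
long arg_max(void)
{
	long n = sysconf(_SC_ARG_MAX);

	return n > 0 ? n : 4096;
}
```

Printing `environment_bytes()` next to `arg_max()` on a typical desktop shell is a quick way to check the arithmetic above against a real system.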

Configuration Files

Pros: Infinitely extensible. Can be maintained in source control and easily synchronized between systems.

Cons: Infinitely extensible. That's code you have to write and maintain. Configuration files often wind up becoming domain-specific languages in their own right, which can be a barrier to entry for some users, especially if your syntax is similar to, but not quite like, that of another tool the user already uses.

Command-Line Parameters

Pros: Decent system support with POSIX getopt(). Users can copy and paste complete working samples. Users can easily create aliases with different options.

Cons: Limited in number; if you are using the aforementioned POSIX getopt(), you are limited to 52 options. Some alternatives, such as GNU getopt_long(), allow "long options", but these are not necessarily portable.

My Recommendations

When possible, your program should require minimal configuration. If it can derive all necessary parameters from its environment (e.g., it works on every file in the current directory), it should. This isn't always possible, but it's great when it is. Even most of the POSIX utility specifications include at least one option, and that standard takes "Minimal Interface, Minimally Defined" as a guiding principle.

The next step is to let the user specify input and/or output files. Ideally, if they are not specified, these should default to stdin and stdout. Additionally, specifying - as a file name should mean stdin or stdout, as appropriate.
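Here is a minimal C sketch of that convention: a missing name or "-" maps to the standard stream, and anything else is opened normally. The helper names are hypothetical; the matching close helper exists so the caller never fcloses stdin or stdout by accident.

```c
#include <stdio.h>
#include <string.h>

/* Open `path` for reading; NULL or "-" means stdin. */
FILE *open_input(const char *path)
{
	if (path == NULL || strcmp(path, "-") == 0) {
		return stdin;
	}
	return fopen(path, "r");
}

/* Open `path` for writing; NULL or "-" means stdout. */
FILE *open_output(const char *path)
{
	if (path == NULL || strcmp(path, "-") == 0) {
		return stdout;
	}
	return fopen(path, "w");
}

/* Close a stream opened above. The standard streams are left open:
 * stdin is untouched and stdout is only flushed. */
int close_stream(FILE *fp)
{
	if (fp == stdin) {
		return 0;
	}
	if (fp == stdout) {
		return fflush(stdout);
	}
	return fclose(fp);
}
```

As with fopen(), a NULL return means the file could not be opened and the caller should report errno.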

The next level would be command-line options. Stick with POSIX getopt(), not least because it limits you to no more than 52 options. Honestly, if you have more than 4 or 5 options, you need to seriously consider whether you're writing one program or two, and whether it's time to split the functionality into separate utilities. Allow options that appear mutually exclusive to simply override each other, like ls's -x and -C: the last one present on the command line takes precedence.

If your list of options starts growing long and unwieldy, then it's time for a configuration file. There should be a reasonable default path, though it should be possible to specify an alternate location. If possible, use an existing format rather than reinventing the wheel.
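One way to pick that default path, sketched in C: an explicitly given path always wins, and otherwise the program follows the XDG Base Directory convention ($XDG_CONFIG_HOME if it is set to an absolute path, else $HOME/.config). The name "myprog" and the idea of passing the override in from an option such as -f are assumptions for illustration; XDG is one reasonable convention, not the only one.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write the configuration file path into buf. An explicit path
 * (e.g. from a hypothetical -f option) overrides the default.
 * Returns 0 on success, or -1 if no path could be determined or
 * the result did not fit in buf. */
int config_path(const char *explicit_path, char *buf, size_t len)
{
	const char *xdg = getenv("XDG_CONFIG_HOME");
	const char *home = getenv("HOME");
	int n;

	if (explicit_path != NULL) {
		n = snprintf(buf, len, "%s", explicit_path);
	} else if (xdg != NULL && xdg[0] == '/') {
		/* XDG requires absolute paths; relative ones are ignored */
		n = snprintf(buf, len, "%s/myprog/config", xdg);
	} else if (home != NULL && home[0] != '\0') {
		n = snprintf(buf, len, "%s/.config/myprog/config", home);
	} else {
		return -1;
	}
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Whatever convention you choose, document it in the manual page so users know where to look.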

And compile time configuration? Never do that. Just don't.

Copyright © 2020 Jakob Kaivo <jakob@kaivo.net>