On the zero-initialization of structs

There is an anti-pattern, particularly common in network programming, that looks something like this:

struct sockaddr_in sin;
memset(&sin, 0, sizeof(sin));

It occasionally occurs in a roughly equivalent but distinctly BSD-flavored variant:

struct sockaddr_in sin;
bzero(&sin, sizeof(sin));

How did this come to be? What's wrong with it? What is the better path? Let's discuss each of those questions in turn.

Where did this come from?

I think my first encounter with this anti-pattern was in a network programming tutorial, almost certainly derived from W. Richard Stevens's seminal UNIX Network Programming. It's certainly present (in the BSD flavor) in the sample echo client and server presented in the 3rd edition, and I have little reason to believe that has changed significantly since the 1st edition, other than perhaps to adjust from K&R-style function declarations to standardized prototypes. (The 1st edition was published in 1990, so it might have gone either way. If anyone has a copy of the 1st edition and wants to compare the echo client and server to the 3rd edition, please email me.)

It's also still present (in the memset() flavor) in Beej's Guide to Network Programming, which is usually pulled out as the go-to starter's guide on the topic.

I don't readily have access to any earlier sources, but I am almost certain that this same pattern (especially in the BSD flavor) can be traced to the earliest examples demonstrating the use of the sockets API, which showed up in BSD in 1983. And that date is important, because it predates the standardization of C, and hence the standardization of C structs, by a good six years. Indeed, the language specified in the appendix to K&R states "Structures can be initialized, but this operation is incompletely implemented and machine-dependent", so this kind of work-around was necessary prior to 1989.

What's wrong with it?

In short: undefined behavior. A call to memset() sets each byte in the specified range to its second parameter (converted to unsigned char). So with the idiomatic value of 0, you are explicitly asking for every bit in the object to be set to zero. That includes padding bits, modification of which is undefined behavior. It includes bits that represent pointers, which you might think gives you a NULL pointer, but there is no guarantee that this is the case. And it includes bits that represent floating-point numbers, which are likewise not guaranteed to come out as 0.0. In short, you're asking for something to go wrong. Chances are very slim that it will, but that chance is non-zero.
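To make the distinction between "all bits zero" and "value zero" concrete, here is a minimal sketch of a probe that compares the two on whatever implementation compiles it. The struct sample type and the function name are hypothetical, invented for this illustration. The comparison uses memcmp() on the stored bytes rather than reading the members back, so it never evaluates a value that might not be valid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, used only for illustration. */
struct sample {
    int    count;
    double ratio;
    char  *name;
};

/*
 * Returns 1 if, on this implementation, the bytes memset() leaves behind
 * in the pointer and floating-point members are identical to the bytes of
 * a null pointer and of 0.0 produced by ordinary assignment, 0 otherwise.
 * The int member is not checked: all-bits-zero is a valid representation
 * of 0 for integer types.
 */
int all_bits_zero_matches_value_zero(void)
{
    char  *null_pointer = NULL;   /* value-level null pointer */
    double zero = 0.0;            /* value-level floating-point zero */
    struct sample s;

    memset(&s, 0, sizeof(s));     /* byte-level zeroing */

    return memcmp(&s.name, &null_pointer, sizeof(null_pointer)) == 0
        && memcmp(&s.ratio, &zero, sizeof(zero)) == 0;
}
```

On mainstream platforms I would expect this to return 1, which is exactly why the idiom survives. A result of 1 only describes the implementation it ran on, though, and a result of 0 would not by itself prove the pointer is non-null, since an implementation may have more than one representation for a null pointer.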

I'll also mention that an explicit memset() call can restrict the optimizations a compiler is able to perform. True, GCC at least seems to know enough about the definition of memset() to make some smart choices, but it shouldn't have to, and neither should any other compiler.

What is the better path?

I hinted at it before, and the answer is to initialize your struct variables properly, just like you would with any other variable:

struct sockaddr_in sin = { 0 };

ISO/IEC 9899 tells us that this explicitly initializes the first member of the struct with the literal value 0, and all other members as though they were declared with static storage duration, which means pointers are NULL, arithmetic types are the appropriately encoded 0, and aggregate types are recursively initialized similarly. Notably, the literal 0 ensures that if the first member is a pointer, it will be initialized to NULL (not necessarily all zero bits), and floating-point types will similarly be encoded properly. It also removes the chance of mucking about with padding bits, eliminating another potential source of undefined behavior. There is also a minor benefit of removing the need for a function call, which is probably negligible in overall execution time, but is there.
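As a small sketch of that recursive behavior, the following uses two hypothetical types (struct point and struct record, both invented here) with a pointer, a nested struct, and an array. The function checks every member by value after a { 0 } initialization.

```c
#include <stddef.h>

/* Hypothetical types, used only for illustration. */
struct point {
    double x;
    double y;
};

struct record {
    int          id;         /* first member: takes the explicit 0 */
    char        *label;      /* implicitly initialized to a null pointer */
    struct point origin;     /* nested struct: members become 0.0 */
    long         history[4]; /* array: every element becomes 0 */
};

/* Returns 1 if every member of a { 0 }-initialized record has value zero. */
int record_is_zeroed(void)
{
    struct record r = { 0 };
    int ok = r.id == 0
          && r.label == NULL
          && r.origin.x == 0.0
          && r.origin.y == 0.0;

    for (size_t i = 0; i < sizeof(r.history) / sizeof(r.history[0]); i++)
        ok = ok && r.history[i] == 0;

    return ok;
}
```

Unlike a byte-level comparison, these checks hold on any conforming implementation, because the initializer is defined in terms of values rather than bit patterns.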

Perhaps the greatest benefit, though, is in maintainability. By keeping your initialization inline with declaration, you maintain a single point where an object comes into being and is in a valid state. There's no chance that in a refactor you might modify the declaration but forget to update the initialization. All this while very clearly demonstrating your intent. It's a win all-around.
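Taking the single-point-of-initialization idea one step further, C99 designated initializers let you set the members you care about in the same declaration, while every member you don't name is still initialized as described above. This is a sketch rather than a recommendation for any particular program: the function name is hypothetical, and the echo port (7) and loopback address are arbitrary illustrative choices. It assumes a POSIX system for the headers.

```c
#include <arpa/inet.h>    /* htons(), htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <sys/socket.h>   /* AF_INET */

/*
 * Hypothetical helper: builds an IPv4 address for the echo service on the
 * loopback interface. Members not named in the initializer (sin_zero, and
 * sin_len where it exists) are implicitly zero-initialized.
 */
struct sockaddr_in make_echo_address(void)
{
    struct sockaddr_in sin = {
        .sin_family      = AF_INET,
        .sin_port        = htons(7),
        .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
    };

    return sin;
}
```

The object is never observable in a half-initialized state, and adding or reordering members in the struct definition does not require touching this code.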

Copyright © 2020 Jakob Kaivo <jakob@kaivo.net>