Building strings in C

There is a common anti-pattern I see repeated by C programmers who need to build up a string. It goes something like this (assume the undeclared variables are strings coming from elsewhere in the program):

char buffer[MAXSIZE];
strcpy(buffer, the_first_part);
strcat(buffer, the_next_part);
strcat(buffer, another_part);
strcat(buffer, still_more);

This is both inefficient and unsafe. Each call to strcat() has to start at the beginning of buffer and walk all the way to the null terminator before it can begin appending, so the same characters are scanned again and again. Nothing stops the combined parts from running past the end of buffer, either. Some people try to avoid the rescanning by tracking the end themselves:
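
To make the rescanning concrete, here is a minimal sketch of the work strcat() has to do on every call. The name naive_strcat and its returned count are hypothetical, for illustration only; real strcat() implementations are heavily optimized, but they still have to find the end of the destination each time.

```c
#include <stddef.h>

/* Hypothetical stand-in for strcat() (not a library function): appends
   src to dest and returns how many characters it had to skip over to
   find dest's existing null terminator. */
size_t naive_strcat(char *dest, const char *src)
{
	size_t scanned = 0;
	while (dest[scanned] != '\0')	/* walk from the very start of dest */
		scanned++;
	char *p = dest + scanned;
	while ((*p++ = *src++) != '\0')	/* append src, including its terminator */
		;
	return scanned;
}
```

Appending "beta", "gamma", and "delta" to "alpha" this way rescans 5, then 9, then 14 characters. The cost of each append grows with the length of the string built so far, so the total work grows quadratically with the number of pieces.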

char buffer[MAXSIZE];
strcpy(buffer, the_first_part);
size_t start = strlen(the_first_part);
strcpy(buffer + start, the_next_part);
start += strlen(the_next_part);
strcpy(buffer + start, another_part);
start += strlen(another_part);
strcpy(buffer + start, still_more);

But really, all you've done is move the work of calculating the next start point out of strcat() and into your own program. A slightly more experienced programmer might be tempted by the POSIX stpcpy() function, which returns a pointer to the null terminator it just wrote rather than to the beginning of the destination:

char buffer[MAXSIZE];
char *p = stpcpy(buffer, the_first_part);
p = stpcpy(p, the_next_part);
p = stpcpy(p, another_part);
p = stpcpy(p, still_more);
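
stpcpy() is specified by POSIX rather than ISO C, so not every platform provides it. A minimal sketch of equivalent behavior, under the hypothetical name my_stpcpy, shows why the chain above never rescans: each call hands back exactly the spot where the next copy should begin.

```c
#include <string.h>

/* Hypothetical portable equivalent of POSIX stpcpy() (not a library
   function): copy src, including its terminator, into dest and return a
   pointer to the '\0' just written. Like stpcpy(), it does no bounds
   checking, and dest and src must not overlap. */
char *my_stpcpy(char *dest, const char *src)
{
	size_t len = strlen(src);
	memcpy(dest, src, len + 1);	/* + 1 copies the terminator too */
	return dest + len;		/* points at the new terminator */
}
```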

While this solves the efficiency problem, it's still unsafe. There is no bounds checking here to ensure that you aren't overflowing buffer. Some people might be tempted to use strncpy() and strncat(), but they would be wrong (which is a topic I'm saving for another post). Not only do they fail to solve the inefficiency problem, they introduce new hazards (strncpy(), for instance, can leave your buffer without a null terminator), and now you're just asking to be exploited.

So, what is an efficient, safe way to combine several strings? The answer isn't in <string.h>, but in <stdio.h>: snprintf(). With snprintf(), you let the implementation perform the copies efficiently (every library I have looked at does so), while it guarantees never to write more than the size you pass it, terminator included, so as long as that size is the real size of the buffer, it cannot overflow. As a pleasant side effect, you also reduce the number of function calls necessary to one:

char buffer[MAXSIZE];
snprintf(buffer, sizeof(buffer), "%s%s%s%s", the_first_part, the_next_part, another_part, still_more);
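
snprintf() prevents the overflow, but when the result doesn't fit it silently truncates it (still null-terminated, as long as the size is nonzero). A minimal sketch of detecting that from the return value follows; join4 is a hypothetical helper name, not a library function.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: join four strings into buf, which holds size
   bytes. Returns true only if the whole result fit. On truncation, buf
   still holds a null-terminated prefix, provided size > 0. */
bool join4(char *buf, size_t size, const char *a, const char *b,
	const char *c, const char *d)
{
	int n = snprintf(buf, size, "%s%s%s%s", a, b, c, d);
	/* n < 0 means an encoding error; n >= size means the output was cut short */
	return n >= 0 && (size_t)n < size;
}
```

Whether a truncated result is acceptable depends on the program; the point is that the return value tells you, so you can decide.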

What's more, snprintf() lets you dynamically allocate a buffer of exactly the right size, because, in the words of the C standard, it returns "the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred." Calling it first with a size of 0 (where a null pointer is allowed as the buffer) measures the result without writing anything. So, you can introduce this little pattern where it benefits your code:

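/* first pass: with a size of 0, snprintf() writes nothing and only measures */
int n = snprintf(NULL, 0, "%s%s%s%s", the_first_part, the_next_part,
	another_part, still_more);
if (n < 0) {
	/* handle encoding error; don't fall through */
}
/* add one to account for the null terminator */
size_t len = (size_t)n + 1;
char *buffer = malloc(len);
if (buffer == NULL) {
	/* handle allocation error; don't fall through */
}
/* second pass: buffer is now exactly large enough */
snprintf(buffer, len, "%s%s%s%s", the_first_part, the_next_part, another_part,
	still_more);
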
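
The same measure-then-allocate pattern generalizes to any format string through vsnprintf(). The sketch below assumes a hypothetical helper named alloc_printf; the GNU and BSD C libraries offer asprintf() along similar lines, but it is not part of ISO C.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a standard function): format into a freshly
   malloc()ed buffer of exactly the right size. The caller must free()
   the result. Returns NULL on an encoding error or allocation failure. */
char *alloc_printf(const char *fmt, ...)
{
	va_list ap, ap2;

	va_start(ap, fmt);
	va_copy(ap2, ap);	/* a va_list can only be traversed once */

	/* first pass: measure with a zero-size buffer */
	int n = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);

	char *buffer = NULL;
	if (n >= 0) {
		size_t len = (size_t)n + 1;	/* room for the null terminator */
		buffer = malloc(len);
		if (buffer != NULL)
			vsnprintf(buffer, len, fmt, ap2);	/* second pass: format */
	}
	va_end(ap2);
	return buffer;
}
```

For example, alloc_printf("%s%s%s%s", the_first_part, the_next_part, another_part, still_more) replaces the whole block above with one call and one NULL check.
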
Copyright © 2020 Jakob Kaivo <jakob@kaivo.net>