It's common to want to know how big a file is. You might want to present that information to a user, or use it to calculate a buffer size, or serve any of many other valid use cases. Unfortunately, there seems to be an idea that this is simple in standard, portable C. The solution usually looks something like this:
#include <stdio.h>

long filesize(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fclose(f);
    return size;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s \n", argv[0]);
        return 1;
    }
    printf("%s is %ld bytes\n", argv[1], filesize(argv[1]));
}
The really unfortunate thing is that on the surface, and on casual investigation, this sort of thing actually works:
$ ls -l
total 24
-rwxrwxr-x 1 jkk jkk 16968 Jul 1 13:18 filesize*
-rw-rw-r-- 1 jkk jkk 373 Jul 1 13:18 filesize.c
$ ./filesize filesize
filesize is 16968 bytes
Here we have our little program agreeing exactly with ls. But there is at least one subtle problem with this method. The ftell() function returns a long int, and a signed one at that.
That causes this program to fail, and fail spectacularly, when given a file that is larger than LONG_MAX. This may not seem like a big deal in our increasingly 64-bit world, but there is at least one 64-bit platform in use that defines long as 32-bit and definitely supports files greater than 2147483647 bytes in size. It's not even an obscure platform: it's Windows, for reasons better explained by Raymond Chen. So let's take a look at this little program on Windows:
C:\Users\JakobKaivo\source\filesize>dir
Volume in drive C has no label.
Volume Serial Number is D2DF-2996
Directory of C:\Users\JakobKaivo\source\filesize
07/01/2020 01:25 PM <DIR> .
07/01/2020 01:25 PM <DIR> ..
07/01/2020 01:24 PM 373 filesize.c
07/01/2020 01:25 PM 112,128 filesize.exe
07/01/2020 01:25 PM 2,015 filesize.obj
3 File(s) 114,516 bytes
2 Dir(s) 508,801,982,464 bytes free
C:\Users\JakobKaivo\source\filesize>filesize filesize.exe
filesize.exe is 112128 bytes
This, again at first blush, seems OK. But what if we need to check the size of a bigger file? Like the Windows installer ISO?
C:\Users\JakobKaivo\Downloads>dir
Volume in drive C has no label.
Volume Serial Number is D2DF-2996
Directory of C:\Users\JakobKaivo\Downloads
07/01/2020 10:40 AM <DIR> .
07/01/2020 10:40 AM <DIR> ..
07/01/2020 10:40 AM 5,650,477,056 en_windows_10_consumer_editions_version_2004_x64_dvd_8d28c5d7.iso
1 File(s) 5,650,477,056 bytes
2 Dir(s) 501,896,671,232 bytes free
C:\Users\JakobKaivo\Downloads>c:\Users\JakobKaivo\source\filesize\filesize en_windows_10_consumer_editions_version_2004_x64_dvd_8d28c5d7.iso
en_windows_10_consumer_editions_version_2004_x64_dvd_8d28c5d7.iso is -1 bytes
That's definitely not right.
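The arithmetic behind that failure can be sketched in a few lines. This is only an illustration: fits_in_long() is a hypothetical helper, not part of the program above. It checks whether a byte count can be represented in a long on the current platform. The -1 in the output above is consistent with ftell() reporting failure, which the C standard says it does by returning -1L.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns 1 if a file size
 * (held in the widest signed type) can be represented in a long on this
 * platform, and 0 otherwise.
 */
int fits_in_long(intmax_t size)
{
    return size >= 0 && size <= (intmax_t)LONG_MAX;
}
```

With a 32-bit long, LONG_MAX is 2147483647, so fits_in_long(INTMAX_C(5650477056)) returns 0 for the ISO above, while the 112128-byte executable fits comfortably. On platforms where long is 64 bits wide, both calls return 1, which is why the naive program looks correct there.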
For this particular problem, the truth is that there is no effective portable means of determining a file's size (in theory, looping through the entire file with fgetc() or fread() might work, but then your file size code is O(n), which is not acceptable for the use cases where the naive implementation fails, because n is already known to be large). You really need to use platform-specific functions to accurately get the size of files. That means stat() on POSIX systems, and GetFileSizeEx() on Windows. Something like:
#ifdef _WIN32
#include <windows.h>
#else
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#endif

#include <stdio.h>
#include <stdint.h>

intmax_t filesize(const char *path)
{
#ifdef _WIN32
    /* forcing use of the non-Unicode API for expository purposes only */
    HANDLE f = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (f == INVALID_HANDLE_VALUE) {
        return -1;
    }
    LARGE_INTEGER size = { 0 };
    if (GetFileSizeEx(f, &size) == 0) {
        size.QuadPart = -1;
    }
    CloseHandle(f);
    return size.QuadPart;
#else
    struct stat st = { 0 };
    if (stat(path, &st) != 0) {
        return -1;
    }
    return st.st_size;
#endif
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s \n", argv[0]);
        return 1;
    }
    printf("%s is %jd bytes\n", argv[1], filesize(argv[1]));
}
This yields the expected, correct results on both platforms (constrained, of course, by INTMAX_MAX, but that's the biggest integral type we can rely on). Note that both platforms define file sizes in terms of signed integers, so we reserve -1 to represent failure in either case.
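For comparison, the portable O(n) fallback mentioned earlier might look roughly like the sketch below. The name filesize_by_reading() and the 64 KiB buffer size are assumptions made for illustration. It uses only standard C and avoids ftell() entirely, but it has to read every byte of the file to produce an answer.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical portable fallback: counts the bytes in a file by reading
 * it from start to finish. Returns -1 if the file cannot be opened or a
 * read error occurs. The cost is O(n) in the size of the file.
 */
intmax_t filesize_by_reading(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }

    char buf[65536];    /* arbitrary chunk size chosen for this sketch */
    intmax_t total = 0;
    size_t n;

    /* fread() returns 0 at end of file or on error; ferror() tells them apart */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        total += (intmax_t)n;
    }

    if (ferror(f)) {
        total = -1;
    }
    fclose(f);
    return total;
}
```

This keeps the same convention of reserving -1 for failure. It is only reasonable when reading the whole file is acceptable anyway, which is exactly what is not true for the multi-gigabyte files that break the naive version.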