Question

For example, the old SysInternals tool "FileMon" has a kernel-mode driver whose source code sits entirely in one 4,000-line file. The same goes for the first ping program ever written (~2,000 LOC).


Solution

Using multiple files always requires additional administrative overhead. One has to set up a build script and/or makefile with separate compiling and linking stages, make sure the dependencies between the files are managed correctly, write a "zip" script for easier distribution of the source code by email or download, and so on. Modern IDEs typically take over much of that burden, but I am pretty sure no such IDE was available at the time the first ping program was written. And for programs as small as ~4,000 LOC, without an IDE that manages multiple files for you, the trade-off between that overhead and the benefits of using multiple files may well tip the decision toward the single-file approach.

OTHER TIPS

Because C isn't good at modularization. It gets messy (header files and #includes, extern functions, link-time errors, etc.), and the more modules you bring in, the trickier it gets.
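
To make that mess concrete, here is a minimal sketch of what splitting even a trivial module out of a single file entails (the "counter" module and all file names are purely illustrative): an include-guarded header, a separate implementation file, and a separate compile step per translation unit before linking.

    /* counter.h -- hypothetical module interface */
    #ifndef COUNTER_H
    #define COUNTER_H
    void counter_increment(void);
    int  counter_value(void);
    #endif

    /* counter.c -- compiled separately:  cc -c counter.c */
    #include "counter.h"
    static int count;                 /* module-private state */
    void counter_increment(void) { count++; }
    int  counter_value(void)     { return count; }

    /* main.c -- compiled, then linked against counter.o:
           cc -c main.c
           cc main.o counter.o -o app          */
    #include <stdio.h>
    #include "counter.h"
    int main(void)
    {
        counter_increment();
        printf("%d\n", counter_value());
        return 0;
    }

Every declaration now lives in two places, and every new module adds another header, another object file, and another line in the build.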

More modern languages have better modularization capabilities in part because they learned from C's mistakes, and they make it easier to break down your codebase into smaller, simpler units. But with C, it can be beneficial to avoid or minimize all that trouble, even if it means lumping what would otherwise be considered too much code into a single file.

Aside from the historical reasons, there is one reason to keep using a single file in modern, performance-sensitive software. When all of the code is in one compilation unit, the compiler is able to perform whole-program optimizations. With separate compilation units, the compiler cannot optimize across file boundaries in certain ways (e.g. it cannot inline calls whose bodies it cannot see).
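
A minimal sketch of the effect (function names are illustrative): because add() below is defined in the same translation unit, the compiler sees its body and can inline the call; if add() lived in a separate .c file, the compiler would see only a declaration, and inlining it would require link-time optimization, such as the -flto flag in GCC and Clang.

    /* Both functions in one translation unit: the optimizer can
       inline add() into sum_to() and then optimize the loop. */
    static int add(int a, int b)
    {
        return a + b;
    }

    int sum_to(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
            total = add(total, i);  /* trivially inlined here */
        return total;
    }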

The linker can certainly perform some optimizations in addition to what the compiler can do, but not all. For example: modern linkers are really good at eliding unreferenced functions, even across multiple object files. They may be able to perform some other optimizations, but nothing like what a compiler can do inside a function.
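
For instance, with GCC and GNU ld (the flag names below are theirs; other toolchains have equivalents), an unreferenced function can be dropped from the final binary at link time:

    /* main.c: never_called() is unreferenced. Giving each function
       its own section lets the linker discard unused ones:
           cc -ffunction-sections -Wl,--gc-sections main.c -o app  */
    #include <stdio.h>

    void never_called(void)
    {
        puts("this code can be elided at link time");
    }

    int main(void)
    {
        puts("hello");
        return 0;
    }

Note that this only removes whole functions; it does not inline anything or restructure the code the way a compiler can.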

One well-known example of a single-source code module is SQLite. You can read more about it on The SQLite Amalgamation page.

1. Executive Summary

Over 100 separate source files are concatenated into a single large file of C code named "sqlite3.c" and called "the amalgamation". The amalgamation contains everything an application needs to embed SQLite. The amalgamation file is more than 180,000 lines long and over 6 megabytes in size.

Combining all the code for SQLite into one big file makes SQLite easier to deploy — there is just one file to keep track of. And because all code is in a single translation unit, compilers can do better inter-procedure optimization resulting in machine code that is between 5% and 10% faster.
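
As a rough sketch of how the amalgamation is consumed (the build command and database choice are illustrative; sqlite3_open(), sqlite3_libversion(), and sqlite3_close() are the real public API), you simply compile sqlite3.c together with your own program:

    /* Build everything in one step, e.g.:
           cc myapp.c sqlite3.c -o myapp
       (on Linux you may also need -lpthread -ldl) */
    #include <stdio.h>
    #include "sqlite3.h"

    int main(void)
    {
        sqlite3 *db;

        if (sqlite3_open(":memory:", &db) != SQLITE_OK) {
            fprintf(stderr, "cannot open database\n");
            return 1;
        }
        printf("embedded SQLite version: %s\n", sqlite3_libversion());
        sqlite3_close(db);
        return 0;
    }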

In addition to the simplicity factor the other respondent mentioned, many C programs are written by one individual.

When you have a team of individuals, it becomes desirable to split the application across several source files to avoid gratuitous conflicts in code changes, especially when both advanced and very junior programmers are working on the project.

When one person is working alone, that isn't an issue.

Personally, I habitually split code into multiple files by function. But that's just me.

Because C89 didn't have inline functions, breaking your code up into small functions incurred the overhead of pushing values onto the stack and jumping around. That added quite a bit of overhead compared to implementing the code in one large switch statement (an event loop). But an event loop is always much more difficult to implement efficiently (or even correctly) than a more modular solution, so for large projects people would still opt to modularize. When they had the design thought out in advance and could keep the state under control in one switch statement, however, they opted for that.
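
A minimal sketch of that single-big-switch style (the event names and handlers are hypothetical): each case keeps its logic inline in the dispatcher, so under C89 no function-call overhead is paid per event.

    #include <stdio.h>

    /* Hypothetical event codes, purely for illustration. */
    enum event { EV_OPEN, EV_DATA, EV_CLOSE };

    /* One big switch as the dispatcher: keeping each case's logic
       inline here avoided the call overhead that separate handler
       functions would have added under C89. */
    void handle_event(enum event ev)
    {
        switch (ev) {
        case EV_OPEN:
            puts("opening connection");
            break;
        case EV_DATA:
            puts("processing data");
            break;
        case EV_CLOSE:
            puts("closing connection");
            break;
        }
    }

    int main(void)
    {
        handle_event(EV_OPEN);
        handle_event(EV_DATA);
        handle_event(EV_CLOSE);
        return 0;
    }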

Nowadays one need not sacrifice performance to modularize, because even in C functions can be inlined.
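
A small sketch of the C99-and-later idiom (names are illustrative): static inline is the portable pattern for small helpers, so splitting code into functions no longer implies call overhead.

    /* A small helper the compiler is free to expand in place. */
    static inline int square(int x)
    {
        return x * x;
    }

    int sum_of_squares(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
            total += square(i);  /* compiled as if written inline */
        return total;
    }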

This counts as an example of evolution, which I am surprised has not been mentioned yet.

In the dark days of programming, compiling a single file could take minutes. If a program was modularised, then including the necessary header files (there were no precompiled headers) was a significant additional source of slowdown. Additionally, the compiler might choose, or need, to keep some of its own working data on disk, probably without the benefit of an automatic swap file.

The habits that these environmental factors led to carried over into ongoing development practices and have only slowly adapted over time.

At the time, the gain from using a single file was comparable to the gain we get today from using SSDs instead of HDDs.

Licensed under: CC-BY-SA with attribution