I’d like to introduce splitting a project into files in C. We also need this to understand what a module, or compilation unit, is.

A good understanding of the “compilation unit” concept will later help you find your way around visibility scopes, e.g., of variables. You’ll also better understand what compilation is and what linking is.

Why split into files?

As you write your program, it will, 20000% guaranteed, grow to a lot of lines. Navigating a program with 1,000 lines already becomes cumbersome:

Scrolling is very inconvenient
It’s hard to find what we’re looking for if we didn’t strictly follow some order
…and even if we did

Consider that Windows, for example, reportedly has millions of lines of code, and each new version only grows.

How do we live with that?!

Eventually, you need to start splitting the program into smaller modules. These modules are sets of files that we then include in the main program or connect with each other.

You may already know such modules from the standard library! Headers like string.h are part of exactly such separate sets of files. We call them libraries.

When writing your own programs for microcontrollers, you’ll also write such libraries for various purposes: handling a sensor, handling a menu, some graphics. Each of these will be a separate module.

Files in the C language

What files do we have in C? The most important ones, and the ones that interest us, are files with the extensions *.c and *.h.

H files

This is a header file. It contains only prototypes and declarations.

Such a file has no executable code. Or rather… we don’t want it to have any 🙂

In it we write the so-called interface: only the things that a user of this file, or more precisely of the library, can use.

Put plainly… here you’ll find declarations of things such as:

types
enums
structures
functions, i.e., only prototypes
global variables, declared with extern (if we want them ;))

There will be no executable code here, i.e., no function bodies.
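To make the list above concrete, here is a minimal sketch of what such a header might look like for the sensor module mentioned earlier. The file name sensor.h and every identifier in it are hypothetical. The #ifndef/#define/#endif lines are a common include guard: they keep the declarations from being pasted in twice when the header is included more than once.

```c
/* sensor.h: hypothetical interface of a sensor module (all names assumed). */
#ifndef SENSOR_H
#define SENSOR_H

#include <stdint.h>

/* Type: a raw reading as the sensor delivers it */
typedef uint16_t sensor_raw_t;

/* Enum: result codes returned by the module */
typedef enum {
    SENSOR_OK = 0,
    SENSOR_ERROR
} sensor_status_t;

/* Structure: one converted measurement */
typedef struct {
    sensor_raw_t raw;
    int32_t millidegrees;
} sensor_reading_t;

/* Functions: prototypes only, the bodies live in sensor.c */
void sensor_init(void);
sensor_status_t sensor_read(sensor_reading_t *out);

/* Global variable: only declared here with extern, defined once in sensor.c */
extern uint32_t sensor_read_count;

#endif /* SENSOR_H */
```

Everything a user of the module needs is visible here, and nothing in it produces executable code.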

We include such a set of declarations into the second type of file (with the extension *.c), for example into main.c.

We use the preprocessor’s #include directive for this:

#include "file.h"

#include <file.h>

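With angle brackets, <file.h>, you point to a file in the compiler’s default (system) include paths.

With quotation marks, "file.h", you point to a file in the project. The compiler typically looks in the directory of the current file first and then falls back to the default paths.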

C files

Files with the extension *.c are source files, or simply sources.

This is the place for the actual code. Here we write function bodies.

Into this file we also include other headers, for example the main header main.h or stdio.h.

This file must see the definitions of the types it works with, so it includes every header required to compile it.
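A matching source file could look roughly like this. It is a sketch, not a real driver: the name sensor.c, the helper read_hardware(), and the conversion factor are hypothetical. Normally the file would start with #include "sensor.h"; so that the block compiles on its own, the few declarations it needs are repeated at the top instead.

```c
/* sensor.c: hypothetical implementation of the sensor module. */
/* In the real file this would be: #include "sensor.h"
   The declarations it needs are repeated here so the sketch compiles alone. */
#include <stddef.h>
#include <stdint.h>

typedef uint16_t sensor_raw_t;
typedef enum { SENSOR_OK = 0, SENSOR_ERROR } sensor_status_t;
typedef struct {
    sensor_raw_t raw;
    int32_t millidegrees;
} sensor_reading_t;

/* Definition of the global declared extern in the header: memory is reserved here, once. */
uint32_t sensor_read_count = 0;

/* static helper: visible only inside this file. A real driver would talk to
   the sensor hardware; this stand-in just returns a fixed fake value. */
static sensor_raw_t read_hardware(void)
{
    return 512;
}

void sensor_init(void)
{
    sensor_read_count = 0;
}

sensor_status_t sensor_read(sensor_reading_t *out)
{
    if (out == NULL) {
        return SENSOR_ERROR;
    }
    out->raw = read_hardware();
    /* Hypothetical scale: 0.1 degree per raw unit, i.e. 100 millidegrees */
    out->millidegrees = (int32_t)out->raw * 100;
    sensor_read_count++;
    return SENSOR_OK;
}
```

The source file provides exactly what the header promised, plus private details (the static helper) that the header never exposes.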

Apart from functions, we can place global variables here. They can be visible globally, i.e., throughout the entire project (other files reach them through an extern declaration). But… this should be rare and truly a last resort. Good practice is to avoid global variables. I’ll talk more about that someday.

If not global, then what? Globals whose scope is limited to this specific file, declared with the static keyword. This is a much better and more common practice.
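A short sketch of the difference, assuming a hypothetical counter module: counter_total has external linkage and could be reached from other files through an extern declaration, while the static counter_value is visible only inside this file. Other modules are expected to go through the functions instead.

```c
/* counter.c: hypothetical module showing two kinds of file-level variables. */
#include <stdint.h>

/* Global with external linkage: another file could write
   "extern uint32_t counter_total;" and use it directly (the last-resort option). */
uint32_t counter_total = 0;

/* static: file scope only, no other compilation unit can see this name. */
static uint32_t counter_value = 0;

/* Access goes through functions, so the module stays in control of its state. */
void counter_increment(void)
{
    counter_value++;
    counter_total++;
}

uint32_t counter_get(void)
{
    return counter_value;
}

void counter_reset(void)
{
    counter_value = 0;
}
```

If another file tried to declare extern uint32_t counter_value; and use it, linking would fail, because the static name never leaves counter.c.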

I guess I’ll have to talk about those globals sooner. Let me know if I should do that in the next emails.

Minimum for the compiler

This may seem surprising, but to compile a file the compiler needs only declarations.

It doesn’t need a variable to actually be allocated in memory.

It also doesn’t need a function body! It’s enough for it to have just the prototype.
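A small sketch of this claim, with hypothetical names: corrected() compiles against nothing but a prototype and an extern declaration. The definitions sit at the bottom only so that this single file also links; in a project they could just as well live in another .c file.

```c
/* Declarations only: name and types, enough for the compiler. */
int sensor_offset(void);   /* hypothetical function: prototype, no body yet */
extern int gain;           /* hypothetical variable: declared, no memory reserved */

int corrected(int raw)
{
    /* The compiler emits the call and the read without having seen
       the function body or the variable's storage. */
    return raw * gain + sensor_offset();
}

/* Definitions, placed here only so the sketch links on its own. */
int sensor_offset(void) { return -5; }
int gain = 2;
```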

You might ask: how is that possible?! The key phrase: at the compilation stage.

This is quite funny, but compilation, as the whole process from A to Z, is divided into several stages. The most important ones are:

Preprocessor
Compilation
Linking

Yes! Compilation is part of compilation! Great, right? 🙂

And it is step no. 2 for which knowing the prototypes is enough. Why?

Because this stage compiles ONLY one C file at a time, together with the headers it includes. From one file it creates one so-called object file. That source file, with everything included into it, is a single compilation unit.

The object file contains (hugely simplified) information such as: we want to use variable X at this point, or call function Y with arguments Z and W at that point in the program.

Compilation therefore creates (again, hugely simplified) a map of what to do, when, how, and with which operations: bare machine code that is not yet tied to final addresses in memory.

It doesn’t need to know the function’s body, only that the function has to be called. So who ties this information together? The LINKER, at the linking stage.

As input, the linker takes all the object files produced by compilation and combines them into a whole. It resolves the references to variables and functions between modules and assigns them specific locations in memory, e.g., in the microcontroller.

It sees, for example, that Module1 wants to call a function located in Module2, so it links the call with the function located in ANOTHER compilation unit.
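Here is the Module1/Module2 situation as a sketch. The module and function names are hypothetical, and both parts are shown in one block so it compiles on its own. The comment lists GCC commands (-E, -c and -o are standard GCC options) that would run the stages one at a time if the two parts were saved as separate files.

```c
/*
 * Two hypothetical modules in one block. As separate files, they could be
 * built stage by stage with GCC:
 *   gcc -E module1.c -o module1.i     preprocessor only
 *   gcc -c module1.c -o module1.o     compile unit 1: the call to module2_scale stays unresolved
 *   gcc -c module2.c -o module2.o     compile unit 2: contains the function body
 *   gcc module1.o module2.o -o app    link: the call in unit 1 is tied to the body in unit 2
 */

/* ---- module1.c ---- */
int module2_scale(int value);   /* prototype: all module1.c needs to compile */

int module1_process(int sample)
{
    return module2_scale(sample) + 1;
}

/* ---- module2.c ---- */
int module2_scale(int value)
{
    return value * 3;
}
```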

This is important for understanding how it all works. After all, what does such a prototype tell the compiler?

“I have such a function and I want to use it. I don’t have the body – the linker will resolve it.”

The compiler sets up the instructions, passes the arguments (on the stack or in registers, depending on the architecture), handles the return from the function, and that’s it. Now the linker has to connect these operations with the appropriate function in another compilation unit.

And this is why the compiler and the linker report different problems.

The compiler will most often tell you that it doesn’t know the name of a variable or function where it is used. This means it has never seen the function prototype or the variable declaration before. You only need to inform it of their existence (name and types). The function body or the memory reservation for a variable may be in another compilation unit; the compiler doesn’t care about that.

What does the linker most often report? That it can’t resolve a reference. This happens when one compilation unit announces a function with a prototype, but that function’s… body doesn’t exist anywhere (in any other compilation unit). The linker can’t resolve the reference and returns an error.

These are two completely different messages and you have to approach them differently.
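To see both messages for yourself, you could start from a sketch like this one (the names are hypothetical) and break it in the two ways described in the comment. The quoted messages are approximate; the exact wording depends on the compiler, the linker, and their versions.

```c
/*
 * As written, this compiles and links.
 *
 * Compiler error: delete the extern declaration below. The use of
 * threshold in is_alarm() is then an unknown name, and GCC reports
 * roughly "'threshold' undeclared". Calling a function that has no
 * prototype gives a similar "implicit declaration" diagnostic
 * (a warning or an error, depending on the compiler and its version).
 *
 * Linker error: keep the declaration but delete the definition at the
 * bottom. The unit still compiles into an object file, but linking fails
 * with something like "undefined reference to `threshold'" (GNU ld).
 */
extern int threshold;      /* declaration: name and type, for the compiler */

int is_alarm(int reading)
{
    return reading > threshold;
}

int threshold = 75;        /* definition: memory reserved, what the linker needs */
```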

Summary

I got a bit carried away, and I still feel I haven’t said everything… Let me know if you’d like this in a longer, more detailed form.

We didn’t do, for example, a hands-on session, i.e., how it actually looks, e.g., in an online compiler. That may be hard to do in text form. Maybe you’d prefer learning from video? In that case, I have a proposal 🙂

Do you want to learn the C language with microcontrollers in mind?

I created a course dedicated to microcontrollers (the course is taught in Polish). I teach C from the basics there. Everything I discussed in this post (and much, much more) is included in the course curriculum.

I’ve gathered my experience over several years of embedded programming, and I want to pass on the best knowledge I can. I’ve taken part in various projects: solo, at a start-up, at a medium-sized company, and at a huge corporation.

Apart from the basics and syntax, I share a ton of good practices. I weave them in while explaining each successive aspect of the C language.

Another advantage is that I show how to manage a project well. I’ll show you how to build abstraction layers. We’ll use structures, pointers, and callbacks. And of course splitting into files, which helps a lot.

Such separated layers are much easier to move between projects, and even between different microcontroller families.

Join the waiting list for the course and start learning with the materials I’ve prepared. After signing up, you’ll receive weekly emails about the C language (the newsletter is in Polish): https://cdlamikrokontrolerow.pl