
Revisiting the C compilation pipeline

· 17 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

Over the past few years my interests have taken a turn towards system programming and database internals.

In that context, languages like C, Rust or Zig make a lot of sense. As such, I am going to start a more focused effort to refresh my memory of C and its toolchain. In my day job I currently use Python, Java and occasionally TypeScript, so I have been out of the systems languages game for a while. Time to fix that gap!

The goal of the compilation pipeline is to transform a program's source code into an executable. With GCC, the pipeline has four stages, each covered in this article: preprocessing, compilation, assembly and linking.

Stage 1: Preprocessing

Preprocessing can be seen as a fancy way of saying "string substitution". Preprocessor directives control how the preprocessor behaves, allowing for the addition or removal of text and the use of specific compiler features. There are a few preprocessor directives1.

Include

The #include directive inserts the contents of the specified file at the point where the directive appears. Let's look at a simple example:

Given a file lib.h with the following content:

// A function declaration
int square(int x);

Given another file, lib.c, with the #include preprocessor directive:

#include "lib.h"

int square(int x) {
    return x * x;
}

Running gcc -E lib.c -o output.c2 yields the following:

# 1 "lib.h" 1

int square(int x);
# 2 "lib.c" 2

int square(int x) {
    return x * x;
}

Define

The #define directive takes two forms:

  • Associates an identifier with a value, e.g. #define MY_VAR 42

  • Associates an identifier with a function-like expression: #define square(x) (x * x)

In the first case, all occurrences of MY_VAR in the code will be replaced with the value 42. In the second case, occurrences of square and the respective parameters will be replaced with the expression; for example, if the code contains square(2), the preprocessor will replace it with (2 * 2). This is a macro, and it comes with some pitfalls3.
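
As a quick illustration of those pitfalls, consider what happens when the macro argument is itself an expression (a minimal sketch; the variable names are purely illustrative):

#define square(x) (x * x)

int a = 3;
int b = square(a + 1); // expands to (a + 1 * a + 1), which is 7, not 16
int c = square(a++);   // expands to (a++ * a++), evaluating a++ twice

A more defensive definition parenthesizes every use of the parameter, e.g. #define square(x) ((x) * (x)), although the double evaluation of arguments with side effects remains.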

Let's start with a simple example program preprocessor.c:

#include <stdio.h>
#define MY_VAR 42
#define square(x) (x*x)

int main() {
    printf("%d\n", MY_VAR);
    printf("%d\n", square(4));
    printf("%d\n", square(MY_VAR));
    return MY_VAR;
}

If we run gcc -E preprocessor.c -o out.c, then we get (ignoring the stdio include):

//stdio include above

int main() {
    printf("%d\n", 42);
    printf("%d\n", (4*4));
    printf("%d\n", (42*42));
    return 42;
}

Conditionals

The #if, #elif, #else and #endif directives introduce the capability to selectively compile portions of a source file. For example, given the file test-if-directive.c:

#include <stdio.h>
void my_logger(char* message) {
#if LOG_LEVEL > 1
    printf("INFO: %s\n", message);
#else
    printf("DEBUG: %s\n", message);
#endif
}

int main() {
    my_logger("A log message");
    return 0;
}

Running gcc -E -DLOG_LEVEL=0 test-if-directive.c -o out.c yields (ignoring the stdio include):

void my_logger(char* message) {
    printf("DEBUG: %s\n", message);

}

int main() {
    my_logger("A log message");
    return 0;
}

Note that using the -DLOG_LEVEL flag allows the user to set the value of the LOG_LEVEL identifier.

Conditionals: ifdef and ifndef

The #ifdef and #ifndef directives are also quite common in the wild. They check whether a macro is defined (#ifdef) or not defined (#ifndef). A common use case is to ensure that header files are not included repeatedly - this is known as include guards. For example, consider a project with the following structure: lib.h includes logger.h, and prog.c includes both logger.h and lib.h.

Given the following logger.h file:

// logger.h
struct log_msg {
    int log_level;
    char* message;
};

void writeMsg(char* message);
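
The contents of lib.h and prog.c are not shown here; a minimal sketch consistent with the structure just described (lib.h includes logger.h, and prog.c includes both) could look like this - the do_work function is purely illustrative:

// lib.h
#include "logger.h"

void do_work(struct log_msg* msg);

// prog.c
#include "logger.h"
#include "lib.h"

int main() {
    struct log_msg msg = {1, "hello"};
    do_work(&msg);
    return 0;
}

After preprocessing prog.c, the definition of struct log_msg from logger.h ends up in the translation unit twice - once via the direct include and once via lib.h - which is exactly what triggers the error below.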

Running gcc logger.c lib.c prog.c would then produce an error along these lines: logger.h:1:8: error: redefinition of ‘struct log_msg’

The problem is that the log_msg struct is being defined multiple times, so in order to avoid this we can add include guards to logger.h:

//logger.h
#ifndef LOGGER
#define LOGGER

struct log_msg {
    int log_level;
    char* message;
};

void writeMsg(char* message);

#endif

In this case, if the preprocessor encounters this include multiple times it will skip the guarded contents after the first time. Running gcc logger.c lib.c prog.c again will now succeed.

Pragma

The #pragma directive specifies compiler, architecture and operating system specific options. A popular example is #pragma once, which tells the C preprocessor to include a header file only once, ignoring any subsequent #include directives for it.

This is similar to the include guards from the previous section, and indeed, if your compiler supports it4, logger.h from the previous example could have been rewritten as:

//logger.h
#pragma once

struct log_msg {
    int log_level;
    char* message;
};

void writeMsg(char* message);

Running gcc logger.c lib.c prog.c will succeed. Note, however, that #pragma once, while widely supported, is not part of the C standard, so use it with caution as your mileage may vary.

Stage 2: Compilation

The compilation stage runs after the preprocessor and turns the preprocessed source code into assembly code. Given a simple (but not very useful) C program my-prog.c:

int mult(int x, int y) {
    return x * y;
}

int main() {
    int result = mult(2, 3);
    return 0;
}

You can compile the source code with gcc -S my-prog.c. This yields a my-prog.s file with the assembly output. Since this example was produced on an x86-64 machine, this is x86-64 assembly.

	.file	"my-prog.c"
	.text
	.globl mult
	.type mult, @function
mult:
.LFB0:
	.cfi_startproc
	endbr64
	pushq %rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq %rsp, %rbp
	.cfi_def_cfa_register 6
	movl %edi, -4(%rbp)
	movl %esi, -8(%rbp)
	movl -4(%rbp), %eax
	imull -8(%rbp), %eax
	popq %rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size mult, .-mult
	.globl main
	.type main, @function
main:
.LFB1:
	.cfi_startproc
	endbr64
	pushq %rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq %rsp, %rbp
	.cfi_def_cfa_register 6
	subq $16, %rsp
	movl $3, %esi
	movl $2, %edi
	call mult
	movl %eax, -4(%rbp)
	movl $0, %eax
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1:
	.size main, .-main
	.ident "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0"
	.section .note.GNU-stack,"",@progbits
	.section .note.gnu.property,"a"
	.align 8
	.long 1f - 0f
	.long 4f - 1f
	.long 5
0:
	.string "GNU"
1:
	.align 8
	.long 0xc0000002
	.long 3f - 2f
2:
	.long 0x3
3:
	.align 8
4:

Note that in the compilation stage you can target different CPU architectures and instruction set extensions with the -march option (depending on what your toolchain and machine support).

Stage 3: Assembly

The assembly stage takes the assembly code (which is still human-readable plaintext) produced in the previous step and outputs machine-readable object code. Following from the previous example, it is possible to generate the object code by running gcc -c my-prog.s; the output is a my-prog.o file with the generated object code.

It is also possible to run gcc -c directly on the C source file, in which case the preprocessing and compilation phases run first.

As mentioned, the object code is not human readable, so doing something like cat my-prog.o will not yield intelligible results. The object code (on Linux) is stored in a format called ELF5, which can be inspected with the objdump program. Running objdump -d my-prog.o makes things more interesting:

my-prog.o:     file format elf64-x86-64



Disassembly of section .text:

0000000000000000 <mult>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 89 7d fc mov %edi,-0x4(%rbp)
b: 89 75 f8 mov %esi,-0x8(%rbp)
e: 8b 45 fc mov -0x4(%rbp),%eax
11: 0f af 45 f8 imul -0x8(%rbp),%eax
15: 5d pop %rbp
16: c3 retq

0000000000000017 <main>:
17: f3 0f 1e fa endbr64
1b: 55 push %rbp
1c: 48 89 e5 mov %rsp,%rbp
1f: 48 83 ec 10 sub $0x10,%rsp
23: be 03 00 00 00 mov $0x3,%esi
28: bf 02 00 00 00 mov $0x2,%edi
2d: e8 00 00 00 00 callq 32 <main+0x1b>
32: 89 45 fc mov %eax,-0x4(%rbp)
35: b8 00 00 00 00 mov $0x0,%eax
3a: c9 leaveq
3b: c3 retq

Stage 4: Linking

Most real-world projects make use of several source files, and in that case it is necessary to connect the different object files to create a final executable. That is precisely what the linking phase does.

Consider a sample program composed of the following source files: libmath.c, libmath.h and prog.c.

For projects with multiple files, the typical approach is to generate object files (*.o) and link them together. In larger projects it may make sense to create separate, reusable libraries that can be used in several places without the headache of building the code into every project (arguably also more efficient, since a shared library may already be in use elsewhere and thus already loaded in memory).

For static libraries the code is copied into the executable; for shared libraries the executable only holds a reference, which means that shared libraries need to be compiled into position-independent code.

//libmath.h
int mult(int x, int y);

// libmath.c
#include "libmath.h"

int mult(int x, int y) {
    return x * y;
}

// prog.c
#include <stdio.h>
#include "libmath.h"

int main() {
    int result = mult(2, 5);
    printf("Result = %d\n", result);
    return 0;
}

A minor note on the #include directives in prog.c: an attentive reader will notice that there are two syntaxes at play. The <stdio.h> form looks up the header in the system include paths, while the quoted "libmath.h" form looks it up in the local path first.

Compiling and running this code

You can compile this code by running gcc -c -Wall libmath.c prog.c, which leaves you with two object files. The final step to get an actual executable is to "link" the various object files and resolve the references to the standard I/O (stdio) functions (the implementation is typically found in libc.so). You can do that by running gcc *.o -o prog. You now have a fully working executable. Running objdump -d prog shows that the code for mult is essentially appended directly to the executable, and that there is infrastructure code to work with shared libraries (more about that in a minute).

Reusing code: Static libraries

Suppose you don't want to recompile the code in libmath.c, or you want this code to be shared by several projects. One way to do this is to create a static library: a single entity that can be used in the linking phase, saving the time and effort of recompiling the code over and over again. This entails a few additional steps:

  • Generate the object code for the library (it is perfectly acceptable to have multiple object files): gcc -c libmath.c;
  • Create an archive with all the object files for this library: ar -rcs libmath.a libmath.o (note that the convention is that libraries start with the lib prefix);
  • Compile and link the program with this new library: gcc prog.c -L. -lmath -o prog (note the -l option, which specifies the library name without the lib prefix, and the -L flag, which tells the linker where to look for the library).

Inspecting the prog executable with objdump -d prog it is possible to see that the static library code is "appended" to the executable, so it is pretty straightforward as everything is known at compile time.

0000000000001149 <main>:
1149: f3 0f 1e fa endbr64
114d: 55 push %rbp
114e: 48 89 e5 mov %rsp,%rbp
1151: 48 83 ec 10 sub $0x10,%rsp
1155: be 05 00 00 00 mov $0x5,%esi
115a: bf 02 00 00 00 mov $0x2,%edi
115f: e8 20 00 00 00 callq 1184 <mult>
1164: 89 45 fc mov %eax,-0x4(%rbp)
1167: 8b 45 fc mov -0x4(%rbp),%eax
116a: 89 c6 mov %eax,%esi
116c: 48 8d 3d 91 0e 00 00 lea 0xe91(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1173: b8 00 00 00 00 mov $0x0,%eax
1178: e8 d3 fe ff ff callq 1050 <printf@plt>
117d: b8 00 00 00 00 mov $0x0,%eax
1182: c9 leaveq
1183: c3 retq

0000000000001184 <mult>:
1184: f3 0f 1e fa endbr64
1188: 55 push %rbp
1189: 48 89 e5 mov %rsp,%rbp
118c: 89 7d fc mov %edi,-0x4(%rbp)
118f: 89 75 f8 mov %esi,-0x8(%rbp)
1192: 8b 45 fc mov -0x4(%rbp),%eax
1195: 0f af 45 f8 imul -0x8(%rbp),%eax
1199: 5d pop %rbp
119a: c3 retq
119b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

An important thing to note is that, due to the way the linker resolves references, the command line order matters. Object (.o) and archive (.a) files are scanned in command line order: as the linker encounters a new file it tries to resolve the outstanding unresolved references, and if by the end of the process there are still unresolved references, the linker throws an error. As a best practice, libraries should be placed last, as your program is the one that needs the symbols from the library6.
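
A quick way to see this in practice (assuming the libmath.a static library built in the steps above, and the default GNU linker behavior) is to compare the two orderings:

gcc -L. -lmath prog.c -o prog   # likely fails: libmath.a is scanned before anything needs mult
gcc prog.c -L. -lmath -o prog   # works: mult is already an unresolved reference when -lmath is scanned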

Reusing code: Shared libraries

Another possibility to work with libmath.c is to compile it as a shared library (*.so), which means that any program using this library will effectively "point" to a single copy of it in memory. This results in memory savings and is especially useful for common libraries like the standard C library in libc.so. It comes at the cost of slightly longer start-up times as well as additional complexity to manage the dynamic loading of libraries.

One of the key things to keep in mind is that a dynamic library can be loaded at potentially any address in memory, so the compilation needs to use a special flag, -fPIC (Position Independent Code), to account for this. As the name implies, with dynamic loading the actual memory positions of functions and variables in shared libraries are not known at compile time. To deal with this, there are additional constructs like the Global Offset Table (GOT) and the Procedure Linkage Table (PLT) at play (for now let's leave it at that, as this warrants its own write-up).

In order to compile and use a shared library using the example above, with prog.c and libmath.c:

  • Generate the position independent object code with: gcc -c -fPIC libmath.c;
  • Create a shared library from the object code: gcc -shared -o libmath.so libmath.o. By using nm -D libmath.so it is possible to list any dynamic symbols in this object file;
  • Compile the program using the shared library: gcc prog.c -L. -lmath -o prog.

Things are looking good: using objdump -d prog, it is possible to confirm that the call to the mult function now goes through the PLT, indicating that it is a call into a shared library:

0000000000001169 <main>:
1169: f3 0f 1e fa endbr64
116d: 55 push %rbp
116e: 48 89 e5 mov %rsp,%rbp
1171: 48 83 ec 10 sub $0x10,%rsp
1175: be 05 00 00 00 mov $0x5,%esi
117a: bf 02 00 00 00 mov $0x2,%edi
117f: e8 ec fe ff ff callq 1070 <mult@plt>
1184: 89 45 fc mov %eax,-0x4(%rbp)
1187: 8b 45 fc mov -0x4(%rbp),%eax
118a: 89 c6 mov %eax,%esi
118c: 48 8d 3d 71 0e 00 00 lea 0xe71(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1193: b8 00 00 00 00 mov $0x0,%eax
1198: e8 c3 fe ff ff callq 1060 <printf@plt>
119d: b8 00 00 00 00 mov $0x0,%eax
11a2: c9 leaveq
11a3: c3 retq
11a4: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
11ab: 00 00 00
11ae: 66 90 xchg %ax,%ax

However, running the prog executable results in an error along these lines: ./prog: error while loading shared libraries: libmath.so: cannot open shared object file: No such file or directory. Why?

Remember that the dynamic library is loaded at runtime, and since our shared library is not in a standard location it is necessary to tell the loader where to find it. We can confirm this by running ldd prog to get a list of the program's dynamic dependencies. Note that libmath.so is not found:

	linux-vdso.so.1 (0x00007ffd0a93e000)
libmath.so => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f194533b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1945550000)

It is possible to fix this problem by updating the LD_LIBRARY_PATH environment variable to include the current directory. That can be accomplished by running export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH. Note the difference in the output of ldd prog for libmath.so:

	linux-vdso.so.1 (0x00007ffd0a93e000)
libmath.so => ./libmath.so (0x00007f6525810000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f194533b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1945550000)

And running the prog executable again yields the expected result. Do keep in mind that this only affects the current shell; it is not permanent.

It is possible to compile the main program with the -rpath linker option to embed the location of the shared library in the executable: gcc prog.c -L. -lmath -Wl,-rpath=. -o prog. This means that anyone using this executable would need to have the library in the exact same location, potentially hurting flexibility and portability.

In a nutshell, this is how static and shared libraries compare:

Feature      | Static Libraries (.a)     | Shared Libraries (.so)
File Size    | Larger executables        | Smaller executables
Memory Usage | Each process has own copy | Single copy shared in memory
Load Time    | Faster                    | Slower due to runtime linking
Updates      | Requires recompilation    | Can be updated independently
Dependencies | Self-contained            | Must manage dependencies

Strong vs. weak symbols

A symbol is the "entity" that names functions and variables7. By default, functions and initialized global variables are strong symbols, which means the linker will raise an error if it encounters more than one definition of the same strong symbol. Weak symbols are uninitialized global variables or symbols explicitly declared as weak, and they can be overridden by a strong symbol. Note that in the presence of multiple weak symbols the linker will pick an arbitrary one.

This can be used to provide default implementations that can be replaced at link time. In GCC a weak symbol can be declared with the __attribute__((weak)) modifier or #pragma weak, as sketched below.
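
As a minimal sketch (the on_event function and file names are purely illustrative), a library can ship a weak default that a program may override with its own strong definition:

// lib_default.c
#include <stdio.h>

// Weak default: used only if no strong definition of on_event exists at link time
__attribute__((weak)) void on_event(const char* name) {
    printf("default handler: %s\n", name);
}

// prog.c
#include <stdio.h>

void on_event(const char* name); // normally declared in a shared header

// Strong definition: overrides the weak default at link time
void on_event(const char* name) {
    printf("custom handler: %s\n", name);
}

int main() {
    on_event("startup");
    return 0;
}

Compiling both files together (gcc lib_default.c prog.c -o prog) prints the custom handler; if the strong definition is removed from prog.c (leaving only the declaration), the weak default is used instead.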

Coda

This article covers the basics of compiling a C program with GCC in a somewhat detailed way. Even if you don't usually work with C directly, understanding how C programs are compiled is useful: many high-level languages are implemented in C and make use of libraries with C bindings (e.g. NumPy in the Python world), so this will help you better understand and debug your toolchain.

I will follow up soon with additional content on debugging and optimization, and a brief exploration of how Clang8 (another C compiler) works and what the major differences to GCC are. Looking forward to dipping my toes into the LLVM9 world.


Footnotes

  1. Preprocessor directives

  2. The -E flag instructs GCC to run only the preprocessor (source)

  3. Macros have some common pitfalls that usually result from unexpected interactions of macro expansion, the evaluation order of parameters, side effects and the validation rules of C.

  4. Pragma once support

  5. Here we need to go a bit into the detail of the Executable and Linkable Format (ELF), which is used to store object code, executables, shared libraries and core dumps. The basic structure has a header followed by various sections for program code (.text), symbol table (.symtab), initialized global variables (.data), uninitialized global variables (.bss), string literals (.rodata). More info here.

  6. The top answer to this stackoverflow question provides some nice example code that illustrates the impact of the order in which libraries and object files are stated for linking purposes.

  7. University of Texas Austin CS429

  8. Clang: a C language family frontend for LLVM

  9. LLVM compiler infrastructure

Macro beats micro

· 5 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

Science, engineering and to some extent management are permeated by the ideas of Cartesian reductionism. Treating phenomena as machines that can be decomposed into their constituent parts laid the foundation of our modern world. Once analyzed and understood, assembling the constituents back together would yield a complete understanding of the phenomena. This is both enticing and arguably successful; however, this approach struggles when faced with emergent properties and behaviors of systems.

The assumption that knowing how each component of a system works (micro level) is sufficient to fully understand how a system works as a whole (macro level) is still the default approach in many fields. The paper "Quantifying causal emergence shows that macro can beat micro"1 provides an interesting model for reasoning about the optimal level of granularity at which to observe a system. It uses the notion of causal emergence, which is the "zoom level" at which the causal relationships between the various parts of a system have the strongest correlation - choosing the right "zoom level" will yield the most information. This is interesting because it supports the idea that focusing only on the study of micro states (which, by the way, may be more sensitive to noise and subject to degeneracy) does not give the best results in all cases.

If we narrow the scope to the world of technology, distributed systems and the surrounding organizations the observations in the paper make a lot of sense.

After having been through a few incidents I've gained an intuition that knowing how the individual technical pieces of a system work is not sufficient to understand how a system works at a macro scale and how it comes to exhibit certain behaviors. The most interesting incidents actually happen when components and people interact in weird and wonderful ways that take teams completely by surprise.

A system is a whole which cannot be divided into independent parts - Russell Ackoff

The excellent "How Complex Systems Fail"2 offers a brief, but very insightful framing to the nature of failure, its evaluation and proximate cause attribution. The core assertion is that a system comprises of people and the technical artifacts that are deployed to achieve a certain objective, and that safety is an emergent property of the system via the interactions of the social and technical elements at play. Failure can seldom be attributed to any single component - clearly odds with a reductionist view of systems (and why exercises like teh 5 Whys are limited).

Dr. Russell Ackoff has a wonderful description of the limitations of reductionism and makes a very compelling argument for systems-first thinking34.

If you have ever built a non-trivial system in an organization you probably know this already: the technical elements may be well defined, however they operate in a messy human organization, which is part of a larger, and even messier, market/society. This results in all sorts of interesting, unforeseen and seemingly unreasonable "asks" of the system (AKA stressors, in Residuality theory speak).

Technical systems are hyperliminal5, and their components may exhibit hyperliminal coupling6, meaning that the system's designer may not even realize the coupling exists if each component is analyzed separately. Key properties of a system are indeed properties of the whole, and these are lost if the system is taken apart.

The street finds its own uses for things - William Gibson

As we keep building more ambitious technical systems, sometimes in a blissful vacuum7, this serves as a reminder of the need to consider the environment in which technical systems operate. Failing to do so will result in fragility and negative outcomes.


Footnotes

  1. Quantifying causal emergence shows that macro can beat micro

  2. How complex systems fail

  3. There are other longer videos of him that are quite interesting. For example here or here

  4. Scientific positivism was perhaps the most maximalist variation of Cartesian reductionism. Two world wars, quantum theory and an increasing pace of technological change discredited positivism - after all, even the atom cannot be fully measured in all its properties, and quantum mechanics defies normal cause-and-effect expectations; can we really understand the world by understanding the behavior of its smallest components? Don Schon has some very interesting material on society's need for stability and the predicament we find ourselves in.

  5. "Hyperliminality describes an ordered system inside a disordered system. The architect is forced to constantly move between these two worlds, with ordered software and disordered enterprise contexts which require entirely different tools and epistemologies to understand." - source

  6. "If two nodes in a network each have a relationship with a third node, then those two nodes are very likely to have a relationship. Therefore, if a stressor in the wider hyperliminal system interacts with two software components, then those two components can be considered coupled. Since architects are unaware of the stressor, this coupling is invisible to the system's designer until the stressor is realized" - source

  7. One could argue that the deployment of LLMs across many organizations fits this category. The business case and social benefits are still a bit shaky at the moment, but a few things are already clear: a) the technology was built on a foundation of unlicensed scraping of artistic and copyrighted work; b) hallucinations are a logical consequence of the architecture and there is currently no solution for that; c) it can easily be exploited by malicious actors for criminal or disinformation purposes.

Innovation under the radar

· 4 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

These days LLMs are capturing the lion's share of media, investor and corporate attention. Judging by the headlines and the gargantuan amounts of funding being mobilized, one would almost think that the tech sector completely pivoted to this technology.

Tailscale and Docker networking

· 5 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

After a well deserved rest for the past few weeks I'm back to my normal routine. One of the ideas I've been toying around with (more on that at some point in the future) benefits from remotely accessing resources running on your local network. If you grew up in the 90s or early 00s you probably remember setting up a NAT in your router to forward certain ports so you could play your favorite game with your friends. Tailscale[^1] has been on my radar for a while so I decided to take it for a spin.

Unicode audio analyzer

· 4 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

What does unicode, audio processing and admittedly bad early 2000s Internet memes have to do with one another?

In the previous post in the deep dive into unicode series we explored how combining characters like diacritics work. One interesting property of unicode is that it is possible to combine multiple combining characters together.

Notes on "Programming as theory building"

· 7 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

Programming as Theory Building[^1] is an almost 40 year-old paper that remains relevant to this day. In it, the author (Peter Naur, Turing Award winner and the "N" in BNF[^2]) dives into the fundamental question of what is programming, and builds up from that to answer the question about what expectations can one have, if any, on the modification and adaptation of software systems.

A deep dive into unicode and string matching - II

· 8 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

In the previous entry of this series I went through a lightning tour of what is Unicode and provided some details into the various encodings that are part of the standard (UTF-8/16/32). This serves as the baseline knowledge for further exploration of how Unicode strings work, and some of the interesting problems that arise in this space.

A deep dive into unicode and string matching -I

· 8 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

"The ecology of the distributed high-tech workplace, home, or school is profoundly impacted by the relatively unstudied infrastructure that permeates all its functions" - Susan Leigh Star

Representing, processing, sending and receiving text (also known as strings in computer-speak) is one of the most common things computers do. Text representation and manipulation in a broad sense, has a quasi infrastructural[^1] quality to it, we all use it in one form or another and it generally works really well - so well in fact that we often don't pay attention to how it all works behind the scenes.

Observations on the practice of software architecture - III

· 9 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

"It is the developer’s assumptions which get shipped to production" - Alberto Brandolini

In parts one and two of this series we explored why software architecture is needed and some of the common pitfalls and failure modes that afflict "architects". This article will go into some of the principles and practices that are part of my toolbox. Very little of this is new and I've tried to link to the original sources so be sure to check the references. If I have misrepresented anything, that's solely on me.

Observations on the practice of software architecture - II

· 5 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

"Two weeks of coding can save you an hour of planning" — Unknown

In part one of this series[^1] I tried making the argument that some degree of software architecture is required, and explored how a naive reading of agile software development practices coupled with organizational incentives create a toxic mix where software design is not valued, leading to a lack of clarity and coherence, viability crushing technical risks being ignored, and creeping and crippling technical debt that slows down software delivery.