Toolchain: Basics

GCC - GNU Compiler Collection


All-In-One Usage: Single File

  • “Monolithic” program

    #include <stdio.h>
    
    int main(void)
    {
        printf("Hello World\n");
        return 0;
    }
    
  • All-in-one: convert C to executable (seemingly directly)

    $ gcc hello-single.c
    
  • Produces an executable program

    $ ls -l a.out
    -rwxrwxr-x. 1 jfasch jfasch 24360 Mar 25 11:14 a.out
    $ ./a.out
    Hello World
    
  • Changing the output file’s name (a.out is not very expressive)

    $ gcc -o hello-single hello-single.c
    $ ls -l hello-single
    -rwxrwxr-x. 1 jfasch jfasch 24360 Mar 25 11:39 hello-single
    $ ./hello-single
    Hello World
    
  • A lot going on behind the scenes!

All-In-One Usage: Multiple Files

This Is Not As Simple As It Seems!

  • Linux executables use the Executable and Linkable Format (ELF)

    • ⟶ Complicated but extensible and flexible

    • “Sections”

  • Linux kernel starts programs on behalf of a user

    • User (the shell, for example, when you type the program’s name on the command line) calls the execve() system call

    • Kernel: “Ah yes, someone wants me to run a program” 👍

    • Kernel starts the dynamic loader instead, telling it to start the program to be exec’d

  • Dynamic Loader (commonly called ld.so)

    • The program that interprets the program’s content and sets up the address space

      $ ls -l /lib64/ld-linux-x86-64.so.2
      lrwxrwxrwx. 1 root root 10 Jan 26 02:53 /lib64/ld-linux-x86-64.so.2 -> ld-2.33.so
      
    • Prior to starting the program, shared (dynamic) libraries must be found and loaded

    • And much much more ⟶ man -s 8 ld.so


What’s In A Program? ⟶ Symbols (nm)

Symbols

  • Global variables

  • Functions

  • Contributed by a wide variety of programs and precompiled object code

$ nm hello-single
...
0000000000401040 T _start
0000000000401000 T _init
00000000004011b8 T _fini


...

Overview: Where Do Which Symbols Come From (⟶ The Toolchain)

  • Symbols: _start, _init, _fini

    0000000000401040 T _start
    0000000000401000 T _init
    00000000004011b8 T _fini

    • Where from:

      $ ls -l /usr/lib64/crt*.o
      -rw-r--r--. 1 root root 18384 Jan 26 02:55 /usr/lib64/crt1.o
      -rw-r--r--. 1 root root  1800 Jan 26 02:55 /usr/lib64/crti.o
      -rw-r--r--. 1 root root  1536 Jan 26 02:55 /usr/lib64/crtn.o

    • Purpose: Startup code

    • Source: Contributed by the OS; has remained essentially unchanged for ages. Calls main() after process initialization, and tears down the process after main() returns.

  • Symbol: _GLOBAL_OFFSET_TABLE_

    0000000000404000 d _GLOBAL_OFFSET_TABLE_

    • Where from: Generated

    • Purpose: Used for relocation of shared libraries into the process address space.

    • Source: Linker. Part of the toolchain; binds objects and libraries together. Resolves open references (e.g. puts below).

      $ ls -l /usr/bin/ld
      -rwxr-xr-x. 1 root root 1762320 Sep 16  2021 /usr/bin/ld

  • Symbol: main

    0000000000401126 T main

    • Where from: Generated binary code (“Text”, hence the T)

    • Purpose: main() function from our code, generated by the compiler

    • Source: Compiler

      $ ls -l /usr/bin/gcc
      -rwxr-xr-x. 3 root root 1224008 Jan 27 12:29 /usr/bin/gcc

  • Symbol: puts

    U puts@GLIBC_2.2.5

    • Where from: Unresolved symbol (hence the U), recorded by the linker, which found it in the C library. The compiler replaced our printf() call with puts(), because the string contains no format directives and ends in a newline. Taken dynamically from the shared C library.

    • Purpose: Reference for the loader, which will find it in libc when it loads the program

    • Source: Will be resolved by the loader when it finds and loads libc

      $ ls -l /lib64/libc.so.6
      lrwxrwxrwx. 1 root root 12 Jan 26 02:53 /lib64/libc.so.6 -> libc-2.33.so

Recap: Toolchain

The seemingly simple command “Build me an executable hello-single from hello-single.c” does a number of separate things under the hood:

  • Compile hello-single.c to an object file (an ELF format file containing only the machine code of function main())

    • One does not see that file as it is only kept temporarily. Such object files usually carry the .o extension.

    • Multiple intermediate steps are hidden under the name compilation as well, producing more temporary files:

      • Running the C preprocessor on the source file

      • Compiling the preprocessor’s output into machine-specific (but human-readable) assembly code

      • Running the assembler to turn that into machine code, finally.

  • Link hello-single.o (let’s give the temporary file a name) together with the following artifacts, and finally produce the executable hello-single:

    • The OS specific startup code

    • The shared C runtime - the “C library” (that library defines a large number of functions like printf(), malloc(), free(), …)