Abdi Moalim

Linking object files

A linker is a program that takes one or more object files generated by a compiler and combines them into a single executable. It fills in the missing addresses, resolves symbolic references between files and creates the final memory layout of the program.

The linker performs three main tasks:

Symbol resolution - it matches every reference to a symbol (function, variable, etc.) with exactly one definition, so that each symbol ends up with a single memory location.

Relocation - since each object file is compiled independently with addresses starting at zero, the linker adjusts these addresses to reflect their final positions in the executable.

Code and data merging - object files contain separate sections (code, data, debugging info), which the linker merges into the final executable sections.

Symbols and symbol tables

Every object file contains a symbol table that records the symbols the file defines and the symbols it references but leaves undefined.

The linker matches undefined symbols with their definitions across all object files being linked.
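The matching step can be sketched as a toy model in Python. The dictionary layout and the function name resolve_symbols are assumptions made for this illustration; real symbol tables also carry binding, type, section and size.

```python
def resolve_symbols(objects):
    """Match undefined symbols with definitions across object files.

    objects maps a file name to {"defined": set, "undefined": set}.
    Returns (definitions, unresolved): definitions maps each symbol to the
    file that defines it, unresolved lists symbols nobody defines.
    """
    definitions = {}
    for name, obj in objects.items():
        for sym in obj["defined"]:
            if sym in definitions:
                raise ValueError(
                    f"multiple definition of {sym!r}: {definitions[sym]} and {name}"
                )
            definitions[sym] = name

    unresolved = set()
    for obj in objects.values():
        unresolved |= obj["undefined"] - set(definitions)
    return definitions, sorted(unresolved)


# Hypothetical inputs: main.o calls add() and printf(), utils.o provides add().
objects = {
    "main.o": {"defined": {"main"}, "undefined": {"add", "printf"}},
    "utils.o": {"defined": {"add"}, "undefined": set()},
}
definitions, unresolved = resolve_symbols(objects)
print(definitions["add"])  # utils.o
print(unresolved)          # ['printf']
```

Here printf stays unresolved until a C library is added to the link, which is the situation a real linker reports as an undefined reference.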

Object file formats

Each format (ELF on Linux, PE/COFF on Windows, Mach-O on macOS) contains headers, sections for code and data, symbol tables and relocation information.

Sections and segments

Object files organize content into sections, such as .text for code, .data for initialized data and .bss for zero-initialized data.

Linkers, lld included, combine sections from multiple object files and organize them into segments in the final executable.

Relocation tables

These tables in object files specify how to modify code and data when addresses are finalized. The linker uses relocation entries to patch up addresses throughout the executable.
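As a rough illustration of patching, the Python sketch below applies one kind of relocation: a 32-bit little-endian absolute address. The (offset, symbol, addend) tuple is a simplified stand-in for a real relocation entry, and the address of counter is made up.

```python
import struct


def apply_relocations(section, relocations, symbol_addresses):
    """Return a copy of section with 32-bit absolute addresses patched in.

    relocations is a list of (offset, symbol, addend) tuples.
    """
    patched = bytearray(section)
    for offset, symbol, addend in relocations:
        value = (symbol_addresses[symbol] + addend) & 0xFFFFFFFF
        struct.pack_into("<I", patched, offset, value)
    return bytes(patched)


# One opcode byte followed by a 4-byte placeholder the compiler left as zero.
text = bytes([0xB8, 0, 0, 0, 0])
relocs = [(1, "counter", 0)]
final = apply_relocations(text, relocs, {"counter": 0x00601040})
print(final.hex())  # b840106000
```

Real formats define many relocation types (PC-relative, GOT-relative and so on); each one is a different formula applied at the recorded offset.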

GNU ld (Linux)

ld is usually invoked through the compiler frontend, but can also be used directly.

Basic linking.

ld -o output file1.o file2.o

Link with libraries.

ld -o program main.o -L/path/to/libs -lmath

Specify entry point.

ld -o program -e main file.o

Create shared library.

ld -shared -o libname.so file1.o file2.o

Set library search paths.

ld -o program file.o -L. -L/usr/local/lib -lmylib

Pass linker options through gcc.

gcc -Wl,-Map=output.map,-rpath=/custom/path main.o -o program

link.exe (Windows)

Microsoft's linker takes COFF object files and produces PE executables and DLLs.

Basic linking.

link /OUT:program.exe main.obj utils.obj

Link with libraries.

link /OUT:program.exe main.obj kernel32.lib user32.lib

Create DLL.

link /DLL /OUT:mydll.dll file1.obj file2.obj

Generate map file.

link /MAP:program.map main.obj

Set base address.

link /BASE:0x400000 /OUT:program.exe main.obj

Incremental linking.

link /INCREMENTAL /OUT:program.exe main.obj

lld (LLVM linker)

Basic linking.

lld -flavor gnu -o output file1.o file2.o

Windows PE linking.

lld-link /OUT:program.exe main.obj

Create executable with custom layout. The generic lld driver needs a flavor, so the GNU-compatible ld.lld driver is used here.

ld.lld -o program -T custom.lds file.o

Cross-platform linking.

lld -flavor darwin -o macos-binary file.o -lSystem

Linker scripts define how memory is organized in the address space, where sections are placed, and which symbols are defined.

SECTIONS defines where each section from your object files will be placed in the final executable.

SECTIONS
{
  . = 0x100000;         /* set location counter to this address */
  .text : { *(.text) }  /* collect all .text sections here */
  .data : { *(.data) }  /* collect all .data sections here */
  .bss : { *(.bss) }    /* collect all .bss sections here */
}

In effect, this places the output sections one after another, in the order they are listed.

The dot (.) is called the location counter. It represents the current output address. As sections are placed, the location counter automatically advances.

When you write . = 0x100000;, you're setting the starting address. After each section is placed, the location counter advances by the size of that section.
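The behaviour of the location counter can be modelled in a few lines of Python. This is a sketch that ignores alignment padding, and the section sizes are invented for the example.

```python
def layout(start, sections):
    """Assign addresses to output sections in script order.

    sections is a list of (name, size). Returns (addresses, end), where end
    is the final value of the location counter.
    """
    dot = start  # the location counter
    addresses = {}
    for name, size in sections:
        addresses[name] = dot
        dot += size  # advance past the section just placed
    return addresses, dot


addresses, end = layout(
    0x100000, [(".text", 0x2400), (".data", 0x300), (".bss", 0x80)]
)
for name, addr in addresses.items():
    print(f"{name:6} {addr:#x}")
print(f"end    {end:#x}")
```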

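*(.text) means "take all .text sections from all input files." You can also be more specific: main.o(.text) takes the .text section from main.o only, and *(.text .text.*) also matches per-function sections such as .text.foo.

An output section can also carry attributes, such as a load address and a fill pattern.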

.data : AT(0x10000) {
  *(.data)
} = 0xFF

This places the .data section at the current location counter (its run-time address, or VMA), gives it a load address (LMA) of 0x10000, and fills any gaps with 0xFF bytes.

MEMORY defines the available memory regions.

MEMORY
{
  ROM (rx) : ORIGIN = 0x00000000, LENGTH = 0x20000
  RAM (rw) : ORIGIN = 0x20000000, LENGTH = 0x10000
}

We can then place sections in specific memory regions.

.text : { *(.text) } > ROM  /* place in ROM */
.data : { *(.data) } > RAM  /* place in RAM */
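A small Python model shows what the region assignment implies: each region hands out addresses from its ORIGIN upward, and a section that does not fit within LENGTH is an error. The names place and RegionOverflow are hypothetical, and the section sizes are made up.

```python
class RegionOverflow(Exception):
    """Raised when a section does not fit in its memory region."""


def place(regions, assignments):
    """regions: name -> (origin, length).
    assignments: list of (section, size, region) in script order.
    Returns a dict mapping each section to its start address.
    """
    next_free = {name: origin for name, (origin, _length) in regions.items()}
    placed = {}
    for section, size, region in assignments:
        origin, length = regions[region]
        end = next_free[region] + size
        if end > origin + length:
            raise RegionOverflow(
                f"{section} overflows {region} by {end - (origin + length)} bytes"
            )
        placed[section] = next_free[region]
        next_free[region] = end
    return placed


regions = {"ROM": (0x00000000, 0x20000), "RAM": (0x20000000, 0x10000)}
placed = place(regions, [(".text", 0x1000, "ROM"), (".data", 0x100, "RAM")])
print({name: hex(addr) for name, addr in placed.items()})
```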

In embedded systems, you may want to place data in ROM but have it copied to the RAM at startup.

.data : {
  _data_start = .;  /* symbol */
  *(.data)
  _data_end = .;    /* symbol */
} > RAM AT > ROM    /* run from RAM, load from ROM */

_data_load = LOADADDR(.data);  /* where data is loaded from */
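The startup copy that these symbols make possible can be sketched in Python, with a bytearray standing in for the whole address space. The tiny addresses below are hypothetical, chosen so the example fits in 128 bytes; on a real target the copy loop lives in the startup code, usually written in C or assembly.

```python
def copy_data_at_startup(memory, data_load, data_start, data_end):
    """Copy the .data image from its load address (ROM) to its run address (RAM)."""
    size = data_end - data_start
    memory[data_start:data_end] = memory[data_load:data_load + size]


# Pretend ROM occupies 0x00-0x3F and RAM occupies 0x40-0x7F.
memory = bytearray(0x80)
memory[0x10:0x14] = b"\x01\x02\x03\x04"  # initial values stored in ROM

copy_data_at_startup(memory, data_load=0x10, data_start=0x40, data_end=0x44)
print(memory[0x40:0x44].hex())  # 01020304
```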

Linker scripts can define symbols that programs can reference. This is probably the use case you will notice most often.

SECTIONS {
  .text : { *(.text) }
  _end_of_text = .;  /* current address after .text */

  .data ALIGN(4) : { *(.data) }  /* align to 4-byte boundary */
}
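ALIGN rounds an address up to the next multiple of a power of two. The arithmetic can be checked with a short Python helper; align_up is a name chosen for this sketch.

```python
def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (address + alignment - 1) & ~(alignment - 1)


print(hex(align_up(0x1001, 4)))       # 0x1004
print(hex(align_up(0x1004, 4)))       # 0x1004
print(hex(align_up(0x1235, 0x1000)))  # 0x2000
```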

Symbol aliasing

Create alternative names for symbols.

With objcopy (this renames the symbol rather than adding a second name).

objcopy --redefine-sym old_name=new_name input.o output.o

With linker script.

PROVIDE(new_name = old_name);

With ld command line.

ld --defsym=alias=original -o output input.o

Symbol visibility control

Create version script.

cat > version.script << EOF
{
  global:
    public_function;
    public_variable;
  local:
    *;
};
EOF

Apply version script.

ld --version-script=version.script -shared -o lib.so file.o
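The effect of the version script can be approximated in Python with glob matching. This is a simplified model: it checks the global patterns first and then the local ones, and it assumes a symbol matched by neither keeps its default global binding. Real version scripts also support version nodes and language-specific matching, which are left out.

```python
from fnmatch import fnmatchcase


def visibility(symbol, global_patterns, local_patterns):
    """Classify a symbol as 'global' (exported) or 'local' (hidden)."""
    if any(fnmatchcase(symbol, pattern) for pattern in global_patterns):
        return "global"
    if any(fnmatchcase(symbol, pattern) for pattern in local_patterns):
        return "local"
    return "global"  # assumption: unmatched symbols are left as they were


global_patterns = ["public_function", "public_variable"]
local_patterns = ["*"]
for name in ["public_function", "public_variable", "internal_helper"]:
    print(name, visibility(name, global_patterns, local_patterns))
```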

Weak symbols

Create weak symbols.

objcopy --weaken-symbol=malloc_impl input.o output.o

Use in linking to allow overrides.

ld -o program main.o library.a
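How an override works can be sketched as a rule for choosing among competing definitions of one symbol. The Python model below assumes all candidates come from object files that are already part of the link; pick_definition is a hypothetical name, and taking the first weak definition when no strong one exists is a simplification.

```python
def pick_definition(candidates):
    """candidates is a list of (file, binding), binding being 'strong' or 'weak'.

    A single strong definition wins over any number of weak ones.
    """
    strong = [file for file, binding in candidates if binding == "strong"]
    if len(strong) > 1:
        raise ValueError(f"multiple strong definitions: {strong}")
    if strong:
        return strong[0]
    weak = [file for file, binding in candidates if binding == "weak"]
    return weak[0] if weak else None


# The weakened malloc_impl from the library yields to the one in main.o.
print(pick_definition([("library.o", "weak"), ("main.o", "strong")]))  # main.o
```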

Separate debug information

Strip debug info to separate file.

objcopy --only-keep-debug program program.debug
strip --strip-debug --strip-unneeded program
objcopy --add-gnu-debuglink=program.debug program

Use a separate debug file.

gdb program

GDB will pick up program.debug if it's found.

Debug symbol paths

Set debug search paths.

gdb --args program
(gdb) set debug-file-directory /path/to/debug/symbols

Build with debug info.

gcc -g -o program source.c

Embed a build ID so that a debugger can match the binary to its separate debug file.

ld --build-id=sha1 -o program file.o

Section garbage collection (GC)

Compile with function/data sections.

gcc -ffunction-sections -fdata-sections -c file.c

Link with GC.

ld --gc-sections -o program file.o

Verify what's removed.

ld --gc-sections --print-gc-sections -o program file.o
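Section garbage collection is essentially a reachability walk: start from the sections that must be kept, follow the references between sections, and drop whatever was never reached. A minimal Python sketch, with a made-up reference graph:

```python
def live_sections(roots, references):
    """Return the set of sections reachable from roots.

    references maps a section to the sections its relocations point into.
    """
    live = set()
    stack = list(roots)
    while stack:
        section = stack.pop()
        if section in live:
            continue
        live.add(section)
        stack.extend(references.get(section, ()))
    return live


# Hypothetical graph produced by -ffunction-sections and -fdata-sections.
references = {
    ".text.main": [".text.used", ".data.table"],
    ".text.used": [],
    ".data.table": [],
    ".text.unused": [".data.scratch"],
    ".data.scratch": [],
}
live = live_sections([".text.main"], references)
removed = sorted(set(references) - live)
print(removed)  # ['.data.scratch', '.text.unused']
```

Compiling with one section per function and per data object matters because the linker can only discard whole sections.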

Identity-based linking

Generate unique build ID.

gcc -Wl,--build-id=sha1 -o program source.c

Check build ID.

readelf -n program | grep "Build ID"

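Use build ID for debugging. The core file's own notes do not normally list build IDs, so read them from the modules recorded in the core with eu-unstrip (from elfutils) and compare against the binary.

eu-unstrip -n --core=core
readelf -n program | grep "Build ID"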
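GDB can also locate a separate debug file from the build ID alone: under each global debug directory it looks in .build-id for a path made of the first two hex characters, then the remaining characters plus .debug. The Python sketch below builds that path; the default directory /usr/lib/debug is an assumption that holds on many Linux distributions, and the build ID is invented.

```python
import posixpath


def build_id_debug_path(build_id, debug_dir="/usr/lib/debug"):
    """Return the path where a debug file for build_id is expected."""
    build_id = build_id.lower()
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError("build ID must be a hex string of at least 3 characters")
    return posixpath.join(
        debug_dir, ".build-id", build_id[:2], build_id[2:] + ".debug"
    )


print(build_id_debug_path("abcdef1234"))
# /usr/lib/debug/.build-id/ab/cdef1234.debug
```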

Runtime path control

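Set RPATH (runtime library path). Many distributions configure ld to emit RUNPATH by default, so pass --disable-new-dtags to be sure of getting RPATH.

ld --disable-new-dtags -rpath /custom/lib -o program file.o -lmylib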

Alternatively, set RUNPATH.

ld --enable-new-dtags -rpath /custom/lib -o program file.o

Check runtime paths.

objdump -x program | grep -E "(RPATH|RUNPATH)"
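The practical difference between the two tags is where they sit in the dynamic loader's search order relative to LD_LIBRARY_PATH. The Python sketch below models the order used by the glibc loader; it leaves out the ld.so cache and secure-execution mode, and search_order is a name invented for this example.

```python
def search_order(rpath, runpath, ld_library_path, defaults=("/lib", "/usr/lib")):
    """Return the directories searched for a shared library, in order."""
    order = []
    if rpath and not runpath:
        order += rpath       # RPATH is ignored when RUNPATH is present
    order += ld_library_path
    order += runpath         # RUNPATH comes after the environment variable
    order += defaults
    return order


print(search_order(["/custom/lib"], [], ["/env/lib"]))
# ['/custom/lib', '/env/lib', '/lib', '/usr/lib']
print(search_order([], ["/custom/lib"], ["/env/lib"]))
# ['/env/lib', '/custom/lib', '/lib', '/usr/lib']
```

With RPATH the embedded directory beats LD_LIBRARY_PATH; with RUNPATH the environment variable can override it.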

Incremental linking reuses the previous link output, which cuts down on link time during development. The main premise is that the library you're working on is statically linked only in release builds, while debug builds are built from sources and link dynamically with libraries.

The options below belong to the gold linker (ld.gold); the default GNU ld does not implement them.

Enable incremental linking.

ld.gold --incremental -o program file1.o file2.o

Update with new object.

ld.gold --incremental-update -o program file2.o

Force full relink.

ld.gold --no-incremental -o program *.o

Target architectures

ARM cross-linking.

arm-linux-gnueabihf-ld -o arm-program file.o -L/arm/libs

Windows from Linux.

x86_64-w64-mingw32-ld -o program.exe file.o

Custom target specification.

ld -m elf_i386 -o 32bit-program file.o

Creating and using archives

Create static library archive.

ar cr libmath.a math1.o math2.o math3.o

Add index for faster linking.

ranlib libmath.a

Link with archive.

ld -o program main.o -lmath -L.

Extract specific members.

ar x libmath.a math2.o

List archive contents.

ar t libmath.a

Replace member in archive.

ar r libmath.a new_math1.o

Archive search order

Multiple library dependencies.

ld -o program main.o -lfoo -lbar -lbaz

Resolving circular dependencies.

ld -o program main.o --start-group -lfoo -lbar --end-group
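Why order matters, and what the group options change, can be seen in a toy model of a left-to-right linker. In this Python sketch an archive member is pulled in only if it defines a symbol that is undefined at the moment the archive is scanned, and a group is rescanned until nothing new is pulled in. The data layout and the file names are made up.

```python
def link(command_line):
    """command_line is a list of (kind, payload) items:
    ("object", (name, defined, undefined)),
    ("archive", [member, ...]) or ("group", [archive, ...]),
    where a member is (name, defined, undefined).
    Returns (files linked in order, sorted unresolved symbols).
    """
    defined, undefined, pulled = set(), set(), []

    def add(name, defs, refs):
        defined.update(defs)
        undefined.update(refs)
        undefined.difference_update(defined)
        pulled.append(name)

    def scan_archive(members):
        progress, changed = False, True
        while changed:
            changed = False
            for name, defs, refs in members:
                if name not in pulled and defs & undefined:
                    add(name, defs, refs)
                    changed = progress = True
        return progress

    for kind, payload in command_line:
        if kind == "object":
            add(*payload)
        elif kind == "archive":
            scan_archive(payload)
        elif kind == "group":  # --start-group ... --end-group
            while any([scan_archive(archive) for archive in payload]):
                pass
    return pulled, sorted(undefined)


main_o = ("main.o", {"main"}, {"foo"})
libfoo = [("foo.o", {"foo"}, {"bar"}), ("helper.o", {"foo_helper"}, set())]
libbar = [("bar.o", {"bar"}, {"foo_helper"})]

print(link([("object", main_o), ("archive", libfoo), ("archive", libbar)]))
print(link([("object", main_o), ("group", [libfoo, libbar])]))
```

In the first link helper.o is skipped because nothing needs foo_helper yet when libfoo is scanned, so the symbol ends up unresolved. The group version rescans libfoo after bar.o has been added and picks it up.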

Whole archive inclusion.

ld -o program --whole-archive -lfoo --no-whole-archive -lbar

Custom memory sections

Place sections at specific addresses.

ld --section-start=.text=0x100000 -o program file.o

Define custom sections.

cat > script.lds << EOF
SECTIONS
{
  .special_data 0x200000 : { *(.special_data) }
  .normal_text : { *(.text) }
}
EOF
ld -T script.lds -o program file.o

Alignment control

Align sections (this option is only available on PE targets).

ld --section-alignment 4096 -o program file.o

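Control fill values. GNU ld has no --fill command-line option; the fill pattern is set on an output section in a linker script instead.

.text : { *(.text) } = 0x90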

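Section ordering. objcopy has no option to reorder sections, but it can rename one so that a linker script can place it ahead of the remaining code.

objcopy --rename-section .text=.text.hot input.o opt.o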

Create map file with different detail levels.

ld -Map=simple.map -o program file.o
ld -Map=detailed.map --stats --verbose -o program file.o

Analyze map file content.

grep "Allocating common symbols" detailed.map
grep -A 5 "Memory Configuration" detailed.map

Custom map file format

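Selective map output. ld has no option to restrict the map to particular inputs, so generate the full map and filter it.

ld -Map=full.map -o program file1.o file2.o
grep "file1.o" full.map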

Cross-reference table.

ld -cref -Map=xref.map -o program *.o

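Unused section elimination tracking. The report is only produced together with --gc-sections and is written to standard error.

ld --gc-sections --print-gc-sections -o program file.o 2> gc-report.txt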

Mass symbol renaming

Create rename map.

cat > rename.map << EOF
old_function_1 new_func1
old_function_2 new_func2
EOF

Apply renames to object file.

objcopy --redefine-syms=rename.map input.o output.o

Verify renames.

nm output.o | grep -E "(new_func1|new_func2)"

Prefix addition

Add prefix to all symbols.

objcopy --prefix-symbols=mylib_ input.o output.o

Add prefix to the names of all allocated sections.

objcopy --prefix-alloc-sections=prefix_ input.o output.o

Link-time optimization (LTO)

Prepare object files for LTO.

gcc -flto -c file1.c -o file1.o
gcc -flto -c file2.c -o file2.o

Link with LTO.

gcc -flto -o program file1.o file2.o

Use gold for LTO.

gcc -fuse-ld=gold -flto -o program *.o

Symbol interposition

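Bind references to global functions inside the library itself (disables interposition for functions only).

ld -Bsymbolic-functions -shared -o lib.so file.o

Disable interposition for all global symbols.

ld -Bsymbolic -shared -o lib.so file.o

Selective static and dynamic linking of libraries.

ld -Bdynamic -lfoo -Bstatic -lbar -Bdynamic -lbaz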

Emulation modes

List available emulations.

ld --help | grep -i "supported emulations"

Use specific emulation.

ld -m elf_x86_64 -o program file.o
ld -m elf32ppc -o ppc-program file.o

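Multiple architecture support. An ELF output file has a single architecture and ld uses one emulation per link, so link each architecture separately.

ld -m elf_i386 -o program32 file32.o
ld -m elf_x86_64 -o program64 file64.o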

Target-specific options

Windows specific.

link /MACHINE:X64 /SUBSYSTEM:CONSOLE main.obj

macOS specific.

ld -macosx_version_min 10.9 -o program file.o -lSystem

ARM specific.

ld --be8 -o arm-be-program file.o