Linking object files
A linker is a program that takes one or more object files generated by a compiler and combines them into a single executable. It fills in the missing addresses, resolves symbolic references between files and creates the final memory layout of the program.
The linker performs three main tasks:
- Symbol resolution - it matches every reference to a symbol (function, variable, etc.) with exactly one definition.
- Relocation - since each object file is compiled independently with addresses starting at zero, it assigns final addresses and adjusts code and data to reflect their positions in the executable.
- Code and data merging - object files contain separate sections (code, data, debugging info), which the linker merges into the final executable sections.
Symbols and symbol tables
Every object file contains a symbol table that records the following:
- Defined symbols (functions and variables implemented in this file)
- Undefined symbols (external references to be resolved)
- Local symbols (only visible within this file)
The linker matches undefined symbols with their definitions across all object files being linked.
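The matching step can be sketched as a toy model. This is only an illustration: the file names, symbol names and the dictionary layout are made up, and real linkers also handle local, weak and common symbols.

```python
# Toy model: each "object file" lists the symbols it defines and the ones it needs.
# All file and symbol names here are hypothetical.
objects = {
    "main.o": {"defined": {"main"}, "undefined": {"add", "counter"}},
    "math.o": {"defined": {"add"}, "undefined": set()},
    "state.o": {"defined": {"counter"}, "undefined": set()},
}


def resolve(objects):
    """Map every defined symbol to its file and collect unresolved references."""
    definitions = {}
    for filename, obj in objects.items():
        for symbol in obj["defined"]:
            if symbol in definitions:
                raise ValueError(f"multiple definition of {symbol!r}")
            definitions[symbol] = filename
    unresolved = set()
    for obj in objects.values():
        unresolved |= obj["undefined"] - set(definitions)
    return definitions, unresolved


definitions, unresolved = resolve(objects)
print(definitions)
print(sorted(unresolved))
```

An empty unresolved set corresponds to a successful link; anything left over is what a linker would report as an undefined reference.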
Object file formats
The most common object file formats are:
- ELF (Executable and Linkable Format) on Linux/Unix
- PE (Portable Executable) on Windows
- Mach-O on macOS
Each format contains headers, sections for code and data, symbol tables and relocation information.
Sections and segments
Object files organize content into sections.
- .text - executable code
- .data - initialized global variables
- .bss - uninitialized global variables
- .rodata - read-only data
- .symtab - symbol table
- .strtab - string table
Linkers (lld included) combine sections from multiple object files and organize them into segments in the final executable.
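As a rough sketch of the merging step, same-named input sections can be concatenated while recording where each piece lands. The file names and sizes below are hypothetical, and alignment padding is ignored.

```python
# Hypothetical input: section sizes (in bytes) for each object file.
inputs = {
    "main.o": {".text": 0x40, ".data": 0x10},
    "utils.o": {".text": 0x24, ".data": 0x08, ".bss": 0x100},
}


def merge_sections(inputs):
    """Concatenate same-named sections and record the offset of each input piece."""
    sizes = {}      # output section name -> total size so far
    placement = {}  # (file, section) -> offset inside the output section
    for filename, sections in inputs.items():
        for name, size in sections.items():
            placement[(filename, name)] = sizes.get(name, 0)
            sizes[name] = sizes.get(name, 0) + size
    return sizes, placement


sizes, placement = merge_sections(inputs)
print({name: hex(size) for name, size in sizes.items()})
print(hex(placement[("utils.o", ".text")]))
```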
Relocation tables
These tables in object files specify how to modify code and data when addresses are finalized. The linker uses relocation entries to patch up addresses throughout the executable.
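A minimal sketch of patching, assuming only one relocation kind (a 32-bit little-endian absolute address). The addresses, offsets and opcode bytes are invented for illustration; real formats define many relocation types, including PC-relative ones.

```python
import struct

# Hypothetical final addresses chosen by the linker.
symbol_addresses = {"add": 0x401020, "counter": 0x404000}

# A fake section: two placeholder opcode bytes, each followed by 4 zero bytes.
text = bytearray(b"\x01\x00\x00\x00\x00" b"\x02\x00\x00\x00\x00")

# Each entry says: at this offset, store the symbol's address plus an addend.
relocations = [
    {"offset": 1, "symbol": "add", "addend": 0},
    {"offset": 6, "symbol": "counter", "addend": 4},
]


def apply_relocations(section, relocations, symbol_addresses):
    """Patch 32-bit little-endian absolute addresses into the section in place."""
    for rel in relocations:
        value = symbol_addresses[rel["symbol"]] + rel["addend"]
        struct.pack_into("<I", section, rel["offset"], value)
    return section


apply_relocations(text, relocations, symbol_addresses)
print(text.hex())
```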
GNU ld (Linux)
ld is usually invoked through the compiler frontend (gcc, for example), but can also be used directly.
Basic linking.
ld -o output file1.o file2.o
Link with libraries.
ld -o program main.o -L/path/to/libs -lmath
Specify entry point.
ld -o program -e main file.o
Create shared library.
ld -shared -o libname.so file1.o file2.o
Set library search paths.
ld -o program file.o -L. -L/usr/local/lib -lmylib
Pass linker options through gcc.
gcc -Wl,-Map=output.map,-rpath=/custom/path main.o -o program
link.exe (Windows)
Microsoft's linker works with PE format files.
Basic linking.
link /OUT:program.exe main.obj utils.obj
Link with libraries.
link /OUT:program.exe main.obj kernel32.lib user32.lib
Create DLL.
link /DLL /OUT:mydll.dll file1.obj file2.obj
Generate map file.
link /MAP:program.map main.obj
Set base address.
link /BASE:0x400000 /OUT:program.exe main.obj
Incremental linking.
link /INCREMENTAL /OUT:program.exe main.obj
lld (LLVM linker)
Basic linking.
lld -flavor gnu -o output file1.o file2.o
Windows PE linking.
lld-link /OUT:program.exe main.obj
Create executable with custom layout (the generic lld binary needs a -flavor, so use the ELF driver ld.lld).
ld.lld -o program -T custom.lds file.o
Cross-platform linking.
lld -flavor darwin -o macos-binary file.o -lSystem
Linker scripts define the memory layout of the address space, section placement, and symbol definitions.
SECTIONS defines where each section from your object files will be placed in the final executable.
SECTIONS
{
. = 0x100000; /* set location counter to this address */
.text : { *(.text) } /* collect all .text sections here */
.data : { *(.data) } /* collect all .data sections here */
.bss : { *(.bss) } /* collect all .bss sections here */
}
In effect, this places the sections one after another, in the order they are listed.
The dot (.) is called the location counter. It represents the current output address. As sections are placed, the location counter automatically advances.
When you write . = 0x100000;, you're setting the starting address. After each section is placed, the location counter advances by the size of that section.
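The way the location counter advances can be modelled in a few lines. The section sizes are hypothetical, and this sketch ignores the alignment padding a real linker may insert between sections.

```python
# Hypothetical sizes; the order mirrors the SECTIONS example.
section_sizes = [(".text", 0x2000), (".data", 0x400), (".bss", 0x800)]


def layout(start, section_sizes):
    """Place sections back to back, advancing the location counter by each size."""
    dot = start  # plays the role of "." in the linker script
    addresses = {}
    for name, size in section_sizes:
        addresses[name] = dot
        dot += size
    return addresses, dot


addresses, end = layout(0x100000, section_sizes)
for name, address in addresses.items():
    print(name, hex(address))
print("end", hex(end))
```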
*(.text) means "take all .text sections from all input files." You can also be more specific, so:
- file.o(.text) - only the .text section from file.o
- *(.text .rodata) - all .text and .rodata sections
- *(.text.*) - all sections whose names start with .text. (such as .text.startup)
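These patterns behave like shell globs, so Python's fnmatch can approximate them. The input list is hypothetical and the helper name select is made up for this sketch; it does not cover every feature of the real pattern syntax (EXCLUDE_FILE, sorting, and so on).

```python
from fnmatch import fnmatchcase

# Hypothetical (file, section) pairs in command-line order.
sections = [
    ("main.o", ".text"),
    ("main.o", ".text.startup"),
    ("file.o", ".text"),
    ("file.o", ".rodata"),
]


def select(file_pattern, section_patterns, sections):
    """Return the input sections matched by file_pattern(section_patterns...)."""
    return [
        (filename, name)
        for filename, name in sections
        if fnmatchcase(filename, file_pattern)
        and any(fnmatchcase(name, pattern) for pattern in section_patterns)
    ]


print(select("*", [".text"], sections))
print(select("file.o", [".text"], sections))
print(select("*", [".text", ".rodata"], sections))
print(select("*", [".text.*"], sections))
```

Note that the last pattern does not match a plain .text section, because the glob requires the second dot.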
Output sections can take attributes, such as a load address and a fill value.
.data : AT(0x10000) {
*(.data)
} = 0xFF
This places the .data section at the current location counter (its run-time address), sets its load address to 0x10000, and fills any gaps with 0xFF bytes.
MEMORY defines the available memory regions.
MEMORY
{
ROM (rx) : ORIGIN = 0x00000000, LENGTH = 0x20000
RAM (rw) : ORIGIN = 0x20000000, LENGTH = 0x10000
}
- ROM starting at address 0 with 128 kB size, read/execute permissions
- RAM starting at address 0x20000000 with 64 kB size, read/write permissions
We can then place sections in specific memory regions.
.text : { *(.text) } > ROM /* place in ROM */
.data : { *(.data) } > RAM /* place in RAM */
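A small model of what region placement implies, including the overflow check a linker performs. The section sizes are hypothetical and the error text is only loosely modelled on a linker diagnostic.

```python
regions = {
    "ROM": {"origin": 0x00000000, "length": 0x20000},
    "RAM": {"origin": 0x20000000, "length": 0x10000},
}


def place(assignments, regions):
    """Assign addresses region by region and fail if a region overflows."""
    cursor = {name: region["origin"] for name, region in regions.items()}
    addresses = {}
    for section, size, region in assignments:
        end = regions[region]["origin"] + regions[region]["length"]
        if cursor[region] + size > end:
            over = cursor[region] + size - end
            raise ValueError(f"region {region} overflowed by {over} bytes")
        addresses[section] = cursor[region]
        cursor[region] += size
    return addresses


# Hypothetical sizes for .text and .data.
addresses = place([(".text", 0x1000, "ROM"), (".data", 0x200, "RAM")], regions)
print({name: hex(address) for name, address in addresses.items()})
```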
In embedded systems, you may want to store initialized data in ROM and have it copied to RAM at startup.
.data : {
_data_start = .; /* symbol */
*(.data)
_data_end = .; /* symbol */
} > RAM AT > ROM /* run from RAM, load from ROM */
_data_load = LOADADDR(.data); /* where data is loaded from */
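Startup code then uses these three symbols to copy the initial values. Real startup code is usually C or assembly; the following is only a model of the copy, with made-up addresses and byte arrays standing in for ROM and RAM.

```python
ROM_ORIGIN, RAM_ORIGIN = 0x00000000, 0x20000000

# Hypothetical values the linker would assign to the script's symbols.
_data_load = 0x00001000   # load address of .data (in ROM)
_data_start = 0x20000000  # run-time start of .data (in RAM)
_data_end = 0x20000004    # run-time end of .data

rom = bytearray(0x20000)
ram = bytearray(0x10000)
rom[_data_load:_data_load + 4] = b"\x01\x02\x03\x04"  # initial values stored in ROM


def copy_data(rom, ram):
    """Copy the initialized data image from its load address to its run address."""
    size = _data_end - _data_start
    src = _data_load - ROM_ORIGIN
    dst = _data_start - RAM_ORIGIN
    ram[dst:dst + size] = rom[src:src + size]


copy_data(rom, ram)
print(ram[:4].hex())
```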
Linker scripts can also define symbols that programs can reference. From the program's point of view, this is probably their most visible use.
SECTIONS {
.text : { *(.text) }
_end_of_text = .; /* current address after .text */
.data ALIGN(4) : { *(.data) } /* align to 4-byte boundary */
}
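The arithmetic behind ALIGN is a round-up to the next multiple of a power of two. A minimal sketch (the function name align_up is made up for this example):

```python
def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


print(hex(align_up(0x1001, 4)))  # 0x1004
print(hex(align_up(0x1004, 4)))  # 0x1004, already aligned
```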
Symbol aliasing
Create alternative names for symbols.
With objcopy.
objcopy --redefine-sym old_name=new_name input.o output.o
With linker script.
PROVIDE(new_name = old_name);
With ld command line.
ld --defsym=alias=original -o output input.o
Symbol visibility control
Create version script.
cat > version.script << EOF
{
global:
public_function;
public_variable;
local:
*;
};
EOF
Apply version script.
ld --version-script=version.script -shared -o lib.so file.o
Weak symbols
Create weak symbols.
objcopy --weaken-symbol=malloc_impl input.o output.o
Use in linking to allow overrides.
ld -o program main.o library.a
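The override rule can be sketched as follows, assuming plain object files (archive member selection adds further rules). File names are hypothetical; taking the first weak definition when no strong one exists is a simplification.

```python
def pick_definition(candidates):
    """candidates: (file, binding) pairs, where binding is 'strong' or 'weak'."""
    strong = [filename for filename, binding in candidates if binding == "strong"]
    if len(strong) > 1:
        raise ValueError("multiple strong definitions")
    if strong:
        return strong[0]  # a strong definition overrides any weak ones
    weak = [filename for filename, binding in candidates if binding == "weak"]
    return weak[0] if weak else None


# Hypothetical: the library provides a weak default, main.o overrides it.
print(pick_definition([("library.o", "weak"), ("main.o", "strong")]))
print(pick_definition([("library.o", "weak")]))
```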
Separate debug information
Strip debug info to separate file.
objcopy --only-keep-debug program program.debug
strip --strip-debug --strip-unneeded program
objcopy --add-gnu-debuglink=program.debug program
Use a separate debug file.
gdb program
GDB will pick up program.debug if it's found.
Debug symbol paths
Set debug search paths.
gdb --args program
(gdb) set debug-file-directory /path/to/debug/symbols
Build with debug info.
gcc -g -o program source.c
When linking directly, embed a build ID so the debugger can match the program to its debug file.
ld --build-id=sha1 -o program file.o
Section garbage collection (GC)
Compile with function/data sections.
gcc -ffunction-sections -fdata-sections -c file.c
Link with GC.
ld --gc-sections -o program file.o
Verify what's removed.
ld --gc-sections --print-gc-sections -o program file.o
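Conceptually, section GC is a reachability walk from the entry point over the references between sections. The section names and reference graph below are invented for illustration.

```python
# Hypothetical reference graph: section -> sections it refers to.
references = {
    ".text.main": [".text.helper"],
    ".text.helper": [".data.table"],
    ".text.unused": [".data.table"],
    ".data.table": [],
}


def live_sections(roots, references):
    """Return every section reachable from the root sections."""
    live = set()
    stack = list(roots)
    while stack:
        section = stack.pop()
        if section in live:
            continue
        live.add(section)
        stack.extend(references.get(section, ()))
    return live


live = live_sections([".text.main"], references)
removed = set(references) - live
print(sorted(live))
print(sorted(removed))
```

This is also why -ffunction-sections and -fdata-sections matter: with one section per function or variable, the walk can discard individual unused items instead of whole files.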
Identity-based linking
Generate unique build ID.
gcc -Wl,--build-id=sha1 -o program source.c
Check build ID.
readelf -n program | grep "Build ID"
Use build ID for debugging.
gdb program
readelf -n core | grep "Build ID"
Runtime path control
Set RPATH (runtime library path).
ld -rpath /custom/lib -o program file.o -lmylib
Alternatively, set RUNPATH.
ld --enable-new-dtags -rpath /custom/lib -o program file.o
Check runtime paths.
objdump -x program | grep -E "(RPATH|RUNPATH)"
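The practical difference between the two tags is where they fall in the dynamic loader's search order. The sketch below follows the order documented for glibc in ld.so(8); the placeholder strings and the short default directory list are simplifications.

```python
def search_order(rpath=None, runpath=None, ld_library_path=None):
    """Approximate glibc's library search order for a directly needed library."""
    order = []
    if rpath and not runpath:
        order += rpath            # DT_RPATH is ignored when DT_RUNPATH exists
    if ld_library_path:
        order += ld_library_path  # the environment comes after RPATH...
    if runpath:
        order += runpath          # ...but before RUNPATH
    order += ["<ld.so.cache>", "/lib", "/usr/lib"]
    return order


print(search_order(rpath=["/custom/lib"], ld_library_path=["/env/lib"]))
print(search_order(runpath=["/custom/lib"], ld_library_path=["/env/lib"]))
```

So a user can override RUNPATH with LD_LIBRARY_PATH, but not RPATH.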
Incremental linking
Reusing the previous link output cuts down on link time during development. The main premise is that the library you're working on is statically linked only in release builds, while debug builds are built from source and link dynamically against libraries. The options below belong to gold (ld.gold); GNU ld itself does not implement them.
Enable incremental linking.
ld.gold --incremental -o program file1.o file2.o
Update after file2.o changes (pass the full list of inputs again).
ld.gold --incremental-update -o program file1.o file2.o
Force full relink.
ld.gold --no-incremental -o program *.o
Target architectures
ARM cross-linking.
arm-linux-gnueabihf-ld -o arm-program file.o -L/arm/libs
Windows from Linux.
x86_64-w64-mingw32-ld -o program.exe file.o
Custom target specification.
ld -m elf_i386 -o 32bit-program file.o
Creating and using archives
Create static library archive.
ar cr libmath.a math1.o math2.o math3.o
Add index for faster linking.
ranlib libmath.a
Link with archive.
ld -o program main.o -lmath -L.
Extract specific members.
ar x libmath.a math2.o
List archive contents.
ar t libmath.a
Replace a member in the archive (a member with the same name is replaced, otherwise the file is added).
ar r libmath.a math1.o
Archive search order
Multiple library dependencies.
ld -o program main.o -lfoo -lbar -lbaz
Resolving circular dependencies.
ld -o program main.o --start-group -lfoo -lbar --end-group
Whole archive inclusion.
ld -o program --whole-archive -lfoo --no-whole-archive -lbar
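Why order matters, and what --start-group changes, can be shown with a toy model of a left-to-right archive scan. The archives, members and symbols are hypothetical; the model pulls a member only when it defines a currently undefined symbol.

```python
# Each archive maps member name -> (defined symbols, undefined symbols).
libfoo = {"foo.o": ({"foo"}, {"bar"}), "helper.o": ({"foo_helper"}, set())}
libbar = {"bar.o": ({"bar"}, {"foo_helper"})}


def scan_archive(archive, defined, undefined, loaded):
    """Pull members that define an undefined symbol; repeat until nothing changes."""
    changed = False
    progress = True
    while progress:
        progress = False
        for member, (defs, needs) in archive.items():
            if member in loaded or not (defs & undefined):
                continue
            loaded.add(member)
            defined |= defs
            undefined |= needs
            undefined -= defined
            progress = changed = True
    return changed


def link(main_undefined, archives, group=False):
    """Scan archives once in order, or repeatedly when they form a group."""
    defined, undefined, loaded = set(), set(main_undefined), set()
    while True:
        changed = False
        for archive in archives:
            changed |= scan_archive(archive, defined, undefined, loaded)
        if not group or not changed:
            return undefined


print(link({"foo"}, [libfoo, libbar]))              # foo_helper stays undefined
print(link({"foo"}, [libfoo, libbar], group=True))  # everything resolves
```

In the single pass, helper.o is not needed when libfoo is scanned, and libfoo is never revisited after bar.o introduces the reference to foo_helper.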
Custom memory sections
Place sections at specific addresses.
ld --section-start=.text=0x100000 -o program file.o
Define custom sections.
cat > script.lds << EOF
SECTIONS
{
.special_data 0x200000 : { *(.special_data) }
.normal_text : { *(.text) }
}
EOF
ld -T script.lds -o program file.o
Alignment control
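Align sections (PE targets only).
x86_64-w64-mingw32-ld --section-alignment 4096 -o program.exe file.o
Control fill values (GNU ld has no command-line fill option; set the fill on an output section in a linker script).
.text : { *(.text) } = 0x90
Section ordering (objcopy cannot reorder sections, but it can rename one so that a linker script places it separately).
objcopy --rename-section .text=.text.hot input.o opt.o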
Create map file with different detail levels.
ld -Map=simple.map -o program file.o
ld -Map=detailed.map -stats --verbose -o program file.o
Analyze map file content.
grep -A 5 "Memory Configuration" detailed.map
grep -A 5 "Discarded input sections" detailed.map
Custom map file format
Selective map output (GNU ld has no option for this, so generate the full map and filter it).
ld -Map=program.map -o program file1.o file2.o
grep "file1.o" program.map
Cross-reference table.
ld -cref -Map=xref.map -o program *.o
Unused section elimination tracking (the list is printed on stderr, and only when --gc-sections is enabled).
ld --gc-sections --print-gc-sections -o program file.o 2> gc-report.txt
Mass symbol renaming
Create rename map.
cat > rename.map << EOF
old_function_1 new_func1
old_function_2 new_func2
EOF
Apply renames to object file.
objcopy --redefine-syms=rename.map input.o output.o
Verify renames.
nm output.o | grep -E "(new_func1|new_func2)"
Prefix addition
Add prefix to all symbols.
objcopy --prefix-symbols=mylib_ input.o output.o
Add prefix to the names of all allocated sections.
objcopy --prefix-alloc-sections=prefix_ input.o output.o
Link-time optimization (LTO)
Prepare object files for LTO.
gcc -flto -c file1.c -o file1.o
gcc -flto -c file2.c -o file2.o
Link with LTO.
gcc -flto -o program file1.o file2.o
Use gold for LTO.
gcc -fuse-ld=gold -flto -o program *.o
Symbol interposition
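Bind references to global functions within the library (disables interposition for functions only).
ld -Bsymbolic-functions -shared -o lib.so file.o
Disable symbol interposition for all global symbols.
ld -Bsymbolic -shared -o lib.so file.o
Choose static or dynamic linking per library.
ld -o program main.o -Bdynamic -lfoo -Bstatic -lbar -Bdynamic -lbaz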
Emulation modes
List available emulations.
ld -V
Use specific emulation.
ld -m elf_x86_64 -o program file.o
ld -m elf32ppc -o ppc-program file.o
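Multiple architectures (a single link produces output for one emulation only, so link each architecture separately).
ld -m elf_i386 -o program32 file32.o
ld -m elf_x86_64 -o program64 file64.o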
Target-specific options
Windows specific.
link /MACHINE:X64 /SUBSYSTEM:CONSOLE main.obj
macOS specific.
ld -macosx_version_min 10.9 -o program file.o -lSystem
ARM specific (needs an ARM linker and big-endian input objects).
arm-linux-gnueabihf-ld --be8 -o arm-be-program file.o