x86 lea instruction

03 May 2025
x86

LEA (load effective address) is best known for pointer arithmetic: it computes an address without accessing memory. It accepts a base register, a scaled index register and a displacement through the addressing mode $[\mathit{base} + \mathit{index} \times \mathit{scale} + \mathit{disp}]$.

The scale factor is restricted to 1, 2, 4 or 8, which matches the sizes of the common primitive data types. For pointer arithmetic in arrays of 32-bit integers:

lea rax, [rbx + rcx*4]  ; offset by rcx elements
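As a rough model of what the instruction computes, the sketch below expresses the addressing mode as plain integer arithmetic in C. The helper name `lea_model` and its parameters are illustrative assumptions, not part of any real API, and the scale is assumed to be one of the encodable values.

```c
#include <stdint.h>

/* Hypothetical helper: models the value LEA writes to its destination.
   scale is assumed to be 1, 2, 4 or 8, as the encoding requires.
   No memory is read; the result is just an integer. */
static uint64_t lea_model(uint64_t base, uint64_t index,
                          unsigned scale, int32_t disp)
{
    /* disp is sign-extended to 64 bits; the sum wraps modulo 2^64. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

With `base = 0x1000`, `index = 3` and `scale = 4`, the model yields the address of the fourth 32-bit element, which is what the lea above would place in rax.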

The displacement is encoded either as a sign-extended 8-bit value (-128 to +127) or as a sign-extended 32-bit value. Combined with a base and a scaled index, this performs multi-component arithmetic in a single instruction, and on many processors in a single cycle ^[1].

lea rdi, [rsi + rax*8 + 64]  ; structure member access
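Which displacement form applies depends only on whether the constant fits in a signed byte. A minimal sketch of that check follows; `fits_disp8` is a hypothetical name used here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when a displacement fits the 8-bit form,
   i.e. the signed range -128..+127. Anything else needs the 32-bit form. */
static bool fits_disp8(int32_t disp)
{
    return disp >= INT8_MIN && disp <= INT8_MAX;
}
```

The offset 64 used above fits the short form. Note the asymmetry at the boundary: -128 fits, +128 does not.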

Multiplication by small constants falls out of the addressing mode. With the same register as both base and index, LEA computes x + x*scale, which turns multiplication by 3, 5 or 9 into a single scaled addition.

lea eax, [edx + edx*2]  ; multiply by 3
lea ecx, [ebx + ebx*4]  ; multiply by 5
lea esi, [edi + edi*8]  ; multiply by 9
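The same identities can be written out in C as a sanity check. This is only a sketch of the arithmetic; the function names are made up for illustration, and a compiler is free to choose LEA, IMUL or something else for them.

```c
#include <stdint.h>

/* Each function mirrors base + index*scale with base == index.
   uint32_t matches the 32-bit registers used above; results wrap
   modulo 2^32 exactly as the 32-bit destination would. */
static uint32_t times3(uint32_t x) { return x + (x << 1); }  /* x + x*2 */
static uint32_t times5(uint32_t x) { return x + (x << 2); }  /* x + x*4 */
static uint32_t times9(uint32_t x) { return x + (x << 3); }  /* x + x*8 */
```

Only 3, 5 and 9 are reachable this way in one step, because the scale is limited to 2, 4 or 8 when the base and index are the same register.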

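On some microarchitectures LEA executes on the address generation units (AGUs) ^[2] instead of the arithmetic ALUs. On others it is handled by ALU ports, so the placement depends on the processor.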

Where LEA issues on a different port than the arithmetic instructions, an independent address computation and an arithmetic op can dispatch in the same cycle, which improves instruction-level parallelism (ILP) in code paths that mix the two.

lea r8, [r9 + r10*4]    ; address generation unit
add r11, r12            ; arithmetic logic unit

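Structure access patterns can exploit the displacement. In an array-of-structures (AoS) traversal, each member's address is base + scaled index + member offset, so the scaled index stays the same across members and only the displacement changes. Since the scale is limited to 8, a larger stride such as 32 bytes has to be folded into the index first.

shl rax, 5                    ; index * 32 (stride exceeds the max scale of 8)
lea rdx, [rsi + rax + 8]      ; member at offset 8
lea rcx, [rsi + rax + 16]     ; member at offset 16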
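For comparison, here is the same member-address computation written by hand in C. The `record` layout and the `key_addr` helper are assumptions made up for this sketch; the only property that matters is a 32-byte element with a member at offset 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte record; field names are illustrative only. */
struct record {
    uint64_t id;      /* offset 0  */
    uint64_t key;     /* offset 8  */
    uint64_t value;   /* offset 16 */
    uint64_t pad;     /* offset 24 */
};

/* Address of records[i].key computed by hand: base + i*32 + 8.
   No element is read; this is pure address arithmetic. */
static uint64_t *key_addr(struct record *records, size_t i)
{
    unsigned char *base = (unsigned char *)records;
    return (uint64_t *)(base + i * sizeof(struct record)
                             + offsetof(struct record, key));
}
```

The result is the same pointer as `&records[i].key`. Because a stride of 32 exceeds the largest encodable scale of 8, a compiler targeting x86-64 would typically scale the index separately before the address computation.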

LEA also handles negative displacements for backward traversal or relative addressing, which is common when setting up a stack frame in a function prologue.

lea rbp, [rsp - 128]   ; rbp = address 128 bytes below rsp, rsp unchanged

With a destination other than RSP, LEA computes a frame address without modifying the stack pointer. With RSP as the destination it allocates local space, and unlike SUB it leaves the flags untouched. A prologue may use the following:

push rbp
mov rbp, rsp
lea rsp, [rsp - 128]   ; allocate 128 bytes

A direct sub rsp, 128 achieves the same result in the same number of instructions, at the cost of modifying the flags. LEA gains relevance when combining allocation with base register initialization.

push rbp
lea rbp, [rsp - 128]   ; rbp points to frame bottom
mov rsp, rbp           ; rsp updated

This enables frame-relative addressing immediately after allocation. Local variables are accessed through [rbp + offset] instead of [rsp + offset], whose offsets change with every push and pop.

Teardown operations rarely use LEA. Standard epilogue restores RSP through MOV or ADD.

mov rsp, rbp          ; MOV preferred
pop rbp
ret

The ADD alternative:

add rsp, 128          ; ADD to update stack ptr
pop rbp
ret

LEA appears in teardown only when combining pointer arithmetic with frame recovery.

; assumes the prologue ran push rbx and push r12 right after mov rbp, rsp
lea rsp, [rbp - 16]   ; skip locals, rsp now points at saved r12
pop r12
pop rbx
pop rbp
ret

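The pattern above is useful when the frame contains variable-sized allocations or when selective stack cleanup precedes the return: the distance from RSP to the saved registers is then not known at compile time, so RSP is recomputed from RBP. The displacement supplies the extra stack adjustment that a plain mov rsp, rbp would not cover.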

LEA supports 16-, 32- and 64-bit destination operands. In 64-bit mode the address-size prefix (0x67) changes the width of the address computation: without the prefix the calculation is done in 64 bits, with it in 32 bits. A result written to a 32-bit destination is zero-extended to 64 bits.

lea rax, [rbx + rcx]        ; 64-bit operation
lea eax, [ebx + ecx]        ; 32-bit operation, zero-extended
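The practical difference between the two forms shows up when the sum overflows 32 bits. The sketch below models both widths in C; `lea64` and `lea32` are hypothetical names for the two instructions above, not real functions.

```c
#include <stdint.h>

/* Models lea rax, [rbx + rcx]: full 64-bit sum. */
static uint64_t lea64(uint64_t b, uint64_t i)
{
    return b + i;
}

/* Models lea eax, [ebx + ecx]: only the low 32 bits of each input
   take part, the sum wraps modulo 2^32, and the 32-bit result is
   zero-extended into the 64-bit register. */
static uint64_t lea32(uint64_t b, uint64_t i)
{
    uint32_t r = (uint32_t)b + (uint32_t)i;
    return r;
}
```

For inputs `0xFFFFFFFF` and `1`, the 64-bit form gives `0x100000000` while the 32-bit form wraps to 0, so the two are interchangeable only when the result is known to fit in 32 bits.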

In code generation, pointer chains collapse into short LEA sequences. A doubly-indirect access with scaling takes two instructions, a load of the inner pointer and an LEA for the element address, where a separate shift and add would take more.

mov rax, [rdi]
lea rbx, [rax + rsi*8]
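A C-level view of the same pattern, as a sketch: one dereference to fetch the inner pointer, then pure address arithmetic for the element. The name `element_addr` and the 8-byte element type are assumptions chosen to match the scale of 8 above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the two-instruction sequence:
   load the inner pointer, then compute the element address. */
static int64_t *element_addr(int64_t **table, size_t i)
{
    int64_t *row = *table;   /* the only memory access (the mov) */
    return &row[i];          /* row + i*8, no load (the lea) */
}
```

The element itself is never read here. A later load or store through the returned pointer would be a separate instruction.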

LEA does not modify the CPU status flags, so it can sit between a flag-setting instruction and the branch that consumes those flags.

[1] It is actually not necessarily a single cycle, at least in some processors. Cf. Pentium 4 vs. Pentium III:

"Yes, LEA is a nice instruction but even on P4 not as fast as PIII. On amd64- and athlon cpus lea has a latency of two cycles. LEA is still great for expressions like you mentioned, but using more than two dependent leas in a row becomes ineffective for amd64 with a three-cycle mul/imul." - Gerd Isenberg, Narkive forum

[2] Address generation unit (AGU) - Wikipedia
