Gentle Introduction to x86-64 Assembly

Introduction

This document is meant to summarise the differences between x86-64 and i386 assembly, assuming that you already know the i386 gas syntax well. I will try to keep this document up to date until official documentation is available.

Register set extensions

X86-64 defines new integer registers r8-r15. These registers are encoded using a special REX prefix, so using them in a non-64-bit instruction makes the instruction 1 byte longer. They are named as follows:

    rXb for the 8-bit register (containing the lowest byte of the 64-bit value)
    rXw for 16 bits
    rXd for 32 bits
    rX for 64 bits

where X stands for an integer in the range 8 to 15. The original integer registers keep their irregular names, and the 64-bit versions are called rax, rdx, rcx, rbx, rsi, rdi, rsp and rbp. The new registers can be used everywhere the original ones can, except for implicit register usage, such as shift counters, source and destination for string operations etc.

Extended 8bit instructions

Instructions with a REX prefix change the behaviour of 8-bit registers. The upper halves (ah, dh, ch, bh) are replaced by the lower halves of the next 4 registers (sil, dil, spl, bpl). Then the rules described above apply, so you may access each register as an 8-bit one. On the downside, some instructions require a REX prefix, so you can't use the upper halves together with addresses requiring a REX prefix:

    addb %ah, (%r10)    # Invalid instruction.

64bit instructions

By default most operations remain 32-bit, and the 64-bit counterparts are invoked by the fourth bit (the W bit) in the REX prefix.
This means that each 32-bit instruction has its natural 64-bit extension and that the extended registers come for free in 64-bit instructions.

To write 64-bit instructions, use the 'q' suffix:

    movl $1, %eax    # 32-bit instruction
    movq $1, %rax    # 64-bit instruction

The exceptions to this rule are the instructions manipulating the stack (push, pop, call, ret, enter and leave), which are implicitly 64-bit; their 32-bit counterparts are not available (only the 16-bit ones are). So:

    pushl %eax    # Illegal instruction
    pushq %rax    # 1 byte instruction encoded as pushl %eax in 32 bits
    pushq %r10    # 2 byte instruction encoded as pushl preceded by REX.

Implicit zero extend

Results of 32-bit operations are implicitly zero extended to 64-bit values. This differs from 16- and 8-bit operations, which don't affect the upper part of registers. This can be used for code size optimisations in some cases, such as:

    movl $1, %eax      # two bytes shorter than movq $1, %rax
    xorq %rax, %rax    # three byte equivalent of mov $0,%rax
    andl $5, %eax      # equivalent of andq $5, %rax

Immediates

Immediates in instructions remain 32 bits wide, and their value is sign extended to 64 bits before the calculation. This means that:

    addq $1, %rax                     # Valid instruction
    addq $0x7fffffff, %rax            # As is this
    addq $0xffffffffffffffff, %rax    # and this one
    addq $0xffffffff, %rax            # Invalid instruction
    addl $0xffffffff, %eax            # Valid instruction

The only exception to this rule is the move of a constant to a register, which has a 64-bit form. This means:

    movl $1, %eax                     # 5 byte instruction
    movq $1, %rax                     # 7 byte instruction
    movq $0xffffffffffffffff, %rax    # 7 byte instruction
    movq $0x1122334455667788, %rax    # 10 byte instruction
    movq $0xffffffff, %rax            # 10 byte instruction
    movl $0xffffffff, %eax            # 5 byte instruction equivalent to above

You may write symbolic expressions as operands to both 64-bit and 32-bit operations. For 32-bit operations they result in zero-extending relocations, while in 64-bit operations they result in sign-extending ones:

    movl $symb, %eax    # 5 byte instruction
    movq $symb, %rax    # 7 byte instruction

So if you know that the symbol's address fits in 32 bits, you should use 32-bit instructions whenever possible. To load a symbol as a 64-bit value, you need to use the movabs instruction, which is a synonym for mov that only changes the default behaviour:

    movabsq $symb, %rax    # 10 byte instruction

Displacements

As with immediates, displacements are also sign extended, and pretty much the same rules apply to them. X86-64 defines a special form of the move instruction with a 64-bit displacement. As for immediates, it is used implicitly when the value is known at compilation time not to fit, and you need to use movabs to force a 64-bit relocation:

    movl 0x1, %eax           # load with 32-bit sign extended relocation
    movl 0xffffffff, %eax    # load with 64-bit relocation
    movl symb, %eax          # load with 32-bit sign extended relocation
    movabsl symb, %eax       # load with 64-bit relocation

Loads and stores with a 64-bit displacement are available only for the eax register.

RIP relative addressing

X86-64 defines a new instruction pointer relative addressing mode to simplify writing position independent code. The original displacement-only encoding is taken over by this mode, and displacement-only addressing is now encoded by one of the redundant SIB forms. This means that RIP relative addressing is actually cheaper than displacement-only addressing. To encode this addressing, just write rip as yet another register:

    movl $0x1, 0x10(%rip)

will store the value 1 at 0x10 (16) bytes after the end of the instruction. Symbolic relocations will be implicitly RIP relative, so

    movl $0x1, symb(%rip)

will write 0x1 to the address of the symbol "symb".

FIXME: This looks particularly confusing in the Intel syntax: [symb+rip] suggests a different location than [symb]. Suggestions for better syntax with symbols?

You are recommended to use RIP relative addressing whenever possible to reduce code size. The RIP relative branch instructions are still encoded the same way as in 32-bit mode.
This means that they are implicitly RIP relative, and "*" is used to switch to the absolute form.

R13 addressing limitations

R13 is the upper-half equivalent of RBP, whose encoding is used in the MODRM/SIB encoding as an escape to a displacement-only form. R13 is decoded as the same escape (to prevent the REX prefix from changing instruction length), so pretty much the same limitations that apply to RBP addressing apply to R13. This means that

    (%rbp,index,scale)

is not encodable and

    0(%rbp,index,scale)

must be used.
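
The immediate and zero-extension rules described in the Immediates and Implicit zero extend sections can be modelled in a few lines. What follows is a minimal Python sketch, not assembler output: the helper names (`sign_extend_32`, `fits_sign_extended_imm32`, `movq_imm_length`, `write_32bit_result`, `write_16bit_result`) are invented for this illustration, and the 7 and 10 byte lengths are simply the figures this document quotes for moves of a constant into %rax.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def sign_extend_32(imm32: int) -> int:
    """Sign-extend a 32-bit pattern to a 64-bit pattern."""
    imm32 &= 0xFFFFFFFF
    if imm32 & 0x80000000:  # bit 31 set: fill the upper half with ones
        return imm32 | 0xFFFFFFFF00000000
    return imm32


def fits_sign_extended_imm32(value: int) -> bool:
    """True if the 64-bit pattern can be produced from a sign-extended imm32.

    This is the condition for e.g. `addq $value, %rax` to be encodable.
    """
    value &= MASK64
    return sign_extend_32(value & 0xFFFFFFFF) == value


def movq_imm_length(value: int) -> int:
    """Length of `movq $value, %rax` as listed in this document (assumption:
    7 bytes with a sign-extended imm32, 10 bytes with a full 64-bit immediate)."""
    return 7 if fits_sign_extended_imm32(value) else 10


def write_32bit_result(old_reg: int, result: int) -> int:
    """A 32-bit operation zero-extends: the old upper 32 bits are discarded."""
    return result & 0xFFFFFFFF


def write_16bit_result(old_reg: int, result: int) -> int:
    """A 16-bit operation leaves the upper 48 bits of the register untouched."""
    return (old_reg & ~0xFFFF & MASK64) | (result & 0xFFFF)


if __name__ == "__main__":
    for imm in (0x1, 0x7FFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFF):
        print(hex(imm), "valid for addq:", fits_sign_extended_imm32(imm))
    rax = 0x1122334455667788
    print(hex(write_32bit_result(rax, 1)))  # upper half cleared
    print(hex(write_16bit_result(rax, 1)))  # upper 48 bits kept
```

The check mirrors the addq examples: 0xffffffff is rejected because sign-extending its low 32 bits yields 0xffffffffffffffff, not the value that was written.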