Assembly X86 // 谭邵杰的计算机奇妙旅程

word/ dword/ qword

In x86 terminology and documentation, a “word” is 16 bits.

x86 word = 2 bytes

x86 dword = 4 bytes (double word)

x86 qword = 8 bytes (quad word)

x86 double-quad or xmmword = 16 bytes, e.g. movdqa xmm0, [rdi].

Common x86 assembly

https://en.wikipedia.org/wiki/X86_instruction_listings

https://www.felixcloutier.com/x86/

https://officedaytime.com/simd512e/

The official manual: the first one runs to about 4,800 pages.

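SHR       # Shift right, logical (unsigned shift right, fills with zeros)
SAL       # Shift arithmetic left (same operation as SHL)
lea       # Load Effective Address: computes an address without accessing memory, does not change flags, can write to any general-purpose register, and effectively gives a three-operand add/shift
imul      # Signed multiply
movslq    # Move doubleword to quadword with sign-extension (movsxd in Intel syntax).
movl $0x46dd0bfe, 0x804a1dc # store the immediate value 0x46dd0bfe at address 0x804a1dc
movl 0x46dd0bfe, 0x804a1dc # without $, 0x46dd0bfe is read as an address; this would copy memory to memory, which mov cannot encode, so such a copy has to go through a register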

lea    -0xc(%ebp),%eax
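mov    %eax,0x8(%esp) # commonly seen for scanf's third argument: lea computes the address the result is written to, and that address is passed on the stack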

// x is %rdi, result is %rax. lea only computes the address; it performs no memory access
lea    0x0(,%rdi,8),%rax //result = x * 8;
lea    0x4b(,%rdi),%rax //result = x + 0x4b;
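As a rough illustration of the two lea lines above, here is a small Python model of the address arithmetic. The function name `lea`, its keyword parameters, and the 64-bit wrap-around mask are assumptions made for this sketch; the only point is that lea evaluates disp + base + index*scale and never reads memory.

```python
MASK64 = (1 << 64) - 1  # results wrap to 64 bits, like a 64-bit register

def lea(disp=0, base=0, index=0, scale=1):
    """Model of AT&T `lea disp(base,index,scale), dst`: pure arithmetic, no load."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (disp + base + index * scale) & MASK64

x = 5
times8 = lea(index=x, scale=8)      # lea 0x0(,%rdi,8),%rax  -> x * 8
plus4b = lea(disp=0x4b, index=x)    # lea 0x4b(,%rdi),%rax   -> x + 0x4b
print(times8, plus4b)
```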
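call addr: pushes the return address and jumps (equivalent to "push %eip; mov addr, %eip"; note that eip already points to the next, not yet executed, instruction)
ret: pops an address off the stack and jumps to it (pop %eip)
leave: prepares the stack for returning; equivalent to
mov %ebp,%esp
pop %ebp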
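The call / leave / ret sequence can be traced with a toy model. This is only a sketch: the `Cpu` class, the starting stack pointer, the addresses, and the function prologue (push %ebp; mov %esp,%ebp; sub $16,%esp) are hypothetical values chosen for illustration and do not come from the notes above.

```python
class Cpu:
    """Toy 32-bit model: only eip, esp, ebp and a sparse stack memory."""

    def __init__(self):
        self.eip = 0
        self.esp = 0x1000
        self.ebp = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def call(self, target, next_eip):
        self.push(next_eip)    # return address = instruction after the call
        self.eip = target

    def leave(self):
        self.esp = self.ebp    # mov %ebp,%esp
        self.ebp = self.pop()  # pop %ebp

    def ret(self):
        self.eip = self.pop()  # pop %eip

cpu = Cpu()
cpu.ebp = 0x2000                                   # hypothetical caller frame pointer
cpu.call(target=0x8048b00, next_eip=0x8048a05)
cpu.push(cpu.ebp)                                  # assumed prologue: push %ebp
cpu.ebp = cpu.esp                                  #                   mov %esp,%ebp
cpu.esp -= 16                                      #                   sub $16,%esp
cpu.leave()
cpu.ret()
print(hex(cpu.eip), hex(cpu.esp), hex(cpu.ebp))
```

After leave and ret, the model is back at the instruction following the call, with the caller's esp and ebp restored.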
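cmpl   $0x5,%eax # suppose %eax holds 1 (cmp cannot take an immediate as its second operand)
jle    8048bc5 # taken: in AT&T order the second operand is the one being tested, and 1 <= 5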
X86 load store
Unlike ARM, x86 has no dedicated ldr/str instructions; loads and stores are done with mov.
movswl (%rdi), %eax is a sign-extending load from word (w) to dword (l). Intel: movsx eax, word [rdi].
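To make the sign-extending load concrete, here is a minimal Python sketch. The helper name `movswl` and the sample bytes are assumptions for illustration; memory is modeled as a little-endian byte string.

```python
import struct

def movswl(memory, addr):
    """Model of `movswl (addr), %eax`: load 16 bits, sign-extend to 32 bits."""
    (word,) = struct.unpack_from("<h", memory, addr)  # signed 16-bit, little-endian
    return word & 0xFFFFFFFF                          # bit pattern left in the 32-bit register

mem = bytes([0xFE, 0xFF, 0x34, 0x12])
negative = movswl(mem, 0)   # word 0xFFFE, i.e. -2
positive = movswl(mem, 2)   # word 0x1234
print(hex(negative), hex(positive))
```

A negative word fills the upper 16 bits with ones, while a positive word fills them with zeros.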
https://docs.oracle.com/cd/E36784_01/html/E36859/gntbd.html
vxorpd   XORPD
Bitwise Logical XOR for Double-Precision Floating-Point Values
vxorps   XORPS
Bitwise Logical XOR for Single-Precision Floating-Point Values
vmovaps  MOVAPS
Move Aligned Packed Single-Precision Floating-Point Values
test & jump
test    al, al
jne     0x1000bffcc
The test instruction performs a logical and of the two operands and sets the CPU flags register according to the result (which is not stored anywhere). If al is zero, the anded result is zero and that sets the Z flag. If al is nonzero, it clears the Z flag. (Other flags, such as Carry, oVerflow, Sign, Parity, etc. are affected too, but this code has no instruction testing them.)
The jne instruction alters EIP if the Z flag is not set. There is another mnemonic for the same operation called jnz.
test   %eax,%eax
jg     <phase_4+0x35> # jump if (eax & eax) > 0, i.e. eax > 0 as a signed value
Note: cmp is not the same as test (cmp subtracts, test ANDs).
The TEST operation sets the flags CF and OF to zero.
The SF is set to the MSB(most significant bit) of the result of the AND.
If the result of the AND is 0, the ZF is set to 1, otherwise set to 0.
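The flag rules for test listed above can be sketched in Python. The function names here (`flags_after_test`, `jne_taken`, `jg_taken`) are made up for this example; the jump conditions are the ones described in these notes (jne: ZF = 0, jg: ZF = 0 and SF = OF).

```python
def flags_after_test(a, b, bits=32):
    """Flags after `test a, b`: AND the operands, keep only the flags."""
    result = a & b & ((1 << bits) - 1)
    return {
        "ZF": int(result == 0),
        "SF": (result >> (bits - 1)) & 1,
        "CF": 0,  # always cleared by test
        "OF": 0,  # always cleared by test
    }

def jne_taken(f):
    """jne / jnz: jump if ZF == 0."""
    return f["ZF"] == 0

def jg_taken(f):
    """jg: jump if ZF == 0 and SF == OF."""
    return f["ZF"] == 0 and f["SF"] == f["OF"]

for eax in (0, 7, 0x80000000):
    f = flags_after_test(eax, eax)
    print(hex(eax), f, jne_taken(f), jg_taken(f))
```

Because test always clears OF, `test %eax,%eax` followed by jg reduces to "eax is nonzero and its sign bit is clear".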
kinds of jump
AT&T syntax jmpq *0x402390(,%rax,8) into INTEL-syntax: jmp [RAX*8 + 0x402390].
ja VS jg
JUMP IF ABOVE AND JUMP IF GREATER
ja jumps if CF = 0 and ZF = 0 (unsigned Above: no carry and not equal)
jg jumps if SF = OF and ZF = 0 (signed Greater, excluding equal)
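The difference between ja and jg shows up when the same bit pattern is read as unsigned versus signed. The sketch below models the outcome of the comparison directly rather than the individual flags; the helper names are hypothetical.

```python
def to_signed(value, bits=32):
    """Reinterpret an unsigned bit pattern as two's-complement signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def ja_taken(a, b, bits=32):
    """After `cmp a, b` (Intel order): jump if a > b as unsigned numbers."""
    mask = (1 << bits) - 1
    return (a & mask) > (b & mask)

def jg_taken(a, b, bits=32):
    """After `cmp a, b` (Intel order): jump if a > b as signed numbers."""
    return to_signed(a, bits) > to_signed(b, bits)

a, b = 0xFFFFFFFF, 1  # 0xFFFFFFFF is 4294967295 unsigned, -1 signed
print(ja_taken(a, b), jg_taken(a, b))
```

With these inputs ja would be taken (4294967295 is above 1) while jg would not (-1 is not greater than 1).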
FLAGS
cmp performs a sub (but does not keep the result).
cmp eax, ebx
Let’s do the same by hand:
 reg     hex value   binary value
 eax = 0xdeadc0de    11011110101011011100000011011110
 ebx = 0x1337ca5e    00010011001101111100101001011110
  -    ----------    --------------------------------
 res   0xcb75f680    11001011011101011111011010000000
The flags are set as follows:
OF (overflow) : signed overflow (operand signs differ and the result's sign differs from eax's) -> no
SF (sign)     : is bit 31 of the result set            -> yes
CF (carry)    : borrow needed, i.e. eax < ebx unsigned -> no
ZF (zero)     : is result zero                         -> no
PF (parity)   : even number of 1 bits in the low byte  -> no (archaic)
AF (adjust)   : borrow out of bit 3 (low nibble)       -> no (archaic, for BCD only)
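The hand calculation above can be checked with a short Python sketch. `cmp_flags` is a name invented here; it computes eax - ebx and derives each flag from the operands and the result.

```python
def cmp_flags(a, b, bits=32):
    """Flags after `cmp a, b` (Intel order), i.e. after computing a - b."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    res = (a - b) & mask
    sign = bits - 1
    sa, sb, sr = a >> sign, b >> sign, res >> sign
    return {
        "res": res,
        "OF": int(sa != sb and sr != sa),                # signed overflow
        "SF": sr,                                        # top bit of the result
        "CF": int(a < b),                                # unsigned borrow
        "ZF": int(res == 0),
        "PF": int(bin(res & 0xFF).count("1") % 2 == 0),  # even parity of the low byte
        "AF": int((a & 0xF) < (b & 0xF)),                # borrow out of bit 3
    }

f = cmp_flags(0xDEADC0DE, 0x1337CA5E)
print(hex(f["res"]), f)
```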
Carry Flag
Carry Flag is a flag set when:
a) two unsigned numbers were added and the result exceeds the capacity of the register that holds it.
Ex: add two 8-bit numbers and keep the result in an 8-bit register: 255 + 9 = 264, which is more than an 8-bit register can store. So the value 8 is saved there (264 & 255 = 8) and CF is set.
b) two unsigned numbers were subtracted and the bigger one was subtracted from the smaller one.
Ex: 1 - 2 gives 255 as the result and CF is set.
Auxiliary Flag works like CF, but for BCD arithmetic: AF is set on a carry or borrow between the low and the high nibble. For example, in an 8-bit ALU, AF is set when there is a carry from bit 3 into bit 4.
Overflow Flag plays the role of CF when working with signed numbers.
Ex: add two 8-bit signed numbers: 127 + 2. The result is 129, which is too large for an 8-bit signed number, so OF is set.
The same happens when the result is too small, such as -128 - 1 = -129, which is out of range for 8-bit signed numbers.
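The four examples above (255 + 9, 1 - 2, 127 + 2, -128 - 1) can be reproduced with a small 8-bit model. The helpers `add8` and `sub8` are assumptions for this sketch and return the tuple (result, CF, OF, AF).

```python
def add8(a, b):
    """8-bit add: returns (result, CF, OF, AF)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    res = total & 0xFF
    cf = int(total > 0xFF)                         # unsigned carry out of bit 7
    of = int(((a ^ res) & (b ^ res) & 0x80) != 0)  # operands share a sign, result differs
    af = int((a & 0xF) + (b & 0xF) > 0xF)          # carry out of bit 3
    return res, cf, of, af

def sub8(a, b):
    """8-bit subtract a - b: returns (result, CF, OF, AF)."""
    a &= 0xFF
    b &= 0xFF
    res = (a - b) & 0xFF
    cf = int(a < b)                                # unsigned borrow
    of = int(((a ^ b) & (a ^ res) & 0x80) != 0)    # signs differ, result sign differs from a
    af = int((a & 0xF) < (b & 0xF))                # borrow out of bit 3
    return res, cf, of, af

print(add8(255, 9))    # unsigned wrap: CF set
print(sub8(1, 2))      # unsigned borrow: CF set
print(add8(127, 2))    # signed overflow: OF set
print(sub8(0x80, 1))   # -128 - 1: OF set
```

Note that 255 + 9 sets CF but not OF (as signed values it is -1 + 9 = 8, which fits), while 127 + 2 sets OF but not CF.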
register signed & unsigned
Positive or negative
The CPU does not know (or care) whether a number is positive or negative. The only person who knows is you. If you test SF and OF, then you treat the number as signed. If you only test CF then you treat the number as unsigned.
In order to help you the processor keeps track of all flags at once. You decide which flags to test and by doing so, you decide how to interpret the numbers.
register multiply
The computer multiplies bit by bit (AND), shifts each partial product into position (in the direction in which the multiplication proceeds), and then sums the partial products with binary addition (adders with carry, not a plain OR).
1100100
0110111
=======
0000000
-1100100
--1100100
---0000000
----1100100
-----1100100
------1100100
==============
1010101111100
100 = 1.1001 * 2^6
55  = 1.10111* 2^5
100 * 55 -> 1.1001 * 1.10111 * 2^(6+5)
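The long multiplication above amounts to shift-and-add. Here is a minimal Python sketch under that reading; the function name is made up, and it walks the multiplier from the least significant bit, whereas the hand-worked rows above start from the most significant bit. The sum is the same either way.

```python
def shift_add_multiply(a, b):
    """Multiply non-negative integers using only AND, shifts, and addition."""
    product = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:        # this multiplier bit selects the partial product
            product += a << shift   # add the multiplicand, shifted into place
        shift += 1
    return product

result = shift_add_multiply(0b1100100, 0b0110111)   # 100 * 55
print(bin(result), result)
```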
for more:
How computer multiplies 2 numbers?
Binary multiplier - Wikipedia
Memory and Addressing Modes
Declaring static data regions
DB, DW, and DD can be used to declare one, two, and four byte data locations, respectively.
# Basic example
.DATA		
var	DB 64  	; Declare a byte, referred to as location var, containing the value 64.
var2	DB ?	; Declare an uninitialized byte, referred to as location var2.
DB 10	; Declare a byte with no label, containing the value 10. Its location is var2 + 1.
X	DW ?	; Declare a 2-byte uninitialized value, referred to as location X.
Y	DD 30000    	; Declare a 4-byte value, referred to as location Y, initialized to 30000.
Array declarations: the DUP directive tells the assembler to duplicate an expression a given number of times. For example, 4 DUP(2) is equivalent to 2, 2, 2, 2.
Z	DD 1, 2, 3	; Declare three 4-byte values, initialized to 1, 2, and 3. The value of location Z + 8 will be 3.
bytes  	DB 10 DUP(?)	; Declare 10 uninitialized bytes starting at location bytes.
arr	DD 100 DUP(0)    	; Declare 100 4-byte words starting at location arr, all initialized to 0
str	DB 'hello',0	; Declare 6 bytes starting at the address str, initialized to the ASCII character values for hello and the null (0) byte.
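To visualize how such declarations lay out in memory, here is a hedged Python sketch that packs a few of them into a byte array. The `declare` helper and the assumption that data is packed contiguously in declaration order with no alignment padding are mine; a real assembler or linker may place things differently.

```python
import struct

# Hypothetical image of the data section, built in declaration order.
image = bytearray()
offsets = {}

def declare(name, fmt, *values):
    """Append little-endian data and remember where the label points."""
    offsets[name] = len(image)
    image.extend(struct.pack("<" + fmt, *values))

declare("var", "B", 64)                # var DB 64
declare("Y", "I", 30000)               # Y   DD 30000
declare("Z", "3I", 1, 2, 3)            # Z   DD 1, 2, 3
declare("arr", "100I", *([0] * 100))   # arr DD 100 DUP(0)
declare("str", "6s", b"hello\0")       # str DB 'hello',0

(z_plus_8,) = struct.unpack_from("<I", image, offsets["Z"] + 8)
print(z_plus_8, len(image), offsets)
```

Reading four bytes at Z + 8 yields the third element, matching the comment on the Z declaration.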
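Addressing modes supported on 32-bit x86
An address can be computed from at most two 32-bit registers plus a 32-bit signed constant.
One of the registers can additionally be multiplied by 2, 4, or 8.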
# right
mov eax, [ebx]	; Move the 4 bytes in memory at the address contained in EBX into EAX
mov [var], ebx	; Move the contents of EBX into the 4 bytes at memory address var. (Note, var is a 32-bit constant).
mov eax, [esi-4]	; Move 4 bytes at memory address ESI + (-4) into EAX
mov [esi+eax], cl	; Move the contents of CL into the byte at address ESI+EAX
mov edx, [esi+4*ebx]    	; Move the 4 bytes of data at address ESI+4*EBX into EDX
# wrong and reason
mov eax, [ebx-ecx]	; Can only add register values
mov [eax+esi+edi], ebx    	; At most 2 registers in address computation
Specifying the size of the data stored at an address
mov BYTE PTR [ebx], 2	; Move 2 into the single byte at the address stored in EBX.
mov WORD PTR [ebx], 2	; Move the 16-bit integer representation of 2 into the 2 bytes starting at the address in EBX.
mov DWORD PTR [ebx], 2    	; Move the 32-bit integer representation of 2 into the 4 bytes starting at the address in EBX.
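The effect of the size directive can be seen by modeling memory as a byte array. In this sketch the `store` helper and the 0xAA filler bytes are assumptions; x86 stores are little-endian, so the low byte goes to the lowest address.

```python
def store(memory, addr, value, size):
    """Model of `mov <size> PTR [addr], value` with size in bytes (1, 2, or 4)."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1 (BYTE), 2 (WORD), or 4 (DWORD)")
    masked = value & ((1 << (8 * size)) - 1)
    memory[addr:addr + size] = masked.to_bytes(size, "little")

def fresh():
    return bytearray([0xAA] * 4)  # filler so untouched bytes stay visible

byte_mem, word_mem, dword_mem = fresh(), fresh(), fresh()
store(byte_mem, 0, 2, 1)    # mov BYTE PTR [ebx], 2
store(word_mem, 0, 2, 2)    # mov WORD PTR [ebx], 2
store(dword_mem, 0, 2, 4)   # mov DWORD PTR [ebx], 2
print(byte_mem.hex(), word_mem.hex(), dword_mem.hex())
```

The same immediate 2 overwrites one, two, or four bytes depending on the directive, which is why the assembler needs it when neither operand is a register.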
Operand order and direction
This depends on the assembler syntax:
X86 instructions
The examples here use Intel syntax: for instructions with two operands, the first (left-hand) operand is the destination and the second (right-hand) operand is the source (destination <- source). AT&T syntax uses the reverse order (source -> destination).
mov eax, ebx: copy the value in ebx into eax
add eax, 10: EAX ← EAX + 10
AT&T syntax
AT&T syntax is an assembly syntax used mostly in UNIX environments and by tools such as gcc that originated there. It comes from AT&T (American Telephone & Telegraph) Bell Labs and is sometimes described as descended from MIPS assembly syntax.
Syntax features: https://stackoverflow.com/tags/att/info
Points to note:
Operands are in destination-last order
Register names are prefixed with %, and immediates are prefixed with $
sub $24, %rsp reserves 24 bytes on the stack.
Operand-size is indicated with a b/w/l/q suffix on the mnemonic
addb $1, byte_table(%rdi) increment a byte in a static table.
The mov suffix (b, w, l, or q) indicates how many bytes are being copied (1, 2, 4, or 8 respectively)
imul $13, 16(%rdi, %rcx, 4),  %eax 32-bit load from rdi + rcx<<2 + 16, multiply that by 13, put the result in %eax. Intel imul eax, [16 + rdi + rcx*4], 13.
movswl (%rdi), %eax sign-extending load from word (w) to dword (l). Intel movsx eax, word [rdi].
Intel syntax  (used in Intel/AMD manuals).
The Intel assembler (icc, icpc, I guess) uses the opposite order (destination<-source) for operands.
Syntax features: https://stackoverflow.com/tags/intel-syntax/info
RISC-V
beq rs1, rs2, Label #RISC-V
SW rs2, imm(rs1)  # Mem[rs1+imm]=rs2, the assembly puts the memory operand last
add rd, rs1, rs2  # rd = rs1 + rs2
That said, the choice of syntax is not very important, because disassemblers have options to control it:
objdump has -Mintel flag, gdb has set disassembly-flavor intel option.
gcc -masm=intel -S or objdump -drwC -Mintel.
Needs further study
Problems encountered
Motivation, summary, reflections, rants~~
https://www.cs.virginia.edu/~evans/cs216/guides/x86.html