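To let the operating system load a program, the toolchain puts a file header in front of the compiled code. The header carries layout information, so that when the operating system loads the EXE it can place the code segment and the data segment at the correct memory locations. Some compilers also emit debugging information, such as a symbol table. A .o file is called a relocatable file: it has not been linked, still needs relocation, and cannot be executed. An EXE file is called an executable file: it has been linked by the linker, can be executed directly, and the virtual addresses in it are final. For a position-dependent executable like the one studied here, the loader maps each segment at exactly the address recorded in the EXE. Only a position-independent executable (PIE) can be loaded at an arbitrary base address, and even then every segment must keep its distance relative to the others, or the code would not run correctly. An EXE file with a header therefore depends on a loader, such as the one invoked through the execve() system call. In the earliest stage of an operating system, however, there is no loader: all we can do is jump to some instruction and start executing there. For that we need a pure binary file (raw binary), whose entry point is the first instruction in the file. The objcopy tool can convert an EXE file into a raw binary.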
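To make the difference between an ELF file and a raw binary concrete, here is a small Python sketch that looks only at the first bytes of a file: an ELF file starts with the magic 7f 'E' 'L' 'F', and its 16-bit e_type field at offset 16 distinguishes relocatable (1) from executable (2) files. The function name classify is hypothetical, the byte strings are copied from the dumps of max and max_copy later in this article, and treating anything without the magic as a raw binary is a simplifying assumption.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# A few e_type values defined by the ELF specification
ET_NAMES = {1: "REL (relocatable file)", 2: "EXEC (executable file)", 3: "DYN (shared object or PIE)"}

def classify(blob: bytes) -> str:
    """Roughly describe a file from its first bytes (illustrative heuristic)."""
    if blob[:4] != ELF_MAGIC:
        # No ELF header: assume a raw binary whose first byte is the first instruction
        return "raw binary"
    # e_ident[EI_DATA] (byte 5): 1 = little endian, 2 = big endian
    order = "<" if blob[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident
    (e_type,) = struct.unpack_from(order + "H", blob, 16)
    return ET_NAMES.get(e_type, f"ELF with e_type {e_type}")

# First 18 bytes of max (hex dump in section 4.3)
max_head = bytes.fromhex("7f454c46020101000000000000000000" "0200")
# First bytes of max_copy produced by objcopy (section 6.2)
raw_head = bytes.fromhex("bf0000000067")

print(classify(max_head))  # EXEC (executable file)
print(classify(raw_head))  # raw binary
```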
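Here we study the format of a 64-bit executable file and use objdump to disassemble the compiled machine instructions back into assembly, in order to learn what an EXE file contains.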
2. GNU assembly code max.s: finding the maximum value
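# Lines starting with # are comments (this applies throughout)
# Data section
.section .data
data_items:
    .long 'H','E','L','L','O','_','W','O','R','L','D','!','!',0
    # .long (4 bytes per item) is used so that the byte order (big vs. little endian) shows up in the dump
# Code section
.section .text
# Make the entry point globally visible; symbols are local by default
.globl _start
_start:
    # In GNU (AT&T) assembly the source operand is on the left and the destination on the right,
    # exactly the opposite of Intel syntax.
    # Constants take a $ prefix, a symbol without $ is treated as an address, and registers take a % prefix.
    movl $0, %edi
    movl data_items(,%edi,4), %eax    # (data_items + 4*edi) → eax
    # Put the first item of data_items into ebx; ebx holds the maximum so far
    movl %eax, %ebx                   # eax → ebx
start_loop:
    # An item of 0 marks the end of the data
    cmpl $0, %eax
    je loop_exit
    incl %edi
    movl data_items(,%edi,4), %eax    # (data_items + 4*edi) → eax
    cmpl %ebx, %eax
    jle start_loop                    # eax <= ebx: keep the current maximum
    movl %eax, %ebx                   # eax > ebx: copy eax → ebx
    jmp start_loop
loop_exit:
    movl $1, %eax                     # system call 1 is exit(ebx): end the process, ebx becomes the exit status
    int $0x80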
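The loop logic is easier to check in a higher-level language. The Python sketch below is a hypothetical line-by-line transcription of max.s (variable names mirror the registers), assuming the same zero-terminated list of character codes.

```python
# Hypothetical transcription of max.s; variable names mirror the registers
data_items = [ord(c) for c in "HELLO_WORLD!!"] + [0]   # same values as the .long list

def find_max(items):
    edi = 0
    eax = items[edi]      # movl data_items(,%edi,4), %eax
    ebx = eax             # movl %eax, %ebx
    while eax != 0:       # cmpl $0, %eax / je loop_exit
        edi += 1          # incl %edi
        eax = items[edi]  # movl data_items(,%edi,4), %eax
        if eax > ebx:     # cmpl %ebx, %eax / jle start_loop
            ebx = eax     # movl %eax, %ebx
    return ebx & 0xFF     # the shell only sees the low 8 bits of the exit status

result = find_max(data_items)
print(result, chr(result))   # 95 _
```

Because only the low 8 bits of an exit status survive, returning the result this way only works for maxima up to 255.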
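3. Compiling and running
Environment: Ubuntu 15.04
Compile (assemble): gcc -c -o max.o max.s
Link: ld -o max max.o
Run: ./max
After running, echo $? prints the exit status of the command. That status is the maximum value, 95 (the ASCII code of '_').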
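gcc has the -m32 option to build 32-bit code. The code and data segments are then no longer aligned to 0x200000 (on i386 the alignment is typically the 4 KiB page size), so the distance between them becomes much shorter. ld correspondingly needs the -m elf_i386 option to select the 32-bit platform.
ld's -Ttext option sets the load address of the code (.text) section; for example, -Ttext 0 places it at address 0.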
4. Format of the EXE file
4.1 Viewing the ELF header and other layout information of max
Command: readelf -a max
-a shows all ELF information.
The output is:
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  # ELF magic number: 7f 'E' 'L' 'F', then 02 = 64-bit class, 01 = little endian, 01 = ELF version 1
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)     # an executable file
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x4000b0                   # program entry point, a virtual address
  Start of program headers:          64 (bytes into file)       # file offset of the program headers
  Start of section headers:          656 (bytes into file)      # file offset of the section headers (0x290)
  Flags:                             0x0
  Size of this header:               64 (bytes)                 # size of the ELF header
  Size of program headers:           56 (bytes)                 # size of each program header
  Number of program headers:         2                          # number of program headers
  Size of section headers:           64 (bytes)                 # size of each section header
  Number of section headers:         6                          # number of section headers
  Section header string table index: 3                          # section [3], .shstrtab, holds the section names
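readelf obtains every field above by decoding the first 64 bytes of the file. As a sketch, the Python snippet below unpacks those bytes (copied from rows 0x00 to 0x30 of the hex dump in section 4.3) with the standard struct module, following the Elf64_Ehdr field order of the ELF specification; "<" selects little endian because byte 5 of the magic is 01.

```python
import struct

# First 64 bytes of max: rows 0x00-0x30 of the hex dump in section 4.3
header = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e0001000000b000400000000000"
    "40000000000000009002000000000000"
    "00000000400038000200400006000300"
)

# Elf64_Ehdr field order; "<" = little endian, no padding between fields
EHDR_FMT = "<16sHHIQQQIHHHHHH"
FIELDS = ("e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
          "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx").split()

assert struct.calcsize(EHDR_FMT) == 64
hdr = dict(zip(FIELDS, struct.unpack(EHDR_FMT, header)))

print(hex(hdr["e_entry"]))                                # 0x4000b0 (Entry point address)
print(hdr["e_phoff"], hdr["e_shoff"])                     # 64 656
print(hdr["e_phnum"], hdr["e_shnum"], hdr["e_shstrndx"])  # 2 6 3
```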
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  # code section: address 0x4000b0, file offset 0xb0, size 0x2d
  [ 1] .text             PROGBITS         00000000004000b0  000000b0
       000000000000002d  0000000000000000  AX       0     0     1
  # data section: address 0x6000dd, file offset 0xdd, size 0x38
  [ 2] .data             PROGBITS         00000000006000dd  000000dd
       0000000000000038  0000000000000000  WA       0     0     1
  # section name string table: address 0x0 (not loaded), file offset 0x115, size 0x27
  [ 3] .shstrtab         STRTAB           0000000000000000  00000115
       0000000000000027  0000000000000000           0     0     1
  # symbol table: address 0x0 (not loaded), file offset 0x140, size 0x108
  [ 4] .symtab           SYMTAB           0000000000000000  00000140
       0000000000000108  0000000000000018           5     7     8
  # string table: address 0x0 (not loaded), file offset 0x248, size 0x46
  [ 5] .strtab           STRTAB           0000000000000000  00000248
       0000000000000046  0000000000000000           0     0     1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
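Each section header is a 64-byte Elf64_Shdr record, and its sh_name field is only an offset into .shstrtab (section [3], as the "Section header string table index" says). A minimal Python sketch, using bytes copied from the hex dump in section 4.3, decodes header [1] and looks up its name; the helper section_name is hypothetical, and only the three flag bits used in this file are mapped.

```python
import struct

# .shstrtab contents (file offset 0x115, 0x27 bytes), from the hex dump in section 4.3
shstrtab = b"\0.symtab\0.strtab\0.shstrtab\0.text\0.data\0"

# Section header [1] at file offset 0x290 + 1 * 64 = 0x2d0, from the same dump
text_shdr = bytes.fromhex(
    "1b000000010000000600000000000000"
    "b000400000000000b000000000000000"
    "2d000000000000000000000000000000"
    "01000000000000000000000000000000"
)

# Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
SHDR_FMT = "<IIQQQQIIQQ"
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4

def section_name(strtab: bytes, index: int) -> str:
    """sh_name is a byte offset into .shstrtab; the name runs to the next NUL."""
    end = strtab.index(b"\0", index)
    return strtab[index:end].decode()

(sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
 sh_link, sh_info, sh_addralign, sh_entsize) = struct.unpack(SHDR_FMT, text_shdr)

# Rebuild readelf's Flags column for the three flags used in this file
flag_str = "".join(c for bit, c in ((SHF_WRITE, "W"), (SHF_ALLOC, "A"), (SHF_EXECINSTR, "X"))
                   if sh_flags & bit)

print(section_name(shstrtab, sh_name), hex(sh_addr), hex(sh_offset), hex(sh_size), flag_str)
# .text 0x4000b0 0xb0 0x2d AX
```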
# The program headers provide the segment layout information
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  # code segment, readable and executable: virtual address 0x400000 → physical address 0x400000, file offset 0,
  # length 0xdd, alignment 0x200000; it covers the ELF header, the program headers and the code section
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000dd 0x00000000000000dd  R E    200000
  # data segment, readable and writable: virtual address 0x6000dd → physical address 0x6000dd, file offset 0xdd,
  # length 0x38, alignment 0x200000
  LOAD           0x00000000000000dd 0x00000000006000dd 0x00000000006000dd
                 0x0000000000000038 0x0000000000000038  RW     200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
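Why does the data segment start at 0x6000dd rather than right after the code? The ELF specification requires a loadable segment's virtual address and file offset to be congruent modulo its alignment, so the file can be mapped page by page, and the code (R E) and data (RW) need different page permissions, so they cannot share a page. The sketch below assumes the linker simply moves to the next 0x200000 boundary while keeping the low bits of the current address; this reproduces the numbers above, but it is an illustration, not GNU ld's actual linker script.

```python
ALIGN = 0x200000  # p_align of both LOAD segments

def congruent(vaddr: int, offset: int, align: int = ALIGN) -> bool:
    """ELF rule for loadable segments: p_vaddr and p_offset agree modulo p_align."""
    return vaddr % align == offset % align

def next_segment_vaddr(prev_end: int, align: int = ALIGN) -> int:
    """Assumed placement rule: go to the next align boundary, keep the low bits."""
    boundary = (prev_end + align - 1) & ~(align - 1)
    return boundary + (prev_end & (align - 1))

text_end = 0x400000 + 0xdd           # first LOAD: vaddr 0x400000, FileSiz 0xdd
data_vaddr = next_segment_vaddr(text_end)

print(hex(data_vaddr))                                        # 0x6000dd
print(congruent(0x400000, 0x0), congruent(data_vaddr, 0xdd))  # True True
```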
There is no dynamic section in this file.
There are no relocations in this file.
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
# Symbol table: the symbols in the program and their addresses (the Value column)
Symbol table '.symtab' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000004000b0 0 SECTION LOCAL DEFAULT 1
2: 00000000006000dd 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS max.o
4: 00000000006000dd 0 NOTYPE LOCAL DEFAULT 2 data_items
5: 00000000004000bf 0 NOTYPE LOCAL DEFAULT 1 start_loop
6: 00000000004000d6 0 NOTYPE LOCAL DEFAULT 1 loop_exit
7: 00000000004000b0 0 NOTYPE GLOBAL DEFAULT 1 _start
8: 0000000000600115 0 NOTYPE GLOBAL DEFAULT 2 __bss_start
9: 0000000000600115 0 NOTYPE GLOBAL DEFAULT 2 _edata
10: 0000000000600118 0 NOTYPE GLOBAL DEFAULT 2 _end
No version information found in this file.
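Each symbol table entry is a 24-byte Elf64_Sym record: st_name is an offset into .strtab, and the st_info byte packs the binding (high 4 bits) and the type (low 4 bits) that readelf shows in the Bind and Type columns. A Python sketch decoding two entries copied from the hex dump in section 4.3; the helper decode_sym is hypothetical and maps only a few common binding and type values.

```python
import struct

# Elf64_Sym: st_name(4) st_info(1) st_other(1) st_shndx(2) st_value(8) st_size(8) = 24 bytes
SYM_FMT = "<IBBHQQ"
BIND = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPE = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

# .strtab contents (file offset 0x248, 0x46 bytes)
strtab = b"\0max.o\0data_items\0start_loop\0loop_exit\0_start\0__bss_start\0_edata\0_end\0"

def decode_sym(entry: bytes):
    name, info, other, shndx, value, size = struct.unpack(SYM_FMT, entry)
    sym_name = strtab[name:strtab.index(b"\0", name)].decode()
    # high 4 bits of st_info = binding, low 4 bits = type
    return sym_name, BIND[info >> 4], TYPE[info & 0xF], shndx, value

# Entry 7 (_start) at file offset 0x140 + 7 * 24 = 0x1e8
start_sym = bytes.fromhex("2700000010000100" "b000400000000000" "0000000000000000")
name, bind, typ, shndx, value = decode_sym(start_sym)
print(name, bind, typ, shndx, hex(value))   # _start GLOBAL NOTYPE 1 0x4000b0

# Entry 3 (max.o) at 0x188; st_shndx 0xfff1 is SHN_ABS, shown as ABS by readelf
print(decode_sym(bytes.fromhex("010000000400f1ff" + "00" * 16)))
```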
4.2 Disassembly
Command: objdump -d max
-d means disassemble.
max:     file format elf64-x86-64
Disassembly of section .text:

# According to the program headers, the code section ends up loaded at 0x4000b0
00000000004000b0 <_start>:
  4000b0:  bf 00 00 00 00          mov    $0x0,%edi
  # data_items has been replaced by 0x6000dd, the start address of the data section
  4000b5:  67 8b 04 bd dd 00 60    mov    0x6000dd(,%edi,4),%eax
  4000bc:  00
  4000bd:  89 c3                   mov    %eax,%ebx

# start_loop and loop_exit have been replaced as well: the jumps encode relative offsets,
# and objdump prints the resulting target addresses together with the matching symbol names
00000000004000bf <start_loop>:
  4000bf:  83 f8 00                cmp    $0x0,%eax
  4000c2:  74 12                   je     4000d6 <loop_exit>
  4000c4:  ff c7                   inc    %edi
  4000c6:  67 8b 04 bd dd 00 60    mov    0x6000dd(,%edi,4),%eax
  4000cd:  00
  4000ce:  39 d8                   cmp    %ebx,%eax
  4000d0:  7e ed                   jle    4000bf <start_loop>
  4000d2:  89 c3                   mov    %eax,%ebx
  4000d4:  eb e9                   jmp    4000bf <start_loop>

00000000004000d6 <loop_exit>:
  4000d6:  b8 01 00 00 00          mov    $0x1,%eax
  4000db:  cd 80                   int    $0x80
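The short jumps store a signed 8-bit offset relative to the address of the next instruction, and the 32-bit displacement inside "67 8b 04 bd dd 00 60 00" is stored little-endian (the leading 67 is the address-size override prefix, needed because the index register is the 32-bit %edi). A small Python sketch that recomputes what objdump prints; rel8_target is a hypothetical helper.

```python
def rel8_target(insn_addr: int, insn_len: int, rel8: int) -> int:
    """Target of a short jump: next instruction's address + signed 8-bit offset."""
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return insn_addr + insn_len + disp

# (address, encoding) of the three short jumps in the listing above
jumps = {
    "je":  (0x4000c2, bytes.fromhex("7412")),
    "jle": (0x4000d0, bytes.fromhex("7eed")),
    "jmp": (0x4000d4, bytes.fromhex("ebe9")),
}
for mnemonic, (addr, code) in jumps.items():
    print(mnemonic, hex(rel8_target(addr, len(code), code[1])))
# je 0x4000d6 / jle 0x4000bf / jmp 0x4000bf

# The disp32 field of the mov: bytes dd 00 60 00 read as little endian
disp32 = int.from_bytes(bytes.fromhex("dd006000"), "little")
print(hex(disp32))   # 0x6000dd
```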
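4.3 Binary contents of max and how they map to the structures above
Command: xxd -g 1 max
Dumps the whole file, starting at offset 0 by default; -g 1 groups the output one byte at a time.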
# ELF header: offset 0, length 64 B
0000000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  .ELF............
0000010: 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00  ..>.......@.....
0000020: 40 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00  @...............
0000030: 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00  ....@.8...@.....
# program headers: offset 0x40, length 56 B x 2
0000040: 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00  ................
0000050: 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00  ..@.......@.....
0000060: dd 00 00 00 00 00 00 00 dd 00 00 00 00 00 00 00  ................
0000070: 00 00 20 00 00 00 00 00 01 00 00 00 06 00 00 00  .. .............
0000080: dd 00 00 00 00 00 00 00 dd 00 60 00 00 00 00 00  ..........`.....
0000090: dd 00 60 00 00 00 00 00 38 00 00 00 00 00 00 00  ..`.....8.......
00000a0: 38 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00  8......... .....
# code section (.text): offset 0xb0, length 0x2d bytes
00000b0: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83  .....g.....`....
00000c0: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8  ..t...g.....`.9.
00000d0: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 48 00 00  ~............H..
# data section (.data): offset 0xdd (it begins with the byte 48 = 'H' near the end of the previous row), length 0x38 bytes
00000e0: 00 45 00 00 00 4c 00 00 00 4c 00 00 00 4f 00 00  .E...L...L...O..
00000f0: 00 5f 00 00 00 57 00 00 00 4f 00 00 00 52 00 00  ._...W...O...R..
0000100: 00 4c 00 00 00 44 00 00 00 21 00 00 00 21 00 00  .L...D...!...!..
# section name string table (.shstrtab): offset 0x115, length 0x27 bytes
0000110: 00 00 00 00 00 00 2e 73 79 6d 74 61 62 00 2e 73  .......symtab..s
0000120: 74 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00  trtab..shstrtab.
0000130: 2e 74 65 78 74 00 2e 64 61 74 61 00 00 00 00 00  .text..data.....
# symbol table (.symtab): offset 0x140, 11 entries x 24 bytes = 0x108 bytes
# the 8-byte value field of each entry holds the symbol's address (e.g. b0 00 40 00 ... = 0x4000b0)
0000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000150: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00  ................
0000160: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
0000170: 00 00 00 00 03 00 02 00 dd 00 60 00 00 00 00 00  ..........`.....
0000180: 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff  ................
0000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
# data_items
00001a0: 07 00 00 00 00 00 02 00 dd 00 60 00 00 00 00 00  ..........`.....
# start_loop (entry starts at 0x1b8)
00001b0: 00 00 00 00 00 00 00 00 12 00 00 00 00 00 01 00  ................
00001c0: bf 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
# loop_exit
00001d0: 1d 00 00 00 00 00 01 00 d6 00 40 00 00 00 00 00  ..........@.....
# _start (entry starts at 0x1e8)
00001e0: 00 00 00 00 00 00 00 00 27 00 00 00 10 00 01 00  ........'.......
00001f0: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
# __bss_start
0000200: 2e 00 00 00 10 00 02 00 15 01 60 00 00 00 00 00  ..........`.....
# _edata (entry starts at 0x218)
0000210: 00 00 00 00 00 00 00 00 3a 00 00 00 10 00 02 00  ........:.......
0000220: 15 01 60 00 00 00 00 00 00 00 00 00 00 00 00 00  ..`.............
# _end
0000230: 41 00 00 00 10 00 02 00 18 01 60 00 00 00 00 00  A.........`.....
# string table (.strtab): offset 0x248 (right after the last symbol entry), length 0x46 bytes
0000240: 00 00 00 00 00 00 00 00 00 6d 61 78 2e 6f 00 64  .........max.o.d
0000250: 61 74 61 5f 69 74 65 6d 73 00 73 74 61 72 74 5f  ata_items.start_
0000260: 6c 6f 6f 70 00 6c 6f 6f 70 5f 65 78 69 74 00 5f  loop.loop_exit._
0000270: 73 74 61 72 74 00 5f 5f 62 73 73 5f 73 74 61 72  start.__bss_star
0000280: 74 00 5f 65 64 61 74 61 00 5f 65 6e 64 00 00 00  t._edata._end...
# section headers: offset 0x290, 64 B x 6
# [0] empty (NULL) entry
0000290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00002a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00002b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00002c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
# [1] .text
00002d0: 1b 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00  ................
00002e0: b0 00 40 00 00 00 00 00 b0 00 00 00 00 00 00 00  ..@.............
00002f0: 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  -...............
0000300: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
# [2] .data
0000310: 21 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00  !...............
0000320: dd 00 60 00 00 00 00 00 dd 00 00 00 00 00 00 00  ..`.............
0000330: 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  8...............
0000340: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
# [3] .shstrtab
0000350: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
0000360: 00 00 00 00 00 00 00 00 15 01 00 00 00 00 00 00  ................
0000370: 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  '...............
0000380: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
# [4] .symtab
0000390: 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  ................
00003a0: 00 00 00 00 00 00 00 00 40 01 00 00 00 00 00 00  ........@.......
00003b0: 08 01 00 00 00 00 00 00 05 00 00 00 07 00 00 00  ................
00003c0: 08 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................
# [5] .strtab
00003d0: 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
00003e0: 00 00 00 00 00 00 00 00 48 02 00 00 00 00 00 00  ........H.......
00003f0: 46 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  F...............
0000400: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
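The .long items make the byte order visible: each character occupies 4 bytes with the low byte first. A Python sketch that reinterprets the 0x38 data bytes from the dump as little-endian 32-bit integers and, for contrast, reads the first item as big endian:

```python
import struct

# The 0x38 bytes of .data (file offset 0xdd) from the dump above
data = bytes.fromhex(
    "48000000" "45000000" "4c000000" "4c000000" "4f000000" "5f000000" "57000000"
    "4f000000" "52000000" "4c000000" "44000000" "21000000" "21000000" "00000000"
)

items = struct.unpack("<14I", data)   # "<": little endian, 14 unsigned 32-bit .long values
print("".join(chr(v) for v in items if v))   # HELLO_WORLD!!
print(max(items))                            # 95

# Read as big endian, the first item would be a completely different number
print(hex(struct.unpack(">I", data[:4])[0]))  # 0x48000000
```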
5. Relationship diagrams
5.1 Relationships within the EXE file
Note: the arrows do not necessarily indicate order.
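5.2 Layout of the executable file
ELF header       : 64B        (offset 0x0)
   |
program headers  : 56B x 2    (offset 0x40)
   |
.text            : 45B        (offset 0xb0)
   |
.data            : 56B        (offset 0xdd)
   |
.shstrtab        : 39B        (offset 0x115)
   |
.symtab          : 24B x 11   (offset 0x140, after 4 bytes of padding)
   |
.strtab          : 70B        (offset 0x248)
   |
section headers  : 64B x 6    (offset 0x290, after 2 bytes of padding)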
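The offsets in this layout follow from the sizes plus alignment padding: .symtab has alignment 8 (its Align column), so 4 zero bytes at 0x13c precede it, and the section header table likewise starts at the next multiple of 8. A Python sketch that recomputes the offsets; the 8-byte alignment assumed for the program header and section header tables is an assumption that happens to match this file.

```python
def align_up(x: int, a: int) -> int:
    """Round x up to a multiple of a."""
    return (x + a - 1) // a * a

# (name, size in bytes, alignment) in file order, sizes from the readelf output in section 4.1
parts = [
    ("ELF header", 64, 1),
    ("program headers", 56 * 2, 8),
    (".text", 0x2d, 1),
    (".data", 0x38, 1),
    (".shstrtab", 0x27, 1),
    (".symtab", 24 * 11, 8),
    (".strtab", 0x46, 1),
    ("section headers", 64 * 6, 8),
]

offset = 0
layout = {}
for name, size, align in parts:
    offset = align_up(offset, align)   # padding bytes, if any, go before this part
    layout[name] = offset
    print(f"{name:16} offset {offset:#06x}  size {size:#x}")
    offset += size

file_size = offset
print(f"total file size: {file_size:#x}")   # 0x410, matching the last dump row 0x400
```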
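6. Converting between EXE and BIN files
6.1 Extracting the code and data sections
To turn an EXE file that carries executable headers and debugging information into a raw binary file, use:
objcopy -O binary -R .note -R .comment max max_copy
This writes max out as a raw binary file named max_copy, removing (-R) the .note and .comment sections. max contains neither section (see the section headers in 4.1), so for this file the two -R options have no effect.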
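Assuming objcopy -O binary keeps only the allocated sections and lays them out by load address, the output begins at the lowest address (0x4000b0) and the gap up to 0x6000dd is filled with zeros. A Python sketch of that arithmetic, which also explains the offset 0x20002d used in section 6.3:

```python
# Load addresses and sizes of the two allocated sections (from readelf, section 4.1)
sections = {".text": (0x4000b0, 0x2d), ".data": (0x6000dd, 0x38)}

base = min(addr for addr, _ in sections.values())
end = max(addr + size for addr, size in sections.values())

# In the raw image each section sits at (its address - lowest address);
# the gap between the sections is filled with zero bytes
offsets = {name: addr - base for name, (addr, _) in sections.items()}
raw_size = end - base

print({k: hex(v) for k, v in offsets.items()})  # {'.text': '0x0', '.data': '0x20002d'}
print(hex(raw_size), raw_size)                  # 0x200065 2097253
```

So max_copy is about 2 MB even though it holds only 0x65 bytes of code and data; almost all of it is padding caused by the 0x200000 segment alignment.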
6.2 Viewing the code section
Command: xxd -g 1 -l 256 max_copy
Shows the first 256 bytes.
The code section sits at the start of the file:
0000000: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83 .....g.....`....
0000010: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8 ..t...g.....`.9.
0000020: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 00 00 00 ~...............
0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
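6.3 Viewing the data section
Command: xxd -g 1 -l 256 -s 0x20002d max_copy
-s gives the starting offset, here 0x20002d (= data section load address 0x6000dd - code section load address 0x4000b0); -g 1 groups the hex output one byte at a time; -l 256 shows at most 256 bytes (here the file ends after the 0x38 bytes of data).
The data section: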
020002d: 48 00 00 00 45 00 00 00 4c 00 00 00 4c 00 00 00 H...E...L...L...
020003d: 4f 00 00 00 5f 00 00 00 57 00 00 00 4f 00 00 00 O..._...W...O...
020004d: 52 00 00 00 4c 00 00 00 44 00 00 00 21 00 00 00 R...L...D...!...
020005d: 21 00 00 00 00 00 00 00 !.......
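As you can see, max_copy contains only the code and data sections, separated by zero padding, and the code section sits at the very beginning of the file.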