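Normally, so that the operating system can load a program, the toolchain (assembler and linker) places a file header in front of the compiled code. The header carries the layout information that the operating system needs in order to load the code segment and data segment of an EXE at the correct memory addresses. Some compilers also emit debugging information, such as a symbol table.

A .o file is usually called a relocatable file: it has not been linked, it still needs relocation, and it cannot be executed. An EXE file is called an executable file: it has been processed by the linker and can be run directly, and the virtual addresses in it are final. The operating system can choose the segment base address used for loading, so it can place the EXE as a whole at any position, but it must put each segment where the EXE says it belongs so that the relative distances stay the same. Only then does the code run correctly.

An EXE file with a file header depends on a loader, such as the execve() system call. In the initial stage of an operating system, however, there is no loader. All we can do is jump straight to some instruction and start executing, and that requires a raw binary file whose entry point is the first instruction in the file. The objcopy tool converts an EXE file into a raw binary.

Here we study the format of a 64-bit executable and use objdump to disassemble the compiled machine code back into assembly, in order to learn what information an EXE contains.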

2. GNU assembly code for finding the maximum: max.s

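# Lines starting with # are comments, here and below

# Data section

.section .data

data_items:

.long 'H','E','L','L','O','_','W','O','R','L','D','!','!',0 # .long (4 bytes per item) is used so that big- vs little-endian byte order is visible in the dump

# Code section

.section .text

# Declare the entry point as globally visible; symbols are local by default

.globl  _start

_start:

# In GNU (AT&T) syntax the source operand is on the left and the destination on the right; Intel syntax is the reverse

# Constants need a $ prefix (a symbol without $ is treated as an address); registers need a % prefix

movl $0, %edi

movl data_items(,%edi,4), %eax # load the 4-byte value at data_items + 4*edi into eax

# Put the first item of data_items into ebx; ebx holds the running maximum

movl %eax, %ebx # eax → ebx

start_loop:

# A value of 0 marks the end of the data

cmpl $0, %eax

je loop_exit

incl %edi

movl data_items(,%edi,4), %eax # load the 4-byte value at data_items + 4*edi into eax

cmpl %ebx, %eax

jle start_loop # eax <= ebx: keep the current maximum

movl %eax, %ebx # eax > ebx: eax becomes the new maximum

jmp start_loop

loop_exit:

movl $1, %eax # system call 1, exit(ebx): end the process with ebx as the exit status

int $0x80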

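3. Building and running

Environment: Ubuntu 15.04

Assemble: gcc -c -o max.o max.s

Link: ld -o max max.o

Run: ./max

After the program has run, echo $? shows the exit status of the command. That status is the maximum value, 95 (the ASCII code of '_').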
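The loop in max.s can be cross-checked without assembling anything. The following is a minimal Python sketch of the same scan, not part of the original build; the names DATA_ITEMS, find_max and exit_status are made up for illustration, and the register names in the comments only indicate which assembly step each line mirrors.

```python
# Data words from max.s, including the terminating 0.
DATA_ITEMS = [ord(c) for c in "HELLO_WORLD!!"] + [0]


def find_max(items):
    """Mirror the loop in max.s: scan up to the 0 terminator, keep the maximum."""
    index = 0                 # edi
    current = items[index]    # eax
    largest = current         # ebx
    while current != 0:       # cmpl $0, %eax / je loop_exit
        index += 1            # incl %edi
        current = items[index]
        if current > largest: # jle skips the update when eax <= ebx
            largest = current
    return largest


def exit_status(value):
    """A shell reports only the low 8 bits of the value passed to exit."""
    return value & 0xFF


result = find_max(DATA_ITEMS)
print(result, chr(result), exit_status(result))  # 95 _ 95
```

Because the exit status is truncated to 8 bits, this way of returning a result only works for values below 256, which holds for ASCII codes.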

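gcc has the -m32 option for building 32-bit code. In that case the code and data segments are no longer aligned to 0x200000, so the distance between them becomes much shorter. ld then needs the matching -m elf_i386 option to select the 32-bit target.

ld has the -Ttext option for setting the load address of the code section; for example, -Ttext 0 sets the load address to 0.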

4. Format of the EXE file

4.1 Viewing the ELF layout information of max

Command: readelf -a max

-a shows all ELF information

This produces the following output:

ELF Header:

Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 # magic number of the ELF file

Class:                             ELF64

Data:                              2's complement, little endian

Version:                           1 (current)

OS/ABI:                            UNIX - System V

ABI Version:                       0

Type:                              EXEC (Executable file) # an executable (EXE) file

Machine:                           Advanced Micro Devices X86-64

Version:                           0x1

Entry point address:               0x4000b0 # program entry point, a virtual address

Start of program headers:   64 (bytes into file) # file offset of the program headers

Start of section headers:   656 (bytes into file) # file offset of the section headers

Flags:                             0x0

Size of this header:               64 (bytes) # size of the ELF header

Size of program headers:           56 (bytes) # size of one program header entry

Number of program headers:         2 # number of program headers

Size of section headers:           64 (bytes) # size of one section header entry

Number of section headers:         6 # number of section headers

Section header string table index: 3

Section Headers:

[Nr] Name              Type             Address           Offset

Size              EntSize          Flags  Link  Info  Align

[ 0]                   NULL             0000000000000000  00000000

0000000000000000  0000000000000000           0     0     0

# .text section: virtual address 0x4000b0, file offset 0xb0, size 0x2d

[ 1] .text             PROGBITS         00000000004000b0  000000b0

000000000000002d  0000000000000000  AX       0     0     1

# .data section: virtual address 0x6000dd, file offset 0xdd, size 0x38

[ 2] .data             PROGBITS         00000000006000dd  000000dd

0000000000000038  0000000000000000  WA       0     0     1

# Section name table (.shstrtab): address 0x0 (not loaded), file offset 0x115, size 0x27

[ 3] .shstrtab         STRTAB           0000000000000000  00000115

0000000000000027  0000000000000000           0     0     1

# Symbol table (.symtab): address 0x0 (not loaded), file offset 0x140, size 0x108

[ 4] .symtab           SYMTAB           0000000000000000  00000140

0000000000000108  0000000000000018           5     7     8

# String table (.strtab): address 0x0 (not loaded), file offset 0x248, size 0x46

[ 5] .strtab           STRTAB           0000000000000000  00000248

0000000000000046  0000000000000000           0     0     1

Key to Flags:

W (write), A (alloc), X (execute), M (merge), S (strings), l (large)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

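# The program headers describe where each segment is loaded

Program Headers:

Type           Offset             VirtAddr           PhysAddr

FileSiz            MemSiz              Flags  Align

# Code segment: readable and executable, virtual address 0x400000, physical address 0x400000, file offset 0,

# length 0xdd, alignment 0x200000.

# It covers the ELF header, the program headers, and the .text section.

LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000

0x00000000000000dd 0x00000000000000dd  R E    200000

# Data segment: readable and writable, virtual address 0x6000dd, physical address 0x6000dd, file offset 0xdd, length 0x38, alignment 0x200000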

LOAD           0x00000000000000dd 0x00000000006000dd 0x00000000006000dd

0x0000000000000038 0x0000000000000038  RW     200000

Section to Segment mapping:

Segment Sections...

00     .text

01     .data

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

# Symbol table: the symbols in the program and their addresses

Symbol table '.symtab' contains 11 entries:

Num:    Value          Size Type    Bind   Vis      Ndx Name

0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND

1: 00000000004000b0     0 SECTION LOCAL  DEFAULT    1

2: 00000000006000dd     0 SECTION LOCAL  DEFAULT    2

3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS max.o

4: 00000000006000dd     0 NOTYPE  LOCAL  DEFAULT    2 data_items

5: 00000000004000bf     0 NOTYPE  LOCAL  DEFAULT    1 start_loop

6: 00000000004000d6     0 NOTYPE  LOCAL  DEFAULT    1 loop_exit

7: 00000000004000b0     0 NOTYPE  GLOBAL DEFAULT    1 _start

8: 0000000000600115     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start

9: 0000000000600115     0 NOTYPE  GLOBAL DEFAULT    2 _edata

10: 0000000000600118     0 NOTYPE  GLOBAL DEFAULT    2 _end

No version information found in this file.
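The "ELF Header" block at the top of the readelf output can be reproduced by unpacking the first 64 bytes of the file. The sketch below is only an illustration under stated assumptions: the bytes are copied from the xxd dump in section 4.3, the field names follow the Elf64_Ehdr layout of the System V ABI, and the helper name parse_elf64_header is made up here.

```python
import struct

# First 64 bytes of max, copied from the xxd dump in section 4.3.
ELF_HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e0001000000b000400000000000"
    "40000000000000009002000000000000"
    "00000000400038000200400006000300"
)

# Elf64_Ehdr, little-endian: 16-byte ident, then the fixed-size fields.
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
EHDR_FIELDS = (
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff "
    "e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx"
).split()


def parse_elf64_header(raw):
    """Unpack the 64-byte ELF64 header into a dict keyed by field name."""
    values = struct.unpack(EHDR_FORMAT, raw[:64])
    return dict(zip(EHDR_FIELDS, values))


header = parse_elf64_header(ELF_HEADER)
ident = header["e_ident"]
print("magic ok:", ident[:4] == b"\x7fELF")
print("class:", ident[4], "(2 means ELF64)")
print("data:", ident[5], "(1 means little endian)")
print("entry point:", hex(header["e_entry"]))
print("program headers at", header["e_phoff"], "x", header["e_phnum"])
print("section headers at", header["e_shoff"], "x", header["e_shnum"])
```

Each printed value corresponds to one line of the readelf output, for example e_shoff is the "Start of section headers" value.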

4.2 Disassembly

Command: objdump -d max

-d means disassemble

file format elf64-x86-64

Disassembly of section .text:

# According to the program headers, the first segment is loaded at 0x400000, so the code (file offset 0xb0) ends up at 0x4000b0

00000000004000b0 <_start>:

4000b0: bf 00 00 00 00                     mov    $0x0,%edi

# data_items has been replaced by 0x6000dd, the start address of the data section

4000b5: 67 8b 04 bd dd 00 60 mov    0x6000dd(,%edi,4),%eax

4000bc: 00

4000bd: 89 c3                       mov    %eax,%ebx

# start_loop and loop_exit have been replaced by addresses as well

00000000004000bf <start_loop>:

4000bf: 83 f8 00                           cmp    $0x0,%eax

4000c2: 74 12                       je     4000d6 <loop_exit>

4000c4: ff c7                              inc    %edi

4000c6: 67 8b 04 bd dd 00 60 mov    0x6000dd(,%edi,4),%eax

4000cd: 00

4000ce: 39 d8                       cmp    %ebx,%eax

4000d0: 7e ed                       jle    4000bf <start_loop>

4000d2: 89 c3                       mov    %eax,%ebx

4000d4: eb e9                       jmp    4000bf <start_loop>

00000000004000d6 <loop_exit>:

4000d6: b8 01 00 00 00                     mov    $0x1,%eax

4000db: cd 80                       int    $0x80
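The jump instructions in the listing do not store the absolute target. Each 2-byte short jump holds a signed 8-bit displacement that is added to the address of the following instruction, and objdump prints the resulting address. A small Python sketch of that arithmetic, with a made-up helper name rel8_target and the bytes copied from the disassembly above:

```python
def rel8_target(insn_addr, insn_bytes):
    """Target of a 2-byte short jump: next instruction address + signed 8-bit displacement."""
    displacement = int.from_bytes(insn_bytes[1:2], "little", signed=True)
    return insn_addr + len(insn_bytes) + displacement


# (address, encoded bytes, mnemonic) taken from the objdump listing.
JUMPS = [
    (0x4000c2, bytes.fromhex("7412"), "je  loop_exit"),
    (0x4000d0, bytes.fromhex("7eed"), "jle start_loop"),
    (0x4000d4, bytes.fromhex("ebe9"), "jmp start_loop"),
]

for addr, encoded, text in JUMPS:
    print(f"{addr:x}: {text} -> {rel8_target(addr, encoded):x}")
```

The two backward jumps have displacement bytes 0xed and 0xe9, which are negative when read as signed values; that is why they land on start_loop at 0x4000bf.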

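4.3 Binary contents of the max file and how they map to the structures above

Command: xxd -g 1 max

Dumps the whole file, starting at offset 0 by default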

0000000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  .ELF............ # ELF header

0000010: 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00  ..>.......@..... # offset: 0

0000020: 40 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00  @...............

0000030: 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00  ....@.8...@..... # length: 64 B

0000040: 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00  ................ # program headers

0000050: 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00  ..@.......@..... # offset: 0x40

0000060: dd 00 00 00 00 00 00 00 dd 00 00 00 00 00 00 00  ................ # length: 56 B x 2

0000070: 00 00 20 00 00 00 00 00 01 00 00 00 06 00 00 00  .. .............

0000080: dd 00 00 00 00 00 00 00 dd 00 60 00 00 00 00 00  ..........`.....

0000090: dd 00 60 00 00 00 00 00 38 00 00 00 00 00 00 00  ..`.....8.......

00000a0: 38 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00  8......... .....

00000b0: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83  .....g.....`.... # code section (.text)

00000c0: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8  ..t...g.....`.9. # offset: 0xb0

00000d0: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 48 00 00  ~............H.. # length: 0x2d bytes

00000e0: 00 45 00 00 00 4c 00 00 00 4c 00 00 00 4f 00 00  .E...L...L...O.. # data section (.data)

00000f0: 00 5f 00 00 00 57 00 00 00 4f 00 00 00 52 00 00  ._...W...O...R.. # offset: 0xdd

0000100: 00 4c 00 00 00 44 00 00 00 21 00 00 00 21 00 00  .L...D...!...!.. # length: 0x38 bytes

0000110: 00 00 00 00 00 00 2e 73 79 6d 74 61 62 00 2e 73  .......symtab..s # section name table (.shstrtab)

0000120: 74 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00  trtab..shstrtab. # offset: 0x115

0000130: 2e 74 65 78 74 00 2e 64 61 74 61 00 00 00 00 00  .text..data..... # length: 0x27

0000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................ # symbol table (.symtab)

0000150: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00  ................ # 11 entries x 24 bytes

0000160: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@............. # holds the addresses of the symbols labeled below

0000170: 00 00 00 00 03 00 02 00 dd 00 60 00 00 00 00 00  ..........`..... # offset: 0x140

0000180: 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff  ................ # length: 0x108

0000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00001a0: 07 00 00 00 00 00 02 00 dd 00 60 00 00 00 00 00  ..........`..... #data_items

00001b0: 00 00 00 00 00 00 00 00 12 00 00 00 00 00 01 00  ................ #start_loop

00001c0: bf 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............

00001d0: 1d 00 00 00 00 00 01 00 d6 00 40 00 00 00 00 00  ..........@..... #loop_exit

00001e0: 00 00 00 00 00 00 00 00 27 00 00 00 10 00 01 00  ........'....... #_start

00001f0: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............

0000200: 2e 00 00 00 10 00 02 00 15 01 60 00 00 00 00 00  ..........`..... #__bss_start

0000210: 00 00 00 00 00 00 00 00 3a 00 00 00 10 00 02 00  ........:.......

0000220: 15 01 60 00 00 00 00 00 00 00 00 00 00 00 00 00  ..`............. #_edata

0000230: 41 00 00 00 10 00 02 00 18 01 60 00 00 00 00 00  A.........`..... #_end

0000240: 00 00 00 00 00 00 00 00 00 6d 61 78 2e 6f 00 64  .........max.o.d # string table (.strtab)

0000250: 61 74 61 5f 69 74 65 6d 73 00 73 74 61 72 74 5f  ata_items.start_ # offset: 0x248

0000260: 6c 6f 6f 70 00 6c 6f 6f 70 5f 65 78 69 74 00 5f  loop.loop_exit._ # length: 0x46

0000270: 73 74 61 72 74 00 5f 5f 62 73 73 5f 73 74 61 72  start.__bss_star

0000280: 74 00 5f 65 64 61 74 61 00 5f 65 6e 64 00 00 00  t._edata._end...

0000290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................ # section headers

00002a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................ # offset: 0x290

00002b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................ # 64 B x 6

00002c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00002d0: 1b 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00  ................

00002e0: b0 00 40 00 00 00 00 00 b0 00 00 00 00 00 00 00  ..@............. #.text

00002f0: 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  -...............

0000300: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000310: 21 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00  !...............

0000320: dd 00 60 00 00 00 00 00 dd 00 00 00 00 00 00 00  ..`.............

0000330: 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  8............... #.data

0000340: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000350: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................

0000360: 00 00 00 00 00 00 00 00 15 01 00 00 00 00 00 00  ................

0000370: 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  '............... #.shstrtab

0000380: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000390: 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  ................

00003a0: 00 00 00 00 00 00 00 00 40 01 00 00 00 00 00 00  ........@.......

00003b0: 08 01 00 00 00 00 00 00 05 00 00 00 07 00 00 00  ................ #.symtab

00003c0: 08 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................

00003d0: 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................

00003e0: 00 00 00 00 00 00 00 00 48 02 00 00 00 00 00 00  ........H....... #.strtab

00003f0: 46 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  F...............

0000400: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
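The two program headers at offsets 0x40 to 0xaf of the dump are what a loader uses to place the segments. The sketch below decodes them and maps a virtual address back to a file offset. It is a hedged illustration: the bytes are copied from the dump above, the field names follow Elf64_Phdr from the System V ABI, and the helper names parse_program_headers and vaddr_to_offset are invented for this example.

```python
import struct

# Bytes 0x40..0xaf of max (two program headers), copied from the xxd dump.
PROGRAM_HEADERS = bytes.fromhex(
    "01000000050000000000000000000000"
    "00004000000000000000400000000000"
    "dd00000000000000dd00000000000000"
    "00002000000000000100000006000000"
    "dd00000000000000dd00600000000000"
    "dd006000000000003800000000000000"
    "38000000000000000000200000000000"
)

PHDR_FORMAT = "<IIQQQQQQ"  # Elf64_Phdr, little-endian, 56 bytes
PHDR_FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr",
               "p_paddr", "p_filesz", "p_memsz", "p_align")


def parse_program_headers(raw):
    """Split raw into 56-byte entries and unpack each into a dict."""
    size = struct.calcsize(PHDR_FORMAT)
    return [dict(zip(PHDR_FIELDS, struct.unpack_from(PHDR_FORMAT, raw, pos)))
            for pos in range(0, len(raw), size)]


def vaddr_to_offset(segments, vaddr):
    """Translate a virtual address into a file offset via the segment that holds it."""
    for seg in segments:
        if seg["p_vaddr"] <= vaddr < seg["p_vaddr"] + seg["p_filesz"]:
            return seg["p_offset"] + (vaddr - seg["p_vaddr"])
    raise ValueError(f"address {vaddr:#x} is not backed by any segment")


segments = parse_program_headers(PROGRAM_HEADERS)
for seg in segments:
    # For a loadable segment the address and the file offset agree modulo p_align.
    congruent = seg["p_vaddr"] % seg["p_align"] == seg["p_offset"] % seg["p_align"]
    print(hex(seg["p_vaddr"]), hex(seg["p_offset"]), hex(seg["p_filesz"]), congruent)

print(hex(vaddr_to_offset(segments, 0x4000b0)))  # _start
print(hex(vaddr_to_offset(segments, 0x6000dd)))  # data_items
```

The congruence check explains the odd-looking data address 0x6000dd: with an alignment of 0x200000, the data keeps its file offset 0xdd as the low bits of its virtual address.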

5. Relationship diagrams

5.1 Relationships inside the EXE file

Note: an arrow does not necessarily indicate ordering

5.2 Structure of the file

ELF header : 64B

program headers : 56B x 2

.text : 45B

.data : 56B

.shstrtab : 39B

.symtab : 24B x 11

.strtab : 70B

section headers : 64B x 6
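The sizes in this list add up to the file offsets reported by readelf once alignment is taken into account. The following Python sketch recomputes the offsets; the part sizes come from the list above, while the alignment of 8 for .symtab is taken from its Align column and the alignment of 8 for the section header table is an assumption that happens to match the observed offset 0x290. The names PARTS, align_up and layout are made up for illustration.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# (name, size in bytes, alignment) in file order.
PARTS = [
    ("ELF header", 64, 1),
    ("program headers", 56 * 2, 1),
    (".text", 45, 1),
    (".data", 56, 1),
    (".shstrtab", 39, 1),
    (".symtab", 24 * 11, 8),
    (".strtab", 70, 1),
    ("section headers", 64 * 6, 8),  # alignment assumed, see the text above
]


def layout(parts):
    """Return ({name: file offset}, total file size) for parts laid out back to back."""
    offsets = {}
    position = 0
    for name, size, alignment in parts:
        position = align_up(position, alignment)
        offsets[name] = position
        position += size
    return offsets, position


offsets, file_size = layout(PARTS)
for name, offset in offsets.items():
    print(f"{name:16} {offset:#06x}")
print("file size:", hex(file_size))
```

The two places where padding appears are before .symtab (0x13c rounded up to 0x140) and before the section headers (0x28e rounded up to 0x290).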

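6. Converting between EXE and BIN files

6.1 Extracting the code and data sections

To convert an EXE file that carries an executable file header and debugging information into a raw binary file, use the following command:

objcopy -O binary -R .note -R .comment max max_copy

This writes max out as a raw binary file saved in max_copy, dropping the .note and .comment sections.

6.2 Viewing the code section

Command: xxd -g 1 -l 256 max_copy

Shows the first 256 bytes

The code section appears at the start: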

0000000: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83  .....g.....`....

0000010: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8  ..t...g.....`.9.

0000020: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 00 00 00  ~...............

0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

00000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

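6.3 Viewing the data section

Command: xxd -g 1 -l 256 -s 0x20002d max_copy

-s sets the starting offset, so the dump begins at 0x20002d (= data section load address 0x6000dd - code section load address 0x4000b0). -g 1 groups the hex output one byte at a time, and -l 256 limits the dump to 256 bytes.

This gives the data section: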

020002d: 48 00 00 00 45 00 00 00 4c 00 00 00 4c 00 00 00  H...E...L...L...

020003d: 4f 00 00 00 5f 00 00 00 57 00 00 00 4f 00 00 00  O..._...W...O...

020004d: 52 00 00 00 4c 00 00 00 44 00 00 00 21 00 00 00  R...L...D...!...

020005d: 21 00 00 00 00 00 00 00                          !.......

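As the dumps show, max_copy contains only the code section and the data section, with the code section at the very start of the file. The gap of roughly 2 MB between their load addresses is filled with zero bytes.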
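The offsets seen in max_copy follow from the load addresses alone, because objcopy -O binary writes a memory image that starts at the lowest load address among the copied sections. A minimal Python sketch of that arithmetic, using the addresses and sizes from the readelf output; the constant and function names are made up for illustration:

```python
import struct

# Load addresses and sizes from the section headers of max.
TEXT_ADDR, TEXT_SIZE = 0x4000b0, 0x2d
DATA_ADDR, DATA_SIZE = 0x6000dd, 0x38


def raw_offset(vaddr, base=TEXT_ADDR):
    """Offset of a load address inside the raw image, which starts at base."""
    return vaddr - base


data_offset = raw_offset(DATA_ADDR)
raw_size = data_offset + DATA_SIZE
padding = data_offset - TEXT_SIZE

# The .long items are stored as 4-byte little-endian words, low byte first.
data_bytes = struct.pack("<14I", *[ord(c) for c in "HELLO_WORLD!!"], 0)

print("data starts at", hex(data_offset))
print("raw image size", hex(raw_size), "bytes, zero padding", hex(padding))
print(data_bytes[:8].hex(" "))
```

The first eight bytes printed match the start of the data dump above: each character occupies the lowest byte of its word, which is the little-endian layout that the .long directive was chosen to make visible.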
