SIGSEGV (Segmentation Fault) Caused by Allocating an Oversized Array

I. Background

While coding today, I ran into a segmentation fault.

-> % ./a.out 9000000
the size is: 0x895440
[2]    10558 segmentation fault (core dumped)  ./a.out 9000000

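Tracing the program with print statements showed that the fault occurred where the array is defined. The code looked correct to me, so I ran it under gdb; the session is shown below: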

(gdb) r 2304098328304234802342
Starting program: /home/signal/a.out 2304098328304234802342
the size is: 0x7fffffff
Program received signal SIGSEGV, Segmentation fault.
0x08048512 in main (argc=2, argv=0xbffff634) at sigsegv.c:15
15      bzero(test, sizeof(test));
(gdb) s
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit

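So I decided to test this signal specifically: SIGSEGV.

II. Locating the Problem

1. Test program

Having a rough idea that the fault was caused by the array requesting too much memory, I wrote a small test program: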

#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <strings.h>  /* bzero */

int main(int argc, char *argv[])
{
    int size;

    if (argc != 2) {
        printf("Usage: %s [size]\n", argv[0]);
        return -1;
    }

    size = atoi(argv[1]);
    printf("the size is: 0x%x\n", size);

    /* Variable-length array: its storage is taken from the stack. */
    char test[size];
    bzero(test, sizeof(test));

    return 0;
}
The results are as follows:

-> % ./a.out 9000000
the size is: 0x895440
[2]    10558 segmentation fault (core dumped)  ./a.out 9000000
-> % ./a.out 8000000
the size is: 0x7a1200

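As shown, the segmentation fault appears once the requested size exceeds some threshold: 8000000 bytes works, while 9000000 bytes crashes.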
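The boundary between 8000000 and 9000000 is consistent with a stack size limit of 8 MiB (8388608 bytes), a common Linux default, although the post does not show the actual ulimit -s value on this machine. A minimal sketch for checking this from C, assuming a POSIX system; the helper names stack_soft_limit and fits_under_limit, and the idea of keeping a reserve, are illustrative and not part of the original program:

```c
#include <sys/resource.h>

/* Hypothetical helper: returns the soft stack limit in bytes,
 * or 0 if the limit is unlimited or cannot be queried. */
unsigned long long stack_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/* Hypothetical helper: returns 1 if a buffer of `size` bytes still leaves
 * `reserve` bytes of headroom under `limit`, otherwise 0.
 * A limit of 0 is treated as "unlimited or unknown". */
int fits_under_limit(unsigned long long size, unsigned long long limit,
                     unsigned long long reserve)
{
    if (limit == 0)
        return 1;   /* nothing to compare against */
    if (size > limit)
        return 0;
    return (limit - size) >= reserve;
}
```

Calling fits_under_limit(size, stack_soft_limit(), 65536) before declaring test[size] would let the program reject a request such as 9000000 instead of crashing. The 64 KiB reserve is an arbitrary allowance for the frames already on the stack.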

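2. Debugging the core file with gdb

Running the program under gdb prints the error information shown earlier.

Once the core file size limit has been set with ulimit -c, a core file is generated when the program crashes, and it can be examined with gdb as follows: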
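The same limit can also be raised from inside a program. The sketch below assumes a POSIX system; enable_core_dumps is a hypothetical helper name, and whether a core file is actually written still depends on system settings that the post does not cover:

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit, roughly what `ulimit -c unlimited` does in the shell when the
 * hard limit is unlimited. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling enable_core_dumps() at the start of main would have a similar effect to running ulimit -c in the shell before launching ./a.out.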

-> % gdb -c core ./a.out 
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
[New LWP 11075]
Core was generated by `./a.out 9000000'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x08048512 in main (argc=2, argv=0xbf9a6584) at sigsegv.c:15
15      bzero(test, sizeof(test));
(gdb) s
The program is not being run.
(gdb) bt
#0  0x08048512 in main (argc=2, argv=0xbf9a6584) at sigsegv.c:15
(gdb)

3. Tracing system calls with strace

Tracing the system calls with strace prints the following:

-> % strace ./a.out 9000000
execve("./a.out", ["./a.out", "9000000"], [/* 63 vars */]) = 0
brk(0)                                  = 0x8156000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7752000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=90693, ...}) = 0
mmap2(NULL, 90693, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb773b000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\233\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1754876, ...}) = 0
mmap2(NULL, 1759868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb758d000
mmap2(0xb7735000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a8000) = 0xb7735000
mmap2(0xb7738000, 10876, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7738000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb758c000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb758c940, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7735000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0xb7778000, 4096, PROT_READ)   = 0
munmap(0xb773b000, 90693)               = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 10), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7751000
write(1, "1the size is: 0x895440\n", 231the size is: 0x895440
) = 23
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xbf35fc54} ---
+++ killed by SIGSEGV (core dumped) +++
[2]    11100 segmentation fault (core dumped)  strace ./a.out 9000000

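This confirms the diagnosis: the process is killed by SIGSEGV immediately after printing the size, and si_code=SEGV_MAPERR indicates an access to an address that is not mapped.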

III. Analysis and Solution

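SIGSEGV indicates that the process made an invalid memory reference (usually a sign of a bug in the program, such as dereferencing an uninitialized pointer). The name SEGV stands for "segmentation violation".
The default action for SIGSEGV is to terminate the process and generate a core file.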
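
One common way to avoid this crash is to take large buffers from the heap rather than the stack. The sketch below is an assumption about how the test program could be changed, not the author's own fix; parse_size and alloc_zeroed are hypothetical helper names. It also swaps atoi for strtoull with range checks, since atoi cannot report overflow (the gdb run with a 22-digit argument printed a size of 0x7fffffff):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal byte count from `s` into *out.
 * Returns 0 on success, -1 on malformed, negative, or out-of-range input. */
int parse_size(const char *s, size_t *out)
{
    char *end;
    unsigned long long v;

    /* Require a leading digit so that signs and whitespace are rejected. */
    if (s == NULL || *s < '0' || *s > '9')
        return -1;

    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v > SIZE_MAX)
        return -1;

    *out = (size_t)v;
    return 0;
}

/* Hypothetical helper: heap-backed, zero-filled replacement for
 *     char test[size]; bzero(test, sizeof(test));
 * Returns NULL if the allocation fails; the caller must free() the result. */
char *alloc_zeroed(size_t size)
{
    return calloc(size, 1);
}
```

In main, the array definition and the bzero call would be replaced by a call to alloc_zeroed(size), a NULL check, and a matching free(). An allocation failure then shows up as a NULL return that the program can handle, rather than as a signal.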