When compiling a native application, developers can choose among different compiler flags and optimization levels, and the choice depends on the requirements of the build. For example, if the binary is intended for final release, the flags and optimization settings should target execution speed and efficiency. If instead the application is built for debugging, debug flags should be enabled, usually with little or no code optimization. This information, however, cannot be easily extracted from a compiled binary. Yet when comparing binaries, it is particularly important that they were built with the same compiler and compilation flags; otherwise the analysis may be inaccurate or unreliable. Unfortunately, determining which flags and optimizations were used requires deep knowledge of both the target architecture and the compiler. In this study, we present two deep learning models that detect the compiler and the optimization level of a compiled binary. We study the optimization levels O0, O1, O2, O3, and Os on the x86_64, AArch64, RISC-V, SPARC, PowerPC, MIPS, and ARM architectures. For x86_64 and AArch64, we additionally determine whether the compiler is GCC or Clang. We created a dataset of more than 76,000 binaries and used it for training. Our experiments show an accuracy of over 99.95% in detecting the compiler and, depending on the architecture, between 92% and 98% in detecting the optimization level. Furthermore, we analyze how accuracy changes when the amount of data is extremely limited. Our study shows that it is possible to accurately detect both compiler flag settings and optimization levels with function-level granularity.