AOT cross-compiler produces different binaries than native AOT compiler #21548

@klasyc

Description

I am deploying a C#/.NET application to an ARM-based single-board computer whose system image is built with Yocto. Because I want to use AOT to speed up the application, I compiled mono (version 6.8.0.123) twice: once for ARM as a native interpreter and once for x86_64 as an ARM cross-compiler. Both compilers seem to work, but the cross-compiler gives me faulty binaries.

Steps to reproduce

I am using mono version 6.8.0.123. The configuration and build details for my two compilers are attached:

  • Native ARM interpreter (build details here: mono-native-arm-configure-compile.zip )
  • x86_64 cross-compiler (build details here: mono-cross-x64-configure-compile.zip )
  • For the cross-compiler, I had to supply the offsets file, which I generated (after some problems) using the offsets-tool. I ran the tool both on the host machine (using the ARM cross-compilers) and on the target machine (using the native compilers). Both methods produced exactly the same file, which is also included in the build details above.

    My minimal example application is below. I compiled it with the native ARM compiler using csc AotHelloWorld.cs.

    using System;
    namespace AotHelloWorld {
    	class Program {
    		static void Main(string[] args) {
    			Console.WriteLine("Hello AOT world!");
    		}
    	}
    }

    The application runs normally without AOT (using the JIT compiler). Then I compile the resulting AotHelloWorld.exe with each of my AOT compilers:

  • Native (ARM):
    root@beaglebone-lcd:/usr/lib/edaui# mono --aot=tool-prefix=arm-poky-linux-gnueabi- -O=all /usr/lib/edaui/AotHelloWorld.exe
    Mono Ahead of Time compiler - compiling assembly /usr/lib/edaui/AotHelloWorld.exe
    AOTID C71F3454-A3F7-C309-382A-48A0EF1A821C
    Compiled: 2/2
    Executing the native assembler: "arm-poky-linux-gnueabi-as"   -mfpu=vfp3 -o /tmp/mono_aot_Pqu6DF.o /tmp/mono_aot_Pqu6DF
    Executing the native linker: arm-poky-linux-gnueabi-gcc --shared -Wl,-Bsymbolic -o /usr/lib/edaui/AotHelloWorld.exe.so.tmp  /tmp/mono_aot_Pqu6DF.o
    Stripping the binary: "arm-poky-linux-gnueabi-strip" --strip-symbol=\$a --strip-symbol=\$d /usr/lib/edaui/AotHelloWorld.exe.so.tmp
    JIT time: 3 ms, Generation time: 2 ms, Assembly+Link time: 292 ms.
    
  • ARM Cross-compiler (x86_64):
    ubuntu@docker-desktop:~/yocto/yocto-image-eda/build/tmp/work/beaglebone_lcd-poky-linux-gnueabi/edaui/1.0.0-r0/edaui$ /home/ubuntu/yocto/yocto-image-eda/build/tmp/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-mono-sgen --aot=tool-prefix=/home/ubuntu/yocto/yocto-image-eda/build/tmp/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-,ld-flags=--sysroot=/home/ubuntu/yocto/yocto-image-eda/build/tmp/sysroots/beaglebone-lcd /home/ubuntu/yocto/yocto-image-eda/build/tmp/work/beaglebone_lcd-poky-linux-gnueabi/edaui/1.0.0-r0/edaui/AotHelloWorld.exe
    Mono Ahead of Time compiler - compiling assembly /home/ubuntu/yocto/yocto-image-eda/build/tmp/work/beaglebone_lcd-poky-linux-gnueabi/edaui/1.0.0-r0/edaui/AotHelloWorld.exe
    AOTID 0AF8005A-F850-B04E-BB92-8FDFD63B1162
    Compiled: 2/2
    Executing the native assembler: "/home/ubuntu/yocto/yocto-image-eda/build/tmp/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-as"   -mfpu=vfp3 -o /tmp/mono_aot_vftE8f.o /tmp/mono_aot_vftE8f
    Executing the native linker: /home/ubuntu/yocto/yocto-image-eda/build/tmp/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-gcc --shared -Wl,-Bsymbolic -o /home/ubuntu/yocto/yocto-image-eda/build/tmp/work/beaglebone_lcd-poky-linux-gnueabi/edaui/1.0.0-r0/edaui/AotHelloWorld.exe.so.tmp  /tmp/mono_aot_vftE8f.o --sysroot=/home/ubuntu/yocto/yocto-image-eda/build/tmp/sysroots/beaglebone-lcd
    Stripping the binary: "/home/ubuntu/yocto/yocto-image-eda/build/tmp/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-strip" --strip-symbol=\$a --strip-symbol=\$d /home/ubuntu/yocto/yocto-image-eda/build/tmp/work/beaglebone_lcd-poky-linux-gnueabi/edaui/1.0.0-r0/edaui/AotHelloWorld.exe.so.tmp
    JIT time: 2 ms, Generation time: 1 ms, Assembly+Link time: 65 ms.
    

    Current Behavior

    The two compilers give different results: the output binaries differ in size.

    For this test application both binaries work, but for my real application only the natively compiled binary works. When I run my application with the cross-compiled binary, it does not crash, but it fails to connect to my device. Since the output logs are huge, I made this test application to simplify things.

    For further diagnosis I added the asmonly command-line switch to both compilers. Now I have two different assembler files:

  • Native AOT compiler output: AotHelloWorld.exe.s-native.zip
  • Cross-compiler output: AotHelloWorld.exe.s-cross.zip
  • As you can see here, there are quite a lot of differences.
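The two assembler listings can also be compared line by line in a script. The following is only a rough sketch in Python using the standard difflib module; the function name `changed_lines` and the file paths given on the command line are hypothetical and not part of mono or of the attachments above.

```python
import difflib
import sys


def changed_lines(native_lines, cross_lines):
    """Return (only_in_native, only_in_cross) as two lists of lines.

    Uses difflib.ndiff, which prefixes each output line with '- ' (only in
    the first sequence), '+ ' (only in the second), '  ' (common) or '? '
    (intraline hint). Only the first two kinds are collected.
    """
    only_native, only_cross = [], []
    for line in difflib.ndiff(native_lines, cross_lines):
        if line.startswith("- "):
            only_native.append(line[2:])
        elif line.startswith("+ "):
            only_cross.append(line[2:])
    return only_native, only_cross


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: pass the two unpacked .s files as arguments.
    with open(sys.argv[1]) as f:
        native = f.read().splitlines()
    with open(sys.argv[2]) as f:
        cross = f.read().splitlines()
    removed, added = changed_lines(native, cross)
    print(f"{len(removed)} lines only in native, {len(added)} lines only in cross")
    for line in removed[:20]:
        print("native:", line)
    for line in added[:20]:
        print("cross: ", line)
```

Lines that differ only because of temporary file names or the AOTID would still show up here, so the output is a starting point for inspection rather than a verdict.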

    Expected Behavior

    I expect both compilers to produce exactly the same binaries.

    On which platforms did you notice this

    [ ] macOS
    [x] Linux
    [ ] Windows

    Version Used:

  • Native:
    Mono JIT compiler version 6.8.0.123 (tarball Wed Jul 13 19:11:14 UTC 2022)
          TLS:           __thread
          SIGSEGV:       normal
          Notifications: epoll
          Architecture:  armel,vfp+hard
          Disabled:      none
          Misc:          softdebug
          Interpreter:   yes
          LLVM:          supported, not enabled.
          Suspend:       preemptive
          GC:            sgen (concurrent by default)
    
  • Cross-compiler:
    Mono JIT compiler version 6.8.0.123 (tarball Wed Aug 31 12:39:47 UTC 2022)
          SIGSEGV:       normal
          Notifications: epoll
          Architecture:  armel,vfp
          Disabled:      none
          Misc:          softdebug
          Interpreter:   yes
          LLVM:          supported, not enabled.
          Suspend:       hybrid
          GC:            sgen (concurrent by default)
    

    I think the problem must be in wrong configure options (see the difference in the target architecture in the version output above: armel,vfp+hard for the native build versus armel,vfp for the cross-compiler), but I have no idea what I should change. Any ideas?
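To make the differences between the two version outputs above explicit, here is a minimal Python sketch that parses both outputs and lists the fields that do not match. The helper names `parse_version_fields` and `diff_fields` are made up for this illustration; the two strings are copied from the outputs above.

```python
NATIVE = """Mono JIT compiler version 6.8.0.123 (tarball Wed Jul 13 19:11:14 UTC 2022)
      TLS:           __thread
      SIGSEGV:       normal
      Notifications: epoll
      Architecture:  armel,vfp+hard
      Disabled:      none
      Misc:          softdebug
      Interpreter:   yes
      LLVM:          supported, not enabled.
      Suspend:       preemptive
      GC:            sgen (concurrent by default)
"""

CROSS = """Mono JIT compiler version 6.8.0.123 (tarball Wed Aug 31 12:39:47 UTC 2022)
      SIGSEGV:       normal
      Notifications: epoll
      Architecture:  armel,vfp
      Disabled:      none
      Misc:          softdebug
      Interpreter:   yes
      LLVM:          supported, not enabled.
      Suspend:       hybrid
      GC:            sgen (concurrent by default)
"""


def parse_version_fields(text):
    """Parse the indented 'Key: value' lines of the version output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Only indented lines are fields; this skips the banner line,
        # which also contains colons (in its timestamp).
        if sep and key.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    return fields


def diff_fields(native, cross):
    """Return {key: (native_value, cross_value)} for every field that differs."""
    diffs = {}
    for key in sorted(set(native) | set(cross)):
        a, b = native.get(key), cross.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


if __name__ == "__main__":
    for key, (a, b) in diff_fields(parse_version_fields(NATIVE),
                                   parse_version_fields(CROSS)).items():
        print(f"{key}: native={a!r} cross={b!r}")
```

For the two outputs above, the differing fields are Architecture (armel,vfp+hard versus armel,vfp), Suspend (preemptive versus hybrid) and TLS (reported only by the native build).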
