【转载】art gc相关死锁黑屏问题总结_waitingforgcthreadflip_seiyaaa的博客

art gc相关死锁黑屏问题总结 | wwm的笔记

从测试同学那边拿过来两台黑屏的机器，点击电源、屏幕都没有反应，不过还好adb可以链接，有root权限，一番deubgging后，确认表面原因是system_server进程的art虚拟机卡在了gc前flip线程状态的操作中，而最终的root cause也比较有意思，所以本文就简单记录下问题的排查过程。
1. 按照习惯，快速检查下是否发生了watchdog
$ adb shell ``ls` `-l ``/data/anr/
-rw------- 1 system system 53592 2019-05-03 14:21 anr_2019-05-03-14-21-53-368
-rw------- 1 system system 845398 2019-05-03 14:30 anr_2019-05-03-14-30-03-498
-rw------- 1 system system 1078028 2019-05-03 14:58 anr_2019-05-03-14-58-40-505
-rw------- 1 system system 1060669 2019-05-03 14:59 anr_2019-05-03-14-59-00-410
-rw------- 1 system system 527393 2019-05-03 17:32 anr_2019-05-03-17-32-22-852
-rw------- 1 system system 619826 2019-05-03 17:33 anr_2019-05-03-17-32-58-212
// ...
并没有watchdog相关日志 
2. 查看系统当前时间 
 `$ adb shell ``date``Sun May  5 13:52:25 CST 2019`
3. 查看system_server进程各个线程的状态 
 $ pid=`adb shell pidof system_server` && adb shell ``ps` `-T $pid
USER           PID   TID  PPID     VSZ    RSS WCHAN            ADDR S CMD           
system        1471  1471   635 11629888 168336 futex_wait_queue_me 0 S system_server
system        1471  1477   635 11629888 168336 do_sigtimedwait    0 S Signal Catcher
system        1471  1478   635 11629888 168336 futex_wait_queue_me 0 S ADB-JDWP Connec
system        1471  1479   635 11629888 168336 futex_wait_queue_me 0 S Binder:filter-p
system        1471  1480   635 11629888 168336 poll_schedule_timeout 0 S Binder:``read``-per
system        1471  1481   635 11629888 168336 futex_wait_queue_me 0 S ReferenceQueueD
system        1471  1482   635 11629888 168336 futex_wait_queue_me 0 S FinalizerDaemon
system        1471  1483   635 11629888 168336 futex_wait_queue_me 0 S FinalizerWatchd
system        1471  1484   635 11629888 168336 futex_wait_queue_me 0 S HeapTaskDaemon
system        1471  1524   635 11629888 168336 futex_wait_queue_me 0 S Binder:1471_1
system        1471  1525   635 11629888 168336 futex_wait_queue_me 0 S Binder:1471_2
system        1471  1673   635 11629888 168336 futex_wait_queue_me 0 S android.``bg
system        1471  1675   635 11629888 168336 futex_wait_queue_me 0 S ActivityManager
system        1471  1676   635 11629888 168336 SyS_epoll_wait     0 S android.ui
system        1471  1677   635 11629888 168336 SyS_epoll_wait     0 S ActivityManager
system        1471  1678   635 11629888 168336 SyS_epoll_wait     0 S ActivityManager
system        1471  1684   635 11629888 168336 futex_wait_queue_me 0 S batterystats-wo
system        1471  1687   635 11629888 168336 wait_woken         0 S FileObserver
system        1471  1688   635 11629888 168336 futex_wait_queue_me 0 S android.``fg
system        1471  1689   635 11629888 168336 futex_wait_queue_me 0 S android.io
system        1471  1690   635 11629888 168336 futex_wait_queue_me 0 S android.display
system        1471  1691   635 11629888 168336 futex_wait_queue_me 0 S CpuTracker
system        1471  1692   635 11629888 168336 futex_wait_queue_me 0 S PowerManagerSer
system        1471  1693   635 11629888 168336 futex_wait_queue_me 0 S BatteryStats_wa
system        1471  1694   635 11629888 168336 SyS_epoll_wait     0 S work-thread
system        1471  1695   635 11629888 168336 SyS_epoll_wait     0 S PackageManager
system        1471  1696   635 11629888 168336 SyS_epoll_wait     0 S PackageManager
system        1471  1817   635 11629888 168336 SyS_epoll_wait     0 S PackageInstalle
system        1471  1821   635 11629888 168336 SyS_epoll_wait     0 S android.anim
system        1471  1822   635 11629888 168336 SyS_epoll_wait     0 S android.anim.lf
这一步主要是快速检查下有没有D（disk blocked）状态的线程 
3. 查看system_server进程backtrace 
 $ pid=`adb shell pidof system_server` && adb shell ``kill` `-3 $pid
点击下载完整backtrace!system_server_1471堆栈1471.txt-Android文档类资源-CSDN下载 
   
 检查各个线程在虚拟机内部的状态后，看到一个比较奇怪的现象： 
 "Binder:filter-perf-event"` `prio=``5` `tid=``4` `WaitingForGcThreadFlip
"HeapTaskDaemon"` `daemon prio=``5` `tid=``9` `WaitingForGcThreadFlip
"PowerManagerService"` `prio=``5` `tid=``24` `WaitingForGcThreadFlip
"HwBinder:1471_1"` `prio=``5` `tid=``35` `WaitingForGcThreadFlip
"WifiStateMachine"` `prio=``5` `tid=``56` `WaitingForGcThreadFlip
"Binder:1471_3"` `prio=``5` `tid=``92` `WaitingForGcThreadFlip
"Binder:1471_5"` `prio=``5` `tid=``105` `WaitingForGcThreadFlip
"Binder:1471_7"` `prio=``5` `tid=``107` `WaitingForGcThreadFlip
"Binder:1471_A"` `prio=``5` `tid=``120` `WaitingForGcThreadFlip
"Binder:1471_D"` `prio=``5` `tid=``123` `WaitingForGcThreadFlip
"Binder:1471_E"` `prio=``5` `tid=``124` `WaitingForGcThreadFlip
"Binder:1471_11"` `prio=``5` `tid=``127` `WaitingForGcThreadFlip
"Binder:1471_17"` `prio=``5` `tid=``133` `WaitingForGcThreadFlip
"Binder:1471_18"` `prio=``5` `tid=``134` `WaitingForGcThreadFlip
"Binder:1471_19"` `prio=``5` `tid=``135` `WaitingForGcThreadFlip
"HwBinder:1471_4"` `prio=``5` `tid=``12` `WaitingForGcThreadFlip
"Binder:1471_20"` `prio=``5` `tid=``146` `WaitingForGcThreadFlip
"HwBinder:1471_5"` `prio=``5` `tid=``34` `WaitingForGcToComplete
"ADB-JDWP Connection Control Thread"` `daemon prio=``0` `tid=``5` `WaitingForGcToComplete
"android.io"` `prio=``5` `tid=``21` `WaitingForGcToComplete
"CpuTracker"` `prio=``5` `tid=``23` `WaitingForGcToComplete
"UEventObserver"` `prio=``5` `tid=``36` `WaitingForGcToComplete
"HwBinder:1471_2"` `prio=``5` `tid=``43` `WaitingForGcToComplete
"HwBinder:1471_3"` `prio=``5` `tid=``44` `WaitingForGcToComplete
"NetdConnector"` `prio=``5` `tid=``50` `WaitingForGcToComplete
"WifiService"` `prio=``5` `tid=``55` `WaitingForGcToComplete
"DeviceStorageMonitorService"` `prio=``5` `tid=``67` `WaitingForGcToComplete
"SecurityWriteHandlerThread"` `prio=``5` `tid=``72` `WaitingForGcToComplete
"watchdog"` `prio=``5` `tid=``101` `WaitingForGcToComplete
"HwBinder:1471_5"` `prio=``5` `tid=``34` `WaitingForGcToComplete
"Timer-20"` `prio=``5` `tid=``170` `WaitingForGcToComplete
"IpClient.wlan0"` `prio=``5` `tid=``176` `WaitingForGcToComplete
"OkHttp ConnectionPool"` `daemon prio=``5` `tid=``185` `WaitingForGcToComplete
"main"` `prio=``5` `tid=``1` `Blocked
"Binder:1471_1"` `prio=``5` `tid=``10` `Blocked
"Binder:1471_2"` `prio=``5` `tid=``11` `Blocked
"android.bg"` `prio=``5` `tid=``13` `Blocked
"ActivityManager"` `prio=``5` `tid=``14` `Blocked
"batterystats-worker"` `prio=``5` `tid=``18` `Blocked
"android.fg"` `prio=``5` `tid=``20` `Blocked
"android.display"` `prio=``5` `tid=``22` `Blocked
"AlarmManager"` `prio=``5` `tid=``40` `Blocked
"InputDispatcher"` `prio=``10` `tid=``46` `Blocked
"InputReader"` `prio=``10` `tid=``47` `Blocked
"NetworkPolicy.uid"` `prio=``5` `tid=``53` `Blocked
"Binder:1471_4"` `prio=``5` `tid=``93` `Blocked
"Binder:1471_6"` `prio=``5` `tid=``106` `Blocked
"Binder:1471_8"` `prio=``5` `tid=``108` `Blocked
"Binder:1471_9"` `prio=``5` `tid=``118` `Blocked
"Binder:1471_B"` `prio=``5` `tid=``121` `Blocked
"Binder:1471_C"` `prio=``5` `tid=``122` `Blocked
"Binder:1471_F"` `prio=``5` `tid=``125` `Blocked
"Binder:1471_10"` `prio=``5` `tid=``126` `Blocked
"Binder:1471_12"` `prio=``5` `tid=``128` `Blocked
"Binder:1471_13"` `prio=``5` `tid=``129` `Blocked
"Binder:1471_14"` `prio=``5` `tid=``130` `Blocked
"Binder:1471_15"` `prio=``5` `tid=``131` `Blocked
"Binder:1471_16"` `prio=``5` `tid=``132` `Blocked
"Binder:1471_1A"` `prio=``5` `tid=``136` `Blocked
"Binder:1471_1B"` `prio=``5` `tid=``140` `Blocked
"Binder:1471_1C"` `prio=``5` `tid=``117` `Blocked
"Binder:1471_1D"` `prio=``5` `tid=``141` `Blocked
"Binder:1471_1E"` `prio=``5` `tid=``144` `Blocked
"Binder:1471_1F"` `prio=``5` `tid=``145` `Blocked
"NetworkMonitorNetworkAgentInfo [WIFI () - 950]"` `prio=``5` `tid=``149` `Blocked
一堆线程都是gc相关的状态，具体什么意思，我也不关心，还是再继续看下blocked的线程，都具体卡在什么地方： 
 "Binder:1471_6" prio=5 tid=106 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14064288 self=0x721bdfb000
  | sysTid=2126 nice=0 cgrp=default sched=0/0 handle=0x72107ff4f0
  | state=S schedstat=( 5605933458050 5415448290382 12052578 ) utm=475916 stm=84677 core=5 HZ=100
  | stack=0x7210704000-0x7210706000 stackSize=1009KB
  | held mutexes=
  at com.android.server.net.NetworkPolicyManagerService.isUidNetworkingBlockedInternal(NetworkPolicyManagerService.java:4825)
  - waiting to lock <0x05e68352> (a java.lang.Object) held by thread 53
  at com.android.server.net.NetworkPolicyManagerService.access$3600(NetworkPolicyManagerService.java:291)
  at com.android.server.net.NetworkPolicyManagerService$NetworkPolicyManagerInternalImpl.isUidNetworkingBlocked(NetworkPolicyManagerService.java:4900)
  at com.android.server.ConnectivityService.isNetworkWithLinkPropertiesBlocked(ConnectivityService.java:1133)
  at com.android.server.ConnectivityService.filterNetworkStateForUid(ConnectivityService.java:1163)
  at com.android.server.ConnectivityService.getActiveNetworkInfo(ConnectivityService.java:1185)
  at android.net.IConnectivityManager$Stub.onTransact(IConnectivityManager.java:85)
  at android.os.Binder.execTransact(Binder.java:728)
"Binder:1471_8" prio=5 tid=108 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14064398 self=0x7219dd9000
  | sysTid=2168 nice=0 cgrp=default sched=0/0 handle=0x720fdff4f0
  | state=S schedstat=( 5601724562758 5464273391739 12340199 ) utm=473400 stm=86772 core=6 HZ=100
  | stack=0x720fd04000-0x720fd06000 stackSize=1009KB
  | held mutexes=
  at com.android.server.am.ActivityManagerService.activityPaused(ActivityManagerService.java:8527)
  - waiting to lock <0x00c6c1f5> (a com.android.server.am.ActivityManagerService) held by thread 146
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:225)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:3399)
  at android.os.Binder.execTransact(Binder.java:728)
"NetworkPolicy.uid" prio=5 tid=53 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x1405f790 self=0x72163fc000
  | sysTid=1881 nice=-2 cgrp=default sched=0/0 handle=0x721767d4f0
  | state=S schedstat=( 992941159886 2687927167537 3629569 ) utm=24231 stm=75063 core=5 HZ=100
  | stack=0x721757a000-0x721757c000 stackSize=1041KB
  | held mutexes=
  at com.android.server.am.ActivityManagerService$LocalService.notifyNetworkPolicyRulesUpdated(ActivityManagerService.java:27503)
  - waiting to lock <0x00c6c1f5> (a com.android.server.am.ActivityManagerService) held by thread 146
  at com.android.server.net.NetworkPolicyManagerService.handleUidChanged(NetworkPolicyManagerService.java:4461)
  - locked <0x05e68352> (a java.lang.Object)
  at com.android.server.net.NetworkPolicyManagerService$18.handleMessage(NetworkPolicyManagerService.java:4435)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:207)
  at android.os.HandlerThread.run(HandlerThread.java:65)
  at com.android.server.ServiceThread.run(ServiceThread.java:44)
接着看 0x05e68352 和 0x00c6c1f5 这俩monitor锁的持有线程： 
 "Binder:1471_20" prio=5 tid=146 WaitingForGcThreadFlip
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14147028 self=0x72126b5800
  | sysTid=18045 nice=-2 cgrp=default sched=0/0 handle=0x72049824f0
  | state=S schedstat=( 6101767900274 5905637387083 13051781 ) utm=516502 stm=93674 core=7 HZ=100
  | stack=0x7204887000-0x7204889000 stackSize=1009KB
  | held mutexes=
  kernel: __switch_to+0xa4/0xfc
  kernel: futex_wait_queue_me+0xc4/0x13c
  kernel: futex_wait+0xe4/0x204
  kernel: do_futex+0x168/0x80c
  kernel: SyS_futex+0x90/0x1b8
  kernel: __sys_trace+0x4c/0x4c
  native: #00 pc 000000000001ef2c  /system/lib64/libc.so (syscall+28)
  native: #01 pc 00000000000d855c  /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+148)
  native: #02 pc 00000000001f0680  /system/lib64/libart.so (art::gc::Heap::IncrementDisableThreadFlip(art::Thread*)+516)
  native: #03 pc 0000000000384fe4  /system/lib64/libart.so (art::JNI::GetStringCritical(_JNIEnv*, _jstring*, unsigned char*)+692)
  native: #04 pc 000000000012e268  /system/lib64/libandroid_runtime.so (android::android_os_Parcel_writeString(_JNIEnv*, _jclass*, long, _jstring*)+64)
  at android.os.Parcel.nativeWriteString(Native method)
  at android.os.Parcel$ReadWriteHelper.writeString(Parcel.java:369)
  at android.os.Parcel.writeString(Parcel.java:707)
  at android.content.pm.PackageItemInfo.writeToParcel(PackageItemInfo.java:652)
  at android.content.pm.ApplicationInfo.writeToParcel(ApplicationInfo.java:1513)
  at android.app.IApplicationThread$Stub$Proxy.bindApplication(IApplicationThread.java:943)
  at com.android.server.am.ActivityManagerService.attachApplicationLocked(ActivityManagerService.java:8146)
  at com.android.server.am.ActivityManagerService.attachApplication(ActivityManagerService.java:8255)
  - locked <0x00c6c1f5> (a com.android.server.am.ActivityManagerService)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:199)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:3399)
  at android.os.Binder.execTransact(Binder.java:728)
可以看到还是owner线程正是之前 WaitingForGcThreadFlip 中的一员，那么下一步通过gdb看下这一类线程具体卡在哪一行代码上： 
根据机器的版本号确认symbol文件后，通过公司同事分享的脚本一行命令就可以attach到system_server进程上 
   $ ~/Documents/gdb_native_tools/droid.py attach /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols system_server
process: system_server, pid(1471), parent(zygote64)
gdbserver:/data/local/tmp/gdbserver64
bin_path: /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/
solib:    /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/
bin:      /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64
Attached; pid = 1471
gdbserver: Unable to determine the number of hardware watchpoints available.
gdbserver: Unable to determine the number of hardware breakpoints available.
Listening on port 1234
GNU gdb (GDB) 7.11
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Reading symbols from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64...done.
Remote debugging using localhost:1234
Remote debugging from host 127.0.0.1
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libandroid_runtime.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libcutils.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libaudioclient.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libcutils.so" is not at the expected address (wrong library or version mismatch?)
warning: Could not load shared library symbols for 15 libraries, e.g. /system/lib64/libclang_rt.ubsan_standalone-aarch64-android.so.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64...done.
Reading symbols from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/linker64...done.
......
syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41  bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
Source directories searched: /mnt/miui/v10-p-dipper-alpha:$cdir:$cwd
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x0000007238a80560 in art::futex (uaddr=0x723961df68, op=0, val=20935915, timeout=0x7fc723ee90, uaddr2=0x7fc723ee90, val3=0) at art/runtime/base/mutex-inl.h:43
#2  art::ConditionVariable::WaitHoldingLocks (this=0x723961df58, self=0x7239614c00) at art/runtime/base/mutex.cc:953
#3  0x0000007238d6241c in art::Monitor::Lock<(art::LockReason)1> (this=0x723961df20, self=0x7239614c00) at art/runtime/monitor.cc:559
#4  0x0000007238d687a8 in art::Monitor::MonitorEnter (self=0x7239614c00, obj=<optimized out>, trylock=<optimized out>) at art/runtime/monitor.cc:1462
#5  0x0000007238ec40fc in art::mirror::Object::MonitorEnter (this=0x62 <_DYNAMIC+98>, self=0x7239614c00) at art/runtime/mirror/object-inl.h:98
#6  artLockObjectFromCode (obj=0x62 <_DYNAMIC+98>, self=0x7239614c00) at art/runtime/entrypoints/quick/quick_lock_entrypoints.cc:32
#7  0x0000007238f04e9c in art_quick_lock_object () at art/runtime/arch/arm64/quick_entrypoints_arm64.S:1510
#8  0x00000072207cf6a4 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
/home/pip/Documents/gdb_native_tools/gdb-script/init.gdb: No such file or directory.
/home/pip/Documents/gdb_native_tools/shadow/gdb_driver.py: No such file or directory.
(gdb) handle SIG35 nostop noprint pass
Signal        Stop  Print   Pass to program Description
SIG35         No    No  Yes     Real-time event 35
(gdb) handle SIG33 nostop noprint pass
Signal        Stop  Print   Pass to program Description
SIG33         No    No  Yes     Real-time event 33
tips：handle SIG35 nostop noprint pass 是为了不处理android平台上用于抓native trace的 sig35 信号，减小我们的调试过程意外中断的可能性（只是个习惯, 这个问题中其实不做也没关系）




    
 
切换到gc线程 
   (gdb) thread 9
[Switching to thread 9 (Thread 1471.1484)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41      svc     #0
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x0000007238a80560 in art::futex (uaddr=0x7239723910, op=0, val=93152, timeout=0x7221a89ab0, uaddr2=0x7221a89ab0, val3=0) at art/runtime/base/mutex-inl.h:43
#2  art::ConditionVariable::WaitHoldingLocks (this=0x7239723900, self=0x723966b000) at art/runtime/base/mutex.cc:953
#3  0x0000007238b98d54 in art::gc::Heap::ThreadFlipBegin (this=0x72396b7600, self=0x723966b000) at art/runtime/gc/heap.cc:845 // <<<<<<<<<<<<<<<<<<<<<<<
#4  0x0000007238e52204 in art::ThreadList::FlipThreadRoots (this=<optimized out>, thread_flip_visitor=0x7221a89ca0, flip_callback=<optimized out>, collector=
    0x7239716500, pause_listener=<optimized out>) at art/runtime/thread_list.cc:591
#5  0x0000007238b65cc4 in art::gc::collector::ConcurrentCopying::FlipThreadRoots (this=0x7239716500) at art/runtime/gc/collector/concurrent_copying.cc:620
#6  0x0000007238b64c50 in art::gc::collector::ConcurrentCopying::RunPhases (this=0x7239716500) at art/runtime/gc/collector/concurrent_copying.cc:178
#7  0x0000007238b7af80 in art::gc::collector::GarbageCollector::Run (this=0x7239716500, gc_cause=<optimized out>, clear_soft_references=<optimized out>)
    at art/runtime/gc/collector/garbage_collector.cc:96
#8  0x0000007238b9d93c in art::gc::Heap::CollectGarbageInternal (this=0x72396b7600, gc_type=art::gc::collector::kGcTypeFull, gc_cause=art::gc::kGcCauseBackground,
    clear_soft_references=false) at art/runtime/gc/heap.cc:2616
#9  0x0000007238baeeb8 in art::gc::Heap::ConcurrentGC (this=0x72396b7600, self=<optimized out>, cause=art::gc::kGcCauseBackground, force_full=false)
    at art/runtime/gc/heap.cc:3621
#10 0x0000007238bb4524 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x0 <_DYNAMIC>) at art/runtime/gc/heap.cc:3582
#11 0x0000007238bd692c in art::gc::TaskProcessor::RunAllTasks (this=0x72396bb100, self=0x723966b000) at art/runtime/gc/task_processor.cc:129
#12 0x0000000072a7e3f0 in dalvik.system.VMRuntime.clampGrowthLimit [DEDUPED] ()
   from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-libart.oat
#13 0x0000000072c20bd8 in java.lang.Daemons$HeapTaskDaemon.runInternal ()
   from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-libart.oat
#14 0x0000000072a80c40 in java.lang.Daemons$Daemon.run ()
   from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-libart.oat
#15 0x0000000071baa4dc in java.lang.Thread.run ()
   from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-oj.oat
#16 0x0000007238f0498c in art_quick_invoke_stub () at art/runtime/arch/arm64/quick_entrypoints_arm64.S:1702
#17 0x0000007238a78524 in art::ArtMethod::Invoke (this=0x6ff68ad8, self=0x723966b000, args=<optimized out>, args_size=<optimized out>, result=0x7221a8a310,
    shorty=0x7269daa9 "V") at art/runtime/art_method.cc:374
#18 0x0000007238e07258 in art::(anonymous namespace)::InvokeWithArgArray (soa=..., method=0x6ff68ad8, arg_array=0x7221a8a318, result=<optimized out>,
    shorty=0x7269daa9 "V") at art/runtime/reflection.cc:456
#19 0x0000007238e08314 in art::InvokeVirtualOrInterfaceWithJValues (soa=..., obj=0x5 <_DYNAMIC+5>, mid=<optimized out>, args=0x0 <_DYNAMIC>)
    at art/runtime/reflection.cc:548
#20 0x0000007238e36764 in art::Thread::CreateCallback (arg=0x723966b000) at art/runtime/thread.cc:473
#21 0x00000072bb501db0 in __pthread_start (arg=0x7221a8a4f0) at bionic/libc/bionic/pthread_create.cpp:254
#22 0x00000072bb4a378c in __start_thread (fn=0x72bb501d88 <__pthread_start(void*)>, arg=0x7221a8a4f0) at bionic/libc/bionic/clone.cpp:52
结合代码： art/runtime/gc/heap.cc#845 
 void Heap::ThreadFlipBegin(Thread* self) {
  // Supposed to be called by GC. Set thread_flip_running_ to be true. If disable_thread_flip_count_
  // > 0, block. Otherwise, go ahead.
  CHECK(kUseReadBarrier);
  ScopedThreadStateChange tsc(self, kWaitingForGcThreadFlip);
  MutexLock mu(self, *thread_flip_lock_);
  bool has_waited = false;
  uint64_t wait_start = NanoTime();
  CHECK(!thread_flip_running_);
  // Set this to true before waiting so that frequent JNI critical enter/exits won't starve
  // GC. This like a writer preference of a reader-writer lock.
  thread_flip_running_ = true;
  while (disable_thread_flip_count_ > 0) { // <<<<<<<<<<<<
    has_waited = true;
    thread_flip_cond_->Wait(self); // <<<<<<<<<<<<<
  if (has_waited) {
    uint64_t wait_time = NanoTime() - wait_start;
    total_wait_time_ += wait_time;
    if (wait_time > long_pause_log_threshold_) {
      LOG(INFO) << __FUNCTION__ << " blocked for " << PrettyDuration(wait_time);
既然在等待一个condition，那么 disable_thread_flip_count_ 肯定是大于0的，我们可以打印一下这个变量的值： 
 (gdb) print disable_thread_flip_count_
$1 = 1
果然它是1，正常情况下，当前线程在thread_flip_cond_这个条件上wait一段时间后，肯定会有一个线程会去notify它，来唤醒处于wait状态的线程； 
那么，接下来自然是快速过一下源码，看看都有哪些函数会去操作thread_flip_cond_： 
 void Heap::ThreadFlipEnd(Thread* self) {
  // Supposed to be called by GC. Set thread_flip_running_ to false and potentially wake up mutators
  // waiting before doing a JNI critical.
  CHECK(kUseReadBarrier);
  MutexLock mu(self, *thread_flip_lock_);
  CHECK(thread_flip_running_);
  thread_flip_running_ = false;
  // Potentially notify mutator threads blocking to enter a JNI critical section.
  thread_flip_cond_->Broadcast(self);  // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
void Heap::DecrementDisableThreadFlip(Thread* self) {
  // Supposed to be called by mutators. Decrement disable_thread_flip_count_ and potentially wake up
  // the GC waiting before doing a thread flip.
  CHECK(kUseReadBarrier);
  self->DecrementDisableThreadFlipCount();
  bool is_outermost = self->GetDisableThreadFlipCount() == 0;
  if (!is_outermost) {
    // If this is not an outermost JNI critical exit, we don't need to decrement the global counter.
    // The global counter is decremented only once for a thread for the outermost exit.
    return;
  MutexLock mu(self, *thread_flip_lock_);
  CHECK_GT(disable_thread_flip_count_, 0U);
  --disable_thread_flip_count_;
  if (disable_thread_flip_count_ == 0) {
    // Potentially notify the GC thread blocking to begin a thread flip.
    thread_flip_cond_->Broadcast(self); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
再看一下各个线程的backtrace，确认了，没有哪个在调用上面两个任意一个 
那些处于WaitingForGcThreadFlip的线程block在哪？ 
   (gdb) t 23
[Switching to thread 23 (Thread 1471.1692)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41      svc     #0
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x0000007238a80560 in art::futex (uaddr=0x7239723910, op=0, val=93152, timeout=0x721cbe2850, uaddr2=0x721cbe2850, val3=0) at art/runtime/base/mutex-inl.h:43
#2  art::ConditionVariable::WaitHoldingLocks (this=0x7239723900, self=0x721d26a000) at art/runtime/base/mutex.cc:953
#3  0x0000007238b98684 in art::gc::Heap::IncrementDisableThreadFlip (this=0x72396b7600, self=0x721d26a000) at art/runtime/gc/heap.cc:798
#4  0x0000007238d2cfe8 in art::JNI::GetStringCritical (env=<optimized out>, java_string=<optimized out>, is_copy=0x0 <_DYNAMIC>) at art/runtime/jni_internal.cc:1887
#5  0x00000072bae3177c in android::sp<android::IGraphicBufferProducer>::~sp (this=0x0 <_DYNAMIC>) at system/core/libutils/include/utils/StrongPointer.h:156
#6  android::nativeCreateScoped (env=<optimized out>, clazz=<optimized out>, surfaceObject=<optimized out>)
    at frameworks/base/core/jni/android_view_SurfaceSession.cpp:51
#7  0x00000000748c94ac in android.filterfw.format.ImageFormat.create ()
   from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-framework.oat
#8  0x0000000000000000 in ?? () from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64
art/runtime/gc/heap.cc:798 对应的代码： 
 void Heap::IncrementDisableThreadFlip(Thread* self) {
  // Supposed to be called by mutators. If thread_flip_running_ is true, block. Otherwise, go ahead.
  CHECK(kUseReadBarrier);
  bool is_nested = self->GetDisableThreadFlipCount() > 0;
  self->IncrementDisableThreadFlipCount();
  if (is_nested) {
    // If this is a nested JNI critical section enter, we don't need to wait or increment the global
    // counter. The global counter is incremented only once for a thread for the outermost enter.
    return;
  ScopedThreadStateChange tsc(self, kWaitingForGcThreadFlip);     // <<<<<<<<<<<<<<<<<<<<<<< 改变线程状态
  MutexLock mu(self, *thread_flip_lock_);
  bool has_waited = false;
  uint64_t wait_start = NanoTime();
  if (thread_flip_running_) {
    ScopedTrace trace("IncrementDisableThreadFlip");
    while (thread_flip_running_) {
      has_waited = true;
      thread_flip_cond_->Wait(self); // <<<<<<<<<<<<<<<<<<<<
  ++disable_thread_flip_count_;
  if (has_waited) {
    uint64_t wait_time = NanoTime() - wait_start;
    total_wait_time_ += wait_time;
    if (wait_time > long_pause_log_threshold_) {
      LOG(INFO) << __FUNCTION__ << " blocked for " << PrettyDuration(wait_time);
所以到这儿，我们清楚的一点是，肯定是由于某个线程没有正常调用 DecrementDisableThreadFlip() 函数，导致 disable_thread_flip_count_ 一直不为0； 
disable_thread_flip_count_ 一直不为 0 这个锅应该是谁的？
 仔细看一下上面这个函数的代码，不仅全局有个disable_thread_flip_count_， 同时每个线程还有thread local的 disable_thread_flip_count，每次调用IncrementDisableThreadFlip()开始的时候，都会先执行Thread::Current()->IncrementDisableThreadFlipCount()，看下它的代码： 
   // art/runtime/thread.h
uint32_t GetDisableThreadFlipCount() const {
  CHECK(kUseReadBarrier);
  return tls32_.disable_thread_flip_count;
YEAH！下面只需要逐个遍历art里面每个线程，把 tls32_.disable_thread_flip_count > 1的部分打印出来：
 稍微修改一下gdb脚本下的 art.gdb脚本里的art_dump_thread_list_base函数： 
 define art_dump_thread_list
    art_dump_thread_list_base 0
define art_dump_thread_list_base
    set $dump_mutex = $arg0
    #  第一个节点
    set $first = ('art::Runtime'::instance_)->thread_list_->list_->__end_->__next_
    printf "\n  Id    Thread Id   art::Thread*    disable_flip_count    STATE         FLAGS                NAME             SUSPEND_COUNT           MUTEX\n"
    set $list_node = 0
    set $count = 1
    while $list_node != $first
        if $list_node == 0
            set $list_node = $first
        set $thread = ('art::Thread'*)(*(void**)((uint64_t)$list_node + 2*sizeof(void*)))
        set $flip_count = (('art::Thread'*)$thread)->tls32_->disable_thread_flip_count
        if $flip_count != 0  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 1 
            set $thread_id = (('art::Thread'*)$thread)->tls32_->tid
            #printf "monitor_enter_object: 0x%lx", $thread->tlsPtr_.monitor_enter_object
            printf "%4d     %6d     %p   ", $count, $thread_id, $thread
            printf "         %d        ", $flip_count # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2
            thread_print_state $thread
            printf "              "
            thread_print_name $thread
            printf "              "
            thread_print_suspend_count $thread
            if $dump_mutex == 1
                thread_print_mutex $thread
            printf "\n"
        set $list_node = $list_node->__next_
        set $count = $count + 1
        if $list_node == $first
            loop_break
然后重新导入脚本然后执行这个函数即可： 
 (gdb) source /home/pip/Documents/gdb_native_tools/art.gdb
(gdb) art_dump_thread_list
  Id    Thread Id   art::Thread*    disable_flip_count    STATE                       NAME                        SUSPEND_COUNT        
   4       1479     0x723973c800            1          kWaitingForGcThreadFlip     Binder:filter-perf-event     0
  23       1692     0x721d26a000            1          kWaitingForGcThreadFlip     PowerManagerService          0
  31       1825     0x722b7ab800            1          kWaitingForGcThreadFlip     HwBinder:1471_1              0
  52       1884     0x721c426c00            1          kWaitingForGcThreadFlip     WifiStateMachine             0
  88       1938     0x721bc96800            1          kWaitingForGcThreadFlip     Binder:1471_3                0
  98       2125     0x721c54d000            1          kWaitingForGcThreadFlip     Binder:1471_5                0
 100       2166     0x721a129800            1          kWaitingForGcThreadFlip     Binder:1471_7                0
 113       2894     0x7219cc1000            1          kWaitingForGcThreadFlip     Binder:1471_A                0
 116       2949     0x721273ec00            1          kWaitingForGcThreadFlip     Binder:1471_D                0
 117       2950     0x7219cc0400            1          kWaitingForGcThreadFlip     Binder:1471_E                0
 120       3089     0x720b025000            1          kWaitingForGcThreadFlip     Binder:1471_11               0
 126       3135     0x720c454800            1          kWaitingForGcThreadFlip     Binder:1471_17               0
 127       3140     0x721929cc00            1          kWaitingForGcThreadFlip     Binder:1471_18               0
 128       3142     0x720eb42000            1          kWaitingForGcThreadFlip     Binder:1471_19               0
 132       8981     0x7219d25800            1          kWaitingForGcThreadFlip     HwBinder:1471_4              0
 142      18045     0x72126b5800            1          kWaitingForGcThreadFlip     Binder:1471_20               0
 181       4738     0x7211c94400            1          kNative                     AsyncTask #36709             0
最后那个 AsyncTask #36709 十分可疑，别的线程的disable_flip_count 为1的原因是正在调用 IncrementDisableThreadFlip（）函数，OK，接着看下它在执行什么函数： 
嫌疑人 “AsyncTask #36709” 干了些什么？ 
   "AsyncTask #36709" prio=5 tid=180 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14d149c8 self=0x7211c94400
  | sysTid=4738 nice=0 cgrp=default sched=0/0 handle=0x708411b4f0
  | state=S schedstat=( 3931924 16620626 18 ) utm=0 stm=0 core=7 HZ=100
  | stack=0x7084018000-0x708401a000 stackSize=1041KB
  | held mutexes=
  kernel: __switch_to+0xa4/0xfc
  kernel: binder_thread_read+0xa78/0x11a0
  kernel: binder_ioctl+0x920/0xb08
  kernel: do_vfs_ioctl+0xb8/0x8d8
  kernel: SyS_ioctl+0x84/0x98
  kernel: __sys_trace+0x4c/0x4c
  native: #00 pc 000000000006e5bc  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 0000000000029544  /system/lib64/libc.so (ioctl+136)
  native: #02 pc 000000000001e22c  /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
  native: #03 pc 000000000001e884  /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*)+324)
  native: #04 pc 0000000000012204  /system/lib64/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+312)
  native: #05 pc 00000000000c9764  /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BpHwGnssXtra::_hidl_injectXtraData(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, android::hardware::hidl_string const&)+244)
  native: #06 pc 000000000005d5c4  /system/lib64/libandroid_servers.so (android::android_location_GnssLocationProvider_inject_xtra_data(_JNIEnv*, _jobject*, _jbyteArray*, int)+268)
  at com.android.server.location.GnssLocationProvider.native_inject_xtra_data(Native method)
  at com.android.server.location.GnssLocationProvider.access$2300(GnssLocationProvider.java:112)
  at com.android.server.location.GnssLocationProvider$10.run(GnssLocationProvider.java:1140)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
  at java.lang.Thread.run(Thread.java:764)
它在一个jni里执行一个hw binder call，结合下代码： 
 // frameworks/base/services/core/jni/com_android_server_location_GnssLocationProvider.cpp#1613
static void android_location_GnssLocationProvider_inject_xtra_data(JNIEnv* env, jobject /* obj */,
        jbyteArray data, jint length) {
    if (gnssXtraIface == nullptr) {
        ALOGE("XTRA Interface not supported");
        return;
    jbyte* bytes = reinterpret_cast<jbyte *>(env->GetPrimitiveArrayCritical(data, 0));
    gnssXtraIface->injectXtraData(std::string((const char*)bytes, length));
    env->ReleasePrimitiveArrayCritical(data, bytes, JNI_ABORT);
（不熟悉art实现）一眼看过去，都是正常的jni调用，好像跟disable_flip_count没啥关系呀？全局搜一下DecrementDisableThreadFlip()和IncrementDisableThreadFlip被调用的地方： 
 ReleaseStringCritical() → DecrementDisableThreadFlip()
ReleasePrimitiveArray() → DecrementDisableThreadFlip()
GetPrimitiveArrayCritical() → IncrementDisableThreadFlip()
GetStringCritical() → IncrementDisableThreadFlip()
到这个地方，稍稍消化总结一下，不难推断出：
 这个问题必然是因为 AsyncTask #36709 在执行jni调用过程中，它执行了：
 GetPrimitiveArrayCritical() → IncrementDisableThreadFlip() → 使得全局变量 disable_thread_flip_count_ = 1
 但是在继续执行 gnssXtraIface →injectXtraData() 这个 hal 接口的时候，一直被block，自然无法执行 ReleasePrimitiveArrayCritical() → DecrementDisableThreadFlip() → disable_thread_flip_count_ = 0 
4. gnssXtraIface→injectXtraData() 为什么会一直被block？ 
查看binder transactin日志 
 $ adb shell cat /sys/kernel/debug/binder/transactions
proc 1471
context hwbinder
  thread 1673: l 00 need_return 0 tr 0
    incoming transaction 979513688: ffffffed66132800 from 1007:1007 to 1471:1673 code 2 flags 10 pri 0:120 r1 node 978962390 size 48:0 data ffffff8014f00258
    outgoing transaction 979513687: ffffffed66133e00 from 1471:1673 to 1007:1007 code 3 flags 10 pri 0:120 r1
  thread 8981: l 01 need_return 0 tr 0
    incoming transaction 979513990: ffffffed2541c700 from 1007:2040 to 1471:8981 code 4 flags 10 pri 0:120 r1 node 978962390 size 132:16 data ffffff8014f00508
  thread 4738: l 00 need_return 0 tr 0
    outgoing transaction 979513850: ffffffee6324a400 from 1471:4738 to 1007:0 code 2 flags 10 pri 0:120 r1
  thread 8981: l 01 need_return 0 tr 0
    incoming transaction 979513990: ffffffed2541c700 from 1007:2040 to 1471:8981 code 4 flags 10 pri 0:120 r1 node 978962390 size 132:16 data ffffff8014f00508
proc 1007
context hwbinder8981
  thread 1007: l 12 need_return 0 tr 0
    outgoing transaction 979513688: ffffffed66132800 from 1007:1007 to 1471:1673 code 2 flags 10 pri 0:120 r1
    incoming transaction 979513687: ffffffed66133e00 from 1471:1673 to 1007:1007 code 3 flags 10 pri 0:120 r1 node 956 size 36:0 data ffffff8012000008
  thread 2040: l 10 need_return 0 tr 0
    outgoing transaction 979513990: ffffffed2541c700 from 1007:2040 to 1471:8981 code 4 flags 10 pri 0:120 r1
pending transaction 979513850: ffffffee6324a400 from 1471:4738 to 1007:0 code 2 flags 10 pri 0:120 r1 node 5700 size 120:16 data ffffff8012000030
incoming transaction from 1471:4738 to 1007:0，得到对端进程pid为1007， 继续看下这个进程里各个线程的状态 
 $ adb shell ps -T 1007
USER           PID   TID  PPID     VSZ    RSS WCHAN            ADDR S CMD           
system        1007  1007     1   29016    156 binder_thread_read  0 S ignss_1_1
system        1007  2040     1   29016    156 binder_thread_read  0 S IPC Thread
system        1007 17865     1   29016    156 poll_schedule_timeout 0 S ignss_1_1
各个线程看起来都是正常的，通过debuggerd查看backtrace 
 $ adb shell debuggerd -b 1007
----- pid 1007 at 2019-05-05 13:25:52 -----
Cmd line: /vendor/bin/ignss_1_1
ABI: 'arm64'
"ignss_1_1" sysTid=1007
  #00 pc 000000000006e5bc  /system/lib64/libc.so (__ioctl+4)
  #01 pc 0000000000029544  /system/lib64/libc.so (ioctl+136)
  #02 pc 000000000001e22c  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
  #03 pc 000000000001e77c  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*)+60)
  #04 pc 0000000000012204  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+312)
  #05 pc 0000000000097ddc  /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BpHwGnssCallback::_hidl_gnssStatusCb(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, android::hardware::gnss::V1_0::IGnssCallback::GnssStatusValue)+208)
  #06 pc 000000000001ccf0  /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (android::hardware::gnss::V1_0::implementation::Gnss::statusCb(GpsStatus*)+68)
  #07 pc 000000000001b5b4  /vendor/lib64/hw/gps.brcm.so (gps_stop()+156)
  #08 pc 0000000000013ae0  /vendor/lib64/hw/gps.brcm.so (proxy__gps_stop()+168)
  #09 pc 000000000001e1d4  /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (android::hardware::gnss::V1_0::implementation::Gnss::stop()+28)
  #10 pc 0000000000088328  /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BnHwGnss::_hidl_stop(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>)+152)
  #11 pc 000000000002e324  /system/lib64/android.hardware.gnss@1.1.so (android::hardware::gnss::V1_1::BnHwGnss::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+1624)
  #12 pc 000000000001df34  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::BHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+72)
  #13 pc 000000000001508c  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::executeCommand(int)+1508)
  #14 pc 0000000000014938  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::getAndExecuteCommand()+204)
  #15 pc 0000000000015708  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::joinThreadPool(bool)+268)
  #16 pc 0000000000000cb8  /vendor/bin/ignss_1_1 (main+468)
  #17 pc 00000000000acec0  /system/lib64/libc.so (__libc_init+88)
"IPC Thread" sysTid=2040
  #00 pc 000000000006e5bc  /system/lib64/libc.so (__ioctl+4)
  #01 pc 0000000000029544  /system/lib64/libc.so (ioctl+136)
  #02 pc 000000000001e22c  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
  #03 pc 000000000001e884  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*)+324)
  #04 pc 0000000000012204  /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+312)
  #05 pc 0000000000098294  /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BpHwGnssCallback::_hidl_gnssNmeaCb(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, long, android::hardware::hidl_string const&)+272)
  #06 pc 000000000001d0b4  /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (android::hardware::gnss::V1_0::implementation::Gnss::nmeaCb(long, char const*, int)+112)
  #07 pc 00000000000136d8  /vendor/lib64/hw/gps.brcm.so (proxy__gps_nmea_cb(long, char const*, int)+180)
  #08 pc 0000000000018c88  /vendor/lib64/hw/gps.brcm.so (GpsiClient::marshal_gps_nmea_cb(IpcIncomingMessage&)+300)
  #09 pc 00000000000252a4  /vendor/lib64/hw/gps.brcm.so (IpcPipeTransportBase::OnSelect(int, bool, bool, bool, void*)+344)
  #10 pc 0000000000025a28  /vendor/lib64/hw/gps.brcm.so (SelectManager::PerformOneWaitAndProcess()+468)
  #11 pc 0000000000017860  /vendor/lib64/hw/gps.brcm.so (ipc_thread_proc(void*)+56)
  #12 pc 0000000000023160  /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (threadFunc(void*)+12)
  #13 pc 0000000000081dac  /system/lib64/libc.so (__pthread_start(void*)+36)
  #14 pc 0000000000023788  /system/lib64/libc.so (__start_thread+68)
"ignss_1_1" sysTid=17865
  #00 pc 000000000006e604  /system/lib64/libc.so (__pselect6+4)
  #01 pc 000000000002bbf8  /system/lib64/libc.so (select+144)
  #02 pc 000000000001974c  /vendor/lib64/hw/flp.brcm.so (SelectManager::PerformOneWaitAndProcess()+268)
  #03 pc 000000000000e8e8  /vendor/lib64/hw/flp.brcm.so (ipc_thread_proc(void*)+116)
  #04 pc 0000000000081dac  /system/lib64/libc.so (__pthread_start(void*)+36)
  #05 pc 0000000000023788  /system/lib64/libc.so (__start_thread+68)
----- end 1007 -----
结合上面的binder call日志，可以看到ignss_1_1进程内部的两个binder线程都在回调system_server进程的binder 接口，没有空闲binder线程可以处理来自system_server的gnssXtraIface→injectXtraData() 请求！ 
incoming 1471:1673
 outgoing 1007:1007 to 1471:1673 // blocked !
 outgoing 1007:2040 to 1471:8981 // WaitingForGcThreadFlip ! 
p.s. 1471:1673 并不是一个binder线程, 但确响应了来自1007:1007的binder call，这是binder api设计上面的优化，复用？ todo：确认 
 "android.bg" prio=5 tid=13 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x13fc3428 self=0x7232239400
  | sysTid=1673 nice=0 cgrp=default sched=0/0 handle=0x721da8e4f0
  | state=S schedstat=( 19417818968810 14397831932031 20757829 ) utm=561393 stm=1380388 core=5 HZ=100
  | stack=0x721d98b000-0x721d98d000 stackSize=1041KB
  | held mutexes=
  at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:22705)
  - waiting to lock <0x00c6c1f5> (a com.android.server.am.ActivityManagerService) held by thread 146
  at android.app.ContextImpl.sendBroadcastAsUser(ContextImpl.java:1195)
  at com.android.server.location.GnssLocationProvider.reportStatus(GnssLocationProvider.java:1766)
  at com.android.server.location.GnssLocationProvider.native_stop(Native method)
  at com.android.server.location.GnssLocationProvider.stopNavigating(GnssLocationProvider.java:1607)
  at com.android.server.location.GnssLocationProvider.updateRequirements(GnssLocationProvider.java:1405)
  at com.android.server.location.GnssLocationProvider.handleSetRequest(GnssLocationProvider.java:1345)
  at com.android.server.location.GnssLocationProvider.access$4100(GnssLocationProvider.java:112)
  at com.android.server.location.GnssLocationProvider$ProviderHandler.handleMessage(GnssLocationProvider.java:2400)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loop(Looper.java:207)
  at android.os.HandlerThread.run(HandlerThread.java:65)
5. 修复方案 
所以这个问题也是system_server↔ ignss_1_1 这两个进程相互binder call + jni call + 虚拟机正好在执行垃圾回收； 
参考 oracle的文档: 
After calling GetPrimitiveArrayCritical, the native code should not run for an extended period of time before it calls ReleasePrimitiveArrayCritical. We must treat the code inside this pair of functions as running in a “critical region.” Inside a critical region, native code must not call other JNI functions, or any system call that may cause the current thread to block and wait for another Java thread. (For example, the current thread must not call read on a stream being written by another Java thread.) 
These restrictions make it more likely that the native code will obtain an uncopied version of the array, even if the VM does not support pinning. For example, a VM may temporarily disable garbage collection when the native code is holding a pointer to an array obtained via GetPrimitiveArrayCritical. 
GetPrimitiveArrayCritical 和 ReleasePrimitiveArrayCritical 两个函数间的代码执行耗时不应太久！ 结合此处的业务逻辑，而且也不会被太频繁的调用，拷贝一份原始数组给binder call作为参数即可： 
static void android_location_GnssLocationProvider_inject_xtra_data(JNIEnv* env, jobject /* obj */,
        jbyteArray data, jint length) {
    if (gnssXtraIface == nullptr) {
        ALOGE("XTRA Interface not supported");
        return;
--    jbyte* bytes = reinterpret_cast<jbyte *>(env->GetPrimitiveArrayCritical(data, 0));
++    jbyte* bytes = reinterpret_cast<jbyte *>(env->GetByteArrayElements(data, 0));
    gnssXtraIface->injectXtraData(std::string((const char*)bytes, length));
--    env->ReleasePrimitiveArrayCritical(data, bytes, JNI_ABORT);
++    env->ReleaseByteArrayElements(data, bytes, JNI_ABORT);
                    lart gc相关死锁黑屏问题总结 · 大专栏背景：从测试同学那边拿过来两台黑屏的机器，点击电源、屏幕都没有反应，不过还好 adb 可以链接，有 root 权限，一番 deubgging 后，确认表面原因是 system_server 进程的 art 虚拟机卡在了 gc 前 flip 线程状态的操作中，而最终的 root cause 也比较有意思，所以本文就简单记录下问题的排查过程。1. 按照习惯，快速检查下是否发生了 watchdog$ adb shell ls` `-l/da.
成为一名优秀的Android开发，需要一份完备的知识体系，在这里，让我们一起成长为自己所想的那样~。
众所周知，移动开发已经来到了后半场，为了能够在众多开发者中脱颖而出，我们需要对某一个领域有深入地研究与心得，对于Android开发者来说，目前，有几个好的细分领域值得我们去建立自己的技术壁垒，如下所示：
1、性能优化专家：具备深度性能优化与体系化APM建设的能力。
	2、架构师：具有丰富的应用架构设计经验与心得，对Android Framework层与热门三方库的实现原理与架构设计了如指掌。
				根据操作系统中的定义：死锁是指在一组进程中的各个进程均占有不会释放的资源，但因互相申请被其他进程所站用不会释放的资源而处于的一种永久等待状态。对应到SQLServer中，当在两个或多个任务中，如果每个任务锁定了其他任务试图锁定的资源，此时会造成这些任务永久阻塞，从而出现死锁；这些资源可能是：单行(RID，堆中的单行)、索引中的键(KEY，行锁)、页(PAG，8KB)、区结构(EXT，连续的8页)、堆或B树(HOBT)、表(TAB，包括数据和索引)、文件(File，数据库文件)、应用程序专用资源(APP)、元数据(METADATA)、分配单元(Allocation_Unit)、整个数据库(DB)
触发ANR：在Activity主线程中进行死循环操作，然后另一个按钮发送handler消息给主线程进行处理
分析ANR：借助bugreport生成对应的log，再adb pull出来，进行分析
2、触发ANR
2.1 Activity定义
public class AnrTestActivity extends AppCompatActivity {
    priva...
Time: 2933175644_1545041895232
Flags: 0x38d83e44
Package: com.oppo.reader v62301 (6.2.3.20181030)
PID: 20358
Foreground: Yes
Activity: com.oppo.reader...
				----- pid 7546 at 2017-11-30 04:20:46 -----
Cmd line: com.google.android.apps.maps //引起 ANR 的包，可以是它引起，与它有关，这们定位
Build fingerprint: 'Android/px5/px5:6.0.1/MXC89L/stt-tec11301619:userdebug/test-keys'
ABI: 'arm64'
Build type: optimized
Zygote loaded classes..
Resumed（运行状态）：Activity处于前台，且用户可以与其交互。
	Paused（暂停状态）: Activity被在前台中处于半透明状态或者未覆盖全屏的其他Activity部分遮挡。 暂停的Activity不会接收用户输入，也无法执行任何代码。
	Stopped（停止状态）：Activ..
psum_mutex	68	432	719	552	599
psum_array	7.26	3.64	1.91	1.85	1.84
psum_local	1.06	0.54	0.28	0.29	0.30
信号量、线程锁机制，会大量的消耗系统资源，造成相当大的开销，甚至多于共享全局变量的存取，因此反而会效率低下
通过全局变量计算技巧替换信号量，线程锁，效率得到很大提升
通过局部变量替换全局变量的反复存取
				订单下单扣库存死锁问题通常是由于多个并发的线程或进程在执行相同的操作，如下单和扣库存，但执行顺序不同，导致互相等待对方释放资源而陷入死锁状态。 
例如，线程 A 先下单并锁定了某个库存，线程 B 也想购买同样的商品并锁定相同的库存，但此时线程 A 还未释放该库存，导致线程 B 等待线程 A 释放该库存，而线程 A 又等待线程 B 释放其他库存，导致两个线程互相等待，无法继续执行，从而陷入死锁。
要解决这个问题，可以采用以下方法之一：
1. 引入事务：在下单和扣库存的操作中引入事务，以确保这两个操作可以原子性地执行，从而避免出现死锁。
2. 分离订单和库存：将订单系统和库存系统分离开来，让它们分别独立运行，以避免两个系统之间的冲突。
3. 采用分布式锁：使用分布式锁来避免多个进程或线程同时操作同一资源的问题，确保每个资源只能被一个进程或线程锁定。
4. 避免并发访问：减少并发访问，通过限制访问量或者排队机制来避免同时进行下单和扣库存操作的冲突。