$ ~/Documents/gdb_native_tools/droid.py attach /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols system_server
process: system_server, pid(1471), parent(zygote64)
gdbserver:/data/local/tmp/gdbserver64
bin_path: /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/
solib: /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/
bin: /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64
Attached; pid = 1471
gdbserver: Unable to determine the number of hardware watchpoints available.
gdbserver: Unable to determine the number of hardware breakpoints available.
Listening on port 1234
GNU gdb (GDB) 7.11
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Reading symbols from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64...done.
Remote debugging using localhost:1234
Remote debugging from host 127.0.0.1
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libandroid_runtime.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libcutils.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libaudioclient.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/lib64/libcutils.so" is not at the expected address (wrong library or version mismatch?)
warning: Could not load shared library symbols for 15 libraries, e.g. /system/lib64/libclang_rt.ubsan_standalone-aarch64-android.so.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64...done.
Reading symbols from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/linker64...done.
......
syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41 bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
Source directories searched: /mnt/miui/v10-p-dipper-alpha:$cdir:$cwd
#0 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1 0x0000007238a80560 in art::futex (uaddr=0x723961df68, op=0, val=20935915, timeout=0x7fc723ee90, uaddr2=0x7fc723ee90, val3=0) at art/runtime/base/mutex-inl.h:43
#2 art::ConditionVariable::WaitHoldingLocks (this=0x723961df58, self=0x7239614c00) at art/runtime/base/mutex.cc:953
#3 0x0000007238d6241c in art::Monitor::Lock<(art::LockReason)1> (this=0x723961df20, self=0x7239614c00) at art/runtime/monitor.cc:559
#4 0x0000007238d687a8 in art::Monitor::MonitorEnter (self=0x7239614c00, obj=<optimized out>, trylock=<optimized out>) at art/runtime/monitor.cc:1462
#5 0x0000007238ec40fc in art::mirror::Object::MonitorEnter (this=0x62 <_DYNAMIC+98>, self=0x7239614c00) at art/runtime/mirror/object-inl.h:98
#6 artLockObjectFromCode (obj=0x62 <_DYNAMIC+98>, self=0x7239614c00) at art/runtime/entrypoints/quick/quick_lock_entrypoints.cc:32
#7 0x0000007238f04e9c in art_quick_lock_object () at art/runtime/arch/arm64/quick_entrypoints_arm64.S:1510
#8 0x00000072207cf6a4 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
/home/pip/Documents/gdb_native_tools/gdb-script/init.gdb: No such file or directory.
/home/pip/Documents/gdb_native_tools/shadow/gdb_driver.py: No such file or directory.
(gdb) handle SIG35 nostop noprint pass
Signal Stop Print Pass to program Description
SIG35 No No Yes Real-time event 35
(gdb) handle SIG33 nostop noprint pass
Signal Stop Print Pass to program Description
SIG33 No No Yes Real-time event 33
tips:handle SIG35 nostop noprint pass 是为了不处理android平台上用于抓native trace的 sig35 信号,减小我们的调试过程意外中断的可能性(只是个习惯, 这个问题中其实不做也没关系)
(gdb) thread 9
[Switching to thread 9 (Thread 1471.1484)]
#0 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41 svc #0
(gdb) bt
#0 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1 0x0000007238a80560 in art::futex (uaddr=0x7239723910, op=0, val=93152, timeout=0x7221a89ab0, uaddr2=0x7221a89ab0, val3=0) at art/runtime/base/mutex-inl.h:43
#2 art::ConditionVariable::WaitHoldingLocks (this=0x7239723900, self=0x723966b000) at art/runtime/base/mutex.cc:953
#3 0x0000007238b98d54 in art::gc::Heap::ThreadFlipBegin (this=0x72396b7600, self=0x723966b000) at art/runtime/gc/heap.cc:845 // <<<<<<<<<<<<<<<<<<<<<<<
#4 0x0000007238e52204 in art::ThreadList::FlipThreadRoots (this=<optimized out>, thread_flip_visitor=0x7221a89ca0, flip_callback=<optimized out>, collector=
0x7239716500, pause_listener=<optimized out>) at art/runtime/thread_list.cc:591
#5 0x0000007238b65cc4 in art::gc::collector::ConcurrentCopying::FlipThreadRoots (this=0x7239716500) at art/runtime/gc/collector/concurrent_copying.cc:620
#6 0x0000007238b64c50 in art::gc::collector::ConcurrentCopying::RunPhases (this=0x7239716500) at art/runtime/gc/collector/concurrent_copying.cc:178
#7 0x0000007238b7af80 in art::gc::collector::GarbageCollector::Run (this=0x7239716500, gc_cause=<optimized out>, clear_soft_references=<optimized out>)
at art/runtime/gc/collector/garbage_collector.cc:96
#8 0x0000007238b9d93c in art::gc::Heap::CollectGarbageInternal (this=0x72396b7600, gc_type=art::gc::collector::kGcTypeFull, gc_cause=art::gc::kGcCauseBackground,
clear_soft_references=false) at art/runtime/gc/heap.cc:2616
#9 0x0000007238baeeb8 in art::gc::Heap::ConcurrentGC (this=0x72396b7600, self=<optimized out>, cause=art::gc::kGcCauseBackground, force_full=false)
at art/runtime/gc/heap.cc:3621
#10 0x0000007238bb4524 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x0 <_DYNAMIC>) at art/runtime/gc/heap.cc:3582
#11 0x0000007238bd692c in art::gc::TaskProcessor::RunAllTasks (this=0x72396bb100, self=0x723966b000) at art/runtime/gc/task_processor.cc:129
#12 0x0000000072a7e3f0 in dalvik.system.VMRuntime.clampGrowthLimit [DEDUPED] ()
from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-libart.oat
#13 0x0000000072c20bd8 in java.lang.Daemons$HeapTaskDaemon.runInternal ()
from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-libart.oat
#14 0x0000000072a80c40 in java.lang.Daemons$Daemon.run ()
from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-libart.oat
#15 0x0000000071baa4dc in java.lang.Thread.run ()
from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-core-oj.oat
#16 0x0000007238f0498c in art_quick_invoke_stub () at art/runtime/arch/arm64/quick_entrypoints_arm64.S:1702
#17 0x0000007238a78524 in art::ArtMethod::Invoke (this=0x6ff68ad8, self=0x723966b000, args=<optimized out>, args_size=<optimized out>, result=0x7221a8a310,
shorty=0x7269daa9 "V") at art/runtime/art_method.cc:374
#18 0x0000007238e07258 in art::(anonymous namespace)::InvokeWithArgArray (soa=..., method=0x6ff68ad8, arg_array=0x7221a8a318, result=<optimized out>,
shorty=0x7269daa9 "V") at art/runtime/reflection.cc:456
#19 0x0000007238e08314 in art::InvokeVirtualOrInterfaceWithJValues (soa=..., obj=0x5 <_DYNAMIC+5>, mid=<optimized out>, args=0x0 <_DYNAMIC>)
at art/runtime/reflection.cc:548
#20 0x0000007238e36764 in art::Thread::CreateCallback (arg=0x723966b000) at art/runtime/thread.cc:473
#21 0x00000072bb501db0 in __pthread_start (arg=0x7221a8a4f0) at bionic/libc/bionic/pthread_create.cpp:254
#22 0x00000072bb4a378c in __start_thread (fn=0x72bb501d88 <__pthread_start(void*)>, arg=0x7221a8a4f0) at bionic/libc/bionic/clone.cpp:52
结合代码: art/runtime/gc/heap.cc#845
void Heap::ThreadFlipBegin(Thread* self) {
// Supposed to be called by GC. Set thread_flip_running_ to be true. If disable_thread_flip_count_
// > 0, block. Otherwise, go ahead.
CHECK(kUseReadBarrier);
ScopedThreadStateChange tsc(self, kWaitingForGcThreadFlip);
MutexLock mu(self, *thread_flip_lock_);
bool has_waited = false;
uint64_t wait_start = NanoTime();
CHECK(!thread_flip_running_);
// Set this to true before waiting so that frequent JNI critical enter/exits won't starve
// GC. This like a writer preference of a reader-writer lock.
thread_flip_running_ = true;
while (disable_thread_flip_count_ > 0) { // <<<<<<<<<<<<
has_waited = true;
thread_flip_cond_->Wait(self); // <<<<<<<<<<<<<
if (has_waited) {
uint64_t wait_time = NanoTime() - wait_start;
total_wait_time_ += wait_time;
if (wait_time > long_pause_log_threshold_) {
LOG(INFO) << __FUNCTION__ << " blocked for " << PrettyDuration(wait_time);
既然在等待一个condition,那么 disable_thread_flip_count_ 肯定是大于0的,我们可以打印一下这个变量的值:
(gdb) print disable_thread_flip_count_
$1 = 1
果然它是1,正常情况下,当前线程在thread_flip_cond_这个条件上wait一段时间后,肯定会有一个线程会去notify它,来唤醒处于wait状态的线程;
- 那么,接下来自然是快速过一下源码,看看都有哪些函数会去操作thread_flip_cond_:
void Heap::ThreadFlipEnd(Thread* self) {
// Supposed to be called by GC. Set thread_flip_running_ to false and potentially wake up mutators
// waiting before doing a JNI critical.
CHECK(kUseReadBarrier);
MutexLock mu(self, *thread_flip_lock_);
CHECK(thread_flip_running_);
thread_flip_running_ = false;
// Potentially notify mutator threads blocking to enter a JNI critical section.
thread_flip_cond_->Broadcast(self); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
void Heap::DecrementDisableThreadFlip(Thread* self) {
// Supposed to be called by mutators. Decrement disable_thread_flip_count_ and potentially wake up
// the GC waiting before doing a thread flip.
CHECK(kUseReadBarrier);
self->DecrementDisableThreadFlipCount();
bool is_outermost = self->GetDisableThreadFlipCount() == 0;
if (!is_outermost) {
// If this is not an outermost JNI critical exit, we don't need to decrement the global counter.
// The global counter is decremented only once for a thread for the outermost exit.
return;
MutexLock mu(self, *thread_flip_lock_);
CHECK_GT(disable_thread_flip_count_, 0U);
--disable_thread_flip_count_;
if (disable_thread_flip_count_ == 0) {
// Potentially notify the GC thread blocking to begin a thread flip.
thread_flip_cond_->Broadcast(self); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
再看一下各个线程的backtrace,确认了,没有哪个在调用上面两个任意一个
- 那些处于WaitingForGcThreadFlip的线程block在哪?
(gdb) t 23
[Switching to thread 23 (Thread 1471.1692)]
#0 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41 svc #0
(gdb) bt
#0 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1 0x0000007238a80560 in art::futex (uaddr=0x7239723910, op=0, val=93152, timeout=0x721cbe2850, uaddr2=0x721cbe2850, val3=0) at art/runtime/base/mutex-inl.h:43
#2 art::ConditionVariable::WaitHoldingLocks (this=0x7239723900, self=0x721d26a000) at art/runtime/base/mutex.cc:953
#3 0x0000007238b98684 in art::gc::Heap::IncrementDisableThreadFlip (this=0x72396b7600, self=0x721d26a000) at art/runtime/gc/heap.cc:798
#4 0x0000007238d2cfe8 in art::JNI::GetStringCritical (env=<optimized out>, java_string=<optimized out>, is_copy=0x0 <_DYNAMIC>) at art/runtime/jni_internal.cc:1887
#5 0x00000072bae3177c in android::sp<android::IGraphicBufferProducer>::~sp (this=0x0 <_DYNAMIC>) at system/core/libutils/include/utils/StrongPointer.h:156
#6 android::nativeCreateScoped (env=<optimized out>, clazz=<optimized out>, surfaceObject=<optimized out>)
at frameworks/base/core/jni/android_view_SurfaceSession.cpp:51
#7 0x00000000748c94ac in android.filterfw.format.ImageFormat.create ()
from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/framework/arm64/boot-framework.oat
#8 0x0000000000000000 in ?? () from /home/pip/Downloads/e8-V10.3.3.0.PECCNXM-gcflip-hang/out/target/product/equuleus/symbols/system/bin/app_process64
art/runtime/gc/heap.cc:798 对应的代码:
void Heap::IncrementDisableThreadFlip(Thread* self) {
// Supposed to be called by mutators. If thread_flip_running_ is true, block. Otherwise, go ahead.
CHECK(kUseReadBarrier);
bool is_nested = self->GetDisableThreadFlipCount() > 0;
self->IncrementDisableThreadFlipCount();
if (is_nested) {
// If this is a nested JNI critical section enter, we don't need to wait or increment the global
// counter. The global counter is incremented only once for a thread for the outermost enter.
return;
ScopedThreadStateChange tsc(self, kWaitingForGcThreadFlip); // <<<<<<<<<<<<<<<<<<<<<<< 改变线程状态
MutexLock mu(self, *thread_flip_lock_);
bool has_waited = false;
uint64_t wait_start = NanoTime();
if (thread_flip_running_) {
ScopedTrace trace("IncrementDisableThreadFlip");
while (thread_flip_running_) {
has_waited = true;
thread_flip_cond_->Wait(self); // <<<<<<<<<<<<<<<<<<<<
++disable_thread_flip_count_;
if (has_waited) {
uint64_t wait_time = NanoTime() - wait_start;
total_wait_time_ += wait_time;
if (wait_time > long_pause_log_threshold_) {
LOG(INFO) << __FUNCTION__ << " blocked for " << PrettyDuration(wait_time);
所以到这儿,我们清楚的一点是,肯定是由于某个线程没有正常调用 DecrementDisableThreadFlip() 函数,导致 disable_thread_flip_count_ 一直不为0;
- disable_thread_flip_count_ 一直不为 0 这个锅应该是谁的?
仔细看一下上面这个函数的代码,不仅全局有个disable_thread_flip_count_, 同时每个线程还有thread local的 disable_thread_flip_count,每次调用IncrementDisableThreadFlip()开始的时候,都会先执行Thread::Current()->IncrementDisableThreadFlipCount(),看下它的代码:
// art/runtime/thread.h
uint32_t GetDisableThreadFlipCount() const {
CHECK(kUseReadBarrier);
return tls32_.disable_thread_flip_count;
YEAH!下面只需要逐个遍历art里面每个线程,把 tls32_.disable_thread_flip_count > 1的部分打印出来: 稍微修改一下gdb脚本下的 art.gdb脚本里的art_dump_thread_list_base函数:
define art_dump_thread_list
art_dump_thread_list_base 0
define art_dump_thread_list_base
set $dump_mutex = $arg0
# 第一个节点
set $first = ('art::Runtime'::instance_)->thread_list_->list_->__end_->__next_
printf "\n Id Thread Id art::Thread* disable_flip_count STATE FLAGS NAME SUSPEND_COUNT MUTEX\n"
set $list_node = 0
set $count = 1
while $list_node != $first
if $list_node == 0
set $list_node = $first
set $thread = ('art::Thread'*)(*(void**)((uint64_t)$list_node + 2*sizeof(void*)))
set $flip_count = (('art::Thread'*)$thread)->tls32_->disable_thread_flip_count
if $flip_count != 0 # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 1
set $thread_id = (('art::Thread'*)$thread)->tls32_->tid
#printf "monitor_enter_object: 0x%lx", $thread->tlsPtr_.monitor_enter_object
printf "%4d %6d %p ", $count, $thread_id, $thread
printf " %d ", $flip_count # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2
thread_print_state $thread
printf " "
thread_print_name $thread
printf " "
thread_print_suspend_count $thread
if $dump_mutex == 1
thread_print_mutex $thread
printf "\n"
set $list_node = $list_node->__next_
set $count = $count + 1
if $list_node == $first
loop_break
然后重新导入脚本然后执行这个函数即可:
(gdb) source /home/pip/Documents/gdb_native_tools/art.gdb
(gdb) art_dump_thread_list
Id Thread Id art::Thread* disable_flip_count STATE NAME SUSPEND_COUNT
4 1479 0x723973c800 1 kWaitingForGcThreadFlip Binder:filter-perf-event 0
23 1692 0x721d26a000 1 kWaitingForGcThreadFlip PowerManagerService 0
31 1825 0x722b7ab800 1 kWaitingForGcThreadFlip HwBinder:1471_1 0
52 1884 0x721c426c00 1 kWaitingForGcThreadFlip WifiStateMachine 0
88 1938 0x721bc96800 1 kWaitingForGcThreadFlip Binder:1471_3 0
98 2125 0x721c54d000 1 kWaitingForGcThreadFlip Binder:1471_5 0
100 2166 0x721a129800 1 kWaitingForGcThreadFlip Binder:1471_7 0
113 2894 0x7219cc1000 1 kWaitingForGcThreadFlip Binder:1471_A 0
116 2949 0x721273ec00 1 kWaitingForGcThreadFlip Binder:1471_D 0
117 2950 0x7219cc0400 1 kWaitingForGcThreadFlip Binder:1471_E 0
120 3089 0x720b025000 1 kWaitingForGcThreadFlip Binder:1471_11 0
126 3135 0x720c454800 1 kWaitingForGcThreadFlip Binder:1471_17 0
127 3140 0x721929cc00 1 kWaitingForGcThreadFlip Binder:1471_18 0
128 3142 0x720eb42000 1 kWaitingForGcThreadFlip Binder:1471_19 0
132 8981 0x7219d25800 1 kWaitingForGcThreadFlip HwBinder:1471_4 0
142 18045 0x72126b5800 1 kWaitingForGcThreadFlip Binder:1471_20 0
181 4738 0x7211c94400 1 kNative AsyncTask #36709 0
最后那个 AsyncTask #36709 十分可疑,别的线程的disable_flip_count 为1的原因是正在调用 IncrementDisableThreadFlip()函数,OK,接着看下它在执行什么函数:
- 嫌疑人 “AsyncTask #36709” 干了些什么?
"AsyncTask #36709" prio=5 tid=180 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x14d149c8 self=0x7211c94400
| sysTid=4738 nice=0 cgrp=default sched=0/0 handle=0x708411b4f0
| state=S schedstat=( 3931924 16620626 18 ) utm=0 stm=0 core=7 HZ=100
| stack=0x7084018000-0x708401a000 stackSize=1041KB
| held mutexes=
kernel: __switch_to+0xa4/0xfc
kernel: binder_thread_read+0xa78/0x11a0
kernel: binder_ioctl+0x920/0xb08
kernel: do_vfs_ioctl+0xb8/0x8d8
kernel: SyS_ioctl+0x84/0x98
kernel: __sys_trace+0x4c/0x4c
native: #00 pc 000000000006e5bc /system/lib64/libc.so (__ioctl+4)
native: #01 pc 0000000000029544 /system/lib64/libc.so (ioctl+136)
native: #02 pc 000000000001e22c /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
native: #03 pc 000000000001e884 /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*)+324)
native: #04 pc 0000000000012204 /system/lib64/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+312)
native: #05 pc 00000000000c9764 /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BpHwGnssXtra::_hidl_injectXtraData(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, android::hardware::hidl_string const&)+244)
native: #06 pc 000000000005d5c4 /system/lib64/libandroid_servers.so (android::android_location_GnssLocationProvider_inject_xtra_data(_JNIEnv*, _jobject*, _jbyteArray*, int)+268)
at com.android.server.location.GnssLocationProvider.native_inject_xtra_data(Native method)
at com.android.server.location.GnssLocationProvider.access$2300(GnssLocationProvider.java:112)
at com.android.server.location.GnssLocationProvider$10.run(GnssLocationProvider.java:1140)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:764)
它在一个jni里执行一个hw binder call,结合下代码:
// frameworks/base/services/core/jni/com_android_server_location_GnssLocationProvider.cpp#1613
static void android_location_GnssLocationProvider_inject_xtra_data(JNIEnv* env, jobject /* obj */,
jbyteArray data, jint length) {
if (gnssXtraIface == nullptr) {
ALOGE("XTRA Interface not supported");
return;
jbyte* bytes = reinterpret_cast<jbyte *>(env->GetPrimitiveArrayCritical(data, 0));
gnssXtraIface->injectXtraData(std::string((const char*)bytes, length));
env->ReleasePrimitiveArrayCritical(data, bytes, JNI_ABORT);
(不熟悉art实现)一眼看过去,都是正常的jni调用,好像跟disable_flip_count没啥关系呀?全局搜一下DecrementDisableThreadFlip()和IncrementDisableThreadFlip被调用的地方:
ReleaseStringCritical() → DecrementDisableThreadFlip()
ReleasePrimitiveArray() → DecrementDisableThreadFlip()
GetPrimitiveArrayCritical() → IncrementDisableThreadFlip()
GetStringCritical() → IncrementDisableThreadFlip()
到这个地方,稍稍消化总结一下,不难推断出: 这个问题必然是因为 AsyncTask #36709 在执行jni调用过程中,它执行了: GetPrimitiveArrayCritical() → IncrementDisableThreadFlip() → 使得全局变量 disable_thread_flip_count_ = 1 但是在继续执行 gnssXtraIface →injectXtraData() 这个 hal 接口的时候,一直被block,自然无法执行 ReleasePrimitiveArrayCritical() → DecrementDisableThreadFlip() → disable_thread_flip_count_ = 0
4. gnssXtraIface→injectXtraData() 为什么会一直被block?
查看binder transactin日志
$ adb shell cat /sys/kernel/debug/binder/transactions
proc 1471
context hwbinder
thread 1673: l 00 need_return 0 tr 0
incoming transaction 979513688: ffffffed66132800 from 1007:1007 to 1471:1673 code 2 flags 10 pri 0:120 r1 node 978962390 size 48:0 data ffffff8014f00258
outgoing transaction 979513687: ffffffed66133e00 from 1471:1673 to 1007:1007 code 3 flags 10 pri 0:120 r1
thread 8981: l 01 need_return 0 tr 0
incoming transaction 979513990: ffffffed2541c700 from 1007:2040 to 1471:8981 code 4 flags 10 pri 0:120 r1 node 978962390 size 132:16 data ffffff8014f00508
thread 4738: l 00 need_return 0 tr 0
outgoing transaction 979513850: ffffffee6324a400 from 1471:4738 to 1007:0 code 2 flags 10 pri 0:120 r1
thread 8981: l 01 need_return 0 tr 0
incoming transaction 979513990: ffffffed2541c700 from 1007:2040 to 1471:8981 code 4 flags 10 pri 0:120 r1 node 978962390 size 132:16 data ffffff8014f00508
proc 1007
context hwbinder8981
thread 1007: l 12 need_return 0 tr 0
outgoing transaction 979513688: ffffffed66132800 from 1007:1007 to 1471:1673 code 2 flags 10 pri 0:120 r1
incoming transaction 979513687: ffffffed66133e00 from 1471:1673 to 1007:1007 code 3 flags 10 pri 0:120 r1 node 956 size 36:0 data ffffff8012000008
thread 2040: l 10 need_return 0 tr 0
outgoing transaction 979513990: ffffffed2541c700 from 1007:2040 to 1471:8981 code 4 flags 10 pri 0:120 r1
pending transaction 979513850: ffffffee6324a400 from 1471:4738 to 1007:0 code 2 flags 10 pri 0:120 r1 node 5700 size 120:16 data ffffff8012000030
incoming transaction from 1471:4738 to 1007:0,得到对端进程pid为1007, 继续看下这个进程里各个线程的状态
$ adb shell ps -T 1007
USER PID TID PPID VSZ RSS WCHAN ADDR S CMD
system 1007 1007 1 29016 156 binder_thread_read 0 S ignss_1_1
system 1007 2040 1 29016 156 binder_thread_read 0 S IPC Thread
system 1007 17865 1 29016 156 poll_schedule_timeout 0 S ignss_1_1
各个线程看起来都是正常的,通过debuggerd查看backtrace
$ adb shell debuggerd -b 1007
----- pid 1007 at 2019-05-05 13:25:52 -----
Cmd line: /vendor/bin/ignss_1_1
ABI: 'arm64'
"ignss_1_1" sysTid=1007
#00 pc 000000000006e5bc /system/lib64/libc.so (__ioctl+4)
#01 pc 0000000000029544 /system/lib64/libc.so (ioctl+136)
#02 pc 000000000001e22c /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
#03 pc 000000000001e77c /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*)+60)
#04 pc 0000000000012204 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+312)
#05 pc 0000000000097ddc /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BpHwGnssCallback::_hidl_gnssStatusCb(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, android::hardware::gnss::V1_0::IGnssCallback::GnssStatusValue)+208)
#06 pc 000000000001ccf0 /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (android::hardware::gnss::V1_0::implementation::Gnss::statusCb(GpsStatus*)+68)
#07 pc 000000000001b5b4 /vendor/lib64/hw/gps.brcm.so (gps_stop()+156)
#08 pc 0000000000013ae0 /vendor/lib64/hw/gps.brcm.so (proxy__gps_stop()+168)
#09 pc 000000000001e1d4 /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (android::hardware::gnss::V1_0::implementation::Gnss::stop()+28)
#10 pc 0000000000088328 /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BnHwGnss::_hidl_stop(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>)+152)
#11 pc 000000000002e324 /system/lib64/android.hardware.gnss@1.1.so (android::hardware::gnss::V1_1::BnHwGnss::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+1624)
#12 pc 000000000001df34 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::BHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+72)
#13 pc 000000000001508c /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::executeCommand(int)+1508)
#14 pc 0000000000014938 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::getAndExecuteCommand()+204)
#15 pc 0000000000015708 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::joinThreadPool(bool)+268)
#16 pc 0000000000000cb8 /vendor/bin/ignss_1_1 (main+468)
#17 pc 00000000000acec0 /system/lib64/libc.so (__libc_init+88)
"IPC Thread" sysTid=2040
#00 pc 000000000006e5bc /system/lib64/libc.so (__ioctl+4)
#01 pc 0000000000029544 /system/lib64/libc.so (ioctl+136)
#02 pc 000000000001e22c /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
#03 pc 000000000001e884 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*)+324)
#04 pc 0000000000012204 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+312)
#05 pc 0000000000098294 /system/lib64/android.hardware.gnss@1.0.so (android::hardware::gnss::V1_0::BpHwGnssCallback::_hidl_gnssNmeaCb(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, long, android::hardware::hidl_string const&)+272)
#06 pc 000000000001d0b4 /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (android::hardware::gnss::V1_0::implementation::Gnss::nmeaCb(long, char const*, int)+112)
#07 pc 00000000000136d8 /vendor/lib64/hw/gps.brcm.so (proxy__gps_nmea_cb(long, char const*, int)+180)
#08 pc 0000000000018c88 /vendor/lib64/hw/gps.brcm.so (GpsiClient::marshal_gps_nmea_cb(IpcIncomingMessage&)+300)
#09 pc 00000000000252a4 /vendor/lib64/hw/gps.brcm.so (IpcPipeTransportBase::OnSelect(int, bool, bool, bool, void*)+344)
#10 pc 0000000000025a28 /vendor/lib64/hw/gps.brcm.so (SelectManager::PerformOneWaitAndProcess()+468)
#11 pc 0000000000017860 /vendor/lib64/hw/gps.brcm.so (ipc_thread_proc(void*)+56)
#12 pc 0000000000023160 /vendor/lib64/android.hardware.gnss@1.1-impl-xiaomi.so (threadFunc(void*)+12)
#13 pc 0000000000081dac /system/lib64/libc.so (__pthread_start(void*)+36)
#14 pc 0000000000023788 /system/lib64/libc.so (__start_thread+68)
"ignss_1_1" sysTid=17865
#00 pc 000000000006e604 /system/lib64/libc.so (__pselect6+4)
#01 pc 000000000002bbf8 /system/lib64/libc.so (select+144)
#02 pc 000000000001974c /vendor/lib64/hw/flp.brcm.so (SelectManager::PerformOneWaitAndProcess()+268)
#03 pc 000000000000e8e8 /vendor/lib64/hw/flp.brcm.so (ipc_thread_proc(void*)+116)
#04 pc 0000000000081dac /system/lib64/libc.so (__pthread_start(void*)+36)
#05 pc 0000000000023788 /system/lib64/libc.so (__start_thread+68)
----- end 1007 -----
结合上面的binder call日志,可以看到ignss_1_1进程内部的两个binder线程都在回调system_server进程的binder 接口,没有空闲binder线程可以处理来自system_server的gnssXtraIface→injectXtraData() 请求!
incoming 1471:1673 outgoing 1007:1007 to 1471:1673 // blocked ! outgoing 1007:2040 to 1471:8981 // WaitingForGcThreadFlip !
p.s. 1471:1673 并不是一个binder线程, 但确响应了来自1007:1007的binder call,这是binder api设计上面的优化,复用? todo:确认
"android.bg" prio=5 tid=13 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13fc3428 self=0x7232239400
| sysTid=1673 nice=0 cgrp=default sched=0/0 handle=0x721da8e4f0
| state=S schedstat=( 19417818968810 14397831932031 20757829 ) utm=561393 stm=1380388 core=5 HZ=100
| stack=0x721d98b000-0x721d98d000 stackSize=1041KB
| held mutexes=
at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:22705)
- waiting to lock <0x00c6c1f5> (a com.android.server.am.ActivityManagerService) held by thread 146
at android.app.ContextImpl.sendBroadcastAsUser(ContextImpl.java:1195)
at com.android.server.location.GnssLocationProvider.reportStatus(GnssLocationProvider.java:1766)
at com.android.server.location.GnssLocationProvider.native_stop(Native method)
at com.android.server.location.GnssLocationProvider.stopNavigating(GnssLocationProvider.java:1607)
at com.android.server.location.GnssLocationProvider.updateRequirements(GnssLocationProvider.java:1405)
at com.android.server.location.GnssLocationProvider.handleSetRequest(GnssLocationProvider.java:1345)
at com.android.server.location.GnssLocationProvider.access$4100(GnssLocationProvider.java:112)
at com.android.server.location.GnssLocationProvider$ProviderHandler.handleMessage(GnssLocationProvider.java:2400)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:207)
at android.os.HandlerThread.run(HandlerThread.java:65)
5. 修复方案
所以这个问题也是system_server↔ ignss_1_1 这两个进程相互binder call + jni call + 虚拟机正好在执行垃圾回收;
参考 oracle的文档:
After calling GetPrimitiveArrayCritical, the native code should not run for an extended period of time before it calls ReleasePrimitiveArrayCritical. We must treat the code inside this pair of functions as running in a “critical region.” Inside a critical region, native code must not call other JNI functions, or any system call that may cause the current thread to block and wait for another Java thread. (For example, the current thread must not call read on a stream being written by another Java thread.)
These restrictions make it more likely that the native code will obtain an uncopied version of the array, even if the VM does not support pinning. For example, a VM may temporarily disable garbage collection when the native code is holding a pointer to an array obtained via GetPrimitiveArrayCritical.
GetPrimitiveArrayCritical 和 ReleasePrimitiveArrayCritical 两个函数间的代码执行耗时不应太久! 结合此处的业务逻辑,而且也不会被太频繁的调用,拷贝一份原始数组给binder call作为参数即可:
static void android_location_GnssLocationProvider_inject_xtra_data(JNIEnv* env, jobject /* obj */,
jbyteArray data, jint length) {
if (gnssXtraIface == nullptr) {
ALOGE("XTRA Interface not supported");
return;
-- jbyte* bytes = reinterpret_cast<jbyte *>(env->GetPrimitiveArrayCritical(data, 0));
++ jbyte* bytes = reinterpret_cast<jbyte *>(env->GetByteArrayElements(data, 0));
gnssXtraIface->injectXtraData(std::string((const char*)bytes, length));
-- env->ReleasePrimitiveArrayCritical(data, bytes, JNI_ABORT);
++ env->ReleaseByteArrayElements(data, bytes, JNI_ABORT);
lart gc相关死锁黑屏问题总结 · 大专栏背景:从测试同学那边拿过来两台黑屏的机器,点击电源、屏幕都没有反应,不过还好 adb 可以链接,有 root 权限,一番 deubgging 后,确认表面原因是 system_server 进程的 art 虚拟机卡在了 gc 前 flip 线程状态的操作中,而最终的 root cause 也比较有意思,所以本文就简单记录下问题的排查过程。1. 按照习惯,快速检查下是否发生了 watchdog$ adb shell ls` `-l/da.
成为一名优秀的Android开发,需要一份完备的知识体系,在这里,让我们一起成长为自己所想的那样~。
众所周知,移动开发已经来到了后半场,为了能够在众多开发者中脱颖而出,我们需要对某一个领域有深入地研究与心得,对于Android开发者来说,目前,有几个好的细分领域值得我们去建立自己的技术壁垒,如下所示:
1、性能优化专家:具备深度性能优化与体系化APM建设的能力。
2、架构师:具有丰富的应用架构设计经验与心得,对Android Framework层与热门三方库的实现原理与架构设计了如指掌。
根据操作系统中的定义:死锁是指在一组进程中的各个进程均占有不会释放的资源,但因互相申请被其他进程所站用不会释放的资源而处于的一种永久等待状态。对应到SQLServer中,当在两个或多个任务中,如果每个任务锁定了其他任务试图锁定的资源,此时会造成这些任务永久阻塞,从而出现死锁;这些资源可能是:单行(RID,堆中的单行)、索引中的键(KEY,行锁)、页(PAG,8KB)、区结构(EXT,连续的8页)、堆或B树(HOBT)、表(TAB,包括数据和索引)、文件(File,数据库文件)、应用程序专用资源(APP)、元数据(METADATA)、分配单元(Allocation_Unit)、整个数据库(DB)
触发ANR:在Activity主线程中进行死循环操作,然后另一个按钮发送handler消息给主线程进行处理
分析ANR:借助bugreport生成对应的log,再adb pull出来,进行分析
2、触发ANR
2.1 Activity定义
public class AnrTestActivity extends AppCompatActivity {
priva...
Time: 2933175644_1545041895232
Flags: 0x38d83e44
Package: com.oppo.reader v62301 (6.2.3.20181030)
PID: 20358
Foreground: Yes
Activity: com.oppo.reader...
----- pid 7546 at 2017-11-30 04:20:46 -----
Cmd line: com.google.android.apps.maps //引起 ANR 的包,可以是它引起,与它有关,这们定位
Build fingerprint: 'Android/px5/px5:6.0.1/MXC89L/stt-tec11301619:userdebug/test-keys'
ABI: 'arm64'
Build type: optimized
Zygote loaded classes..
Resumed(运行状态):Activity处于前台,且用户可以与其交互。
Paused(暂停状态): Activity被在前台中处于半透明状态或者未覆盖全屏的其他Activity部分遮挡。 暂停的Activity不会接收用户输入,也无法执行任何代码。
Stopped(停止状态):Activ..
psum_mutex 68 432 719 552 599
psum_array 7.26 3.64 1.91 1.85 1.84
psum_local 1.06 0.54 0.28 0.29 0.30
信号量、线程锁机制,会大量的消耗系统资源,造成相当大的开销,甚至多于共享全局变量的存取,因此反而会效率低下
通过全局变量计算技巧替换信号量,线程锁,效率得到很大提升
通过局部变量替换全局变量的反复存取
订单下单扣库存死锁问题通常是由于多个并发的线程或进程在执行相同的操作,如下单和扣库存,但执行顺序不同,导致互相等待对方释放资源而陷入死锁状态。
例如,线程 A 先下单并锁定了某个库存,线程 B 也想购买同样的商品并锁定相同的库存,但此时线程 A 还未释放该库存,导致线程 B 等待线程 A 释放该库存,而线程 A 又等待线程 B 释放其他库存,导致两个线程互相等待,无法继续执行,从而陷入死锁。
要解决这个问题,可以采用以下方法之一:
1. 引入事务:在下单和扣库存的操作中引入事务,以确保这两个操作可以原子性地执行,从而避免出现死锁。
2. 分离订单和库存:将订单系统和库存系统分离开来,让它们分别独立运行,以避免两个系统之间的冲突。
3. 采用分布式锁:使用分布式锁来避免多个进程或线程同时操作同一资源的问题,确保每个资源只能被一个进程或线程锁定。
4. 避免并发访问:减少并发访问,通过限制访问量或者排队机制来避免同时进行下单和扣库存操作的冲突。
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|