A native crash that took more than a month to solve (1)

October 12, 2023

I. Background

After version 8.54.0.0 went live, a large number of crashes were reported in the socket library clientsdk.so:

bugly.qq.com/v2/crash-re…

They fall into two main categories:

1. Background crashes, which users do not notice:

bugly.qq.com/v2/crash-re…

2. Foreground crashes, which users do notice: the app crashes immediately after launch and then works normally once restarted;

bugly.qq.com/v2/crash-re…

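II. Locating the problem

1. Based on user feedback, and by having users install a test build, we confirmed that the crash really is caused by the socket .so library: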

image.png

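2. The stack trace does not show where inside the .so library the crash occurs, so a direct fix was not possible. We started reasoning from what we knew:

2.1. Version 8.54.0.0 did not upgrade the socket .so library. Decompiling and comparing the two production versions also showed that the shared C++ runtime library (libc++_shared) had not been upgraded, so we could rule out another SDK upgrade as the cause of the socket crash;

2.2. Version 8.54.0.0 did upgrade the bugly SDK. Because the bugly SDK had been optimized, the rule that previously filtered out background crashes of the socket .so library stopped working. That explains the surge in reports, but it does not explain why foreground crashes appeared;

2.3. Talking to several users showed that they could not reproduce the crash reliably; it only happened occasionally. Still, it was clearly far more frequent in 8.54.0.0 than before. Having QA install the test build many times did not reproduce it either. Something in 8.54.0.0 must be triggering the crash in the socket .so library;

2.4. We looked for clues in the logs. The relevant stack trace is: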

#00 pc 000000000005205c /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) [arm64-v8a::82e5b2ff86b193c94139353a92c4af29]
#01 pc 00000000000667d0 /apex/com.android.runtime/lib64/bionic/libc.so (__stack_chk_fail+20) [arm64-v8a::82e5b2ff86b193c94139353a92c4af29]
#02 pc 0000000000067d5c /data/app/~~jnhT7CUNDeCebak8i1CqNw==/com.xx.seeyou-oFoQho_36LUWZWiebvJIkg==/lib/arm64/libclientsdk.so [arm64-v8a::de63b5e1d2007ebbad9c3e7b9001d192]
#03 pc 0000000000067e4c /data/app/~~jnhT7CUNDeCebak8i1CqNw==/com.xx.seeyou-oFoQho_36LUWZWiebvJIkg==/lib/arm64/libclientsdk.so [arm64-v8a::de63b5e1d2007ebbad9c3e7b9001d192]
#04 pc 000000000006cb78 /data/app/~~jnhT7CUNDeCebak8i1CqNw==/com.xx.seeyou-oFoQho_36LUWZWiebvJIkg==/lib/arm64/libclientsdk.so [arm64-v8a::de63b5e1d2007ebbad9c3e7b9001d192]
#05 pc 00000000000732d8 /data/app/~~jnhT7CUNDeCebak8i1CqNw==/com.xx.seeyou-oFoQho_36LUWZWiebvJIkg==/lib/arm64/libclientsdk.so [arm64-v8a::de63b5e1d2007ebbad9c3e7b9001d192]
#06 pc 00000000000742b0 /data/app/~~jnhT7CUNDeCebak8i1CqNw==/com.xx.seeyou-oFoQho_36LUWZWiebvJIkg==/lib/arm64/libclientsdk.so [arm64-v8a::de63b5e1d2007ebbad9c3e7b9001d192]
#07 pc 00000000000748bc /data/app/~~jnhT7CUNDeCebak8i1CqNw==/com.xx.seeyou-oFoQho_36LUWZWiebvJIkg==/lib/arm64/libclientsdk.so [arm64-v8a::de63b5e1d2007ebbad9c3e7b9001d192]
#08 pc 00000000000b3ea0 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+264) [arm64-v8a::82e5b2ff86b193c94139353a92c4af29]
#09 pc 0000000000053880 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) [arm64-v8a::82e5b2ff86b193c94139353a92c4af29]
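Frames #02 to #07 carry no symbol names, only pc offsets inside libclientsdk.so. A first practical step is to pull those offsets out so they can be handed to a symbolizer (for example ndk-stack or llvm-addr2line from the NDK), assuming an unstripped build of the library is available. The following is a minimal sketch of that extraction; the names `Frame`, `parseFrame` and `offsetsIn` are made up for this illustration and are not part of any SDK.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// One parsed line of an Android native backtrace (hypothetical helper type).
struct Frame {
    int index = 0;          // the number after '#'
    std::uint64_t pc = 0;   // offset inside the mapped library
    std::string path;       // library path as printed in the trace
};

// Parses lines shaped like "#NN pc <hex> <path> ..." and returns false
// for anything that does not match that shape.
bool parseFrame(const std::string& line, Frame& out) {
    std::istringstream in(line);
    std::string tag, pcWord, pcHex;
    if (!(in >> tag >> pcWord >> pcHex >> out.path)) return false;
    if (tag.size() < 2 || tag[0] != '#' || pcWord != "pc") return false;
    try {
        out.index = std::stoi(tag.substr(1));
        out.pc = std::stoull(pcHex, nullptr, 16);
    } catch (const std::exception&) {
        return false;  // not a number where one was expected
    }
    return true;
}

// Collects the pc offsets of all frames whose path ends with `library`.
std::vector<std::uint64_t> offsetsIn(const std::vector<std::string>& trace,
                                     const std::string& library) {
    std::vector<std::uint64_t> result;
    for (const std::string& line : trace) {
        Frame f;
        if (!parseFrame(line, f)) continue;
        const bool endsWithLibrary =
            f.path.size() >= library.size() &&
            f.path.compare(f.path.size() - library.size(),
                           library.size(), library) == 0;
        if (endsWithLibrary) result.push_back(f.pc);
    }
    return result;
}
```

Calling `offsetsIn(trace, "libclientsdk.so")` on the trace above would yield the six offsets from 0x67d5c to 0x748bc, which is the list a symbolizer needs.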

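         Two key points stand out in the stack trace above:

         - 1. bionic/libc.so (__stack_chk_fail+20)

                   - This frame generally points to a memory operation, meaning the crash is memory related: __stack_chk_fail is called when the stack protector finds that a function's stack guard value has been overwritten;

         - 2. bionic/libc.so (__start_thread+64) [arm64-v8a::82e5b2ff86b193c94139353a92c4af29]

                   - This frame is the bottom of the stack, which shows that the crash happened on a newly started thread

                   - While going through bugly's trace logs we found another section of system log: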

04-07 10:51:32.253 8833 8894 E clientsdk: [clientsdk][lvl:1] [KVS] init. version: 0, key count: 0 /data/docker/im-group/im-sdk/jniwks/MeetYou/jni/android/com_rtc_RTCClient.cpp:87
04-07 10:51:32.315 8833 11111 E clientsdk: [clientsdk][lvl:1] workThreadProcess start, thread id is 498356722864 /data/docker/im-group/im-sdk/jniwks/MeetYou/jni/client_sdk/client_core.cpp:158
04-07 10:51:32.433 8833 8833 E SysUtils: [variable fonts] error to areFontsVariable:java.lang.ClassNotFoundException: com.huawei.android.graphics.fonts.SystemFontsEx
--------- beginning of crash
04-07 10:51:32.584 8833 11111 F libc : stack corruption detected (-fstack-protector)
04-07 10:51:32.662 8833 11227 E Oms-SDK.WearableApiManager: Service missing when getting application info
04-07 10:51:32.990 8833 8833 E HwResourcesImpl: handleAddIconBackground resId = 0 return: android.graphics.drawable.ColorDrawable@a2e5c3f
04-07 10:51:33.388 8833 8870 E summer : not found implements method com.xx.seeyou.protocol.GaStubImpl$addGaOtherParams" !!!!!!!!!!!!!!
04-07 10:51:33.474 11226 11226 E libcrashpad_handler_trampoline.huawei.so: open file error
04-07 10:51:33.792 11226 11226 E hwbr_engine_crashpad: ../../third_party/crashpad/crashpad/util/process/process_memory_range.cc:75: read out of range
04-07 10:51:33.793 11226 11226 E hwbr_engine_crashpad: ../../third_party/crashpad/crashpad/util/process/process_memory_range.cc:75: read out of range
04-07 10:51:33.810 8833 8870 E summer : not found implements method com.xx.seeyou.protocol.GaStubImpl$addGaOtherParams" !!!!!!!!!!!!!!
04-07 10:51:33.876 8833 11111 E eup : get abort message after Q
04-07 10:51:33.897 8833 8833 E lgr : getToolHistoryList, babyId: 53695904, commonId: 220646176
04-07 10:51:33.897 8833 10392 E lgr : GetToolHistoryWorker, babyId: 53695904, commonId: 220646176
04-07 10:51:33.898 8833 10392 E lgr : shouldUpdate: false
04-07 10:51:33.914 8833 8833 E lgr : getToolHistoryList, size: 
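What makes this log useful is the thread id column: thread 11111 of process 8833 prints "workThreadProcess start" at 10:51:32.315, and the same thread triggers the libc fatal line at 10:51:32.584, roughly 270 ms later. Filtering a log down to one thread makes that kind of correlation easy to spot. The sketch below assumes the log uses the logcat threadtime layout (date, time, pid, tid, level, then tag and message); `LogLine`, `parseLogLine` and `linesFromThread` are illustrative names, not an existing API.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Leading fields of one threadtime-style log line:
// "MM-DD HH:MM:SS.mmm PID TID LEVEL TAG: message"
struct LogLine {
    std::string date;
    std::string time;
    int pid = 0;
    int tid = 0;
    char level = '?';
    std::string rest;  // tag and message, left unparsed
};

// Returns false for separators such as "--------- beginning of crash"
// and for any other line that does not start with the expected fields.
bool parseLogLine(const std::string& line, LogLine& out) {
    std::istringstream in(line);
    if (!(in >> out.date >> out.time >> out.pid >> out.tid >> out.level)) {
        return false;
    }
    std::getline(in, out.rest);
    return true;
}

// Keeps only the lines written by thread `tid` of process `pid`, in order.
std::vector<std::string> linesFromThread(const std::vector<std::string>& log,
                                         int pid, int tid) {
    std::vector<std::string> result;
    for (const std::string& line : log) {
        LogLine parsed;
        if (parseLogLine(line, parsed) && parsed.pid == pid && parsed.tid == tid) {
            result.push_back(line);
        }
    }
    return result;
}
```

Applied to the log above with pid 8833 and tid 11111, this would leave three lines: the worker thread start, the stack corruption message, and the later "get abort message" line from the crash reporter.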

  

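                     Our bold guess: the socket library was initialized and created a thread, the thread function began executing, and then a memory problem caused unexpected behavior and the process crashed;

2.5. Verifying the hypothesis:

  • The most common C++ failure is a null pointer dereference, so we simulated one. The resulting output, shown below, does not match the production log, so a null pointer is ruled out: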
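To make the "memory problem" concrete: the message "stack corruption detected (-fstack-protector)" comes from the compiler's stack protector, which places a guard value next to a function's local buffers and checks it before the function returns. The following is a simulation of that idea, not the real mechanism. `FakeFrame`, `kCanary` and `canarySurvivesCopy` are invented for this sketch, the guard value is arbitrary, and the copy is clamped to the struct so the demo never writes outside memory it owns.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a stack frame: a local buffer followed by a guard value.
// Real frames are laid out by the compiler; this struct only imitates that.
struct FakeFrame {
    char buffer[8];
    std::uint64_t canary;
};

// Arbitrary illustrative guard value (real canaries are random per process).
constexpr std::uint64_t kCanary = 0x5AFEC0DE5AFEC0DEULL;

// Copies `len` bytes to the start of the frame without checking them against
// the size of `buffer`, then reports whether the guard value survived.
// `len` is clamped to the whole struct so the simulation stays well defined.
bool canarySurvivesCopy(const unsigned char* src, std::size_t len) {
    FakeFrame frame{};
    frame.canary = kCanary;
    if (len > sizeof(frame)) len = sizeof(frame);
    std::memcpy(&frame, src, len);   // bytes past index 7 land on the canary
    return frame.canary == kCanary;  // the comparison done before returning
}
```

A copy of 8 bytes or fewer leaves the guard intact, while anything longer changes it. In a real protected function that mismatch leads to __stack_chk_fail and abort, which matches frames #00 and #01 of the production stack trace.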

    Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x9c in tid 30316 (example.meetyou), pid 30272 (example.meetyou)
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  Softversion: PD2020C_A_7.10.11
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  Time: 2023-04-07 15:16:59
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  Build fingerprint: 'vivo/PD2020/PD2020:10/QP1A.190711.020/compiler02071532:user/release-keys'
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  Revision: '0'
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  ABI: 'arm64'
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  Timestamp: 2023-04-07 15:16:59+0800
    2023-04-07 15:16:59.288 30320-30320 DEBUG                   crash_dump64                         A  pid: 30272, tid: 30316, name: example.meetyou  >>> com.example.meetyou <<<
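The comparison in this step comes down to the signature line of each crash. The simulated null pointer produces "Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR)" with a small fault address, whereas the production crash goes through abort after "stack corruption detected". A small sketch of that distinction follows; `CrashKind` and `classify` are hypothetical names, and the matching is deliberately naive.

```cpp
#include <string>

// Coarse crash categories used only in this illustration.
enum class CrashKind { InvalidMemoryAccess, StackCorruption, Unknown };

// Classifies a single log line by the marker text it contains.
CrashKind classify(const std::string& line) {
    if (line.find("stack corruption detected") != std::string::npos) {
        return CrashKind::StackCorruption;      // stack protector abort
    }
    if (line.find("SIGSEGV") != std::string::npos &&
        line.find("SEGV_MAPERR") != std::string::npos) {
        return CrashKind::InvalidMemoryAccess;  // e.g. a null dereference
    }
    return CrashKind::Unknown;
}
```

The first line of the simulated output above classifies as `InvalidMemoryAccess`, while the production libc line classifies as `StackCorruption`, which is why the null pointer explanation was dropped.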
