Capturing Crash Information in an Android NDK Project

Dev log series (16)

cocos2d-x Lua binding, android-ndk-r9c, Eclipse, Cygwin, Windows 7 64-bit, Red Mi, Thl9, HTC One

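The project recently entered a phase of rapid iteration, with releases going out several times faster than usual. The builds from the last few days started crashing to the home screen frequently and for no obvious reason (on Android only), so I had to put feature work on hold to fix it.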

Reproducing the problem

After launching the game and entering a scene, switching rapidly between module screens crashes the game some of the time.

Initial guess

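Most likely the C++ code fails while loading or destroying UI resources. Since it only happens on Android, the crash may well be on the asynchronous loading thread. Per-module UI resource management and asynchronous loading were only added in the last two weeks, so the async worker thread is the prime suspect.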

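The hard part is that C++ code is difficult to debug on Android. Our project (cocos2d-x Lua) is not a pure NDK project, so stepping through JNI code in a debugger does not fully solve it. To find the root cause, we have to start by capturing whatever is printed last before the crash.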

  • LogCat output

When the game crashes, the last two lines of output are easy to catch:

mtk_dlmalloc_debug DEBUG_INFO]FUNCTION internal_inspect_all Line .... (rest omitted)
libc - Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 6034 (Thread-531)

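This mtk_dlmalloc_debug message is raised from libc, so my initial guess is that the crash originates in the user-space native library libc.so. These two log lines alone are not enough to pin down the problem, so let's capture the crash another way.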
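The "Fatal signal" line carries more than it seems. On Android releases of this era, 0xdeadbaad is, as far as I know, the address bionic's libc writes to on purpose when it aborts (for example after dlmalloc reports heap corruption), so the fault address is a marker rather than a wild pointer. The sketch below pulls the fields out of such a line; the names FatalSignal, parseFatalSignal and isAbortMarker are hypothetical helpers for illustration, not part of any Android tool.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Fields pulled out of a logcat "Fatal signal" line (hypothetical helper type).
struct FatalSignal {
    int number = 0;             // signal number, e.g. 11
    std::string name;           // signal name, e.g. "SIGSEGV"
    std::uint32_t address = 0;  // fault address
    int thread = 0;             // id of the crashing thread
};

// Parses lines shaped like the one quoted above; returns false on mismatch.
bool parseFatalSignal(const std::string& line, FatalSignal& out) {
    static const std::regex pattern(
        R"re(Fatal signal (\d+) \((\w+)\) at 0x([0-9a-fA-F]+) \(code=\d+\), thread (\d+))re");
    std::smatch match;
    if (!std::regex_search(line, match, pattern)) return false;
    out.number = std::stoi(match[1].str());
    out.name = match[2].str();
    out.address = static_cast<std::uint32_t>(std::stoul(match[3].str(), nullptr, 16));
    out.thread = std::stoi(match[4].str());
    return true;
}

// Assumption: 0xdeadbaad is the deliberate abort address used by bionic's
// libc on Android versions of this period.
bool isAbortMarker(const FatalSignal& signal) {
    return signal.address == 0xdeadbaadu;
}
```

Fed the second log line above, parseFatalSignal fills in signal 11, SIGSEGV, thread 6034, and isAbortMarker returns true, which points toward libc aborting on its own rather than the game dereferencing a random address.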

  • Capturing with ndk-stack

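ndk-stack is very useful for source-level debugging. If the bug is in the project's own code (not in a low-level system library), it can even print the source line number at the time of the crash. The capture command: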

adb logcat | ndk-stack -sym "E:\rect\SrcClient\Game\proj.android\obj\local\armeabi-v7a"

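The key is to enable all debug options when compiling the C++ code, including the NDK's NDK_DEBUG=1. I also recommend adding V=1, so that building the C++ code from Eclipse prints every NDK command; for a command-line addict like me, nothing could be better. With ndk-stack attached, the following was captured when the game crashed: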

********** Crash dump: **********
Build fingerprint: 'Xiaomi/2013022/HM2013022:4.2.1/HM2013022/JHACNBF17.0:user/release-keys'
pid: 5555, tid: 5578, name: Thread-486  >>> com.cmge.onepiece <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad
Stack frame I/DEBUG   ( 5612):     #00  pc 000246a8  /system/lib/libc.so: Routine ??
??:0
Stack frame I/DEBUG   ( 5612):     #01  pc 00016cac  /system/lib/libc.so (dlmalloc+4908): Routine ??
??:0
Stack frame I/DEBUG   ( 5612):     #02  pc 0000cb6c  /system/lib/libc.so (malloc+16): Routine ??
??:0
Stack frame I/DEBUG   ( 5612):     #03  pc 00dc6a88  /data/app-lib/com.aaa.bbb-1/libogame.so (operator new(unsigned int)+24): Routine operator new(unsigned int) at ??:?
Stack frame I/AEE/AED ( 5612):     #00  pc 000246a8  /system/lib/libc.so: Routine ??
??:0
Stack frame I/AEE/AED ( 5612):     #01  pc 00016cac  /system/lib/libc.so (dlmalloc+4908): Routine ??
??:0
Stack frame I/AEE/AED ( 5612):     #02  pc 0000cb6c  /system/lib/libc.so (malloc+16): Routine ??
??:0
Stack frame I/AEE/AED ( 5612):     #03  pc 00dc6a88  /data/app-lib/com.aaa.bbb-1/libgame.so (operator new(unsigned int)+24): Routine operator new(unsigned int) at ??:?

This log tells us far more than the previous one. We can be almost certain the crash is related to memory allocation and release. Judging from

operator new(unsigned int)+24

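the crash was probably triggered while allocating a block of memory with new. The unsigned int does not describe the object being allocated: it is the size parameter (size_t, which is unsigned int on 32-bit ARM) of the global operator new, so the frame only tells us that some new expression was executing. So far we can confirm the crash is memory related, but there is no evidence yet that it is thread related, so one last step is needed: trying to debug with ndk-gdb.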
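To see where the text operator new(unsigned int) comes from, it helps to demangle the symbol by hand. Under the Itanium C++ ABI used by GCC and Clang, the global operator new(std::size_t) is mangled as _Znwj when size_t is unsigned int (32-bit ARM) and as _Znwm when it is unsigned long. The sketch below is only an illustration, assuming a GCC or Clang toolchain that provides <cxxabi.h>; demangle is a hypothetical wrapper name.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangles an Itanium C++ ABI symbol name.
// Returns the input unchanged if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return mangled;
    std::string result(text);
    std::free(text);  // the buffer returned by __cxa_demangle is malloc'ed
    return result;
}
```

Calling demangle("_Znwj") yields "operator new(unsigned int)", exactly the text in frame #03, which is why that frame says nothing about what type was being constructed.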

  • Debugging with ndk-gdb

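ndk-gdb is a debugging tool that ships with the NDK. I had never used it before, so I hit several problems before getting it to work, and I record them here. On Windows the tool needs Cygwin, so step one is configuring Cygwin, step two is using ndk-build to produce files that ndk-gdb can debug, and step three is the debugging itself.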

Configuring Cygwin

  1. Download Cygwin and install it with the default options
  2. Configure the environment variables

For example, I added this environment variable:

NDK_MODULE_PATH=/cygdrive/e/rect/SrcClient:/cygdrive/e/rect/SrcClient/cocos2dx/platform/third_party/android/prebuilt

Then edit .bash_profile; for example, mine contains:

ANDROID_NDK_ROOT=/cygdrive/e/rect/android-ndk-r9c
export ANDROID_NDK_ROOT

NDK_MODULE_PATH=/cygdrive/e/rect/SrcClient:/cygdrive/e/rect/SrcClient/cocos2dx/platform/third_party/android/prebuilt
export NDK_MODULE_PATH

That completes the configuration.

Building the C++ code with ndk-build on the command line

The command is:

ndk-build clean all NDK_DEBUG=1

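clean all means everything generated by the previous build is cleaned before compiling; without it, the build fails with errors mentioning "pattern %". NDK_DEBUG=1 produces a debug build, which lets the debugger map addresses back to source lines. The whole build takes about 30 minutes.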

Debugging with ndk-gdb

The debug command is (the phone must be connected to the computer, with the game running on it):

ndk-gdb --verbose

ndk-gdb is a bit of a trap: it misbehaved several times before it worked properly.

Glitch 1

ndk-gdb Could not extract package's data directory. Are you sure that your installed application is debuggable?

Fix: edit the ndk-gdb script in the NDK root directory

old: adb_var_shell2 DATA_DIR run-as $PACKAGE_NAME /system/bin/sh -c pwd
new: DATA_DIR="/data/data/$PACKAGE_NAME"

Glitch 2

The source paths generated in gdb.setup are garbled, as shown below:

$ ndk-gdb --verbose
Android NDK installation path: /cygdrive/e/rect/android-ndk-r9c
Using default adb command: /cygdrive/e/rect/adt-bundle-windows/sdk/platform-tools/adb
ADB version found: Android Debug Bridge version 1.0.31
Using ADB flags:
Using JDB command: /cygdrive/c/Program Files/Java/jdk1.7.0_51/bin/jdb
Using auto-detected project path: .
Found package name: com..aaa.bbb
ABIs targetted by application: armeabi-v7a armeabi
Device API Level: 10
Device CPU ABIs: armeabi-v7a armeabi
Compatible device ABI: armeabi-v7a
Using gdb setup init: ./libs/armeabi-v7a/gdb.setup
Using toolchain prefix: /cygdrive/e/rect/android-ndk-r9c/toolchains/arm-linux-androideabi-4.6/prebuilt/windows-x86_64/bin/arm-linux-androideabi-
Using app out directory: ./obj/local/armeabi-v7a
Found debuggable flag: true
Found data directory: '/data/data/com..aaa.bbb'
Found device gdbserver: /data/data/com..aaa.bbb/lib/gdbserver
Found running PID: 8351
Launched gdbserver succesfully.
Setup network redirection
## COMMAND: adb_cmd shell run-as com.cmge.onepiece /data/data/com.aaa.bbb/lib/gdbserver +debug-socket --attach 8351
## COMMAND: adb_cmd forward tcp:5039 localfilesystem:/data/data/com.aaa.bbb/debug-socket
run-as: Package 'com.cmge.onepiece' has corrupt installation
## COMMAND: adb_cmd pull /system/bin/app_process obj/local/armeabi-v7a/app_process
89 KB/s (5736 bytes in 0.062s)
Pulled app_process from device/emulator.
## COMMAND: adb_cmd pull /system/bin/linker obj/local/armeabi-v7a/linker
2468 KB/s (39436 bytes in 0.015s)
Pulled linker from device/emulator.
## COMMAND: adb_cmd pull /system/lib/libc.so obj/local/armeabi-v7a/libc.so
5943 KB/s (273912 bytes in 0.045s)
Pulled libc.so from device/emulator.
/cygdrive/e/rect/android-ndk-r9c/ndk-gdb: line 770: [: armeabi-v7a: unary operator expected
GNU gdb (GDB) 7.3.1-gg2
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-mingw32msvc --target=arm-linux-android".
For bug reporting instructions, please see:
.
Warning: E:/rect/SrcClient/lua/proj.android/../../../cocos2dx: No such file or directory.
Warning: E:/rect/SrcClient/lua/proj.android/../../../cocos2dx/include: No such file or directory.
Warning: E:/rect/SrcClient/lua/proj.android/../../../cocos2dx/platform: No such file or directory.
Warning: E:/rect/SrcClient/lua/proj.android/../../../cocos2dx/platform/android: No such file or directory.
Warning: E:/rect/SrcClient/lua/proj.android/../../../cocos2dx/kazmath/include: No such file or directory.
Warning: E:/rect/SrcClient/lua/proj.android/../../../CocosDenshion/include: No such file or directory.
Warning: E:/rect/SrcClient/lua/proj.android/../../../extensions: No such file or directory.
Warning: E:/rect/SrcClient/BaseCore/../../../include: No such file or directory.
Warning: E:/rect/SrcClient/BaseCore/../../../Common: No such file or directory.
Warning: E:/rect/SrcClient/BaseCore/filesystembzip2: No such file or directory.
Warning: E:/rect/SrcClient/GameWnd/GameWnd: No such file or directory.
Warning: E:/rect/SrcClient/EmBattleClient/Classes: No such file or directory.
Warning: E:/rect/SrcClient/ArenaClient/Classes: No such file or directory.
Warning: E:/rect/SrcClient/CardSystem/Classes: No such file or directory.
Warning: E:/rect/SrcClient/DramaticClient/Classes: No such file or directory.
Warning: E:/rect/SrcClient/EntityClient/Classes: No such file or directory.
Warning: E:/rect/SrcClient/FightSystem/Classes: No such file or directory.
Warning: E:/rect/SrcClient/ShipClient/Classes: No such file or directory.
Warning: E:/rect/SrcClient/WorldBossClient/Classes: No such file or directory.
Warning: E:/rect/SrcClient/Log/Classes: No such file or directory.
Warning: E:\rect\SrcClient\Game\proj.android/jni/../../../../../Common: No such file or directory.
Warning: E:\rect\SrcClient\Game\proj.android/jni/../../../../../Include: No such file or directory.
Warning: E:/rect/SrcClient/engine/inc/common: No such file or directory.
Warning: E:/rect/SrcClient/engine/inc/common/jsons: No such file or directory.
Warning: E:/rect/SrcClient/engine/inc/engine: No such file or directory.
Warning: E:/rect/SrcClient/engine/inc/render: No such file or directory.
Warning: E:/rect/SrcClient/SkillSystem/GameWnd: No such file or directory.
(gdb) obj/local/armeabi-v7a/gdb.setup:4: Error in sourced command file:
Remote communication error.  Target disconnected.: No error.

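The pile of warnings at the end all come from bad generated paths. Looking closely, there are actually two kinds of error: paths that are mangled and paths that simply do not exist.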

Fix

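1. Edit every Android.mk and remove all the nonexistent source paths; this took changes in about twenty places.
2. Replace the few mangled paths with full absolute paths.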
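To check why a path from the warnings does not exist, it helps to collapse its ".." segments and see where it really points. The sketch below is a purely lexical helper (collapseDotDot is a hypothetical name) that never touches the filesystem and is meant for drive-letter paths like the ones in the log.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Lexically collapses "." and ".." segments in a '/'-separated path.
// No filesystem access; a leading '/' is not preserved, so this is only
// meant for drive-letter paths such as "E:/...".
std::string collapseDotDot(const std::string& path) {
    std::vector<std::string> parts;
    std::stringstream stream(path);
    std::string segment;
    while (std::getline(stream, segment, '/')) {
        if (segment.empty() || segment == ".") continue;
        if (segment == ".." && !parts.empty() && parts.back() != "..") {
            parts.pop_back();  // step up one directory
        } else {
            parts.push_back(segment);
        }
    }
    std::string result;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) result += '/';
        result += parts[i];
    }
    return result;
}
```

For the first warning, E:/rect/SrcClient/lua/proj.android/../../../cocos2dx collapses to E:/rect/cocos2dx. Judging from NDK_MODULE_PATH, the engine appears to live under E:/rect/SrcClient/cocos2dx, so the relative path climbs one level too far from that base. This is my reading of the log, not something confirmed by the NDK documentation.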

I have reported this gdb.setup path generation bug on Stack Overflow.

With the ndk-gdb problems solved, run the debug command again. Once it is running, enter

continue

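This command is the same as pressing F5 in Visual Studio. Then find a way to reproduce the bug. At last I got the most concrete stack information so far: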

[New Thread 4285]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 4285]
0x400926a8 in ?? () from E:/rect/SrcClient/Game/proj.android/obj/local/armeabi-v7a/libc.so
(gdb) bt
#0  0x400926a8 in ?? () from E:/rect/SrcClient/Game/proj.android/obj/local/armeabi-v7a/libc.so
#1  0x40089614 in dlmalloc_inspect_all () from E:/rect/SrcClient/Game/proj.android/obj/local/armeabi-v7a/libc.so
#2  0x41613a6c in ?? ()
#3  0x41613a6c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The bt command prints the current call stack; entering it produced something more concrete. From this output, the rough sequence is:

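a new thread is created -> execution switches to it -> it calls dlmalloc_inspect_all (dlmalloc is Doug Lea's malloc, the heap allocator inside this libc.so, not "delete malloc") -> the program crashes.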

Summary

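From the crash information gathered in these ways, my rough conclusion is that the IO triggered by loading and removing UI resources during module switches causes the Android VM to garbage-collect frequently, and that the GC may have hit something that was already empty. The concrete fix I would suggest is to load resources in chunks: given images A, B, C, D and E, for example, fix an order and load them one after another. Whether this works remains to be tested. That's all.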
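The "load one after another" idea could be sketched as follows. This is only an illustration under my own assumptions: SequentialLoader and its methods are hypothetical names, the actual loading call is passed in as a callback, and nothing here is cocos2d-x API. The intent is that loadNext is called once per frame from a single thread, so that at most one resource is being loaded at any moment.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: loads queued resources one at a time, in the order
// they were queued, instead of starting every load at once.
class SequentialLoader {
public:
    using LoadFn = std::function<void(const std::string&)>;

    // 'load' performs the real work for one resource (an assumption: in a
    // game it would wrap whatever texture-loading call the engine offers).
    explicit SequentialLoader(LoadFn load) : load_(std::move(load)) {}

    // Adds a resource name to the back of the queue.
    void enqueue(const std::string& name) { pending_.push_back(name); }

    // Loads at most one resource. Returns false when nothing is left.
    bool loadNext() {
        if (pending_.empty()) return false;
        std::string name = pending_.front();
        pending_.pop_front();
        load_(name);
        return true;
    }

    // Number of resources still waiting to be loaded.
    std::size_t remaining() const { return pending_.size(); }

private:
    LoadFn load_;
    std::deque<std::string> pending_;
};
```

Queueing A, B, C and then calling loadNext three times invokes the callback with A, B and C in that order. Whether spreading the loads out like this actually removes the crash is, as said above, still untested.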

2 thoughts on "Capturing Crash Information in an Android NDK Project"

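  1. Our Android project has also been crashing a lot lately. I tried this tool, but it could not show the exact C++ source line at the time of the crash either: only two or three error lines are printed, with nothing related to the engine. I am not very familiar with the NDK, and I don't know whether using quick-cocos (which differs somewhat from the official Lua version) is the cause. It is a real headache and the project manager is furious. Our crash is intermittent, which makes it very hard to deal with. What is your take on this? Could you help?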

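    1. Crashes like this basically have two causes: 1. The programmer is careless about manually deleting memory in C++, which leaks memory. 2. The programmer loses track of pointer lifetimes in C++ and uses a dangling pointer, which corrupts memory.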
