Monitoring "called on a destroyed mutex": a behavior change after raising the target SDK version

System Operations | 2023-10-06 | 贤蛋大眼萌

Background

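Recently, the major Android vendors have all been requiring apps to raise their target SDK version to 30. Targeting a newer version makes permission usage stricter, and it also brings many new features. This article covers a mutex-related behavior change that showed up after upgrading to target 30. [All code for this article is in mooner.]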

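The first concept in this article is "called on a destroyed mutex". It means the program is using a mutex that has already been destroyed.

Why single it out? Using a destroyed mutex does not just fail to provide concurrency control. It also easily causes further bugs, and it is the root cause of many native crashes. App developers like us are not the only ones who fall into this trap. Components that each vendor customizes, such as WebView, are prone to it as well. For example, one version of a vendor-customized WebView produced output like this: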


     0000007a0fc645e0  0000007a0fc645f0  [anon:stack_and_tls:24685]
     0000007a0fc645e8  0000007b539c8704  /apex/com.android.runtime/lib64/bionic/libc.so (HandleUsingDestroyedMutex(pthread_mutex_t*, char const*)+24)
     0000007a0fc645f0  0000007a0fc64640  [anon:stack_and_tls:24685]
     0000007a0fc645f8  0000007a6ceb6c90  /system/product/app/HwWebview/HwWebview.apk!libwebviewchromium.huawei.so
     0000007a0fc64600  0000007a0fc64610  [anon:stack_and_tls:24685]
     0000007a0fc64608  0000007b539c8704  /apex/com.android.runtime/lib64/bionic/libc.so (HandleUsingDestroyedMutex(pthread_mutex_t*, char const*)+24)

If your targetSdkVersion is below 28, nothing visible happens. Once you raise it to 28 or above, however, the behavior changes:

static int __attribute__((noinline)) HandleUsingDestroyedMutex(pthread_mutex_t* mutex,
                                                               const char* function_name) {
    if (android_get_application_target_sdk_version() >= 28) {
        __fortify_fatal("%s called on a destroyed mutex (%p)", function_name, mutex);
    }
    return EBUSY;
}

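When the target SDK version is 28 or higher, __fortify_fatal aborts the process with a signal (SIGABRT) and exposes the problem immediately. Earlier versions instead returned EBUSY, a return value that callers are not forced to check.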

So with target SDK 28 or higher, the following code crashes (below 28 it does not):

pthread_mutex_t t;
pthread_mutex_init(&t, NULL);
pthread_mutex_destroy(&t);
struct timespec timer = {2, 2};

// All of these functions check whether the mutex has been destroyed (explained below)
pthread_mutex_timedlock(&t, &timer);
//pthread_mutex_unlock(&t);
...

The resulting crash log:

FORTIFY: pthread_mutex_timedlock called on a destroyed mutex (0x7fe29fcc80)
Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 5137 (com.pika.mooner), pid 5137 (com.pika.mooner)
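pid: 5137, tid: 5137, name: com.pika.mooner  >>> com.pika.mooner <<<

To see where this check lives, take bionic's pthread_mutex_destroy as an example:

int pthread_mutex_destroy(pthread_mutex_t* mutex_interface) {
    pthread_mutex_internal_t* mutex = __get_internal_mutex(mutex_interface);
    uint16_t old_state = atomic_load_explicit(&mutex->state, memory_order_relaxed);
    if (__predict_false(IsMutexDestroyed(old_state))) {
        return HandleUsingDestroyedMutex(mutex_interface, __FUNCTION__);
    }
    if (IsPriorityInheritanceMutex(old_state)) {
        int result = PIMutexDestroy(mutex->ToPIMutex());
        if (result == 0) {
            mutex->FreePIMutex();
            atomic_store(&mutex->state, 0xffff);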
        }
        return result;
    }
    // Store 0xffff to make the mutex unusable. Although POSIX standard says it is undefined
    // behavior to destroy a locked mutex, we prefer not to change mutex->state in that situation.
    if (MUTEX_STATE_BITS_IS_UNLOCKED(old_state) &&
        atomic_compare_exchange_strong_explicit(&mutex->state, &old_state, 0xffff,
                                                memory_order_relaxed, memory_order_relaxed)) {
        return 0;
    }
    return EBUSY;
}

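So when does execution reach HandleUsingDestroyedMutex? Continuing with pthread_mutex_destroy as the example (the other mutex functions follow the same logic), it goes to HandleUsingDestroyedMutex whenever IsMutexDestroyed returns true: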

static inline __always_inline bool IsMutexDestroyed(uint16_t mutex_state) {
    // Checks for 0xffff, the state that pthread_mutex_destroy writes
    return mutex_state == 0xffff;
}
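Here is the same mechanism in isolation, as a toy model that is not bionic code. It uses a 16-bit atomic state. Destroy writes the 0xffff sentinel, but only when the mutex is unlocked, and every entry point checks the sentinel first. The strict flag stands in for the target_sdk >= 28 test: with strict off, misuse degrades to EBUSY as it did before. All names here are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_UNLOCKED  0u
#define TOY_LOCKED    1u
#define TOY_DESTROYED 0xffffu  // same sentinel value bionic stores on destroy

typedef struct {
    _Atomic uint16_t state;
} toy_mutex;

// Stands in for android_get_application_target_sdk_version() >= 28
static int strict = 0;

// Counterpart of HandleUsingDestroyedMutex: fatal in strict mode, EBUSY otherwise
static int toy_handle_destroyed(toy_mutex *m, const char *function_name) {
    if (strict) {
        fprintf(stderr, "%s called on a destroyed mutex (%p)\n", function_name, (void *) m);
        abort();  // __fortify_fatal in bionic also ends in SIGABRT
    }
    return EBUSY;  // legacy result that callers can silently ignore
}

// Single-attempt lock, enough to show where the sentinel check sits
static int toy_trylock(toy_mutex *m) {
    if (atomic_load_explicit(&m->state, memory_order_relaxed) == TOY_DESTROYED) {
        return toy_handle_destroyed(m, __func__);
    }
    uint16_t expected = TOY_UNLOCKED;
    return atomic_compare_exchange_strong(&m->state, &expected, TOY_LOCKED) ? 0 : EBUSY;
}

static int toy_destroy(toy_mutex *m) {
    uint16_t old_state = atomic_load_explicit(&m->state, memory_order_relaxed);
    if (old_state == TOY_DESTROYED) {
        return toy_handle_destroyed(m, __func__);
    }
    // Like bionic, only an unlocked mutex is marked unusable; a locked one yields EBUSY
    if (old_state == TOY_UNLOCKED &&
        atomic_compare_exchange_strong(&m->state, &old_state, TOY_DESTROYED)) {
        return 0;
    }
    return EBUSY;
}
```

Setting strict to 1 and calling toy_trylock on a destroyed toy_mutex reproduces the shape of the FORTIFY abort shown earlier.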

Monitoring approach

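"Called on a destroyed mutex" is a well-hidden problem, and the crash itself gives us limited stack information. The trace only shows the point where the destroyed mutex was used, which makes the root cause hard to find. Suppose that, when the crash happens, we could also get information about the mutex and the stack at which it was destroyed, not just where it was used. Developers could then locate the problem much more easily.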

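There is a general methodology for this kind of problem: record the stack trace whenever a mutex is destroyed. Then, when IsMutexDestroyed later evaluates to true, look up the stack recorded for that mutex. The idea is the same as for FD monitoring:

Below we go through the implementation from left to right, step by step.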

pthread_mutex_destroy proxy

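The pthread functions can all be proxied with GOT/PLT hooks. Because they are C functions, the symbol is simply the function name. I have covered GOT/PLT hooking many times in earlier articles; it is one of the essential native hook skills. Here is example code using bhook: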

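#include <pthread.h>
#include "bytehook.h"

// Signature of the original pthread_mutex_destroy, used by BYTEHOOK_CALL_PREV
typedef int (*pthread_mutex_type)(pthread_mutex_t *);

static int pthread_destroy_proxy(pthread_mutex_t *mutex_interface) {
    // record the destroy backtrace here (next section)
    int result = BYTEHOOK_CALL_PREV(pthread_destroy_proxy, pthread_mutex_type, mutex_interface);
    BYTEHOOK_POP_STACK();
    return result;
}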

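// hook_so_name: the caller's .so whose calls we want to check, passed in from outside
// NULL callee: match pthread_mutex_destroy no matter which library provides it
bytehook_hook_single(hook_so_name, NULL, "pthread_mutex_destroy",
                     (void *) pthread_destroy_proxy,
                     NULL, NULL);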

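Almost every .so uses the pthread functions. We therefore let the caller pass in the specific .so to check and hook only that one, to avoid hooking too broadly.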

Recording the backtrace

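Next we record the stack. In pthread_destroy_proxy we have the pthread_mutex_t pointer. We write an entry into a global map with the pthread_mutex_t* as the key and the current stack as the value, so the map holds the destroy-site stacks.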

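static int pthread_destroy_proxy(pthread_mutex_t *mutex_interface) {
    // Check first: on a double destroy this reports the EARLIER destroy stack,
    // before the put() below overwrites it with the current one
    check_lock_state(mutex_interface);
    hook_entry *entry = calloc(1, sizeof(hook_entry));
    if (entry != NULL) {
        // destroy-site backtrace via xUnwind CFI mode (-1, -1: current process, current thread)
        entry->cfi_backtrace = xunwind_cfi_get(-1, -1, NULL, NULL);
        entry->addr = mutex_interface;
        // key: mutex address, value: destroy record
        put(concurrent_hash_map, mutex_interface, entry);
    }
    int result = BYTEHOOK_CALL_PREV(pthread_destroy_proxy, pthread_mutex_type, mutex_interface);
    BYTEHOOK_POP_STACK();
    return result;
}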

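We capture the stack with CFI-based unwinding. CFI can also recover symbols for Java-layer frames, which gives more complete information.

We could call _Unwind_Backtrace from libunwind directly. That, however, means managing the buffer and the output format ourselves, and recovering the Dl_info (dladdr) information for each frame: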

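#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

// Cursor into the caller-provided buffer of return addresses
struct backtrace_stack {
    void **current;
    void **end;
};

static _Unwind_Reason_Code unwind_callback(struct _Unwind_Context *context, void *data) {
    struct backtrace_stack *state = (struct backtrace_stack *) (data);
    uintptr_t pc = _Unwind_GetIP(context);
    if (pc) {
        if (state->current == state->end) {
            // buffer is full: stop unwinding
            return _URC_END_OF_STACK;
        }
        *state->current++ = (void *) (pc);
    }
    return _URC_NO_REASON;
}

// Fills buffer with up to max PCs and returns how many were written
static size_t getbacktrace(void **buffer, size_t max) {
    struct backtrace_stack stack = {buffer, buffer + max};
    _Unwind_Backtrace(unwind_callback, &stack);
    return stack.current - buffer;
}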
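The raw PCs from getbacktrace still have to be turned into readable frames; this is the "recover Dl_info" step mentioned above. Below is a minimal sketch using dladdr. It assumes a libc that provides dladdr (glibc declares it only under _GNU_SOURCE). The name format_frame is a hypothetical helper. It prints the PC relative to the module's load base, which roughly matches the relative pc shown in tombstones for ordinary shared libraries. It does not attempt the APK-embedded "offset 0x..." case.

```c
#define _GNU_SOURCE  // glibc declares dladdr/Dl_info only with this defined
#include <dlfcn.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: formats one frame as "#NN pc <rel_pc>  <module> (<symbol>+<offset>)"
// into out and returns snprintf's result.
static int format_frame(size_t index, void *pc, char *out, size_t out_len) {
    Dl_info info;
    if (dladdr(pc, &info) == 0 || info.dli_fname == NULL) {
        // pc is not inside any loaded module
        return snprintf(out, out_len, "#%02zu pc %016" PRIxPTR "  <unknown>",
                        index, (uintptr_t) pc);
    }
    // offset from the module's load base
    uintptr_t rel_pc = (uintptr_t) pc - (uintptr_t) info.dli_fbase;
    if (info.dli_sname != NULL && info.dli_saddr != NULL) {
        uintptr_t sym_off = (uintptr_t) pc - (uintptr_t) info.dli_saddr;
        return snprintf(out, out_len, "#%02zu pc %016" PRIxPTR "  %s (%s+%" PRIuPTR ")",
                        index, rel_pc, info.dli_fname, info.dli_sname, sym_off);
    }
    // no exported symbol covers pc (e.g. static functions): print the module only
    return snprintf(out, out_len, "#%02zu pc %016" PRIxPTR "  %s", index, rel_pc, info.dli_fname);
}
```

In practice this would be called once for each address that getbacktrace wrote into the buffer.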

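Here we take a shortcut and use a mature unwinding library, xUnwind. Its CFI mode obtains the stack through libbacktrace, which may deserve an article of its own. xUnwind is used in many places, for example xCrash, since both come from the same author.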

Writing to the map

I'm writing this in C, which has weak built-in support for maps. So we build a simple map modeled on the idea behind ConcurrentHashMap:

map_t *create_map() {
    // allocate memory for the map structure
    map_t *map = malloc(sizeof(map_t));
    if (map == NULL) {
        perror("malloc");
        exit(1);
    }
    // initialize the buckets and locks arrays
    for (int i = 0; i < BUCKEDTS_SIZE; i++) {
        map->buckets[i] = NULL;
        pthread_mutex_init(&map->locks[i], NULL);
    }
    return map;
}

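The map struct is defined below. The buckets array holds the entries (entry_t); entries whose hashes collide are chained as a linked list. The locks array holds one lock per bucket, and any operation on a bucket must first acquire that bucket's lock.

This is a simple way to stay thread-safe, and there is room for optimization. For example, get could skip the lock as long as memory visibility is guaranteed.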

typedef struct map {
    entry_t *buckets[BUCKEDTS_SIZE]; // an array of pointers to hash buckets
    pthread_mutex_t locks[BUCKEDTS_SIZE]; // an array of mutexes for each bucket
} map_t;

Here is the put method as an example; the full code is in mooner:

int put(map_t *map, pthread_mutex_t *key, hook_entry *value) {
    if (map == NULL) return -1;
    // calculate hash
    uint64_t index = hash((uint64_t) key);
    // lock the bucket
    pthread_mutex_lock(&map->locks[index]);
    // search for the key in the bucket
    entry_t *entry = map->buckets[index];
    while (entry != NULL) {
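        if (entry->key == key) {
            // key found (the mutex address was reused): update the value and unlock the bucket.
            // Note: the previous hook_entry is not freed here and leaks unless released elsewhere.
            entry->value = value;
            pthread_mutex_unlock(&map->locks[index]);
            return 0;
        }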
        entry = entry->next;
    }
    // key not found, create a new entry and insert it at the head of the bucket
    entry = malloc(sizeof(entry_t));
    if (entry == NULL) {
        perror("malloc");
        pthread_mutex_unlock(&map->locks[index]);
        return -1;
    }
    entry->key = key;
    entry->value = value;
    entry->next = map->buckets[index];
    map->buckets[index] = entry;
    // unlock the bucket and return 0
    pthread_mutex_unlock(&map->locks[index]);
    return 0;
}
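The article only shows put; the rest lives in mooner. Below is a minimal sketch of the two helpers that put relies on or pairs with: a pointer hash and a locked get. The type layouts, BUCKEDTS_SIZE = 64 and the hash function are assumptions chosen to match how the code above uses them. They are not mooner's actual definitions.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define BUCKEDTS_SIZE 64  // assumed bucket count; must match the real map

// Assumed shapes, mirroring how the article's code uses these types
typedef struct hook_entry {
    pthread_mutex_t *addr;  // the destroyed mutex
    char *cfi_backtrace;    // destroy-site backtrace text
} hook_entry;

typedef struct entry {
    pthread_mutex_t *key;
    hook_entry *value;
    struct entry *next;     // collision chain
} entry_t;

typedef struct map {
    entry_t *buckets[BUCKEDTS_SIZE];
    pthread_mutex_t locks[BUCKEDTS_SIZE];
} map_t;

// Hypothetical pointer hash that returns a bucket index in [0, BUCKEDTS_SIZE)
static uint64_t hash(uint64_t key) {
    key >>= 2;                     // low 2 bits are zero for 4-byte-aligned mutexes
    key *= 0x9E3779B97F4A7C15ULL;  // multiplicative (Fibonacci) hashing mixes the rest
    return (key >> 32) % BUCKEDTS_SIZE;
}

// Lookup counterpart to put(): returns the recorded entry for key, or NULL if none
static hook_entry *get(map_t *map, pthread_mutex_t *key) {
    if (map == NULL) return NULL;
    uint64_t index = hash((uint64_t) (uintptr_t) key);
    pthread_mutex_lock(&map->locks[index]);
    hook_entry *found = NULL;
    for (entry_t *e = map->buckets[index]; e != NULL; e = e->next) {
        if (e->key == key) {
            found = e->value;
            break;
        }
    }
    pthread_mutex_unlock(&map->locks[index]);
    return found;
}
```

get returns the pointer after releasing the bucket lock, so a concurrent put for the same address could swap the value. Copying cfi_backtrace while still holding the lock would avoid that.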

Checking whether a mutex has been destroyed

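To tell whether a mutex has been destroyed we want the logic of IsMutexDestroyed. Unfortunately we usually cannot get a symbol for it, because it is declared static inline __always_inline, as shown above. One could try to locate a function pointer by address offset. It is simpler to reimplement the check ourselves, since it is trivial and references no external variables.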

To run the IsMutexDestroyed check, we also need the mutex's current state. Following the source, we cast the pthread_mutex_t pointer to pthread_mutex_internal_t, whose state field holds the mutex state, and read that field atomically:

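static int handle_check_mutex_destroy(pthread_mutex_t *mutex_interface) {
#if defined(__LP64__)
    auto *mutex = reinterpret_cast<pthread_mutex_internal_t *>(mutex_interface);
    // same relaxed atomic read that bionic itself uses for the state field
    uint16_t old_state = atomic_load_explicit(&mutex->state, memory_order_relaxed);
    // 0xffff is what pthread_mutex_destroy stores (see IsMutexDestroyed)
    return old_state == 0xffff ? 1 : 0;
#else
    // only the LP64 layout is handled
    __android_log_print(ANDROID_LOG_ERROR, "mooner", "%s", "unsupported define");
    return 0;
#endif
}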


extern "C" {
int check_is_destroy_mutex(pthread_mutex_t *mutex_interface) {
    int result = handle_check_mutex_destroy(mutex_interface);
    return result;
}
}

We cannot include the real pthread_mutex_internal_t, so we define an equivalent struct ourselves that mirrors its memory layout in the bionic source:

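#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

// Mirrors bionic's PIMutex (priority-inheritance mutex); needed only so that the
// union below has the same size as in libc
struct PIMutex {
    uint8_t type;          // 0: normal, 1: recursive, 2: errorcheck
    bool shared;           // process-shared flag
    uint16_t counter;      // recursion count - 1 for recursive PI mutexes
    atomic_int owner_tid;  // futex word shared with the kernel
};

#if defined(__LP64__)
struct pthread_mutex_internal_t {
    _Atomic(uint16_t) state;
    uint16_t __pad;
    union {
        atomic_int owner_tid;
        PIMutex pi_mutex;
    };
    char __reserved[28];
} __attribute__((aligned(4)));
#else
struct pthread_mutex_internal_t {
    _Atomic(uint16_t) state;
    union {
        _Atomic(uint16_t) owner_tid;
        uint16_t pi_mutex_id;  // index into bionic's internal PIMutexAllocator
    };
} __attribute__((aligned(4)));
#endif

// Fails to compile if the mirrored layout drifts from the NDK's pthread_mutex_t size
static_assert(sizeof(pthread_mutex_t) == sizeof(pthread_mutex_internal_t),
              "pthread_mutex_internal_t must match pthread_mutex_t");
// ToPIMutex()/FreePIMutex() from the bionic source are omitted: they depend on
// bionic-internal helpers (PIMutexAllocator), and the monitor only reads state.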

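Of course, a vendor could change this layout, so it needs testing. Fortunately, I tested many phones and saw no compatibility problems; vendors rarely modify these low-level libc structures.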
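The mirrored struct cannot be checked against a vendor's libc at build time. One option is a cheap runtime self-test before enabling the monitor: initialize and destroy a scratch mutex, then confirm that its first 16 bits go from 0 to 0xffff. Below is a minimal sketch under that assumption; probe_destroyed_sentinel is a hypothetical name. On a libc that encodes mutexes differently it returns 0, so the monitor can stay off. For instance, glibc leaves those bits untouched on destroy.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

// Hypothetical self-test: returns 1 if this libc keeps 0 in the first 16 bits of a
// freshly initialized default mutex and 0xffff after pthread_mutex_destroy (the bionic
// encoding that check_is_destroy_mutex relies on), 0 otherwise.
static int probe_destroyed_sentinel(void) {
    pthread_mutex_t scratch;
    uint16_t state;
    if (pthread_mutex_init(&scratch, NULL) != 0) return 0;
    memcpy(&state, &scratch, sizeof(state));  // state is the first field of the mirror
    if (state != 0) {
        pthread_mutex_destroy(&scratch);
        return 0;
    }
    if (pthread_mutex_destroy(&scratch) != 0) return 0;
    memcpy(&state, &scratch, sizeof(state));  // only inspects bytes, never locks again
    return state == 0xffff;
}
```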

Checking at call time

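With this in place, we can use GOT hooks to add the check to every function that operates on a mutex. The principle is to check the mutex state before the call. If the call is a use after destroy, we print the stack that was recorded when the mutex was destroyed.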

static int use_pthread_mutex_timedlock(pthread_mutex_t *mutex_interface,
                                       const struct timespec *timeout) {
    // report the destroy-site stack if this mutex has already been destroyed
    check_lock_state(mutex_interface);
    int result = BYTEHOOK_CALL_PREV(use_pthread_mutex_timedlock, pthread_mutex_timedlock_type,
                                    mutex_interface, timeout);
    BYTEHOOK_POP_STACK();
    return result;
}
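The hook above calls check_lock_state, whose body the article does not show. Below is a minimal sketch of what it has to do, under two assumptions. First, the caller has already read the 16-bit state (on Android via the mirrored struct above). Second, destroy sites are kept in a small fixed-size table that stands in for the concurrent map. The names record_destroy_site and check_lock_state_sketch, and MAX_SITES, are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_SITES 128  // hypothetical capacity of the stand-in table

// mutex address -> destroy-site stack; the caller keeps the stack strings alive
static struct { const void *mutex; const char *stack; } sites[MAX_SITES];
static size_t site_count;
static pthread_mutex_t sites_lock = PTHREAD_MUTEX_INITIALIZER;

// Called from the pthread_mutex_destroy proxy: remember where this mutex was destroyed
static void record_destroy_site(const void *mutex, const char *stack) {
    pthread_mutex_lock(&sites_lock);
    for (size_t i = 0; i < site_count; i++) {
        if (sites[i].mutex == mutex) {  // address reused: keep the latest destroy
            sites[i].stack = stack;
            pthread_mutex_unlock(&sites_lock);
            return;
        }
    }
    if (site_count < MAX_SITES) {
        sites[site_count].mutex = mutex;
        sites[site_count].stack = stack;
        site_count++;
    }
    pthread_mutex_unlock(&sites_lock);
}

// Called before forwarding a hooked mutex call: if state carries the destroyed
// sentinel, print the recorded destroy-site stack and return it (NULL otherwise)
static const char *check_lock_state_sketch(const void *mutex, uint16_t state) {
    if (state != 0xffff) return NULL;  // same test as IsMutexDestroyed
    const char *stack = NULL;
    pthread_mutex_lock(&sites_lock);
    for (size_t i = 0; i < site_count; i++) {
        if (sites[i].mutex == mutex) {
            stack = sites[i].stack;
            break;
        }
    }
    pthread_mutex_unlock(&sites_lock);
    fprintf(stderr, "use after destroy detected, mutex %p\ndestroyed mutex backtrace\n%s\n",
            mutex, stack != NULL ? stack : "<no destroy record>");
    return stack;
}
```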

The demo log looks like this:

use after destroy detected
destroyed mutex backtrace
backtrace #00 pc 00000000000080dc  /data/app/~~zA92L0CDzEmmH36g34ZlxA==/com.pika.mooner-JZDSOl3dD6KltKmev-4XKw==/base.apk!libxunwind.so (offset 0x2000)
#01 pc 0000000000007e8c  /data/app/~~zA92L0CDzEmmH36g34ZlxA==/com.pika.mooner-JZDSOl3dD6KltKmev-4XKw==/base.apk!libxunwind.so (offset 0x2000)
#02 pc 00000000000090ec  /data/app/~~zA92L0CDzEmmH36g34ZlxA==/com.pika.mooner-JZDSOl3dD6KltKmev-4XKw==/base.apk!libxunwind.so (offset 0x2000) (xunwind_cfi_get+88)
#03 pc 00000000000032e4  /data/app/~~zA92L0CDzEmmH36g34ZlxA==/com.pika.mooner-JZDSOl3dD6KltKmev-4XKw==/base.apk!libmooner_core.so (offset 0x14000)
#04 pc 0000000000000b1c  /data/app/~~zA92L0CDzEmmH36g34ZlxA==/com.pika.mooner-JZDSOl3dD6KltKmev-4XKw==/base.apk!libmooner.so (offset 0x39000) (Java_com_pika_mooner_MainActivity_createDestroyedPthreadMutex+92)
#05 pc 000000000013ced4  /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+148)
#06 pc 0000000000133564  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+548)

Summary

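In this article we saw how to use native hooks to track down problems. By reading the source, we also surfaced the problem early together with the relevant stack. This approach applies to many optimization and monitoring scenarios, and I hope readers find it useful!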
