线上MongoDB查询慢，如何通过索引优化直降响应时间？

数据运维 2024-04-29 向阳逐梦手机阅读

作者 | 吴守阳

审校 | 重楼

背景

线上某个页面的响应速度异常缓慢，达到了16秒，严重影响了业务的正常运行。经过与研发的沟通得知，该页面调用的数据集合只会保留7天的数据，集合有6000万条记录。针对过期数据的处理，使用了根据 create_time 字段创建的过期索引，以自动使数据失效。此外，数据集合还通过 company_id 字段进行了哈希分片。

问题排查

慢语句分析

在后台拿到了慢查询语句，如下：

db.visitor.find({
 "company_id": 13272,
 "create_time": {
 "$gte": ISODate("2024-04-11T00:00:00.000+0800"),
 "$lte": ISODate("2024-04-11T23:59:59.000+0800")
 }
});

db.visitor.find({
 "company_id": 13272,
 "create_time": {
 "$gte": ISODate("2024-04-12T00:00:00.000+0800"),
 "$lte": ISODate("2024-04-18T23:59:59.000+0800")
 }
});

很简单的一个查询，语句上没有再优化的必要了。如果索引都在不应该出现这种十多秒的耗时，接下来开始分析索引。

索引分析

索引如下：

db.getCollection("visitor").createIndex({
 "company_id": "hashed"
}, {
 name: "company_id_hashed"
});

db.getCollection("visitor").createIndex({
 "company_id": NumberInt("1")
}, {
 name: "company_id_1"
});

db.getCollection("visitor").createIndex({
 "create_time": NumberInt("1")
}, {
 name: "create_time_1",
 expireAfterSeconds: NumberInt("604800")
});

其中：

company_id_hashed：创建集合分片使用的hash索引
company_id_1：普通查询的索引
create_time_1：过期时间的索引就这点数据量，按理说会用到索引的，不应该执行耗时16s，接下来执行计划分析。

Explain执行计划

winningPlan

"stage": "SHARDING_FILTER",
 "inputStage": {
 "stage": "FETCH",
 "filter": {
 "$and": [
 {
 "company_id": {
 "$eq": 13272
 }
 },
 {
 "create_time": {
 "$lte": ISODate("2024-04-17T15:59:59.000Z")
 }
 },
 {
 "create_time": {
 "$gte": ISODate("2024-04-10T16:00:00.000Z")
 }
 }
 ]
 },
 "inputStage": {
 "stage": "IXSCAN",
 "keyPattern": {
 "company_id": "hashed"
 },
 "indexName": "company_id_hashed",
 "isMultiKey": false,
 "isUnique": false,
 "isSparse": false,
 "isPartial": false,
 "indexVersion": NumberInt("2"),
 "direction": "forward",
 "indexBounds": {
 "company_id": [
 "[7977521071453068053, 7977521071453068053]"

这部分显示只用到了company_id_hashed索引，没有用到create_time_1索引。

rejectedPlans

"stage": "SHARDING_FILTER",
 "inputStage": {
 "stage": "FETCH",
 "filter": {
 "company_id": {
 "$eq": 13272
 }
 },
 "inputStage": {
 "stage": "IXSCAN",
 "keyPattern": {
 "create_time": 1
 },
 "indexName": "create_time_1",
 "isMultiKey": false,
 "multiKeyPaths": {
 "create_time": [ ]
 },
 "isUnique": false,
 "isSparse": false,
 "isPartial": false,
 "indexVersion": NumberInt("2"),
 "direction": "forward",
 "indexBounds": {
 "create_time": [
 "[new Date(1712764800000), new Date(1713369599000)]"
 ]
 }
 }
 }
 },
 {
 "stage": "SHARDING_FILTER",
 "inputStage": {
 "stage": "FETCH",
 "filter": {
 "$and": [
 {
 "create_time": {
 "$lte": ISODate("2024-04-17T15:59:59.000Z")
 }
 },
 {
 "create_time": {
 "$gte": ISODate("2024-04-10T16:00:00.000Z")
 }
 }
 ]
 },
 "inputStage": {
 "stage": "IXSCAN",
 "keyPattern": {
 "company_id": 1
 },
 "indexName": "company_id_1",
 "isMultiKey": false,
 "multiKeyPaths": {
 "company_id": [ ]
 },

这部分显示的是被拒绝的执行计划列表，不会用到company_id_1、create_time_1索引。

executionStats

"nReturned": NumberInt("229707"),
 "executionTimeMillis": NumberInt("15668"),
 "totalKeysExamined": NumberInt("238012"),
 "totalDocsExamined": NumberInt("238012"),
 "executionStages": {
 "stage": "SINGLE_SHARD",
 "nReturned": NumberInt("229707"),
 "executionTimeMillis": NumberInt("15668"),
 "totalKeysExamined": NumberInt("238012"),
 "totalDocsExamined": NumberInt("238012"),
 "totalChildMillis": NumberLong("15667"),
 "shards": [
 {
 "shardName": "d-m5eee03fdeaeaee4",
 "executionSuccess": true,
 "executionStages": {
 "stage": "SHARDING_FILTER",
 "nReturned": NumberInt("229707"),
 "executionTimeMillisEstimate": NumberInt("14996"),
 "works": NumberInt("238013"),
 "advanced": NumberInt("229707"),
 "needTime": NumberInt("8305"),
 "needYield": NumberInt("0"),
 "saveState": NumberInt("1980"),
 "restoreState": NumberInt("1980"),
 "isEOF": NumberInt("1"),
 "chunkSkips": NumberInt("0"),
 "inputStage": {
 "stage": "FETCH",
 "filter": {
 "$and": [
 {
 "company_id": {
 "$eq": 13272
 }
 },
 {
 "create_time": {
 "$lte": ISODate("2024-04-17T15:59:59.000Z")
 }
 },
 {
 "create_time": {
 "$gte": ISODate("2024-04-10T16:00:00.000Z")
 }
 }
 ]
 },
 "nReturned": NumberInt("229707"),
 "executionTimeMillisEstimate": NumberInt("14595"),
 "works": NumberInt("238013"),
 "advanced": NumberInt("229707"),
 "needTime": NumberInt("8305"),
 "needYield": NumberInt("0"),
 "saveState": NumberInt("1980"),
 "restoreState": NumberInt("1980"),
 "isEOF": NumberInt("1"),
 "docsExamined": NumberInt("238012"),
 "alreadyHasObj": NumberInt("0"),
 "inputStage": {
 "stage": "IXSCAN",
 "nReturned": NumberInt("238012"),
 "executionTimeMillisEstimate": NumberInt("251"),
 "works": NumberInt("238013"),
 "advanced": NumberInt("238012"),
 "needTime": NumberInt("0"),
 "needYield": NumberInt("0"),
 "saveState": NumberInt("1980"),
 "restoreState": NumberInt("1980"),
 "isEOF": NumberInt("1"),
 "keyPattern": {
 "company_id": "hashed"
 },
 "indexName": "company_id_hashed",
 "isMultiKey": false,
 "isUnique": false,
 "isSparse": false,
 "isPartial": false,
 "indexVersion": NumberInt("2"),
 "direction": "forward",
 "indexBounds": {
 "company_id": [
 "[7977521071453068053, 7977521071453068053]"
 ]
 },
 "keysExamined": NumberInt("238012"),
 "seeks": NumberInt("1"),
 "dupsTested": NumberInt("0"),
 "dupsDropped": NumberInt("0")

这部分显示的是查询的执行统计信息。

索引分析

通过explain的执行计划，可以看出索引的使用上存在问题。按理说company_id、create_time都已创建索引，为什么没有使用上？是什么原因导致它失效，没有用上create_time索引？

下面列举了失效的情况：

索引选择性不高：由于查询条件是一个范围查询，create_time 字段可能有许多不同的值满足条件。因此，单键索引 create_time_1 的选择性（即索引中不同值的比例）可能不高，这使得使用该索引无法有效地减少需要检索的文档数量。
查询需要跨越多个索引键值：查询涉及到了两个字段 company_id 和 create_time。虽然索引 create_time_1 可以帮助过滤 create_time 符合条件的文档，但在执行查询时，还需要考虑 company_id 的匹配条件。因此，MongoDB 需要在两个索引之间进行查找和合并，而不是简单地使用单个索引来解决查询。
额外的查找和合并成本：在涉及多个条件的查询中，MongoDB 会尝试使用覆盖索引（Covered Index）来尽可能地减少在磁盘上的文档检索。然而，在这种情况下，create_time_1 索引不能单独满足查询条件，因此 MongoDB 还需要查找和合并从 company_id_1 索引中过滤出来的文档。这种额外的查找和合并过程会增加查询的成本，并且降低性能。因此，针对给定的查询语句，MongoDB 不会使用 create_time_1 索引来优化查询，而是会选择其他更适合的索引，如 company_id_hashed 和 company_id_1。

问题原因

造成执行耗时过长的主要原因是索引失效的问题，在涉及多个条件的查询中，MongoDB 会尝试使用覆盖索引（Covered Index）来尽可能地减少在磁盘上的文档检索。然而，在这种情况下，create_time_1 索引不能单独满足查询条件，因此 MongoDB 还需要查找和合并从 company_id_1 索引中过滤出来的文档。这种额外的查找和合并过程会增加查询的成本，并且降低性能。

优化方案

创建新的复合索引company_id_create_time，让其走company_id_hashed到company_id_create_time的链路。添加新的索引后，相同的语句执行时间只需要400ms，能满足业务的需求。

结论

要多关注索引在什么情况下会失效？复合索引的先后顺序，不是每个条件字段都建个单个普通索引，查询语句都会使用上，不要存在这种误区，有时候复合索引才是最完美的组合。

执行计划详解

1、queryPlanner：包含了MongoDB查询的执行计划。

mongosPlannerVersion：MongoDB计划版本。
winningPlan：胜出的执行计划，即MongoDB选择的最佳执行计划。
shards: 分片的详细信息，包括分片名称、连接字符串、服务器信息等。2、winningPlan: 胜出的执行计划。
stage: 执行阶段，这里是SINGLE_SHARD，表示单分片操作。
shardName: 执行操作的分片名称。
plannerVersion: 计划版本。
namespace: 查询的命名空间。
indexFilterSet: 是否设置了索引过滤器。
parsedQuery: 解析后的查询条件。
winningPlan: 胜出的执行计划的详细信息，这里是SHARDING_FILTER。3、rejectedPlans: 被拒绝的执行计划列表，即非胜出的备选计划。每个被拒绝的执行计划包含了其详细信息，包括执行阶段、过滤器、索引扫描等。4、executionStats: 查询的执行统计信息。
nReturned: 返回的文档数量。
executionTimeMillis: 查询执行时间（毫秒）。
totalKeysExamined: 总共检查的键数量。
totalDocsExamined: 总共检查的文档数量。
executionStages: 执行阶段的详细统计信息。

作者介绍

吴守阳，51CTO社区编辑，拥有8年DBA工作经验，熟练管理MySQL、Redis、MongoDB等开源数据库。精通性能优化、备份恢复和高可用性架构设计。善于故障排除和自动化运维，保障系统稳定可靠。具备良好的团队合作和沟通能力，致力于为企业提供高效可靠的数据库解决方案。

线上MongoDB查询慢，如何通过索引优化直降响应时间？

背景

问题排查

慢语句分析

索引分析

Explain执行计划

winningPlan

rejectedPlans

executionStats

索引分析

问题原因

优化方案

结论

执行计划详解

作者介绍

民工哥死磕Redis教程（六）：事务详解

mysql字符集和校对规则(Mysql校对集)

MySQL数据库和Go语言：如何进行高效的数据存储？

轻松入门MySQL：简明教程解析数据存储与管理（1）

sqlserver 聚集索引和非聚集索引实例