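Overview: Compared with row storage, TiFlash stores structured data in columnar form under a strict schema and builds on ClickHouse's vectorized computation engine, which gives it performance advantages in both reading and computation. Compared with ordinary column stores, TiFlash additionally offers real-time updates, automatic distributed scaling, and reads at the SI (Snapshot Isolation) isolation level.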
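TiFlash acceleration of OLAP queries
1.1 OLAP queries typically have the following characteristics
- Each query reads a large number of rows but only a few columns;
- Tables are wide, that is, each table contains many columns;
- Queries join one or more small tables to one large table and aggregate columns of the large table.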
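1.2 The TiFlash columnar engine optimizes this kind of query well
1) I/O optimization
- Each query reads only the columns it needs, which reduces I/O usage;
- Values in the same column share a data type, so a higher compression ratio is achievable than with row storage;
- Lower overall I/O makes memory use more efficient.
2) CPU optimization
- Columnar storage makes it easy to process a field in batches, making full use of the CPU cache for better locality;
- Vectorized instructions process part of the computation in parallel.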
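The I/O argument above can be illustrated with a small toy model. This is only a sketch, not a description of TiFlash internals: the function names and the 50-column table are assumptions made for illustration, and the model ignores compression, pages, and caching.

```python
# Toy model: count how many stored values a scan must touch to aggregate
# one column of a wide table under a row layout versus a column layout.


def values_read_row_store(num_rows: int, num_cols: int) -> int:
    # A row store keeps each row contiguous, so scanning a single column
    # still walks past every field of every row.
    return num_rows * num_cols


def values_read_column_store(num_rows: int, cols_needed: int) -> int:
    # A column store keeps each column contiguous, so only the requested
    # columns are read.
    return num_rows * cols_needed


def sum_column(column: list[float]) -> float:
    # Batch-style aggregation over one contiguous column. Real engines run
    # this kind of loop with vectorized (SIMD) instructions.
    return sum(column)


if __name__ == "__main__":
    rows, cols = 1_000_000, 50  # hypothetical wide table
    row_cost = values_read_row_store(rows, cols)
    col_cost = values_read_column_store(rows, 1)
    print(f"row layout: {row_cost} values, column layout: {col_cost} values")
    print(f"reduction factor: {row_cost // col_cost}x")
```

In this model, the saving grows linearly with the number of columns the query does not need, which is why wide tables benefit most.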
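Experiment procedure
2.1 Create a TiDB Cloud account
2.2 Create a TiDB cluster
2.3 Import base test data of 1,000 rows and 1,000,000 rows with TiDB Cloud Import
2.4 Create the customers, orders, and products tables and insert test data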
mysql> CREATE TABLE customers (
-> id INT AUTO_INCREMENT PRIMARY KEY,
-> name VARCHAR(100),
-> email VARCHAR(100),
-> created_at DATETIME
-> ) ;
Query OK, 0 rows affected (0.71 sec)
mysql> CREATE TABLE orders (
-> id INT AUTO_INCREMENT PRIMARY KEY,
-> customer_id INT,
-> product_id INT,
-> quantity INT,
-> price DECIMAL(10, 2),
-> timestamp DATETIME
-> ) ;
Query OK, 0 rows affected (0.46 sec)
mysql> CREATE TABLE products (
-> id INT AUTO_INCREMENT PRIMARY KEY,
-> name VARCHAR(100),
-> category VARCHAR(100),
-> price DECIMAL(10, 2)
-> ) ;
Query OK, 0 rows affected (0.56 sec)
-- Insert 1,000,000 rows into the customers table
mysql> INSERT INTO customers (name, email, created_at)
-> SELECT
-> CONCAT('Customer', seq),
-> CONCAT('customer', seq, '@example.com'),
-> NOW()
-> FROM
-> (SELECT seq FROM seq_1_to_1000000) s;
Query OK, 1000000 rows affected (11.96 sec)
Records: 1000000 Duplicates: 0 Warnings: 0
-- Insert 1,000 rows into the products table
mysql> INSERT INTO products (name, category, price)
-> SELECT
-> CONCAT('Product', seq),
-> CONCAT('Category', FLOOR(RAND()*(10-1+1))+1),
-> ROUND(RAND()*(1000-1+1), 2)
-> FROM
-> (SELECT seq FROM seq_1_to_1000) s;
Query OK, 1000 rows affected (0.91 sec)
Records: 1000 Duplicates: 0 Warnings: 0
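The transcript above does not show how the orders table was populated. One possible approach, sketched below, is to generate a CSV file and load it with an import tool. Everything here is an assumption for illustration: the function names, the value ranges, the fixed seed, and the starting date are hypothetical, and the sketch assumes that ids in customers and products run from 1 to the row count.

```python
import csv
import io
import random
from datetime import datetime, timedelta

# Columns of the orders table other than the AUTO_INCREMENT id.
FIELDS = ["customer_id", "product_id", "quantity", "price", "timestamp"]


def generate_orders(n, num_customers=1_000_000, num_products=1_000, seed=42):
    """Yield n synthetic order rows as dicts keyed by FIELDS."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    base = datetime(2024, 1, 1)  # arbitrary starting point (assumption)
    for _ in range(n):
        offset = timedelta(seconds=rng.randint(0, 365 * 86400 - 1))
        yield {
            "customer_id": rng.randint(1, num_customers),
            "product_id": rng.randint(1, num_products),
            "quantity": rng.randint(1, 10),
            "price": round(rng.uniform(1, 1000), 2),
            "timestamp": (base + offset).strftime("%Y-%m-%d %H:%M:%S"),
        }


def write_orders_csv(out, n, **kwargs):
    """Write a header line plus n order rows to a text file object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in generate_orders(n, **kwargs):
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_orders_csv(buf, 3)
    print(buf.getvalue())
```

Writing to any text file object keeps the generator easy to test in memory before producing a large file on disk.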
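2.5 As the figure above shows, the data has been loaded and is stored in row format; a query that joins the tables and sums the result takes 1229 ms
2.6 Create the TiFlash replicas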
mysql> ALTER TABLE test.customers SET TIFLASH REPLICA 1;
Query OK, 0 rows affected, 1 warning (0.87 sec)
mysql> ALTER TABLE test.orders SET TIFLASH REPLICA 1;
Query OK, 0 rows affected, 1 warning (1.65 sec)
mysql> ALTER TABLE test.products SET TIFLASH REPLICA 1;
Query OK, 0 rows affected, 1 warning (0.85 sec)
Check the replication progress of each table:
mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'customers';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | customers  |      123 |             2 |                 |         0 |        0 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.64 sec)
mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'orders';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | orders     |      134 |             2 |                 |         0 |        0 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.49 sec)
mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'products';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | products   |      128 |             2 |                 |         0 |        0 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.52 sec)
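The AVAILABLE field indicates whether the table's TiFlash replica is available: 1 means available, 0 means unavailable. Once a replica becomes available, this status no longer changes; if the replica count is modified with a DDL statement, the replication progress is recalculated. The PROGRESS field shows the replication progress as a value between 0.0 and 1.0, where 1 means that at least one replica has finished replicating. The queries above showed the replicas of the three tables as unavailable; querying again shows that they are now available.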
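Instead of rerunning the status query by hand, the check can be automated. The sketch below polls until AVAILABLE becomes 1. The function `wait_for_tiflash_replica` and its `fetch_status` parameter are hypothetical names introduced here: `fetch_status` stands for any callable that runs a query such as `STATUS_SQL` through a database driver of your choice and returns the `(AVAILABLE, PROGRESS)` pair.

```python
import time
from typing import Callable, Tuple

# Query a caller-supplied fetch_status could run for one table.
STATUS_SQL = (
    "SELECT AVAILABLE, PROGRESS FROM information_schema.tiflash_replica "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
)


def wait_for_tiflash_replica(
    fetch_status: Callable[[], Tuple[int, float]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until AVAILABLE is 1; return False if the timeout elapses first."""
    waited = 0.0
    while True:
        available, progress = fetch_status()
        print(f"available={available} progress={progress:.2f}")
        if available == 1:
            return True
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval


if __name__ == "__main__":
    # Simulated status sequence standing in for real query results.
    fake = iter([(0, 0.0), (0, 0.5), (1, 1.0)])
    print(wait_for_tiflash_replica(lambda: next(fake), poll_interval=0.01))
```

Passing the query function and the sleep function as parameters keeps the loop independent of any particular driver and makes it testable without a cluster.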
mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'customers';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | customers  |      123 |             2 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (1.94 sec)
mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'products';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | products   |      128 |             2 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.77 sec)
mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'orders';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | orders     |      134 |             2 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.66 sec)
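2.7 As the figure shows, the data is now also stored in columnar format; run the same SQL query
Conclusion: As the figure shows, for the same SQL the execution plan now uses TiFlash and the query takes 118 ms, compared with 1229 ms without TiFlash, a speedup of roughly 10x.
Note: A single SELECT query can read from both TiKV and TiFlash; by default, TiDB decides which storage engine to use based on the query.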
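The two checks in this conclusion, which engine the plan used and how large the speedup was, can also be done programmatically. The sketch below relies on the `task` column of TiDB's `EXPLAIN` output, which labels operators with values such as `root`, `cop[tikv]`, `cop[tiflash]`, or `mpp[tiflash]`. The helper names are hypothetical and the sample task list is illustrative, not the actual plan from this experiment.

```python
import re

# Matches the engine tag inside EXPLAIN task values such as "cop[tikv]"
# or "mpp[tiflash]".
_ENGINE = re.compile(r"\[(tikv|tiflash)\]")


def engines_in_plan(task_values):
    """Return the set of storage engines named in EXPLAIN task values."""
    engines = set()
    for task in task_values:
        match = _ENGINE.search(task.lower())
        if match:
            engines.add(match.group(1))
    return engines


def speedup(before_ms, after_ms):
    """Ratio of the slower run time to the faster one."""
    return before_ms / after_ms


if __name__ == "__main__":
    # Hypothetical task values for a plan that mixes both engines.
    sample_tasks = ["root", "mpp[tiflash]", "cop[tikv]"]
    print(sorted(engines_in_plan(sample_tasks)))
    # Timings reported in the article: 1229 ms on TiKV, 118 ms on TiFlash.
    print(f"{speedup(1229, 118):.1f}x")
```

With the article's timings the ratio is about 10.4, which matches the stated speedup of roughly 10x.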