全表数据核对 ,行数据核对,列数据核对,Mysql 8.0 实例
可以去下载 classicmodels 数据库资源如下
[ 点击:classicmodels官网下载]
[也可以去CSDN 去免费资源下载]
前言
有没有在工作中碰到这样的数据场景:
由于前端技术构架的改变,前端订单系统已经很久没有人维护了,现出现了很多bug。所以需要改造,将原来的 .net 技术构架 改为java,计划用一个月的时间,2个系统并行,并最终切换。请数据部配合验证和核对数据。
我们假设 2个表的数据结构一样。那怎么验证呢具体如下:
一、数据准备
原来的表结构
CREATE TABLE `orderdetails` (
`orderNumber` int NOT NULL,
`productCode` varchar(15) NOT NULL,
`quantityOrdered` int NOT NULL,
`priceEach` decimal(10,2) NOT NULL,
`orderLineNumber` smallint NOT NULL,
PRIMARY KEY (`orderNumber`,`productCode`),
KEY `productCode` (`productCode`),
CONSTRAINT `orderdetails_ibfk_1` FOREIGN KEY (`orderNumber`) REFERENCES `orders` (`orderNumber`),
CONSTRAINT `orderdetails_ibfk_2` FOREIGN KEY (`productCode`) REFERENCES `products` (`productCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
新建的表结构
CREATE TABLE `orderdetails_test` (
`orderNumber` int NOT NULL,
`productCode` varchar(15) NOT NULL,
`quantityOrdered` int NOT NULL,
`priceEach` decimal(10,2) NOT NULL,
`orderLineNumber` smallint NOT NULL,
PRIMARY KEY (`orderNumber`,`productCode`),
KEY `productCode` (`productCode`),
CONSTRAINT `orderdetails_ibfk_3` FOREIGN KEY (`orderNumber`) REFERENCES `orders` (`orderNumber`),
CONSTRAINT `orderdetails_ibfk_4` FOREIGN KEY (`productCode`) REFERENCES `products` (`productCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
改为 orderdetails_ibfk_3,orderdetails_ibfk_4
CONSTRAINT
orderdetails_ibfk_3
FOREIGN KEY (orderNumber
) REFERENCESorders
(orderNumber
)CONSTRAINT
orderdetails_ibfk_4
FOREIGN KEY (productCode
) REFERENCESproducts
(productCode
)
插入表
insert into classicmodels.orderdetails_test
select * from classicmodels.orderdetails b
插入数据
然后我们 假设 orderdetails_test有3个数据有异常我们更新一下
update orderdetails_test set priceEach='102.00'
where priceEach='102.04' and productCode='S18_1589' and orderNumber='10161'
update orderdetails_test set priceEach='83.87'
where priceEach='83.86' and productCode='S24_3816' and orderNumber='10273'
update orderdetails_test set orderLineNumber='1'
where priceEach='136.00' and productCode='S18_1749' and orderNumber='10100'
好了到这里就结束了,假设我们不知道数据有异常。
二、开始核对数据核对(重点来了)
1.检查表结构是否一致
代码如下(示例):
使用DESCRIBE语句可以查看表结构的
desc classicmodels.orderdetails
desc classicmodels.orderdetails_test
发现表结构一致
2.检查记录条数是否一致
代码如下(示例):
SELECT COUNT(*) FROM classicmodels.orderdetails
SELECT COUNT(*) FROM classicmodels.orderdetails_test
发现行数一致
3.检查数据是否一致
通过 orderNumber 字段关联确认数据是否一致
SELECT * FROM orderdetails_test WHERE
NOT EXISTS (SELECT * FROM orderdetails
WHERE orderdetails.orderNumber=orderdetails_test.orderNumber)
通过 ProductCode 字段关联确认数据是否一致
SELECT * FROM orderdetails_test WHERE
NOT EXISTS (SELECT * FROM orderdetails
WHERE orderdetails.ProductCode=orderdetails_test.ProductCode)
没有记录,说明 数据一致
通过 quantityOrdered 字段关联确认数据是否一致
SELECT * FROM orderdetails_test WHERE
NOT EXISTS (SELECT * FROM orderdetails
WHERE orderdetails.quantityOrdered =orderdetails_test.quantityOrdered)
没有记录,说明数据一致
通过 priceEach 字段关联确认数据是否一致
SELECT * FROM orderdetails_test WHERE
NOT EXISTS (SELECT * FROM orderdetails
WHERE orderdetails.priceEach =orderdetails_test.priceEach)
有记录,说明数据不一致,ok 找到一条
到这里大家觉有问题吗?怎么少一条。是sql 写的不严谨?
4.检查数据是否一致 第2种方案 (CRC32 函数核对)
mysql crc32 函数
具体可以看下mysql 帮助
Computes a cyclic redundancy check value and returns a 32-bit unsigned value. The result is NULL if the argument is NULL. The argument is expected to be a string and (if possible) is treated as one if it is not.
检查2个表列是否一致
select '原系统表' as 'titel',sum(CRC32(ordernumber)),
sum(CRC32(productCode)),
sum(CRC32(quantityOrdered)),
sum(CRC32(priceEach)),
sum(CRC32(orderLineNumber) )
from classicmodels.orderdetails
union all
select '新系统表' as 'titel',sum(CRC32(ordernumber)),
sum(CRC32(productCode)),
sum(CRC32(quantityOrdered)),
sum(CRC32(priceEach)),
sum(CRC32(orderLineNumber) )
from classicmodels.orderdetails_test
这样立刻就知道有2列 priceEach和orderLineNumber 数据和原表不一样了
从列数据定位到行数据
select a.* from (
select ordernumber,productCode,priceEach,quantityOrdered,orderLineNumber,
CRC32(priceEach)+CRC32(orderLineNumber)+CRC32(productCode)+CRC32(quantityOrdered)+CRC32(ordernumber) as num
from classicmodels.orderdetails_test ) a
left join
(select priceEach,orderLineNumber,
CRC32(priceEach)+CRC32(orderLineNumber)+CRC32(productCode)+CRC32(quantityOrdered)+CRC32(ordernumber) as num
from classicmodels.orderdetails ) b
on a.num=b.num
where b.num is null
这样就快速找到了
注意CRC32 对 空格,回车也可以识别额
select CRC32(' ')
文本对比
select crc32('我爱北京天安门,天安门上太阳升,') as 'txt'
union all
select crc32('我爱北京天安门,天安门上太阳升, ') as 'txt'
union all
select crc32('我爱北京天安门,天安门上太阳升,
') as 'txt'
union all
select crc32('我爱北京天安门,天安门上太阳升,') as 'txt'
5.其他方法
那之前的方案就没有办法办法实现是了吗?
有办法,
SELECT * FROM orderdetails_test WHERE
NOT EXISTS (SELECT * FROM orderdetails
WHERE orderdetails.priceEach =orderdetails_test.priceEach
and orderdetails.orderLineNumber =orderdetails_test.orderLineNumber
and orderdetails.ordernumber =orderdetails_test.ordernumber
and orderdetails.productCode =orderdetails_test.productCode
and orderdetails.productCode =orderdetails_test.productCode)
以上的字段比较少,如果 50~80个列你们怎么解决,是不是用 crc32 会快很多
总结
这个2个方法都可行,但是用 CRC32 函数在文本对文本对比 效率会比较高 。希望大家学到了,这个是之前工作的一些笔记。