LeetCode mysql 刷题四:餐馆营业额变化增长——用自连和窗口函数 4 种 sql 实现过去 7 天的营业额

2023年 9月 23日 55.4k 0

题目

题目链接:餐馆营业额变化增长

你是餐馆的老板,现在你想分析一下可能的营业额变化增长(每天至少有一位顾客)。

计算以 7 天(某日期 + 该日期前的 6 天)为一个时间段的顾客消费平均值。average_amount 要 保留两位小数。

结果按 visited_on 升序排序。

返回结果格式的例子如下。

Create table If Not Exists Customer (customer_id int, name varchar(20), visited_on date, amount int);
Truncate table Customer;
insert into Customer (customer_id, name, visited_on, amount) values ('1', 'Jhon', '2019-01-01', '100');
insert into Customer (customer_id, name, visited_on, amount) values ('2', 'Daniel', '2019-01-02', '110');
insert into Customer (customer_id, name, visited_on, amount) values ('3', 'Jade', '2019-01-03', '120');
insert into Customer (customer_id, name, visited_on, amount) values ('4', 'Khaled', '2019-01-04', '130');
insert into Customer (customer_id, name, visited_on, amount) values ('5', 'Winston', '2019-01-05', '110');
insert into Customer (customer_id, name, visited_on, amount) values ('6', 'Elvis', '2019-01-06', '140');
insert into Customer (customer_id, name, visited_on, amount) values ('7', 'Anna', '2019-01-07', '150');
insert into Customer (customer_id, name, visited_on, amount) values ('8', 'Maria', '2019-01-08', '80');
insert into Customer (customer_id, name, visited_on, amount) values ('9', 'Jaze', '2019-01-09', '110');
insert into Customer (customer_id, name, visited_on, amount) values ('1', 'Jhon', '2019-01-10', '130');
insert into Customer (customer_id, name, visited_on, amount) values ('3', 'Jade', '2019-01-10', '150');

-- 时间不连续的例子
insert into Customer (customer_id, name, visited_on, amount) values ('1', 'Jhon', '2019-01-01', '100');
insert into Customer (customer_id, name, visited_on, amount) values ('4', 'Khaled', '2019-01-04', '130');
insert into Customer (customer_id, name, visited_on, amount) values ('5', 'Winston', '2019-01-05', '110');
insert into Customer (customer_id, name, visited_on, amount) values ('6', 'Elvis', '2019-01-06', '140');
insert into Customer (customer_id, name, visited_on, amount) values ('7', 'Anna', '2019-01-07', '150');
insert into Customer (customer_id, name, visited_on, amount) values ('8', 'Maria', '2019-01-08', '80');
insert into Customer (customer_id, name, visited_on, amount) values ('9', 'Jaze', '2019-01-09', '110');
insert into Customer (customer_id, name, visited_on, amount) values ('1', 'Jhon', '2019-01-10', '130');
insert into Customer (customer_id, name, visited_on, amount) values ('3', 'Jade', '2019-01-10', '150');
Customer 表:
+-------------+--------------+--------------+-------------+
| customer_id | name         | visited_on   | amount      |
+-------------+--------------+--------------+-------------+
| 1           | Jhon         | 2019-01-01   | 100         |
| 2           | Daniel       | 2019-01-02   | 110         |
| 3           | Jade         | 2019-01-03   | 120         |
| 4           | Khaled       | 2019-01-04   | 130         |
| 5           | Winston      | 2019-01-05   | 110         |
| 6           | Elvis        | 2019-01-06   | 140         |
| 7           | Anna         | 2019-01-07   | 150         |
| 8           | Maria        | 2019-01-08   | 80          |
| 9           | Jaze         | 2019-01-09   | 110         |
| 1           | Jhon         | 2019-01-10   | 130         |
| 3           | Jade         | 2019-01-10   | 150         |
+-------------+--------------+--------------+-------------+
在 SQL 中,(customer_id, visited_on) 是该表的主键。
该表包含一家餐馆的顾客交易数据。
visited_on 表示 (customer_id) 的顾客在 visited_on 那天访问了餐馆。
amount 是一个顾客某一天的消费总额。

输出:
+--------------+--------------+----------------+
| visited_on   | amount       | average_amount |
+--------------+--------------+----------------+
| 2019-01-07   | 860          | 122.86         |
| 2019-01-08   | 840          | 120            |
| 2019-01-09   | 840          | 120            |
| 2019-01-10   | 1000         | 142.86         |
+--------------+--------------+----------------+
解释:
第一个七天消费平均值从 2019-01-01 到 2019-01-07 是restaurant-growth/restaurant-growth/ (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
第二个七天消费平均值从 2019-01-02 到 2019-01-08 是 (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
第三个七天消费平均值从 2019-01-03 到 2019-01-09 是 (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
第四个七天消费平均值从 2019-01-04 到 2019-01-10 是 (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86

本题考察的知识是如何累加一段时间区间内的值

有两种实现方式:

  • 使用窗口函数,窗口函数比较好理解使用 6 PRECEDING AND current ROW 就能查找出来了(方案一)
  • 使用自连,连接条件不太容易想到,需要使用 DATEDIFF 函数,这个函数可以计算两个日期之间的天数,然后使用 BETWEEN 条件(方案二和方案三)
  • 解析

  • 要知道过去 7 天的平均消费额,需要先知道每天的总消费额,作为临时表 tmp1
    select visited_on, sum(amount) sum_amount from Customer group by visited_on

    +-------------+--------------+
    | visited_on  | sum_amount   |
    +-------------+--------------+
    | 2019-01-01  |  100         |
    | 2019-01-02  |  110         |
    | 2019-01-03  |  120         |
    | 2019-01-04  |  130         |
    | 2019-01-05  |  110         |
    | 2019-01-06  |  140         |
    | 2019-01-07  |  150         |
    | 2019-01-08  |  80          |
    | 2019-01-09  |  110         |
    | 2019-01-10  |  280         |
    +-------------+--------------+
    
  • 使用窗口函数,计算过去 7 天的总的消费额,作为临时表 tmp2
    select sum(sum_amount) sum_amount over (order by to_days(visited_on) range between 6 preceding and current row) as sum_amount from tmp1

    | visited_on  | sum_amount   |
    +-------------+--------------+
    | 2019-01-01  | 100          |
    | 2019-01-02  | 210          |
    | 2019-01-03  | 330          |
    | 2019-01-04  | 460          |
    | 2019-01-05  | 570          |
    | 2019-01-06  | 710          |
    | 2019-01-07  | 860          |
    | 2019-01-08  | 840          |
    | 2019-01-09  | 840          |
    | 2019-01-10  | 1000         |
    +-------------+--------------+
    
  • 计算过去 7 天的平均消费额,作为临时表 tmp3
    select visited_on, sum_amount amount, sum_amount / 7 as average_amount from tmp2

    | visited_on  | sum_amount   | average_amount |
    +-------------+--------------+----------------+
    | 2019-01-01  | 100	        | 14.2857        |
    | 2019-01-02  | 210	        | 30.0000        |
    | 2019-01-03  | 330	        | 47.1429        |
    | 2019-01-04  | 460	        | 65.7143        |
    | 2019-01-05  | 570	        | 81.4286        |
    | 2019-01-06  | 710	        | 101.4286       |
    | 2019-01-07  | 860	        | 122.8571       |
    | 2019-01-08  | 840	        | 120.0000       |
    | 2019-01-09  | 840	        | 120.0000       |
    | 2019-01-10  | 1000	     | 142.8571       |
    +-------------+-------------+----------------+
    
  • 筛选出计算数据大于等于七天的数据

    • 需要知道表中日期最小的一天,作为临时表 tmp4
      select min(visited_on) min_visited_on from Customer
    | min_visited_on  |
    +-----------------+
    | 2019-01-01      |
    +-----------------+
    
    • 使用 datediff(expr1, expr2) 函数,计算两个日期之间的天数,这里需要大于等于 6
      select visited_on, amount, round(average_amount, 2) average_amount from tmp3 where datediff(visited_on, (select min(visited_on) from Customer)) >= 6
    | visited_on  | amount       | average_amount |
    +-------------+--------------+----------------+
    | 2019-01-07	| 860          |  122.8571      |
    | 2019-01-08	| 840          |  120.0000      |
    | 2019-01-09	| 840          |  120.0000      |
    | 2019-01-10	| 1000         |  142.8571      |
    +-------------+--------------+----------------+
    
  • 最终 sql 语句如下:

    SELECT
       visited_on,
       sum_amount amount,
       ROUND( sum_amount / 7, 2 ) average_amount
    FROM (
       SELECT
          visited_on,
          SUM( sum_amount ) OVER ( ORDER BY to_days(visited_on) RANGE BETWEEN 6 PRECEDING AND current ROW ) sum_amount
       FROM (
          SELECT
             visited_on,
             SUM( amount ) sum_amount
          FROM Customer
          GROUP BY visited_on
       ) tmp1
    ) tmp2
    WHERE DATEDIFF(visited_on, ( SELECT MIN( visited_on ) FROM Customer )) >= 6;
    

    上面 sql 可以简化一下,不过有问题,就是如果时间不连续,排序不会跳过。

    也就是说 rk > 7 只能筛选出连续 7 天的数据

    SELECT
       visited_on,
       amount,
       SUM( amount / 7, 2 ) average_amount
    FROM (
       SELECT
          visited_on,
          RANK() OVER ( ORDER BY visited_on ) AS rk,
          SUM(SUM( amount )) OVER ( ORDER BY visited_on RANGE INTERVAL 7-1 DAY PRECEDING ) AS amount
       FROM Customer GROUP BY visited_on
    ) AS tep WHERE rk >= 7 ORDER BY 1
    

    方法二

    此方法是使用自连,连接的条件是时间连续 7 天,这个方法如果时间不连续,就会有问题

    WITH t AS (
       SELECT visited_on, SUM( amount ) amount FROM Customer GROUP BY visited_on
    )
    SELECT a.visited_on, SUM( b.amount ) amount, ROUND( AVG( b.amount ), 2 ) average_amount
    FROM t a, t b
    WHERE DATEDIFF( a.visited_on, b.visited_on ) BETWEEN 0 AND 6
    GROUP BY a.visited_on COUNT(*) = 7;
    

    方法三

    SELECT
       a.visited_on,
       sum( b.amount ) AS amount,
       round( sum( b.amount ) / 7, 2 ) AS average_amount
    FROM
       ( SELECT DISTINCT visited_on FROM Customer ) a
       JOIN Customer b ON datediff( a.visited_on, b.visited_on ) BETWEEN 0 AND 6
    WHERE
       a.visited_on >= ( SELECT min( visited_on ) FROM Customer ) + 6
    GROUP BY a.visited_on
    ORDER BY visited_on
    

    往期 MySQL 题目

  • MySQL 题目
  • LeetCode mysql 刷题一:计算特殊奖金 | 买下所有产品的客户
  • LeetCode mysql 刷题二:电影评分——判断日期的五种方法
  • LeetCode mysql 刷题三:确认率——MySQL 中的 null 处理 | 判断三角形的四种方法
  • 相关文章

    JavaScript2024新功能:Object.groupBy、正则表达式v标志
    PHP trim 函数对多字节字符的使用和限制
    新函数 json_validate() 、randomizer 类扩展…20 个PHP 8.3 新特性全面解析
    使用HTMX为WordPress增效:如何在不使用复杂框架的情况下增强平台功能
    为React 19做准备:WordPress 6.6用户指南
    如何删除WordPress中的所有评论

    发布评论