【实战】表查询提升

# 【实战】表查询提升

# 员工考勤记录

有以下两张表：

-- 员工考勤记录
-- 创建日历表calendar
CREATE TABLE calendar(
  id             INTEGER NOT NULL PRIMARY KEY, -- 日历编号
  calendar_date  DATE NOT NULL UNIQUE, -- 日历日期
  calendar_year  INTEGER NOT NULL, -- 日历年
  calendar_month INTEGER NOT NULL, -- 日历月
  calendar_day   INTEGER NOT NULL, -- 日历日
  is_work_day    VARCHAR(1) DEFAULT 'Y' NOT NULL -- 是否工作日
);

-- 创建考勤记录表attendance
-- Oracle、MySQL、PostgreSQL 以及 SQLite
CREATE TABLE attendance(
  id         INTEGER NOT NULL PRIMARY KEY, -- 考勤记录编号
  check_date DATE NOT NULL, -- 考勤日期
  emp_id     INTEGER NOT NULL, -- 员工编号
  clock_in   TIMESTAMP, -- 上班打卡时间
  clock_out  TIMESTAMP, -- 下班打卡时间
  CONSTRAINT uk_attendance UNIQUE (check_date, emp_id)
);

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

问题：统计每个员工某个月的出勤情况？

以下sql可以显示出emp为1的在某个月的出勤情况吗？

select a.calendar_date,b.emp_id,b.clock_in,b.clock_out from calendar a left join attendance b
 on a.calendar_date = b.check_date
where emp_id = 1 and a.calendar_month =1 order by calendar_date asc;

1
2
3

答案是可以的，不过它有个缺陷，对于未打开的天数，它是不会显示的，也就是说它显示的天数不满一个月（如果该员工存在未打卡的情况）。

原因与解法

emp_id = 1 这个条件被放在了 WHERE 子句中，而不是在 ON 子句中。这会导致 LEFT JOIN 的效果变成 INNER JOIN，因为 WHERE 子句中的条件会过滤掉所有 attendance 表中没有匹配记录的行。

你需要将 emp_id = 1 这个条件放在 ON 子句中，这样 LEFT JOIN 才能正确地保留 calendar 表中的所有日期。

select a.calendar_date,b.emp_id,b.clock_in,b.clock_out from calendar a left join attendance b
 on a.calendar_date = b.check_date and emp_id =1
where a.calendar_month =1
order by calendar_date asc;

1
2
3
4

# 扩展-case-when

select a.calendar_date,b.emp_id,b.clock_in,b.clock_out,
   CASE
        WHEN b.clock_in IS NULL THEN '未打卡'
        when extract(HOUR from b.clock_in)> 9 THEN '迟到'
        when extract(hour from b.clock_out)<18 then '早退'
       else '打卡'
    END AS attendance_status
from calendar a left join attendance b
 on a.calendar_date = b.check_date and emp_id =1
where a.calendar_month =1
order by calendar_date asc;

1
2
3
4
5
6
7
8
9
10
11

CASE 语句在 SQL 中用于实现条件逻辑，类似于编程语言中的 if-else 语句。它允许你在查询中根据不同的条件返回不同的值。

在 SQL 中，EXTRACT 函数用于从日期或时间值中提取特定的部分，例如年、月、日、小时、分钟等。EXTRACT 函数的语法如下：

EXTRACT(part FROM datetime_expression)

性能：EXTRACT 函数在处理大量数据时可能会对性能产生一定影响，尽量优化查询以减少不必要的计算。

可读性：保持 CASE 语句的简洁和可读性，避免嵌套过多的 CASE 语句。

但上述sql仍存在一个问题，未打卡时，emp_id为null，怎样使其为1？

当然解法也是有的，以下解法仅供参考：

SELECT 
    a.calendar_date,
    COALESCE(b.emp_id, 1) AS emp_id,
    b.clock_in,
    b.clock_out,
    CASE 
        WHEN b.clock_in IS NULL THEN '未打卡'
        WHEN EXTRACT(HOUR FROM b.clock_in) > 9 THEN '迟到'
        WHEN EXTRACT(HOUR FROM b.clock_out) < 18 THEN '早退'
        ELSE '打卡'
    END AS attendance_status
FROM 
    calendar a
LEFT JOIN 
    attendance b
ON 
    a.calendar_date = b.check_date AND b.emp_id = 1
WHERE 
    a.calendar_month = 1
ORDER BY 
    a.calendar_date ASC;

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

COALESCE(b.emp_id, 1)：

COALESCE 函数用于返回第一个非 NULL 的参数。如果 b.emp_id 为 NULL，则返回 1。
这样可以确保即使在没有打卡记录的情况下，emp_id 也显示为 1。

# left join等链接有什么区别

# 双表连接的语法
select a.xx,b.yy from tableA a left join tableB b on a.id = b.aId

1
2

INNER JOIN：只返回两个表中都有的匹配记录。
LEFT JOIN：返回左表的所有记录，右表中没有匹配的记录则为 NULL。
RIGHT JOIN：返回右表的所有记录，左表中没有匹配的记录则为 NULL。
CROSS JOIN：返回两个表的笛卡尔积，即所有可能的组合。

这里需要注意的是————CROSS JOIN

CROSS JOIN 通常不需要 ON 子句，因为它生成的是两个表的笛卡尔积。
如果你在 CROSS JOIN 中使用 ON 子句，它实际上会变成 INNER JOIN 或其他类型的连接，因为 ON 子句会添加过滤条件。

SELECT * FROM tableA
CROSS JOIN tableB;
# tableA 中的每一行都会与 tableB 中的每一行进行组合 结果将是 M*N

1
2
3

# 畅销商品分析

有以下sql，请你列出“每个卖家销量最高的商品，并列出商品数量”。

-- 商家卖货记录表
CREATE TABLE t5 (
  seller_id int, 
  buyer_id int, 
  item_id int, 
  num int);

1
2
3
4
5
6

为了找出每个卖家销售量最大的商品，可以使用以下步骤（当然你也可以使用其他方法，这里的答案仅供参考）：

计算每个卖家每个商品的总销售数量。
为每个卖家的每个商品分配一个排名，根据销售数量从高到低排序。
选择每个卖家排名第一的商品。

WITH SalesSummary AS (
    SELECT
        seller_id,
        item_id,
        SUM(num) AS total_sales
    FROM
        t5
    GROUP BY
        seller_id,
        item_id
),
MaxSalesPerSeller AS (
    SELECT
        seller_id,
        MAX(total_sales) AS max_sales
    FROM
        SalesSummary
    GROUP BY
        seller_id
)
SELECT
    ss.seller_id,
    ss.item_id,
    ss.total_sales
FROM
    SalesSummary ss
JOIN
    MaxSalesPerSeller mss
ON
    ss.seller_id = mss.seller_id AND ss.total_sales = mss.max_sales;

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

WITH 子句在 SQL 中被称为 Common Table Expression (CTE)，它是一种临时的结果集，可以在查询中多次引用。WITH 子句通常用于简化复杂的查询，提高可读性，并且可以递归地定义数据。以下是它的语法

WITH CTE_Name AS (
    -- 查询语句
)
-- 主查询

1
2
3
4

其实mysql中是有rank()函数的，这里就不展示了。

# MySql聚合-分组及执行顺序

# 聚合函数

Mysql中内置了 5 种聚合函数：它们分别是： sum、max、min、avg 、count。

# group by分组

group by 是对select数据进行分组，分组时，表中有相同值的会分为一组。分组后可以对select的内容进行聚合查询。分组和聚合一般都是在一起搭配使用的。

group by 分组后的查询中， select 的列不能出现除了 group by 分组条件以及聚合函数外的其他列，不然查出来的其他列有问题。常量也可以查出，但意义不大，因为常量直接是写死的。

当你在查询中使用 GROUP BY 时，SELECT 子句中的列必须满足以下条件之一：

出现在 GROUP BY 子句中：即这些列用于分组。

是聚合函数的结果：例如 SUM、COUNT、AVG 等。

为什么需要这个规则

这个规则是为了确保查询结果的一致性和可解释性。当你对数据进行分组时，每个分组会生成一行结果。如果 SELECT 子句中包含一个不在 GROUP BY 子句中的非聚合列，那么对于同一个分组内的多行数据，该列可能会有不同的值，这会导致不确定的结果。

select 列1, 列2, (聚合函数) from table_name group by 列1, 列2;

having 是对 group by 分组后的结果集进行筛选。

select 列1, 列2, (聚合函数) from table_name group by 列1, 列2 having 分组后条件;

# Mysql执行顺序

书写顺序一般是：

# SQL语句的书写顺序：
 
# 1. select 
# 2. from 
# 3. where 
# 4. group by 
# 5. having 
# 6. order by 
# 7. limit

1
2
3
4
5
6
7
8
9

执行顺序一般是：

# SQL语句的执行顺序：
 
# 1. from
# 2. where
# 3. group by
# 4. 聚合
# 5. having
# 6. select
# 7. distinct
# 8. order by
# 9. limit
 
# --> from --> where --> group by --> 聚合 --> having --> select --> distinct-- > order by--> limit

1
2
3
4
5
6
7
8
9
10
11
12
13

# WHERE和HAVING以及ON

# having和where

WHERE 子句：在数据分组之前应用，用于过滤原始数据。
HAVING 子句：在数据分组之后应用，用于过滤聚合结果。

换句话说，having用在where之后，更准确的说法是，它用在group by之后，因为有时候查询可以不加where。

SELECT product_id, SUM(quantity) AS total_quantity, SUM(price * quantity) AS total_sales
FROM sales
WHERE price > 10
GROUP BY product_id;

1
2
3
4

以上是使用where。

SELECT product_id, SUM(quantity) AS total_quantity, SUM(price * quantity) AS total_sales
FROM sales
GROUP BY product_id
HAVING SUM(price * quantity) > 50;

1
2
3
4

以上是使用having。

# 关于on

对于内连接查询，where子句和on子句等效。

对于外连接查询，on子句在连接操作之前执行，where子句(逻辑上)在连接操作之后执行。

假设有两个表 employees 和 departments，结构如下：

-- employees 表
| employee_id | name      | department_id |
|-------------|-----------|---------------|
| 1           | Alice     | 101           |
| 2           | Bob       | 102           |
| 3           | Charlie   | 103           |

-- departments 表
| department_id | department_name |
|---------------|-----------------|
| 101           | HR              |
| 102           | IT              |
| 103           | Sales           |
| 104           | Marketing       |

1
2
3
4
5
6
7
8
9
10
11
12
13
14

使用左外连接查询：

SELECT employees.name, departments.department_name
FROM employees
LEFT JOIN departments
ON employees.department_id = departments.department_id;

1
2
3
4

结果：

| name      | department_name |
|-----------|-----------------|
| Alice     | HR              |
| Bob       | IT              |
| Charlie   | Sales           |

1
2
3
4
5

在这个例子中，ON 子句 employees.department_id = departments.department_id 决定了哪些记录会被连接。即使 departments 表中有 department_id = 104 的记录，由于 employees 表中没有对应的 department_id，这些记录不会出现在结果集中。

上次更新: 2025/05/21, 15:29:11

← 【实战】表设计与优化A 【实战】积分跟优惠券的设计→