💎 秒懂SQL

SQL 面试真题 · 步骤可视化

🔥 连续登录天数 高频
Hive / Spark
WITH ranked AS (
    SELECT user_id, login_date,
        ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY login_date) AS rn
    FROM user_login
)
SELECT user_id, COUNT(*) AS consecutive_days
FROM ranked
GROUP BY user_id, DATE_SUB(login_date, rn);
MySQL
WITH ranked AS (
    SELECT user_id, login_date,
        ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY login_date) AS rn
    FROM user_login
)
SELECT user_id, COUNT(*) AS consecutive_days
FROM ranked
GROUP BY user_id, DATE_SUB(login_date, INTERVAL rn DAY);
ClickHouse
WITH ranked AS (
    SELECT user_id, login_date,
        rowNumberInAllBlocks() OVER(PARTITION BY user_id ORDER BY login_date) AS rn
    FROM user_login
)
SELECT user_id, count(*) AS consecutive_days
FROM ranked
GROUP BY user_id, subtractDays(login_date, rn);
📈 用户留存率 高频
SELECT
    u.register_date,
    DATEDIFF(l.login_date, u.register_date) AS retention_day,
    COUNT(DISTINCT u.user_id) AS retained_users
FROM
    (SELECT DISTINCT user_id, register_date FROM user_activity) u
LEFT JOIN
    user_activity l
ON
    u.user_id = l.user_id
    AND l.login_date >= u.register_date
WHERE
    DATEDIFF(l.login_date, u.register_date) <= 90
GROUP BY
    1, 2
ClickHouse
SELECT
    u.register_date,
    dateDiff('day', u.register_date, l.login_date) AS retention_day,
    uniqExact(u.user_id) AS retained_users
FROM
    (SELECT DISTINCT user_id, register_date FROM user_activity) u
LEFT JOIN
    user_activity l
ON
    u.user_id = l.user_id
    AND l.login_date >= u.register_date
WHERE
    dateDiff('day', u.register_date, l.login_date) <= 90
GROUP BY
    1, 2
🏆 排名函数对比
💡 同样的分数 [95, 95, 88],三种排名结果不同:
ROW_NUMBER()
1
张三 - 95分
2
李四 - 95分
3
王五 - 88分
特点:强制连续,不并列
RANK()
1
张三 - 95分
1
李四 - 95分
3
王五 - 88分
特点:并列后跳跃(跳过2)
DENSE_RANK()
1
张三 - 95分
1
李四 - 95分
2
王五 - 88分
特点:并列后连续
📊 累计求和
💡 每日销售额 → 累计销售额
原始数据 (Daily Sales)
date amount
Day 1 100
Day 2 150
Day 3 200
Day 4 80
100
Day1
累计: 100
150
Day2
累计: 250
200
Day3
累计: 450
80
Day4
累计: 530
💡 窗口函数:SUM(amount) OVER(ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
SELECT date, amount,
    SUM(amount) OVER(
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_sum
FROM daily_sales;
🔄 行转列 (Pivot)
💡 多行数据 → 多列展示
student subject score
张三 数学 95
张三 语文 88
张三 英语 92
student 数学 语文 英语
张三 95 88 92
SELECT student,
    MAX(CASE WHEN subject = '数学' THEN score END) AS 数学,
    MAX(CASE WHEN subject = '语文' THEN score END) AS 语文,
    MAX(CASE WHEN subject = '英语' THEN score END) AS 英语
FROM scores GROUP BY student;
🔄 列转行 (Unpivot)
💡 多列展示 → 多行数据
student 数学 语文 英语
张三 95 88 92
student subject score
张三 数学 95
张三 语文 88
张三 英语 92
SELECT t.student, m.subject,
    CASE m.subject
        WHEN '数学' THEN t.数学
        WHEN '语文' THEN t.语文
        WHEN '英语' THEN t.英语
    END AS score
FROM scores t
CROSS JOIN (
    SELECT '数学' AS subject
    UNION ALL SELECT '语文'
    UNION ALL SELECT '英语'
) m;
🚀 性能优化:相比 UNION ALL 需要扫描表 3 次,CROSS JOIN (或 LATERAL VIEW) 只需要扫描表 1 次,是生产环境更优的解法。
🚀 直播间在线峰值 高频
💡 上下线事件流 (Event Stream)
+1
A进
10:00
+1
B进
10:05
+1
C进
10:10
-1
A出
10:15
-1
B出
10:20
+1
D进
10:25
📊 实时在现人数 (Running Total)
1
10:00
1人
2
10:05
2人
3
10:10
峰值: 3
2
10:15
2人
1
10:20
1人
2
10:25
2人
WITH events AS (
    SELECT user_id, 1 AS change, login_time AS time FROM user_log
    UNION ALL
    SELECT user_id, -1 AS change, logout_time AS time FROM user_log
),
running_total AS (
    SELECT time, 
        SUM(change) OVER (ORDER BY time) AS online_cnt
    FROM events
)
SELECT MAX(online_cnt) AS max_peak FROM running_total;
💡 核心思路:将"登录/登出"视为"+1/-1"的事件流,通过窗口函数 `SUM() OVER(ORDER BY time)` 计算任意时刻的累计在线人数。