SQL大厂面试真题笔记(牛客网)

01 某音短视频

SQL1 各视频平均完播率

1、明确问题

有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info

计算2021年里有播放记录的每个视频的完播率(结果保留三位小数),并按完播率降序排序

-- ①2021年

--② 每个视频的完播率(保留三位小数)

-- 完播视频即(duration>=end_time-start_time)

-- 完播率 = 完播视频数/视频播放总次数

-- ③保留三位小数

方式1:round(number,3)

-- ④按完播率降序排序(ASC)

2、解题思路

第一步: 表连接,查找出基本信息

第二步: 计算每个视频的完播率

3、SQL语句

我查询时一般习惯一步一步的来,查看得出的结果是否有问题,再进行下一步。

WITH T
SELECT v1.id,v1.video_id,TIMESTAMPDIFF(SECOND,start_time,end_time) "play_time",duration
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE YEAR(release_time) = 2021)
SELECT video_id,
ROUND(SUM(CASE WHEN duration <= play_time THEN 1 ELSE 0 END)/COUNT(id),3) AS "avg_comp_play_rate"
FROM T
GROUP BY video_id
ORDER BY avg_comp_play_rate DESC;

SQL语句优化之后,也可以一步到位,不需要用with...as

SELECT v1.video_id,
ROUND(SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time)>=duration,1,0))/COUNT(v1.id),3) AS "avg_comp_play_rate"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE YEAR(release_time) = 2021
GROUP BY v1.video_id
ORDER BY avg_comp_play_rate DESC;

4、题目总结

1)表连接left join

2)case...when...then...end

这里也可以用if

3)时间差函数TIMESTAMPDIFF(interval,start_time,end_time)

interval可以选{second,minute,day,month,year}

4)round(m,d)m指具体的数,d指保留几位。round会四舍五入。

5)group by分组

6)order by降序

7)with new_name as

SQL2 平均播放进度大于60%的视频类别

1、明确问题

有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info

计算各类视频的平均播放进度,将进度大于60%的类别输出。

-- 播放进度 = 播放时长/视频时长*100%

-- ①按tag分组

-- ②不同tag的平均播放进度

-- 平均播放进度 = avg(播放进度)按tag分组后求平均

-- ③输出平均播放进度大于60%的类别

2、解题思路

第一步 :表连接,查找出基本信息

第二步 :计算平均播放进度

3、SQL语句

解法1:

WITH T AS(
SELECT v1.uid,v2.tag,TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time)/v2.duration AS play_rate
FROM tb_user_video_log AS v1 
LEFT JOIN tb_video_info v2 ON v1.video_id = v2.video_id)
SELECT tag,CONCAT(ROUND(AVG(play_rate)*100,2),"%") AS playtime_rate FROM T
GROUP BY tag
HAVING AVG(play_rate) >0.6;

解法2:

SELECT v2.tag,CONCAT(ROUND(AVG(TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time)/v2.duration),2)*100,"%") AS playtime_rate
FROM tb_user_video_log AS v1 
LEFT JOIN tb_video_info v2 ON v1.video_id = v2.video_id
GROUP BY v2.`tag`
HAVING AVG(TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time)/v2.duration) >0.6;


4、题目总结

有了SQL1的经验,SQL2做起来,思路会更清晰一些。

1)表连接-- left join

2)concat(str1,str2,....)字符串连接函数,可以将字符串连接成一个字符串。MySQL中是不能直接生成百分数格式的,所以需要用到concat函数。

3)round()四舍五入取小数位

4)TIMESTAMPDIFF时间差函数

5)group by

6)having

7)替换函数

REPLACE(str,from_str,to_str)

substring_index(str,delim,count)

substring_index(“待截取有用部分的字符串”,“截取数据依据的字符”,截取字符的位置N)

SQL3 每类视频近一个月的转发量/率

1、明确问题

有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info

问题:统计在有用户互动的最近一个月(按包含当天在内的近30天算,比如10月31日的近30天为10.2~10.31之间的数据)中,每类视频的转发量和转发率(保留3位小数)。

PS:这道题的问法本身就存在问题,这里根据解释,理解成最大时间(当天)在内的近三十天算起。

-- ①最近一个月(按30算)

-- ②每类视频(按tag分组)——>group by tag

-- ③每类视频的转发量

-- ④每类视频的转发率(保留3位小数)

转发率 = 转发量/播放量

2、解题思路

第一步 :表连接,筛选所需字段

SELECT v1.video_id,v2.tag,v1.uid,v1.if_retweet
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time) > 0#筛选出播放的
AND DATEDIFF((SELECT MAX(DATE(end_time)) FROM tb_user_video_log),DATE(end_time)) <= 30#筛选一月内的数据

第二步 :计算转发量和转发率

3、SQL语句

SELECT v2.tag,SUM(if_retweet) AS "retweet_cut",
ROUND(SUM(if_retweet)/COUNT(uid),3) AS "retweet_rate"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time) > 0#筛选出播放的
AND DATEDIFF((SELECT MAX(DATE(end_time)) FROM tb_user_video_log),DATE(end_time)) <= 30#筛选一月内的数据
GROUP BY v2.tag;

4、题目总结

这道题相对重要的点是WHERE条件子句。

1)时间差函数DATEDIFF(date1,date2)

SQL4 每个创作者每月的涨粉率及截止当前的总粉丝量

1、明确问题

有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info

问题 :计算2021年里每个创作者每月的涨粉率及截止当月的总粉丝量

  • 涨粉率=(加粉量 - 掉粉量) / 播放量。结果按创作者ID、总粉丝量升序排序。
  • if_follow-是否关注 为1表示用户观看视频中关注了视频创作者,为0表示此次互动前后关注状态未发生变化,为2表示本次观看过程中取消了关注。

-- ①2021年

-- ②每个创作者(author)及每个月分组

-- ③计算涨粉率和当月的粉丝总量

2、解题思路

第一步 :表连接,筛选、提取关键字段和数据;

第二步 :计算

这里没有办法同时计算出涨粉率和累计的粉丝总量,所以可以采取子查询分步查询。

3、SQL语句

WITH 
AS (SELECT author,DATE_FORMAT(end_time,'%Y-%m') MONTH,
ROUND(SUM(
CASE WHEN if_follow = 0 THEN 0 
     WHEN if_follow = 1 THEN 1
     ELSE -1
     END)/COUNT(uid),3) fans_growth_rate,
SUM(
CASE WHEN if_follow = 0 THEN 0 
     WHEN if_follow = 1 THEN 1
     ELSE -1
     END) add_fans
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE YEAR(end_time) = 2021#筛选在2021年的数据
AND TIMESTAMPDIFF(SECOND,start_time,end_time) > 0
GROUP BY author,MONTH)
SELECT author,MONTH,fans_growth_rate,SUM(add_fans) over(PARTITION BY author ORDER BY MONTH) total_fans
FROM T
ORDER BY author,total_fans DESC;

解法2:使用if

WITH T AS (
SELECT v2.author,DATE_FORMAT(v1.start_time,"%Y-%m") AS date_month,SUM(IF(if_follow=1,1,IF(if_follow=2,-1,0)))/COUNT(v1.uid) AS fans_growth_rate,SUM(IF(if_follow=1,1,IF(if_follow=2,-1,0))) AS add_fans
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
WHERE TIMESTAMPDIFF(SECOND,start_time,end_time) > 0
GROUP BY author,date_month)
SELECT author,date_month,fans_growth_rate,SUM(add_fans) over(PARTITION BY author ORDER BY date_month) AS total_fans
FROM T
ORDER BY author,total_fans;

4、题目总结

1)case...when...then...end

用if也可以。

2)date_format() :时间格式函数,可以将时间转化成特定的格式;

3)窗口函数

4)with...as 结构的子查询

SQL5 国庆期间每类视频点赞量和转发量

1、明确问题

有两张表 :用户-视频互动表tb_user_video_log和短视频信息表tb_video_info

问题 :统计2021年国庆头3天每类视频每天的近一周总点赞量和一周内最大单天转发量,结果按视频类别降序、日期升序排序。假设数据库中数据足够多,至少每个类别下国庆头3天及之前一周的每天都有播放记录。

-- ①2021年

-- ②国庆头3天(2021-10-01,2021-10-02,2021-10-03)

-- ③每类视频每天(2021-10-01,2021-10-02,2021-10-03)的近一周点赞量和一周内单天最大转发量

-- ④按照视频类别降序、日期升序排序

2、解题思路

我直接是一天一天的算的,牛客网上有另一个思路,是从2021-10-01这个一天开始往前数六行。

用到的窗口函数的 窗口子句 。但我个人觉得这样不是很严谨,加入中间有某一天的数据是空的,那么会导致数据出现问题。(可能是我多虑哈~但最后还是采用最保守,也最严谨的方式。)

第一步 :表连接,提取所需信息

SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like),SUM(if_retweet)
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
GROUP BY v1.video_id,tag,DATE(end_time);

第二步 :计算2021-10-01近一周的点赞量和一周内单天最大转发量

用子查询

SELECT tag,'2021-10-01' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-01',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;

第三步 :替换时间分别计算2021-10-02和2021-10-03的近一周点赞量和一周内单天最大转发量

在第二步的基础上改一下时间即可;

#2021-10-02
SELECT tag,'2021-10-02' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-02',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;
#2021-10-03
SELECT tag,'2021-10-03' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-03',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;

第四步 :union all连接结果。

SELECT tag,'2021-10-01' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-01',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-02' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-02',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-03' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-03',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;

3、SQL语句

SELECT tag,'2021-10-01' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-01',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-02' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-02',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-03' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-03',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;

4、题目总结

其他的地方和以前没有太大差别

1)UNION ALL :查询结果不排重合并

2)查询中插入新字段并赋值

select 'value' as column_name from table_name;

SQL6 近一个月发布的视频中热度最高的top3视频

1、明确问题

有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info

问题 :找出近一个月发布的视频中热度最高的top3视频。按热度降序排列;

①近一个月发布的视频,即max(end_time)-release_time<30;

②热度计算公式

热度=(a*视频完播率+b*点赞数+c*评论数+d*转发数)*新鲜度

新鲜度=1/(最近无播放天数+1)

③最近播放日期以end_time-结束观看时间为准,假设为T,则最近一个月按照[T-29,T]闭区间统计。

④结果热度保留为整数,并按热度降序排序。

2、解题思路

第一步 :查找tb_user_video_log最大时间

SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log;

第二步:查找符合时间范围的视频video_id

SELECT video_id FROM tb_video_info
WHERE DATEDIFF((SELECT T FROM (SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log) AS t1),DATE_FORMAT(release_time,'%Y-%m-%d')) BETWEEN 0 AND 29;

第三步:在第二步的基础上,计算完播率、转发量、点赞数、评论数、未播放天数

SELECT v1.video_id,SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time) >= duration,1,0))/COUNT(uid) play_rate,SUM(if_like) like_cnt,SUM(if_retweet) retweet_cnt,SUM(IF(comment_id IS NULL,0,1))ccomment_cnt,
DATEDIFF((SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log),MAX(end_time)) no_play_day
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
GROUP BY video_id;

第四步:在第三步的基础上计算热度

SELECT v1.video_id,ROUND((100*(SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time) >= duration,1,0))/COUNT(uid))+5*(SUM(if_like))+2*(SUM(if_retweet))+3*(SUM(IF(comment_id IS NULL,0,1))))/(1+DATEDIFF((SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log),MAX(end_time)))) hot_index
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF((SELECT T FROM (SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log) AS t1),DATE_FORMAT(v2.release_time,'%Y-%m-%d')) BETWEEN 0 AND 29
GROUP BY video_id
ORDER BY hot_index DESC;

3、SQL语句

SELECT v1.video_id,ROUND((100*(SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time) >= duration,1,0))/COUNT(uid))+5*(SUM(if_like))+2*(SUM(if_retweet))+3*(SUM(IF(comment_id IS NULL,0,1))))/(1+DATEDIFF((SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log),MAX(end_time)))) hot_index
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF((SELECT T FROM (SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log) AS t1),DATE_FORMAT(v2.release_time,'%Y-%m-%d')) BETWEEN 0 AND 29
GROUP BY video_id
ORDER BY hot_index DESC;

4、题目总结

这个题目难点在于公式太长,要计算的指标太多,理出来就清晰了。

没有涉及到新的知识点。

02 用户增长场景(某度信息流)

SQL1 2021年11月每天的人均浏览文章时长

1、明确问题

问题:统计2021年11月每天的人均浏览文章时长(秒数),结果保留1位小数,并按时长由短到常排序。

①2021年11月每天的人均浏览时长;每天的人均浏览时长=浏览文章时长之和/浏览用户id数

②结果保留1位小数;

③时长由短到长排序;

2、解题思路

先计算每天的文章浏览文章时长之和,计算每日的浏览用户数。

3、SQL语句

SELECT DATE_FORMAT(out_time,'%Y-%m-%d') date_time,ROUND(SUM(TIMESTAMPDIFF(SECOND,in_time,out_time))/COUNT(DISTINCT uid),1) avg_view_len_sec
FROM tb_user_log
GROUP BY date_time
ORDER BY avg_view_len_sec;

4、题目总结

round():小数点取一位

group by:聚合分组

order by :排序

SQL2 每篇文章同一时刻最大在看人数

1、明确问题

①统计每篇文章同一时刻最大在看人数,如果同一时刻有进入也有离开时,

②先记录用户数增加再记录减少,

③结果按最大人数降序。

2、解题思路

这道题的解题思路就是,先对in_time的记录标记为1,out_time的记录标记为-1,然后将两张表结合,按照artical_id和时间进行排序。这样就很明显可以看到同一篇文章,同一时刻再看人数了。

SQL刷题笔记5——【时间区间内的用户存在量】 - 知乎 (zhihu.com)

3、SQL语句

#连接表
SELECT artical_id,MAX(max_uv) max_uv
FROM(
WITH T AS (
SELECT artical_id,in_time dt,1 `status`
FROM tb_user_log
UNION ALL
SELECT artical_id,out_time dt,-1 `status`
FROM tb_user_log)
SELECT artical_id,dt,SUM(`status`) over(PARTITION BY artical_id ORDER BY dt) max_uv FROM T) AS t2
GROUP BY artical_id;

4、题目总结

union all:将个查询结果按行拼接。

group by:聚合分组。

窗口函数。可参考以下文章 SQL学习笔记7——【聚合函数】 - 知乎 (zhihu.com)

注:重点是想清楚,为什么标记了状态才能求出每一时刻同时在看人数呢?

SQL3 2021年11月每天新用户的次日留存率

1、明确问题

问题:统计2021年11月每天新用户的次日留存率(保留2位小数)

注:次日留存率为当天新增的用户数中第二天又活跃的用户数占比。

注:如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过,结果按日期升序。

PS:总觉得牛客网的题目,有时候会模糊,这里只讲大概的留存率解题思路。

对留存率问题比较费解的,可以参考B站上的教程

SQL面试题:用户留存率分析_哔哩哔哩_bilibili

2、解题思路

①先计算每日新用户数:计算每个用户的最小in_time。

SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid;

②计算新用户的第二日留存情况

从结果中可以看出,新用户第二日的留存情况。

SELECT new_user_time,t1.uid,t2.uid next_day_uid
FROM (SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid) AS t1
LEFT JOIN tb_user_log t2
ON (t1.uid=t2.uid) AND (DATE_FORMAT(t2.in_time,'%Y-%m-%d') = new_user_time + INTERVAL '1' DAY);

③计算新用户留存率

WITH T AS (
SELECT new_user_time,t1.uid,t2.uid next_day_uid
FROM (SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid) AS t1
LEFT JOIN tb_user_log t2
ON (t1.uid=t2.uid) AND (DATE_FORMAT(t2.in_time,'%Y-%m-%d') = new_user_time + INTERVAL '1' DAY))
SELECT new_user_time dt,ROUND(COUNT(next_day_uid)/COUNT(uid),2) uv_left_rate
FROM T
GROUP BY new_user_time
ORDER BY dt;

3、SQL语句

WITH T AS (
SELECT new_user_time,t1.uid,t2.uid next_day_uid
FROM (SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid) AS t1
LEFT JOIN tb_user_log t2
ON (t1.uid=t2.uid) AND (DATE_FORMAT(t2.in_time,'%Y-%m-%d') = new_user_time + INTERVAL '1' DAY))
SELECT new_user_time dt,ROUND(COUNT(next_day_uid)/COUNT(uid),2) uv_left_rate
FROM T
GROUP BY new_user_time
ORDER BY dt;

4、题目总结

用户留存率是许多面试题中的经典,建议多思考,多练习。掌握题目逻辑是关键。

df + INTERVAL '1' DAY。为日期加上一天。

SQL4 统计活跃间隔对用户分级结果

1、明确问题

①问题:统计活跃间隔对用户分级后,各活跃等级用户占比,结果保留两位小数,且按占比降序排序。

②用户等级标准简化为:

#忠实用户(近七天活跃过且非新晋用户)

#新晋用户(近7天新增)

#沉睡用户(近7天未活跃但更早前活跃过)

#流失用户(近30天未活跃但更早前活跃过)

#近七天即[T-6,T]

#今天默认为'2021-11-04'

2、解题思路

第一步:计算总的用户数

SELECT COUNT(DISTINCT uid) total_user_cnt FROM tb_user_log;

第二步:计算每一个用户的活跃日期间隔与活跃次数

SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log;

第三步:给用户打标签

SELECT uid,
	CASE WHEN user_cnt >1 AND MIN(interval_time) over (PARTITION BY uid) <= 7 THEN '忠实用户'
	WHEN user_cnt = 1 AND interval_time <= 7 THEN '新晋用户'
	WHEN MIN(interval_time) over (PARTITION BY uid) > 30 THEN '流失用户'
	WHEN MIN(interval_time) over (PARTITION BY uid)BETWEEN 7 AND 30 THEN '沉睡用户'
	END AS tag
FROM (SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log) AS t1;

第四步:计算用户等级占比

WITH T AS (
SELECT uid,
	CASE WHEN user_cnt >1 AND MIN(interval_time) over (PARTITION BY uid) <= 7 THEN '忠实用户'
	WHEN user_cnt = 1 AND interval_time <= 7 THEN '新晋用户'
	WHEN MIN(interval_time) over (PARTITION BY uid) > 30 THEN '流失用户'
	WHEN MIN(interval_time) over (PARTITION BY uid)BETWEEN 7 AND 30 THEN '沉睡用户'
	END AS tag
FROM (SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log) AS t1)
SELECT tag,ROUND(COUNT(DISTINCT uid)/(SELECT COUNT(DISTINCT uid) total_user_cnt FROM tb_user_log),2) ratio
FROM T
GROUP BY tag;

3、SQL语句

WITH T AS (
SELECT uid,
	CASE WHEN user_cnt >1 AND MIN(interval_time) over (PARTITION BY uid) <= 7 THEN '忠实用户'
	WHEN user_cnt = 1 AND interval_time <= 7 THEN '新晋用户'
	WHEN MIN(interval_time) over (PARTITION BY uid) > 30 THEN '流失用户'
	WHEN MIN(interval_time) over (PARTITION BY uid)BETWEEN 7 AND 30 THEN '沉睡用户'
	END AS tag
FROM (SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log) AS t1)
SELECT tag,ROUND(COUNT(DISTINCT uid)/(SELECT COUNT(DISTINCT uid) total_user_cnt FROM tb_user_log),2) ratio
FROM T