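1. Cumulative (aggregate) window functions
sum, avg, max, min
sum(...A...) over (partition by ...B... order by ...C... rows between ...D1... and ...D2...)
partition by: the column(s) used to split the rows into groups
order by: the column(s) used to sort the rows inside each group
rows between: the range of rows (the frame) the function is computed over; commonly used for moving averages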
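A minimal sketch of a running total and a 3-row moving average. The table daily_sales and its rows are hypothetical and exist only for illustration; MySQL 8.0 or later is assumed.

```sql
-- Hypothetical table and data, for illustration only.
create table daily_sales(sale_date date, amount int);
insert into daily_sales values
('2019-06-01',10),
('2019-06-02',20),
('2019-06-03',30),
('2019-06-04',40);

select sale_date, amount,
       -- frame: from the first row up to the current row
       sum(amount) over (order by sale_date
                         rows between unbounded preceding and current row) as running_total,
       -- frame: the current row and the two rows before it
       avg(amount) over (order by sale_date
                         rows between 2 preceding and current row) as moving_avg_3
from daily_sales;
-- running_total: 10, 30, 60, 100
-- moving_avg_3:  10, 15, 20, 30 (returned as decimals)
```

The first two moving averages cover fewer than three rows, because the frame is cut off at the start of the data.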
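2. Ranking window functions
row_number() over (partition by ...B... order by ...C...) 1 2 3 4 5 (every row gets its own number, even on ties)
rank() over (partition by ...B... order by ...C...) 1 1 1 4 5 (ties share a rank, and the following ranks are skipped)
dense_rank() over (partition by ...B... order by ...C...) 1 1 1 2 3 (ties share a rank, with no gaps afterwards)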
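The three numbering patterns can be reproduced with a small example. The table exam and its rows are hypothetical; MySQL 8.0 or later is assumed.

```sql
-- Hypothetical table and data: three students tie on the top score.
create table exam(student varchar(10), score int);
insert into exam values ('a',90),('b',90),('c',90),('d',80),('e',70);

select student, score,
       row_number() over (order by score desc) as rn,
       rank()       over (order by score desc) as rk,
       dense_rank() over (order by score desc) as drk
from exam;
-- rk:  1 1 1 4 5
-- drk: 1 1 1 2 3
-- rn:  1 to 5; the order among the three tied rows is not guaranteed
```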
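3. Bucketing
ntile(n) is not affected by a rows ... between ... frame; it always splits the whole partition into n buckets
ntile(n) over (partition by ...B... order by ...C...), e.g. ntile(10) to pick the top 10% of users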
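A sketch of the "top 10% of users" case. The table user_spend and its rows are hypothetical; MySQL 8.0 or later is assumed.

```sql
-- Hypothetical table and data: ten users with distinct spend.
create table user_spend(user_id int, total_spend int);
insert into user_spend values
(1,100),(2,200),(3,300),(4,400),(5,500),
(6,600),(7,700),(8,800),(9,900),(10,1000);

select user_id, total_spend
from (select user_id, total_spend,
             -- bucket 1 holds the highest spenders, bucket 10 the lowest
             ntile(10) over (order by total_spend desc) as bucket
      from user_spend) t
where bucket = 1;
-- returns user 10 (total_spend 1000)
```

When the row count is not a multiple of n, the earlier buckets receive one extra row each, so "top 10%" is only approximate.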
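4. Offset window functions
Use case: comparing a row with an earlier or later row, e.g. the difference between today's and yesterday's figures
Value from the row offset rows before the current row
lag(exp_str,offset,defval) over (partition by...B... order by ...C...)
Value from the row offset rows after the current row
lead(exp_str,offset,defval) over (partition by...B... order by ...C...)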
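A sketch of the today-versus-yesterday difference. The table daily_orders and its rows are hypothetical; MySQL 8.0 or later is assumed.

```sql
-- Hypothetical table and data, for illustration only.
create table daily_orders(order_date date, order_cnt int);
insert into daily_orders values
('2019-06-01',100),
('2019-06-02',120),
('2019-06-03',90);

select order_date, order_cnt,
       lag(order_cnt,1) over (order by order_date) as prev_cnt,
       order_cnt - lag(order_cnt,1) over (order by order_date) as diff_vs_prev,
       -- the third argument replaces NULL when there is no following row
       lead(order_cnt,1,0) over (order by order_date) as next_cnt
from daily_orders;
-- prev_cnt:     NULL, 100, 120
-- diff_vs_prev: NULL, 20, -30
-- next_cnt:     120, 90, 0
```

Leaving out defval for lag keeps the first day's difference as NULL instead of producing a misleading number.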
Create a table with the columns user_id and login_time
create table user_login(user_id int,login_time date);
insert into user_login values
(1,'2019-06-01'),
(1,'2019-06-02'),
(1,'2019-06-03'),
(1,'2019-06-06'),
(1,'2019-06-07'),
(1,'2019-06-08'),
(1,'2019-06-11'),
(1,'2019-06-12'),
(2,'2019-06-01'),
(2,'2019-06-02'),
(2,'2019-06-04'),
(3,'2019-06-01'),
(3,'2019-06-02'),
(4,'2019-06-01'),
(5,'2019-06-01'),
(5,'2019-06-02');
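1. Show each user's consecutive login days, without deduplicating users (a user with several streaks appears once per streak)
Approach:
1. Add the ranking window function row_number to the original table, partitioned by user_id and ordered by login_time
2. Use date_sub() to subtract the row number (in days) from the login date; for consecutive login dates the date_sub() result is the same
3. Finally, group the result by user_id and the date_sub() value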
SELECT user_id,
date_sub(login_time, INTERVAL rn DAY) AS login_group, -- identical for consecutive login dates
min(login_time) AS start_login_time,
max(login_time) AS end_login_time,
count(login_time) AS continuous_days
FROM
(SELECT user_id,login_time,
row_number() over (PARTITION BY user_id ORDER BY login_time) AS rn -- ranking window function: partition by user, sort by time
FROM user_login) t
GROUP BY user_id, date_sub(login_time, INTERVAL rn DAY);
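As a possible follow-up, the same grouping can be filtered to streaks of at least three days. This sketch assumes the user_login table and sample rows created above and MySQL 8.0 or later; the threshold of 3 is only an example.

```sql
select user_id,
       min(login_time) as start_login_time,
       max(login_time) as end_login_time,
       count(*) as continuous_days
from (select user_id, login_time,
             row_number() over (partition by user_id order by login_time) as rn
      from user_login) t
group by user_id, date_sub(login_time, interval rn day)
having count(*) >= 3;  -- keep only streaks of three or more days
-- With the sample rows only user 1 qualifies, with two streaks:
-- 1 | 2019-06-01 | 2019-06-03 | 3
-- 1 | 2019-06-06 | 2019-06-08 | 3
```

The trick relies on one row per user per day; if a user can log in several times a day, the dates would need to be deduplicated first.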
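2. Finding values that appear n times in a row
Example: a score that appears three times in a row
Approach:
1. Use the offset window function lead to bring the score of the next row and of the row two positions ahead onto the current row
2. Compute the difference between score and each shifted score; if both are 0, the score appears three times in a row (checking only the row two ahead would also match patterns such as 80, 70, 80)
SELECT id,score,(score - lead_score2) as score3 from
(select id,score,
lead(score,1) over (order by id) as lead_score1,
lead(score,2) over (order by id) as lead_score2
from score) a
WHERE (score - lead_score1)=0 AND (score - lead_score2)=0;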
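The notes do not define the score table, so the sketch below uses a hypothetical one to show why the middle row matters. It compares the current row with both the next row and the row two ahead; MySQL 8.0 or later is assumed.

```sql
-- Hypothetical table and data: 80 appears three times in a row (ids 1 to 3),
-- and ids 3 to 5 form the pattern 80, 70, 80.
create table score(id int, score int);
insert into score values (1,80),(2,80),(3,80),(4,70),(5,80);

select id, score
from (select id, score,
             lead(score,1) over (order by id) as next1,
             lead(score,2) over (order by id) as next2
      from score) a
where score = next1 and score = next2;
-- returns only id 1 (score 80), the first row of the run
-- comparing with next2 alone would also return id 3, because ids 3 and 5 are both 80
```

Each returned row marks the start of a run of three; a longer run of the same score produces several overlapping starts.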