1. 窗口函数概述
1.1. 什么是窗口函数
- 简单理解,就是对查询的结果多出一列,这一列可以是聚合值,也可以是排序值。
- 开窗函数一般就是说的是over()函数,其窗口是由一个 OVER 子句 定义的多行记录
- 开窗函数一般分为两类,聚合开窗函数和排序开窗函数。
1.2. 窗口函数的功能
<窗口函数> over (partition by <用于分组的列名> order by <用于排序的列名> [rows between ?? and ???])
- 同时具有分组和排序的功能
- 不减少原表的行数
1.3. 窗口函数的种类和位置
- 专用窗口函数,包括rank, dense_rank, row_number等专用窗口函数
- 聚合函数,如sum(). avg(), count(), max(), min()等,rows between…and…
- 因为窗口函数是对where或者group by子句处理后的结果进行操作,所以窗口函数原则上只能写在select子句中
1.4. 窗口函数说明
- over():是窗口函数的核心,需要开窗的全部都需要使用这个函数
- partition by:用来对表分组,后续跟上分组的字段,会给该字段中每个不同的值开一个窗(可以添加多个字段)
- order by:是对分组后的结果进行排序,默认是按照升序(asc)排列,可以进行多个字段降序
- rows between ?? and ???:用来对窗口中已经排序完成后的上下数据进行筛选
2. 聚合窗口函数
聚合窗口函数就是在通过over开窗的情况下,使用sum、max等聚合函数对开窗之后的结果进行聚合汇总;跟普通的聚合函数不同,普通聚合函数只会生成一条结果数据,聚合窗口函数会在每一行中都新增一列,用了存储这聚合结果,这个窗口有多少条数据,那就会有多少条一模一样的结果数据。
原始数据如下表所示(只列举出了部分数据):
| user_id | cat_id | score |
| 618 | 488 | 0.0320793066566 |
| 7325 | 5648 | 0.1794971885383 |
| 7325 | 5653 | 0.0195317941826 |
| 9549 | 472 | 0.2734451185567 |
| 9549 | 316 | 0.0102923143349 |
| 10187 | 560 | 0.09287464288 |
| 16093 | 299 | 0.8203353556701 |
| 19109 | 450 | 0.0129389094682 |
| 29043 | 5580 | 0.0780796261011 |
| 29043 | 360 | 0.0445708866509 |
| 30207 | 301 | 0.1283038644617 |
| 37469 | 318 | 0.0672950094793 |
| 42121 | 364 | 0.0285768131938 |
| 45021 | 476 | 0.0290241345941 |
| 48161 | 308 | 1.3997567045733 |
| 48161 | 530 | 0.0280696255429 |
| 53008 | 510 | 0.0292976912317 |
| 53568 | 302 | 0.6823961209357 |
| 58397 | 300 | 0.0356667545944 |
SQL编写和查询结果展示:
select
user_id
, third_cat_id as cat_id
, score
, max(score) over(partition by user_id) max_score_of_user
, min(score) over(partition by user_id) min_score_of_user
, count(score) over(partition by user_id) cnt_score_of_user
, avg(score) over(partition by user_id) avg_score_of_user
, sum(score) over(partition by user_id) sum_score_of_user
from yishou_recommendation_system.user_third_category_score_15day
| user_id | cat_id | score | max_score_of_user | min_score_of_user | cnt_score_of_user | avg_score_of_user | sum_score_of_user |
| 1697 | 325 | 0.24999999928 | 0.24999999928 | 0.24999999928 | 1 | 0.24999999928 | 0.24999999928 |
| 13638 | 401 | 0.1924408114735 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 318 | 0.8433564379566 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 295 | 0.8473076857954 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 299 | 0.3413452337674 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 556 | 0.4636178949445 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 472 | 0.0769230769231 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 314 | 0.1531243564897 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 363 | 0.0153345765378 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 431 | 0.0928746430137 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 303 | 0.1512020592657 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 316 | 0.1252590167457 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 490 | 0.0153345765378 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 301 | 0.4550003444724 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 488 | 0.4961183604825 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 300 | 0.2698680479978 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 361 | 0.0769230769231 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 493 | 0.0153345765378 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 521 | 0.1332110723867 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
| 13638 | 561 | 0.0769230769231 | 2.3144764447547 | 0.0153345765378 | 29 | 0.30592698646857247 | 8.871882607588601 |
3. 排序窗口函数
3.1. 不同排序窗口函数的区别
- rank():如果有并列名次的行,会占用下一名次的位置,排序结果是 1,1,3
- dense_rank():如果有并列名次的行,不占用下一名次的位置,排序结果是 1,1,2
- row_number():不考虑并列名次的情况,排序结果是 1,2,3
3.2. SQL编写和查询结果展示
select
user_id
, third_cat_id as cat_id
, score
, rank() over(partition by user_id order by score) score_rank_of_user
, dense_rank() over(partition by user_id order by score) score_dense_rank_of_user
, row_number() over(partition by user_id order by score) score_row_number_of_user
from yishou_recommendation_system.user_third_category_score_15day
| user_id | cat_id | score | score_rank_of_user | score_dense_rank_of_user | score_row_number_of_user |
| 1697 | 325 | 0.24999999928 | 1 | 1 | 1 |
| 13638 | 493 | 0.0153345765378 | 1 | 1 | 1 |
| 13638 | 363 | 0.0153345765378 | 1 | 1 | 2 |
| 13638 | 5657 | 0.0153345765378 | 1 | 1 | 3 |
| 13638 | 490 | 0.0153345765378 | 1 | 1 | 4 |
| 13638 | 325 | 0.0153345765378 | 1 | 1 | 5 |
| 13638 | 340 | 0.0379776680745 | 6 | 2 | 6 |
| 13638 | 472 | 0.0769230769231 | 7 | 3 | 7 |
| 13638 | 361 | 0.0769230769231 | 7 | 3 | 8 |
| 13638 | 561 | 0.0769230769231 | 7 | 3 | 9 |
| 13638 | 562 | 0.0769230769231 | 7 | 3 | 10 |
| 13638 | 510 | 0.0928746430137 | 11 | 4 | 11 |
| 13638 | 431 | 0.0928746430137 | 11 | 4 | 12 |
| 13638 | 316 | 0.1252590167457 | 13 | 5 | 13 |
| 13638 | 521 | 0.1332110723867 | 14 | 6 | 14 |
| 13638 | 394 | 0.1401607366123 | 15 | 7 | 15 |
| 13638 | 303 | 0.1512020592657 | 16 | 8 | 16 |
| 13638 | 314 | 0.1531243564897 | 17 | 9 | 17 |
| 13638 | 401 | 0.1924408114735 | 18 | 10 | 18 |
| 13638 | 300 | 0.2698680479978 | 19 | 11 | 19 |
| 13638 | 299 | 0.3413452337674 | 20 | 12 | 20 |
| 13638 | 304 | 0.4085346499663 | 21 | 13 | 21 |
| 13638 | 308 | 0.4296759077854 | 22 | 14 | 22 |
| 13638 | 301 | 0.4550003444724 | 23 | 15 | 23 |
| 13638 | 556 | 0.4636178949445 | 24 | 16 | 24 |
| 13638 | 488 | 0.4961183604825 | 25 | 17 | 25 |
| 13638 | 294 | 0.4990914022087 | 26 | 18 | 26 |
| 13638 | 318 | 0.8433564379566 | 27 | 19 | 27 |
| 13638 | 295 | 0.8473076857954 | 28 | 20 | 28 |
| 13638 | 302 | 2.3144764447547 | 29 | 21 | 29 |
4. 排序截取数据函数
4.1. 排序截取数据函数的区别
- lag(col,n,default_val):获取往前第n行数据,col是列名,n是往上的行数,当第n行为null的时候取default_val
- lead(col,n, default_val):往后第n行数据,col是列名,n是往下的行数,当第n行为null的时候取default_val
- ntile(n):把有序分区中的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。
- cume_dist(),计算某个窗口或分区中某个值的累积分布。假定升序排序,则使用以下公式确定累积分布:小于等于当前值x的行数 / 窗口或partition分区内的总行数。其中,x 等于 order by 子句中指定的列的当前行中的值。
4.2. 原始数据
| user_id | record_date |
| 11343325 | 2022-09-04 06:17:24 |
| 11343325 | 2022-09-04 06:17:24 |
| 11343326 | 2022-09-04 06:17:25 |
| 11343326 | 2022-09-04 06:17:25 |
| 11343327 | 2022-09-04 06:17:26 |
| 11343327 | 2022-09-04 06:17:26 |
| 11343326 | 2022-09-04 06:17:27 |
| 11343328 | 2022-09-04 06:17:27 |
| 11343328 | 2022-09-04 06:17:27 |
| 11343329 | 2022-09-04 06:17:28 |
| 11343329 | 2022-09-04 06:17:28 |
| 4654923 | 2022-09-04 06:17:30 |
| 11343330 | 2022-09-04 06:17:32 |
| 11343330 | 2022-09-04 06:17:32 |
| 11343331 | 2022-09-04 06:17:35 |
| 11343331 | 2022-09-04 06:17:35 |
| 11343332 | 2022-09-04 06:17:37 |
| 11343332 | 2022-09-04 06:17:37 |
| 11343333 | 2022-09-04 06:17:37 |
| 11343333 | 2022-09-04 06:17:37 |
| 11343334 | 2022-09-04 06:17:40 |
| 11343334 | 2022-09-04 06:17:40 |
4.3. lag和lead函数的使用和结果
select
user_id
, record_date
, lag(record_date, 1, '1970-01-01 00:00:00') over(partition by user_id order by record_date) as start_date
, lead(record_date, 1, '9999-99-99 00:00:00') over(partition by user_id order by record_date) as end_date
from user_table
使用user_id分组,并使用record_date进行升序排序;lag函数会获取之前一行的record_date数据作为start_date字段的值,并回使用 '1970-01-01 00:00:00' 作为默认值;lead函数会获取之后一行的record_date数据作为end_date字段的值,并回使用 '9999-99-99 00:00:00' 作为默认值;结果如下所示:
| user_id | record_date | start_date | end_date |
| 1000045 | 2022-09-03 11:41:50 | 1970-01-01 00:00:00 | 9999-99-99 00:00:00 |
| 1000061 | 2022-09-03 15:22:21 | 1970-01-01 00:00:00 | 9999-99-99 00:00:00 |
| 100013 | 2022-09-03 08:14:36 | 1970-01-01 00:00:00 | 2022-09-03 14:47:43 |
| 100013 | 2022-09-03 14:47:43 | 2022-09-03 08:14:36 | 2022-09-03 14:48:25 |
| 100013 | 2022-09-03 14:48:25 | 2022-09-03 14:47:43 | 2022-09-03 14:49:11 |
| 100013 | 2022-09-03 14:49:11 | 2022-09-03 14:48:25 | 2022-09-03 15:08:41 |
| 100013 | 2022-09-03 15:08:41 | 2022-09-03 14:49:11 | 2022-09-03 15:09:27 |
| 100013 | 2022-09-03 15:09:27 | 2022-09-03 15:08:41 | 9999-99-99 00:00:00 |
具体赋值如下图所示:

5. 排序限制分区函数
5.1. 各排序限制分区函数说明
- rows必须跟在Order by 子句之后,对排序的结果进行限制,使用固定的行数来限制分区中的数据行数量;
- OVER():指定分析函数工作的数据窗口大小,这个数据窗口大小可能会随着行的变而变化;
- current row:当前行;
- n preceding:往前n行数据;
- n following:往后n行数据;
- unbounded preceding:该窗口数据的起点;
- unbounded following:该窗口数据的终点;
5.2. SQL编写和结果展示
select
user_id
, third_cat_id as cat_id
, score
-- 以user_id分组,按照score进行升序排序
, row_number() over(partition by user_id order by score) as row_number_id
-- 以user_id分组,按照score进行升序排序,求该窗口中第一行到当前行的和
, sum(score) over(partition by user_id order by score rows between unbounded preceding and current row) sum_1
-- 以user_id分组,按照score进行升序排序,求该窗口中之前一行到当前行的和
, sum(score) over(partition by user_id order by score rows between 1 preceding and current row) sum_2
-- 以user_id分组,按照score进行升序排序,求该窗口中当前行到之后一行的和
, sum(score) over(partition by user_id order by score rows between current row and 1 following) sum_3
-- 以user_id分组,按照score进行升序排序,求该窗口中当前行到最后一行的和
, sum(score) over(partition by user_id order by score rows between current row and unbounded following) sum_3
from yishou_recommendation_system.user_third_category_score_15day
| user_id | cat_id | score | row_number_id | sum_1 | sum_2 | sum_3 | sum_3 |
| 1697 | 325 | 0.24999999928 | 1 | 0.24999999928 | 0.24999999928 | 0.24999999928 | 0.24999999928 |
| 13638 | 490 | 0.0153345765378 | 1 | 0.0153345765378 | 0.0153345765378 | 0.0306691530756 | 8.8718826075886 |
| 13638 | 493 | 0.0153345765378 | 2 | 0.0306691530756 | 0.0306691530756 | 0.0306691530756 | 8.8565480310508 |
| 13638 | 363 | 0.0153345765378 | 3 | 0.0460037296134 | 0.0306691530756 | 0.0306691530756 | 8.841213454513 |
| 13638 | 325 | 0.0153345765378 | 4 | 0.0613383061512 | 0.0306691530756 | 0.0306691530756 | 8.8258788779752 |
| 13638 | 5657 | 0.0153345765378 | 5 | 0.076672882689 | 0.0306691530756 | 0.053312244612300004 | 8.8105443014374 |
| 13638 | 340 | 0.0379776680745 | 6 | 0.1146505507635 | 0.053312244612300004 | 0.1149007449976 | 8.7952097248996 |
| 13638 | 561 | 0.0769230769231 | 7 | 0.19157362768660002 | 0.1149007449976 | 0.1538461538462 | 8.7572320568251 |
| 13638 | 472 | 0.0769230769231 | 8 | 0.26849670460970004 | 0.1538461538462 | 0.1538461538462 | 8.680308979902 |
| 13638 | 562 | 0.0769230769231 | 9 | 0.34541978153280006 | 0.1538461538462 | 0.1538461538462 | 8.6033859029789 |
| 13638 | 361 | 0.0769230769231 | 10 | 0.4223428584559001 | 0.1538461538462 | 0.1697977199368 | 8.5264628260558 |
| 13638 | 431 | 0.0928746430137 | 11 | 0.5152175014696001 | 0.1697977199368 | 0.1857492860274 | 8.449539749132699 |
| 13638 | 510 | 0.0928746430137 | 12 | 0.6080921444833001 | 0.1857492860274 | 0.21813365975940002 | 8.356665106119 |
| 13638 | 316 | 0.1252590167457 | 13 | 0.733351161229 | 0.21813365975940002 | 0.2584700891324 | 8.2637904631053 |
| 13638 | 521 | 0.1332110723867 | 14 | 0.8665622336157 | 0.2584700891324 | 0.273371808999 | 8.1385314463596 |
| 13638 | 394 | 0.1401607366123 | 15 | 1.006722970228 | 0.273371808999 | 0.29136279587799996 | 8.0053203739729 |
| 13638 | 303 | 0.1512020592657 | 16 | 1.1579250294937 | 0.29136279587799996 | 0.3043264157554 | 7.8651596373606 |
| 13638 | 314 | 0.1531243564897 | 17 | 1.3110493859834 | 0.3043264157554 | 0.3455651679632 | 7.713957578094901 |
| 13638 | 401 | 0.1924408114735 | 18 | 1.5034901974569 | 0.3455651679632 | 0.4623088594713 | 7.5608332216052005 |
| 13638 | 300 | 0.2698680479978 | 19 | 1.7733582454546999 | 0.4623088594713 | 0.6112132817652001 | 7.368392410131699 |
| 13638 | 299 | 0.3413452337674 | 20 | 2.1147034792221 | 0.6112132817652001 | 0.7498798837337 | 7.098524362133899 |
| 13638 | 304 | 0.4085346499663 | 21 | 2.5232381291883996 | 0.7498798837337 | 0.8382105577517001 | 6.7571791283665 |
| 13638 | 308 | 0.4296759077854 | 22 | 2.9529140369738 | 0.8382105577517001 | 0.8846762522578 | 6.3486444784002 |
| 13638 | 301 | 0.4550003444724 | 23 | 3.4079143814462 | 0.8846762522578 | 0.9186182394169 | 5.918968570614799 |
| 13638 | 556 | 0.4636178949445 | 24 | 3.8715322763907 | 0.9186182394169 | 0.959736255427 | 5.4639682261424 |
| 13638 | 488 | 0.4961183604825 | 25 | 4.3676506368732 | 0.959736255427 | 0.9952097626912 | 5.0003503311979 |
| 13638 | 294 | 0.4990914022087 | 26 | 4.8667420390819 | 0.9952097626912 | 1.3424478401653 | 4.5042319707154 |
| 13638 | 318 | 0.8433564379566 | 27 | 5.7100984770385 | 1.3424478401653 | 1.690664123752 | 4.0051405685067 |
| 13638 | 295 | 0.8473076857954 | 28 | 6.5574061628338995 | 1.690664123752 | 3.1617841305501 | 3.1617841305501 |
| 13638 | 302 | 2.3144764447547 | 29 | 8.8718826075886 | 3.1617841305501 | 2.3144764447547 | 2.3144764447547 |
| 16530 | 314 | 0.0833333330933 | 1 | 0.0833333330933 | 0.0833333330933 | 0.1666666661866 | 1.6825354114854 |
| 16530 | 318 | 0.0833333330933 | 2 | 0.1666666661866 | 0.1666666661866 | 0.23571003949270003 | 1.5992020783921002 |
| 16530 | 493 | 0.1523767063994 | 3 | 0.319043372586 | 0.23571003949270003 | 0.3047534127988 | 1.5158687452988 |
| 16530 | 303 | 0.1523767063994 | 4 | 0.47142007898540006 | 0.3047534127988 | 0.3381259921594 | 1.3634920388994 |
| 16530 | 325 | 0.18574928576 | 5 | 0.6571693647454 | 0.3381259921594 | 0.37149857152 | 1.2111153325 |
| 16530 | 489 | 0.18574928576 | 6 | 0.8429186505054 | 0.37149857152 | 1.02536604674 | 1.02536604674 |
| 16530 | 324 | 0.83961676098 | 7 | 1.6825354114854 | 1.02536604674 | 0.83961676098 | 0.83961676098 |