OceanBase 分析函数

简介

分析函数（某些数据库下也叫做窗口函数）与聚合函数类似，计算总是基于一组行的集合，不同的是，聚合函数一组只能返回一行，而分析函数每组可以返回多行，组内每一行都是基于窗口的逻辑计算的结果。分析函数可以显著优化需要 self-join 的查询。

分析函数语法

“窗口”也称为 FRAME，OceanBase 数据库同时支持 ROWS 与 RANGE 两种 FRAME 语义，前者是基于物理行偏移的窗口，后者则是基于逻辑值偏移的窗口。

分析函数语法如下：

analytic_function:
  analytic_function([ arguments ]) OVER (analytic_clause)

analytic_clause:
  [ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]

query_partition_clause:
  PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }

order_by_clause:
  ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...

windowing_clause:
  { ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr {
  PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr { 
  PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr 
  PRECEDING}}

SUM/MIN/MAX/COUNT/AVG

声明

SUM 的语法为：SUM([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

MIN 的语法为：MIN([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

MAX 的语法为：MAX([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

COUNT 的语法为：COUNT({ * | [ DISTINCT | ALL ] expr }) [ OVER (analytic_clause) ]

AVG 的语法为：AVG([ DISTINCT | ALL ] expr) [ OVER(analytic_clause) ]

说明

以上分析函数都有对应的聚合函数，其中，SUM 返回 expr 的和，MIN/MAX 返回 expr 的最小值/最大值，COUNT 返回窗口中查询的行数，AVG 返回 expr 的平均值。

对于 COUNT 函数，如果指定了 expr，即返回 expr 不为 NULL 的统计个数，如果指定 COUNT(*) 返回所有行的统计数目。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.17 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.03 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient>select last_name, sum(salary) over(partition by job_id) totol_s, min(salary) over(partition by job_id) min_s, max(salary) over(partition by job_id) max_s, count(*) over(partition by job_id) count_s from exployees;
+-----------+---------+-------+-------+---------+
| last_name | totol_s | min_s | max_s | count_s |
+-----------+---------+-------+-------+---------+
| jim       |    2000 |  2000 |  2000 |       1 |
| mike      |   36000 | 11000 | 13000 |       3 |
| lily      |   36000 | 11000 | 13000 |       3 |
| tom       |   36000 | 11000 | 13000 |       3 |
+-----------+---------+-------+-------+---------+
4 rows in set (0.01 sec)

NTH_VALUE/FIRST_VALUE/LAST_VALUE

声明

NTH_VALUE 的语法为：NTH_VALUE (measure_expr, n) [ FROM { FIRST | LAST } ] [ { RESPECT | IGNORE } NULLS ] OVER (analytic_clause)

FIRST_VALUE 的语法为：FIRST_VALUE { (expr) [ {RESPECT | IGNORE} NULLS ] | (expr [ {RESPECT | IGNORE} NULLS ])} OVER (analytic_clause)

LAST_VALUE 的语法为：LAST_VALUE { (expr) [ {RESPECT | IGNORE} NULLS ] | (expr [ {RESPECT | IGNORE} NULLS ])} OVER (analytic_clause)

说明

NTH_VALUE 函数表示第几个值，方向由 [ FROM { FIRST | LAST } ] 确定，默认为 FROM FIRST，含有是否忽略 NULL 值的标志。其窗口为统一的 analytic_clause。这里 n 应该是正数，如果 n 是 NULL，函数将返回错误；如果 n 大于窗口内所有的行数，此函数将返回 NULL。

FIRST_VALUE 和 LAST_VALUE 表示从第一个开始计数或者是从最后一个开始计数。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> select last_name, first_value(salary) over(partition by job_id) totol_s, last_value(salary) over(partition by job_id) min_s, max(salary) over(partition by job_id) max_s from exployees;
+-----------+---------+-------+-------+
| last_name | totol_s | min_s | max_s |
+-----------+---------+-------+-------+
| jim       |    2000 |  2000 |  2000 |
| mike      |   12000 | 11000 | 13000 |
| lily      |   12000 | 11000 | 13000 |
| tom       |   12000 | 11000 | 13000 |
+-----------+---------+-------+-------+
4 rows in set (0.01 sec)

LEAD/LAG

声明

LEAD 的语法为：LEAD { ( value_expr [, offset [, default]]) [ { RESPECT | IGNORE } NULLS ] | ( value_expr [ { RESPECT | IGNORE } NULLS ] [, offset [, default]] )} OVER ([ query_partition_clause ] order_by_clause)

LAG 的语法为：LAG { ( value_expr [, offset [, default]]) [ { RESPECT | IGNORE } NULLS ] | ( value_expr [ { RESPECT | IGNORE } NULLS ] [, offset [, default]] )} OVER ([ query_partition_clause ] order_by_clause)

说明

LEAD 和 LAG 含义为可以在一次查询中取出当前行的同一个字段的前面或后面第 N 行的数据，这种操作可以使用相同表的自连接来实现，但 LEAD/LAG 窗口函数有更高的效率。

其中，value_expr 是要做比对的字段，offset 是 value_expr 的偏移量，default 参数的默认值为 NULL，即如果在 LEAD/LAG 没有显示的设置 default 值的情况下，返回值为 NULL。例如：对 LAG 来说，当前行为 4，offset 值为 6，这时候所要找的数据就是第 -2 行，不存在此行即返回 default 的值。

[ { RESPECT | IGNORE } NULLS ] 的语法为是否考虑 NULL 值，默认为 RESPECT，考虑 NULL 值。

注意 LEAD/LAG 两个函数后必须有 order_by_clause，数据应该在一个列上排序之后才能有前多少行后多少行的概念。query_partition_clause 是可选的，如果没有 query_partition_clause，就是全局的数据。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, lead(salary) over(order by salary) lead, lag(salary) over(order by salary) lag from exployees;
+-----------+-------+-------+
| last_name | lead  | lag   |
+-----------+-------+-------+
| jim       | 11000 |  NULL |
| tom       | 12000 |  2000 |
| mike      | 13000 | 11000 |
| lily      |  NULL | 12000 |
+-----------+-------+-------+
4 rows in set (0.01 sec)

STDDEV/VARIANCE/STDDEV_SAMP/STDDEV_POP

声明

VARIANCE 的语法为：VARIANCE([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

STDDEV 的语法为：STDDEV([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

STDDEV_SAMP 的语法为：STDDEV_SAMP(expr) [ OVER (analytic_clause) ]

STDDEV_POP 的语法为：STDDEV_POP(expr) [ OVER (analytic_clause) ]

说明

VARIANCE 返回的是 expr 的方差，expr 可能是数值类型或者可以转换成数值类型的类型，方差的类型和输入的值的类型相同。

STDDEV 返回的是 expr 的标准差，参数类型方面和 VARIANCE 的相同。

STDDEV_SAMP 返回的是样本标准差。

STDDEV_POP 返回的是总体标准差。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, stddev(salary) over(order by salary) std, variance(salary) over(order by salary) var, stddev_pop(salary) over() std_pop, stddev_samp(salary) over() from exployees;
+-----------+-------------------+--------------------+-------------------+----------------------------+
| last_name | std               | var                | std_pop           | stddev_samp(salary) over() |
+-----------+-------------------+--------------------+-------------------+----------------------------+
| jim       |                 0 |                  0 | 4387.482193696061 |          5066.228051190222 |
| tom       |              4500 |           20250000 | 4387.482193696061 |          5066.228051190222 |
| mike      | 4496.912521077347 | 20222222.222222224 | 4387.482193696061 |          5066.228051190222 |
| lily      | 4387.482193696061 |           19250000 | 4387.482193696061 |          5066.228051190222 |
+-----------+-------------------+--------------------+-------------------+----------------------------+
4 rows in set (0.00 sec)

NTILE

声明

NTILE(expr) OVER ([ query_partition_clause ] order_by_clause)

说明

NTILE 函数将分区中已经排序的行划分为大小尽可能相同的指定数量的分组，并返回给每行组号。expr 如果是 NULL，则返回 NULL。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, ntile(10) over(partition by job_id order by salary) ntl from exployees;
+-----------+------+
| last_name | ntl  |
+-----------+------+
| jim       |    1 |
| tom       |    1 |
| mike      |    2 |
| lily      |    3 |
+-----------+------+
4 rows in set (0.01 sec)

ROW_NUMBER

声明

ROW_NUMBER( ) OVER ([ query_partition_clause ] order_by_clause)

说明

ROW_NUMBER 函数按照 order_by_clause 子句中指定的行的顺序，为每一行分配一个编号。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, row_number() over(partition by job_id order by salary) ntl from exployees;
+-----------+------+
| last_name | ntl  |
+-----------+------+
| jim       |    1 |
| tom       |    1 |
| mike      |    2 |
| lily      |    3 |
+-----------+------+
4 rows in set (0.00 sec)

RANK/DENSE_RANK/PERCENT_RANK

声明

RANK 的语法为：RANK( ) OVER ([ query_partition_clause ] order_by_clause)

DENSE_RANK 的语法为：DENSE_RANK( ) OVER([ query_partition_clause ] order_by_clause)

PERCENT_RANK 的语法为：PERCENT_RANK( ) OVER ([ query_partition_clause ] order_by_clause)

说明

RANK 计算每一行数据在某列上的排序，该列由 order_by_clause 中的列决定。例如，按照 salary 排序可以看出员工的收入排名。

DENSE_RANK 的语义基本和 RANK 函数相同，但是 RANK 的排序中间会有‘跳过’，但是 DENSE_RANK 中不会有。

PERCENT_RANK 的语义基本和 RANK 函数相同，但是 PERCENT_RANK 排序的结果是百分比，计算的是给定行的百分比。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.10 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, rank() over(partition by job_id order by salary) rank, dense_rank() over(partition by job_id order by salary) dense_rank, percent_rank() over(partition by job_id order by salary) percent_rank from exployees;
+-----------+------+------------+----------------------------------+
| last_name | rank | dense_rank | percent_rank                     |
+-----------+------+------------+----------------------------------+
| jim       |    1 |          1 | 0.000000000000000000000000000000 |
| tom       |    1 |          1 | 0.000000000000000000000000000000 |
| mike      |    2 |          2 | 0.500000000000000000000000000000 |
| lily      |    3 |          3 | 1.000000000000000000000000000000 |
+-----------+------+------------+----------------------------------+
4 rows in set (0.01 sec)

CUME_DIST

声明

CUME_DIST() OVER ([ query_partition_clause ] order_by_clause)

说明

该函数计算一个值的分布，返回值为大于 0 小于等于 1 的值。作为一个分析函数，CUME_DIST 在升序情况下计算比当前行的特定列小的数据的占比。例如如下例子中，按 job_id 分组并在薪水排序的情况下，每行数据在窗口内的排序列上的占比。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.10 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, cume_dist() over(partition by job_id order by salary) cume_dist from exployees;
+-----------+----------------------------------+
| last_name | cume_dist                        |
+-----------+----------------------------------+
| jim       | 1.000000000000000000000000000000 |
| tom       | 0.333333333333333333333333333333 |
| mike      | 0.666666666666666666666666666667 |
| lily      | 1.000000000000000000000000000000 |
+-----------+----------------------------------+
4 rows in set (0.01 sec)

w3cschool 编程狮，随时随地学编程

OceanBase 分析函数

简介

分析函数语法

SUM/MIN/MAX/COUNT/AVG

NTH_VALUE/FIRST_VALUE/LAST_VALUE

LEAD/LAG

STDDEV/VARIANCE/STDDEV_SAMP/STDDEV_POP

NTILE

ROW_NUMBER

RANK/DENSE_RANK/PERCENT_RANK

CUME_DIST

OceanBase 产品简介

OceanBase 快速入门

OceanBase 教程

OceanBase 控制台指南

OceanBase 集群工作台

OceanBase 租户

OceanBase 监控

OceanBase 诊断

OceanBase 备份恢复管理

OceanBase 安全设置

OceanBase 参数管理

OceanBase 数据传输

OceanBase 产品简介

OceanBase 产品架构

OceanBase 产品功能

OceanBase 快速入门

OceanBase 迁移项目管理

OceanBase MySQL数据库迁移

OceanBase Oracle数据库迁移

OceanBase 常见问题

OceanBase ODC 使用指南

OceanBase Web 版 ODC

OceanBase 连接数据库

OceanBase 使用工作台

OceanBase 使用工具

OceanBase 数据导出和导入

OceanBase 任务管理

OceanBase 数据库对象

OceanBase 表对象

OceanBase 视图对象

OceanBase 函数对象

OceanBase 存储过程对象

OceanBase 序列对象

OceanBase 程序包对象

OceanBase 触发器对象

OceanBase 类型对象

OceanBase 同义词对象

OceanBase 客户端版 ODC

OceanBase 连接数据库

OceanBase 使用工作台

OceanBase 使用工具

OceanBase 数据导出和导入

OceanBase 任务管理

OceanBase 数据库对象

OceanBase 表对象

OceanBase 视图对象

OceanBase 函数对象

OceanBase 存储过程对象

OceanBase 序列对象

OceanBase 程序包对象

OceanBase 触发器对象

OceanBase 类型对象

OceanBase 同义词对象

OceanBase Connector/J 开发者指南

OceanBase 什么是OceanBase Connector/J

OceanBase Connector/J 使用指南

OceanBase 基本操作

OceanBase Java数据流

OceanBase 使用LOB

OceanBase 数据源和URL

OceanBase 结果集

OceanBase 语句缓存

OceanBase 参考信息

OceanBase Oracle 模式特有功能

OceanBase 分布式事务

OceanBase SQL 参考（MySQL 模式）

OceanBase 基本元素

OceanBase 数据类型