OceanBase 基于代价的查询改写

OceanBase 数据库目前只支持一种基于代价的查询改写——或展开（OR-EXPANSION）。

数据库中很多高级的改写规则（例如 complex view merge 和窗口函数改写）都需要基于代价进行改写，OceanBase 数据库后续版本会支持这些复杂的改写规则。

或展开（OR-EXPANSION）

OR-EXPANSION 是将一个查询改写成若干个用 UNION 组成的子查询，可以为每个子查询提供更优的优化空间，但是也会导致多个子查询的执行，所以这个改写需要基于代价去判断。

OR-EXPANSION 的改写主要有如下三个作用:

允许每个分支使用不同的索引来加速查询。

如下例所示，Q1 会被改写成 Q2 的形式，其中 Q2 中的谓词 LNNVL(t1.a = 1) 保证了这两个子查询不会生成重复的结果。如果不进行改写，Q1 一般来说会选择主表作为访问路径，对于 Q2 来说，如果 t1 上存在索引（a）和索引（b），那么该改写可能会让 Q2 中的每一个子查询选择索引作为访问路径。

Q1: 
obclient>SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
Q2: 
obclient>SELECT * FROM t1 WHERE t1.a = 1 UNION ALL SELECT * FROM t1.b = 1 
       AND LNNVL(t1.a = 1);

完整示例如下：

obclient>CREATE TABLE t1(a INT, b INT, c INT, d INT, e INT, INDEX IDX_a(a), 
         INDEX IDX_b(b));
Query OK, 0 rows affected (0.17 sec)

/*如果不进行 OR-EXPANSION 的改写，该查询只能使用主表访问路径*/
obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
+--------------------------------------------------------------+
| Query Plan                                                                         |
+--------------------------------------------------------------+
| ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |4        |649 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([t1.a = 1 OR t1.b = 1]),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)

/*改写之后，每个子查询能使用不同的索引访问路径*/
obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
+------------------------------------------------------------------------+
| Query Plan                                                                                         |
+------------------------------------------------------------------------+
| =========================================
|ID|OPERATOR   |NAME     |EST. ROWS|COST|
-----------------------------------------
|0 |UNION ALL  |         |3        |190 |
|1 | TABLE SCAN|t1(idx_a)|2        |94  |
|2 | TABLE SCAN|t1(idx_b)|1        |95  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t1.c, t1.c)], [UNION(t1.d, t1.d)], [UNION(t1.e, t1.e)]), filter(nil)
  1 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter(nil),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
  2 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([lnnvl(t1.a = 1)]),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p02

允许每个分支使用不同的连接算法来加速查询，避免使用笛卡尔联接。

如下例所示，Q1 会被改写成 Q2 的形式。对于 Q1 来说，它的联接方式只能是 NESTED LOOP JOIN (笛卡尔乘积)，但是被改写之后，每个子查询都可以选择 NESTED LOOP JOIN、HASH JOIN 或者 MERGE JOIN，这样会有更多的优化空间。

Q1: 
obclient>SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;

Q2: 
obclient>SELECT * FROM t1, t2 WHERE t1.a = t2.a UNION ALL
     SELECT * FROM t1, t2 WHERE t1.b = t2.b AND LNNVL(t1.a = t2.a);

完整示例如下：

obclient> CREATE TABLE t1(a INT, b INT);
Query OK, 0 rows affected (0.17 sec)

obclient> CREATE TABLE t2(a INT, b INT);
Query OK, 0 rows affected (0.13 sec)

/*如果不进行改写，只能使用 NESTED LOOP JOIN*/
obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1, t2 
       WHERE t1.a = t2.a OR t1.b = t2.b;
+--------------------------------------------------------------------------+
| Query Plan                                                                                          |
+--------------------------------------------------------------------------+
| ===========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST  |
-------------------------------------------
|0 |NESTED-LOOP JOIN|    |3957     |585457|
|1 | TABLE SCAN     |t1  |1000     |499   |
|2 | TABLE SCAN     |t2  |4        |583   |
===========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      conds(nil), nl_params_([t1.a], [t1.b])
  1 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  2 - output([t2.a], [t2.b]), filter([? = t2.a OR ? = t2.b]),
      access([t2.a], [t2.b]), partitions(p0)

/*被改写之后，每个子查询都使用了 HASH JOIN*/
obclient>  EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;
+--------------------------------------------------------------------------+
| Query Plan                                                                                         |
+--------------------------------------------------------------------------+
|ID|OPERATOR    |NAME|EST. ROWS|COST|
-------------------------------------
|0 |UNION ALL   |    |2970     |9105|
|1 | HASH JOIN  |    |1980     |3997|
|2 |  TABLE SCAN|t1  |1000     |499 |
|3 |  TABLE SCAN|t2  |1000     |499 |
|4 | HASH JOIN  |    |990      |3659|
|5 |  TABLE SCAN|t1  |1000     |499 |
|6 |  TABLE SCAN|t2  |1000     |499 |
=====================================

Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t2.a, t2.a)], [UNION(t2.b, t2.b)]), filter(nil)
  1 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      equal_conds([t1.a = t2.a]), other_conds(nil)
  2 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  3 - output([t2.a], [t2.b]), filter(nil),
      access([t2.a], [t2.b]), partitions(p0)
  4 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      equal_conds([t1.b = t2.b]), other_conds([lnnvl(t1.a = t2.a)])
  5 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  6 - output([t2.a], [t2.b]), filter(nil),
      access([t2.a], [t2.b]), partitions(p0)

允许每个分支分别消除排序，更加快速的获取 TOP-K 结果。

如下例所示，Q1 会被改写成 Q2。对于 Q1 来说，执行方式是只能把满足条件的行数找出来，然后进行排序，最终取 TOP-10 结果。对于 Q2 来说，如果存在索引（a,b）, 那么 Q2 中的两个子查询都可以使用索引把排序消除，每个子查询取 TOP-10 结果，然后最终对这 20 行数据排序一下取出最终的 TOP-10 行。

Q1: 
obclient>SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2 ORDER BY b LIMIT 10;

Q2: 
obclient>SELECT * FROM  
    (SELECT * FROM  t1 WHERE t1.a = 1 ORDER BY b LIMIT 10 UNION ALL
     SELECT * FROM  t1 WHERE t1.a = 2 ORDER BY b LIMIT 10) AS TEMP
    ORDER BY temp.b LIMIT 10;

完整示例如下：

obclient> CREATE TABLE t1(a INT, b INT, INDEX IDX_a(a, b));
Query OK, 0 rows affected (0.20 sec)

/*不改写的话，需要排序最终获取 TOP-K 结果*/
obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.a = 2 
        ORDER BY b LIMIT 10;
+-------------------------------------------------------------------------+
| Query Plan                                                                                         |
+-------------------------------------------------------------------------+
| ==========================================
|ID|OPERATOR    |NAME     |EST. ROWS|COST|
------------------------------------------
|0 |LIMIT       |         |4        |77  |
|1 | TOP-N SORT |         |4        |76  |
|2 |  TABLE SCAN|t1(idx_a)|4        |73  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b]), filter(nil), limit(10), offset(nil)
  1 - output([t1.a], [t1.b]), filter(nil), sort_keys([t1.b, ASC]), topn(10)
  2 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)

/* 进行改写的话，排序算子可以被消除，最终获取 TOP-K 结果*/
obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2 
        ORDER BY b LIMIT 10;
+-------------------------------------------------------------------------+
| Query Plan                                                                                          |
+-------------------------------------------------------------------------+
| ===========================================
|ID|OPERATOR     |NAME     |EST. ROWS|COST|
-------------------------------------------
|0 |LIMIT        |         |3        |76  |
|1 | TOP-N SORT  |         |3        |76  |
|2 |  UNION ALL  |         |3        |74  |
|3 |   TABLE SCAN|t1(idx_a)|2        |37  |
|4 |   TABLE SCAN|t1(idx_a)|1        |37  |
===========================================

Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), limit(10), offset(nil)
  1 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), sort_keys([UNION(t1.b, t1.b), ASC]), topn(10)
  2 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil)
  3 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0),
      limit(10), offset(nil)
  4 - output([t1.a], [t1.b]), filter([lnnvl(t1.a = 1)]),
      access([t1.a], [t1.b]), partitions(p0),
      limit(10), offset(nil)

w3cschool 编程狮，随时随地学编程

OceanBase 基于代价的查询改写

或展开（OR-EXPANSION）

OceanBase 产品简介

OceanBase 快速入门

OceanBase 教程

OceanBase 控制台指南

OceanBase 集群工作台

OceanBase 租户

OceanBase 监控

OceanBase 诊断

OceanBase 备份恢复管理

OceanBase 安全设置

OceanBase 参数管理

OceanBase 数据传输

OceanBase 产品简介

OceanBase 产品架构

OceanBase 产品功能

OceanBase 快速入门

OceanBase 迁移项目管理

OceanBase MySQL数据库迁移

OceanBase Oracle数据库迁移

OceanBase 常见问题

OceanBase ODC 使用指南

OceanBase Web 版 ODC

OceanBase 连接数据库

OceanBase 使用工作台

OceanBase 使用工具

OceanBase 数据导出和导入

OceanBase 任务管理

OceanBase 数据库对象

OceanBase 表对象

OceanBase 视图对象

OceanBase 函数对象

OceanBase 存储过程对象

OceanBase 序列对象

OceanBase 程序包对象

OceanBase 触发器对象

OceanBase 类型对象

OceanBase 同义词对象

OceanBase 客户端版 ODC

OceanBase 连接数据库

OceanBase 使用工作台

OceanBase 使用工具

OceanBase 数据导出和导入

OceanBase 任务管理

OceanBase 数据库对象

OceanBase 表对象

OceanBase 视图对象

OceanBase 函数对象

OceanBase 存储过程对象

OceanBase 序列对象

OceanBase 程序包对象

OceanBase 触发器对象

OceanBase 类型对象

OceanBase 同义词对象

OceanBase Connector/J 开发者指南

OceanBase 什么是OceanBase Connector/J

OceanBase Connector/J 使用指南

OceanBase 基本操作

OceanBase Java数据流

OceanBase 使用LOB

OceanBase 数据源和URL

OceanBase 结果集

OceanBase 语句缓存

OceanBase 参考信息

OceanBase Oracle 模式特有功能

OceanBase 分布式事务

OceanBase SQL 参考（MySQL 模式）

OceanBase 基本元素

OceanBase 数据类型

OceanBase 运算符

OceanBase 函数

OceanBase 单行函数

OceanBase 查询和子查询

OceanBase SQL语句

OceanBase SQL 参考（Oracle 模式）

OceanBase 基本元素

OceanBase 内建数据类型

OceanBase 字符数据类型