PostgreSQL_树形结构的递归查询

栏目: 数据库 · 发布时间: 7年前

内容简介：处理不确定深度的层级结构，比如组织机构，一个常用的设计是在一张表里面保存 ID 和 Parent_ID ，并且通过自联结的办法构造一颗树。这种方式对写数据的过程很友好，但是查询过程就变得相对复杂。在不引入MPTT模型的前提下，必须通过递归算法来查询某个节点和下级子节点。Oracle提供的connect by扩展语法，简单好用。但是其他的RDBMS就没这么人性化了（或者我不知道）。最近在项目中使用PostgreSQL来查询树形数据，记录一下。如果安装了tablefunc 扩展，就可以使用PG版本的connec

处理不确定深度的层级结构，比如组织机构，一个常用的设计是在一张表里面保存 ID 和 Parent_ID ，并且通过自联结的办法构造一颗树。这种方式对写数据的过程很友好，但是查询过程就变得相对复杂。在不引入MPTT模型的前提下，必须通过递归算法来查询某个节点和下级子节点。

Oracle提供的connect by扩展语法，简单好用。但是其他的RDBMS就没这么人性化了（或者我不知道）。最近在项目中使用PostgreSQL来查询树形数据，记录一下。

构造样本数据

drop table if exists demo.tree_data;
create table demo.tree_data (
    id integer,
    code text,
    pid integer,
    sort integer
);

insert into demo.tree_data values(1, '中国', null, 1);
insert into demo.tree_data values(2, '四川', 1, 1);
insert into demo.tree_data values(3, '云南', 1, 2);
insert into demo.tree_data values(4, '成都', 2, 1);
insert into demo.tree_data values(5, '绵阳', 2, 2);	
insert into demo.tree_data values(6, '武侯区', 4, 1);
insert into demo.tree_data values(7, '昆明', 3, 1);	
复制代码

connectby函数

如果安装了tablefunc 扩展，就可以使用PG版本的connectby函数。这个没有Oracle那么强大，但是可以满足基本要求。

-- API 如下
connectby(text relname, 			-- 表名称
          text keyid_fld, 			-- id字段
          text parent_keyid_fld		-- 父id字段	
          [, text orderby_fld ], 	-- 排序字段
          text start_with, 			-- 起始行的id值
          int max_depth				-- 树深度，0表示无限
          [, text branch_delim ])	-- 路径分隔符
复制代码

-- 基本用法如下，必须通过AS子句定义返回的字段名称和类型
select * 
	from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
	as (id int, pid int, lvl int, branch text, sort int);
	
-- 查询结果
id | pid | lvl | branch  | sort
----+-----+-----+---------+------
  1 |     |   0 | 1       |    1
  2 |   1 |   1 | 1~2     |    2
  4 |   2 |   2 | 1~2~4   |    3
  6 |   4 |   3 | 1~2~4~6 |    4
  5 |   2 |   2 | 1~2~5   |    5
  3 |   1 |   1 | 1~3     |    6
  7 |   3 |   2 | 1~3~7   |    7
(7 rows)
复制代码

-- 仅仅使用基本用法，只能查询出id的相关信息，如果要查询code等其他字段，就需要通过额外的join操作来实现。
select 
	t.id, n.code, t.pid, p.code as pcode, lvl, branch
from (
	select * from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
		as (id int, pid int, lvl int, branch text, sort int)
) as t
	left join demo.tree_data as n on (t.id = n.id)
	left join demo.tree_data as p on (t.pid = p.id)
order by t.sort ;	

 id |  code  | pid | pcode | lvl | branch
----+--------+-----+-------+-----+---------
  1 | 中国   |     |       |   0 | 1
  2 | 四川   |   1 | 中国  |   1 | 1~2
  4 | 成都   |   2 | 四川  |   2 | 1~2~4
  6 | 武侯区 |   4 | 成都  |   3 | 1~2~4~6
  5 | 绵阳   |   2 | 四川  |   2 | 1~2~5
  3 | 云南   |   1 | 中国  |   1 | 1~3
  7 | 昆明   |   3 | 云南  |   2 | 1~3~7
(7 rows)
复制代码

PS：虽然通过join可以查询出节点的code，但是branch部分不能直接转换成对应的code，使用上还是不太方便。

CTE语法

使用CTE语法，通过 with recursive 来实现树形数据的递归查询。这个方法虽然没有connectby那么直接，但是灵活性和显示效果更好。

-- 
with recursive cte as
(
  -- 先查询root节点  
  select
    id, code, pid, '' as pcode,
    code as branch
  from  demo.tree_data where id = 1
  union all
  -- 通过cte递归查询root节点的直接子节点  
  select
    origin.id, origin.code, cte.id as pid, cte.code as pcode,
    cte.branch || '~' || origin.code
  from cte
  join demo.tree_data as origin on origin.pid = cte.id
)
select
  id,code, pid, pcode, branch, 
  -- 通过计算分隔符的个数，模拟计算出树形的深度
  (length(branch)-length(replace(branch, '~', ''))) as lvl
from cte;

-- 
 id |  code  | pid | pcode |        branch         | lvl
----+--------+-----+-------+-----------------------+-----
  1 | 中国   |     |      | 中国                 |   0
  2 | 四川   |   1 | 中国  | 中国~四川            |   1
  3 | 云南   |   1 | 中国  | 中国~云南            |   1
  4 | 成都   |   2 | 四川  | 中国~四川~成都        |   2
  5 | 绵阳   |   2 | 四川  | 中国~四川~绵阳        |   2
  7 | 昆明   |   3 | 云南  | 中国~云南~昆明        |   2
  6 | 武侯区 |   4 | 成都  | 中国~四川~成都~武侯区   |   3
(7 rows)
复制代码

执行过程说明

从上面的例子可以看出，WITH RECURSIVE语句包含了两个部分

non-recursive term（非递归部分），即上例中的union all前面部分
recursive term（递归部分），即上例中union all后面部分

执行步骤如下

执行non-recursive term。（如果使用的是union而非union all，则需对结果去重）其结果作为recursive term中对result的引用，同时将这部分结果放入临时的working table中
重复执行如下步骤，直到working table为空：用working table的内容替换递归的自引用，执行recursive term，（如果使用union而非union all，去除重复数据），并用该结果（如果使用union而非union all，则是去重后的结果）替换working table

以上面的query为例，来看看具体过程

执行non-recursive query

-- step 1 执行
  select
    id, code, pid, '' as pcode,
    code as branch
  from  demo.tree_data where id = 1
  
-- 结果集和working table为
 id | code | pid | pcode | branch
----+------+-----+-------+--------
  1 | 中国 |     |       | 中国
复制代码

执行recursive query

-- step 2 执行递归，此时自引用cte中的数据是step 1的结果
  select
    origin.id, origin.code, cte.id as pid, cte.code as pcode,
    cte.branch || '~' || origin.code
  from cte
  join demo.tree_data as origin on origin.pid = cte.id
  
 -- 结果集和working table为
  id |  code  | pid | pcode |        branch        
----+--------+-----+-------+---------------------
  2 | 四川   |   1 | 中国  | 中国~四川            
  3 | 云南   |   1 | 中国  | 中国~云南            
复制代码

继续执行recursive query，直到结果集和working table为空
结束递归，将前三个步骤的结果集合并，即得到最终的WITH RECURSIVE的结果集。

严格来讲，这个过程实现上是一个迭代的过程而非递归，不过RECURSIVE这个关键词是 SQL 标准委员会定立的，所以PostgreSQL也延用了RECURSIVE这一关键词。

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

Natural Language Processing with Python

Steven Bird、Ewan Klein、Edward Loper / O'Reilly Media / 2009-7-10 / USD 44.99

This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies, ranging from predictive text and email filtering to autom......一起来看看《Natural Language Processing with Python》这本书的介绍吧!

码农工具