Oracle RDBMS 11gR2 – goodbye Connect By or: the end of hierarchical querying as we know it

http://technology.amis.nl/2009/09/01/oracle-rdbms-11gr2-goodbye-connect-by-or-the-end-of-hierarchical-querying-as-we-know-it/

Many years ago, Oracle basically set the standard in SQL. Whatever was Oracle SQL could be seen as the standard. This has never been absolutely true – ANSI SQL was different from Oracle SQL. In some respects the differences can be traced back to lack of functional richness in the standard. However, in certain areas, Oracle has walked its own path with certain functions and features with the rest of the RDBMS pack following another route. It seems to be as if starting with Oracle RDBMS 9iR2, Oracle has made several important steps towards embracing the ANSI SQL standard syntax, usually while maintaining its own specific flavor of those same functions.

Some examples of Oracle specific syntax that are also available through their ANSI SQL counterpart – that in most instances has even more functionality – and that is almost always the preferred approach going forward:

  • outer join syntax: in addition to Oracle’s (+) notation, 9iR2 introduced the left outer join, right outer join and full outer join (like (+) on both ends of the join condition, something that Oracle does not support)
  • the Decode function that with 9iR2 can (and should) be replace by the CASE expression
  • the to_char function that in many instances can be replace by extract {year|month|day|HOUR | MINUTE | SECOND} from date-value
  • the ltrim and rtrim functions that can be replaced by the trim function with leading and trailing settings

The 11gR2 release has another example of Oracle specific SQL that is now complemented with the ANSI SQL standard approach:the CONNECT BY syntax for hierarchical querying that can now be replaced by the ANSI SQL standard Recursive Subquery Factoring clause, which is an extension to the WITH clause introduced as long ago as Oracle7 and still relatively underused by Oracle SQL developers.

 

The table I will use for demonstrating this new syntax is EMP. The records in EMP refer via their MGR column to other records in EMP, allowing us to create an organization hierarchy starting with KING – an employee without manager – and traversing the org-chart all the way down.

The classic way of performing the hierarchical query looks like this:

with employees (empno, name, level) as
( select empno, ename, job, mgr, level
  from   emp
  start with mgr is null
  connect by prior empno = mgr
)
select *
from   employees

This syntax has been available – in this very Oracle proprietary way – from the very first releases of the Oracle RDBMS (release 2 or 3). Later releases add some niceties to hierarchical queries, most notably the Oracle RDBMS 9iR2 and 10g releases.

  • order siblings by
  • sys_connect_path
  • connect_by_root, connect_by_is_leaf,
  • nocycle, connect_by_is_cycle

However, ANSI SQL defines an alternative approach to Hierarchical Queries. It is called Recursive Subquery Factoring (or Recursive With Clause). Apparently it is supported (to some extend) in DB2, SQL Server and PostgreSQL. It extends the WITH clause to enable formulation of recursive queries:  â€œThe query in the WITH clause (subquery) can refer to itself”

This Recursive WITH clause is ANSI. SQL The introduction of this syntax in 11gR2 makes Oracle ANSI-SQL-compatible for recursive queries.

So what is the equivalent of the standard hierarchical query shown above in the new syntax:

with employees (empno, name, job, mgr, hierlevel) as
( select empno, ename, job, mgr, 1
  from   emp
  where  mgr is null
  union all
  select e.empno, e.ename, e.job
  ,      e.mgr, m.hierlevel + 1
  from   emp e
         join
         employees m
         on (m.empno = e.mgr)
)
select *
from   employees

It looks a little complex at first. And it is more lines of code – so how can it be better? Well, give it a moment or two. When we look at more complex hierarchical queries and the use of the specials like sys_connect_path and the connect_by_root some advantages of this syntax will start to shine through.

For the moment, let’s disect this query:

The with clause consists of two queries that are combined the UNION ALL. Both queries return the same list of columns – as is required by UNION ALL. The first query is used to retrieve the root-node(s), the starting point(s) of the tree. In this case, this query only returns KING – the employee with MGR is null. The second query will add all other tree nodes to the result set returned by the subquery. This is where the magic lives: the second query refers to the subquery as a whole. The query that returns the other-than-the-root tree nodes joins with the subquery itself. What this is saying is something to the effect of:

  • the second query in the recursive subuquery is recursively processed, in subsequent iterations
  • in every iteration, we enter a deeper level in the tree
  • each iteration uses the the collection of all nodes retrieved by the first subquery (the root nodes) and all previous iterations. that means for example that the first iteration of   select e.empno, e.ename from   emp e cross join employees m on (m.empno = e.mgr) has the root-node KING to join with from the employees set. The second iteration will have the other managers serving under KING to join with.

With this understanding in our heads, could we then come up with the syntax we need for retrieving what we can do with the Oracle goodie SYS_CONNECT_PATH, the path from every tree node all the way up to its root with node values added to a string concattenated together for all nodes in the path?

It is actually surprisingly simple. To return the alternative to SYS_CONNECT_PATH(ename, ‘/’) all we need is the following:

with employees (empno, name, mgr, hierlevel, path) as
( select empno, ename, job, mgr, 1, ename
  from   emp
  where  mgr is null
  union all
  select e.empno, e.ename, e.job
  ,      e.mgr, m.hierlevel + 1
  ,      m.path||'/'||e.ename
  from   emp e
         join
         employees m
         on (m.empno = e.mgr)
)
select *
from   employees

This will select ‘KING’ as path for the first node and then for each node deeper in the tree take the path value from the parent node and add ‘/’||ename to it.

In answer to questions that arose as to how multiple subqueries can be combined with the recursive subquery, I would like to present the next query that makes a few combinations:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
with departments as
( select deptno dept_id
  ,      dname  dept_name
  from   dept
  where  loc in ('NEW YORK' ,'DALLAS')
)
, employees (empno, name, job, mgr, hierlevel) as
( select empno, ename, job, mgr, 1
  from   emp
  where  mgr is null
  union all
  select e.empno, e.ename, e.job
  ,      e.mgr, m.hierlevel + 1
  from   emp e
         join
         employees m
         on (m.empno = e.mgr)
         join
         departments
         on (e.deptno = departments.dept_id)
)
, subordinate_numbers as
( select mgr emp_id
  ,      count(*) number_of_subordinates
  from   employees
  group by mgr
)
select *
from   employees e
       left outer join
       subordinate_numbers s
       on (e.empno = s.emp_id)

In later articles we will look at the more advanced aspects of the new recursive subquery factoring syntax and functionality.

Resources

Oracle 11gR2 SQL Language reference – Recursive With Clause

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s