Common Table Expressions (CTE)
CTEs are one of the beautiful and the most powerful feature of SQL Server. It is introduced in SQL Server 2005 and I guess is one of the best things in SQL Server. This not only simplifies most of the complex and impossible queries in T-SQL, but also provides one medium to deal with recursive queries. Moreover, I can say it is mainly for recursive queries. It can be used in replacement of derived tables (sub queries), temp tables, table variables, inline user-defined functions etc.
The Common Table Expression (CTE) is called as a temporary named RESULT SET, within the scope of an executing statement that can be used within a SELECT, INSERT, UPDATE, or DELETE or even in a CREATE VIEW or MERGE statement.
[ WITH <common_table_expression_name>]
[ ( column_name [ ,…n ] ) ]
( CTE_query_definition ),…n
- < common_table_expression_name> : name of the CTE, used as a identifier. This should be different from any other CTE defined within the same “WITH” clause. The expression name can be same as the name of the base table or view (any reference to the expression name will use the CTE and not the base table).
- [ ( column_name [ ,…n ] ) ]: Represents the column names for the defined CTE (Optional if there is no duplicity in column names in the result set of the CTE_query_definition). And if the column names are defined with the CTE, then the number of columns should match with the inner result set’s columns.
- ( CTE_query_definition ): This specifies a SELECT statement whose result set populates the common table expression. A CTE cannot define another CTE inside it and the SELECT statement must meet the same criteria as for creating a view.
More than one CTEs can be created within one “WITH” statement, but scope of all those CTEs will be limited to the immediate t-sql statement after “WITH” (described below in detail).
Stored procedures are mainly used to deal with complex logics either for security purpose or to accomplish some combined tasks, Or in most of the cases we use temporary tables or table variables to keep intermediate results which at the end tempt to be deleted when not in use. But among all these, CTE is the best as it act as a normal view only.
Simple CTE Example: To get the employee information from the ‘employee’ table.
WITH emp_CTE as ( select EmpName, empEmailId, salary from dbo.employee) select * from emp_CTE;
This is just to show how CTEs are working. Let’s take another example, where we want information from employee along with departmental information and then want to do some operations on it.
WITH emp_CTE as ( select EmpName, empEmailId, salary, DeptName, DeptStatus from dbo.employee A inner join dbo.Department D on A.DeptID = D.DeptID ) select * from emp_CTE where DeptStatus = 'Active';
This is yet another simple query, where one CTE named “emp_CTE” is defined to keep information about employees along with departmental information.
Some more CTEs examples:
WITH CTE1(CL11,CL22) as (select 1 as CL1, 3 as CL2) ,CTE2(CL3,cl5,cl6) as (select 1,5,7) select * from CTE1 inner join CTE2 on CL11 = CL3
Here we defined two CTEs (i.e. CTE1 and CTE2) within the same “WITH” statement, and both these CTEs are used in the immediate t-sql statement.
CTEs with Union ALL
WITH CTE_union(CL1,CL2) AS ( SELECT 1 AS CL1, 3 AS CL2 UNION ALL SELECT 5 as CL11,7 as CL22) select * from CTE_union
CTEs are very useful for on the fly IDENTITY columns:
WITH emp_CTE as ( select ROW_NUMBER() over(order by (select 1)) as IdentityId, * from dbo.employee ) select * from emp_CTE
SQL Server supports two types of Common Table Expressions (CTEs):
- Recursive CTEs: Here the query executes itself, like we use recursive functions (functions calling itself) in C, java or C# .net programming languages. This is very useful in case of parent-child hierarchy relationship or base-sub class or category-subcategory, etc.
- Non- Recursive CTEs: These are the other complex queries that can be simplified with the help of CTEs like using cursors, getting sequence numbers, filtrations, etc.
Recursive CTEs: The Recursive CTEs must have two types of query definition members:
- Recursive Member: If the query definition references the CTE itself, then it is called Recursive member. This is defined at the last i.e. after defining all the anchor members.
- Anchor Member: All other CTE query definitions without reference of CTE are Anchor Members. All anchor member query definitions must be put before the first recursive member definition.
The following are the restrictions to Recursive CTEs:
- In Recursive CTEs, it’s not possible to have multiple query definitions (i.e. multiple CTEs [, separated]) within a single “WITH” statement.
- All anchor members supports UNION ALL, UNION, INTERSECT, or EXCEPT for combining the anchor definition. But only UNION ALL is supported for combining the last anchor member and the first recursive member and even for multiple recursive members.
- The total number of columns and the data types of those columns must match for the anchor member and the recursive member. (Note: data type should be same for recursive member and the corresponding anchor member)
- CTE expression_name can be referenced only once in the FROM clause of a recursive member.
Recursive Common Table Expression Examples: These are very useful in case of hierarchical data, because the CTE Query continues to execute till it reaches its lower granular level.
The below example is based on the “AdventureWorksDW2008R2” sample database.
WITH EmpDetails (EmployeeKey, FirstName, LastName, ParentEmployeeKey, PFirstName, PLastName, EmployeeRank) AS ( SELECT EmployeeKey ,FirstName ,LastName ,ParentEmployeeKey ,cast('NULL' as varchar(50)) as PFirstName ,cast('NULL' as varchar(50)) as PLastName ,1 as EmployeeRank FROM dbo.DimEmployee WHERE ParentEmployeeKey IS NULL UNION ALL SELECT e.EmployeeKey ,e.FirstName ,e.LastName ,e.ParentEmployeeKey ,cast(CTE_Emp.FirstName as varchar(50)) as PFirstName ,cast(CTE_Emp.LastName as varchar(50)) as PLastName ,CTE_Emp.EmployeeRank + 1 as EmployeeRank FROM dbo.DimEmployee e INNER JOIN EmpDetails CTE_Emp ON e.ParentEmployeeKey = CTE_Emp.EmployeeKey ) select * from EmpDetails order by EmployeeRank
Let’s dissect the above query to have a better picture of its working mechanism.
Basically any recursive CTE consists of three elements:
- First invocation of the CTE: In first invocation, all the members of the CTE will be executed i.e. both the anchor members and the recursive members. And as we have CTE referenced in the recursive members, it will automatically invoke the CTE for further iterations and thus called as recursive mode or recursive invocation. In this case Anchor member’s result set will be considered as the referenced data for the CTE. As in the above figure, in the first invocation, r0 is the value for the Anchor query and hence become the data for the CTE when referenced from the Recursive query.
- Recursive invocation of the CTE: In all recursive invocation of the CTE, only recursive members will be executed and the anchor members will remain idle. And the data for the CTE referenced will be the result set built from the last call of the CTE. Elaborated briefly below.
- Recursive Termination Condition: Well practically, we don’t have any control on the termination of the Recursive CTEs. I mean it is implicit; it will stop itself when no rows are returned from the previous invocation. Hence, the control condition can be applied indirectly i.e. the query’s result set can be limited with the help of filter conditions.
Let’s look around the execution process of the above recursive CTE.
First Round: As in the above picture, r0 is the result set of the Anchor query. And when coming to Recursive query, r0 become the CTE value when referenced and joined with the Employee table E.
Anchor Resultset: r0 [Top level employee with id 112]
Recursive Resultset (r1): E inner join r0 [based on joining condition E.ParentEmployeeKey = r0.EmployeeKey] i.e. all 2nd level employees
Second Round: From second round, the Anchor query will remain idle. Only the recursive queries will be executed. But the referenced CTE value will be the value of the last invocation only (r1) and not the union of [(r0) and (r1)]
Anchor Resultset: No result set (query remain idle)
Recursive Resultset (r2): E inner join r1 [based on joining condition E.ParentEmployeeKey = r1.EmployeeKey] i.e. all 3rd level employees
and so on till (rn), till it reaches all the levels i.e. when there will be no records from the previous invocation of the CTE.
After execution of all the iterations, the final result set will be accumulated by merging all partial result sets by UNION ALL i.e. [(r0) + (r1) + (r2) + … + (rn)]
If the conditions applied are not valid inside CTE query definition i.e. if the CTE is queried incorrectly, it may enter to an infinite loop. For example, in the above example, instead of “ON e.ParentEmployeeKey = CTE_Emp.EmployeeKey” if we apply “ON e.EmployeeKey= CTE_Emp.EmployeeKey”; it will enter an infinite loop. May be the system will crash out of memory (kidding ). Once it extends beyond the number of recursion permitted (by default it is 100), it will show an error like the below:
Msg 530, Level 16, State 1, Line 1 The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
In order to avoid this type of error we can use MAXRECURSION hint in the OPTION clause of the outer query where the CTE is referenced. We can limit the number of recursion levels allowed for a particular statement by using the MAXRECURSION hint and a value between 0 and 32,767. Like for example, in the above problem, if we want to see the result set till the employee Rank 3 (employee level), we can use the MAXRECURSION hint to ‘2’ (two sub level + 1 root level) as below:
WITH EmpDetails (EmployeeKey, FirstName, LastName, ParentEmployeeKey, PFirstName, PLastName, EmployeeRank) AS ( SELECT EmployeeKey ,FirstName ,LastName ,ParentEmployeeKey ,cast('NULL' as varchar(50)) as PFirstName ,cast('NULL' as varchar(50)) as PLastName ,1 as EmployeeRank FROM dbo.DimEmployee WHERE ParentEmployeeKey IS NULL UNION ALL SELECT e.EmployeeKey ,e.FirstName ,e.LastName ,e.ParentEmployeeKey ,cast(CTE_Emp.FirstName as varchar(50)) as PFirstName ,cast(CTE_Emp.LastName as varchar(50)) as PLastName ,CTE_Emp.EmployeeRank + 1 as EmployeeRank FROM dbo.DimEmployee e INNER JOIN EmpDetails CTE_Emp ON e.ParentEmployeeKey = CTE_Emp.EmployeeKey ) select * from EmpDetails -- order by EmployeeRank OPTION (MAXRECURSION 2)
But in this case we cannot use order by clause with hints.
Some more examples for recursive CTEs:
with echo_CTE(st) as ( select st = cast('Hi ' as varchar(50)) union all select cast(st + 'Arun! ' as varchar(50)) from echo_CTE ) select st from echo_CTE OPTION (MAXRECURSION 3)
I mentioned CAST operator in the above example, because recursive CTEs are data type sensitive as well as data size sensitive. So, be careful, while selecting the columns for the CTE.
Here the above statement is controlled by “OPTION (MAXRECURSION 3)” (to run only three iteration) but it can be also controlled by using filter condition as below (I modified the above query to give more clarity to the solution).
with echo_CTE(st, lencount) as ( select st = cast('Hi ' as varchar(50)), LEN('Hi ') union all select cast(st + 'Arun! ' as varchar(50)) , len(st) from echo_CTE where lencount < 50 ) select st, lencount from echo_CTE
Note in the above output, the Lencount column is always calculating for the value of st from the previous iteration.
Now let’s try one complex operation with CTE, i.e. to split comma separated values (CSV) into table set.
Declare @stringst varchar(8000) = 'Sign UP, today, and, start up ,with, new, ideas and, discussions, in, SQL Lion, Forums' SET @stringst = @stringst + ','; --';' is needed before starting CTE defination With SplitString(startid, endid) AS ( select 0 as startid, CHARINDEX(',',@stringst,0) as endid UNION ALL select endid + 1 as startid, CHARINDEX(',',@stringst, endid + 1) as endid from SplitString where endid < LEN(@stringst) ) select SUBSTRING(@stringst,startid,endid - startid) as SplitedValue, *, (endid - startid) as wordSize from SplitString
Non-Recursive Common Table Expressions: These are CTEs with only Anchor Members and no Recursive members. Even CTEs in non-recursive mode are more powerful. Let me show one example.
To delete redundant records from a table without using any identity column or any other third table.
WITH CTE as ( select ROW_NUMBER() over(partition by col1, col2, col3 order by col1) as RecID, * from dbo.ABC ) DELETE from CTE where RecID > 1
And the coolest part of this is that it will not hamper the structure of the table i.e. all constraints and defaults will remain same as it is.
Well, this is not the end of the story! The CTE has enormous capabilities to take over any complex situation. Check out more in my upcoming posts. Register or sign-up for the latest updates.