TSQL archives | DawaBI TSQL Blog Category

TSQL

A Simple TSQL Pivot

I’ve always had issues writing a TSQL Pivot statement without going back to books-on-line to figure it out.  Maybe because I haven’t done it enough, but it always seems a bit convoluted.  That meant it was time to write about it and break it down so I could make sense of it.  Let’s set up the demo data first.  Based on the data I was working on pivoting I wrote up some simple demo data.  It has a name, objective and a result for the said objective.  It has one record per objective and the pivot should line up those objectives so there is one row per name instead of one row per objective.

DECLARE @BaseData table (  

Name   varchar(10)  

,Objective  varchar(5)  

,Result   varchar(15)  )

 

INSERT @BaseData

SELECT ‘Bob’,’Obj1′,’Complete’

UNION

SELECT ‘Bob’,’Obj2′,’Pending’

UNION

SELECT ‘Lisa’,’Obj1′,’Pending’

UNION

SELECT ‘Lisa’,’Obj2′,’Pending’

UNION

SELECT ‘Don’,’Obj1′,’Complete’

UNION

SELECT ‘Seth’,’Obj1′,’Complete’

UNION

SELECT ‘Seth’,’Obj2′,’Complete’

UNION

SELECT ‘Megan’,’Obj1′,’Pending’

UNION

SELECT ‘Megan’,’Obj2′,’Complete’

 

 

SELECT * FROM @BaseData

 

 Now to work on understanding the pivot.  Here’s the finished Pivot statement for reference: 

PIVOT(MAX(Result) for Objective in ([OBJ1],[OBJ2])) as ResultPivot

 The [Result] column is the data that needs to be pivoted.  The first stumbling part of the pivot is that the data that is being pivoted has to be wrapped in some kind of accumulation.  So if you are dealing with numerical data it can be a MIN, MAX, AVG, etc.  With string data, which is what I dealt with the easiest is MIN or MAX.  Since there is only one result per objective in this data, the MAX will be that value.

The next part of the pivot is naming the columns that the [Result] data will be pivoted into.  I associated this to telling the PIVOT command what the key was to the data to be pivoted.  In this case there are consistent values in the Objective column of Obj1 and Obj2.  Not all Names in the data have an Obj1 and Obj2 which is fine as you will see.  If you don’t have a consistent value like this I could see using a derived column like a ROWNUMBER() or something similar to give the data values to create columns on.

After naming the columns that the pivoted data will go into the entire statement needs an alias.  It won’t necessarily be used in the SELECT statement but the PIVOT statement will throw errors if it is not aliased.

Here is how the PIVOT is worked into the query. 

SELECT Name, OBJ1 as Objective1, OBJ2 as Objective2                FROM @BaseData

                PIVOT(MAX(Result) for Objective in ([OBJ1],[OBJ2])) as ResultPivot

 Notice in the SELECT statement the columns from the pivot get the names from the PIVOT statement.  I did this because I wanted to alias the column names to be more descriptive.  If you get rid of the column names and just put in SELECT it will work too.  If you put the [RESULT] column in the columns to SELECT the statement will error and throw Invalid Column Name.’  The [RESULT] column basically no longer exists when the data is pivoted.  When the query is run (with the table variable above to create the data) the results will look like this:

Name Objective1 Objective2
Bob Complete Pending
Don Complete NULL
Lisa Pending Pending
Megan Pending Complete
Seth Complete Complete

 If you look at Don’s record you will see that Objective 2 is NULL.  That objective does not exist for him.  If there are hundreds or thousands of records that are being pivoted and a flag needs to be created if there are missing objective records it’s simple.  Those objective columns can be used in the query just as normal columns.  In the query below you can see how a CASE statement was added to flag those people with missing records.

SELECT Name, OBJ1 as Objective1, OBJ2 as Objective2                ,CASE

                                WHEN OBJ1 is NULL THEN ‘TRUE’

                                WHEN OBJ2 is NULL THEN ‘TRUE’

                                ELSE ”

                                END as ErrorFlag

                FROM @BaseData

                PIVOT(MAX(Result) for Objective in ([OBJ1],[OBJ2])) as ResultPivot

 The results of this query will have an additional column called ErrorFlag that will have a value of TRUE if there is a NULL in Objective 1 or Objective 2.  This will allow for analysis of the data to easily point out those records.

Name Objective1 Objective2 ErrorFlag
Bob Complete Pending  
Don Complete NULL TRUE
Lisa Pending Pending  
Megan Pending Complete  
Seth Complete Complete  

 Another way of making the NULL data stand out or if NULL data doesn’t matter at all the OBJ1 and OBJ2 could be wrapped in an ISNULL.  That could be used to put in an empty string or some value to indicate problems with the data.

Hopefully, this helps out.  I’m sure the next time I write a PIVOT I’ll have to dig out this posting and skim through it real quick to remember the details.

By |July 8th, 2014|TSQL|

WRITING QUERIES WITH MULTIPLE GROUPING SETS

Using the basic group-by in a T-SQL query is very common when dealing with grouping, but there is one other option that often gets overlooked. In SQL-Server, you can also do multiple grouping sets, by grouping on different sets of fields. The benefits, a more efficient and all-in-one query; versus doing separate individual, group-by, queries put together via a union. Here is a basic example of using grouping sets on the Adventure Works database.

SELECT
[CustomerID]
,TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
GROUP BY GROUPING SETS
(
(CustomerID),
(TerritoryID),
()
)
ORDER BY TotalSpent desc;

In this query, we are doing three types of grouping, on CustmerID, TerritoryID, and empty group. The results of the query:

GroupingSets1

Each of the groups are distinguishable by null values, used as placeholders in the columns that are not used in the set.

 1) CustomerID Gives the total spent per customer, with null value in the TerritoryID column. Results found in row 12 and up.

 2) TerritoryID Gives the total spent in each territory, with null value in the CustomerID column. Results found in row 2 11.

 3) An empty grouping set, giving a grand total for all rows, with null values in both TerritoryID and CustomerID columns. Results found in row 1.

With the grouping set, there are also two short-cut abbreviations (CUBE, ROLLUP) that can be applied instead of writing out every possibility of those sets.

CUBE

Cube, is a way to define all possibilities of a set-group down to the empty set. A query written without the short-cut:

SELECT
[CustomerID]
,TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
GROUP BY GROUPING SETS
(
(CustomerID,TerritoryID),
(CustomerID),
(TerritoryID),
()
)
ORDER BY TotalSpent desc;

Can be shortened to:

SELECT
[CustomerID]
,TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
GROUP BY CUBE (CustomerID,TerritoryID)
ORDER BY TotalSpent desc;

ROLLUP

Rollup is a short-cut used when dealing with hierarchical fields. For this example, the hierarchy of the fields, from broad to narrow (Group, CountryRegionCode, Name, TerritoryID). For the long version of the query:

SELECT
st.[Group]
,st.CountryRegionCode
,st.[Name]
,st.TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] soh
INNER JOIN [AdventureWorks2012].[Sales].[SalesTerritory] st on soh.TerritoryID = st.TerritoryID
GROUP BY GROUPING SETS
(
(st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID),
(st.[Group],st.CountryRegionCode,st.[Name]),
(st.[Group],st.CountryRegionCode),
(st.[Group]),
()
)
ORDER BY TotalSpent desc;

Can be replaced with:

SELECT
st.[Group]
,st.CountryRegionCode
,st.[Name]
,st.TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] soh
INNER JOIN [AdventureWorks2012].[Sales].[SalesTerritory] st on soh.TerritoryID = st.TerritoryID
GROUP BY ROLLUP (st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID)
ORDER BY TotalSpent desc;

The table results:

GroupingSets2

COMBINING ROLLUP & CUBE

When combining grouping sets,

SELECT
st.[Group]
,st.CountryRegionCode
,st.[Name]
,st.TerritoryID
,[CustomerID]
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] soh
INNER JOIN [AdventureWorks2012].[Sales].[SalesTerritory] st on soh.TerritoryID = st.TerritoryID
GROUP BY ROLLUP (st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID), CUBE([CustomerID],st.TerritoryID)
ORDER BY TotalSpent desc;

The total resulting set ends up as the multiplication of each set.

CUBE (CustomerID,TerritoryID) will give 4 sets
(
CustomerID + TerritoryID,
CustomerID,
TerritoryID,
{}
)

ROLLUP (st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID) will give 5 sets
(
Group + Region + Name + TerritoryID,
Group + Region + Name,
Group + Region,
Group,
{}
)

The resulting set total will end up being 20 sets:

DEALING WITH NULL

Using null-values to distinguish sets, works as long as there are no null values coming through the query. If we used a query, where SalesPersonID and TerritoryID could be null:

SELECT
SUM(SubTotal) as CalcSubTotal,
SalesPersonID,
TerritoryID
FROM
[Sales].[SalesOrderHeader] soh
GROUP BY CUBE (SalesPersonID, TerritoryID)
ORDER BY CalcSubTotal desc;

GroupingSets3

The results can end up being confusing. Is row 1 or row 2 the empty set total? To get around this problem, we have two functions that can be used, Grouping, and Grouping_Id.

Grouping

SELECT
SUM(SubTotal) as CalcSubTotal,
SalesPersonID,
GROUPING(SalesPersonID) AS SalesPersonGrouping,
TerritoryID,
GROUPING(TerritoryID) AS TerritoryGrouping
FROM
[Sales].[SalesOrderHeader] soh
GROUP BY CUBE (SalesPersonID, TerritoryID)
ORDER BY SalesPersonGrouping desc, TerritoryGrouping desc, CalcSubTotal desc;

The results from the above query, a little backwards, where 0 represents the grouping column(s), and 1 represents the column(s) not used in the grouping. From the table below, row 1 represents the empty set, and row 2 represents the total with no or unknown sales person.

GroupingSets4

Grouping_ID

 

SELECT
SUM(SubTotal) as CalcSubTotal,
SalesPersonID,
TerritoryID,
GROUPING_ID(SalesPersonID, TerritoryID) as GrpingID
FROM
[Sales].[SalesOrderHeader] soh
GROUP BY CUBE (SalesPersonID, TerritoryID)
ORDER BY CalcSubTotal desc;

The grouping_id functions accepts a list of grouping columns to be used. The results, giving an integer value, is actually the conversion from a binary value (e.g. 00 = 0, 01 = 1, 10 = 2, 11 = 3). Just like the grouping function, the concept of ones and zero are reversed; where, 0 represents the grouping and 1 is where the column was not used.

GroupingSets5

SQL 2014 Natively Compiled Stored Procedures: Unsupported TSQL by In-Memory OLTP

One of the major new features for Microsoft SQL Server 2014 is the ability to have In-Memory OLTP.  If you have not heard of this, it boils down to tables are able to be completely stored in memory and all operations against that table and it’s data occur in memory.  This gives SQL Server the ability of working with data without the I/O waits that are present in disk based tables.  In versions of SQL Server prior to 2014, stored procedures were strictly an interpreted language.  With the in-memory tables for SQL Server 2014 the ability to have Natively Compiled stored procedures was added.  The advantage this presents is better performance when stored procedures are run against in memory tables.  As with anything there are caveats that go along with it.  Natively Compiled Stored Procedures will not run against disk-based tables and there are limitations as to what TSQL code you can use in them.  Interpreted stored procedures will still run against an in-memory table, but has to do so through a query interop.

The part of the limitations that jumped out at me was the limitations to TSQL.  There is definitely going to be a learning curve because there are some very common TSQL features that are not supported.  The first thing that jumped out was that cursors are not supported! (Yay!)  I know this will cause problems for some people, but I’ve always gone out of my way to find ways around using cursors.  Temp tables will not be supported in the natively compiled stored procedures since they are created on disc.  Natively compiled stored procedures cannot make interactions with tables that are stored on disc.  This will require some work, but can be overcome by using table variables that have a defined table type.  There is also an option of defining a SCHEMA_ONLY in-memory table to use as a temp table.  This means that the table structure is durable, but not the data within it.  When the SQL server is shut down any data within the SCHEMA_ONLY table(s) will be gone but the table structure will be re-created when the server starts.

Here are some of the features that I considered commonly used that will not be supported in a Natively Compiled Stored Procedure.

Feature Alternative
Cursor Set based logic or While Loops
Temp Tables Table variables (not inline) or an in-memory table created as SCHEMA_ONLY
Inline Table Variables Table variables can still be used, but the table type has to be created as a user defined table type.
Common Table Expressions (CTE) Re-writing the query to not use a CTE.
Subqueries (nested queries) Re-write the query to not use a Subquery.
User Defined Functions (UDF) The logic from the UDF will have to be built into the stored procedure.
Execute or Insert Exec Include the logic from the external stored proc
OR and IN and CASE Create queries for each case
NOT This is just a change in syntax.  Use != instead
LIKE  

 

For a full list of unsupported TSQL features in Natively Compiled Stored Procedures go here:  http://msdn.microsoft.com/en-us/library/dn246937(v=sql.120).aspx

Also, here is the white paper on In-Memory OLTP if you want to dig deeper:

http://download.microsoft.com/download/5/F/8/5F8D223F-E08B-41CC-8CE5-95B79908A872/SQL_Server_2014_In-Memory_OLTP_TDM_White_Paper.pdf

By |February 4th, 2014|TSQL|