Using the basic group-by in a T-SQL query is very common when dealing with grouping, but there is one other option that often gets overlooked. In SQL-Server, you can also do multiple grouping sets, by grouping on different sets of fields. The benefits, a more efficient and all-in-one query; versus doing separate individual, group-by, queries put together via a union. Here is a basic example of using grouping sets on the Adventure Works database.

SELECT
[CustomerID]
,TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
GROUP BY GROUPING SETS
(
(CustomerID),
(TerritoryID),
()
)
ORDER BY TotalSpent desc;

In this query, we are doing three types of grouping, on CustmerID, TerritoryID, and empty group. The results of the query:

GroupingSets1

Each of the groups are distinguishable by null values, used as placeholders in the columns that are not used in the set.

 1) CustomerID Gives the total spent per customer, with null value in the TerritoryID column. Results found in row 12 and up.

 2) TerritoryID Gives the total spent in each territory, with null value in the CustomerID column. Results found in row 2 11.

 3) An empty grouping set, giving a grand total for all rows, with null values in both TerritoryID and CustomerID columns. Results found in row 1.

With the grouping set, there are also two short-cut abbreviations (CUBE, ROLLUP) that can be applied instead of writing out every possibility of those sets.

CUBE

Cube, is a way to define all possibilities of a set-group down to the empty set. A query written without the short-cut:

SELECT
[CustomerID]
,TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
GROUP BY GROUPING SETS
(
(CustomerID,TerritoryID),
(CustomerID),
(TerritoryID),
()
)
ORDER BY TotalSpent desc;

Can be shortened to:

SELECT
[CustomerID]
,TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
GROUP BY CUBE (CustomerID,TerritoryID)
ORDER BY TotalSpent desc;

ROLLUP

Rollup is a short-cut used when dealing with hierarchical fields. For this example, the hierarchy of the fields, from broad to narrow (Group, CountryRegionCode, Name, TerritoryID). For the long version of the query:

SELECT
st.[Group]
,st.CountryRegionCode
,st.[Name]
,st.TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] soh
INNER JOIN [AdventureWorks2012].[Sales].[SalesTerritory] st on soh.TerritoryID = st.TerritoryID
GROUP BY GROUPING SETS
(
(st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID),
(st.[Group],st.CountryRegionCode,st.[Name]),
(st.[Group],st.CountryRegionCode),
(st.[Group]),
()
)
ORDER BY TotalSpent desc;

Can be replaced with:

SELECT
st.[Group]
,st.CountryRegionCode
,st.[Name]
,st.TerritoryID
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] soh
INNER JOIN [AdventureWorks2012].[Sales].[SalesTerritory] st on soh.TerritoryID = st.TerritoryID
GROUP BY ROLLUP (st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID)
ORDER BY TotalSpent desc;

The table results:

GroupingSets2

COMBINING ROLLUP & CUBE

When combining grouping sets,

SELECT
st.[Group]
,st.CountryRegionCode
,st.[Name]
,st.TerritoryID
,[CustomerID]
,SUM([SubTotal]) AS TotalSpent
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] soh
INNER JOIN [AdventureWorks2012].[Sales].[SalesTerritory] st on soh.TerritoryID = st.TerritoryID
GROUP BY ROLLUP (st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID), CUBE([CustomerID],st.TerritoryID)
ORDER BY TotalSpent desc;

The total resulting set ends up as the multiplication of each set.

CUBE (CustomerID,TerritoryID) will give 4 sets
(
CustomerID + TerritoryID,
CustomerID,
TerritoryID,
{}
)

ROLLUP (st.[Group],st.CountryRegionCode,st.[Name],st.TerritoryID) will give 5 sets
(
Group + Region + Name + TerritoryID,
Group + Region + Name,
Group + Region,
Group,
{}
)

The resulting set total will end up being 20 sets:

DEALING WITH NULL

Using null-values to distinguish sets, works as long as there are no null values coming through the query. If we used a query, where SalesPersonID and TerritoryID could be null:

SELECT
SUM(SubTotal) as CalcSubTotal,
SalesPersonID,
TerritoryID
FROM
[Sales].[SalesOrderHeader] soh
GROUP BY CUBE (SalesPersonID, TerritoryID)
ORDER BY CalcSubTotal desc;

GroupingSets3

The results can end up being confusing. Is row 1 or row 2 the empty set total? To get around this problem, we have two functions that can be used, Grouping, and Grouping_Id.

Grouping

SELECT
SUM(SubTotal) as CalcSubTotal,
SalesPersonID,
GROUPING(SalesPersonID) AS SalesPersonGrouping,
TerritoryID,
GROUPING(TerritoryID) AS TerritoryGrouping
FROM
[Sales].[SalesOrderHeader] soh
GROUP BY CUBE (SalesPersonID, TerritoryID)
ORDER BY SalesPersonGrouping desc, TerritoryGrouping desc, CalcSubTotal desc;

The results from the above query, a little backwards, where 0 represents the grouping column(s), and 1 represents the column(s) not used in the grouping. From the table below, row 1 represents the empty set, and row 2 represents the total with no or unknown sales person.

GroupingSets4

Grouping_ID

 

SELECT
SUM(SubTotal) as CalcSubTotal,
SalesPersonID,
TerritoryID,
GROUPING_ID(SalesPersonID, TerritoryID) as GrpingID
FROM
[Sales].[SalesOrderHeader] soh
GROUP BY CUBE (SalesPersonID, TerritoryID)
ORDER BY CalcSubTotal desc;

The grouping_id functions accepts a list of grouping columns to be used. The results, giving an integer value, is actually the conversion from a binary value (e.g. 00 = 0, 01 = 1, 10 = 2, 11 = 3). Just like the grouping function, the concept of ones and zero are reversed; where, 0 represents the grouping and 1 is where the column was not used.

GroupingSets5