In my previous life I did support, in both a direct customer support role and most recently a backline type support.  In that backline role I did a lot of data manipulation (updates and deletes) trying to get the customer’s data corrected.  Something was always wrong through their doing or ours.  (Mostly theirs, because we never made mistakes? ;) )  One of the greatest tools I found was the Common Table Expression.  I love CTEs!  They were great for hunting down subsets of data, especially when I had to aggregate data somehow.  Once I had the data I needed I could write updates or deletes directly against that data.

For example, let’s pretend that person data was imported into a table.  Someone realized after the import was complete that they imported the same phone numbers for multiple people and so those phone numbers need updated.  They don’t know what phone numbers are duplicated for what people, so they need a list of all the people in the table with the same phone number.  In this list they need FirstName, LastName and PhoneNumber.  (To keep it simple everyone has one phone number, pretend it’s 1980)  Because adding in the first and last name to the query would create a unique value (since you have to group by everything in the select list) only the phone number can be used at first.  This is a simple query, and I think one of the first SQL queries I ever learned.

SELECT PhoneNumber, COUNT(*) 
FROM Person 
GROUP BY PhoneNumber 
HAVING COUNT(*) > 1

With this query we will have a list of all the phone numbers in the table that are in the table more than once.  But we don’t have the FirstName or LastName.  There are a number of ways to do this.  Once I started using CTEs they became my preferred method because it’s so simple to put the query in parenthesis and add the with’ statement.  Here is the resulting query that will return all of the data requested.  You will see the original query is just wrapped into the CTE and a new query added to get the additional information.

;WITH cte_Duplicates
	 AS ( SELECT PhoneNumber
				 , Count(*) [Count]
		  FROM   Person
		  GROUP  BY PhoneNumber
		  HAVING Count(*) > 1 )
SELECT FirstName
	   , LastName
	   , PhoneNumber
FROM   Person xPer
JOIN   cte_Duplicates xDupe ON xPer.PhoneNumber = xDupe.PhoneNumber

A few notes about the CTE and the syntax if you are not familiar with the SQL Server documentation on CTE’s.  The semi-colon before the with’ is required if there is any other SQL statements running before the CTE.  I found it’s just easier to remember to put it in there no matter what.  The documentation on CTE’s show a column list after the name of the cte.  Ex. ;with cte_Duplicates (PhoneNumber,Count) The column list is optional.  The requirement with column names is that they are not duplicated within the CTE and that every column has a name.  If you have duplicates you can either alias a column within the SELECT statement or put in a column name list.  The same goes for columns with no names like the COUNT in the query.  You will notice in the CTE I put in an alias of [Count] so that it has a name and I put the name in brackets since it’s a reserved word.  I could have left the count out of the CTE since the final query doesn’t display it, but it’s nice to have it in there if you want to highlight just the select statement within the CTE and run it to look at the original data.

That list went to the person lucky enough to have to sort out what phone numbers were right and wrong.  They gave us back the list with what the phone numbers should be.  Assuming that the list was worked with in Excel I would take that list and convert it to a CSV.  I had a macro written in Notepad++ that would wrap every line in parentheses and put a comma on the end.  Then I could take that data and put it into an insert for a table variable or a temp table depending on size.  Once the data is staged somehow the person data can be fixed using the same CTE from above.  To do this we’ll just wrap the second query into a second CTE and then write the update.  For this example assume there is code above the CTE’s that declares a table variable of @Corrections and inserts the correct data.

;WITH cte_Duplicates
	 AS ( SELECT PhoneNumber
				 , Count(*) [Count]
		  FROM   Person
		  GROUP  BY PhoneNumber
		  HAVING Count(*) > 1 ),
	 cte_List
	 AS ( SELECT FirstName
				 , LastName
				 , PhoneNumber
		  FROM   Person xPer
		  JOIN   cte_Duplicates xDupe ON xPer.PhoneNumber = xDupe.PhoneNumber )
UPDATE cte_List
SET    PhoneNumber = xNew.PhoneNumber
FROM   cte_List xList
	   JOIN @Corrections xNew ON xList.FirstName = xNew.FirstName

This statement will update the underlying table (Person) with the correct phone number from the @Corrections table variable.  The second select got wrapped into a CTE, which is as easy as putting in a comma, adding a new alias and putting in the parentheses.  Then I wrote the update against the second CTE.  (The first CTE cannot be updated because it aggregates data)

There are some caveats to using CTEs with DDL, as there is with everything in SQL Server.  You can only delete if there is one table in the select statement in the CTE.  That limits the capabilities a little.  Hopefully, you are updating data more than you are deleting data, because deleting data is never good.  You cannot update multiple tables at one time.  If your CTE has multiple tables in the select you can only write updates back to columns in one of the tables.  For example, let’s pretend that a group of orders were entered in not accounting for a leap year.  So the order date needs updating.  The first thing we do is write that select to get the data.  Instead of re-writing everything for the update just wrap it into a CTE and then write the update.  It is only updating data from the Sales.Orders table so it will run successfully.

;WITH cte_Updates
	 AS ( SELECT xPer.FirstName + ' ' + xPer.LastName AS Name
				 , xOrd.OrderID
				 , xOrd.OrderDate
		  FROM   Person.Person xPer
		  JOIN   Sales.Orders xOrd ON xPer.PersonID = xOrd.PersonID
		  WHERE  xOrd.OrderDate >= '2/28/2012' )
UPDATE cte_Updates
SET    OrderDate = Dateadd(dd, 1, OrderDate)

These are really simple examples.  In the real world there would be a PersonID of some kind that could be used for a much simpler update for the first example.  (I would hope)  The CTE method really shines when you have multiple queries to get to the data that needs to be updated or deleted.  The single most compelling reason for updating/deleting via a CTE in my opinion is this:

Updating/deleting data directly through the CTE gives you a peace of mind because you can run the selects to see exactly what data you are inflicting change upon.  There is no guess work on what data you touch, if there is just re-run the select statements in the CTE to view the data.

That is the key.  Instead of writing select statements and finding the data.  Then trying to translate those select statements into an update.  You can take the select statements and wrap them up in CTE(s) and then update the CTE.  SO SIMPLE!