Many of you will have heard the mantra that loops are bad, set based is good. But how do you get around them?
The fact of the matter is that there are very few circumstances where a loop is the only way to achieve your objective. A look through the SQL forums will show you many a technique to turn iterative row by row processing into single transaction set based processing. I’m refering to hierarchical query techniques (see nested set theory) and of course the subject of this post, the numbers table.
A common way to build a date dimension is to run a procedure that iterates from a certain start date and inserts one date record at a time into the table. This probably works quite well if you are only processing one day at a time using incremental loads, however, if you’re rebuilding your dimension everyday and there are dates ranging from 30 years ago to 5 years from now, you will be talking about a fairly CPU intensive operation. The same goes for the time dimension…
Here’s a typical example of this (I have picked an extreme range of dates to help illustrate the performance gain):-
On my local machine the above query takes around 6 seconds even after the first execution. So, in order to return the same results as above in a set based approach, the first thing I need to do is create a numbers table.
There are other ways to populate a numbers table, using loops or the row_number() function, but this is by far the fastest.
Having populated our table we can now execute the equivalent code to the loop above.
On my local machine this runs in 2 seconds. That’s quite an improvement. Obviously there are several factors here that will change the performance even further. We are performing a simple insert into an in-memory table which in the real world would not be the case. Also, our numbers table is a million records long. If we made this smaller to fit our purpose, then it could be even quicker still.
Lastly, the numbers table is a fantastic tool for many queries that you might think were only possible with a cursor or a loop. Give it a try and watch how your performance improves.