I was reading a post by BI Monkey and found myself agreeing with his point. It also got me thinking about the wider implications of BI that fails to “help you make better decisions”.
I’d add a specific question to BI Monkey’s, one I always ask business analysts as they begin to list out the attributes they want added to a dimension in a requirements meeting: “What question does this answer?” It’s a BI specialist’s responsibility not just to resolve the technical aspects of the requirements, but also to help steer them so that something useful is delivered at the end.
I think this is one of the reasons why support from the business can fade. And without champion users in the wider business, it’s very likely that a BI implementation will be paralysed. People don’t understand the information provided and so don’t use it; why keep funding the project if nobody uses it? Or the last (over-scoped) project took so long that the team isn’t trusted to take on new ones. Here are some scenarios…
Amazon Link: Professional Microsoft SQL Server 2008 Reporting Services (Wrox Programmer to Programmer)
Continue reading “What question does this answer?”
The script task in SSIS was a fantastic improvement on the DTS ActiveX script. The reasons for this, in my view, are twofold:
- Firstly, by allowing the user to code in Visual Studio, it is far easier to develop and debug. Because the developer can make use of IntelliSense and the Object Browser, even the casual programmer can read and write code.
- Secondly, because we can use it to access external .NET assemblies, it provides functionality for logic that can’t be achieved using the standard SSIS components.
Each of these on its own makes it a compelling choice when deciding how to develop your package logic.
However, I have noticed at both client sites and on the forums, that there are those (the script task junkies!) that use the script task in almost any scenario when it is just not necessary. It makes maintenance and support tricky at best and sometimes impossible…
Continue reading “SSIS – The script task junkie!”
Here’s the scenario… You have a RAW file which contains data from many files.
In the subsequent data flow you need to perform a lookup against a large reference table; however, you want just the subset that reflects the period contained within your RAW file.
Question 1: how do you find the earliest date within your RAW file data? And question 2: how do you write it to a variable so that you can use it in the subsequent data flow?
Continue reading “SSIS – Writing to a package variable in a dataflow”
Many of you will have heard the mantra that loops are bad and set-based is good. But how do you get around them?
The fact of the matter is that there are very few circumstances where a loop is the only way to achieve your objective. A look through the SQL forums will show you many a technique for turning iterative, row-by-row processing into single-transaction, set-based processing. I’m referring to hierarchical query techniques (see nested set theory) and, of course, the subject of this post: the numbers table.
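As a small sketch of the idea (the table name and the 10,000-row size are my own choices, not from the post), a numbers table can be populated without any loop at all, and then used to expand a date range into one row per day — a job that would otherwise tempt you into a WHILE loop:

```sql
-- Build a numbers table (1..10000) set-based, using cross-joined
-- system catalogs as a cheap row source.
CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY);

INSERT INTO dbo.Numbers (n)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Example use: generate one row per day in a date range, no loop.
DECLARE @start DATE = '2009-01-01', @end DATE = '2009-01-31';

SELECT DATEADD(DAY, n - 1, @start) AS calendar_date
FROM dbo.Numbers
WHERE n <= DATEDIFF(DAY, @start, @end) + 1;
```

The same shape works for string splitting, gap detection and similar problems: join the numbers table against your data and filter, rather than iterating.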
Continue reading “A practical use of a numbers table.”
I talked in a previous post about the possibility of using the format command and the Bulk Insert task inside a Foreach Loop in order to load all your tables using a single package… I don’t much like that method, as it forces each table load to run in series, and you’re not taking advantage of SSIS’s strengths: the high-speed data flow task, parallelism, and eliminating staging with a single-pass transformation.
Continue reading “Dynamic table loading in SSIS (Part 2)”
A very frequent requirement in a database, especially in an OLTP system, is to capture the identity and values of the last inserted record in a table so that we can use them to populate the foreign key in a child table and any other tables that require it.
It is easy enough to capture the last inserted identity with @@IDENTITY (or, more safely, SCOPE_IDENTITY()) if we are only dealing with one record at a time, but what happens when we need to do batch inserts? Well, the old way of doing things would leave you with the following options:
- Execute a loop (either a cursor or a WHILE statement) and, for each iteration, write a record to each of the child tables
- Select the same data twice: once to insert the audit records and once to update the data
- Rely on a trigger to write to the audit table
Well, none of that should now be necessary: using the OUTPUT clause, we can capture all the data we’ve inserted, updated or deleted into a temporary area for use later on, or, in an auditing scenario, output the data directly to the auditing table. More importantly, we can use that data in a set-based transaction instead of row by row. The advantages here are obvious.
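A minimal sketch of the pattern described above (the table and column names here are illustrative, not from the post): insert a batch of parent rows, capture the generated identities with OUTPUT, then populate the audit table in a single set-based statement.

```sql
DECLARE @NewOrders TABLE (OrderID INT, CustomerID INT);

-- Batch insert into the parent table, capturing each new identity
-- alongside the key we need to join back on.
INSERT INTO dbo.Orders (CustomerID, OrderDate)
OUTPUT inserted.OrderID, inserted.CustomerID
INTO @NewOrders (OrderID, CustomerID)
SELECT CustomerID, GETDATE()
FROM dbo.PendingOrders;

-- Set-based child/audit insert: no cursor, no trigger,
-- and no need to select the source data a second time.
INSERT INTO dbo.OrderAudit (OrderID, AuditDate, Action)
SELECT OrderID, GETDATE(), 'INSERT'
FROM @NewOrders;
```

One caveat worth knowing: a table that is the direct target of OUTPUT … INTO must not have triggers or participate in foreign-key relationships, which is why staging the rows in a table variable, as above, is the more general form.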
Amazon Link: Sams Teach Yourself Microsoft SQL Server T-SQL in 10 Minutes
Continue reading “OUTPUT clause – the SQL developers panacea.”
After some prompting, I decided it was about time I posted the solution to my initial post on the subject of metadata driven SSIS.
If you have read part one of this series (or other posts related to replication and SQL Agent), it will come as no surprise to you that I am a big proponent of metadata-driven control in SQL Server. Whilst more complex to code, it is infinitely easier to manage across environments.
I will add further explanations to this package at a later date, but for now, here is the solution and a very brief summary of what it does.
Amazon Link: Business Metadata: Capturing Enterprise Knowledge
Continue reading “SSIS ETL Control (Part 2)”
I can’t stress enough the importance of designing the framework for your ETL prior to getting stuck in with individual packages.
I bring this up because I have seen that look of dread on a developer’s face (including my own) when about to deploy a package to the live environment, unsure what gremlins are hiding there that don’t exist in development. On top of this, with the proliferation of SSIS packages across the corporation, it can become extremely difficult to understand and manage the process flow across all your servers.
It is for this reason that I am a big fan of metadata-controlled ETL. And I’m not just referring to a configuration table set up as part of an SSIS package; I’m talking about modelling a set of tables within a control database that can hold all the data a package might need to run, and about a generic parent package that can use this metadata to control the execution of any packages defined within those tables.
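As a purely illustrative sketch of the idea — these table and column names are my own invention, not the schema from the actual solution in part two — a minimal control database might start with something like:

```sql
-- One row per package that the generic parent package may execute.
CREATE TABLE ctl.Package (
    PackageID      INT IDENTITY(1,1) PRIMARY KEY,
    PackageName    NVARCHAR(260)  NOT NULL,  -- name as deployed
    PackagePath    NVARCHAR(1000) NOT NULL,  -- file system or MSDB path
    IsEnabled      BIT            NOT NULL DEFAULT (1),
    ExecutionOrder INT            NOT NULL
);

-- Environment-specific values pushed into package variables at run time,
-- so the same package binaries run unchanged in dev, test and live.
CREATE TABLE ctl.PackageParameter (
    PackageID      INT            NOT NULL REFERENCES ctl.Package (PackageID),
    ParameterName  NVARCHAR(128)  NOT NULL,
    ParameterValue NVARCHAR(4000) NULL,
    PRIMARY KEY (PackageID, ParameterName)
);
```

The parent package then simply iterates the enabled rows in ExecutionOrder, applying each package’s parameters before executing it — the point being that promoting between environments becomes a data change, not a package change.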
Amazon Link: The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data
Continue reading “SSIS ETL Control (Part 1)”
From time to time these days I have to put my DBA hat back on and review database schemas designed by those whose focus is application development. What I often find is that attempts to improve performance through indexing have been in vain: the query optimizer either doesn’t use the indexes enough to merit their creation, or they have been created on the wrong columns.
Optimisation is frequently referred to as a dark art, but here I am going to attempt to give some basic pointers to help enlighten the novice database designer…
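As a small, hedged example of the “wrong columns” problem described above (all names here are invented for illustration): an index is far more likely to be used when its leading key matches the query’s equality predicate.

```sql
-- The query the optimizer must satisfy:
-- SELECT OrderID, TotalDue FROM dbo.Orders
-- WHERE CustomerID = @cust AND OrderDate >= @from;

-- Often created in vain: the leading column is the range predicate,
-- not the equality predicate, so a seek on CustomerID is impossible.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);

-- Far more useful: equality column first, range column second,
-- with the remaining output column included to make the index covering.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate)
    INCLUDE (TotalDue);
```

The second index lets the optimizer seek directly to one customer’s rows, scan the date range, and answer the query without touching the base table at all.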
Amazon Link: High Performance SQL Server DBA: Tuning & Optimization Secrets: Tuning and Optimization Secrets (IT In-Focus)
Continue reading “Indexing for Dummies”