Archive for the 'Datawarehouse' Category

What question does this answer?

Friday, July 24th, 2009

I was reading a post by BI Monkey and found myself in agreement with what he says. It also got me thinking about the wider implications of the problem of not “helping you make better decisions”.

I’d add to BI Monkey’s question with the specific question I always ask business analysts as they begin to list out the attributes they want added to a dimension in a requirements meeting. “What question does this answer?”. It’s a BI specialist’s responsibility not just to resolve the technical aspects of the requirements, but also to help steer the requirements so that something useful is delivered at the end.

I think this is one of the reasons why support from the business can fade. And without champion users in the wider business, it’s very likely that a BI implementation will be paralysed. People don’t understand the information provided and so don’t use it. Why keep funding the project if nobody uses it? Or the last (over-scoped) project took so long that the team isn’t trusted to take on new projects. Here are some scenarios…

Professional Microsoft SQL Server 2008 Reporting Services (Wrox Programmer to Programmer)

(more…)

Excel VBA to prevent user generated errors in SSIS

Tuesday, December 16th, 2008

I don’t know about you, but one of the things that drives me to distraction when writing SSIS packages is being forced to use excel as a data source. I’d like to qualify that statement by adding that it’s the use of excel in the face of all reason, to do the job that other software would be better for. Unfortunately though, corporations and people being what they are, they are a fact of life for an ETL developer.

Excel 2007 Power Programming with VBA (Mr. Spreadsheets Bookshelf)

(more…)

SSIS - Writing to a package variable in a dataflow

Thursday, December 11th, 2008

Here’s the scenario… You have a RAW file which contains data from many files.

In the subsequent dataflow you need to perform a lookup against a large reference table, however you want just a subset that reflects the period contained within your RAW file.

Question 1, how do you find out the earliest date used within your RAW file data? And question 2, how do you write it to a variable so that you can use it in the subsequent data flow?

Programming Microsoft SQL Server 2008 (PRO-Developer)

(more…)

A practical use of a numbers table.

Thursday, September 11th, 2008

Many of you will have heard the mantra that loops are bad, set based is good. But how do you get around them?

The fact of the matter is that there are very few circumstances where a loop is the only way to achieve your objective. A look through the SQL forums will show you many a technique to turn iterative row by row processing into single transaction set based processing. I’m refering to hierarchical query techniques (see nested set theory) and of course the subject of this post, the numbers table.

Hauppauge Wireless Media MVP - Watch PC based media on your TV set!

(more…)

Dynamic table loading in SSIS (Part 2)

Wednesday, July 16th, 2008

I talked in a previous post about the possibility of using the format command and bulk insert task inside of a foreach loop in order to load all your tables using a single package… I don’t much like the method as it means you have to run each table load in series, and you’re not taking advantage of SSIS (high speed dataflow task, parallelism and eliminating staging with a single pass transformation).

SQL Server Developer Edition 2005 Win32/X64/IA64 English CD/DVD

(more…)

SSIS - download a file from a website

Wednesday, May 14th, 2008

Some of the companies I’ve worked for have had reporting contracts with smaller 3rd party providers. Campaign or website analysis provided by them would have to be loaded to the datawarehouse

Security policy dissallowed downloading the files over FTP, because credentials are passed in clear text, and our 3rd party report providers didn’t have a sFTP facility, so in this scenario what other options are available?

Microsoft Office 2007 SharePoint Designer Edition (PC)

(more…)

SSIS system variable - ExecutionInstanceGUID

Monday, May 12th, 2008

ExecutionInstanceGUID is a system variable of type string. If you are exporting this variable to a package management, custom logging or auditing table ensure that you have the correct mapping in your execute SQL task. It is a string variable and not a GUID (the same applies to other variables whose type may not be the same as the value they represent).

SQL Server 2005 Reporting Services Custom Controls

(more…)

Dynamic table loading in SSIS (Part 1)

Wednesday, February 20th, 2008

The quick answer to this is that you can’t… at least not in a dataflow. Whilst you can dynamically change environments and databases, the column metadata has to remain the same. It’s therefore fairly limited as to what you can load “on the fly”.

There is a work around however for those of you who wish to be able to create a looping mechanism in a single package with a single task to load the data. 

Amazon Link: SQL Server 2000 Fast Answers for DBAs and Developers, Signature Edition

(more…)

SSIS ETL Control (Part 2)

Thursday, December 27th, 2007

After some prompting, I decided it was about time I posted the solution to my initial post on the subject of metadata driven SSIS.

If you have read part one of this series (or other posts related to replication and sql agent), it will come as no surprise to you to know that I am a big proponent of metadata driven control in SQL Server. Whilst more complex to code, it is infinately easier to manage across environments.

I will add further explanations to this package at a later date, but for now, here is the solution and a very brief summary of what it does.

Business Metadata: Capturing Enterprise Knowledge

(more…)

SSIS ETL Control (Part 1)

Wednesday, October 10th, 2007

I can’t stress enough the importance of designing the framework for your ETL prior to getting stuck in with individual packages.

I bring this up, because I have seen that look of dread on a developers face (including my own) when they are about to deploy a package to the live environment and aren’t too sure what gremlins are hiding in the live environment that don’t exist in development. Not only this, with the proliferation of SSIS packages across the corporation, it can become extremely difficult to figure out and manage the process flow across all your servers.

It is for this reason, that I am a big fan of metadata controlled ETL. And I’m not just referring to a configuration table set up as part of an SSIS package, I’m talking about modelling a set of tables within a control database that can hold all the data that a package might need to run. I’m also talking about a generic parent package that can use this meta data to control the execution of any packages defined within the tables. 

The Data Warehouse  ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

(more…)