Archive for the 'Datawarehouse' Category
Friday, July 24th, 2009
I was reading a post by BI Monkey and found myself in agreement with what he says. It also got me thinking about the wider implications of the problem of not “helping you make better decisions”.
I’d add to BI Monkey’s question with the specific question I always ask business analysts as they begin to list out the attributes they want added to a dimension in a requirements meeting. “What question does this answer?”. It’s a BI specialist’s responsibility not just to resolve the technical aspects of the requirements, but also to help steer the requirements so that something useful is delivered at the end.
I think this is one of the reasons why support from the business can fade. And without champion users in the wider business, it’s very likely that a BI implementation will be paralysed. People don’t understand the information provided and so don’t use it. Why keep funding the project if nobody uses it? Or the last (over-scoped) project took so long that the team isn’t trusted to take on new projects. Here are some scenarios…

(more…)
Posted in Datawarehouse | 1 Comment »
Tuesday, December 16th, 2008
I don’t know about you, but one of the things that drives me to distraction when writing SSIS packages is being forced to use excel as a data source. I’d like to qualify that statement by adding that it’s the use of excel in the face of all reason, to do the job that other software would be better for. Unfortunately though, corporations and people being what they are, they are a fact of life for an ETL developer.

(more…)
Posted in Datawarehouse, Methodology, Jet 4.0, Excel, VBA | No Comments »
Thursday, December 11th, 2008
Here’s the scenario… You have a RAW file which contains data from many files.
In the subsequent dataflow you need to perform a lookup against a large reference table, however you want just a subset that reflects the period contained within your RAW file.
Question 1, how do you find out the earliest date used within your RAW file data? And question 2, how do you write it to a variable so that you can use it in the subsequent data flow?

(more…)
Posted in Nuggets, SSIS, SQL Server 2005, Datawarehouse, Performance, Methodology, Scripting | No Comments »
Thursday, September 11th, 2008
Many of you will have heard the mantra that loops are bad, set based is good. But how do you get around them?
The fact of the matter is that there are very few circumstances where a loop is the only way to achieve your objective. A look through the SQL forums will show you many a technique to turn iterative row by row processing into single transaction set based processing. I’m refering to hierarchical query techniques (see nested set theory) and of course the subject of this post, the numbers table.

(more…)
Posted in Nuggets, SSIS, SQL Server DBA, SQL Server 2005, Datawarehouse, Performance, Methodology, T-SQL | No Comments »
Wednesday, July 16th, 2008
I talked in a previous post about the possibility of using the format command and bulk insert task inside of a foreach loop in order to load all your tables using a single package… I don’t much like the method as it means you have to run each table load in series, and you’re not taking advantage of SSIS (high speed dataflow task, parallelism and eliminating staging with a single pass transformation).

(more…)
Posted in Nuggets, SSIS, SQL Server 2005, Datawarehouse, Methodology | 5 Comments »
Wednesday, May 14th, 2008
Some of the companies I’ve worked for have had reporting contracts with smaller 3rd party providers. Campaign or website analysis provided by them would have to be loaded to the datawarehouse
Security policy dissallowed downloading the files over FTP, because credentials are passed in clear text, and our 3rd party report providers didn’t have a sFTP facility, so in this scenario what other options are available?

(more…)
Posted in Nuggets, SSIS, SQL Server 2005, Datawarehouse, Methodology, Scripting | 2 Comments »
Monday, May 12th, 2008
ExecutionInstanceGUID is a system variable of type string. If you are exporting this variable to a package management, custom logging or auditing table ensure that you have the correct mapping in your execute SQL task. It is a string variable and not a GUID (the same applies to other variables whose type may not be the same as the value they represent).

(more…)
Posted in Nuggets, SSIS, SQL Server 2005, Datawarehouse, Methodology | No Comments »
Wednesday, February 20th, 2008
The quick answer to this is that you can’t… at least not in a dataflow. Whilst you can dynamically change environments and databases, the column metadata has to remain the same. It’s therefore fairly limited as to what you can load “on the fly”.
There is a work around however for those of you who wish to be able to create a looping mechanism in a single package with a single task to load the data.
Amazon Link: SQL Server 2000 Fast Answers for DBAs and Developers, Signature Edition
(more…)
Posted in Nuggets, SSIS, SQL Server 2005, Datawarehouse, Methodology | 4 Comments »
Thursday, December 27th, 2007
After some prompting, I decided it was about time I posted the solution to my initial post on the subject of metadata driven SSIS.
If you have read part one of this series (or other posts related to replication and sql agent), it will come as no surprise to you to know that I am a big proponent of metadata driven control in SQL Server. Whilst more complex to code, it is infinately easier to manage across environments.
I will add further explanations to this package at a later date, but for now, here is the solution and a very brief summary of what it does.

(more…)
Posted in Nuggets, SSIS, SQL Server 2005, Datawarehouse, Methodology | 2 Comments »
Wednesday, October 10th, 2007
I can’t stress enough the importance of designing the framework for your ETL prior to getting stuck in with individual packages.
I bring this up, because I have seen that look of dread on a developers face (including my own) when they are about to deploy a package to the live environment and aren’t too sure what gremlins are hiding in the live environment that don’t exist in development. Not only this, with the proliferation of SSIS packages across the corporation, it can become extremely difficult to figure out and manage the process flow across all your servers.
It is for this reason, that I am a big fan of metadata controlled ETL. And I’m not just referring to a configuration table set up as part of an SSIS package, I’m talking about modelling a set of tables within a control database that can hold all the data that a package might need to run. I’m also talking about a generic parent package that can use this meta data to control the execution of any packages defined within the tables.

(more…)
Posted in SSIS, SQL Server 2005, Datawarehouse, Methodology | 3 Comments »