I recently had a problem with a new source I was asked to load into the data warehouse. I’m sure its quite familiar to many…
A 3rd party report that was scheduled to run, had no obvious way of determining its date.
Its file name had no convention & changed at random.
The number of header rows also changed at random.
One field was a page title which often contained double quotes, but not always an even number.
After much tinkering with the original file trying to write a script that would count the number of header rows and use that value each time to set the property of the connection manager at run time, I realised that I was wasting my time. There seemed to be so many variables that I could never guarantee a successful automated load.
The answer for me was to recreate a loader file of my own that never changed its connection manager properties.
I achieved this by creating a script component task which read each row:-
Blank rows were not written to the new file. (removed the problem with variable header rows).
Header rows were not written to the new file, however they were read so that if they contained useful meta data (report date, report name), this could be stored in variables.
Each line was passed through a small quote cleaning function in an attempt to remove any double quotes that didn’t have a pair.
Finally the now valid rows were written to the new loader file.
Doing this brought with it the obvious advantage of predictability. It now didn’t matter how many header rows, new line characters, quotation marks were in the report. The loader file was always in the format I expected.
This entry was posted
on Thursday, December 14th, 2006 at 5:00 pm and is filed under Nuggets, SSIS, SQL Server 2005.
You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.