I don’t know about you, but one of the things that drives me to distraction when writing SSIS packages is being forced to use excel as a data source. I’d like to qualify that statement by adding that it’s the use of excel in the face of all reason, to do the job that other software would be better for. Unfortunately though, corporations and people being what they are, they are a fact of life for an ETL developer.
So given the prevalence of Excel in your ETL packages, how do you mitigate the risk of it causing your processes to fail. There are various steps you can take – restricting access to the workbooks, inserting dummy rows, protecting cells – however the central problem is still there. Excel doesn’t have any strict datatypes and so users are able to input what they like.
A method I’ve found for reducing these problems is to use my own VBA validation routines. Where my ETL requires certain keywords, country codes, numeric values etc, I have found it helpful to undo the action of the user and give them a message to explain the problem if their input doesn’t meet the criteria the ETL requires. It’s probably quite annoying for them for a while, however, its much less annoying for me as I’m not spending vast amounts of time debugging last nights load.
I’ve attached an example spreadsheet, which you are free to have a look at. (If you are worried about the code, disable the macros first and have a look at it. To see the VBA code key [ALT] + F11).
We have a spreadsheet containing records related to sales people for our (very small & pretend) organisation. On a daily basis this is updated by our HR manager who can add new records; update or delete existing ones; and input their own records on the spreadsheet without them being loaded to the production database.
So in order to ensure that the HR Manager provides our ETL with the data we expect, we have added 3 validation routines which perform the following checks:-
- In the action column we only allow the keywords “ADD”, “UPDATE”, “DELETE”, “LIVE”, “REMOVED” (or it can be left blank).
- We only operate in 4 countries and so we need to ensure that only country codes we recognise in our lookup table are used.
- Finally, once a record is LIVE, the only column we will allow to be changed is the date of birth. This is because we are going to use CountryCode, FirstName & LastName as our key. (Great DB modelling happening at this organisation).
Here is what happens if I try to put “Insert” as the action instead of “ADD”.
Now, if I try to alter one of the columns after its been marked DELETE. (Note that if we want our ETL to delete this record in our table, we would need to ensure that CountryCode, FirstName & LastName remain the same).
Obviously these are very specific business rules, but there’s no reason why you couldn’t adapt the code to your own scenario. Open it up, play with it and see what you think.
I’d be interested to hear from any VBA experts out there who could improve on these routines.
Click to download the spreadsheet.