Here’s the scenario… You have a RAW file which contains data from many files.
In the subsequent dataflow you need to perform a lookup against a large reference table, however you want just a subset that reflects the period contained within your RAW file.
Question 1, how do you find out the earliest date used within your RAW file data? And question 2, how do you write it to a variable so that you can use it in the subsequent data flow?
Here is the control flow for the scenario, I have given earlier…
Question 1 – How do you find out the earliest date?
The answer to this, I’m sure many would already be aware of. SSIS provides an aggregate component that gives you the ability to perform certain grouping functions dependant on datatype. A datetime column datatype has 5 functions. COUNT, COUNT DISTINCT, GROUP BY, MINIMUM, and MAXIMUM.
If while we are looping through our flat files, we pass the pipeline through a multicast component, we can create 2 outputs. We send one output to be appended to our RAW file, and the other into the aggregate component. We only require one column (the date column) as we want to aggregate all records in the file. If we select the MINIMUM operation, then a single row will be output containing the earliest date given by this set of records. Here is the dataflow so far…
Question 2 – How do we write our value to a SSIS variable?
So our next task requires two pieces of logic. 1) To decide whether this value is less than the value assigned during previous iterations of the loop; and 2) To assign a new value to it if it is.
The only way to write to a package variable within a dataflow (unless you simply want a rowcount for which you can use the rowcount component) is to use a transformation script component. Lets first look at how we need to configure it.
If we drop a script component on to the canvas, the first choice we are presented with is what type of component we are going to code.
We select the destination component. Now, we drag the pipeline from the aggregate component onto the script component on the canvas. It is now possible for us to start configuring.
Check the single date column.
Then, we need to declare the variable that we will be writing to. In this case dMinDate (highlighted). It is important to declare it here otherwise the access method we use in the script won’t work.
Now for the script itself. Firstly, we declare the variable MyMinDate (Highlighted with blue circle in the image below) outside of the ProcessInputRow() method. This is so that it it scoped in a way that allows us to use it in the PostExecute() method.
The next part of the code we are going to add to the ProcessInputRow() method. This is the code that will run for each record. Here we want to assign the incoming record’s value to MyMinDate. Note that because we know that there will only be one record in the pipeline, no further code here is required. Also note that when you type “row.”, Intellisense provides you with the available methods and properties (highlighted with an orange circle in the image above), one of which is the input column that you checked earlier.
Once we have the value, we need to decide whether this value should be assigned to our SSIS variable. As stated earlier, the logic we require is that we only change the value of “dMinDate” if we have found a new lower value in this iteration of the loop. We represent this logic in the post execute method with a simple IF statement. If our new date (MyMinDate) is less than the date held in the SSIS variable ( Me.Variables.dMinDate ), then assign the value of MyMinDate to it.
So, having populated this variable with the lowest date found in any of the files that we loaded into the RAW file, we can now transform and load it. And our very large reference table can be limited to a time period to speed up the process.