Using the Row Number Transform to Assign Surrogate Keys in the Pipeline

by Ellen 30. September 2009 01:28

I suspect that most of us who have been using SQL Server 2005/2008 SSIS for any length of time know what a great resource SQLIS is (if you don't know, check it out - www.sqlis.com). This site provides downloads of several custom SSIS components that are now integral parts of any Pcon SSIS project.

One of these can't-live-without components is the Row Number transform. This component, like my other must-haves, was developed by Konesans (www.konesans.com).

When I first started using SSIS, I used a Script transform, as described in Donald Farmer's The Rational Guide to Extending SSIS 2005 with Script, to assign surrogate keys to data mart dimensions and facts in the data pipeline. Although I was grateful to have this model at the time, I don't really like to use script components if I can avoid them. I've run into too many compile issues, especially with 64-bit environments. The Row Number transform delivers a much simpler way of assigning surrogate keys (simpler because I don't have to write any code…always a plus for me!).

The Row Number transform…numbers rows. Yeah, sounds obvious, right? The transform UI allows you to assign a seed value, an increment value, a target pipeline column to store the sequential numbers and presents an option to store the final number in a target variable:

row_number_ui 

 

The Number Column can be a pipeline column that already exists (I always create the surrogate_key column in a Derived Column transform), or you can choose to let the Row Number transform create a new column for you. 

In my standard table processing methodology, I retrieve the next surrogate key value for the table prior to launching the data flow task and store it in a variable.  I then use SSIS Expressions to assign this seed value to the Seed property of the Row Number transform.

As with all data flow transforms, you set properties for the Row Number transform at the data flow task grain.  Just click anywhere on the Data Flow design surface to be able to access the properties for the transform:

row_number_properties

Since I haven’t found a way yet to insert an expression without popping open the Property Expression Editor window (by clicking the ellipsis next to Expressions – but you knew that already…), I open the editor and locate the Seed property for the Row Number transform (I’ve named it rnAssignSurrogateKeys in the screen shots):

row_number_expression_zero

There’s an issue with the way the custom transform interacts with the Property Expression Editor that sometimes results in an error when you try to directly assign the Seed property to a variable – regardless of the data type of your variable, the editor chokes and thinks you’re trying to assign a decimal value to a property that only accepts integers.  On the SQLIS page for the Row Number transform, the Troubleshooting section describes this as a known issue, and shows how to manually edit the .dtsx code to fix the issue.  Since I don’t want to mess with the code directly (no coding for me – see above), I prefer to fool the editor.  By assigning a zero to the Seed property as shown, you can successfully exit the editor.  Then, in the Properties pane, you can change the expression value to your variable as shown in the Properties screen shot above.

This only appears to be an issue the first time you try to assign a value to the property.  Once you’ve changed the value to a variable in the Properties pane, you can open and close the editor with no further errors – it will present the variable as the expression value:

row_number_expression_w_variable

Downstream of the Row Number transform, you can then assign the output column to the surrogate key column for your table.  Piece of cake.

Powered by BlogEngine.NET 1.5.0.7
Theme by Perkins Consulting Content Copyright 2009 Perkins Consulting, LLC All rights reserved.