Cookie Cutter SSIS, Part 3 – The Audit Database

by Ellen 12. August 2010 23:52

I think it's safe to say that most (if not all) data warehousing experts recommend keeping track of ETL execution results in some way. I started working with custom audit tables back in the DTS days, starting with some code I stole from Professional SQL Server 2000 DTS by Chaffin, Knight and Robinson. Over time, I've morphed the audit database into a critical part of the Cookie Cutter SSIS methodology.

Our standard audit database contains the following schemas:

  • audit
  • metadata
  • etl
  • util
  • and the ever-popular dbo, of course

Our standard audit database contains the following tables:

audit.package_execution ETL execution information at the package grain. We capture the execution GUID to enable linking to the standard SSIS logging table, dbo.sysdtslog90.
audit.job ETL execution information at the table grain. Cookie Cutter SSIS is about table-centric processing. This table will probably cause most DBAs to run, screaming, from the room (I try to protect Serena, our ace database designer, from it as much as I can). It's set up to be as generic as possible - consequently, a number of the columns in any given row will likely be null if they don't apply to the particular table being processed.
audit.execution_event This table stores event-related data. A SQL task with the stored procedure that inserts the event record can be inserted in your package when an event handler fires.
audit.reject_rows This table works with the error database - it stores a daily summary of total error rows for every non-empty table in the error database. Originally we developed this for use in client audit dashboards to give a quick view of potential problem data.
audit.control_parameter Control parameters are the mechanism we use to expose changing, low-cardinality data outside of ETL code. We use the parameters to populate variables inside of the SSIS packages. By setting up a control parameter for a potentially changing data item, we avoid hard-coding values inside of SSIS packages (where they're harder to find) and allow our ETL to be data-driven and dynamic.
metadata.dim_role Data mart dimensions may play multiple roles in a given fact. The most notorious role-player is the date dimension (at one client site, the date dimension is used in over a hundred different ways.) This table is a way to keep track of dimension roles; however, it was originally developed to make IBM Cognos Framework Manager easier to update. Rick, our ace developer, wrote a tool to assist Kathy, our ace report designer, in keeping the Framework Manager model up to date with terminology for both technical and business users (the terminology is captured from data mart object extended properties). This table was created to facilitate renaming all the role-playing dimensions inside the Framework Manager model.
metadata.stage_extended_properties

I took the Kimball University class from Joy Mundy and Warren Thornethwaite based on their book, The Microsoft Data Warehouse Toolkit, back when the book was just published. One of the concepts they presented was using SQL Server extended properties to store metadata about data mart objects. I took this idea and ran with it, and it has been one of the most transformative concepts in all my data warehousing experience. (Thanks, Joy and Warren!) Extended properties will get their own post - this particular table is used in the process of applying extended properties to data mart objects.

…and the following stored procedures:

audit.pr_insert_audit_job Instantiate audit job record (job_key as int data type)
audit.pr_insert_audit_job_bigint Instantiate audit job record (when installation requires job_key of type bigint)
audit.pr_insert_execution_event Add an event handler record
audit.pr_insert_package_execution Instantiate package execution record (package_exec_key as int data type)
audit.pr_insert_package_execution_bigint Instantiate package execution record (when installation requires package_exec_key of type bigint)
audit.pr_update_audit_job Update audit job record (configured for int type job_key)
audit.pr_update_package_execution Update package execution record (configured for int type package_exec_key)
etl.pr_add_table_extended_properties Add extended property metadata to data mart objects (used with SSIS metadata update package)
etl.pr_get_row_count Get row count from a specified table
etl.pr_get_surrogate_key Get the next surrogate key from a specified table
etl.pr_truncate_table Truncate a specified table

I’ve attached a script to create the audit database and all of the above schemas, tables and procedures.  The script assumes an int type for the job_key and the package_exec_key, and you’ll need to amend the code for the database create statements to point to the correct path in your environment:

In the next post in this series, I’ll discuss the other standard databases and some utility functions.

Tags: , ,

Data Warehousing | Business Intelligence

64-bit SQL Server and iSeries ODBC

by Ellen 12. November 2009 00:31

This week, I started work on a data mart for a client whose source data resides on an AS400. The data access for the source system is handled via the iSeries ODBC driver from IBM. Getting the DSNs set up correctly to work with the client's 64-bit SQL Server installation was a little irritating, so I thought I'd share some lessons learned.

As you may already know, even though the database engine for 64-bit SQL Server is a 64-bit application (with a 64-bit dtexec executable for SSIS packages), the BIDS designer is a 32-bit application. This presents a challenge in package development and deployment when data connections either do not have a 64-bit driver (as in the case of Jet for Excel or Access) or when the 64-bit and 32-bit drivers exist, but require different executable versions (as with ODBC).

In this case, I needed to connect to the AS400 with the iSeries ODBC driver from within SSIS. The client had created the DSN for me and believed he had created a 32-bit DSN (since the driver description within the ODBC Administrator was labeled as 32-bit). However, the DSN did not appear in the DSN selection drop-down in the ODBC connection manager in SSIS.

The problem was that, regardless of what the driver was labeled, the DSN had to be created with the 32-bit ODBC Administrator in order for it to exist as a 32-bit DSN and become accessible to SSIS. Furthermore, in order for the package to run both in debug mode from within the designer (32-bit) and as a scheduled job in SQL Server Agent using the Integration Services step type (64-bit), you have to create DSNs with the identical name in both the 32-bit ODBC Administrator and the 64-bit Administrator.

So, in the Program menu for the iSeries access, you'll see a 64-bit and 32-bit ODBC Administrator:

 

iseries_menu

You need to create identical system DSNs using both Administration modules.  The DSN defined in the 32-bit Administrator will be exposed and utilized from with BIDS; the DSN defined in the 64-bit Administrator will be utilized by SQL Server Agent and 64-bit dtexec.

To add a little more excitement to your day, SSIS has no ODBC source adapter for the Data Flow task, and in SQL Server 2008, the Data Reader source adapter has been retired and replaced with the ADO.NET source adapter.  In order to use the iSeries DSN, you need to create your connection manager as an ADO.NET provider (.Net Providers\Odbc Data Provider) rather than as an ODBC connection manager type.

Tags: , , , ,

Business Intelligence | Data Warehousing

Using the Trash Destination Adapter for SSIS Housekeeping

by Ellen 23. October 2009 19:23

In an earlier post about the custom Row Number transformation, I mentioned the resources available from Konesans (www.konesans.com) via SQLIS (www.sqlis.com). The Trash Destination adapter is another one of my can't-live-without custom components, although I may use it a little bit differently than is described on its SQLIS page (http://www.sqlis.com/post/Trash-Destination-Adapter.aspx).

The description of the adapter on that page presents it as something you would use only in development and testing, as an option for breaking out the data flow to evaluate performance bottlenecks, for instance, or for creating a path to associate with a Data Viewer. The description even states:

"It is also obvious that this is for development or diagnostic purposes, and is clearly not a part of the functional design of the package."

Actually, I use this component in the majority of my SSIS packages.  In the ETL logic flow, it’s frequently necessary to evaluate data  and determine whether a particular row should continue in pipeline or be discarded.  The Trash Destination is a low-maintenance way to ensure that every row of data in the pipeline has a definite endpoint – even if that endpoint is the garbage.  It’s easily configurable.  Well, no configuration really required.  Can’t get much easier than that.

As an example, here’s a picture of a section of a data flow task that processes a type 1 slowly changing dimension: 

type_1_split_trash_destination

 

The Conditional Split evaluates whether the type 1 attributes have changed based on an upstream Checksum transformation.  If the data has sustained a change, the row is diverted to the type 1 change staging table.  If the row is unchanged, it’s discarded (diverted to the Trash Destination).  This appeals to my sense of closure – all those rows of data in the pipeline have a place to go.  But by inserting the Row Count Plus transformation (another freebie from Konesans/SQLIS), I can count the unchanged rows and store that information in the audit data for the ETL run, so I can squeeze a bit more use out of the Trash Destination branch.  It’s all good.

Using the Row Number Transform to Assign Surrogate Keys in the Pipeline

by Ellen 30. September 2009 01:28

I suspect that most of us who have been using SQL Server 2005/2008 SSIS for any length of time know what a great resource SQLIS is (if you don't know, check it out - www.sqlis.com). This site provides downloads of several custom SSIS components that are now integral parts of any Pcon SSIS project.

One of these can't-live-without components is the Row Number transform. This component, like my other must-haves, was developed by Konesans (www.konesans.com).

When I first started using SSIS, I used a Script transform, as described in Donald Farmer's The Rational Guide to Extending SSIS 2005 with Script, to assign surrogate keys to data mart dimensions and facts in the data pipeline. Although I was grateful to have this model at the time, I don't really like to use script components if I can avoid them. I've run into too many compile issues, especially with 64-bit environments. The Row Number transform delivers a much simpler way of assigning surrogate keys (simpler because I don't have to write any code…always a plus for me!).

The Row Number transform…numbers rows. Yeah, sounds obvious, right? The transform UI allows you to assign a seed value, an increment value, a target pipeline column to store the sequential numbers and presents an option to store the final number in a target variable:

row_number_ui 

 

The Number Column can be a pipeline column that already exists (I always create the surrogate_key column in a Derived Column transform), or you can choose to let the Row Number transform create a new column for you. 

In my standard table processing methodology, I retrieve the next surrogate key value for the table prior to launching the data flow task and store it in a variable.  I then use SSIS Expressions to assign this seed value to the Seed property of the Row Number transform.

As with all data flow transforms, you set properties for the Row Number transform at the data flow task grain.  Just click anywhere on the Data Flow design surface to be able to access the properties for the transform:

row_number_properties

Since I haven’t found a way yet to insert an expression without popping open the Property Expression Editor window (by clicking the ellipsis next to Expressions – but you knew that already…), I open the editor and locate the Seed property for the Row Number transform (I’ve named it rnAssignSurrogateKeys in the screen shots):

row_number_expression_zero

There’s an issue with the way the custom transform interacts with the Property Expression Editor that sometimes results in an error when you try to directly assign the Seed property to a variable – regardless of the data type of your variable, the editor chokes and thinks you’re trying to assign a decimal value to a property that only accepts integers.  On the SQLIS page for the Row Number transform, the Troubleshooting section describes this as a known issue, and shows how to manually edit the .dtsx code to fix the issue.  Since I don’t want to mess with the code directly (no coding for me – see above), I prefer to fool the editor.  By assigning a zero to the Seed property as shown, you can successfully exit the editor.  Then, in the Properties pane, you can change the expression value to your variable as shown in the Properties screen shot above.

This only appears to be an issue the first time you try to assign a value to the property.  Once you’ve changed the value to a variable in the Properties pane, you can open and close the editor with no further errors – it will present the variable as the expression value:

row_number_expression_w_variable

Downstream of the Row Number transform, you can then assign the output column to the surrogate key column for your table.  Piece of cake.

Powered by BlogEngine.NET 1.5.0.7
Theme by Perkins Consulting Content Copyright 2009 Perkins Consulting, LLC All rights reserved.