Cookie Cutter SSIS, Part 3 – The Audit Database

by Ellen 12. August 2010 23:52

I think it's safe to say that most (if not all) data warehousing experts recommend keeping track of ETL execution results in some way. I started working with custom audit tables back in the DTS days, starting with some code I stole from Professional SQL Server 2000 DTS by Chaffin, Knight and Robinson. Over time, I've morphed the audit database into a critical part of the Cookie Cutter SSIS methodology.

Our standard audit database contains the following schemas:

  • audit
  • metadata
  • etl
  • util
  • and the ever-popular dbo, of course

Our standard audit database contains the following tables:

audit.package_execution ETL execution information at the package grain. We capture the execution GUID to enable linking to the standard SSIS logging table, dbo.sysdtslog90.
audit.job ETL execution information at the table grain. Cookie Cutter SSIS is about table-centric processing. This table will probably cause most DBAs to run, screaming, from the room (I try to protect Serena, our ace database designer, from it as much as I can). It's set up to be as generic as possible - consequently, a number of the columns in any given row will likely be null if they don't apply to the particular table being processed.
audit.execution_event This table stores event-related data. A SQL task with the stored procedure that inserts the event record can be inserted in your package when an event handler fires.
audit.reject_rows This table works with the error database - it stores a daily summary of total error rows for every non-empty table in the error database. Originally we developed this for use in client audit dashboards to give a quick view of potential problem data.
audit.control_parameter Control parameters are the mechanism we use to expose changing, low-cardinality data outside of ETL code. We use the parameters to populate variables inside of the SSIS packages. By setting up a control parameter for a potentially changing data item, we avoid hard-coding values inside of SSIS packages (where they're harder to find) and allow our ETL to be data-driven and dynamic.
metadata.dim_role Data mart dimensions may play multiple roles in a given fact. The most notorious role-player is the date dimension (at one client site, the date dimension is used in over a hundred different ways.) This table is a way to keep track of dimension roles; however, it was originally developed to make IBM Cognos Framework Manager easier to update. Rick, our ace developer, wrote a tool to assist Kathy, our ace report designer, in keeping the Framework Manager model up to date with terminology for both technical and business users (the terminology is captured from data mart object extended properties). This table was created to facilitate renaming all the role-playing dimensions inside the Framework Manager model.
metadata.stage_extended_properties

I took the Kimball University class from Joy Mundy and Warren Thornethwaite based on their book, The Microsoft Data Warehouse Toolkit, back when the book was just published. One of the concepts they presented was using SQL Server extended properties to store metadata about data mart objects. I took this idea and ran with it, and it has been one of the most transformative concepts in all my data warehousing experience. (Thanks, Joy and Warren!) Extended properties will get their own post - this particular table is used in the process of applying extended properties to data mart objects.

…and the following stored procedures:

audit.pr_insert_audit_job Instantiate audit job record (job_key as int data type)
audit.pr_insert_audit_job_bigint Instantiate audit job record (when installation requires job_key of type bigint)
audit.pr_insert_execution_event Add an event handler record
audit.pr_insert_package_execution Instantiate package execution record (package_exec_key as int data type)
audit.pr_insert_package_execution_bigint Instantiate package execution record (when installation requires package_exec_key of type bigint)
audit.pr_update_audit_job Update audit job record (configured for int type job_key)
audit.pr_update_package_execution Update package execution record (configured for int type package_exec_key)
etl.pr_add_table_extended_properties Add extended property metadata to data mart objects (used with SSIS metadata update package)
etl.pr_get_row_count Get row count from a specified table
etl.pr_get_surrogate_key Get the next surrogate key from a specified table
etl.pr_truncate_table Truncate a specified table

I’ve attached a script to create the audit database and all of the above schemas, tables and procedures.  The script assumes an int type for the job_key and the package_exec_key, and you’ll need to amend the code for the database create statements to point to the correct path in your environment:

In the next post in this series, I’ll discuss the other standard databases and some utility functions.

Tags: , ,

Data Warehousing | Business Intelligence

Powered by BlogEngine.NET 1.5.0.7
Theme by Perkins Consulting Content Copyright 2009 Perkins Consulting, LLC All rights reserved.