Cookie Cutter SSIS, Part 3 – The Audit Database

by Ellen 12. August 2010 23:52

I think it's safe to say that most (if not all) data warehousing experts recommend keeping track of ETL execution results in some way. I started working with custom audit tables back in the DTS days, starting with some code I stole from Professional SQL Server 2000 DTS by Chaffin, Knight and Robinson. Over time, I've morphed the audit database into a critical part of the Cookie Cutter SSIS methodology.

Our standard audit database contains the following schemas:

  • audit
  • metadata
  • etl
  • util
  • and the ever-popular dbo, of course

Our standard audit database contains the following tables:

audit.package_execution ETL execution information at the package grain. We capture the execution GUID to enable linking to the standard SSIS logging table, dbo.sysdtslog90.
audit.job ETL execution information at the table grain. Cookie Cutter SSIS is about table-centric processing. This table will probably cause most DBAs to run, screaming, from the room (I try to protect Serena, our ace database designer, from it as much as I can). It's set up to be as generic as possible - consequently, a number of the columns in any given row will likely be null if they don't apply to the particular table being processed.
audit.execution_event This table stores event-related data. A SQL task with the stored procedure that inserts the event record can be inserted in your package when an event handler fires.
audit.reject_rows This table works with the error database - it stores a daily summary of total error rows for every non-empty table in the error database. Originally we developed this for use in client audit dashboards to give a quick view of potential problem data.
audit.control_parameter Control parameters are the mechanism we use to expose changing, low-cardinality data outside of ETL code. We use the parameters to populate variables inside of the SSIS packages. By setting up a control parameter for a potentially changing data item, we avoid hard-coding values inside of SSIS packages (where they're harder to find) and allow our ETL to be data-driven and dynamic.
metadata.dim_role Data mart dimensions may play multiple roles in a given fact. The most notorious role-player is the date dimension (at one client site, the date dimension is used in over a hundred different ways.) This table is a way to keep track of dimension roles; however, it was originally developed to make IBM Cognos Framework Manager easier to update. Rick, our ace developer, wrote a tool to assist Kathy, our ace report designer, in keeping the Framework Manager model up to date with terminology for both technical and business users (the terminology is captured from data mart object extended properties). This table was created to facilitate renaming all the role-playing dimensions inside the Framework Manager model.
metadata.stage_extended_properties

I took the Kimball University class from Joy Mundy and Warren Thornethwaite based on their book, The Microsoft Data Warehouse Toolkit, back when the book was just published. One of the concepts they presented was using SQL Server extended properties to store metadata about data mart objects. I took this idea and ran with it, and it has been one of the most transformative concepts in all my data warehousing experience. (Thanks, Joy and Warren!) Extended properties will get their own post - this particular table is used in the process of applying extended properties to data mart objects.

…and the following stored procedures:

audit.pr_insert_audit_job Instantiate audit job record (job_key as int data type)
audit.pr_insert_audit_job_bigint Instantiate audit job record (when installation requires job_key of type bigint)
audit.pr_insert_execution_event Add an event handler record
audit.pr_insert_package_execution Instantiate package execution record (package_exec_key as int data type)
audit.pr_insert_package_execution_bigint Instantiate package execution record (when installation requires package_exec_key of type bigint)
audit.pr_update_audit_job Update audit job record (configured for int type job_key)
audit.pr_update_package_execution Update package execution record (configured for int type package_exec_key)
etl.pr_add_table_extended_properties Add extended property metadata to data mart objects (used with SSIS metadata update package)
etl.pr_get_row_count Get row count from a specified table
etl.pr_get_surrogate_key Get the next surrogate key from a specified table
etl.pr_truncate_table Truncate a specified table

I’ve attached a script to create the audit database and all of the above schemas, tables and procedures.  The script assumes an int type for the job_key and the package_exec_key, and you’ll need to amend the code for the database create statements to point to the correct path in your environment:

In the next post in this series, I’ll discuss the other standard databases and some utility functions.

Tags: , ,

Data Warehousing | Business Intelligence

Managing IBM Cognos Transformer using MDL, Part 2: Model Building Blocks

by Ellen 24. February 2010 19:38

In the first post in this series, http://blogs.perkinsconsulting.com/post/Managing-IBM-Cognos-Transformer-Using-MDL-Part-1.aspx, I described what I consider the optimum configuration for working with IBM Cognos Transformer using the MDL language.  In this post, I’ll talk about one of my favorite uses for MDL – creating a library of reusable model component “building blocks”.

Note:  For the purposes of this post, I’m using IBM Cognos series 7 with delimited text file data sources.   I’m also using the recommended model design protocol - “structural” queries to define the dimensions and “transactional” queries to define the measures and numerical accumulations. 

In our standard data mart development, we use the Kimball model of a constellation of dimensions surrounding a fact – the “star schema”.  Since we always try to deploy conformed dimensions, any given dimension may relate to multiple facts, yet the structure of the dimension and its internal hierarchies remain constant across the reporting environment.  This being the case, it’s really annoying to have to define the same dimensions in every Transformer model you create.  Furthermore, if your company has more than one Transformer designer, it might be possible (even with the best data mart design) for individual designers to configure their Transformer model differently, albeit inadvertently, and consequently introduce inconsistencies across reports.

MDL to the rescue!

Using MDL, you can develop a centrally maintained library of model object definitions which can then be agglomerated into a new model (also using MDL).  This is particularly effective with dimensions.  Measures, since they are dependent on the “transactional” or fact data sources are likely to vary more than dimensions.  This doesn’t mean you can’t make a measure building block – just that it might not be as universal as a dimension.

In order for this methodology to be effective, the Transformer object (we’ll refer to this object as a “dimension” from now on) should have a discrete data source, preferably that describes the single dimension only.  Each building block will contain the dimension and its data source – but nothing else.

As I mentioned in the first post in the series, it’s possible to create your MDL model file completely from scratch by typing commands into a text editor, but I have better things to do with my time.  Instead, I prefer to take an existing model with which I’m satisfied and break it apart into component blocks.

Make sure VerbOutput is set to 1 and ObjectIDOutput is set to 0 in your configuration file, as outlined in the first series post.item_dimension

  1. Open the source model with the Transformer UI.
  2. Save the model as another name (say, for the dimension MDL file you’re creating), being sure to use the .mdl file type.
  3. Remove all the dimensions except the one you’re creating.
  4. Remove all the data sources except the ones that support the dimension you’re creating
  5. Since you’re creating a generic template for the dimension, you don’t want to store any existing category values, so perform a Clean House on the model to remove all category values.
  6. Save the model.
  7. Open the model using a text editor (not Transformer).
  8. Remove the “New Model” node at the top of the file. (Note: Be sure and do this each time you save your template from Transformer, since the New Model node will start a new model definition each time it’s encountered).new_model_highlighted
  9. Save the file again. You now have a dimension template.

 

Even though you don’t want the “NewModel” command in all your dimension templates, in order to put together a new model using MDL, you have to have at least one “NewModel” command in the MDL code.  So, we’re going to make a “New Model” template that’s nothing but a NewModel command node.

  1. Create a new model in Transformer (don’t put anything in it).
  2. Save it as “NewModelTemplate.mdl” or some similar exciting name.

new_model_template

This is your template for new models.

Okay, now for the fun part.  We’re going to put our building blocks together to create a new model.

In addition to the item_dimension and NewModelTemplate, I’ve also got a date_dimension template.  To pull all these together, I’m going to write a small MDL code file.

  1. Open a text editor such as Notepad.
  2. The first line of the code file should reference the new model template, using the MDL command “OpenMDL”.mdl_code_file
  3. On each succeeding line, specify another MDL template to open using OpenMDL. Transformer treats each successive file cumulatively, so it will add any code it finds in each subsequent file to the code it’s already read.
  4. Save the file with the .mdl file extension.
  5. Open the file with the Transformer UI.

created_model

Transformer combines all the code in the template files to create a new model using all the building blocks you specified.  Pretty nifty, eh?

For more information on the MDL language itself, check out the documentation that ships with Transformer.  The Transformer MDL Reference guide is pretty well hidden in versions prior to series 8:

mdl_documentation_path

In series 8, it’s one of the selections on the Transformer Documentation page and consequently a little easier to find, although it’s now called the Transformer Developer Guide.

Tags: , ,

Business Intelligence

Managing IBM Cognos Transformer Using MDL, Part 1: Configuration

by Ellen 22. February 2010 19:40

 

I started working with IBM Cognos Transformer (the tool that builds multi-dimensional cubes) back in the day – the version 6.0 day, as a matter of fact, before the advent of SQL Server Analysis Services.  I believe that at that time, the binary model files had a file extension of .pyg, and the only way I knew to manipulate the model was inside the Transformer UI.

Ah, the innocent days of (relative) youth.

Well, innocent and really annoying, too.  I’ve always had issues with programs that were only accessible one way.  I like knowing about back doors and troubleshooting methods to invoke when the compiled UI lets you down or corrupts your files. 

Then Cognos released an update to Transformer that changed the file extension from .pyg to .pyh and I discovered the .mdl file format.

When you save your model in the .mdl format, it’s stored in MDL, the proprietary text-based modeling language that is the “universal translator” between versions of Transformer. 

In those early days, other than just using the .mdl file format as a conversion function, I also used it to compress model files.  The binary .py? files never release any space, since they store information about changes to the model.  Consequently, they continue to grow forever.  The only way to wrangle them back into a manageable size is to save them as .mdl files.  When you save the .mdl file back to .py? format, all that unseen space is removed and the binary file is restored to a reasonable size.

After a while, I wondered…why bother to go back to a .py? file anyway?  The only thing you really gain using the .py? file is a slight advantage in load time.  You can do anything with the .mdl formatted model that you can with a .py? file.  And I discovered that you could do even more. 

In addition to just being able to open and edit the model file in a text editor (a huge bonus for me, see reference to back doors above), you can actually use the MDL language to alter, manipulate and create all or part of a Transformer model.  Granted, creating an entire model from scratch using MDL would not be my idea of a good time, but luckily this is not an all-or-nothing proposition. 

My two favorite uses for MDL are using it to create model “building blocks” (reusable model components that can be standardized and used across multiple models), and using MDL to automate model maintenance and cube builds. 

In this series of blog posts, I’ll cover both these scenarios, but first, we need to do a little housekeeping. 

There are two types of MDL – Structured MDL and Verb MDL.  Structured MDL was the original version, but it’s more restrictive – you have to pay attention to the order of the commands in the file, for instance.  Verb MDL is easier to work with.  However, the default setting for the MDL version for Transformer out-of-the-box is Structured MDL, so we need to change a couple of settings in trnsfrmr.ini (for pre-series 8 Transformer) or cogtr.xml (for series 8).

The VerbOutput setting toggles between Structured MDL and Verb MDL.  VerbOutput = 0 sets Transformer to Structured MDL mode.  VerbOutput = 1 sets Transformer to Verb MDL mode.  The second setting (ObjectIDOutput) affects the way object values are presented in the MDL text file.  ObjectIDOutput = 1 presents the object identifiers as numeric values, which are not easy to evaluate in a human-readable way.  ObjectIDOutput = 0 presents the object identifiers using their text value, which is much easier on the eye (and brain).

  • Series 7
    • Open …cer[2,3 or 4]/bin/trnsfrmr.ini
    • In the [PowerPlay Transformer] section, set the VerbOutput option to 1
    • In the same section, set the ObjectIDOutput option to 0
  • Series 6.*
    • Open …cer1/bin/trnsfrmr.ini
    • At the end of the [PowerPlay Transformer] section, add a line and type in VerbOutput=1
    • Add another line and type in ObjectIDOutput=0
  • Series 8
    • Open …cognos/c8/configuration/cogtr.xml
    • Add the following node under the <Section Name="Transformer"> node:: <Preference Name="VerbOutput" Value="1"/>
    • Add the following node under the <Section Name="Transformer"> node: <Preference Name="ObjectIDOutput" Value="0" />

The following examples show the same Transformer MDL model category definitions.  First, as Structured MDL (ObjectIDOutput is the default of 1):

structured_mdl

Next, Verb MDL with ObjectIDOutput = 1:

object_id_mdl

Next, Verb MDL with ObjectIDOutput = 0:

verb_object_value_mdl

As you can see, setting the ObjectIDOutput to 0 presents the .mdl file in a much friendlier way.  After you set your VerbOutput and ObjectIDOutput configuration options, open one of your models from the Transformer UI, then use the Save As option to save the model as an .mdl file with the new configuration options applied.  You can then open the .mdl file in your favorite text editor for your viewing pleasure.

Tags: , ,

Business Intelligence

Simplifying date logic in IBM Cognos 8 – Part 1

by Bill 14. December 2009 17:23

A common challenge that we see with all of our many IBM Cognos 8 clients is dealing with date logic. The most common forms for this challenge are around relative dates:

  • How can I filter this report to only show the last x months of data?
  • Can we display only the closed financial periods?
  • I want to display the name of the current month in the header of my report

Unfortunately the most common approach to solving these sorts of issues is to either make the person running the report respond to a prompt to determine the needed filtering values or to put very complex date logic into functions in the report. Making the user provide the information is annoying to the user (why can’t this report just know what the last x months are?) and prone to misinterpretation at runtime (e.g. I thought I was supposed to pick the invoice dates that I wanted – not the ship dates!) Putting complex date logic in the reports is difficult to maintain, poorly performing and runs the risk that each report developer may take a slightly different and inconsistent approach to delivering a solution.

Our recommended approach is to have the Extraction, Transformation and Loading (ETL) process do the heavy lifting of date calculations and then leverage a powerful IBM Cognos feature called Parameter Maps to easily expose those calculations for use in reports.

The ETL process is crucial because it runs on a regular time interval (often on a nightly basis) so it can easily deliver calculations that involve relative dates – dates that are relative to now. We create a SQL table in our data mart that is very simple like this:

CREATE TABLE [dbo].[pcon_ctl_status](
    [status_key] [int] IDENTITY(1,1) NOT NULL,
    [status_parameter_name] [varchar](50) NOT NULL,
    [status_parameter_description] [varchar](100) NULL,
    [status_int_value] [int] NULL,
    [status_char_value] [varchar](100) NULL,
    [status_smalldatetime_value] [smalldatetime] NULL,
    [status_float_value] [float] NULL,
CONSTRAINT [pcon_ctl_status_PK] PRIMARY KEY CLUSTERED
(
    [status_key] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

The idea is that we populate this table with one row for every useful scalar value that we can think of – some examples include:

  • The last time our ETL process was executed (so we know how current our data is)
  • The date 13 months before the current date (if we do a lot of rolling 13 month reporting.)
  • Any sort of calculated time period factors (what percentage of the working days in the current month have already elapsed?)

So the data in the resulting table looks something like this:

ctl_status table

 

Every night the ETL process runs and updates the values in the status_xxx_value columns of this table. These are the four columns to the far right in the screen shot above. The most common way for the ETL process to do this is via a simple SQL script that gets executed at the end of the ETL process.

data manager screenshot 

The screen shot above shows some of the update statements executing from within the IBM Cognos DecisionStream ETL tool but we also commonly do this from within Microsoft SQL Server SSIS jobs.

The next post in this series will explain how we now use this data inside our IBM Cognos 8 reporting environment.

Tags:

Business Intelligence | Data Warehousing

Using SSIS Package Explorer to Eliminate Undisplayed Package Objects

by Ellen 4. December 2009 20:11

Last night I saw upon the stair

A little man who wasn’t there

He wasn’t there again today

Oh, how I wish he’d go away

--from Antigonish by Hughes Mearns

 

Recently, I’ve been assisting a client in moving their data mart databases and SSIS packages to a new server.  Prior to decommissioning the old server, I was comparing package execution results, and came across an anomaly as exposed by our standard audit infrastructure tables.  In reviewing the package execution table, it appeared that several of the packages were being logged more than once (same package guid and execution guid but different execution keys).

package_exec_listing 

I could tell by reviewing the SQL Agent job history that each of these packages only executed once, and the rows in the package execution table were not completely identical – the columns that are normally updated at the end of the execution were null for one of the rows in each duplcate set.  Hmmmmm…. Obviously, something goofy was happening in the package, related to the instantiation and update of the package execution record.

When I opened the packages in question, I saw that the Control Flow pane included the three normal audit infrastructure sequence containers that are part of all our packages.  However, I noticed something.  Two of the infrastructure sequence containers had the numeral one appended to the container name.

 

exposed_sequence_containers

Usually, this means that the SSIS developer has copied and pasted an object with a name that already exists in the package context.  BIDS appends sequential numerals to objects in these cases, to make sure the object names are unique.  I searched the Control Flow pane, but could find no duplicates for the containers.  Sure, it was possible that the developer who created the package had copied from an open package and hadn’t renamed the objects in his destination package, but I had the evidence of the package execution table screaming at me that the code was firing more than once.  Also, I wasn’t able to remove the trailing numerals from the container names without BIDS throwing an error, so SSIS still believed that the containers existed within the package. 

Invisible code.  Oh, yeah.  We hate that.

I was reduced to observing what was right in front of my nose – the Package Explorer tab.  I’m embarrassed to say that in over five years of working with SSIS, I have never looked at that tab before.  However, the Package Explorer tab does not hold a grudge – even though I’d ignored it all this time, it helped me out anyway.

When I expanded the Executables node, I could see that the two infrastructure containers were indeed duplicated.  By right-clicking on the ones that were not displayed on the Control Flow tab (the items without the trailing numeral), I was able to delete them.  I was then able to rename the visible containers, removing the trailing numeral without error.

 

package_explorer_listing

Tags: , ,

Data Warehousing | Business Intelligence

Technoxenophobia

by Ellen 23. November 2009 18:02

 

“Klaatu barada nikto”

--The Day the Earth Stood Still

 

A few weeks ago, I posted a blog entry about the phenomenon I call the Cheshire Data Mart - a data mart that disappears from the perception of the end user whose only interaction with the data is through a presentation tool.

Today I want to talk about the opposite effect - the inclination of the user to distrust the data mart data in all instances when the data does not tie perfectly to the user's program of choice. I call this "technoxenophobia" - the fear of alien technology.alien_flying_saucer

I'm not talking about the need to validate the data mart loads - that's a necessary and understood process (at least from our perspective). I'm talking about the resistance that business users can experience when asked to work with data or tools that are outside their normal comfort zone.

None of our clients, to my knowledge, has ever had an ERP or OLTP system that perfectly matches their business. They're always forced to do some kind of work-around or to store supplemental data in odd corners (cough***Excel***cough) in order to meet their reporting needs. These "data cubbies" are not usually supported by a tight business process or (which would be even more preferable) enforced by the API of the OLTP system. The more manually-maintained and/or distributed these data cubbies are, the more likely it is one or more maintenance steps could be missed.

When we design and build a data mart, we try to incorporate all of these special cases and additional data, so that the data mart actually does align with the business's reporting expectations and requirements.

Since the data mart data is normally distributed to a broader business community via a business intelligence architecture than functionally specific OLTP applications (point-of-sale systems or accounting applications, for instance), any errors or omissions in the data cubby maintenance are exposed in this larger environment, frequently in a much more rapid life cycle than the users responsible for their manual maintenance expect. Additionally, the extended business community may know nothing about the supplemental data manually maintained by users outside their own sphere.

Result? Any presentation of unexpected data is blamed on the data mart, since it's the only new guy in town.

The data mart is the only place where all these disparate parts are brought together. The results can sometimes be startling for the end user, exposing data usage from different parts of the organization that can be either redundant or conflicting.

In my very earliest days in data mart development, I always accepted these accusations at face value and tried to find the errors in my code. I've learned over the years, however, that the first place to check is any source file that is heavily dependent on human intervention. The gradual exposure to the end users of these points of fragility in their own business systems is, in my opinion, one of the cool things about data mart implementation. The business community is given the opportunity to tighten their own procedures by observing the results of those procedures as defined by their own data.

The true secret to a successful data mart invasion is not conquest but self-knowledge and evolution. Not all aliens are hostile, after all.

Tags:

Data Warehousing | Business Intelligence

64-bit SQL Server and iSeries ODBC

by Ellen 12. November 2009 00:31

This week, I started work on a data mart for a client whose source data resides on an AS400. The data access for the source system is handled via the iSeries ODBC driver from IBM. Getting the DSNs set up correctly to work with the client's 64-bit SQL Server installation was a little irritating, so I thought I'd share some lessons learned.

As you may already know, even though the database engine for 64-bit SQL Server is a 64-bit application (with a 64-bit dtexec executable for SSIS packages), the BIDS designer is a 32-bit application. This presents a challenge in package development and deployment when data connections either do not have a 64-bit driver (as in the case of Jet for Excel or Access) or when the 64-bit and 32-bit drivers exist, but require different executable versions (as with ODBC).

In this case, I needed to connect to the AS400 with the iSeries ODBC driver from within SSIS. The client had created the DSN for me and believed he had created a 32-bit DSN (since the driver description within the ODBC Administrator was labeled as 32-bit). However, the DSN did not appear in the DSN selection drop-down in the ODBC connection manager in SSIS.

The problem was that, regardless of what the driver was labeled, the DSN had to be created with the 32-bit ODBC Administrator in order for it to exist as a 32-bit DSN and become accessible to SSIS. Furthermore, in order for the package to run both in debug mode from within the designer (32-bit) and as a scheduled job in SQL Server Agent using the Integration Services step type (64-bit), you have to create DSNs with the identical name in both the 32-bit ODBC Administrator and the 64-bit Administrator.

So, in the Program menu for the iSeries access, you'll see a 64-bit and 32-bit ODBC Administrator:

 

iseries_menu

You need to create identical system DSNs using both Administration modules.  The DSN defined in the 32-bit Administrator will be exposed and utilized from with BIDS; the DSN defined in the 64-bit Administrator will be utilized by SQL Server Agent and 64-bit dtexec.

To add a little more excitement to your day, SSIS has no ODBC source adapter for the Data Flow task, and in SQL Server 2008, the Data Reader source adapter has been retired and replaced with the ADO.NET source adapter.  In order to use the iSeries DSN, you need to create your connection manager as an ADO.NET provider (.Net Providers\Odbc Data Provider) rather than as an ODBC connection manager type.

Tags: , , , ,

Business Intelligence | Data Warehousing

So Not the Drama (or not)

by Ellen 29. October 2009 18:00

 

In the fall of 1983, I was entering my last year of a three-year M.F.A. program in Theater Administration at the Yale School of Drama. Yeah, I know what you're thinking…what does that have to do with data warehouse consulting? Well, consulting as we know it at Pcon involves solving problems. Here are some of the problems presented to students in the Admin program at the time:

  • How do you convince an actor that the fluorescent lights in the rehearsal hall are not sapping his vitamin D?
  • When an actress complains that the water coming out of the faucet in her theater-provided apartment is too hot, how do you suggest she should add some more cold water to the mix without tacking "you idiot" to the end of your sentence?
  • How do you convince the recipient of a MacArthur "genius" grant that taking a cab from Boston to New Haven is perhaps not the best use of funds?

I was the only student with a finance emphasis at the time, so (thankfully) my problems did not involve actors. I was working for the Business Manager of YSD/Yale Repertory Theater. The problems we faced were slightly different:adding_machine_tape

  • How do we reconcile our (manually maintained) list of department purchase orders with the mainframe printouts from the University when the recent University accounting system conversion truncated the final character of the PO number?
  • How do we avoid suffocation under the mountain of adding machine tape necessary to foot and cross-foot our 11 x 24  green ledger pages?
  • What's the best way to hide the enormous calluses on our fingers (caused by prolonged contact with mechanical pencils)?

Enter the miracle.

My boss brought in a Compaq "portable" computer (which would probably be rejected by most modern airlines as exceeding their carry-on size limits). If I recall correctly, it had a single 5-1/4 inch floppy drive. The character-based screen was about the size of the display on one of my dad's oscilloscopes.

I adored it. Compared to mechanical pencils and green ledger sheets? Oh, yeah. It was heaven.

Using version 1.0 of Lotus 1-2-3 and RTFM, I used key-stroke macros to create a set of consolidated financial reports for the school and theater - the first that did not involve an IBM Selectric typewriter and a bucket of Liquid Paper.

This was the first time I used technology to improve a business process. It was a life-changing event. In a way, this is always in the back of my mind whenever I approach a new project. The client's data access/reporting difficulties play the part of the green ledger sheets and mechanical pencils. Our design and implementation of the data mart and business intelligence reporting solutions play the part of the Compaq and key-stroke macros.

Play the part….right.  Maybe data warehouse consulting isn't that far removed from drama school after all.

Tags:

Data Warehousing | Business Intelligence

Using the Trash Destination Adapter for SSIS Housekeeping

by Ellen 23. October 2009 19:23

In an earlier post about the custom Row Number transformation, I mentioned the resources available from Konesans (www.konesans.com) via SQLIS (www.sqlis.com). The Trash Destination adapter is another one of my can't-live-without custom components, although I may use it a little bit differently than is described on its SQLIS page (http://www.sqlis.com/post/Trash-Destination-Adapter.aspx).

The description of the adapter on that page presents it as something you would use only in development and testing, as an option for breaking out the data flow to evaluate performance bottlenecks, for instance, or for creating a path to associate with a Data Viewer. The description even states:

"It is also obvious that this is for development or diagnostic purposes, and is clearly not a part of the functional design of the package."

Actually, I use this component in the majority of my SSIS packages.  In the ETL logic flow, it’s frequently necessary to evaluate data  and determine whether a particular row should continue in pipeline or be discarded.  The Trash Destination is a low-maintenance way to ensure that every row of data in the pipeline has a definite endpoint – even if that endpoint is the garbage.  It’s easily configurable.  Well, no configuration really required.  Can’t get much easier than that.

As an example, here’s a picture of a section of a data flow task that processes a type 1 slowly changing dimension: 

type_1_split_trash_destination

 

The Conditional Split evaluates whether the type 1 attributes have changed based on an upstream Checksum transformation.  If the data has sustained a change, the row is diverted to the type 1 change staging table.  If the row is unchanged, it’s discarded (diverted to the Trash Destination).  This appeals to my sense of closure – all those rows of data in the pipeline have a place to go.  But by inserting the Row Count Plus transformation (another freebie from Konesans/SQLIS), I can count the unchanged rows and store that information in the audit data for the ETL run, so I can squeeze a bit more use out of the Trash Destination branch.  It’s all good.

The Cheshire Data Mart

by Ellen 14. October 2009 23:14

'Well! I've often seen a cat without a grin,' thought Alice; 'but a grin without a cat! It's the most curious thing I ever saw in all my life!'

--Lewis Carroll, Alice's Adventures in Wonderland

 

Several years ago, I sat in a local Cognos user's group meeting, listening to one of our clients give a presentation about the way she used Cognos Impromptu for her extensive reporting requirements. She gave a good presentation. She discussed limitations of her company's AS400 source system reporting capabilities and how Impromptu expanded her options.

She never once mentioned the data mart I had built - from two separate AS400 ERP systems, Excel and other supplementary data - on which all of the Impromptu reporting was based.

She didn't know it was there.

If she did, she didn't consciously consider the data mart as an entity separate from the presentation tool . Impromptu was her only experience with the data - and the Impromptu catalog insulates the end user from the complexities of its source data. That's its purpose - and the business analyst in charge of maintaining the catalog did an excellent job getting Impromptu to live up to that purpose.

cheshire_cateResult? The Cheshire Data Mart - a data mart that was so successful, it disappeared from view entirely.

We had a good chuckle over this at our own expense. However, over time, we've found the same situation arising with other data mart projects. Since our clients tend to be mid-market companies, many of whom have small IT departments, the consumers of the data mart business intelligence data can be several layers removed from the actual implementation of the data mart.

If the only experience a user has with the data mart is as a consumer of output through some application or third-party reporting tool, that user is less likely to think of the data as something discrete. If they can see data in the UI, then the data must actually be in that UI right there. A part of it. Not something that has to be separately considered and maintained.

Unless you're a confirmed data junkie like me, the concept of data in the abstract is just one more yawn-inducing geekoid topic that causes your eyes to glaze over and the fight-or-flight reflex to kick in. (I know from the number of times I've tried - unsuccessfully - to explain my job to my children and friends.) The average business intelligence consumer may not prepared to think about the data decoupled from its presentation, and perhaps that's okay. But someone within the organization needs to remain cognizant of Cheshire Data Mart lurking out there so that future decisions about how the data mart should grow and evolve - and how business users can take advantage of the richness of data available in the mart for their use - can be made appropriately.

Tags:

Business Intelligence | Data Warehousing

Powered by BlogEngine.NET 1.5.0.7
Theme by Perkins Consulting Content Copyright 2009 Perkins Consulting, LLC All rights reserved.