Quick Tip: Adding Multiple Existing Packages to an SSIS Project

by Ellen 17. January 2011 17:36

I’m a firm believer in “develop once, copy many.”  I use package templates extensively for my SSIS development projects, and I also reuse packages and redeploy from one project to another for new test or development environments.  I always store my packages in the files system because I find them easier to manage that way.

BIDS gives you the option of adding existing packages to your project, but if you choose the Add | Existing Package option from either the project node or the SSIS Packages node in Solution Explorer, you can only add one package at a time.

Tedious.

To add multiple file system packages at once, right click on the package node and choose the Add | Existing Item option instead:

add_existing_item

BIDS opens a file explore window and you can select multiple .dtsx files and at them to your project in one swell foop:

file_explorer_for_multiple_package_selection

Tags: ,

Data Warehousing | Business Intelligence

Cookie Cutter SSIS, Part 3 – The Audit Database

by Ellen 12. August 2010 23:52

I think it's safe to say that most (if not all) data warehousing experts recommend keeping track of ETL execution results in some way. I started working with custom audit tables back in the DTS days, starting with some code I stole from Professional SQL Server 2000 DTS by Chaffin, Knight and Robinson. Over time, I've morphed the audit database into a critical part of the Cookie Cutter SSIS methodology.

Our standard audit database contains the following schemas:

  • audit
  • metadata
  • etl
  • util
  • and the ever-popular dbo, of course

Our standard audit database contains the following tables:

audit.package_execution ETL execution information at the package grain. We capture the execution GUID to enable linking to the standard SSIS logging table, dbo.sysdtslog90.
audit.job ETL execution information at the table grain. Cookie Cutter SSIS is about table-centric processing. This table will probably cause most DBAs to run, screaming, from the room (I try to protect Serena, our ace database designer, from it as much as I can). It's set up to be as generic as possible - consequently, a number of the columns in any given row will likely be null if they don't apply to the particular table being processed.
audit.execution_event This table stores event-related data. A SQL task with the stored procedure that inserts the event record can be inserted in your package when an event handler fires.
audit.reject_rows This table works with the error database - it stores a daily summary of total error rows for every non-empty table in the error database. Originally we developed this for use in client audit dashboards to give a quick view of potential problem data.
audit.control_parameter Control parameters are the mechanism we use to expose changing, low-cardinality data outside of ETL code. We use the parameters to populate variables inside of the SSIS packages. By setting up a control parameter for a potentially changing data item, we avoid hard-coding values inside of SSIS packages (where they're harder to find) and allow our ETL to be data-driven and dynamic.
metadata.dim_role Data mart dimensions may play multiple roles in a given fact. The most notorious role-player is the date dimension (at one client site, the date dimension is used in over a hundred different ways.) This table is a way to keep track of dimension roles; however, it was originally developed to make IBM Cognos Framework Manager easier to update. Rick, our ace developer, wrote a tool to assist Kathy, our ace report designer, in keeping the Framework Manager model up to date with terminology for both technical and business users (the terminology is captured from data mart object extended properties). This table was created to facilitate renaming all the role-playing dimensions inside the Framework Manager model.
metadata.stage_extended_properties

I took the Kimball University class from Joy Mundy and Warren Thornethwaite based on their book, The Microsoft Data Warehouse Toolkit, back when the book was just published. One of the concepts they presented was using SQL Server extended properties to store metadata about data mart objects. I took this idea and ran with it, and it has been one of the most transformative concepts in all my data warehousing experience. (Thanks, Joy and Warren!) Extended properties will get their own post - this particular table is used in the process of applying extended properties to data mart objects.

…and the following stored procedures:

audit.pr_insert_audit_job Instantiate audit job record (job_key as int data type)
audit.pr_insert_audit_job_bigint Instantiate audit job record (when installation requires job_key of type bigint)
audit.pr_insert_execution_event Add an event handler record
audit.pr_insert_package_execution Instantiate package execution record (package_exec_key as int data type)
audit.pr_insert_package_execution_bigint Instantiate package execution record (when installation requires package_exec_key of type bigint)
audit.pr_update_audit_job Update audit job record (configured for int type job_key)
audit.pr_update_package_execution Update package execution record (configured for int type package_exec_key)
etl.pr_add_table_extended_properties Add extended property metadata to data mart objects (used with SSIS metadata update package)
etl.pr_get_row_count Get row count from a specified table
etl.pr_get_surrogate_key Get the next surrogate key from a specified table
etl.pr_truncate_table Truncate a specified table

I’ve attached a script to create the audit database and all of the above schemas, tables and procedures.  The script assumes an int type for the job_key and the package_exec_key, and you’ll need to amend the code for the database create statements to point to the correct path in your environment:

In the next post in this series, I’ll discuss the other standard databases and some utility functions.

Tags: , ,

Data Warehousing | Business Intelligence

Cookie Cutter SSIS, Part 2 – Standard Databases

by Ellen 30. June 2010 18:50

One of the things that may be different for us here at Perkins Consulting compared to your own work environment is that we're developing data marts for a number of different clients. Consequently, we benefit from standardization - names, objects, code, etc. Please keep that in mind as you read this series - some of the naming conventions we use, for instance (and the reasons we use them), may not fit the your company's model. Feel free to modify!

That's the whole point of cookie cutter SSIS - develop once, copy many.

In this second post of the series, I want to describe the standard databases we deploy for our clients and how they fit into the cookie cutter methodology.

For each new data mart project, we deploy a minimum of the following databases:

database_names

(A note on naming: Because we like to keep all the associated databases listed together, we use naming conventions to group them. The SQL Server database names  here are the current "generic" names that are easily transportable from one installation to the next. In the past, we've also used a prefix to denote the individual client - whatever works to keep them together in SSMS. I should also mention that I always use lower case and separate key words with underscores. This means that I won't have to think too hard when I'm working on a client site with a case-sensitive server on the same day that I'm working on another site that's not case-sensitive.)

The Data Mart database is, I hope, self-explanatory. This is the final resting place of the tables - facts, dimensions, aggregates, bridge tables - that comprise the data mart. The tables in this database will differ from client to client, of course, but will always include at least one date dimension and a table that contains reporting control parameters.

The Audit database contains a set of standard tables and other objects (user-defined functions, stored procedures, etc.) that support the auditing subsystem we use to track and support ETL activity. I'll describe the audit database in more detail in a later post.

The Staging database is volatile. By that I mean that the data in the database is not expected to persist, even though the table structures will. It is a repository for interim data, and yes, sometimes you need that interim storage. Even though SSIS is very good at managing an ETL data stream from source to final target, sometimes the interim step grants you better performance or more flexibility in processing.

The Source database is the landing spot for extracted data. The data in these tables is not expected to persist either. Each table in the Source database matches a data source object somewhere in the client's system, and is truncated and repopulated with each ETL run. With our clients, we try to use the concept of "Get In, Get Out and Get On With It" with respect to their source systems. The scheduled extract of the data is usually constrained by a number of things, including:

  • Status changes during the day that would affect reporting in an unexpected way
  • Source system batch processing that must complete in order to present the data correctly
  • Other system activity that consumes resources or locks source tables

We grab the data out of the source systems and park in in the Source database, after which we can work with it without worrying about the impact on the client's operations.

The Error database is the location for storing redirected data that would otherwise cause an insert process to fail. The tables in this database may be truncated and repopulated with each run (for data that will continue to fail until it's repaired at the source, for instance, and will continue to show up as an error until it's fixed). The data may also be allowed to persist in the Error database and be repaired and reloaded from the error tables (this is necessary for source data that cannot be recaptured in the same state from one ETL run to the next - inventory snapshot data, as an example).

Nearly every client installation requires the maintenance of some supplementary data that is not contained anywhere in the ERP system. (Can anyone say "Excel spreadsheets"?) The Lookup database is where we store persistent supplementary data. The data may be updated in its own SSIS package, but it's essentially reference data that's manually maintained by the client, in order to provide additional reporting richness to data mart tables.

The Archive database may or may not be used at any given client installation, depending on need. For instance, at one client site, we pass data back and forth from the data mart to a third-party service using the third party's defined fixed-width flat files. Troubleshooting issues with the files is much easier when we load the contents into database tables. The Archive database is used for non-data mart data only.

In the next post in this series, I'll go into more detail on the Audit database and its structures.

Tags: , , , ,

Data Warehousing

Cookie Cutter SSIS, Part 1 - “Data Mart in a Day”

by Ellen 29. June 2010 01:27

I've been working with SSIS since just before the RTM of SQL Server 2005. Over time, I've been able to steal tips and tricks from a number of sources (sessions at PASS Summits, notably those presented by Rushabh Mehta and Erik Veerman; Ralph Kimball, Joy Mundy and Warren Thornethwaite of the Kimball Group; SQLIS; Brian Knight and others that I apologize ahead of time for not mentioning). From all these sources, distilled by the practical work we've done with our own clients, I've evolved a standard set of patterns and practices that allow quick, efficient development for moderately sized data marts.cookie_cutters

I call this "Cookie Cutter SSIS."

The most extreme example of the use of these concepts is what I refer to as "Data Mart in a Day." Back in 2007, we had a prospective client who was interested in the Cognos (now IBM Cognos) business intelligence toolset, but wasn't certain how the tools would work for their organization. As a result, Perkins Consulting engaged to do a limited proof-of-concept project for this client, that started with the creation of a modest data mart (four dimensions, three base facts and an aggregate fact all with a relatively small number of rows) against which to deploy the Cognos reporting tools. Because the focus of the project was on the reports, we needed to get the data mart built and populated (along with ongoing maintenance) as quickly as possible.

Okay, okay. So the complete data mart was not literally finished in a day (we did some prep work prior to the onsite development day, added one of the facts and the aggregate after the first reporting pass and completed the data validation phase afterwards.) Nevertheless, this was PDQ - and the POC data mart remained in production with minimal downtime at this client site for at least six months.

I'd like to share some of these shortcuts that I continue to use to quickly deploy data mart objects. I'll say up front that the scope of the series is strictly mechanics - how to use the "cookie cutter" method of standard objects and templates to speed data mart ETL.

We're assuming that the data discovery and design phases have already occurred. We've got our data model; we know our data source options. Now we're ready to create our databases, build our tables and use SSIS to populate our data mart.

This methodology is predicated on several things:

  • A standard set of supporting databases, in addition to the data mart database itself
  • Uniform handling of data mart object metadata using SQL Server extended properties
  • Use of configuration files to enable data-driven dynamic Connection Managers
  • SSIS package templates, pre-configured with standard variables, containers and objects
  • Utility tables, functions and stored procedures
  • Custom SSIS components from community resources that extend SSIS functionality
  • A standard date dimension design and data source (Excel spreadsheet) that can be customized for specific client needs

In the next post, we'll talk about the standard databases that are the first step in the process.

Tags: , , , ,

Data Warehousing

Don’t Forget the Parameter

by Ellen 14. May 2010 17:00

This is just a quick post to remind myself as much as anyone else who’s interested…

I am a firm believer in “Develop Once…Copy Many” since as far as I’m concerned, the fun in development is in the problem-solving, not necessarily in multiple implementations of a solution (no matter how cool it happens to be). 

Over time, I’ve evolved a templated “cookie-cutter” approach to ETL that allows me to copy and reuse SSIS containers for data mart object processing.  Normally, this works quite well, although BIDS has some interesting ideas about how to arrange nested containers and objects, which necessitates some fancy footwork at times. 

One of the instances (other than display) in which this fails is in copying a Data Flow task with an OLE DB Source adapter that includes a parameter in a SQL command.  When copying the the data flow task, the placeholder for the parameter is persisted, but the variable mapping for the parameter is not:

data_source_prameter

So, if you’re seeking the path of least resistance in your SSIS development and choose to copy and re-use Data Flow tasks that include parameters mapped in the SQL statement of an OLE DB Source adapter, don’t forget to remap the variable.

Tags: , ,

Data Warehousing

Managing IBM Cognos Transformer using MDL, Part 2: Model Building Blocks

by Ellen 24. February 2010 19:38

In the first post in this series, http://blogs.perkinsconsulting.com/post/Managing-IBM-Cognos-Transformer-Using-MDL-Part-1.aspx, I described what I consider the optimum configuration for working with IBM Cognos Transformer using the MDL language.  In this post, I’ll talk about one of my favorite uses for MDL – creating a library of reusable model component “building blocks”.

Note:  For the purposes of this post, I’m using IBM Cognos series 7 with delimited text file data sources.   I’m also using the recommended model design protocol - “structural” queries to define the dimensions and “transactional” queries to define the measures and numerical accumulations. 

In our standard data mart development, we use the Kimball model of a constellation of dimensions surrounding a fact – the “star schema”.  Since we always try to deploy conformed dimensions, any given dimension may relate to multiple facts, yet the structure of the dimension and its internal hierarchies remain constant across the reporting environment.  This being the case, it’s really annoying to have to define the same dimensions in every Transformer model you create.  Furthermore, if your company has more than one Transformer designer, it might be possible (even with the best data mart design) for individual designers to configure their Transformer model differently, albeit inadvertently, and consequently introduce inconsistencies across reports.

MDL to the rescue!

Using MDL, you can develop a centrally maintained library of model object definitions which can then be agglomerated into a new model (also using MDL).  This is particularly effective with dimensions.  Measures, since they are dependent on the “transactional” or fact data sources are likely to vary more than dimensions.  This doesn’t mean you can’t make a measure building block – just that it might not be as universal as a dimension.

In order for this methodology to be effective, the Transformer object (we’ll refer to this object as a “dimension” from now on) should have a discrete data source, preferably that describes the single dimension only.  Each building block will contain the dimension and its data source – but nothing else.

As I mentioned in the first post in the series, it’s possible to create your MDL model file completely from scratch by typing commands into a text editor, but I have better things to do with my time.  Instead, I prefer to take an existing model with which I’m satisfied and break it apart into component blocks.

Make sure VerbOutput is set to 1 and ObjectIDOutput is set to 0 in your configuration file, as outlined in the first series post.item_dimension

  1. Open the source model with the Transformer UI.
  2. Save the model as another name (say, for the dimension MDL file you’re creating), being sure to use the .mdl file type.
  3. Remove all the dimensions except the one you’re creating.
  4. Remove all the data sources except the ones that support the dimension you’re creating
  5. Since you’re creating a generic template for the dimension, you don’t want to store any existing category values, so perform a Clean House on the model to remove all category values.
  6. Save the model.
  7. Open the model using a text editor (not Transformer).
  8. Remove the “New Model” node at the top of the file. (Note: Be sure and do this each time you save your template from Transformer, since the New Model node will start a new model definition each time it’s encountered).new_model_highlighted
  9. Save the file again. You now have a dimension template.

 

Even though you don’t want the “NewModel” command in all your dimension templates, in order to put together a new model using MDL, you have to have at least one “NewModel” command in the MDL code.  So, we’re going to make a “New Model” template that’s nothing but a NewModel command node.

  1. Create a new model in Transformer (don’t put anything in it).
  2. Save it as “NewModelTemplate.mdl” or some similar exciting name.

new_model_template

This is your template for new models.

Okay, now for the fun part.  We’re going to put our building blocks together to create a new model.

In addition to the item_dimension and NewModelTemplate, I’ve also got a date_dimension template.  To pull all these together, I’m going to write a small MDL code file.

  1. Open a text editor such as Notepad.
  2. The first line of the code file should reference the new model template, using the MDL command “OpenMDL”.mdl_code_file
  3. On each succeeding line, specify another MDL template to open using OpenMDL. Transformer treats each successive file cumulatively, so it will add any code it finds in each subsequent file to the code it’s already read.
  4. Save the file with the .mdl file extension.
  5. Open the file with the Transformer UI.

created_model

Transformer combines all the code in the template files to create a new model using all the building blocks you specified.  Pretty nifty, eh?

For more information on the MDL language itself, check out the documentation that ships with Transformer.  The Transformer MDL Reference guide is pretty well hidden in versions prior to series 8:

mdl_documentation_path

In series 8, it’s one of the selections on the Transformer Documentation page and consequently a little easier to find, although it’s now called the Transformer Developer Guide.

Tags: , ,

Business Intelligence

Managing IBM Cognos Transformer Using MDL, Part 1: Configuration

by Ellen 22. February 2010 19:40

 

I started working with IBM Cognos Transformer (the tool that builds multi-dimensional cubes) back in the day – the version 6.0 day, as a matter of fact, before the advent of SQL Server Analysis Services.  I believe that at that time, the binary model files had a file extension of .pyg, and the only way I knew to manipulate the model was inside the Transformer UI.

Ah, the innocent days of (relative) youth.

Well, innocent and really annoying, too.  I’ve always had issues with programs that were only accessible one way.  I like knowing about back doors and troubleshooting methods to invoke when the compiled UI lets you down or corrupts your files. 

Then Cognos released an update to Transformer that changed the file extension from .pyg to .pyh and I discovered the .mdl file format.

When you save your model in the .mdl format, it’s stored in MDL, the proprietary text-based modeling language that is the “universal translator” between versions of Transformer. 

In those early days, other than just using the .mdl file format as a conversion function, I also used it to compress model files.  The binary .py? files never release any space, since they store information about changes to the model.  Consequently, they continue to grow forever.  The only way to wrangle them back into a manageable size is to save them as .mdl files.  When you save the .mdl file back to .py? format, all that unseen space is removed and the binary file is restored to a reasonable size.

After a while, I wondered…why bother to go back to a .py? file anyway?  The only thing you really gain using the .py? file is a slight advantage in load time.  You can do anything with the .mdl formatted model that you can with a .py? file.  And I discovered that you could do even more. 

In addition to just being able to open and edit the model file in a text editor (a huge bonus for me, see reference to back doors above), you can actually use the MDL language to alter, manipulate and create all or part of a Transformer model.  Granted, creating an entire model from scratch using MDL would not be my idea of a good time, but luckily this is not an all-or-nothing proposition. 

My two favorite uses for MDL are using it to create model “building blocks” (reusable model components that can be standardized and used across multiple models), and using MDL to automate model maintenance and cube builds. 

In this series of blog posts, I’ll cover both these scenarios, but first, we need to do a little housekeeping. 

There are two types of MDL – Structured MDL and Verb MDL.  Structured MDL was the original version, but it’s more restrictive – you have to pay attention to the order of the commands in the file, for instance.  Verb MDL is easier to work with.  However, the default setting for the MDL version for Transformer out-of-the-box is Structured MDL, so we need to change a couple of settings in trnsfrmr.ini (for pre-series 8 Transformer) or cogtr.xml (for series 8).

The VerbOutput setting toggles between Structured MDL and Verb MDL.  VerbOutput = 0 sets Transformer to Structured MDL mode.  VerbOutput = 1 sets Transformer to Verb MDL mode.  The second setting (ObjectIDOutput) affects the way object values are presented in the MDL text file.  ObjectIDOutput = 1 presents the object identifiers as numeric values, which are not easy to evaluate in a human-readable way.  ObjectIDOutput = 0 presents the object identifiers using their text value, which is much easier on the eye (and brain).

  • Series 7
    • Open …cer[2,3 or 4]/bin/trnsfrmr.ini
    • In the [PowerPlay Transformer] section, set the VerbOutput option to 1
    • In the same section, set the ObjectIDOutput option to 0
  • Series 6.*
    • Open …cer1/bin/trnsfrmr.ini
    • At the end of the [PowerPlay Transformer] section, add a line and type in VerbOutput=1
    • Add another line and type in ObjectIDOutput=0
  • Series 8
    • Open …cognos/c8/configuration/cogtr.xml
    • Add the following node under the <Section Name="Transformer"> node:: <Preference Name="VerbOutput" Value="1"/>
    • Add the following node under the <Section Name="Transformer"> node: <Preference Name="ObjectIDOutput" Value="0" />

The following examples show the same Transformer MDL model category definitions.  First, as Structured MDL (ObjectIDOutput is the default of 1):

structured_mdl

Next, Verb MDL with ObjectIDOutput = 1:

object_id_mdl

Next, Verb MDL with ObjectIDOutput = 0:

verb_object_value_mdl

As you can see, setting the ObjectIDOutput to 0 presents the .mdl file in a much friendlier way.  After you set your VerbOutput and ObjectIDOutput configuration options, open one of your models from the Transformer UI, then use the Save As option to save the model as an .mdl file with the new configuration options applied.  You can then open the .mdl file in your favorite text editor for your viewing pleasure.

Tags: , ,

Business Intelligence

Virtual Applications in Windows 7 Virtual PC

by Ellen 9. February 2010 19:45

Recently, I inadvertently discovered a useful feature of Windows 7 Virtual PC that I'd like to share, just in case there are folks out there who haven't experienced this particular functionality.

On my host laptop (a Windows 7 box), I've installed SQL Server 2008 and Visual Studio 2008, although my previous XP box hosted SQL Server/Visual Studio 2005. I use Visio for Enterprise Architects as my data modeling tool and as I discovered during the move to the new Windows 7 laptop, Visio for Enterprise Architects requires Visual Studio 2005 in order to install properly - it won't play with VS2008.

Okay, fine.

This was annoying, but not catastrophic. Since most of our clients are still on SQL Server 2005, I have a Virtual PC with SQL Server/Visual Studio 2005 installed. I installed Visio on this VPC, grumbling since I knew I'd have to launch this VPC whenever I wanted to do any data modeling.

Or so I thought.

One day, I experienced a momentary lapse of consciousness and double-clicked a Visio (.vsd) file on my host machine. Just as I was cursing myself for doing something so dopey, the Virtual PC credentials login for my SQL Server/VS2005 VPC popped up. I entered my credentials, and the VPC started to launch, except instead of opening the VPC window, it launched Visio as a "virtual application" within the context of the host machine.

How cool is that?

I was able to access files from inside the VPC as well as the host machine from the Visio application menus.  The host machine treated it as a remote connection to the VPC through the local network.

I've poked around with this a little more since the first time it surprised me and have discovered a few more things:

  • My VPC is a member of a domain.  If no network is available (even a network that is not the domain network), the remote application cannot launch.  For instance, I was in an airport recently with no internet/network connectivity and was unable to start the virtual application.
  • For this domain-enabled VPC, it doesn’t seem to matter if Integration Features are enabled or not.  The remote application can still launch.
  • Occasionally, Internet Explorer tries to take over and pretend that it’s the application of choice to host .vsd file.  You have to take a firm stand on this – right click the .vsd file and choose Open With.  On my host laptop, the option to open with the remote Visio program appears in the popup menu (SQLServer2005 is the name of my VPC):

open_with_visio_remote

I haven’t tried this with a VPC that isn’t a member of a domain, so I’m not sure what the behavior for that would be like.  I’ll have to give that a try for the next blog post!

Tags: ,

Using SSIS Package Explorer to Eliminate Undisplayed Package Objects

by Ellen 4. December 2009 20:11

Last night I saw upon the stair

A little man who wasn’t there

He wasn’t there again today

Oh, how I wish he’d go away

--from Antigonish by Hughes Mearns

 

Recently, I’ve been assisting a client in moving their data mart databases and SSIS packages to a new server.  Prior to decommissioning the old server, I was comparing package execution results, and came across an anomaly as exposed by our standard audit infrastructure tables.  In reviewing the package execution table, it appeared that several of the packages were being logged more than once (same package guid and execution guid but different execution keys).

package_exec_listing 

I could tell by reviewing the SQL Agent job history that each of these packages only executed once, and the rows in the package execution table were not completely identical – the columns that are normally updated at the end of the execution were null for one of the rows in each duplcate set.  Hmmmmm…. Obviously, something goofy was happening in the package, related to the instantiation and update of the package execution record.

When I opened the packages in question, I saw that the Control Flow pane included the three normal audit infrastructure sequence containers that are part of all our packages.  However, I noticed something.  Two of the infrastructure sequence containers had the numeral one appended to the container name.

 

exposed_sequence_containers

Usually, this means that the SSIS developer has copied and pasted an object with a name that already exists in the package context.  BIDS appends sequential numerals to objects in these cases, to make sure the object names are unique.  I searched the Control Flow pane, but could find no duplicates for the containers.  Sure, it was possible that the developer who created the package had copied from an open package and hadn’t renamed the objects in his destination package, but I had the evidence of the package execution table screaming at me that the code was firing more than once.  Also, I wasn’t able to remove the trailing numerals from the container names without BIDS throwing an error, so SSIS still believed that the containers existed within the package. 

Invisible code.  Oh, yeah.  We hate that.

I was reduced to observing what was right in front of my nose – the Package Explorer tab.  I’m embarrassed to say that in over five years of working with SSIS, I have never looked at that tab before.  However, the Package Explorer tab does not hold a grudge – even though I’d ignored it all this time, it helped me out anyway.

When I expanded the Executables node, I could see that the two infrastructure containers were indeed duplicated.  By right-clicking on the ones that were not displayed on the Control Flow tab (the items without the trailing numeral), I was able to delete them.  I was then able to rename the visible containers, removing the trailing numeral without error.

 

package_explorer_listing

Tags: , ,

Data Warehousing | Business Intelligence

Technoxenophobia

by Ellen 23. November 2009 18:02

 

“Klaatu barada nikto”

--The Day the Earth Stood Still

 

A few weeks ago, I posted a blog entry about the phenomenon I call the Cheshire Data Mart - a data mart that disappears from the perception of the end user whose only interaction with the data is through a presentation tool.

Today I want to talk about the opposite effect - the inclination of the user to distrust the data mart data in all instances when the data does not tie perfectly to the user's program of choice. I call this "technoxenophobia" - the fear of alien technology.alien_flying_saucer

I'm not talking about the need to validate the data mart loads - that's a necessary and understood process (at least from our perspective). I'm talking about the resistance that business users can experience when asked to work with data or tools that are outside their normal comfort zone.

None of our clients, to my knowledge, has ever had an ERP or OLTP system that perfectly matches their business. They're always forced to do some kind of work-around or to store supplemental data in odd corners (cough***Excel***cough) in order to meet their reporting needs. These "data cubbies" are not usually supported by a tight business process or (which would be even more preferable) enforced by the API of the OLTP system. The more manually-maintained and/or distributed these data cubbies are, the more likely it is one or more maintenance steps could be missed.

When we design and build a data mart, we try to incorporate all of these special cases and additional data, so that the data mart actually does align with the business's reporting expectations and requirements.

Since the data mart data is normally distributed to a broader business community via a business intelligence architecture than functionally specific OLTP applications (point-of-sale systems or accounting applications, for instance), any errors or omissions in the data cubby maintenance are exposed in this larger environment, frequently in a much more rapid life cycle than the users responsible for their manual maintenance expect. Additionally, the extended business community may know nothing about the supplemental data manually maintained by users outside their own sphere.

Result? Any presentation of unexpected data is blamed on the data mart, since it's the only new guy in town.

The data mart is the only place where all these disparate parts are brought together. The results can sometimes be startling for the end user, exposing data usage from different parts of the organization that can be either redundant or conflicting.

In my very earliest days in data mart development, I always accepted these accusations at face value and tried to find the errors in my code. I've learned over the years, however, that the first place to check is any source file that is heavily dependent on human intervention. The gradual exposure to the end users of these points of fragility in their own business systems is, in my opinion, one of the cool things about data mart implementation. The business community is given the opportunity to tighten their own procedures by observing the results of those procedures as defined by their own data.

The true secret to a successful data mart invasion is not conquest but self-knowledge and evolution. Not all aliens are hostile, after all.

Tags:

Data Warehousing | Business Intelligence

Powered by BlogEngine.NET 1.5.0.7
Theme by Perkins Consulting Content Copyright 2009 Perkins Consulting, LLC All rights reserved.