Cookie Cutter SSIS, Part 2 – Standard Databases

by Ellen 30. June 2010 18:50

One of the things that may be different for us here at Perkins Consulting compared to your own work environment is that we're developing data marts for a number of different clients. Consequently, we benefit from standardization - names, objects, code, etc. Please keep that in mind as you read this series - some of the naming conventions we use, for instance (and the reasons we use them), may not fit your company's model. Feel free to modify!

That's the whole point of cookie cutter SSIS - develop once, copy many.

In this second post of the series, I want to describe the standard databases we deploy for our clients and how they fit into the cookie cutter methodology.

For each new data mart project, we deploy a minimum of the following databases:

[Image: list of the standard database names]

(A note on naming: Because we like to keep all the associated databases listed together, we use naming conventions to group them. The SQL Server database names here are the current "generic" names that are easily transportable from one installation to the next. In the past, we've also used a prefix to denote the individual client - whatever works to keep them together in SSMS. I should also mention that I always use lower case and separate key words with underscores. This means that I won't have to think too hard when I'm working on a client site with a case-sensitive server on the same day that I'm working on another site that's not case-sensitive.)
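To make the naming convention concrete, here is a minimal Python sketch of a name builder. Everything in it is an assumption for illustration: the `database_name` helper, the `dm` group prefix, the `acme` client prefix and the role list (taken from the descriptions in this post) are hypothetical, and the real names in the screenshot above may differ.

```python
import re

# Hypothetical role names taken from the descriptions in this post;
# the actual database names used on client sites may differ.
STANDARD_ROLES = ["Data Mart", "Audit", "Staging", "Source", "Error", "Lookup"]


def database_name(role, group="dm", client=None):
    """Build a lower-case, underscore-separated database name.

    `group` is an assumed shared prefix so related databases sort together
    in SSMS; `client` is the optional per-client prefix.
    """
    parts = [client, group, role] if client else [group, role]
    words = []
    for part in parts:
        # Split on anything that is not a letter or digit, then lower-case.
        words.extend(w.lower() for w in re.split(r"[^A-Za-z0-9]+", part) if w)
    return "_".join(words)


names = [database_name(role) for role in STANDARD_ROLES]
client_names = [database_name(role, client="Acme") for role in STANDARD_ROLES]
```

Because every name is generated in lower case with the same prefix, the databases sort together in an alphabetical list and the same scripts behave identically on case-sensitive and case-insensitive servers.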

The Data Mart database is, I hope, self-explanatory. This is the final resting place of the tables - facts, dimensions, aggregates, bridge tables - that comprise the data mart. The tables in this database will differ from client to client, of course, but will always include at least one date dimension and a table that contains reporting control parameters.

The Audit database contains a set of standard tables and other objects (user-defined functions, stored procedures, etc.) that support the auditing subsystem we use to track and support ETL activity. I'll describe the audit database in more detail in a later post.

The Staging database is volatile. By that I mean that the data in the database is not expected to persist, even though the table structures will. It is a repository for interim data, and yes, sometimes you need that interim storage. Even though SSIS is very good at managing an ETL data stream from source to final target, sometimes the interim step grants you better performance or more flexibility in processing.

The Source database is the landing spot for extracted data. The data in these tables is not expected to persist either. Each table in the Source database matches a data source object somewhere in the client's system, and is truncated and repopulated with each ETL run. With our clients, we try to use the concept of "Get In, Get Out and Get On With It" with respect to their source systems. The scheduled extract of the data is usually constrained by a number of things, including:

  • Status changes during the day that would affect reporting in an unexpected way
  • Source system batch processing that must complete in order to present the data correctly
  • Other system activity that consumes resources or locks source tables

We grab the data out of the source systems and park it in the Source database, after which we can work with it without worrying about the impact on the client's operations.
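The truncate-and-repopulate pattern for a Source table can be sketched in a few lines. This is only an illustration, not our SSIS implementation: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the `src_customer` table, its columns and the `reload_source_table` helper are all hypothetical.

```python
import sqlite3


def reload_source_table(conn, table, columns, rows):
    """Empty a landing table, then repopulate it from a fresh extract."""
    placeholders = ", ".join("?" for _ in columns)
    column_list = ", ".join(columns)
    with conn:  # one transaction: commit on success, roll back on error
        # SQLite has no TRUNCATE TABLE; DELETE without WHERE stands in for it.
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(
            f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", rows
        )
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (customer_id INTEGER, customer_name TEXT)")

# Two ETL runs: the second extract fully replaces the first.
first_count = reload_source_table(
    conn, "src_customer", ["customer_id", "customer_name"],
    [(1, "Acme"), (2, "Globex"), (3, "Initech")],
)
second_count = reload_source_table(
    conn, "src_customer", ["customer_id", "customer_name"],
    [(1, "Acme"), (4, "Umbrella")],
)
```

After the second run the table holds only the rows from the second extract, which is the point: nothing in the Source database is expected to survive from one ETL run to the next.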

The Error database is the location for storing redirected data that would otherwise cause an insert process to fail. The tables in this database may be truncated and repopulated with each run (for data that will continue to fail until it's repaired at the source, for instance, and will continue to show up as an error until it's fixed). The data may also be allowed to persist in the Error database and be repaired and reloaded from the error tables (this is necessary for source data that cannot be recaptured in the same state from one ETL run to the next - inventory snapshot data, as an example).
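As a rough sketch of the redirect idea (again using Python's built-in `sqlite3` module as a stand-in, not SSIS error outputs), the loop below diverts rows that would break the insert into an error table along with the reason. The `fact_sales` and `err_fact_sales` tables and the `load_with_redirect` helper are hypothetical names for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)
# The error table carries the target columns without constraints, plus a
# column recording why the row was rejected.
conn.execute("CREATE TABLE err_fact_sales (sale_id, amount, error_message TEXT)")


def load_with_redirect(conn, rows, truncate_errors=True):
    """Insert rows one at a time, diverting failures to the error table."""
    if truncate_errors:
        # First option described above: start each run with an empty error table.
        conn.execute("DELETE FROM err_fact_sales")
    loaded = rejected = 0
    for sale_id, amount in rows:
        try:
            conn.execute(
                "INSERT INTO fact_sales (sale_id, amount) VALUES (?, ?)",
                (sale_id, amount),
            )
            loaded += 1
        except sqlite3.IntegrityError as exc:
            # The bad row is parked instead of failing the whole load.
            conn.execute(
                "INSERT INTO err_fact_sales VALUES (?, ?, ?)",
                (sale_id, amount, str(exc)),
            )
            rejected += 1
    conn.commit()
    return loaded, rejected


# One row has a missing amount and one repeats an existing key.
result = load_with_redirect(conn, [(1, 10.0), (2, None), (1, 99.0), (3, 5.5)])
```

Passing `truncate_errors=False` corresponds to the second option, where rejected rows persist in the error table until they are repaired and reloaded.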

Nearly every client installation requires the maintenance of some supplementary data that is not contained anywhere in the ERP system. (Can anyone say "Excel spreadsheets"?) The Lookup database is where we store persistent supplementary data. The data may be updated in its own SSIS package, but it's essentially reference data that's manually maintained by the client, in order to provide additional reporting richness to data mart tables.

The Archive database may or may not be used at any given client installation, depending on need. For instance, at one client site, we pass data back and forth from the data mart to a third-party service using the third party's defined fixed-width flat files. Troubleshooting issues with the files is much easier when we load the contents into database tables. The Archive database is used for non-data mart data only.

In the next post in this series, I'll go into more detail on the Audit database and its structures.


Data Warehousing

Cookie Cutter SSIS, Part 1 - “Data Mart in a Day”

by Ellen 29. June 2010 01:27

I've been working with SSIS since just before the RTM of SQL Server 2005. Over time, I've been able to steal tips and tricks from a number of sources (sessions at PASS Summits, notably those presented by Rushabh Mehta and Erik Veerman; Ralph Kimball, Joy Mundy and Warren Thornthwaite of the Kimball Group; SQLIS; Brian Knight and others that I apologize ahead of time for not mentioning). From all these sources, distilled by the practical work we've done with our own clients, I've evolved a standard set of patterns and practices that allow quick, efficient development for moderately sized data marts.

I call this "Cookie Cutter SSIS."

The most extreme example of the use of these concepts is what I refer to as "Data Mart in a Day." Back in 2007, we had a prospective client who was interested in the Cognos (now IBM Cognos) business intelligence toolset, but wasn't certain how the tools would work for their organization. As a result, Perkins Consulting was engaged to do a limited proof-of-concept project for this client, starting with the creation of a modest data mart (four dimensions, three base facts and an aggregate fact, all with a relatively small number of rows) against which to deploy the Cognos reporting tools. Because the focus of the project was on the reports, we needed to get the data mart built and populated (along with ongoing maintenance) as quickly as possible.

Okay, okay. So the complete data mart was not literally finished in a day (we did some prep work prior to the onsite development day, added one of the facts and the aggregate after the first reporting pass, and completed the data validation phase afterwards). Nevertheless, this was PDQ - and the POC data mart remained in production with minimal downtime at this client site for at least six months.

I'd like to share some of these shortcuts that I continue to use to quickly deploy data mart objects. I'll say up front that the scope of the series is strictly mechanics - how to use the "cookie cutter" method of standard objects and templates to speed data mart ETL.

We're assuming that the data discovery and design phases have already occurred. We've got our data model; we know our data source options. Now we're ready to create our databases, build our tables and use SSIS to populate our data mart.

This methodology is predicated on several things:

  • A standard set of supporting databases, in addition to the data mart database itself
  • Uniform handling of data mart object metadata using SQL Server extended properties
  • Use of configuration files to enable data-driven dynamic Connection Managers
  • SSIS package templates, pre-configured with standard variables, containers and objects
  • Utility tables, functions and stored procedures
  • Custom SSIS components from community resources that extend SSIS functionality
  • A standard date dimension design and data source (Excel spreadsheet) that can be customized for specific client needs
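As one small illustration of the extended-properties item in the list above, the sketch below builds a T-SQL call to SQL Server's `sys.sp_addextendedproperty` procedure for a table. The procedure and its parameters are real; the Python helper `extended_property_sql`, the `dim_date` table and the property value are assumptions for illustration, not the metadata scheme this series actually uses.

```python
def extended_property_sql(table, name, value, schema="dbo"):
    """Return a T-SQL call to sys.sp_addextendedproperty for one table."""

    def quote(text):
        # Double any single quotes so the text is a valid T-SQL string literal.
        return "N'" + text.replace("'", "''") + "'"

    return (
        "EXEC sys.sp_addextendedproperty "
        f"@name = {quote(name)}, @value = {quote(value)}, "
        f"@level0type = N'SCHEMA', @level0name = {quote(schema)}, "
        f"@level1type = N'TABLE', @level1name = {quote(table)};"
    )


# Hypothetical example: describe a date dimension table.
statement = extended_property_sql(
    "dim_date", "MS_Description", "Client's standard date dimension"
)
```

Generating the statements from one helper keeps the property names and layout uniform across tables, which is the point of handling the metadata in a standard way.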

In the next post, we'll talk about the standard databases that are the first step in the process.


Data Warehousing

Technoxenophobia

by Ellen 23. November 2009 18:02

 

“Klaatu barada nikto”

--The Day the Earth Stood Still

 

A few weeks ago, I posted a blog entry about the phenomenon I call the Cheshire Data Mart - a data mart that disappears from the perception of the end user whose only interaction with the data is through a presentation tool.

Today I want to talk about the opposite effect - the inclination of the user to distrust the data mart data in all instances when the data does not tie perfectly to the user's program of choice. I call this "technoxenophobia" - the fear of alien technology.

I'm not talking about the need to validate the data mart loads - that's a necessary and understood process (at least from our perspective). I'm talking about the resistance that business users can experience when asked to work with data or tools that are outside their normal comfort zone.

None of our clients, to my knowledge, has ever had an ERP or OLTP system that perfectly matches their business. They're always forced to do some kind of work-around or to store supplemental data in odd corners (cough***Excel***cough) in order to meet their reporting needs. These "data cubbies" are not usually supported by a tight business process or (which would be even more preferable) enforced by the API of the OLTP system. The more manually-maintained and/or distributed these data cubbies are, the more likely it is one or more maintenance steps could be missed.

When we design and build a data mart, we try to incorporate all of these special cases and additional data, so that the data mart actually does align with the business's reporting expectations and requirements.

Since the data mart data is normally distributed, via a business intelligence architecture, to a broader business community than the one served by functionally specific OLTP applications (point-of-sale systems or accounting applications, for instance), any errors or omissions in the data cubby maintenance are exposed in this larger environment, frequently much sooner than the users responsible for the manual maintenance expect. Additionally, the extended business community may know nothing about the supplemental data manually maintained by users outside their own sphere.

Result? Any presentation of unexpected data is blamed on the data mart, since it's the only new guy in town.

The data mart is the only place where all these disparate parts are brought together. The results can sometimes be startling for the end user, exposing data usage from different parts of the organization that can be either redundant or conflicting.

In my very earliest days in data mart development, I always accepted these accusations at face value and tried to find the errors in my code. I've learned over the years, however, that the first place to check is any source file that is heavily dependent on human intervention. The gradual exposure to the end users of these points of fragility in their own business systems is, in my opinion, one of the cool things about data mart implementation. The business community is given the opportunity to tighten their own procedures by observing the results of those procedures as defined by their own data.

The true secret to a successful data mart invasion is not conquest but self-knowledge and evolution. Not all aliens are hostile, after all.


Data Warehousing | Business Intelligence

The Cheshire Data Mart

by Ellen 14. October 2009 23:14

'Well! I've often seen a cat without a grin,' thought Alice; 'but a grin without a cat! It's the most curious thing I ever saw in all my life!'

--Lewis Carroll, Alice's Adventures in Wonderland

 

Several years ago, I sat in a local Cognos user's group meeting, listening to one of our clients give a presentation about the way she used Cognos Impromptu for her extensive reporting requirements. She gave a good presentation. She discussed limitations of her company's AS400 source system reporting capabilities and how Impromptu expanded her options.

She never once mentioned the data mart I had built - from two separate AS400 ERP systems, Excel and other supplementary data - on which all of the Impromptu reporting was based.

She didn't know it was there.

If she did, she didn't consciously consider the data mart as an entity separate from the presentation tool. Impromptu was her only experience with the data - and the Impromptu catalog insulates the end user from the complexities of its source data. That's its purpose - and the business analyst in charge of maintaining the catalog did an excellent job getting Impromptu to live up to that purpose.

Result? The Cheshire Data Mart - a data mart that was so successful, it disappeared from view entirely.

We had a good chuckle over this at our own expense. However, over time, we've found the same situation arising with other data mart projects. Since our clients tend to be mid-market companies, many of whom have small IT departments, the consumers of the data mart business intelligence data can be several layers removed from the actual implementation of the data mart.

If the only experience a user has with the data mart is as a consumer of output through some application or third-party reporting tool, that user is less likely to think of the data as something discrete. If they can see data in the UI, then the data must actually be in that UI right there. A part of it. Not something that has to be separately considered and maintained.

Unless you're a confirmed data junkie like me, the concept of data in the abstract is just one more yawn-inducing geekoid topic that causes your eyes to glaze over and the fight-or-flight reflex to kick in. (I know from the number of times I've tried - unsuccessfully - to explain my job to my children and friends.) The average business intelligence consumer may not be prepared to think about the data decoupled from its presentation, and perhaps that's okay. But someone within the organization needs to remain cognizant of the Cheshire Data Mart lurking out there so that future decisions about how the data mart should grow and evolve - and how business users can take advantage of the richness of data available in the mart for their use - can be made appropriately.


Business Intelligence | Data Warehousing

Theme by Perkins Consulting Content Copyright 2009 Perkins Consulting, LLC All rights reserved.