The Politics of Data Warehousing
By Daniel Peterson, Warehouse Architect with Greenbrier & Russel, Inc.
If you want to immerse yourself in corporate politics, enter the high-pressure world of data warehousing. Why? First, a warehousing project can be very expensive, often running into seven figures depending on platform, data volume, and functional requirements. Budgets of this scope are approved only by senior management, often at the vice-president level and above. Second, there are numerous data issues to deal with. By definition, a warehouse is a fully integrated combination of several independent operational systems. Putting the spotlight on data quality issues in an attempt to integrate these systems can cause tempers to flare and e-mails to fly. And remember, millions of dollars and a vice-president's reputation may be riding on the outcome. When it comes to corporate politics, you can't eliminate the problems, but you can alleviate them. Let's look at some practical ways to do just that.

Fully Documented Requirements and Specifications
Cliché as it may sound, documentation discipline is even more crucial to the success of a data warehousing project. Gather detailed requirements before any major development takes place. The centerpiece of the documentation should be the data transformation diagram. This diagram should fully explain every mapping and business rule from source to target, down to the column level. Never trust existing documentation of the source systems; always compare it against actual data samples. Source system documentation will rarely point out data quality problems, even the most obvious ones. That will be your job. Political arguments may still arise, but they will be far less charged when you're 10 percent into development rather than 80 or 90 percent. Once the documentation has been thoroughly reviewed, publish it to the corporate intranet so that it is accessible to all parties involved.

Regularly Scheduled Data Definition Meetings
Thorough documentation is necessary, but any IS professional can attest that, more often than not, it goes unread, at least in any detail. Publishing to the web will increase the chance that your prized documentation, produced through the blood, sweat, and tears of you and your staff, will be read by expert eyes (though not by much). Let's face it: data flow is hardly the most interesting of topics. For this reason, you need to pull in the key players from each source system for regularly scheduled status meetings where key data issues are hashed out. Again, the goal is to surface data quality and integration issues as early as possible, before things get out of hand.

Assign Data Quality Roles
It is crucial to assign the necessary warehousing roles to participants. One individual can take on several roles, but every role must be covered. One such role is the data quality specialist: the person in charge of overall data quality, usually whoever knows the data best. If you are in a short-term consulting position, THIS IS NOT YOU! Usually, this will be someone who has worked with the operational data for quite some time, preferably across the majority of the systems being sourced. In addition, a subject matter expert should be designated for each operational system.

Provide Multiple Views
Present the data the way end users are used to seeing it, as well as the way you'd like them to start seeing it. Suppose you're designing a marketing data mart that contains sales figures by rep, rolled up to sales manager. Analysts are used to seeing the data in the operational system in a strictly historical view. A manager's October sales, for example, would be expected to include exactly the sales of the reps he supervised in October. But suppose that in this particular organization, sales reps are constantly being assigned and reassigned to different managers. For trending purposes, it would be useful to look back in time using a consistent rep-to-manager relationship. In other words, how would a manager's sales look if he had been supervising the same reps for the past year? Both views of the data are valuable. Instead of forcing the issue, provide both and let the users decide. Furthermore, incorporating both views into the same data mart allows side-by-side comparisons in the same report.

Easily Accessible Metadata
Providing accurate metadata signals a policy of complete openness about your system. The system itself should scream, "I've got nothing to hide." Put every transformation performed and every business rule applied at the user's fingertips via the web, so there is no question about what a number represents or how it got there. Training along these lines is also beneficial. In addition, metadata should include up-to-date audit information that ties directly to the audits of each source system. In some cases, when the timing of feeds is a factor, it becomes appropriate to set margin-of-error thresholds on audits and to publish these thresholds as well. Notifying the designated end users by e-mail or pager when these thresholds are exceeded helps ensure that bad data is not used.
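The audit thresholds described above can be illustrated with a minimal sketch. It assumes that each feed's audit reduces to a single control total, such as total sales dollars, which is compared between the source system and the warehouse. The class `FeedAudit`, the function `reconcile`, the feed names, the totals, and the tolerance percentages are all hypothetical. The `notify` parameter stands in for whatever e-mail or paging mechanism a site actually uses; in this example it simply collects messages in a list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FeedAudit:
    """Control totals for one feed, compared between source system and warehouse (hypothetical model)."""
    feed: str               # hypothetical feed name
    source_total: float     # control total reported by the source system's audit
    warehouse_total: float  # same control total computed from the warehouse load
    tolerance_pct: float    # published margin-of-error threshold, in percent

    @property
    def variance_pct(self) -> float:
        """Absolute difference as a percentage of the source total."""
        if self.source_total == 0:
            # Any nonzero warehouse total against an empty source is an unbounded variance.
            return 0.0 if self.warehouse_total == 0 else float("inf")
        return abs(self.warehouse_total - self.source_total) * 100 / abs(self.source_total)

    @property
    def threshold_exceeded(self) -> bool:
        return self.variance_pct > self.tolerance_pct


def reconcile(audits: List[FeedAudit], notify: Callable[[str], None]) -> List[FeedAudit]:
    """Return the audits whose threshold was exceeded, notifying once per breach."""
    breaches = [a for a in audits if a.threshold_exceeded]
    for a in breaches:
        notify(
            f"Audit threshold exceeded for feed '{a.feed}': "
            f"variance {a.variance_pct:.2f}% vs. limit {a.tolerance_pct:.2f}%"
        )
    return breaches


# Example run with made-up numbers; messages are collected instead of e-mailed or paged.
audits = [
    FeedAudit("daily_sales", source_total=1_000_000.0, warehouse_total=998_500.0, tolerance_pct=0.5),
    FeedAudit("returns", source_total=20_000.0, warehouse_total=19_000.0, tolerance_pct=2.0),
]
messages: List[str] = []
breaches = reconcile(audits, messages.append)

for m in messages:
    print(m)
```

With these numbers, only `returns` triggers an alert, with a 5.00% variance against a 2.00% limit. The `daily_sales` feed stays within its 0.5% tolerance at a 0.15% variance. Passing the notifier in as a function keeps the threshold logic separate from the delivery channel. It also lets the same audit results be published alongside the rest of the metadata.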
The steps to becoming an effective mediator of corporate political struggles can be summarized in two points. First, provide a completely open system, fully documenting and explaining as you develop and as you deliver. Second, be flexible in the way you present the information, and let users determine what is valuable to them and what is not. In doing so, you will meet resistance to change with gentle nudging rather than forceful prodding. These methods will certainly not eliminate problems, but they will help remove some of the sting along the way.
Daniel Peterson is a warehouse architect with Greenbrier & Russel, Inc. He specializes in large-scale data warehouse implementations and can be reached at firstname.lastname@example.org. This article originally appeared in the Greenbrier and Russel Observer Newsletter, June 1999, www.gr.com/new/politics.asp.
Heather Swanson, Communications Coordinator, Greenbrier and Russel, Inc., provided permission to use this article at DSSResources.COM on Thursday, December 20, 2001. For more information check http://www.gr.com/. Founded in 1984, Greenbrier and Russel is a consulting and training company. This article was posted at DSSResources.COM on December 20, 2001.