The classic system development lifecycle (SDLC) model for requirements gathering places an emphasis on formal interviews with subject matter experts who are in complete and conscious command of their needs. This approach does not work in data warehousing projects. The often incomplete, flawed and mischaracterized requirements gathered via this method are a leading cause of data warehousing projects that fail – usually from neglect – in production, as business analysts discover that the analytical system they expected to meet their every need meets few, or none, of those needs.
This commonly acknowledged fact – combined with increasing pressure on data warehouse project teams to “start small,” and to diligently quantify both project costs and ROI before embarking on large data warehouse or data mart projects – underscores the need for data warehousing practitioners to begin using rapid prototype-and-iterate (P&I) techniques to build functional working prototypes early in the data warehouse lifecycle (DWLC). This will allow them to more effectively manage both the hard (requirements, budget) and soft (user perception, internal politics) aspects of their complicated data warehousing projects.
Prototyping in the Data Warehouse Lifecycle (DWLC) Model
Although specific vocabularies vary from organization to organization, the data warehousing industry is in agreement that the data warehouse lifecycle (DWLC) model is fundamentally as described in the diagram below in Figure 1.
The model, which is a cycle rather than a serialized timeline, and which repeats every 12 to 18 months, consists of five major phases:
Design: the development, from both available data inventories and analyst requirements and analytical needs, of robust data models. Practically speaking, we believe that the best data warehousing practitioners today are working simultaneously from (a) the available data inventory in the organization and (b) the often incomplete expression of needs on the part of multiple heterogeneous analytical communities, making data warehouse and data mart design a complex and intricate process more akin to architecture and diplomacy than to traditional IT systems design. Key activities in this phase typically include end-user interview cycles, source system cataloguing, definition of key performance indicators and other critical business metrics, mapping of decision-making processes underlying information needs, and logical and physical schema design tasks, which feed the prototyping phase of the lifecycle model quite directly.
Prototype: the deployment, for a select group of opinion-makers and leading practitioners in the end-user analytical communities, of a populated, working model of a data warehouse or data mart design, suitable for actual use. The purpose of prototyping shifts, as the design team moves back and forth between design and prototype. In early prototypes, the primary objective is to constrain and in some cases reframe end-user requirements by showing opinion-leaders and heavyweight analysts in the end-user communities precisely what they had asked for in the previous iteration. As the gap between stated needs and actual needs closes over the course of 2 or more design-prototype iterations, the purpose of the prototype shifts toward diplomacy – gaining commitment to the project at hand from opinion leaders in the end-user communities to the design, and soliciting their assistance in gaining similar commitment.
Deploy: the formalization of a user-approved prototype for actual production use, including the development of documentation, training, operations & management (O&M) processes and the host of activities traditionally associated with enterprise IT system deployment. Deployment typically involves at least two separate deployments – the deployment of a prototype into a production-test environment, and the deployment of a stress-tested, performance-tested production configuration into an actual production environment. It is at this phase that the single most often neglected component of a successful production warehouse – documentation – stalls both the transition to operations and management personnel (who cannot manage what they cannot understand) and to end-user organizations (who have to be taught at some level about the data and metadata the warehouse or mart contains, prior to roll-out).
Operation: the day-to-day maintenance of the data warehouse or mart, the data delivery services and client tools that provide analysts with their access to warehouse and mart data, and the management of ongoing extraction, transformation and loading processes (from traditional drop-and-reload bulk updates to cutting edge near-real-time trickle-charging processes) that keep the warehouse or mart current with respect to the authoritative transactional source systems.
Enhancement: the modification of (a) physical technological componentry, (b) operations and management processes (including ETL regimes and scheduling) and (c) logical schema designs in response to changing business requirements. In cases where external business conditions change discontinuously, or organizations themselves undergo discontinuous changes (as in the case of asset sales, mergers and acquisitions), enhancement moves seamlessly back into fundamental design.
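The contrast between the two extraction regimes mentioned under Operation – drop-and-reload bulk updates versus near-real-time trickle-charging – can be sketched in a few lines. The sketch below is illustrative only: it uses SQLite and invented table and column names (`sales_fact`, `txn_id`, `amount`), not any particular warehouse's schema.

```python
import sqlite3

def full_reload(conn, rows):
    # Drop-and-reload bulk update: empty the fact table, then reload everything.
    conn.execute("DELETE FROM sales_fact")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)

def incremental_load(conn, rows, last_loaded_id):
    # Trickle-charge style: append only rows past the last-load watermark.
    new_rows = [r for r in rows if r[0] > last_loaded_id]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", new_rows)
    return max((r[0] for r in new_rows), default=last_loaded_id)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (txn_id INTEGER PRIMARY KEY, amount REAL)")

source = [(1, 10.0), (2, 20.0)]
full_reload(conn, source)            # nightly-style full refresh
source.append((3, 30.0))             # a new transaction arrives in the source system
watermark = incremental_load(conn, source, last_loaded_id=2)
count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

The design choice the sketch illustrates is the watermark: a full reload needs no state between runs, while a trickle-charge regime must persist the high-water mark of the last load so each run touches only new source rows.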
Seen properly, the model is actually a cycle of cycles, as identified in the diagram (Figure 2) below:
Of these cycles, the one most critical to ensuring user commitment, proper scoping, lower project risk and the optimal chance of a successful, widely-used data warehouse or mart, is the design-to-prototype cycle. In the most successful projects, this cycle is executed using rapid prototype-and-iterate (P&I) techniques that present key members of the user community with successively more accurate approximations of the end-state data warehouse or mart, securing their commitment to the project, while allowing data warehousing practitioners to gather data on project scope, project cost and dimensions of project risk.
The emphasis on prototyping in successful data warehousing projects springs from a generally-recognized but little-discussed principle shared by experienced data warehousing practitioners: formal requirements in a data warehousing project, even when formally stated by business users, are generally a waste of everyone’s time.
The reason for this apparently extreme statement is actually quite straightforward, and obvious to anyone who has participated in the early stages of a data warehousing project gone wrong: requirements for a data warehouse are typically formulated in one of two equally unreliable ways.
In either case, the gap between stated requirements and what business analysts actually need to make a measurable impact on corporate performance is significant - so significant that 1 in 3 “failed” data warehousing projects fail not because poor technology decisions are made, or because technology is integrated poorly, but because end-users decline to use the warehouse or mart after it goes into production. The warehouse is therefore irrelevant.
All experienced practitioners are familiar with the signs of a neglected or ignored data warehouse: the low inbound query levels; the relegation of the warehouse to use by a small cadre of “report runners” who extract information from the warehouse or mart and extensively post-process or enrich that data before passing it on to the actual decision-makers, usually as a spreadsheet or PC database file; the increasingly tenuous attempts to reposition what was supposed to be a major piece of end-user-centered infrastructure as an IT convenience for data movement, data quality or “master data management”.
Breaking this cycle is surprisingly simple. All that is required is a two- to six-week prototyping cycle at the start of each warehouse or mart project, in which IT team leads present key members of the end-user constituencies with real, working prototypes of their warehouse or mart, populated with their data, and integrated with their query and reporting tools of choice. In face-to-face working sessions, the entire team reviews the prototype in operation, identifies major and minor design changes, and agrees on priorities for rework. This allows IT team members, often in a matter of hours or days, to reimplement the prototype and re-present it to the business users, confirming their requirements while at the same time building significant confidence in IT’s competency and capability.
Generally, experienced data warehousing prototypers identify six major reasons why prototyping is the key element of a successful data warehousing project:
Data Quality And Data Noise
Today, with the rise of master data management and other corporate data quality disciplines, the mentality that data quality is a problem to be solved as part of a data warehousing project is in decline, and the integration of data quality facilities into ETL processes is viewed - as we believe it should be - with increasing amounts of skepticism.
Nevertheless, during P&I sessions data warehousing practitioners frequently uncover dirty data - often where data is presumed to be neat and clean - and still struggle with the normal, but dangerous, tendency to want to solve data quality problems as part of the data warehousing project.
Experienced data warehouse practitioners report that a few simple rules are usually sufficient to keep data warehousing prototypes from veering into master data management or data governance territory.
Clarifying project objectives with project sponsors is of course critical. In the view of experienced data warehouse practitioners, projects are better used to highlight issues with data quality than they are to solve those issues, and knowing whether project sponsors intend to spend project budget to resolve transactional source system hygiene - as well as knowing whether end-users are prepared to live with ‘noisy data’ while those source system problems are fixed - is critically important to overall project success.
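One lightweight way to highlight data quality issues for project sponsors – without spending project budget solving them – is a simple column profile produced during the P&I cycle. The sketch below is a minimal illustration under that assumption; the `profile_column` helper and the sample `region` extract are invented for the example.

```python
def profile_column(values):
    # Summarize a source column so data-quality problems can be surfaced,
    # not silently fixed, in a prototyping workshop.
    total = len(values)
    nulls = sum(1 for v in values if v in (None, ""))
    distinct = len({v for v in values if v not in (None, "")})
    return {
        "rows": total,
        "null_pct": round(100.0 * nulls / total, 1) if total else 0.0,
        "distinct": distinct,
    }

# Hypothetical extract from a customer source table: note the blanks and
# the inconsistent casing ("WEST" vs "west") a sponsor should see.
region = ["WEST", "EAST", "", None, "WEST", "west"]
report = profile_column(region)
print(report)  # {'rows': 6, 'null_pct': 33.3, 'distinct': 3}
```

A one-page report of such profiles, one per candidate source column, gives sponsors the evidence they need to decide whether to fix the transactional source or live with noisy data.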
Natural Boundaries in Data Warehouse Prototyping
Practitioners familiar with RAD, JAD, agile programming and other development methodologies that emphasize prototype-and-iterate techniques for producing better designs and getting user constituency commitment do not typically need education on why P&I is a good thing to do: they already know that.
What is perhaps not so obvious to experienced prototypers is that prototyping in data warehousing environments is constrained, fundamentally, in a way that prototyping in a transactional application development project is not. Whereas, in a typical transactional application, the prototyping process can produce functionality that covers a very broad area - what begins as a simple general ledger application can transform itself, through P&I, into a comprehensive spend management or supplier orchestration application - data warehouse prototyping is always constrained, first and finally, by available data in the source systems targeted for extraction. You cannot prototype for data elements you cannot actually source.
In some cases, this constraint produces strange behavior on the part of IT professionals and end-users alike – complex dimensional schema, sparsely populated, as teams give in to the temptation of “If we could just get this bit of data, and that bit of data, we’d really be able to do something valuable here.” In practice, it is hard – in the context of a data warehousing or data marting project – to obtain new data elements not already in the company’s transactional repertoire.
Generally, IT practitioners interested in steering a project through to conclusion will emphasize data availability to end-users during the P&I cycle. In fact, producing a prototype based solely on available source data prior to the first face-to-face P&I workshop is useful in part because it makes tangible to end-users just what data is in fact available for their analytical purposes.
If P&I cycles produce a prototype that is mostly unpopulated due to lack of data (rather than due to poor quality data, which is another problem entirely), prudent IT staff will tend to declare the project a failure for data availability reasons, immediately propose a project to capture the data elements that the P&I activities clearly indicate are required to meet end-user analytical requirements, and schedule a restart of the data warehousing project for some time after the conclusion of the data capture project.
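A simple population check can make this "fail fast for data availability" decision objective rather than impressionistic. The sketch below is hypothetical: the helper names and the 50% population threshold are illustrative assumptions, not a standard.

```python
def population_rate(table_rows, column):
    # Fraction of prototype rows for which the source extract supplied a value.
    if not table_rows:
        return 0.0
    filled = sum(1 for row in table_rows if row.get(column) not in (None, ""))
    return filled / len(table_rows)

def data_availability_gaps(table_rows, required_columns, threshold=0.5):
    # Flag columns the end-users require but the sources cannot populate
    # above the agreed threshold -- candidates for a data capture project.
    return [c for c in required_columns
            if population_rate(table_rows, c) < threshold]

# Hypothetical prototype fact rows: SKU arrives from the source; margin does not.
rows = [{"sku": "A1", "margin": None}, {"sku": "B2", "margin": None}]
gaps = data_availability_gaps(rows, ["sku", "margin"])
print(gaps)  # ['margin']
```

The flagged columns become the scope statement for the data capture project that precedes the restarted warehouse effort.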
When Should You Prototype?
Prototype-and-iterate methodologies should, we believe, be incorporated into every data warehousing or data marting project practitioners undertake, whether the project is a new, from-scratch data warehouse or data mart, or an extension of an existing system.
In particular, prototype-and-iterate methodologies can have a material impact on project success in a number of common project situations.
Prototyping in a Normalized Environment
We often hear objections to the use of prototyping in normalized environments, where warehouse schema are normal-form affairs that are often purchased as templates or complete schema from vendors or consultants. Why do I need to prototype, the question is often asked, if my target schema is already complete?
Prototyping is just as important an activity in normalized environments as it is in classical dimensional modeling situations.
First of all, many of the benefits associated with prototyping - including those associated with governance, project scope and end-user commitment - have nothing whatsoever to do with target schema design: they have to do with the benefits of following a prototyping process that involves users in the design of the user experience.
Secondly, the availability of a complete normal-form data warehouse schema as the target for ETL tools does not, in a normalized environment, complete the mapping between source data and end-user analytical needs. Nearly every normalized data warehouse presents a dimensional view of its normalized data to end-users, either directly or through a modeling or mapping layer in the BI tools employed by end-users, and that dimensional “mapping layer” between the normalized warehouse schema and end-users’ analytical environments should be modeled, iteratively and with user participation, just like any other dimensional model.
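As a minimal illustration of such a mapping layer, the sketch below builds a dimensional view over two normalized tables in SQLite; the table and view names (`customer`, `orders`, `sales_by_region`) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
-- The dimensional 'mapping layer': a denormalized view end-user BI tools query,
-- leaving the normalized warehouse tables untouched underneath.
CREATE VIEW sales_by_region AS
    SELECT c.region, SUM(o.amount) AS total
    FROM orders o JOIN customer c ON o.customer_id = c.customer_id
    GROUP BY c.region;
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'WEST'), (2, 'Brill', 'EAST')")
conn.execute("INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 25.0)")
result = dict(conn.execute("SELECT region, total FROM sales_by_region").fetchall())
print(result)  # {'WEST': 150.0, 'EAST': 25.0}
```

Because the view, not the normalized schema, is what analysts see, it is the view that should be iterated with user participation during P&I sessions.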
How Should You Prototype?
Prototypes are not toys - they are functional approximations of working data warehouses, using production-grade technology, that demonstrate the value of the system-under-design, and the benefits of that system, to users in a hands-on, user-experience centered way.
A number of well-established best practices apply to the conduct of prototyping exercises.
The Prototype Trap
DSS analyst Phil Russom has commented on more than one occasion that prototyping has sharp edges. Russom points out that, in practice, a prototype built with technologies other than those destined for production cannot be carried forward: it must be discarded and rebuilt from scratch, often at considerable cost.
This is definitely a risk in traditional ‘prototyping’ projects. WhereScape’s Live Prototyping™ approach obviates this risk almost completely, by driving the P&I process with the same technologies that will be used for the production warehouse or mart. Real ETL, real databases, real client interfaces, real data – such prototyping ensures that everything about the prototype can be made production-ready in a matter of days.
P&I is the time to take calculated risks - to expose users to possible functionality and analytical capabilities, to push the what-if boundary, to try new and intrinsically useful things, to enjoy the process of shaping end-user needs and expectations.
Finally, many of our practitioners have indicated that they think prototyping has what we’ve come to call a “fail fast” value. These practitioners claim that rigorously implementing a P&I discipline in data warehousing projects allows them to determine, within days or weeks, which projects can never come to fruition – because end-user expectations are unreasonable, because required data elements are simply unavailable, because the gap between expectations for functionality and available budget is too wide, and so forth. P&I methods, these practitioners claim, are not only excellent at outlining what is possible, but are equally facile at demonstrating what is not possible, and at getting all constituents to agree either to change scope and expectations, change budget, or abandon the project before significant time and cost have been sunk.
Satisficing And Treaty-Brokering: The Wicked Requirements Problem
Pragmatically speaking, among the greatest benefits of using P&I techniques in data warehousing projects are the satisficing and treaty-brokering effects that such techniques produce – for IT and for end-user organizations alike.
Particularly in situations where project scope is specified without much regard to implementability - situations in which 90% of the project’s objectives can be met with 50% of the project’s total person-effort, and the remaining 10% can only be met with great difficulty or heroic endeavor - prototyping sessions have a remarkable ability to illustrate the presence, in the project scope, of a requirement that cannot be conveniently - or even inconveniently - met. In conventional waterfall-methodology projects, debate over this single wicked requirement tends to become heated - neither side is in possession of much information about how difficult a particular requirement is to implement, or how much value fulfilling that requirement actually adds to the project. The debate becomes, for lack of a better term, rhetorical.
In P&I-driven projects, by contrast, the end-user community quickly sees all the easily obtainable value the project can deliver before the debate over the wicked requirement begins, and is therefore in a better position to evaluate options: including the option of pushing the wicked requirement to a later phase of the project, and the option of abandoning the wicked requirement altogether. Often what are fiercely defended “requirements” in waterfall-style projects become “nice but not necessary” requirements in P&I-driven projects.
In a nutshell, P&I methodologies promote satisficing and agreement, whereas waterfall methodologies tend to promote standoffs.
By bringing wicked requirements to the fore early in the process, P&I methodologies focus data warehouse designers and end users on the pertinent problems early, and promote problem-solving. Jointly, IT staff and end-users decide whether scarce resources should be focused on addressing wicked requirements, or on addressing the bulk of the project’s requirements, which are more soluble.
Similarly, IT staff often become wedded to technology requirements or objectives that are nominally part of a project but are not in fact required to fulfill the analytical requirements of the end-user community. P&I cycles, by building end-user expectations, tend to draw end-users into the project and their advocacy helps IT staff make appropriate technology trade-offs.
Prototype-and-iterate methodologies and tools are the best – and indeed the only known – means of reducing project risk, securing end-user commitment and improving project governance in data warehousing and data marting projects.
Given that even experienced data warehousing practitioners still struggle with complex projects that fail to meet business expectations, prototype-and-iterate methods and tools should become a part of every organization’s data warehouse project methodological toolset.
 Even at this late date, the ETL process still forms the center of IT professionals’ thinking about data warehousing and data marting projects. In the late 1980s and early 1990s, it was indeed the case that much of the data theoretically of interest to user constituencies was extremely difficult to locate, extract and load - there was a reason, at that point in the development of the profession, for practitioners to fixate on ETL. These days, this is the exception, not the norm. Almost any data - from inside or outside the firm - is readily available in merchant DBMSs, or in forms easily loaded into merchant DBMSs, and in general data quality and commensurability (direct comparability) is higher than at any previous time in the discipline’s history. Yet far too many IT organizations begin and end their technology investigations with ETL tool qualification and selection -- as though back-end plumbing could in itself deliver any business value whatsoever. One of our collective challenges, in the next few years, will be decentering ETL in our designs and projects, in order to focus on warehouse design and end-user information delivery as the actual sources of business benefit.
 Often, this second boundary -- what the company’s data does not tell us - occasions the desire on the part of IT staff and end-users alike to bypass dysfunctional transaction processing systems and processes, and source data from elsewhere. When that data is external to the organization -- demographic data or syndicated data from industry consolidators, for example - that instinct is usually sound. But when that data is manufactured, for the warehouse or mart, by bypassing the transactional system that ought to be the authoritative source for the data elements, the project team is effectively enabling dysfunction - allowing a low-quality transactional system, or a broken business process, to persist.
 Recent surveys indicate that about 44% of scoped warehouse and mart projects include time and cost “slop” - buffers, often substantial, built in to deal with the uncharacterized complexities of the project. This can represent a substantial amount of frozen cash and human resources - all of which are freed up for other purposes when P&I methods are used to scope a warehouse or mart project.
About the Author
Marc Demarest is the CEO of Noumenal, Inc., a private consultancy focused on business turn-around and go-to-market services for intellectual property-based companies in technical and commercial markets. Prior to Noumenal, Marc was Chairman and CEO of DecisionPoint Applications, Inc., the industry's leading provider of packaged data warehousing software for financial, human resources, manufacturing and distribution analysis. Marc has also held executive positions with The Sales Consultancy, Inc., a high-technology sales and marketing consulting firm based in Dallas, Texas and the United Kingdom, and Sequent Computer Systems, Inc. in Portland, Oregon, where he led the development of worldwide sales and marketing process development and created an APQC award-winning knowledge management technology environment. Widely known as the father of data marting, Marc publishes and speaks regularly on topics ranging from knowledge management and decision support systems to IT technology futures and information ethics. He holds BA degrees in Political Science and English Literature from Bucknell University in Lewisburg, PA., an MA from the University of South Carolina, and is a graduate of the Stanford Institute's high-technology business management program.
WhereScape is the data warehousing industry’s leading independent supplier of data warehouse lifecycle management tools. WhereScape 24, our rapid prototype-and-iterate (P&I) tool, is available free-of-charge for individual designers from our web site, www.wherescape.com. WhereScape RED, our flagship product, is the only comprehensive Integrated Development Environment for data warehousing that supports the entire data warehouse management life cycle for all major merchant DBMSs - Oracle, SQL Server, DB2 and Teradata - integrating source system exploration, schema design, metadata management, warehouse scheduling and enhancement into a single, simple integrated design facility. WhereScape RED users build, prototype and enhance data warehouses 10-100 times faster than their colleagues who use traditional disconnected toolsets. WhereScape RED’s Live Prototyping™ facilities allow small teams of designers to prototype data warehouses, in dimensional or normalized form, and work with business analysts in real time using design choices populated with real source data. WhereScape RED’s scheduling and data warehouse management tools can transition tested designs into production in a few hours or days, with complete end-user and administrator documentation. And the WhereScape RED Repository™ – our world-class data warehouse metadata repository – provides complete and transparent design and operational metadata, with version control, allowing administrators to enhance in-production warehouses, roll back design changes to previous states, and automatically produce relevant data warehouse documentation. WhereScape maintains sales and support offices in the US, Europe and Asia Pacific.
Demarest, M., "Data Warehouse Prototyping: Reducing Risk, Securing Commitment and Improving Project Governance", DSSResources.COM, 01/25/2008.
Marc Demarest and Michael Whitehead provided permission to archive and feature this article at DSSResources.COM on November 13, 2007. This article was posted at DSSResources.COM on January 25, 2008. Copyright© 2000-2007 by WhereScape Software. All rights reserved. This document may be distributed in its entirety with all elements retained in their original form without permission, but may not be excerpted without the explicit written permission of WhereScape Software. WhereScape® is a registered trademark of WhereScape Software. WhereScape24, WhereScape RED, WhereScape Administrator, RED Repository, Live Prototyping, Live Metadata, Rapid Deployment, Closed Loop Enhancement, Pragmatic Data Warehousing and Pragmatic Data Warehousing Methodology are trademarks of WhereScape USA, Inc. All other trade and service marks are property of their respective holders.