Ten mistakes to avoid if your data warehouse is to deliver quality information
By Larry P. English
As the era of "second generation" data warehousing arrives, organizations should be exploiting their information to provide effective business intelligence and to increase customer satisfaction through effective customer relationship management (CRM). But, alas, for many organizations "second generation" means scrapping their first data warehouse and starting over.
The tragedy is that such failures are not the exception-they are the norm! According to some data warehouse specialists, only eight percent (8%) of all data warehouses are successful after three years of operation, as measured by growth in the number of customers of the data warehouse and in the number of new information types being propagated to support additional decision-support requirements.
The definition of a wise person is “someone who learns from the mistakes of others.” How are organizations learning from their own data warehousing mistakes, and what shall they learn from the mistakes of others?
This white paper is dedicated to those wise persons who can learn from others how to avoid the ten mistakes that, if made, virtually guarantee a failed-or, at best, seriously sub-optimized-data warehouse. Here is how to avoid making the same costly mistakes.
1. Failing to understand the purpose of data warehousing and the types of business problems to be solved with data warehouse information
The first mistake is failure to understand the problems that the organization must solve with the data warehouse. Here we use an encompassing definition of data warehouse to include operational data stores (ODS), aggregated data warehouses (DW) and data marts. (See the Glossary at the end of the article.)
Many organizations have sub-optimized their data marts by taking "departmental" or "functional" collections of data and aggregating them into "departmental" or "functional" data marts. This creates two problems: 1) it merely replicates any design problems that existed in the source databases or files; and 2) it usually cannot be extended, without considerable cost, to "integrate" other departmental or functional information into an "enterprise" view of operational health.
A variation of this mistake is to design the data warehouse to solve the problem of producing predominantly operational reports or queries that should be produced from operational data, but cannot be for a variety of reasons. Those reasons include poor operational database design, lack of easy access to the operational database, and poor-quality data in the source databases, among others. The objective of redesigning a database to meet these operational needs is worthy. The use of a data warehouse to solve these problems (as a permanent solution) is highly problematic. All this does is: 1) make the nonquality operational databases permanent; 2) create the information float of extracting, transforming and loading the data into the data warehouse or ODS (operational data store); and 3) fail to address and solve the problem at its root cause in the operational processes and databases. The correct strategy is to develop an enterprise-oriented, architected "Operational Database" (ODB)-not just an ODS-to which operational systems can be migrated on an as-needed basis.
For a data warehouse to support strategic processes, you must design the data structure to:
2. Failing to understand who the real "customers" of the data warehouse are
Once you know the purpose of a data warehouse, you must understand its customers. One early mistake in data warehousing was the "build it and they will come" syndrome, without considering who "they" were. Without knowing the knowledge workers, or information customers, who need information to make their decisions, you cannot know the customer expectations. Without knowing customer expectations, it is difficult-if not impossible-to design a data warehouse to meet them. Another mistake here was to build a data warehouse or data mart to support only one narrow set of customers, usually from one functional area. Designing the data warehouse around one small set of functionally oriented information views virtually dooms the data warehouse to an inflexible design. That design will be even more costly to change, because all of the new ECTL (Extract, Correct, Transform and Load) applications must then be modified as well.
The data warehouse must be implemented incrementally, but not myopically. In order to build a successful data warehouse you must:
3. Assuming the source data is "OK" because the operational systems seem to work just fine
An insurance company was unpleasantly surprised by its data warehouse analysis of newly loaded claims data needed to analyze risk. The data showed that a whopping 80% of its claims were paid for a diagnosis of "broken leg." One quick call revealed an all-too-common occurrence: the claims processors routinely-80% of the time-allowed the system to default to a diagnosis code of "broken leg." Why? It was faster than entering a code. Why did they need speed rather than a correct code? They were paid by how many claims a day they processed. This is the root cause of many information quality problems.
The valid-but inaccurate-data value of "broken leg" was of sufficient quality to pay a claim. But was the same data "quality" data for the process "analyze risk"? Absolutely not.
Message: operational data quality will be of sufficient, or "minimum," quality to support the operational processes within the performance measures of the information producers, but not necessarily of acceptable quality for other processes. This mistake dramatically illustrates the absolute denial most organizations exhibit as to the extent of their information quality problems-the assumption that because the operational processes seem to perform properly, the same data must be good enough to satisfy decision-support processes.
Because most of today's applications and databases were designed to support only departmental, functional needs, those databases may contain "quality" data for those operational processes only. There are many root causes, including, but not limited to:
Because of this, you must analyze the state of quality of the source databases or files. Your information quality assessment should include:
While the last of these assessments-physical accuracy-is the most difficult, it should be performed for all A-priority and most B-priority data, i.e., the most important data, even if only conducted on a small sample of 300 records. Why? One company found no invalid values for Marital-Status-Code, but discovered that 23.3% of those "valid" values were incorrect! Without an accuracy assessment, you may have a false sense of security as to the true state of quality of the data.
By conducting a quality assessment, you now know the state of quality. By conducting prototype data-correction activities, you can estimate the level of effort for corrective maintenance.
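To make the assessment concrete, here is a minimal sketch in Python of the kind of profiling a quality assessment can begin with. The table name, column names, and value domain are hypothetical illustrations, not prescriptions. The point is that a validity check and a value-frequency skew check together catch problems like the "broken leg" default, while a random sample supports the manual accuracy assessment against real-world facts.

    import random
    import sqlite3
    from collections import Counter

    # Hypothetical domain of valid values; yours comes from the data definition.
    VALID_MARITAL_CODES = {"S", "M", "D", "W"}

    def assess_column(values, valid_domain):
        """Profile one column: validity rate and value-frequency skew."""
        freq = Counter(values)
        invalid_rate = sum(1 for v in values if v not in valid_domain) / len(values)
        top_value, top_count = freq.most_common(1)[0]
        print(f"invalid values: {invalid_rate:.1%}")
        # A single value dominating the column (like the 80% "broken leg"
        # default) is a red flag even when every value is technically valid.
        if top_count / len(values) > 0.5:
            print(f"warning: {top_value!r} accounts for "
                  f"{top_count / len(values):.1%} of rows")

    def accuracy_sample(rows, n=300):
        """Random sample for manual verification against real-world facts."""
        return random.sample(rows, min(n, len(rows)))

    conn = sqlite3.connect("source_extract.db")  # hypothetical extract file
    rows = conn.execute(
        "SELECT customer_id, marital_status_code FROM customer").fetchall()
    assess_column([r[1] for r in rows], VALID_MARITAL_CODES)
    for record in accuracy_sample(rows):
        print(record)  # each sampled record is checked by a person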
4. Not developing an enterprise-focused information architecture-even if only developing a departmental data mart
Developing an enterprise-focused information architecture does not mean you must develop a full enterprise-wide data model. But the model of the data subject must contain enterprise-wide data definitions, so that it can be extended to support all information stakeholders-not just the departmental workers-in data that is of enterprise-wide interest. The database must also be designed so that it can be extended by the "addition" of new tables and data elements, as opposed to "destructive" change in which existing table designs must be modified to support the extensions.
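As a sketch of what extension by addition looks like in practice (the table and column names below are hypothetical), a new department's data can arrive as a new table keyed to the shared customer key, leaving existing tables, loads, and queries untouched:

    import sqlite3

    conn = sqlite3.connect("warehouse.db")  # hypothetical warehouse database

    # Initial, enterprise-defined Customer subject area (hypothetical design).
    conn.execute("""
        CREATE TABLE IF NOT EXISTS customer_dim (
            customer_key INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL
        )""")

    # Extension by ADDITION: a new department's data arrives as a new table
    # keyed to the shared customer_key. Existing tables, load programs, and
    # queries keep working unchanged.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS customer_credit_profile (
            customer_key INTEGER REFERENCES customer_dim(customer_key),
            credit_rating TEXT,
            review_date TEXT
        )""")

    # Contrast with DESTRUCTIVE change: rebuilding customer_dim itself to
    # embed departmental attributes would force every dependent ECTL
    # application and query to be modified at the same time.
    conn.commit()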
Data warehouses and even data marts are not functional databases. They must be designed to support decisions, cross-functional data analysis and cross-functional knowledge processes-those that need to see an integrated view of the enterprise data.
Data warehouses provide the most value when they solve business problems that cross organizational boundaries.
Failure to design the data warehouse as an "enterprise" resource will sub-optimize its benefits. If the operational data structures are not well defined with common, consensus definitions, simply aggregating current departmental data into a data warehouse will exacerbate the legacy data problems of the source. Such a "departmental" data warehouse will be valuable only to the one business area and will be difficult and expensive to extend to a broader, enterprise-wide perspective.
Modeling the data warehouse need not be an excessively time-consuming exercise. Bringing the right subject matter experts together, with a charter of empowerment and a skilled facilitator, can produce a consensus, enterprise-focused data model in no more time than is required to "gather" data requirements using traditional interview-and-consolidate approaches.
It is impossible to achieve an "accepted" enterprise data model without bringing together the key stakeholders who represent different functional views of the information to hammer out a consensus definition of the most common and visible entity types and attributes. There are two approaches organizations take to this dilemma. The first is to say it is too hard and not attempt it. This condemns the organization to live with a sub-optimal result. The second is to acknowledge that it is hard-but required-and to invest in the future of the enterprise and do it right.
5. Focusing on performance over information quality in data warehousing
As the name implies, the product of a data warehouse is "data." Actually, the product is information derived from data: information that supports important decisions, identifies trends that lead to significant new opportunities, and identifies potential problems before they become enterprise-threatening. Information quality ("consistently meeting all knowledge workers' and end-customers' expectations") will always be the most important objective of the data warehouse product.
Are performance and response time important? Yes, but not at the cost of sacrificing quality so that the resulting decisions cause the demise of the enterprise. What value is it if you get an answer to a query in five seconds instead of five hours if the result is wrong? If five-second response time is a real requirement, chances are good the data warehouse is solving the wrong problem-one that must be solved in the operational systems. The worst-case result of this mistake is that the enterprise will make bad decisions faster than ever. The best case is that the very customers you seek to satisfy will not be satisfied, will not use-or will minimize their use of-the data warehouse, leaving you with credibility problems for the next project you "try" to propose.
While response-time performance is a requirement, it must always be secondary to the reliability of the information results. There is an important difference between transaction performance when you are serving a customer and performance when you are analyzing data to support a decision. Timeliness of information delivery, or "response-time" performance, must be balanced with the other information quality characteristics: completeness, accuracy, definition completeness and correctness, and presentation clarity and intuitiveness. The wrong answer to a query in superior response time is still a wrong answer. Never mind that it can cause a lack of credibility in the data warehouse-the danger is that it can threaten the success of the enterprise. The product of the data warehouse is, after all, "information": information organized to support the strategic and tactical processes of the enterprise, such as trend analysis, Key Performance Indicators or Core Business Indicators, and decisions that impact the future of the enterprise.
6. Not solving the information quality problems at the source
Because the data warehouse tends to be an expensive project, the temptation is to cut the costs of development by cutting the wrong costs.
Many high-priced consultants advise you not to attempt to correct information quality problems at the source. They may be unaware of, or uncaring about, problems in the source or in other value-chain processes that are failing; or they may tell you it is "out of scope" for your data warehouse and will "lengthen" the project. They may also tell you that the politics of the organization will not let you get at the source data. If someone tells you this, you know they do not have the best interests of your enterprise at heart. Implementing this advice causes several problems:
7. Inappropriate "Ownership" of data correction/cleanup processes
Often the data correction process is simply delegated to the data warehouse or IT staff to "take care of cleaning" the data. Managers of the source information areas have neither the time nor the resources to correct it, nor do they have motivation to correct it, since their processes "work just fine."
But software alone cannot "cleanse" the data unattended. Nor can IT personnel correct and "cleanse" data to an acceptable level for the downstream knowledge workers, the actual "customers" of the data. The data correction process must be a partnership between the managers of the source information business areas, the data warehouse and information quality staff, and the manager(s) of the downstream information customer or beneficiary business area(s). The managers of the source business areas are the actual process owners of the data correction process, but the requirements of the data correction activities come from the downstream information customer business areas.
Funding for the data correction may come from the beneficiary customer areas or from a one-time enterprise data correction fund. Where you find significant IQ problems, you should:
After the initial data correction activities and the above information quality principles are put in place, the source business area managers are held accountable for the quality of the information required by their downstream information customers. At this point, you may implement mechanisms to charge the data correction costs back to the respective process owners. The quality principle from Kaizen is "make no defects, pass no defects on, and accept no defects."
8. Not developing effective audit and control processes for the data Extract, Correct, Transform and Load (ECTL) processes
The data acquisition, standardization, reengineering, correction, consolidation, enhancement, transformation, aggregation, and loading processes must be controlled just like any other business processes. In most environments, where multiple source files contain data about common entity types-such as customers, products, or business transactions-from different business lines or geographies, the complexity makes the ECTL processes even more important. The cost of failing to assure the quality of the ECTL processes is correction of the populated data warehouse and expensive scrap and rework to back out and reprocess the data or to restore the data warehouse, including the loss of opportunity caused by the downtime and lack of access.
The principles of effective ECTL include:
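One widely used control-sketched below with hypothetical claim records-is to reconcile batch control totals (record counts, amount sums, and a content hash) between the extract and the load, rejecting the batch when they disagree:

    import hashlib

    def batch_control_totals(records, amount_index):
        """Compute audit totals for a batch: count, amount sum, hash total."""
        count = len(records)
        amount_total = sum(r[amount_index] for r in records)
        digest = hashlib.sha256()
        for r in sorted(records):  # order-independent content check
            digest.update(repr(r).encode())
        return count, amount_total, digest.hexdigest()

    # Hypothetical claim records: (claim_id, diagnosis_code, paid_amount).
    extracted = [(101, "823.0", 1200.0), (102, "824.1", 300.0)]
    loaded    = [(101, "823.0", 1200.0), (102, "824.1", 300.0)]

    if batch_control_totals(extracted, 2) != batch_control_totals(loaded, 2):
        raise RuntimeError("ECTL control failure: extract and load totals "
                           "disagree; back out the batch before it reaches "
                           "the warehouse")
    print("Batch reconciled: safe to commit the load")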
9. Misuse of information quality software in the data warehousing processes
There are four fundamental misuses of information quality software tools in data warehousing and data migration. They are:
1. Not using IQ software when you should. Your goal is to minimize the costs of information scrap and rework. Automate as much of the corrective maintenance as possible.
Human intervention to correct data is time-consuming and expensive. To be sure, it is required for many types of error correction. But seek ways to automate the routine correction processes where possible, and use your people processes to tackle and correct the data where it is not possible for software to do so. Use software to:
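As a minimal sketch of the kind of routine correction software can perform safely (the standardization map and record layout below are hypothetical), anything that cannot be fixed mechanically is routed to people:

    import re

    # Hypothetical standardization map; real rules come from your data definitions.
    STATE_SYNONYMS = {"TENN": "TN", "TENNESSEE": "TN", "TEX": "TX", "TEXAS": "TX"}

    def auto_correct(record):
        """Apply only corrections software can make safely; flag the rest."""
        name, state, phone = record
        name = " ".join(name.split()).title()  # trim and case standardization
        state = STATE_SYNONYMS.get(state.strip().upper(), state.strip().upper())
        digits = re.sub(r"\D", "", phone)
        if len(digits) == 10:
            phone = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
            needs_review = False
        else:
            needs_review = True  # not mechanically correctable: route to people
        return (name, state, phone), needs_review

    fixed, review = auto_correct(("  larry   english ", "tennessee", "615.555.0142"))
    print(fixed, "-> manual review" if review else "-> auto-corrected")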
2. Failing to keep "improving" the ECTL processes as your business grows.
Exploit the tools you have to their maximum capability. A major bank recently "improved" the IQ software cleansing processes that had been implemented in 1992 and had remained largely unchanged since then. They discovered that many of the original business rules were no longer effective for matching and consolidating their customer data: business rules and types of customers introduced since then were being overlooked in the data consolidation process. They enhanced the business rules for matching and consolidation, and increased the effectiveness of the matching processes.
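The sketch below shows a deliberately naive matching rule (fuzzy similarity over normalized names, with an assumed threshold) of the kind that silently decays: it matches trivially different spellings but misses reordered names, which is exactly the sort of gap a periodic review of matching rules uncovers:

    from difflib import SequenceMatcher

    def normalize(name):
        """Lower-case and strip punctuation so trivial variants compare equal."""
        return " ".join("".join(ch for ch in name.lower()
                                if ch.isalnum() or ch.isspace()).split())

    def likely_same_customer(a, b, threshold=0.85):
        """Toy matching rule: fuzzy similarity of normalized names."""
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

    print(likely_same_customer("Larry P. English", "larry  p english"))   # True
    print(likely_same_customer("English, Larry P.", "Larry P. English"))  # False:
    # word order defeats this rule, so such pairs are never consolidated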
3. Assuming that IQ software will "solve" all your IQ problems.
One organization "cleansed" its customer address data with one of the cleansing software tools. It subsequently conducted a physical accuracy assessment and discovered a twelve percent (12%) error rate in that "cleansed" data. The errors included customers who had moved and street numbers that fell within the correct range for the street but were nonetheless inaccurate, among others. The fact of the matter is that no software tool by itself can correct all data to a zero-defect level.
Please note that the message here is not to avoid the use of IQ software tools. The message is to know the limitations of the tools and to implement effective IQ processes that incorporate IQ (accuracy and customer satisfaction) assessment, process improvement, and transformation of the enterprise culture to a culture of quality of information products. The goal is to minimize the costs of low quality data and increase the opportunity gain as a result of high quality information.
Errors can be left uncorrected by the software-or even introduced by it. Such errors include:
To derive maximum value from your IQ software tools, you must implement them in the context of a robust Information Quality Management framework with well-defined, effective IQ processes that focus on defect prevention and customer satisfaction. You must understand the strengths and weaknesses of your IQ software tools. Exploit the strengths, and put in place processes that mitigate the inherent limitations of the software.
4. Not using your IQ software for defect-prevention once you have your batch cleansing/corrective maintenance processes in control.
The goal of information quality management is to design quality into the processes that produce or capture data at the source, maintain it, and move it when necessary in a controlled way so that the information "consistently meets [all] knowledge worker and end-customer expectations."
Remember that the money spent on data cleansing or correction is a cost of "information scrap and rework." If information is created correctly, and there are processes in place to keep it maintained and current, there is no need for extensive data cleansing. It is cost-adding to fix something that was not captured correctly, was not maintained correctly as it changed, or was corrupted in the process of moving it. Therefore, the most effective IQ software tool capability is defect prevention: the capability to invoke routines and apply tests that assure data is correct and non-duplicate within the applications that create and maintain data and that control required data-movement processes.
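A minimal sketch of defect prevention at the point of capture (the code domain and the defaulted-value flag below are hypothetical) looks like this: the creating application refuses both invalid values and silently defaulted ones, so nothing reaches the database that would later need cleansing:

    VALID_DIAGNOSIS_CODES = {"823.0", "824.1", "845.0"}  # assumed domain subset
    # e.g. the system's "broken leg" default from the earlier example

    class CaptureError(ValueError):
        pass

    def capture_claim(claim_id, diagnosis_code, code_was_defaulted):
        """Validate at creation time instead of cleansing downstream."""
        if diagnosis_code not in VALID_DIAGNOSIS_CODES:
            raise CaptureError(f"{diagnosis_code!r} is not a valid diagnosis code")
        if code_was_defaulted:
            # Defect prevention: do not let the application silently accept
            # a default in place of a deliberately entered value.
            raise CaptureError("diagnosis code must be entered, not defaulted")
        return {"claim_id": claim_id, "diagnosis_code": diagnosis_code}

    print(capture_claim(1001, "824.1", code_was_defaulted=False))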
Use information quality software effectively and efficiently to analyze, assess, and correct or cleanse data.
Information Quality Management requires implementing processes of quality assessment, process improvement, education, and transforming the performance measures to include customer satisfaction of downstream information customers.
10. Failing to exploit this opportunity to "correct" some of the wrongs created by the previous 40 years of bad habits
The data warehouse represents an opportunity for organizations to learn from their past mistakes. When the data warehouse is developed properly-avoiding the mistakes described above that can cripple it, squander valuable resources, and damage your credibility-you have a golden opportunity to reengineer the business processes and legacy applications, significantly reducing waste and increasing the value of information systems and services.
The data warehouse provides a way out of the legacy problems of the past. By building an Operational Database (ODB)-not just a retrieve-only Operational Data Store (ODS)-designed to reengineer the operational data structures, you can begin to migrate source applications to a newly architected and truly integrated operational data environment. This enables a phased approach to eliminating legacy applications as they come up for reengineering. For organizations with ERP systems, which invariably do not contain all the data the organization needs to know about its important information types, the ODB is a way forward. The ODB can serve either as an origination source of master data for an ERP system, or as an extracted, accessible record-of-reference database in which additional attributes can be added to ERP data to enhance its value to other enterprise processes and applications.
Avoiding these mistakes not only reduces the risk of data warehouse failure, it increases the value of the information products provided. With that comes an increase in the effectiveness of the business decisions supported by your data warehouse environment.
The vision for using an ODB as a basis for an integrated operational environment is described on pages 278-283 in Improving Data Warehouse and Business Information Quality, by Larry English, New York: John Wiley & Sons, 1999. The book further describes the six processes of TIQM that can significantly reduce the cost of information scrap and rework and increase the opportunity gain enabled from your data warehouse and operational information environments.
English, Larry P., Improving Data Warehouse and Business Information Quality, New York: John Wiley & Sons, 1999, p. 24.
Glossary
Data Warehouse (DW): A system of information technologies and a collection of data designed to support the strategic, management, and decision-making processes of the enterprise. In other words, data designed to support the business in ways that operational databases cannot.
ECTL: Acronym for Extract, Correct, Transform and Load-the processes of propagating data from a source to a target database or data warehouse, including retrieving data from the source, correcting it (in the source), converting it to the proper format, aggregating it, and storing or updating it in the target.
Knowledge Worker: 1. The role of individuals in which they use information in any form as part of their job function or in the course of performing a process, whether operational or strategic. Also referred to as an information consumer or customer. Accountable for work results created as a result of the use of information and for adhering to any policies governing the security, privacy, and confidentiality of the information used. 2. Coined by Peter Drucker no later than 1973, the term refers to business personnel or "workers" in the Information Age who use information in their work, as opposed to the term "user."
Operational Data Store (ODS): A database of detailed data, designed to be standard and enterprise-sharable, used to support the operational activities of an enterprise. It is used as a means for reengineering operational applications to eliminate obsolete legacy databases or files in a planned, controlled fashion.
About the Author
Larry P. English, President and Principal of INFORMATION IMPACT International, Inc. (www.infoimpact.com), is an internationally recognized speaker, teacher, consultant, and author in information and knowledge management and information quality improvement. He has provided consulting and education in 28 countries on five continents. Mr. English was featured as one of the "21 Voices for the 21st Century" in the January 2000 issue of Quality Progress. DAMA awarded him the 1998 "Individual Achievement Award" for his contributions to the field of information resource management. He has chaired 10 Information Quality Conferences in the US and Europe.
Mr. English's methodology for information quality improvement-Total Information Quality Management (TIQM™)-has been implemented in several organizations worldwide. He writes the "Plain English on Data Quality" column in the DM Review. Mr. English's widely acclaimed book, Improving Data Warehouse and Business Information Quality, has been translated into Japanese by the first information services organization to win the Deming Prize for Quality. English provides consulting and education to organizations to help them implement effective IQ processes that reduce the costs of nonquality data and exploit the value of quality information.
Mr. English's mission is to provide guidance, consulting and education to organizations to help them increase their competitiveness and effectiveness by implementing information management and information quality processes that reduce the costs of nonquality data and exploit the value of their information assets. He can be reached at Larry.English@infoimpact.com.
Larry P. English provided permission to publish and store the article "Mistakes to Avoid if Your Data Warehouse is to Deliver Quality Information" at DSSResources.COM on Friday, July 26, 2002. The final article was received from Diane English on Friday, August 2, 2002. This formatted version of the article was posted at DSSResources.COM on August 11, 2002. Copyright © 2002 INFORMATION IMPACT International.
English, L. P., "Ten mistakes to avoid if your data warehouse is to deliver quality information", DSSResources.COM, 08/11/2002.