*********************************************************** DSS News D. J. Power, Editor December 17, 2006 -- Vol. 7, No. 26 A Free Bi-Weekly Publication of DSSResources.COM approximately 1,900 Subscribers ************************************************************ Check the interview with Henry Morris "Decision support and analytic applications" at DSSResources.COM ************************************************************ Featured: * Ask Dan: How can organizations keep data-driven DSS and data warehouses current with changes in source systems? * DSS News Releases ************************************************************ Rule Developer Needed! Check http://decisionautomation.com/ads/requestinformationform.html ************************************************************ Ask Dan! How can organizations keep data-driven DSS and data warehouses current with changes in source systems? by Dan Power Editor, DSSResources.com In my Ask Dan column of October 22, 2006 (Vol. 7, No. 22), I answered some of the challenging questions that Aileen MacKay had sent me. In this Ask Dan!, I'll focus only on her system change questions. She asks "What is the process companies are using to keep their Data Warehouse up-to-date? Once the source system is changed, what is the process for moving these changes into the Data Warehouse and DSS? Once changes are made to the source system, how long does it usually take to make the changes in the data warehouse and DSS? How long should this process take if it is run well? Do you have examples of state- of-the-art-processes that work really well where managers are willing to share their process from the source system to DW to DSS? Specific to the health care insurance industry or medicaid? Thank you." In most data-driven decision support environments, multiple internal and external computerized systems provide source data that must be extracted, transformed and then loaded (ETL) into a data mart/warehouse (DSS data store). Routinely data is moved from source systems to a decision support data store. This process can be and usually is automated and routinized. But as businesses and organizations change processes, products and services the data models for operational systems change and because of changes in these source systems, the design of data warehouses and data marts used by data-driven DSS must also change. A design challenge occurs whenever it becomes necessary to make any changes to the data model in the source system. The data model defines what is stored in the source system, how it is stored, and how items in the database are related to each other. This source system data model impacts the data model in the decision support data store. Let's quickly look at some of the changes that can occur in operational system data models. Products or services may be added, product codes may change, fields may be added or removed, data types may change, and data values may be altered. The process of data transformation can deal with some of these changes, like simple changes in product identifiers. As more significant changes occur in source systems more changes must be made in the DSS data store. New fields or tables may be needed. In some cases the DSS data store may need to be transformed, manipulated or even radically redesigned. Waymire and Sawtell (2006) note a data transformation can be one of a number of "operations applied to your data before it is stored in the destination. This can include the calculation of new values, the concatenation of values, breaking of values like a name field into a first name and last name field, as well as any type of data cleaning you want to enforce." In general, data transformation involves processes for selecting, targeting, converting, and mapping data so that it may be used in a DSS data store. Data may be transformed using predefined rules or specific formulas or techniques. When change is contemplated in an operational source system, analysts also need to examine the ramifications for systems using the data from the operational system. Part of the analysis and design should involve mapping the data elements between the data models. IDEALLY implementing changes in a source system should coincide with implementing changes in the DSS data store and the data-driven DSS. A specific data-driven DSS may need to be changed to display new data or query the revised data model. Synchronizing changes between operational and external source systems and DSS data stores insures the quality of the data for decision support. Examining the mapping between the original and the new data model helps insure the DSS data is consistent, timely, well understood, and satisfies the needs of managers. The danger is that the two models will not be consistent and that users will be dissatisfied with the validity of the DSS data and the information derived from that data. In general, we must deal with both simple and complex transformations. In some instances we can consolidate data. For example, if we have two similar products and we eliminate one and create a new shopkeeping unit (SKU) for the other, we may want to consolidate the historical sales data for the two products. A more difficult transformation is required if we further differentiate a product or service. For example, we may have one academic major in a department and we replace it with two more differentiated majors with new codes. In this case major #23 may become majors #230 and #231. In this situation, we may allocate historical enrollment data between the two new majors. We may use an arbitrary allocation (fixed %) or a more complex rule-based allocation. The easiest transformation involves an addition or a replacement. For example, adding a new SKU as a data value or substituting a new SKU for an old SKU. Time lags in adjusting the operational data model and the DSS data store can create many unnecessary problems for managers and IT staff in reconciling results from diverse systems. Ralph Kimball identifies some "graceful modifications to existing fact and dimension tables". These include: new dimensional attributes, new types of measured facts, new dimensions, a dimension becomes specific or granular, an explicit hierarchy is added relating two dimensions, new dimensions are added from a new data source. Sadly many changes that result from changes in source systems are complex, messy and even awkward to make. In some situations, it seems easier to "start over" and not try to bring historical data forward to the revised data model and resulting DSS data store. This "blank slate" approach should be a last resort decision. Part of the administration of a DW/DSS data store should involve governance of the data model and those staff members who are maintaining the integrity of the decision support system should be monitoring potential changes in source systems. Also, a process for change management should be established as part of the ongoing decision support/BI administration. Data-driven DSS fail when source data is missing, inconsistent, or erroneous. Current, accurate data models are essential. Without them, data-driven decisions cannot be made with any confidence. At TDWI (tdwi.org), a research article on data transformation (1999) suggests both the magnitude of the problem and one technology solution: "as we embrace a greater number of sources and targets, we centralize all data transformation for the entire enterprise so that, effectively, the tool becomes the cornerstone of the corporate decision-making apparatus. This means a standard data transformation function, like data tracing, can be applied across the entire enterprise. This means aggregations and data analysis apply to all data, not just what is stored in a recognizable database. In all cases, data transformation has one goal: to guarantee accurate, clean information throughout the enterprise. To prevent garbage in, garbage out." The data warehouse/data-driven DSS team must continually monitor what data should go into the data store and where those particular pieces of information can be found. Some of the data will be internal to an organization and hence change is under management control. Source data extraction, selection, and transformation processes must be continually evaluated. According to a data warehousing introduction in the Minnesota State Archives, "Source data analysis and the efficient and accurate movement of source data into the warehouse environment are critical to the success of a data warehouse project. Good metadata is essential to the effective operation of a data warehouse and it is used in data acquisition/collection, data transformation, and data access. Acquisition metadata maps the translation of information from the operational system to the analytical system. This includes an extract history describing data origins, updates, algorithms used to summarize data, and frequency of extractions from operational systems. Transformation metadata includes a history of data transformations, changes in names, and other physical characteristics." Part of any change process is updating, revising and expanding the metadata for the DSS data store. The metadata helps users understand how the change in a source system impacted the historical record that is stored in the DSS data store. Let's finish Aileen's questions. How long should this process take if it is run well? In general, updating the DSS data store should be less time consuming than making changes in source systems. The length of the change process is a function of the magnitude and complexity of the change in the source system. At some point the change may become so great that the historical record is no longer relevant to the "new" data model. Do you have examples of state-of-the-art-processes that work really well? NO, perhaps some readers or vendors will provide a case study for DSSResources.com. For additional information, I suggest reading some of the other Ask Dan questions on data warehouses and data-driven DSS and then check and search the following web sites: 1. Business Intelligence Network http://www.b-eye-network.com/home/ B-eye-network delivers industry-based content hosted by domain experts and industry leaders. The Business Intelligence Network includes horizontal technology coverage from the most respected thought leaders in Business Intelligence, Business Performance Management, Data Warehousing and Data Quality. 2. Data Warehousing Information Center http://www.dwinfocenter.org/ This site aims to help visitors learn about data warehousing and decision support systems by providing links to vendors of tools for end-user data retrieval and analysis, tools for building and maintaining system infrastructure, and tools directed at specific industries and functions. 3. The Data Warehousing Institute http://www.tdwi.org The Data Warehousing Institute (TDWI) is a member-based organization whose goal is to educate decision-makers and information professionals on data warehousing strategies and technologies. This site serves as a clearinghouse for case studies, white papers, and data warehousing events and conferences worldwide. Membership includes a subscription to the quarterly Business Intelligence Journal. 4. DM Review http://www.dmreview.com/ The DM Review web site offers content from DM Review magazine as well as online-only articles, resource lists, discussion forums, issue archives, etc. DM Review focuses on data warehousing and related topics such as metadata management. 5. Inmon Associates, Inc. http://www.inmongif.com This is the official web site produced by Bill Inmon, one of the leaders in data warehousing. This web site focuses on Inmon's framework for government information systems, the Government Information Factory (GIF). 6. Ralph Kimball Associates http://www.rkimball.com/ Ralph Kimball is a leading proponent of the dimensional approach to designing large data warehouses. He teaches data warehousing design skills and helps selected clients with specific data warehouse designs. As always your comments, suggestions, examples, etc. are appreciated. References Kimball, R., "Design Tip #29 Graceful Modifications To Existing Fact and Dimension Tables," 2001, URL http://www.rkimball.com/html/designtipsPDF/DesignTips2001/ KimballDT29Graceful.pdf . Minnesota State Archives, "Data Warehouse Description," URL http://www.mnhs.org/preserve/records/dwintro.html. Power, D., Does data modeling differ for a Data-Driven DSS? DSS News, Vol. 2, No. 17, August 12, 2001. Power, D., Is a Data Warehouse a DSS? What is a star schema? How does a snowflake schema differ from a star schema? DSS News, Vol. 3, No. 4, February 17, 2002. Power, D., What is ETL software and how is it related to DSS? DSS News, Vol. 3, No. 16, August 4, 2002. Power, D., What do I need to know about Data Warehousing/OLAP? DSS News, Vol. 4, No. 5, March 02, 2003. Power, D.J., "What is the process for designing decision support software?" DSS News, Vol. 7, No. 17, August 13, 2006. Power, D.J., "What staffing is needed for DW/BI/DSS?" DSS News, Vol. 7, No. 22, October 22, 2006. Schwartz, J., "Complex Data Transformation," Line 56, Tuesday, March 14, 2006, URL http://www.line56.com/articles/default.asp?ArticleID=7412&pg=3&topicID=4 . TDWI, "Beyond the Data Mart: A New Look at Data Transformation," What Works: Volume 8, November 1999, URL http://www.tdwi.org/research/display.aspx?ID=5385 . Waymire, Richard and Rick Sawtell, Sams Teach Yourself Microsoft SQL Server 7 in 21 Days, 2006, URL URL http://www.samspublishing.com . ************************************************************ DSS News Releases - December 3 to December 17, 2006 Read them at DSSResources.COM and search the DSS News Archive 12/15/2006 Sun Microsystems helps retailers manage busy holiday season with secure SOA technology solutions. 12/14/2006 MapInfo® Location Intelligence keeps James River Insurance one step ahead of storms. 12/13/2006 Teradata Warehouse Miner 5.0 to optimize advanced analytics. 12/13/2006 Oracle delivers industry's broadest selection of integrated, embedded databases. 12/13/2006 Oracle ranked #1 in retail by leading industry magazine. 12/13/2006 Microsoft Robotics Studio now available to provide common development platform. 12/12/2006 BearingPoint 'goes to bat' for Sport Chalet to improve business processes in over 40 retail stores. 12/12/2006 Applix secures new TM1 customers with 64-bit technology. 12/12/2006 Intalio to release version of its business process management software under open source license. 12/11/2006 EDS' eight tech trends: What CXOs need to know for a happy IT in 2007. 12/11/2006 Appian Enterprise BPM Suite selected by Archstone-Smith to automate mission critical business processes. 12/11/2006 Polycom announces IP Multimedia subsystem strategy. 12/11/2006 The Principal Financial Group introduces new tools, making informed healthcare choices easier. 12/11/2006 SAS BI® to bolster economic development. 12/11/2006 University of Pittsburgh Health Sciences group manages research and laboratory space allocations using JReport. 12/07/2006 Naval Research Laboratory selects SGI Systems for large data retrieval capability technology demonstration. 12/06/2006 Website's lofty goal: data dashboard for a democratic society. 12/06/2006 Fair Isaac's new ScoreNet Borrower Portal meets critical need for construction lending industry. 12/06/2006 SAS® Learning edition brings power of data analytics to wider business, academic audiences. 12/06/2006 Oracle announces general availability of Oracle(R) Enterprise Single Sign-On Suite. 12/06/2006 Cisco selects India as site for the Cisco Globalization Center. 12/05/2006 Cognos positioned in 'Leaders' Quadrant in 2006 analyst report on Corporate Performance Management Suites. 12/05/2006 First segment of new Internet2 network goes live, NYSERNet becomes first connector. 12/05/2006 ProModel Corporation’s simulation software used by the 2006 RIST prize winning Center for Army Analysis Army Forces Generation Modeling Simulation study team. 12/05/2006 Plan4Demand introduces Demand Curve Analyzer(TM) supply chain tool. 12/05/2006 ILOG JViews creates enhanced displays for supervision and monitoring applications. 12/05/2006 Integrated performance management with PerformancePoint Server 2007. 12/04/2006 Stottler Henke launches SimVentive Toolkit for building "serious games". 12/04/2006 Launch of Thomson Clinical Xpert delivers essential medical information to physicians via PDAs. 12/04/2006 'Laptop lag' constrains mobile professionals. 12/04/2006 Hyperion enhances global partner program to meet demand for industry-specific BPM solutions. ************************************************************ Please tell your DSS friends about DSSResources.COM ************************************************************ DSS News is copyrighted (c) 2006 by D. J. Power. Please send your questions to daniel.power@dssresources.com |