************************************************************
DSS News
D. J. Power, Editor
April 9, 2006 -- Vol. 7, No. 8

A Free Bi-Weekly Publication of DSSResources.COM
approximately 1,600 Subscribers
************************************************************
"Decision Support for Global Enterprises"
Check ICDSS2007.org
************************************************************
Featured:

* Ask Dan! - Is parallel database technology needed for data-driven DSS?
* DSS Conferences
* DSS News Releases
************************************************************
Check the case by Mike Tully, "E-Docs Asset GIS: Washington County, Iowa" at DSSResources.COM
************************************************************

Ask Dan!
by Dan Power
Editor, DSSResources.COM

Is parallel database technology needed for data-driven DSS?

Few managers have heard of parallel computer processing and parallel databases. Nonetheless, the desire of managers for more and better historical data is increasing the need for such capabilities.

The call for papers for the ACM Eighth International Workshop on Data Warehousing and OLAP states that "Data Warehouse (DW) and Online Analytical Processing (OLAP) technologies are the core of current Decision Support Systems. ... Research in data warehousing and OLAP has produced important technologies for the design, management and use of information systems for decision support."

Norman and Thanisch with Bloor Research Group argue the future of commercial databases "is bound up with the ability of databases to exploit hardware platforms that provide multiple CPUs." They also note "There is a tremendous amount of confusion in the market over parallel database technology, among both customers and the vendors. The root of this is a general lack of understanding of the technical issues on both sides.
Although the workings of a parallel database are more complex than an ordinary database, understanding it requires more of a change in mind set than an astronomically high IQ. Unfortunately, few decision makers have much understanding of parallel database, and consequently, it is open season for database and hardware marketing people to confuse the market with technical mumbo-jumbo, that they don't fully understand themselves."

Parallel database technology makes it possible to process very large databases for data-driven decision support.

What is the history of parallel database technology?

According to James Gray, "During the 1970s there was great enthusiasm for database machines -- special-purpose computers that would be much faster than general-purpose systems running conventional databases. The problem was that general purpose systems were improving at 50% per year, so it was difficult for customized systems to compete with them. By 1980, most researchers recognized the futility of special-purpose approaches, and the database machine community switched to research on using arrays of general purpose processors and disks to process data in parallel. The University of Wisconsin was home to the major proponents of this idea in the US. Funded by the government and industry, they built a parallel database machine called Gamma. That system produced ideas and a generation of students who went on to staff all the database vendors. Today the parallel systems from IBM, Tandem, Oracle, Informix, Sybase, and AT&T all have a direct lineage from the Wisconsin research on parallel database systems. The use of parallel database systems for data mining is the fastest-growing component of the database server industry." Also, Gray notes "projects at UCLA gave rise to Teradata."

Today NCR Teradata (www.teradata.com) is the premier vendor of parallel database software. Most of my experience with parallel database software occurred at NCR Teradata Partners conferences.
The current NCR Massively Parallel Processing (MPP) platform is designed to run the Teradata Database software efficiently for data warehousing and decision support.

What is parallel database processing?

This question is challenging to answer for a broad audience. I'll try to limit the buzzwords and technical jargon. I'll also emphasize two nontechnical examples.

Let's start with a simple generalization. Parallel processing divides a computing task into smaller tasks that can be processed independently and simultaneously. Hence, the larger task is completed more quickly. Parallel relational database systems store data that is spread across many storage disks and accessed by many processing units. Whatis.com states massively parallel processing "is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory."

A Teradata Warehouse technical overview includes the following example of parallel processing: "Imagine that you were handed a shuffled stack of playing cards and were not allowed to scan the cards beforehand. Then you were asked a simple question, 'How many aces are in the stack?' The only way to get the answer would be to scan the entire deck of cards. Now imagine that the same cards were distributed among four people, each receiving one-fourth of the cards. The time required to answer this same query is now reduced by four times. Each person would simply have to scan their cards, and the four totals would be aggregated for the correct answer. In this simple example, we can refer to these four people as parallelized units of work. As you can see, more available parallelized units of work will result in faster query processing. The larger the data volume and the more complex the queries, the bigger the payoff from using parallel processing.
It’s also important to note that the most efficient way to distribute the playing cards (or data) is to distribute them evenly among the four people (or parallelized units of work)."

Mahapatra and Mishra provide another example: "Your local grocery store provides a good, real-life analogy to parallel processing. Your grocer must collect money from customers for the groceries they purchase. He could install just one checkout stand, with one cash register, and force everyone to go through the same line. However, the line would move slowly, people would get fidgety, and some would go elsewhere to shop. To speed up the process, your grocer doubtless uses several checkout stands, each with a cash register of its own. This is parallel processing at work. Instead of checking out one customer at a time, your grocer can now handle several at a time."

So imagine many relational databases linked together, where each database has the same data organization. Each question is asked of all the databases simultaneously, and the individual answers are then summarized.

Is parallel database technology critical to the future success of data-driven DSS?

YES. According to Todd Walter of NCR Teradata, three issues are driving the increasing use of parallel processing in database environments: the need for increased speed or performance for large databases, the need for scalability, and the need for high availability. Finally, Mahapatra and Mishra (2000) conclude "Intra-query parallelism is very beneficial in decision support system (DSS) applications, which often have complex, long-running queries. As DSS have become more widely used, database vendors have been increasing their support for intra-query parallelism."

In general, parallel processing is necessary to provide timely results from the complex decision support database queries needed by managers in data-intensive organizations.

References

Abdelguerfi, M. and K.
Wong, Parallel Database Techniques, Wiley-IEEE Computer Society Press, July 1998.

Barney, B., "Introduction to Parallel Computing," Livermore Computing, URL http://www.llnl.gov/computing/tutorials/parallel_comp/

DeWitt, D. J. and J. Gray, "Parallel Database Systems: The Future of High Performance Database Processing," Communications of the ACM, Vol. 35, No. 6, June 1992, URL http://www.cs.wisc.edu/~dewitt/includes/paralleldb/cacm.pdf

Gray, J. N., "Database Systems: A Textbook Case of Research Paying Off," 1997, URL http://www.cs.washington.edu/homes/lazowska/cra/database.html

Mahapatra, T. and S. Mishra, "Oracle Parallel Processing," O'Reilly, 2000, URL http://www.oreilly.com/catalog/oraclepp/chapter/ch01.html

Norman, M. G. and P. Thanisch, "Parallel Database Technology: An Evaluation and Comparison of Scalable Systems," Bloor Research Group, URL http://www.dpu.se/blopdt_e.html

Teradata Warehouse Technical Overview: Teradata Pioneered Data Warehousing, EB-3025, September 2005, URL http://www.teradata.com/t/pdf.aspx?a=83673&b=84876 .

Walter, T., "Scalability, Performance, Availability," Teradata Magazine Online, URL http://www.teradata.com/t/go.aspx/index.html?id=115886

************************************************************
Purchase Dan Power's DSS FAQ book
83 frequently asked questions about computerized DSS
http://dssresources.com/dssbookstore/power2005.html
************************************************************

DSS Conferences

1. Crystal Ball User Conference, May 1-3, 2006 at the Westin Tabor Center in Denver, Colorado. Check http://crystalball.com/cbuc/index.html .

2. ISCRAM2006, the Third International Conference on Information Systems for Crisis Response and Management, Newark, New Jersey, USA, at the New Jersey Institute of Technology from May 14-17, 2006. Check http://www.iscram.org .

3. ICKEDS 2006, the Second International Conference on Knowledge Engineering and Decision Support, Lisbon, Portugal, May 9-12, 2006.
Check http://www.gecad.isep.ipp.pt/ICKEDS06/ .

4. CIDMDS 2006, International Conference on Creativity and Innovation in Decision Making and Decision Support sponsored by IFIP WG 8.3, June 28th - July 1st 2006, London, UK. Check http://www.ifip-dss.org/ .

5. DEXA 2006, 17th International Conference on Database and Expert Systems Applications, September 4-8, 2006, Krakow, Poland. Check http://www.dexa.org .

6. ICDSS 2007, 9th International Conference on DSS, Jan. 2-4, 2007, Calcutta, India. Theme: Decision Support for Global Enterprises. Check http://www.ICDSS2007.org . Papers due May 10, 2006.

************************************************************
Visit DSSResources.com; Support our advertisers
Advertise here!
************************************************************

DSS News Releases - March 25, 2006 to April 8, 2006
Read them at DSSResources.COM and search the DSS News Archive

04/06/2006 Bank of America strengthens leading position in global treasury management with expansion of sales team.
04/06/2006 Teradata price optimization software proves sales lift to Bottega Verde.
04/06/2006 The Canal+ Group consolidates its decision-making system with a Teradata Enterprise Data Warehouse.
04/05/2006 U.S. businesses show high interest in converged voice services to improve efficiency, productivity and cost savings.
04/04/2006 Applix offers satisfaction upgrade to Cognos customers.
04/03/2006 FBI awards contract to deploy agency-wide investigative analysis software.
04/03/2006 Iceland Foods expands its Teradata® Warehouse™ to optimize decision-support capabilities.
03/30/2006 New collaborative partnerships are the future of supply chain management.
03/29/2006 Cognos scores leading position in performance management market survey.
03/29/2006 Internet adoption slowing - but dependence on it continues to grow.
03/27/2006 Go digital: toss pin boards and flipcharts with Fraunhofer Software.
03/27/2006 Fastest-growing open source BI project joins JBoss certified partner program; Pentaho offers first comprehensive open source BI solution for leading open source middleware platform.
03/27/2006 SAS expands data integration initiative; new software, resources and R&D commitment.
03/27/2006 2006 GeoTec event programming and registration now online.
03/26/2006 Data Warehousing and Knowledge Discovery (DaWaK 2006) conference abstract deadline April 1, 2006.

************************************************************
Please tell your DSS friends about DSSResources.COM
************************************************************

DSS News is copyrighted (c) 2006 by D. J. Power. Please send your questions to daniel.power@dssresources.com