*********************************************************** DSS News D. J. Power, Editor May 6, 2007 -- Vol. 8, No. 9 A Free Bi-Weekly Publication of DSSResources.COM approximately 2000 Subscribers ************************************************************ Support DSS News! Advertise here! ************************************************************ Featured: * Report from MySQL Conference and Expo * DSS Conferences * DSS News Releases ************************************************************ Report from MySQL Conference and Expo by Alex Power Webmaster, DSSResources.com The fifth annual MySQL Conference & Expo was held April 23-26, 2007 at the Santa Clara Convention Center. My flight to San Francisco on Saturday was relatively uneventful, though I misjudged how easy it would be to take public transit to my hotel and couldn't find a taxi after leaving the airport. On Sunday I went around the San Francisco area and saw some tourist attractions. MySQL is an open source database engine, that hopes to be known as the "online database". MySQL is the most common database component in the LAMP stack of open source web software. It uses SQL (Structured Query Language) to accept queries and interact with the database. The most notable difference between MySQL and commercial databases is that MySQL is released under the General Public License (GPL), which means it can be redistributed freely. MySQL Features MySQL generally accepts standard SQL queries. The syntax is somewhat different from other SQL-based databases, but the code is generally similar. Some advanced features are still being developed in MySQL. Views (which allow you to create a virtual table based on other tables stored on the database), stored procedures, and triggers (automatic actions that occur on the database when performing a certain action) were added in version 5.0. Future enhancements that are in development include the Falcon transactional storage engine, online backups, foreign keys, and hash joins. As a transactional engine, Falcon is designed to ensure that each update to the database is processed independently, and that no data will be lost in crashes. Falcon is also designed to be easy to configure and to restore from database crashes. MySQL Business Model The business model for MySQL is similar to many other open source projects. The base MySQL source code is available free of charge under the GPL to anyone. Compiled versions of the code are made available every few months. The "Enterprise" edition is sold for between $600 to $5000 per server per year, or on an unlimited number of servers for $40000 per year. (In comparison, MySQL reports that Oracle Enterprise Edition costs $40000 per CPU per year). The features available in the Enterprise edition include monthly rapid updates, advanced administration tools such as the new Monitoring and Advisory Service, performance tuning, legal indemnification, and technical support with up to 24x7 support with 1 hour response. The performance expectations of the Enterprise Edition is also higher, with 99.999% uptime expected for a MySQL cluster. Open source products such as MySQL are generally cheaper to develop and support for several reasons. Firstly, there are a large number of volunteer contributors to the MySQL engine. Also, the community-based development allows for problems that are discovered and fixed by one organization to have that fix applied to the entire product. In his keynote, MySQL CEO Marten Mickos claimed that "open source is unstoppable". He described the open source model as "disruptive", in that it will displace older, less efficient models. The potential pool of open source developers is huge, and MySQL claims to have at least a dozen different ways for people to participate in development. Monday I arrived at the Santa Clara Convention Center at around 8:00. Both the convention center and my hotel were located close to a light rail line, so it was easy to get to and from the conference. The conference had a continental breakfast with juices, bagels, croissants, etc. every day of the convention. In the morning, there were 7 tutorial sessions. I started by going to the Ruby on Rails tutorial. This tutorial was given by Joe O'Brien, a consultant from Columbus, Ohio. He started by describing Ruby as a purely object-oriented language. Even primitives like integers are objects in Ruby; one can do 6.div(3) and get 2. He emphasized that code generation and templating were not the most important features of Ruby, though they are the ones most often presented to newcomers. Ruby does make it very easy to create a basic CRUD (Create, Read, Update, Delete) record, but much of this seems to be more of a trick than a pattern of ease of use throughout the language. Ruby on Rails is designed to work easily with a database to form an application with the ActiveRecord class. The general mantra of Ruby is "Convention over Configuration". For example, if you are dealing with a table that stores appointments, the class might be called "Appointment" and the table in the database would be called "Appointments". When accessed, each row of the table would be instantiated as an object, and each cell would correspond to an attribute. A large numbers of functions based on the column titles would also be defined. I also went to part of the talk on new features in MySQL 5.1 given by Brian Aker, director of architecture at MySQL. Some of the advancements in 5.1 include improved handling of XML-based data, improvements in the ARCHIVE storage engine for static data, improved replication capability, and an internal task scheduler. The session was a good summary of the new features in MySQL, and was one of the more popular morning sessions. The tutorial on "Scaling and High Availability Architectures" was the most popular, but when I went by there was no remaining room in the session. In the afternoon session I started at the "Wikipedia: Site Internals, Configuration and Code Examples, and Management Issues" tutorial. The session was mostly technical, explaining how they use caching and content acceleration to handle their large traffic load (#10 in the world according to Alexa). The Squid cache engine is used extensively by Wikipedia for this purpose. I also attended part of the "Real-world MySQL Performance Tuning" tutorial with Ask Bjorn Hansen and Jay Pipes. One of the most important parts of getting good performance from a database is to know the type of queries that will be performed on the database. A database cannot be optimized for every query. Using the "EXPLAIN SELECT" command and creating proper indices, as well as choosing the correct server settings, can ensure that your database is tuned to the queries you perform most. A proper and sane schema is also necessary; Jay referred to a particularly poor choice at http://worsethanfailure.com/Articles/Journey_to_the_Center_of_the_Database.aspx , where there is a separate table storing every day along with an ID used in all the other tables. Tuesday Tuesday began with a series of keynote speeches in the main ballroom. In the introduction, Zack Urlocker said that there were over 1400 attendees, 66 vendors, and 12 open-source projects represented at the conference from over 30 countries. This was the largest MySQL conference ever. In his keynote, Marten Mickos talked about the general state of MySQL. He described software as a service as the next stage of evolution in the computing industry. He called the new development model "disruptive", in a sense that it disrupts inefficient models of development currently being used. He emphasized the importance of user contribution to MySQL. He also highlighted some of the new features, such as the Monitoring and Advisory Service available with MySQL Enterprise Edition (which was highlighted by most of the MySQL people speaking at the conference) and the uptime possibilities of different configurations. He emphasized that the goal of MySQL is to become known as the "online database", and how they are working with a number of partners on that. He also described a number of challenges that MySQL has had to overcome as a global organization, where most of their developers work in remote locations and only see each other occasionally. Some of the lessons they learned are to pay attention to cultural differences to let them become assets, applying the principle of "Doveryai, no Proveryai" - "Trust, but Verify", having multiple communication channels for different tasks, virtually encouraging people for good work, and having face to face reunions where the participants worry about fun first and meetings later. In the next keynote, Guy Kawasaki, venture capitalist and former Mac Evangelist, presented his 10 ways to succeed in software development. Guy is a very amusing speaker. The general point of his presentation is that you need to have a clear focus in your product and do something much better than everyone else. You also need to make sure that you actually ship a product rather than continually trying to improve it. His last two points were slightly different: his 10/20/30 rule for Powerpoint presentations (have 10 slides that take 20 minutes, with text in 30 point font), and "Don't let the bozos grind you down", where normally intelligent people can fail to see the next wave of technology. This included quotes such as Thomas Watson's "I think there is a world market for maybe five computers" as well as Guy Kawasaki's own casual rejection of an opportunity to become CEO of Yahoo! back in 1995. Some of the other sessions I attended on Tuesday were: "Web-based Ad-hoc Query and Reporting with Pentaho" - This talk was mostly a demo of the Pentaho business intelligence suite. The presentation described the levels of business intelligence, starting with calculations done completely on paper and using more advanced computer technology up to today's data mining. The goal of Pentaho is to "protect the user from the database". The product uses data from any of multiple databases, and after configuration queries can be easily constructed by the end user. The key to such a product is to make it easy for the end users, and this product mostly succeeds. The support for localization and internationalization of the product which was highlighted towards the end of the presentation, but the configuration seemed long and not streamlined. The basic business model of Pentaho was also discussed; in general support and other services are sold, with a full system costing around 20% of the cost of a fully proprietary alternative. "Professional Cat Herding: A FOSS Community Panel" and part of "Panel with Industry CTOs – How Enterprises Tackle Scaling Problems". Both of these panels were discussing some of the issues involved in open source, either development or in utilization. As the title suggests, it is very difficult to get a group of opinionated volunteers to agree on anything in open source development. The implementation of these technologies also has issues. When the software itself is free and most of the hardware involved is commoditized, the primary outlet for gaining a competitive advantage is in how the software is managed and utilized. "PHP Performance and Scalability Best Practices" - This was a more technical session. In general, most code, including PHP, can be optimized to get a significant performance increase. However, this optimization comes at a cost of ease of development and maintainability. Extensive optimization should be the last step in software development, and should not be a major focus during development. Some of the major tips to be aware of include profiling code to know where bottlenecks occur, caching as many database calls and generated pages as possible, avoiding external connections as much as possible, and avoiding doing "extra work" that PHP already does faster. The talk should be available at http://www.omniti.com/~george/talks/ . After the PHP talk, the expo floor was open for a reception. There were around 40 exhibits on the floor. Of these, most were of smaller companies that work primarily with MySQL. The only large companies were Google, HP, and Sun. Google and Sun seemed to be primarily there to give away T-Shirts and other corporate paraphrenalia (or schwag). I talked with representatives of some of the companies on the floor. Wednesday The first keynote on Wednesday was by Eben Moglen of the Software Freedom Law Center on "Freedom Businesses Protect Privacy". His talk raised a lot of questions on the consequences of technologies which encourage people to reveal and aggregate large amounts of personal data. However, he stated "I don't have answers". Many large sites today operate on the basis of people volunteering data, and this accumulation leads to the possibility that other people will be able to use this data to take advantage of you. In the next keynote, Kaj Arno of MySQL moderated a discussion called the "Clash of the Database Egos". This was a generally light-hearted talk, with Kaj wearing a referee's uniform, entry music for the participants including the Monty Python theme and a Finnish ring tone, and a heckler yelling "Where's Microsoft SQL Server". There were no major disagreements between the egos, however. Some of the other sessions I attended on Wednesday were: "The Declarative Power of Views" - This talk by Beat Vontobel of MeteoNews AG was interesting from a theoretical point of view, though not particularly practical much of the time. In application, the most common purpose of views is to abstract a large or complex SQL query into the database, so it can be accessed by a single, simple query. It can also be used to improve access control to a database, making only some parts of a table visible to certain users. MeteoNews uses this to calculate weather information from entered data. He also discussed using views in MySQL in such a way as for the language to behave as a declarative programming language. He created a program which behaves as a decision tree. You query a question and enter "Yes" or "No" into an answer table. The views automatically update so the question table displays the next question in the sequence. Another example is that with a "parent_of" table, you could create a "sibling_of" table. However, in general, this has scalability and efficiency issues. There are also issues with an inability to loop. "MySQL Server Roadmap" - This session described the future path of the MySQL server development. The main new feature in the next version of MySQL (version 6.0) will be the Falcon transactional storage engine. Some of the features being targetted for later releases include online backups, improvements in subquery optimization, and support for foreign keys. Thursday The first keynote on Thursday was given by Rasmus Lerdorf, the creator of PHP. The notes for the talk are available at http://talks.php.net/show/mysql07key/ . Rasmus discussed the history of PHP, and how it has grown from a few small functions to make his web design easier to a full, object-oriented language with dozens of libraries. Rasmus discussed four reasons that people work on open-source projects: Self-interest, Self-expression, Hormones, and Improving the World. He suggested the defining characteristic of Web 2.0 is that having more users improves the site. He also discussed the security issues present in virtually all pages on the internet. The next keynote was by Paul Tuckfield, DBA at Youtube. He discussed in brief what actions Youtube has taken to scale their rapidly growing database. Given its presence online today, it's hard to realize that Youtube is barely 18 months old. Some of the other sessions I attended on Thursday were: "Ruby on Rails and the ActiveRecord ORM" - How Ruby on Rails works with a database. Much of the material covered was basic information that is covered earlier in the report. In general, Ruby on Rails works well for the "expected" behavior, but this comes at a significant disadvantage for when you are doing unexpected behavior. If you are executing complicated queries, you will be forced to revert to plain SQL, which neutralizes many of the advantages of Rails. Furthermore, some of the conventions such as "pluralization of class name to make the table name" are filled with potential for errors, especially with programmers from different cultures. "Managing MySQL the Slack Way" - This talk was about how Google manages its servers. In general, the basic principle behind Google hardware is "Assume the hardware will fail". The software they use to build new machines is called Slack. It basically works by assigning a machine a list of roles, such as database slave, LDAP server, etc., and installing software based on that list. Changing a machine's roles is generally accomplished by rebuilding the machine from scratch. They also discussed how they set up database slaves to start replicating a database master server. The basic approach used is to temporarily stop one of the other slaves, sending the database files across the network with SCP or netcat, and cleaning up the database for the new machine name and other issues. "Managing Hierarchical Data in MySQL" - This talk discussed three ways to store hierarchical data in MySQL. Hierarchical data is data in the form of a tree, where each piece of data except the root has a parent. The adjacency list model is the simplest, with each node containing a cell linking to the parent. The lack of a "CONNECT BY PRIOR" ability in MySQL makes using this model inconvenient. The nested set model is less intuitive, but is very quick for doing aggregate data when linking against it. However, it is very slow for inserting or removing elements. The materialized path model can do some queries easier by storing the entire path from the root with each node, but performs slower for other queries. "High Performance Data Warehousing with MySQL: Tricks and Tips from the Field" - This talk was about optimization of performance for large-scale data warehouses. Some of the topics addressed included selecting the proper database engine for your expected usage, conducting backups in an intelligent manner, and configuring the database settings. One of the most important portions of the talk was discussing the proper engine choice. As different engines are optimized for different tasks, this can have a significant performance effect. The basic engines are the MyISAM and InnoDB engines that are developed with MySQL. MyISAM is the default engine and is used for the MySQL files. InnoDB performs well for applications with lots of streaming data. Other, specialized engines are packaged with MySQL as well, such as the MEMORY engine (stores data only in RAM), the CSV engine (stores data in a CSV file), the ARCHIVE engine (can only insert and select, but more compact), and the BLACKHOLE engine (does not store data at all). In addition, there are proprietary storage engines available. These include Nitro, which allows for very high insert rates, InfoBright, which performs very good compression and aggregation, and SolidDB, which supports higher numbers of concurrent users and inserts. The closing keynote was by Pasha Sadri of Yahoo! on Yahoo Pipes. Yahoo Pipes is basically a fancy RSS and web services reader; it can take multiple input sources of data and run operations on them to filter or transform the input. One application of the system was as a translation plugin for Second Life. Conclusions: It was a relatively successful conference. I learned a lot and got to meet people interested in MySQL. Some specific conclusions I came to regarding the future of MySQL and open source development as they relate to decision support systems are: 1) MySQL is being used by many of the largest websites today. However, the needs of a website are different than the needs of a large data warehouse. Most of the large websites at the MySQL conference use a relatively similar architecture, with a large number of similarly configured machines. The cost savings of an open source solution are greatest in this situation. Most data warehouses involve only a few machines, and thus a licensing cost is less important than ease of configuration, usage, and support. 2) MySQL is tailoring its future development towards the enterprise market. Many "enterprise-class" features have been added in recent versions, and others are currently in development. There are rumors that MySQL will have an IPO at some point in the near future. MySQL will clearly be a significant database player in the next few years, and will become increasingly suited for large-scale enterprise installations. 3) There are no real barriers to entry for a small open-source project. There are dozens, if not hundreds of PHP frameworks that can work with MySQL to design a database-backed website. However, many of these are supported by only a few people, are poorly documented, and may have bugs or other performance issue. The key to a successful open source project is developing a community. 4) Marten Mickos' description of the open source model as "disruptive" is revealing for looking at how these projects can be successful in a commercial environment. There are a large number of hidden costs associated with keeping source code proprietary. These include interoperability issues, knowledge-sharing issues, etc. The majority of the value associated with software is not the code itself, it is in guaranteeing the software will perform in a certain way and in supporting installation and usage. References MySQL Staff, "Cox Communications powers massive data warehouse with MySQL", March 23, 2007 at URL DSSResources.COM. Power, D., "How will Web 2.0 impact design and development of decision support systems?" DSS News, April 22, 2007. Power, D., "What software tools should be used to build Web-based DSS?" DSS News, November 5, 2006. ************************************************************ Sign the new guestbook at DSSResources.COM ************************************************************ DSS Conferences 1. ISCRAM 2007, May 13-16, 2007 Delft, The Netherlands. Check http://www.iscram.org . 2. MWAIS data warehousing workshop with Ron Swift, Friday morning, May 18, 2007, check http://mis.uis.edu/MWAIS2007/ . 3. Crystal Ball User Conference, May 21-23, 2007 Denver. Check http://www.crystalball.com/cbuc/index.html. 4. AMCIS 2007, Americas Conference on Information Systems, Keystone, CO USA, August 9-12, 2007. SIG DSS mini-tracks. Check http://www.biz.colostate.edu/amcis07/ . 5. DaWaK 2007, 9th International Conference on Data Warehousing and Knowledge Discovery, Regensburg, Germany, September 3-7, 2007. Check http://www.dexa.org/ . ************************************************************ Purchase Dan Power's DSS FAQ book 83 frequently asked questions about computerized DSS http://dssresources.com/dssbookstore/power2005.html ************************************************************ DSS News Releases - April 23 to May 3, 2007 Read them at DSSResources.COM and search the DSS News Archive 05/03/2007 CDC's new influenza tracking prototype honored with SAS Enterprise Intelligence Award. 04/30/2007 Microsoft lights way for next-generation web development and design at MIX07. 04/30/2007 New release of Crystal Ball gives users Microsoft Office 2007 compatibility and scatter charts. 04/30/2007 ILOG marks 20th anniversary by ringing NASDAQ opening bell. 04/30/2007 Norwich Union selects Teradata CRM to support customer value management program. 04/27/2007 Informatica World to showcase special keynote from NYPD's Executive Officer, Office of Information Technology. 04/26/2007 The Conference Board holds its first 'virtual' meeting inside second life. 04/26/2007 Kaidara's knowledge management software selected by Alcatel-Lucent for its enterprise customer support center. 04/25/2007 MySQL AB and IBM announce open source database support for the IBM System i platform. 04/25/2007 LeanLogistics wins 4th consecutive honor as top IT Logistics provider. 04/25/2007 MicroStrategy recognizes three German companies for their business intelligence achievements. 04/24/2007 CEO of MySQL AB celebrates “The Disruptive Spirit of the Dolphin”. 04/24/2007 SAP unveils road map for next wave of SAP NetWeaver innovations. 04/24/2007 HP unveils Neoview data warehouse and new services for enhanced business intelligence. 04/24/2007 New study reveals password vulnerabilities and concerns: sharing and abuse are common and can be disastrous. 04/24/2007 Southwest Airlines selects Clarity Systems to optimize corporate performance management. 04/23/2007 Hyperion System 9 software continues to drive BPM standardization through innovation, market leadership, and customer adoption. ************************************************************ Please tell your DSS friends about DSSResources.COM ************************************************************ DSS News is copyrighted (c) 2007 by D. J. Power. Please send your questions to daniel.power@dssresources.com |