from DSSResources.com

***********************************************************

                        DSS News
                  D. J. Power, Editor
             May 6, 2007 -- Vol. 8, No. 9

     A Free Bi-Weekly Publication of DSSResources.COM
            approximately 2000 Subscribers

************************************************************

             Support DSS News! Advertise here!

************************************************************

Featured:

* Report from MySQL Conference and Expo 
* DSS Conferences
* DSS News Releases

************************************************************

Report from MySQL Conference and Expo

by Alex Power
Webmaster, DSSResources.com

The fifth annual MySQL Conference & Expo was held April 23-26, 2007 at
the Santa Clara Convention Center. My flight to San Francisco on Saturday
was relatively uneventful, though I misjudged how easy it would be to
take public transit to my hotel and couldn't find a taxi after leaving
the airport. On Sunday I went around the San Francisco area and saw
some tourist attractions.

MySQL is an open source database engine, that hopes to be known as
the "online database". MySQL is the most common database component in
the LAMP stack of open source web software. It uses SQL (Structured
Query Language) to accept queries and interact with the database. The
most notable difference between MySQL and commercial databases is that
MySQL is released under the General Public License (GPL), which means
it can be redistributed freely.

MySQL Features

MySQL generally accepts standard SQL queries. The syntax is somewhat
different from other SQL-based databases, but the code is generally
similar. Some advanced features are still being developed in MySQL.
Views (which allow you to create a virtual table based on other tables
stored on the database), stored procedures, and triggers (automatic
actions that occur on the database when performing a certain action)
were added in version 5.0.

Future enhancements that are in development include the Falcon
transactional storage engine, online backups, foreign keys, and hash
joins. As a transactional engine, Falcon is designed to ensure that
each update to the database is processed independently, and that no
data will be lost in crashes. Falcon is also designed to be easy to
configure and to restore from database crashes.

MySQL Business Model

The business model for MySQL is similar to many other open source
projects. The base MySQL source code is available free of charge under
the GPL to anyone. Compiled versions of the code are made available
every few months. The "Enterprise" edition is sold for between $600 to
$5000 per server per year, or on an unlimited number of servers for
$40000 per year. (In comparison, MySQL reports that Oracle Enterprise
Edition costs $40000 per CPU per year). The features available in the
Enterprise edition include monthly rapid updates, advanced
administration tools such as the new Monitoring and Advisory Service,
performance tuning, legal indemnification, and technical support with
up to 24x7 support with 1 hour response. The performance expectations
of the Enterprise Edition is also higher, with 99.999% uptime expected
for a MySQL cluster. 

Open source products such as MySQL are generally cheaper to develop
and support for several reasons. Firstly, there are a large number of
volunteer contributors to the MySQL engine. Also, the community-based
development allows for problems that are discovered and fixed by one
organization to have that fix applied to the entire product. In his
keynote, MySQL CEO Marten Mickos claimed that "open source is
unstoppable". He described the open source model as "disruptive", in
that it will displace older, less efficient models. The potential pool
of open source developers is huge, and MySQL claims to have at least a
dozen different ways for people to participate in development.

Monday

I arrived at the Santa Clara Convention Center at around 8:00. Both
the convention center and my hotel were located close to a light rail
line, so it was easy to get to and from the conference. The conference
had a continental breakfast with juices, bagels, croissants, etc.
every day of the convention. In the morning, there were 7 tutorial
sessions. I started by going to the Ruby on Rails tutorial. This
tutorial was given by Joe O'Brien, a consultant from Columbus, Ohio.
He started by describing Ruby as a purely object-oriented language.
Even primitives like integers are objects in Ruby; one can do 6.div(3)
and get 2. He emphasized that code generation and templating were not
the most important features of Ruby, though they are the ones most
often presented to newcomers. Ruby does make it very easy to create a
basic CRUD (Create, Read, Update, Delete) record, but much of this
seems to be more of a trick than a pattern of ease of use throughout
the language.

Ruby on Rails is designed to work easily with a database to form an
application with the ActiveRecord class. The general mantra of Ruby is
"Convention over Configuration". For example, if you are dealing with
a table that stores appointments, the class might be called
"Appointment" and the table in the database would be called
"Appointments". When accessed, each row of the table would be
instantiated as an object, and each cell would correspond to an
attribute. A large numbers of functions based on the column titles
would also be defined.

I also went to part of the talk on new features in MySQL 5.1 given by
Brian Aker, director of architecture at MySQL. Some of the
advancements in 5.1 include improved handling of XML-based data,
improvements in the ARCHIVE storage engine for static data, improved
replication capability, and an internal task scheduler. The session
was a good summary of the new features in MySQL, and was one of the
more popular morning sessions. The tutorial on "Scaling and High
Availability Architectures" was the most popular, but when I went by
there was no remaining room in the session.

In the afternoon session I started at the "Wikipedia: Site Internals,
Configuration and Code Examples, and Management Issues" tutorial. The
session was mostly technical, explaining how they use caching and
content acceleration to handle their large traffic load (#10 in the
world according to Alexa). The Squid cache engine is used extensively
by Wikipedia for this purpose.

I also attended part of the "Real-world MySQL Performance Tuning"
tutorial with Ask Bjorn Hansen and Jay Pipes. One of the most
important parts of getting good performance from a database is to know
the type of queries that will be performed on the database. A database
cannot be optimized for every query. Using the "EXPLAIN SELECT"
command and creating proper indices, as well as choosing the correct
server settings, can ensure that your database is tuned to the queries
you perform most. A proper and sane schema is also necessary; Jay
referred to a particularly poor choice at
http://worsethanfailure.com/Articles/Journey_to_the_Center_of_the_Database.aspx
, where there is a separate table storing every day along with an ID
used in all the other tables.

Tuesday

Tuesday began with a series of keynote speeches in the main ballroom.
In the introduction, Zack Urlocker said that there were over 1400
attendees, 66 vendors, and 12 open-source projects represented at the
conference from over 30 countries. This was the largest MySQL
conference ever.

In his keynote, Marten Mickos talked about the general state of
MySQL. He described software as a service as the next stage of
evolution in the computing industry. He called the new development
model "disruptive", in a sense that it disrupts inefficient models of
development currently being used. He emphasized the importance of user
contribution to MySQL. He also highlighted some of the new features,
such as the Monitoring and Advisory Service available with MySQL
Enterprise Edition (which was highlighted by most of the MySQL people
speaking at the conference) and the uptime possibilities of different
configurations. He emphasized that the goal of MySQL is to become
known as the "online database", and how they are working with a number
of partners on that. He also described a number of challenges that
MySQL has had to overcome as a global organization, where most of
their developers work in remote locations and only see each other
occasionally. Some of the lessons they learned are to pay attention to
cultural differences to let them become assets, applying the principle
of "Doveryai, no Proveryai" - "Trust, but Verify", having multiple
communication channels for different tasks, virtually encouraging
people for good work, and having face to face reunions where the
participants worry about fun first and meetings later.

In the next keynote, Guy Kawasaki, venture capitalist and former Mac
Evangelist, presented his 10 ways to succeed in software development.
Guy is a very amusing speaker. The general point of his presentation
is that you need to have a clear focus in your product and do
something much better than everyone else. You also need to make sure
that you actually ship a product rather than continually trying to
improve it. His last two points were slightly different: his 10/20/30
rule for Powerpoint presentations (have 10 slides that take 20
minutes, with text in 30 point font), and "Don't let the bozos grind
you down", where normally intelligent people can fail to see the next
wave of technology. This included quotes such as Thomas Watson's "I
think there is a world market for maybe five computers" as well as Guy
Kawasaki's own casual rejection of an opportunity to become CEO of
Yahoo! back in 1995.

Some of the other sessions I attended on Tuesday were:

"Web-based Ad-hoc Query and Reporting with Pentaho" - This talk was
mostly a demo of the Pentaho business intelligence suite. The
presentation described the levels of business intelligence, starting
with calculations done completely on paper and using more advanced
computer technology up to today's data mining. The goal of Pentaho is
to "protect the user from the database". The product uses data from
any of multiple databases, and after configuration queries can be
easily constructed by the end user. The key to such a product is to
make it easy for the end users, and this product mostly succeeds. The
support for localization and internationalization of the product which
was highlighted towards the end of the presentation, but the
configuration seemed long and not streamlined. The basic business
model of Pentaho was also discussed; in general support and other
services are sold, with a full system costing around 20% of the cost
of a fully proprietary alternative.

"Professional Cat Herding: A FOSS Community Panel" and part of "Panel
with Industry CTOs – How Enterprises Tackle Scaling Problems".
Both of these panels were discussing some of the issues involved in
open source, either development or in utilization. As the title
suggests, it is very difficult to get a group of opinionated
volunteers to agree on anything in open source development. The
implementation of these technologies also has issues. When the
software itself is free and most of the hardware involved is
commoditized, the primary outlet for gaining a competitive advantage
is in how the software is managed and utilized.

"PHP Performance and Scalability Best Practices" - This was a more
technical session. In general, most code, including PHP, can be
optimized to get a significant performance increase. However, this
optimization comes at a cost of ease of development and
maintainability. Extensive optimization should be the last step in
software development, and should not be a major focus during
development. Some of the major tips to be aware of include profiling
code to know where bottlenecks occur, caching as many database calls
and generated pages as possible, avoiding external connections as much
as possible, and avoiding doing "extra work" that PHP already does
faster. The talk should be available at
http://www.omniti.com/~george/talks/ .

After the PHP talk, the expo floor was open for a reception. There
were around 40 exhibits on the floor. Of these, most were of smaller
companies that work primarily with MySQL. The only large companies
were Google, HP, and Sun. Google and Sun seemed to be primarily there
to give away T-Shirts and other corporate paraphrenalia (or schwag). I
talked with representatives of some of the companies on the floor.

Wednesday

The first keynote on Wednesday was by Eben Moglen of the Software
Freedom Law Center on "Freedom Businesses Protect Privacy". His talk
raised a lot of questions on the consequences of technologies which
encourage people to reveal and aggregate large amounts of personal
data. However, he stated "I don't have answers". Many large sites
today operate on the basis of people volunteering data, and this
accumulation leads to the possibility that other people will be able
to use this data to take advantage of you.

In the next keynote, Kaj Arno of MySQL moderated a discussion called
the "Clash of the Database Egos". This was a generally light-hearted
talk, with Kaj wearing a referee's uniform, entry music for the
participants including the Monty Python theme and a Finnish ring tone,
and a heckler yelling "Where's Microsoft SQL Server". There were no
major disagreements between the egos, however.

Some of the other sessions I attended on Wednesday were:

"The Declarative Power of Views" - This talk by Beat Vontobel of
MeteoNews AG was interesting from a theoretical point of view, though
not particularly practical much of the time. In application, the most
common purpose of views is to abstract a large or complex SQL query
into the database, so it can be accessed by a single, simple query. It
can also be used to improve access control to a database, making only
some parts of a table visible to certain users. MeteoNews uses this to
calculate weather information from entered data. He also discussed
using views in MySQL in such a way as for the language to behave as a
declarative programming language. He created a program which behaves
as a decision tree. You query a question and enter "Yes" or "No" into
an answer table. The views automatically update so the question table
displays the next question in the sequence. Another example is that
with a "parent_of" table, you could create a "sibling_of" table.
However, in general, this has scalability and efficiency issues. There
are also issues with an inability to loop.

"MySQL Server Roadmap" - This session described the future path of
the MySQL server development. The main new feature in the next version
of MySQL (version 6.0) will be the Falcon transactional storage
engine. Some of the features being targetted for later releases
include online backups, improvements in subquery optimization, and
support for foreign keys.

Thursday

The first keynote on Thursday was given by Rasmus Lerdorf, the
creator of PHP. The notes for the talk are available at
http://talks.php.net/show/mysql07key/ . Rasmus discussed the history
of PHP, and how it has grown from a few small functions to make his
web design easier to a full, object-oriented language with dozens of
libraries. Rasmus discussed four reasons that people work on
open-source projects: Self-interest, Self-expression, Hormones, and
Improving the World. He suggested the defining characteristic of Web
2.0 is that having more users improves the site. He also discussed the
security issues present in virtually all pages on the internet.

The next keynote was by Paul Tuckfield, DBA at Youtube. He discussed
in brief what actions Youtube has taken to scale their rapidly growing
database. Given its presence online today, it's hard to realize that
Youtube is barely 18 months old.

Some of the other sessions I attended on Thursday were:

"Ruby on Rails and the ActiveRecord ORM" - How Ruby on Rails works
with a database. Much of the material covered was basic information
that is covered earlier in the report. In general, Ruby on Rails works
well for the "expected" behavior, but this comes at a significant
disadvantage for when you are doing unexpected behavior. If you are
executing complicated queries, you will be forced to revert to plain
SQL, which neutralizes many of the advantages of Rails. Furthermore,
some of the conventions such as "pluralization of class name to make
the table name" are filled with potential for errors, especially with
programmers from different cultures.

"Managing MySQL the Slack Way" - This talk was about how Google
manages its servers. In general, the basic principle behind Google
hardware is "Assume the hardware will fail". The software they use to
build new machines is called Slack. It basically works by assigning a
machine a list of roles, such as database slave, LDAP server, etc.,
and installing software based on that list. Changing a machine's roles
is generally accomplished by rebuilding the machine from scratch. They
also discussed how they set up database slaves to start replicating a
database master server. The basic approach used is to temporarily stop
one of the other slaves, sending the database files across the network
with SCP or netcat, and cleaning up the database for the new machine
name and other issues.

"Managing Hierarchical Data in MySQL" - This talk discussed three
ways to store hierarchical data in MySQL. Hierarchical data is data in
the form of a tree, where each piece of data except the root has a
parent. The adjacency list model is the simplest, with each node
containing a cell linking to the parent. The lack of a "CONNECT BY
PRIOR" ability in MySQL makes using this model inconvenient. The
nested set model is less intuitive, but is very quick for doing
aggregate data when linking against it. However, it is very slow for
inserting or removing elements. The materialized path model can do
some queries easier by storing the entire path from the root with each
node, but performs slower for other queries.

"High Performance Data Warehousing with MySQL: Tricks and Tips from
the Field" - This talk was about optimization of performance for
large-scale data warehouses. Some of the topics addressed included
selecting the proper database engine for your expected usage,
conducting backups in an intelligent manner, and configuring the
database settings. One of the most important portions of the talk was
discussing the proper engine choice. As different engines are
optimized for different tasks, this can have a significant performance
effect.

The basic engines are the MyISAM and InnoDB engines that are
developed with MySQL. MyISAM is the default engine and is used for the
MySQL files. InnoDB performs well for applications with lots of
streaming data. Other, specialized engines are packaged with MySQL as
well, such as the MEMORY engine (stores data only in RAM), the CSV
engine (stores data in a CSV file), the ARCHIVE engine (can only
insert and select, but more compact), and the BLACKHOLE engine (does
not store data at all). In addition, there are proprietary storage
engines available. These include Nitro, which allows for very high
insert rates, InfoBright, which performs very good compression and
aggregation, and SolidDB, which supports higher numbers of concurrent
users and inserts.

The closing keynote was by Pasha Sadri of Yahoo! on Yahoo Pipes.
Yahoo Pipes is basically a fancy RSS and web services reader; it can
take multiple input sources of data and run operations on them to
filter or transform the input. One application of the system was as a
translation plugin for Second Life.

Conclusions:

It was a relatively successful conference. I learned a lot and got to
meet people interested in MySQL. Some specific conclusions I came to
regarding the future of MySQL and open source development as they
relate to decision support systems are:

1) MySQL is being used by many of the largest websites today.
However, the needs of a website are different than the needs of a
large data warehouse. Most of the large websites at the MySQL
conference use a relatively similar architecture, with a large number
of similarly configured machines. The cost savings of an open source
solution are greatest in this situation. Most data warehouses involve
only a few machines, and thus a licensing cost is less important than
ease of configuration, usage, and support.

2) MySQL is tailoring its future development towards the enterprise
market. Many "enterprise-class" features have been added in recent
versions, and others are currently in development. There are rumors
that MySQL will have an IPO at some point in the near future. MySQL
will clearly be a significant database player in the next few years,
and will become increasingly suited for large-scale enterprise
installations.

3) There are no real barriers to entry for a small open-source
project. There are dozens, if not hundreds of PHP frameworks that can
work with MySQL to design a database-backed website. However, many of
these are supported by only a few people, are poorly documented, and
may have bugs or other performance issue. The key to a successful open
source project is developing a community.

4) Marten Mickos' description of the open source model as
"disruptive" is revealing for looking at how these projects can be
successful in a commercial environment. There are a large number of
hidden costs associated with keeping source code proprietary. These
include interoperability issues, knowledge-sharing issues, etc. The
majority of the value associated with software is not the code itself,
it is in guaranteeing the software will perform in a certain way and
in supporting installation and usage.

References

MySQL Staff, "Cox Communications powers massive data warehouse with
MySQL", March 23, 2007 at URL DSSResources.COM.

Power, D., "How will Web 2.0 impact design and development of
decision support systems?" DSS News, April 22, 2007.

Power, D., "What software tools should be used to build Web-based
DSS?" DSS News, November 5, 2006.

************************************************************

       Sign the new guestbook at DSSResources.COM

************************************************************

DSS Conferences

1. ISCRAM 2007, May 13-16, 2007 Delft, The Netherlands. 
Check http://www.iscram.org .

2. MWAIS data warehousing workshop with Ron Swift, Friday 
morning, May 18, 2007, check http://mis.uis.edu/MWAIS2007/ .

3. Crystal Ball User Conference, May 21-23, 2007 Denver. 
Check http://www.crystalball.com/cbuc/index.html.

4. AMCIS 2007, Americas Conference on Information Systems,
Keystone, CO USA, August 9-12, 2007. SIG DSS mini-tracks.
Check http://www.biz.colostate.edu/amcis07/ .

5. DaWaK 2007, 9th International Conference on Data
Warehousing and Knowledge Discovery, Regensburg, Germany,
September 3-7, 2007. Check http://www.dexa.org/ .

************************************************************ 

         Purchase Dan Power's DSS FAQ book 
   83 frequently asked questions about computerized DSS 
    http://dssresources.com/dssbookstore/power2005.html 

************************************************************

DSS News Releases - April 23 to May 3, 2007
Read them at DSSResources.COM and search the DSS News Archive

05/03/2007 CDC's new influenza tracking prototype honored with
SAS Enterprise Intelligence Award.

04/30/2007 Microsoft lights way for next-generation web development
and design at MIX07.

04/30/2007 New release of Crystal Ball gives users Microsoft Office
2007 compatibility and scatter charts.

04/30/2007 ILOG marks 20th anniversary by ringing NASDAQ opening
bell.

04/30/2007 Norwich Union selects Teradata CRM to support customer
value management program.

04/27/2007 Informatica World to showcase special keynote from NYPD's
Executive Officer, Office of Information Technology.

04/26/2007 The Conference Board holds its first 'virtual' meeting
inside second life.

04/26/2007 Kaidara's knowledge management software selected by
Alcatel-Lucent for its enterprise customer support center.

04/25/2007 MySQL AB and IBM announce open source database support for
the IBM System i platform.

04/25/2007 LeanLogistics wins 4th consecutive honor as top IT
Logistics provider.

04/25/2007 MicroStrategy recognizes three German companies for their
business intelligence achievements.

04/24/2007 CEO of MySQL AB celebrates “The Disruptive Spirit of
the Dolphin”.

04/24/2007 SAP unveils road map for next wave of SAP NetWeaver
innovations.

04/24/2007 HP unveils Neoview data warehouse and new services for
enhanced business intelligence.

04/24/2007 New study reveals password vulnerabilities and concerns:
sharing and abuse are common and can be disastrous.

04/24/2007 Southwest Airlines selects Clarity Systems to optimize
corporate performance management.

04/23/2007 Hyperion System 9 software continues to drive BPM
standardization through innovation, market leadership, and customer
adoption.

************************************************************

 Please tell your DSS friends about DSSResources.COM

************************************************************

DSS News is copyrighted (c) 2007 by D. J. Power. Please send your
questions to daniel.power@dssresources.com

DSS Home |  About Us |  Contact Us |  Site Index |  Subscribe | What's New
Please Tell 
Your Friends about DSSResources.COM Copyright © 1995-2021 by D. J. Power (see his home page). DSSResources.COMsm was maintained by Daniel J. Power. See disclaimer and privacy statement.