from DSSResources.com

Data Science Central member: Jean-Francois Puget, IBM distinguished engineer, solves big data conjecture

ISSAQUAH, Wash., Oct. 29, 2013 /PRNewswire/ -- A mathematical problem related to big data was solved by Jean-Francois Puget, engineer in the Solutions Analytics and Optimization group at IBM France. The problem was first mentioned on Data Science Central, and an award was offered to the first data scientist to solve it.

Bryan Gorman, Principal Physicist, Chief Scientist at Johns Hopkins University Applied Physics Laboratory, made a significant breakthrough in July, and won $500. Jean-Francois Puget completely solved the problem, independently from Bryan, and won a $1,000 award.

The competition was organized and financed by Data Science Central. Participants from around the world submitted a number of interesting approaches. The mathematical question was asked by Vincent Granville, a leading data scientist and co-founder at Data Science Central. Granville initially proposed a solution after performing large-scale Monte Carlo simulations, but his solution turned out to be wrong.

The problem was to find an exact formula for a new type of correlation and goodness-of-fit metric designed specifically for big data. The metric generalizes Spearman's rank coefficient and is especially robust for unbounded, ordinal data found in large data sets. From a mathematical point of view, the new metric is based on L-1 rather than L-2 theory: in other words, it relies on absolute rather than squared differences. Using squares (or higher powers) is what makes traditional metrics such as R squared notoriously sensitive to outliers, which is why savvy statistical modelers avoid them. In big data, outliers are plentiful and even extreme outliers are not rare. They can render conclusions from a statistical analysis invalid, so this is a critical issue. This outlier issue is sometimes referred to as the curse of big data.
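The difference between absolute and squared differences can be illustrated with a small sketch. It is not the metric from the competition, whose exact formula and normalization are not given here. It only compares the raw sum of absolute rank differences with the sum of squared rank differences used by the classical Spearman coefficient. The function names and the sample data are assumptions made for illustration, and ties are assumed not to occur.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def l1_rank_distance(x, y):
    """Sum of absolute rank differences (L-1, unnormalized)."""
    return sum(abs(a - b) for a, b in zip(ranks(x), ranks(y)))


def l2_rank_distance(x, y):
    """Sum of squared rank differences (L-2, unnormalized)."""
    return sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))


def spearman_rho(x, y):
    """Classical Spearman coefficient for data without ties."""
    n = len(x)
    return 1 - 6 * l2_rank_distance(x, y) / (n * (n ** 2 - 1))


# Hypothetical data: five observations shift by one rank,
# while the last one drops from rank 6 to rank 1.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 4, 5, 6, 1]

l1 = l1_rank_distance(x, y)  # 1 + 1 + 1 + 1 + 1 + 5
l2 = l2_rank_distance(x, y)  # 1 + 1 + 1 + 1 + 1 + 25

# Share of each total contributed by the single large displacement.
share_l1 = 5 / l1
share_l2 = 25 / l2
print(l1, l2, round(share_l1, 2), round(share_l2, 2))
```

In this example the single large displacement accounts for half of the L-1 total but about five sixths of the L-2 total, which is the sensitivity to outliers described above.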

Jean-Francois and Bryan both came up with a new approach: instead of running heavy computations, they used mathematical thinking and leveraged their expertise in mathematical optimization, permutation theory and combinatorics. And they succeeded. This shows that mathematical modeling can sometimes beat even the most powerful system of clustered computers, though usually the two work hand in hand.

Additional details can be found here: http://bit.ly/133S6ns.

Data Science Central is the industry's online resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends and industry job opportunities.

SOURCE Data Science Central


