from DSSResources.com

Schutt talks at Strata on Data Scientists

February 28, 2013 -- Rachel Schutt spoke at Strata yesterday about Next-Gen data science. The abstract of her presentation is:

Data Science is an emerging field in industry, yet not well-defined as an academic discipline (or even in industry for that matter). I proposed the “Introduction to Data Science” course at Columbia in March, 2012. This was the first course at Columbia that had the term “Data Science” in the title. I had three primary motivations:

1) Bringing industry to students: I wanted to give students an education in what it’s like to be a data scientist in industry and give them some of the skills data scientists have. This is based on my experience as a lead analyst on the Google+ Data Science team. But I didn’t want to limit them to only my way of seeing the world, so each week, guest speakers from the NYC tech community came to teach the class.

2) I wanted to think more deeply about the science of data science: Data Science has the potential to be a deep and profound research discipline impacting all aspects of our lives. Columbia University and Mayor Bloomberg announced the Institute for Data Sciences and Engineering in July, 2012. This course created an opportunity to develop the theory of Data Science and to formalize it as a legitimate science.

3) Personal Challenge: I kept hearing from data scientists in industry that you can’t teach data science in a classroom or university setting and I took that on as a challenge. I wanted to test the hypothesis that it was possible to train awesome data scientists in the classroom.

In February 2013, 2 months will have passed since the class ended. I’ll be able to reflect on how the class went, how I thought about the curriculum, how I engaged the NYC tech community to be involved in the class, who the students were, whether I had impact on them, etc.
Schutt argued, "Data science needs to be defined in a more deep way to merit the term 'science,'" and she noted that "getting more trained people into the profession will also help actually define the scope of data science in a natural and rigorous way."

According to Schutt, data scientists often do a great deal of exploratory data analysis to create data visualizations for reporting purposes. They spend a great deal of time using data to come up with unique business insights and metrics. They help companies make important data-driven decisions, and they tend to be skilled users of big data technologies like Hadoop, MapReduce, Hive and Pig. They often are hackers and usually boast proficiency in R, Python, C, Java and other programming languages.

Schutt compiled two lists in an effort to zero in on some of the characteristics she thinks are shared by successful data scientists. She presented those lists as part of her talk at Strata.

The first one is a rundown of the common credentials and traits of the data scientists she considered. What she found is that many hold doctorates (PhDs), although that degree isn't a requirement for the job, and that their fields of study were often quantitative subjects, such as statistics or math. In addition, she said, they have an innate ability to code and learn programming languages, and they have proven problem-solving skills. Ironically, she noted, one thing they don't necessarily have is the phrase "data scientist" in their job titles.

Schutt's second list describes what she called the common "habits of mind" of effective data scientists. For example, she said, they tend to be very persistent people who don't like to give up when faced with challenges. They also are flexible thinkers and are prone to asking questions. And they're the type of people who strive for accuracy, clarity and precision in their thinking and in how they communicate, she added.
Successful data scientists also are adept at applying past knowledge to new situations, Schutt said. They take calculated risks in their analytics work, they're imaginative and they like to innovate. In addition, they think independently and believe in continuous learning. But they have a lighter side, too: She said they also tend to find the humor in things and to be good listeners who are empathic to the needs of others.

Schutt's class blog is at http://columbiadatascience.com/blog/

About Schutt

Dr. Rachel Schutt is a Senior Research Scientist at Johnson Research Labs. Prior to that, she was a Senior Statistician at Google Research in the New York office. She is also an Adjunct Assistant Professor in Columbia’s Statistics Department, and is a founding member of the Education Committee for the Institute for Data Sciences and Engineering at Columbia. Rachel is co-authoring a book (with Cathy O’Neil) called “Doing Data Science” to be published by O’Reilly in 2013. Her interests include statistical modeling, exploratory data analysis, machine learning algorithms, and social networks, as well as the ethical dimensions of Data Science, and using Data Science to do good. She holds several pending patents. She is a frequent speaker at conferences and universities. She earned her PhD in Statistics from Columbia University, and master's degrees in Mathematics and Engineering from the Courant Institute (NYU) and Stanford University, respectively. Her undergraduate degree is in Honors Mathematics from the University of Michigan.