Columbia University’s Data Science Institute Graduate Programs

New York, January 15, 2015 -- Master of Science in Data Science. The MS in Data Science is a 30-credit program that offers students in-depth training in data science and the opportunity to work closely with diverse Columbia faculty members as well as industry affiliates. The interdisciplinary curriculum gives students the flexibility to focus on their particular interests and skill sets through optional elective tracks, which include the Institute’s six centers and the entrepreneurship track. The program culminates in a semester-length capstone design project that integrates the training in an experience with practical, real-world applications. Part-time and full-time options are available. Video at

Certification of Professional Achievement. The Certification of Professional Achievement in Data Sciences is a 12-credit non-degree program jointly offered through the Fu Foundation School of Engineering and Applied Science and the Graduate School of Arts and Sciences. The foundational data science skills acquired through four courses give individuals seeking continuing education the opportunity either to strengthen their career prospects in environments where data science skills are valued or to embark on a new career trajectory that takes advantage of the growing demand for a workforce with data science skills and knowledge. Students may complete the program in as little as two semesters of part-time study.


Candidates for the Master of Science in Data Science are required to complete a minimum of 30 credits, including 21 credits of required/core courses and 9 credits of electives. This program may be pursued on a part-time or full-time basis.


3 pts. Professor Rahul Mazumder.
Prerequisites: MATH V1101 Calculus I and V1102 Calculus II or the equivalent.
A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes' rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov's inequality.

3 pts. Professor Eleni Drinea.
Prerequisites: basic knowledge of programming (e.g., at the level of COMS W1007) and a basic grounding in calculus and linear algebra.
Methods for organizing data, e.g., hashing, trees, queues, lists, and priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating point arithmetic and stability of numerical algorithms. Eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton, and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.

3 pts. Professor Emanuel Ben-David.
Prerequisites: Working knowledge of calculus and linear algebra (vectors and matrices), and STAT W4105 Probability or equivalent.
In this course, we will systematically cover the fundamentals of statistical inference and testing and give an introduction to statistical modeling. The first half of the course will focus on inference and testing, covering topics such as maximum likelihood estimates, hypothesis testing, the likelihood ratio test, Bayesian inference, etc. The second half of the course will introduce statistical modeling through introductory lectures on linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. Throughout the course, real-data examples will be used in lecture discussion and homework problems. This course lays the foundation for other courses in machine learning, data mining, and visualization taken by MS in Data Science students.

3 pts. Professor Simha Sethumadhavan.
Prerequisites: Background in Computer System Organization and good working knowledge of C/C++. Corequisites: CSOR W4246 Algorithms for Data Science, STAT W4105 Probability, or equivalent as approved by faculty advisor.
An introduction to computer architecture and distributed systems with an emphasis on warehouse-scale computing systems. Topics will include fundamental tradeoffs in computer systems; hardware and software techniques for exploiting instruction-level, data-level, and task-level parallelism; scheduling; caching; prefetching; network and memory architecture; latency and throughput optimizations; specialization; and an introduction to programming data center computers.

3 pts. Professor John Paisley.
Prerequisites: Background in linear algebra and probability and statistics.
An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. Part of the course will focus on methods and problems relevant to big data.

3 pts. Professor Michael Malecki.
Prerequisite: programming.
Fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.

3 pts. Members of the faculty.
Prerequisites: CSOR W4246 Algorithms for Data Science, STAT W4105 Probability, COMS W4121 Computer Systems for Data Science, or equivalent as approved by faculty advisor. Corequisites (to be completed alongside or after): STAT W4702 Statistical Inference and Modeling, COMS W4721 Machine Learning for Data Science, STAT W4701 Exploratory Data Analysis and Visualization, or equivalent as approved by faculty advisor.
This course provides a unique opportunity for students in the M.S. in Data Science program to apply their knowledge of the foundations, theory, and methods of data science to address data science problems in industry, government, and the non-profit sector. The course activities focus on a semester-length data science project sponsored by a faculty member or local organization. The project synthesizes the statistical, computational, and engineering challenges and the social issues involved in solving complex real-world problems.


Nine (9) credits of elective courses should be drawn from existing graduate-level courses at Columbia University. In addition to advisor approval, elective course selection will be subject to course prerequisites, course availability, and the cross-registration procedures of the school/department offering the requested courses.


If you would like to learn more, or if you still have questions about the admissions application process or the academic opportunities offered through the Data Science Institute, please refer to our Frequently Asked Questions or sign up for one of our regularly scheduled online information sessions.
