Data Analytics in the Undergraduate Curriculum
By David Bressoud @dbressoud
The National Academies will be holding a Roundtable on Data Science Postsecondary Education: Motivating Data Science Education through Social Good on December 10, 2018.
If I had to choose the most common job title for students who have graduated from Macalester with a degree in Mathematics, it would be analyst. Our graduates seldom wind up in jobs where they have to find derivatives or integrals, solve differential equations, or even find eigenvalues. Instead, they are almost always working with and trying to make sense of the data that can inform and shape business decisions. The habits of mind intrinsic to mathematics have generally prepared them for this role. But as the data available to business and industry has exploded in quantity and complexity, there is a growing need for graduates familiar with the increasingly sophisticated tools available for its analysis. The challenge to our colleges and universities is to provide the education that will equip graduates to become the data analysts that we need today and for the future.
In response to this need, the National Academies have produced a report, Data Science for Undergraduates: Opportunities and Options, that provides a framework for building an undergraduate program in data science. Reflecting the necessarily interdisciplinary nature of such a program, the report is the joint work of the National Research Council’s Computer Science and Telecommunications Board, Board on Mathematical Sciences and Analytics, Committee on Applied and Theoretical Statistics, and Board on Science Education. The official rollout of the report is December 10, 2018, at the roundtable described at the top of this column.
The need is immense. The report references an estimate that by 2020 the U.S. will have positions for 2.7 million data analysts (p. 1-2). Meeting this need is frustrated by many obstacles, not least of which is the fact that few students understand what data science means or entails. Data analysis is also necessarily highly interdisciplinary, requiring new undergraduate programs that draw on expertise in computer science, mathematics, and statistics. As the report forcefully states, no single one of these fields adequately covers the core concepts of data science. It can only be taught as an interdisciplinary program. The breadth that is needed is reflected in this passage from the report:
Building on the work of De Veaux et al. (2017), the committee puts forth the following key concept areas for data science: mathematical foundations, computational foundations, statistical foundations, data management and curation, data description and visualization, data modeling and assessment, workflow and reproducibility, communication and teamwork, domain-specific considerations, and ethical problem solving. (p. 2-7)
The report goes into a detailed exploration of the necessary contributions from each of these concept areas.
It also briefly describes programs for majors in data science at the University of Michigan, Smith College, Virginia Tech, UC San Diego, University of Rochester, MIT, UC Irvine, and the NYU School of Professional Studies, programs that are variously housed within a business school, a department of mathematics or statistics, or a computer science department. The report describes a variety of data science minors and highlights the need to provide a basic understanding of data science for all undergraduates.
Macalester College has its own minor in data science. We are particularly well situated for such a program since we have a single department of Mathematics, Statistics, and Computer Science. This is a department that is strong in all three areas and has a long history of cooperation among these disciplines, including several cross-disciplinary faculty hires.
Our data science program begins with Introduction to Data Science, a course on the handling, analysis, and interpretation of big data sets that is intended to be accessible to all students. Students minoring in data science need two computer science courses, which could include our junior-level course in Database Management Systems. They also take Introduction to Statistical Modeling plus a course in Machine Learning, Survival Analysis, or Bayesian Statistics, and two courses in a single domain such as bioinformatics that provide an opportunity for the application of data science methods. A complete description of Macalester’s data science minor is available online.
Most math departments lost their faculty who worked in computer science decades ago. Statistics has long been a separate department at many universities. Far too often applied mathematics has been spun off, leaving a department that is increasingly insular, isolated from some of the most important developments in the mathematical sciences today. Separate departments are not necessarily a bad idea provided they are able to work collaboratively and share the work that transcends existing boundaries. If they are to serve their students, today’s departments of mathematics must be engaged in the process of shaping and delivering programs in data science.
Read Bressoud’s Launchings archive.
References
De Veaux, R., M. Agarwal, M. Averett, B.S. Baumer, A. Bray, T.C. Bressoud, L. Bryant, et al. 2017. Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application 4:2.1-2.6.
National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. doi.org/10.17226/25104.
National Academies of Sciences, Engineering, and Medicine. 2018. Roundtable on Data Science Postsecondary Education: Motivating Data Science Education through Social Good. www.eventbrite.com/e/motivating-data-science-education-through-social-good-registration-51307330607