4
Starting a Data Science Program
Starting a new data science program is challenging, to say the least. As with any new academic
program, a curriculum needs to be determined, resources and faculty need to be found, and some means
of assessment needs to be implemented. However, data science programs pose particular challenges
owing to their interdisciplinary nature, the broad set of topics they encompass, and the acquisition of data
and large-scale computational infrastructure they require.
Thus, launching a new undergraduate program in data science may be a significant undertaking in
many institutions. Administrators and program developers will face myriad decisions. Should a new
department be created to support this program? Or should existing departments take on the challenge,
either alone or in collaboration with other departments? What content and level of high school knowledge will be
useful or required of students entering the data science program? How will data science be integrated into
the curriculum? Should it be included at the very beginning of a student’s coursework, after some
prerequisite coursework, or as a capstone? Will it be a major, a minor, a general education requirement,
or all of the above? Which mathematics, statistics, and computer science courses should be required of
data science students? Should these be taught as separate courses, or should the content be integrated?
How might institutions appropriately use the collections of online resources available and “downscale”
these to appropriate levels if they are focused on more advanced training? Institutions developing
undergraduate programs will also need to consider how ethics and communication will be included in the
curriculum, as well as how to ensure that the program is accessible to students from varied backgrounds.
New data science programs require resources, broad discussions with faculty and leadership
across the institution, and perhaps approval through formal bodies. The backing of the administration as
well as broad support from multiple departments is typically necessary, and attention to the costs and
funding model from the outset can greatly increase the chance of success.
As institutions examine how best to provide data science education to their students, one solution
may be to reconstitute, combine, or reenvision already existing curricula. Much of the research on how
best to teach science, technology, engineering, and mathematics (STEM) concepts will be readily
applicable. (See the discussion of data acumen attributes in Chapter 2 of this report for examples of
introductory and advanced concepts.) While some coursework could be swapped into a data science
program immediately, building a coherent program will likely take more forethought and planning to address
the learning outcomes and content knowledge that data science students need. In a number of
programs (see Chapter 3), the first official data science offering is a brand-new class, meant to serve as a
rich introduction to what it means to practice data science. In institutions with less funding or expertise
for course development, the need to get a program up and running may push toward more borrowing of
content if not whole courses. However, a strong data science program is likely to need eventually to move
beyond “patching together” a curriculum or class.
PREPUBLICATION COPY: SUBJECT TO FURTHER EDITORIAL CORRECTION
In this section, the committee describes the key challenges that academic institutions will face as
they set up a program. But this section begins with an opportunity: an important element of program
design is ensuring that the program is welcoming and inclusive to all students, regardless of their identity-
related characteristics or educational background and attainment.
ENSURING BROAD PARTICIPATION
According to the South Big Data Innovation Hub’s Keeping Data Science Broad, “the variety of
perspectives such diversity [in terms of race, gender, religious affiliation, socioeconomic status, ethnicity,
and first-generation status] provides is as essential as that provided by the transdisciplinary nature of data
science for innovation and growth of the field” (Rawlings-Goss, 2018, p. 29). The report explains that the
first step in creating a more inclusive environment is to ensure that students and faculty alike, at all types
of educational institutions, have equitable access to resources (e.g., high-quality data, tools, technology,
adaptable and appropriate curriculum, and advisors). Also crucial to retaining broad participation in data
science were a “culturally relevant curriculum,” a more diverse faculty, and collaborations between
majority-serving and minority-serving institutions (Rawlings-Goss, 2018, p. 31).
Thus, it is the responsibility of academic institutions to ensure inclusion and broad participation
and engagement in data science programs. Master (2017) suggests that data science programs at higher
education institutions increase exposure to data science fields, broaden beliefs about who belongs in these
fields, challenge students’ beliefs about fixed abilities, and show that data science can make a difference
in society in order to broaden participation and engagement in data science. Williams (2017) suggests that
faculty adjust curriculum to be more inclusive, create opportunities for students to engage with community
data, affirm student ability, and create diverse teams of students. The efforts highlighted by Master and
Williams not only lead to increased engagement but also stand to sustain participation of
underrepresented populations in data science. If data science is to avoid a decrease in participation similar
to the one that occurred among female students in computer science in the 1980s, it is imperative that
underrepresented students be supported both academically and through mentorship, in recognition of the
opportunities that the field of data science presents and the value they can add to it.
Some of the introductory data science courses described in this report have made inclusion and
broad participation a central goal, shaping pedagogy, technical infrastructure, and staffing. Some notable
steps include the following:
• Designing the material to avoid the need for mathematics, statistics, or programming
prerequisites beyond that required for entry to the academic institution, thereby avoiding
demographic skews that such prerequisites might induce.
• Using a computing infrastructure that does not rely on personal laptops or access to computer
labs; possibly hosting the infrastructure entirely in the cloud so that it can be accessed
through a web browser.
• Providing teams of laboratory assistants and tutors to give additional support for students
needing assistance.
• Choosing project topics carefully to be of broadest interest and to raise awareness of social
issues.
• Operating a cohort-based “data scholars” program¹ in concert with the instructional program
to address issues of underrepresentation.
Additionally, the huge opportunity of data science to be a gateway to STEM careers should be
emphasized. The wide range of applications of data science to multiple fields, including humanities,
social sciences, and the arts, expands the reach of STEM into society. Couching data science in terms of a
life skill and a cultural pursuit can help reshape the image of science and increase the number of students
interested in STEM fields. Therefore, use cases should be drawn not just from other STEM or scientific
disciplines; they should also be drawn heavily from the arts, humanities, social sciences, and popular
culture to attract new entrants to the field.
As many data science programs are being freshly created, ample opportunity exists to build broad
participation from the beginning. Although no single action is a panacea, data science
programs can take several actions to broaden participation. The Joint Working Group on Improving Underrepresented
Minority Persistence in STEM offered the following recommendations for broadening participation of
underrepresented minorities in STEM programs (Estrada et al., 2016):
• Track and increase awareness of institutional progress toward diversifying STEM;
• Create strategic partnerships with programs that create lift;
• Unleash the power of the curriculum and active learning;
• Address student resource disparities; and
• Stimulate students’ creativity.
The Joint Working Group points out that there are many programs that have been successful in
attracting and retaining underrepresented students in STEM disciplines. For example, in 2017, over half
of the computer science graduates from Harvey Mudd College were women (Williams, 2017). Harvey
Mudd has succeeded in attracting and retaining underrepresented students in part owing to its
commitment to fostering a “growth mindset” in these students (Dweck, 2006). Harvey Mudd faculty teach
problem solving using real-world examples, offer four unique styles of one introductory computer science
course based on student knowledge and interest, require students to work together in completing
homework assignments, and actively encourage students to enroll in a subsequent computer science
course (Williams, 2017). To engage potential future students in STEM, Harvey Mudd also hosts a
conference at which African American and Hispanic middle and high school girls have the opportunity to
build partnerships with professional women practicing in STEM fields. Data science programs may
benefit from looking to these and other successful STEM initiatives as models for attracting and retaining
underrepresented students.
The Joint Working Group also suggests that such programs are most effective when coupled with
assessment and evaluation data that “clearly show the amount of progress or disparity that exists at the
institutional level” (Estrada et al., 2016, p. 8). Another way to increase participation is to avoid filter or
gate-keeping courses (especially early in the program) and replace them with courses that entice student
participation by heightening the excitement and applicability of data science. It may also behoove
data science programs to consider which faculty are teaching first-year or introductory data science
courses, ensuring that these faculty members can connect with and engage students.
Data science programs also need to embrace multiple entrance points into the discipline: think of
the metaphor of a “watershed” in which students from a variety of educational backgrounds and fields can
enter, rather than a “pipeline” from one or more particular fields into data science. Varma (2006) agrees
that the notion of the pipeline does not extend far enough, as underrepresented minority students face
heightened entry and retention barriers. In combination with the recommendations from the Joint
Working Group, increasing teaching assistant training, awareness among advising staff, communication
between students and faculty, and partnerships with high school teachers could help postsecondary data
science programs retain a more diverse student body (Varma, 2006). Curricular options such as minors or
data science add-ons to substantive disciplines are two possible ways to open up data science
enrollments. In addition to focusing on programs in science fields and broadening participation there,
institutions could target programs in popular areas such as music or communications for outreach.
¹ The website for the University of California, Berkeley, Data Scholars Program is
https://data.berkeley.edu/education/data-scholars, accessed February 20, 2018.
Finding 4.1: The nature of data science is such that it offers multiple pathways for students of
different backgrounds to engage at levels ranging from basic to expert.
Finding 4.2: Data science would particularly benefit from broad participation by
underrepresented minorities because of the many applications to problems of interest to diverse
populations.
Recommendation 4.1: As data science programs develop, they should focus on attracting
students with varied backgrounds and degrees of preparation and preparing them for
success in a variety of careers.
ACADEMIC INFRASTRUCTURE
The popularity of data science courses and programs will affect academic infrastructure in several
ways, notably in terms of who will “own” the program and how it will be delivered. Faculty and
administrators will need to examine how the goals of data science education align with the institution’s
current infrastructure. What departments should be involved? What colleges? Because data science
intersects with mathematics, statistics, computer science, and other domains, institutions need to consider
whether data science needs to become a stand-alone department or be integrated with other departments.
Administrators will need to consider ways to motivate departments to work with one another across
disciplines, and department chairs will need to consider ways to motivate their faculty to participate in the
implementation of innovative new curricula, whether or not they are in the “home” department. This
holistic approach toward data science education is crucial, particularly given the interdisciplinary nature
of the field of data science.
Furthermore, a new data science program at the undergraduate
level needs to involve the collaboration of several disciplines and programs. However, few instructors are
likely to be available who are equally able to teach classes in the full complement of fields. Initially, at
least, creative ways of involving faculty from multiple departments are likely to be necessary, so that they
can learn from each other and so that students get the broad view of data science that the committee
envisions.
However, cross-departmental or institutional collaboration to develop data science programs may
prove easier in theory than in practice. In some colleges and universities, academic tribalism and the
increased importance of tuition generation might impede these programs from being truly
interdisciplinary. Thus, the flexibility to hire or train faculty in the multiple aspects of data science will be
necessary to ensure that all programs still achieve their educational goals.
As one example, consider Virginia Tech’s solution to the organizational model. Virginia Tech
offers a major in computational modeling and data analytics. The departments that host the major (i.e.,
computer science, statistics, and mathematics) span two colleges (i.e., the College of Engineering and the
College of Science), making interdisciplinary communication and cooperation extremely important. To
foster productive collaboration among and within its five interdisciplinary programs, the College of
Science set up the Academy of Integrated Science, which is a department-level organizational structure
that helps interdisciplinary programs by managing budgets, undergraduate advising, student recruitment,
and assessment. Having such a body in place allows faculty to focus solely on developing and delivering
curriculum to the students. The Academy of Integrated Science also develops a memorandum of
understanding for new faculty hires that establishes their roles in both their home departments and in the
interdisciplinary programs (Embree, 2017).
Such cross-departmental collaboration requires new mechanisms for both funding and
encouragement. Opportunities for a wide variety of faculty to participate in data science programs will
need to be created, as will incentives and rewards for those faculty teaching data science. Reward
systems more generally may need to adjust to place greater value on teaching more students, especially
when that means there will be greater diversity in their level of preparation. As data science begins to
enter conversations in many disciplines, educators and administrators will have to consider the roles of
the humanities, social sciences, and arts programs. There are also opportunities for developing programs
for students in non-STEM fields, although there are risks that these become “data science-lite” programs
that add limited marketable or intellectual value to students.
Several specific hurdles to launching and sustaining data science programs have been
encountered and to some extent overcome at various academic institutions. Some of these challenges are
associated with the growing pains of starting up any new program that is in high demand:
• Overcoming initial resistance. One of the first challenges prospective programs have to
overcome is initial resistance by established departments and programs to launching a new
program that is in intellectual proximity and competes for tuition dollars and other resources.
This is especially challenging for data science, as it has a large footprint across many
professional, scientific, and engineering disciplines.
• Recruiting and retaining faculty. Another important challenge is recruiting faculty to create
and teach integrative introductory courses in data science and to serve as advisors and
mentors for data science students. Departmentally centered tenure and promotion criteria may
lead junior faculty to be reluctant to devote much time to launching new programs. An
additional challenge has been retention of data science faculty in an economic environment
where faculty are increasingly lured away by industry.
• Developing curricula. It is often challenging to develop a consensus on a core curriculum that
best serves the various interests and backgrounds of data science students. In this era where
many of the existing data science-related courses are oversubscribed, other departments can
be reluctant to enroll data science students in their popular courses (e.g., machine learning,
data mining, natural language, applied statistics) because doing so may take seats away from
their own students.
• Providing physical space. To be most effective, data science programs need flexible physical
space to create the collaborative environment in which their students thrive. Such
well-situated space is often scarce.
• Facilitating interactive experiences. There is a lack of sustainable and scalable models for
capstone programs and similar experiential, integrative activities that have been shown by
the Association of American Colleges and Universities (2013) and others to be high-impact
educational practices.
• Encouraging industry partnerships. With high turnover in the industry workforce, colleges
are facing the challenge of building lasting industry relationships to keep education and
training well matched to the needs of the rapidly evolving data science workforce.
Additional infrastructure considerations include enrollment budgets, strategies to build a data
science major curriculum (i.e., prerequisites, introductory, advanced, applied, capstone), and ways to
align general education requirements with data science. Institutions will need to consider how to provide
and share resources for their varied data science experiences (e.g., textbooks, teaching materials, open
access, clearinghouse). Advising will also be important for the success of data science undergraduate
programs. Formal evaluation methods will need to be implemented to gauge the success of these
programs and improve them. The Moore-Sloan Data Science Environments (2018) have put forth some
suggestions on creating institutional change in data science, including establishing a neutral space for
students and faculty to gather, providing access to professional data scientists and research software
engineers who can assist and serve as role models, developing a data science consulting capability,
considering the scalability of data science educational initiatives, encouraging software and data openness
and reuse, and involving a wide range of people in data-intensive discovery.
Another challenge will be that many fields involved with data science are themselves
experiencing rapid change and evolution. As a consequence, data science curricula will also likely evolve
rapidly, and programs need to be ready and willing to adapt. This will undoubtedly lead to the same types
of questions that have been explored in computer science and other rapidly evolving fields in past years.
Last, it behooves institutions to consider the alternative pathways students might take into data science by
removing obstacles and barriers for students who want to change their concentration to data science
during the course of their studies or making it easier to add a data science minor. Overcoming these
challenges will require institutions to broker between competing interests, to recruit new faculty and staff
in data science, and to make strategic long-term investments to sustain the activity.
Finding 4.3: Institutional flexibility will involve the development of curricula that take advantage
of current course availability and will potentially be constrained by the availability of teaching
expertise. Whatever organizational or infrastructure model is adopted, incentives are needed to
encourage faculty participation and to overcome barriers.
Computational Infrastructure
A major driver of data science education has been the evolution of data and the infrastructure for
accessing it and analyzing it. Hands-on experience with the entire data science life cycle is an essential
part of the training and education of data science students, regardless of the educational modality. In
particular, students need to be taught how to handle large amounts of data and how to run scalable but
sophisticated analysis software on the data, often requiring distributed data storage, multicore
processing, and parallel computation. However, maintaining large complex data sets and
high-performance computing systems on college campuses strains the resources of educational institutions.
While several of the larger research universities retain high-performance computing and large server
facilities, most universities and colleges are in the process of transitioning their computing and storage to
cloud service providers that provide students reliable access to their data and the computational resources
to run algorithms against the data. Thus, the cloud has played and will continue to play an important role
in transforming data science education. A logical next step might be for colleges to band together to
federate these cloud resources under an “academic cloud.”² Such a federated academic cloud could
provide common platforms for students across the nation, facilitating data integration and analysis,
reducing costs to educational institutions, and balancing inequities in access to instructional resources.
² The National Science Foundation, for example, invested $20 million in academic cloud computing in 2014
(Boland, 2014).
Finding 4.4: The economics of developing programs has recently changed with the shift to
cloud-based approaches and platforms.
CURRICULUM
As discussed in Chapter 2 of this report, there is a progression of topics and skill sets that will
guide students to develop data acumen. Key concepts required to develop data acumen include
mathematical foundations, computational foundations, statistical foundations, data management and
curation, data description and visualization, data modeling and assessment, workflow and reproducibility,
communication, domain-specific considerations, and ethical problem solving. These skills then become
transferable into a range of data science positions in the workplace.
Each undergraduate modality discussed in Chapter 3 offers a unique pathway to various data
science careers. The degree to which each concept or skill is emphasized in each modality depends upon
the respective career trajectories. While a 4-year data science degree may be most appropriate for some
data scientists, a 2-year associate’s degree may be better suited for others. And while a boot camp may
help prepare a business professional to incorporate data analytics in the workplace, a data science minor
may offer valuable training for data-driven decision makers in a variety of fields. It is important to note
that, as the field of data science continues to evolve at a rapid pace, it will often be necessary to reevaluate
the types of careers utilizing data science as well as the data science skill sets necessary to achieve
success in those careers.
FACULTY RESOURCES
Mirroring the variety of pathways for data science education discussed in Chapter 3, there are a
number of ways in which data science courses may be taught. Some data science courses, owing to their
interdisciplinary nature, are taught either by a team of faculty or by two faculty with the appropriate areas
of expertise to cover multiple perspectives. Though this approach offers the most well-rounded
experience for students, it can be difficult to find the administrative support and additional resources
needed. It remains challenging to recruit appropriate new faculty to teach both introductory data science
courses and courses for the data science major or minor. Faculty need to have multiple experiences with
data science projects to develop the perspective to guide their students. These faculty need to be diverse
and have the ability to serve as role models for future data scientists, while also meeting competencies in
the practical data acumen areas discussed in the previous section and in Chapter 2 of this report. As the
field of data science expands, faculty are likely to be needed in an even broader range of competencies.
Considerations for current faculty are also necessary, as many will need to be retrained in new
data science methods and tools, both of which will continue to evolve rapidly in the coming years.
Faculty will also benefit from professional development in new teaching approaches to best meet the
needs, learning styles, and knowledge levels of future undergraduate students. Such training will be
especially useful for faculty teaching introductory classes composed of students with various academic
backgrounds and career interests. Funded by the National Science Foundation, Training a New
Generation of Statistics Educators³ is an example of a program that creates professional learning
communities whose members participate in workshops, mentorship programs, and national conferences,
all in an effort to increase their statistical content knowledge and improve their teaching. The current
³ The website for this National Science Foundation-supported project is
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1432251, accessed February 6, 2018.
Data science is emerging as a field that is revolutionizing science and industries alike. Work across nearly all domains is becoming more data driven, affecting both the jobs that are available and the skills that are required. As more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data. It is imperative that educators, administrators, and students begin today to consider how to best prepare for and keep pace with this data-driven era of tomorrow. Undergraduate teaching, in particular, offers a critical link in offering more data science exposure to students and expanding the supply of data science talent.

Data Science for Undergraduates: Opportunities and Options offers a vision for the emerging discipline of data science at the undergraduate level. This report outlines some considerations and approaches for academic institutions and others in the broader data science communities to help guide the ongoing transformation of this field.
