Goal

If we understand how students make choices on their way
through college, we can improve decision support for these
important pathways. Courses chosen today determine which
other courses remain options down the road, and which stay
closed for lack of prerequisite knowledge. Overly narrow
course choices leave on the table important contributions
that college can make to students' lives. A first goal is to
understand what Stanford students have chosen over the past
18 years. That understanding can inform both future
students and university policy.

We will use historic enrollment data, and possibly other
data. Vector embeddings derived from courses taken by past
students are already available, and exploring the
corresponding clusters is a first step. Beyond that, we plan
to generate a network of course sequences and their
frequencies, and then apply network analytics to these
structures. Our hope is to find course-taking patterns, and
instances of unusual, innovative course choice behavior.
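
As a rough illustration of the network of course sequences described above, the sketch below counts how often one course is followed by another in the next term. The data layout (one list of per-term course sets per student), the function name, and the course labels are all assumptions made for the example, not the project's actual data format.

```python
from collections import Counter
from itertools import product


def transition_counts(histories):
    """Count directed edges (course -> course taken in the following term).

    `histories` maps a student id to a chronologically ordered list of
    terms, where each term is the set of course labels taken that term.
    """
    edges = Counter()
    for terms in histories.values():
        for earlier, later in zip(terms, terms[1:]):
            for a, b in product(sorted(earlier), sorted(later)):
                edges[(a, b)] += 1
    return edges


# Made-up histories with placeholder course labels, for illustration only.
histories = {
    "s1": [{"INTRO"}, {"DATA", "MATH"}, {"SYSTEMS"}],
    "s2": [{"INTRO"}, {"DATA"}, {"SYSTEMS", "THEORY"}],
    "s3": [{"MATH"}, {"INTRO"}, {"DATA"}],
}

edges = transition_counts(histories)
# Edges seen only once are candidates for unusual course sequences.
rare = [edge for edge, n in edges.items() if n == 1]
```

The resulting weighted edge list could be loaded into any graph library for the network analytics step. On real data, the rarity cutoff would need to be chosen relative to the number of students.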

2. Discover Majors' Degree Requirements

Prerequisites: Any neural networks
course.

Goal

The University has no central place where the requirements
for obtaining a Bachelor's or Master's degree are described
in a unified form. Every department decides on its own
palette of options. Each organization chooses how to
represent the requirement alternatives in writing. Finding
and parsing these documents would be tedious. But thousands
of students have fulfilled those requirements over the
years. We will try to derive the requirements from the
history of course taking.

We will use analytic tools, such as neural networks, to
derive every department's Master's degree requirements at
Stanford, and develop a unified form to describe them.
Using eighteen years of enrollment history, and the majors
of the respective students, we hope to learn the sometimes
widely branching alternative paths to the degrees. We will
also attempt to describe the undergraduate requirements
from observed data, and express them in the form we develop.
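
Before any neural network is involved, a simple frequency baseline can give a first guess at requirements. The sketch below is such a baseline, not the method the project will use: it lists, per degree program, the courses taken by at least a given share of graduates. The record layout, the program names, the course labels, and the threshold are assumptions for illustration.

```python
from collections import Counter, defaultdict


def candidate_requirements(records, threshold=0.9):
    """Per program, list courses taken by at least `threshold` of graduates.

    `records` is a list of (program, courses) pairs, one per graduate.
    """
    graduates = Counter()
    takers = defaultdict(Counter)
    for program, courses in records:
        graduates[program] += 1
        takers[program].update(set(courses))  # count each course once per student
    return {
        program: sorted(
            course
            for course, n in takers[program].items()
            if n / graduates[program] >= threshold
        )
        for program in graduates
    }


# Made-up graduates with placeholder program and course labels.
records = [
    ("PROG-A", ["A1", "A2", "A3"]),
    ("PROG-A", ["A1", "A4", "A3"]),
    ("PROG-A", ["A1", "A2", "A3", "A5"]),
    ("PROG-B", ["B1", "B2"]),
    ("PROG-B", ["B1", "B3"]),
]

strict = candidate_requirements(records, threshold=1.0)
loose = candidate_requirements(records, threshold=0.6)
```

A baseline like this only finds courses that nearly everyone takes. It does not capture branching alternatives of the "one of A or B" kind, which is where a learned model would have to do better.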

[Figure: Example descriptions of Master's degrees in
Computer Science and English.]

3. Breadth of Student Interests Over Time

Prerequisites: Linear algebra and
basic statistics.

Goal

Most universities encourage students to take advantage of
courses offered outside of students' focus of study. Policy
changes over time have attempted to encourage breadth of
study, particularly for undergraduates. Has the breadth of
interest changed among students during the past eighteen
years? If so, let's understand what may have triggered those
changes.

We will use vector embeddings of course choices to compute
the intellectual spread of student choices. Student choices
are motivated by requirements and background. Given these
embeddings, we will analyze how the resulting per-year
distributions of spread have changed during the
past n years.
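
One possible definition of spread, shown here only as a sketch, is the mean distance of a student's course vectors from their centroid. The project may well settle on a different measure. The two-dimensional vectors and the years below are made up for illustration; real embeddings would have many more dimensions.

```python
import math
from statistics import mean


def spread(vectors):
    """Mean Euclidean distance of course vectors from their centroid."""
    dims = len(vectors[0])
    centroid = [mean(v[d] for v in vectors) for d in range(dims)]
    return mean(math.dist(v, centroid) for v in vectors)


def yearly_spread(students):
    """Average spread per year; `students` is a list of (year, vectors) pairs."""
    by_year = {}
    for year, vectors in students:
        by_year.setdefault(year, []).append(spread(vectors))
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Made-up students: (year, list of course embedding vectors).
students = [
    (2001, [[0.0, 0.0], [0.0, 2.0]]),  # two courses a distance of 2 apart
    (2001, [[1.0, 1.0], [1.0, 1.0]]),  # identical courses, no spread
    (2018, [[0.0, 0.0], [6.0, 8.0]]),  # two courses a distance of 10 apart
]

result = yearly_spread(students)
```

Here the per-year summary is a plain average. To study how the distribution changes, the per-student values in each year would be kept rather than averaged.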

4. The Gist of Course Evaluations

Deploy NLP on course evaluation answers to the question
"What would you like to say about this course to a student who
is considering taking it in the future?"

Prerequisites: Some NLP class.

Goal

When students use Carta, they often glean information from the
textual part of the course evaluations. We aim to extract salient
course information from that text. If successful, the results of
this work might end up in Carta to help future students.

The first thought when thinking of applying NLP to opinions tends
to be 'sentiment analysis.' We can of course run such techniques
over evaluations, particularly because the domain of discourse is
narrow: The content is always about Stanford courses.

But more interesting will be the subtler gems. Hints such as
"Definitely do the reading every week." Or "Problem sets are only
every other week." Or "Find your project partner early, because
you will need all the time you can get for completing the
project." These hints will be harder to isolate, but could be
extremely useful as a potential addition to Carta one day.
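
To make the idea of isolating hints concrete, here is a deliberately naive, rule-based sketch: it keeps sentences that open with an advice-like cue. This stands in for a real NLP model and is not the approach the project commits to. The cue list and the sample review are assumptions made up for the example.

```python
import re

# Hypothetical cue list: phrases that often open a piece of advice.
ADVICE_CUES = (
    "definitely", "make sure", "be sure", "find",
    "start", "do not", "don't", "avoid",
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def advice_sentences(text):
    """Keep sentences that begin with one of the advice cues."""
    return [s for s in split_sentences(text) if s.lower().startswith(ADVICE_CUES)]


# Made-up evaluation text for illustration.
review = (
    "Great class overall. Definitely do the reading every week. "
    "The lectures were fun. Find your project partner early!"
)

hints = advice_sentences(review)
```

A rule like this misses purely factual hints such as "Problem sets are only every other week." That gap is the reason a learned classifier would be the more interesting part of the project.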

5. Automatic Study Guides for MOOCs

Goal

We will take the view that many online courses will be
modular, like Stanford's self-paced database course. We will
extract word clusters from closed caption files of course
videos to identify topics. We will then attach learning
resources to each topic. Resources include relevant course
forum question/answer pairs, video snippets, Wikipedia
search results, and student-identified entities. We will use
these resources to automatically create study guides and
learning hints.
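
As a possible first step towards topic word clusters, the sketch below ranks the words in each video's captions by TF-IDF, so that words specific to one video rise to the top. This is a simpler stand-in for clustering, not the clustering itself. The caption snippets and video ids are made up for illustration.

```python
import math
import re
from collections import Counter


def top_terms(captions, k=3):
    """Return the `k` highest TF-IDF words per video.

    `captions` maps a video id to its caption text.
    """
    tokens = {vid: re.findall(r"[a-z]+", text.lower()) for vid, text in captions.items()}
    # Document frequency: in how many videos does each word occur?
    doc_freq = Counter(word for words in tokens.values() for word in set(words))
    n_docs = len(tokens)
    result = {}
    for vid, words in tokens.items():
        counts = Counter(words)
        scores = {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        # Highest score first; ties broken alphabetically for stable output.
        result[vid] = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    return result


# Made-up caption snippets for three hypothetical videos.
captions = {
    "v1": "the join combines rows and the join uses keys",
    "v2": "the index speeds up lookups but the index costs space",
    "v3": "the transaction groups updates so the transaction can roll back",
}

topics = top_terms(captions, k=1)
```

Words shared by every video, such as "the", receive a score of zero and drop out without a stop-word list.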

We can obtain closed caption files for a number of
Stanford's online courses. These will be the source of word
clusters that each define a topic. We also have a half
billion individual 'events' of learners interacting with
Stanford's open online courses. Events include starting or
rewinding videos, forum posts, and assignment
submissions. Forum posts identified by an existing
poster-confusion classifier, as well as repeated incorrect
assignment submissions, will serve as triggers to offer
topic- and student-specific study resources. We will need to
identify those resources automatically.
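
The trigger based on repeated incorrect submissions could look roughly like the sketch below. The event layout (learner, problem, correct) and the threshold of three wrong answers in a row are assumptions for illustration. The actual event logs will have a different schema.

```python
from collections import defaultdict


def struggling(events, min_wrong=3):
    """Return (learner, problem) pairs with `min_wrong` wrong answers in a row.

    `events` is a chronologically ordered list of
    (learner, problem, correct) tuples.
    """
    streak = defaultdict(int)
    flagged = []
    for learner, problem, correct in events:
        key = (learner, problem)
        # A correct answer resets the streak; a wrong one extends it.
        streak[key] = 0 if correct else streak[key] + 1
        if streak[key] == min_wrong:
            flagged.append(key)
    return flagged


# Made-up submission events for illustration.
events = [
    ("ann", "p1", False), ("bob", "p1", False), ("ann", "p1", False),
    ("bob", "p1", True), ("ann", "p1", False), ("bob", "p1", False),
]

needs_help = struggling(events)
```

Each flagged pair would then be matched to the topic of the problem, so that the resources offered are specific to both topic and student.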

Specific Projects

Partition all video text into clusters.

Identify 'good' question/answer pairs among course forum pairs.

Create UI for attaching help resources to topic clusters.

Given the topic clusters, identify relevant Web-based resources.

6. Predicting Sensitivity of Coral Reefs to Heat Stress

Prerequisites: Python, CS231N.

Goal

An existing biology project is researching the impact
of artificially introduced heat stress on coral bleaching. The 400
colonies under investigation are surrounded by sand, other types
of corals, and algae. Given the heat stimuli and coral response
data, can we create a predictor of coral response from photos
taken around the corals? For example, can we predict coral
response to heat from surroundings of 25% sand, 30% branching
corals, 35% encrusting algae, and 10% mounding corals?

I don't yet know details about the number of photos, or the
availability of training data. In the absence of such labeled
examples we may not be able to identify each surrounding
species. But we might well still be able to determine how color
and edge density distributions predict measured data.

Our contact will be in the field between Jan 18 and early
February, generating more photos. He is hoping for instructions
from us as to how best to shoot the images.

I will keep this entry updated as I learn more.
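
To give a feel for the color and edge density features mentioned above, the sketch below computes two simple descriptors from a grayscale image stored as nested lists. Working on grayscale intensity instead of full color, the threshold, and the bin count are simplifying assumptions. Real photos would be loaded with an image library and processed as arrays.

```python
def edge_density(gray, threshold=50):
    """Fraction of pixels whose right or lower neighbor differs strongly.

    `gray` is a 2D list of intensities in 0..255. The last row and
    column are skipped because they lack a right or lower neighbor.
    """
    rows, cols = len(gray), len(gray[0])
    edges = 0
    for r in range(rows - 1):
        for c in range(cols - 1):
            dx = abs(gray[r][c + 1] - gray[r][c])
            dy = abs(gray[r + 1][c] - gray[r][c])
            if max(dx, dy) > threshold:
                edges += 1
    return edges / ((rows - 1) * (cols - 1))


def intensity_histogram(gray, bins=4, max_value=256):
    """Normalized histogram of pixel intensities."""
    counts = [0] * bins
    for row in gray:
        for value in row:
            counts[value * bins // max_value] += 1
    total = sum(counts)
    return [n / total for n in counts]


# Two tiny synthetic images: a uniform patch and a striped one.
flat = [[100] * 4 for _ in range(4)]
striped = [[0, 255, 0, 255] for _ in range(4)]

features = {
    "flat": (edge_density(flat), intensity_histogram(flat)),
    "striped": (edge_density(striped), intensity_histogram(striped)),
}
```

Feature vectors like these, one per photo, could then be related to the measured heat response with an ordinary regression model, even when no species labels are available.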

7. Teaching Choreography Online

Prerequisites: Python. Experience
with either HCI or distributed systems.

Goal

We will develop infrastructure and (with help) pedagogy for
teaching choreography entirely online. Choreography is the
activity of designing dances. Geographically distant
students will be able to work on dance design exercises
together. The 'performers' will be avatars of any
shape. They will operate in a 3D robotics simulation
environment. Students will continuously be able to observe
their teammates' work.

We will try to use Gazebo, an existing high fidelity robot
simulation environment. The software was developed for the
DARPA robotics challenges, and can take into account mass
distributions of simulated avatars. Gazebo is by nature
distributed, but we may additionally need to use a
high-function distributed messaging system.

Three main elements are involved in this work: development
of a Web based UI for easily manipulating avatars, the
distributed messaging that allows distant Gazebo instances
to be coupled, and some choreography pedagogy. We will
consult with a professional choreographer.