Undergraduate Program

Summer REU

National Science Foundation Research Experience for Undergraduates (NSF REU)

Computational Methods for Understanding Music, Media, and Minds

How can a computer learn to read an ancient musical score? What can methods from signal processing and natural language analysis tell us about the history of popular music? Can a computer system teach a person to better use prosody (the musical pattern of speech) in order to become a more effective public speaker?

These are some of the questions that students will investigate in our REU: Computational Methods for Understanding Music, Media, and Minds. They will explore an exciting, interdisciplinary research area that combines machine learning, audio engineering, music theory, and cognitive science. Each student will work in a team with another student and will be mentored by two or more faculty members drawn from Computer Science, Electrical and Computer Engineering, Brain and Cognitive Science, the program in Digital Media Studies, and the Eastman School of Music.

PI

You are a 1st, 2nd, or 3rd year full-time student at a college or university.

You are a U.S. citizen or hold a green card as a permanent resident.

You will have completed two computer science courses or have equivalent programming experience by the start of the summer program.

It is not a requirement that you are a computer science major, or that you have prior research experience. We wish to recruit a diverse set of students, with different backgrounds and levels of experience. We encourage applications from students attending colleges that lack opportunities for research, and from students from communities underrepresented in computer science.

Before starting the application, you should prepare:

An unofficial college transcript, that is, a list of your college courses and grades, as a pdf, Word, or text file. Include the courses you are currently taking.

Your CV or resume, as a pdf, Word, or text file.

A 300-word essay, as a pdf, Word, or text file, explaining why you wish to participate in this REU, including how it engages your interests, how the experience would support or help you define your career goals, and what special skills and interests you would bring to the program.

The name and email address of a teacher or supervisor who can recommend you for the REU.

The application website does not allow you to save and resume your application before submitting, so start the application when you have time to enter all the information.

STEP 1: Apply online no later than February 1, 2019. (The application portal opens on December 1, 2018.)

In the 2019 application form, you can specify your top project preferences from the list below. We will do our best to assign you to a project that fits your preferences and interests as well as your background and skills.

2019 Projects (Planned)

Project #1

The project will be to develop a computational music generation system that merges features from two musical styles. We will use a dataset of classical melodies and another dataset of rock melodies. The computational system will learn pitch patterns from one dataset and rhythmic patterns from the other, and will merge them to create melodies that combine the two styles. A possible extension is to incorporate harmonic information from the rock dataset, adding chord symbols to the generated melodies.
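One simple way to picture the idea of learning pitch from one dataset and rhythm from another is with two independent first-order Markov chains. The sketch below is only an illustration under that assumption; it is not the method the project will use, and the function names (`train_markov`, `sample_chain`, `merge_styles`) and the toy melodies are invented for this example.

```python
import random
from collections import defaultdict


def train_markov(sequences):
    """Record first-order transitions: state -> list of observed next states."""
    transitions = defaultdict(list)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current].append(nxt)
    return dict(transitions)


def sample_chain(transitions, start, length, rng):
    """Walk the chain until `length` states are produced.

    If a state has no recorded successor, fall back to `start`.
    """
    state, out = start, [start]
    while len(out) < length:
        choices = transitions.get(state)
        state = rng.choice(choices) if choices else start
        out.append(state)
    return out


def merge_styles(pitch_melodies, rhythm_melodies, length=8, seed=0):
    """Sample pitches from one style and durations from another, then pair them."""
    rng = random.Random(seed)
    pitch_model = train_markov(pitch_melodies)
    rhythm_model = train_markov(rhythm_melodies)
    pitches = sample_chain(pitch_model, pitch_melodies[0][0], length, rng)
    durations = sample_chain(rhythm_model, rhythm_melodies[0][0], length, rng)
    return list(zip(pitches, durations))


# Toy data, invented for illustration: MIDI pitch numbers and durations in beats.
classical_pitches = [[60, 62, 64, 65, 67, 65, 64, 62, 60]]
rock_rhythms = [[0.5, 0.5, 1.0, 0.5, 0.25, 0.25, 1.0, 0.5]]

melody = merge_styles(classical_pitches, rock_rhythms)
for pitch, duration in melody:
    print(pitch, duration)
```

Each generated note takes its pitch from the chain trained on the first dataset and its duration from the chain trained on the second. A real system would likely model longer contexts and the interaction between pitch and rhythm.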

Project #2

In immersive concerts, the audience’s music listening experience is often augmented with texts, images, lighting and sound effects, and other materials. Manual synchronization of these materials with the music performance in real time becomes more and more challenging as their number increases. In this project, we will design an automatic system that is able to follow the performance and control pre-coded augmented events in real time. This allows immersive concert experiences to scale with the complexity of the texts, images, lighting and sound effects. We will work with TableTopOpera at the Eastman School of Music on implementing and refining this system.

Project #3

In the past year, our 3D audio recording team has recorded over 35 concerts at the Eastman School of Music with several binaural dummy head microphones, binaural in-ear microphones, and Ambisonic soundfield microphones, including a 32-capsule Eigenmike and a Sennheiser Ambeo VR Mic. We have built a large database of 3D audio concert recordings for spatial audio research. This project plans to not only record summer concerts but also measure impulse responses in a concert hall with a variety of binaural and Ambisonic microphones. Our goal is to compare the results made with different microphones and explore the best method to measure and understand complex hall acoustics.

Project #4

Title: Assessing the Effectiveness of a Speaker by Analyzing Prosody, Facial Expressions, and Gestures

How we say things conveys a lot more information than what we say. Imagine being able to measure the effectiveness of a speaker, or of an oncologist delivering critical information to a patient, or even the severity of a patient’s Parkinson’s disease, by analyzing their prosody. This project will involve using knowledge from music to inform feature extraction, using machine learning to model the extracted features, and then using cognitive models to explain the outcome.

Project #5

Face-to-face interaction is a central part of human nature. Unfortunately, there are immense barriers for people with social-communicative difficulties, for example people with autism and people with hearing deficits, to engage in social activities. In this project, we seek design and technology innovation to create Augmented Reality (AR) technologies that facilitate social-communicative behaviors without interrupting the social norm of face-to-face interaction. We are looking for students with an interest in assistive technology, Augmented Reality, natural language processing, and machine vision to take part in the design, interface prototyping, and evaluation of socially aware AR environments that help people with special needs navigate their everyday social life.

Project #6

This REU will develop a combined approach to reading damaged ancient manuscripts. Beginning with multispectral images, we will employ a combination of computer vision and natural language processing to fill in the holes in ancient texts written in Zapotec and Mixtec. The end goal will be to visualize the results with an AR/VR application.

Project #7

AI technologies are increasingly present in our everyday life, from voice assistants such as Echo and Google Home to smart life systems such as Fitbit and Spotify music suggestions. It is becoming more and more important for people without an AI background to understand the fundamentals of how a machine thinks and behaves, in order to better interact and collaborate with our increasingly intelligent work and life environment. We are looking for students with an interest in education technology, tangible user interfaces, and intelligent social agents to join our project. The students will take part in the design, interface prototyping, and evaluation of physically and socially embodied education technologies that support K-12 AI education in formal and informal learning environments.

2018 Projects

Audio-Visual Scene Understanding

Evaluating the role of audio towards comprehensive video understanding - We are interested in measuring the role that audio plays in high-level video understanding tasks such as video captioning and spatiotemporal event localization. In this project, students will design novel Amazon Mechanical Turk interfaces to collect audio-oriented annotations for tens of thousands of YouTube videos. They will get hands-on experience training deep learning algorithms on large-scale data, with a focus on joint audio-visual modeling.

Assessing the Effectiveness of a Speaker by Analyzing Prosody, Facial Expressions, and Gestures

Assessing the severity of Parkinson's disease through the analysis of a voice test - This project involves the analysis of two vocal tasks from the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) performed by people both with and without Parkinson's disease (PD). The tests include uttering a sentence and saying ‘uhh’ in front of the computer’s microphone. Our analysis will include extracting and identifying useful features from the audio recording and developing a novel machine learning technique to assess the severity level of Parkinson’s disease.

Computational Methods for Social Networks and Human Mobility

Investigating Human Mobility in Virtual and Physical Space - The student will develop the data analysis skills required to investigate complex system data, including Python coding and statistics. They will then apply these skills to study the unexpected similarities between human mobility in physical and virtual space.

Audio Based Non-invasive Blood Pressure Estimation - With cardiovascular disease as the leading cause of death in America, constant blood pressure measurement is imperative to detect early-onset symptoms. Piezoelectric sensors can be used in conjunction with a recurrent neural network in a wearable device (such as a smartwatch) to extract pulse wave velocity data and heart rate data to estimate blood pressure. The concept further expands the use of machine learning techniques and applies them to activity trackers. Although related technologies exist in the field, none of them uses a recurrent neural network with a piezoelectric sensor, nor has any of them become the industry standard, as the field is still in its infancy. Continued research is required to develop a smartwatch that can accurately detect blood pressure; however, collecting enough pulse wave velocity, heart rate, and blood pressure data to train the recurrent neural network and developing a working prototype is a sufficient goal for the end of the summer.

Music and the Processing Programming Language

A Framework for Developing Music-Generated Games (Erik Azzarano, Rochester Institute of Technology) - Erik is investigating a framework for developing music-generated games based on live or external audio input. He aims to create an intuitive mapping between a game’s mechanics and features of the audio input. For example, features of the audio such as frequency, amplitude, and beats or onsets are extracted and mapped to different game parameters to drive the experience, such as when enemies spawn, where they appear, and how fast they move. The goal of this project is to have a finished framework with all of the appropriate mappings between game mechanics and audio features. The framework should allow the game to suitably portray any type of music or sound input.
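The kind of mapping described above, from audio features to game parameters, can be sketched in a few lines. This is a hypothetical illustration, not Erik's framework: the function `map_audio_to_game`, its thresholds, and the choice of RMS energy as the amplitude feature, an energy jump as the onset detector, and zero-crossing rate as a crude frequency proxy are all assumptions made for the example.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign (rough pitch proxy)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def map_audio_to_game(samples, frame_size=256, onset_ratio=2.0, floor=0.01):
    """Turn audio frames into enemy spawn events.

    An onset (energy above `floor` and more than `onset_ratio` times the
    previous frame's energy) decides WHEN an enemy spawns, the zero-crossing
    rate decides WHERE (x in 0..1), and the amplitude decides HOW FAST it moves.
    """
    events = []
    prev_energy = 0.0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = rms(frame)
        if energy > floor and energy > onset_ratio * prev_energy:
            events.append({
                "frame": start // frame_size,
                "x": zero_crossing_rate(frame),
                "speed": 1.0 + 4.0 * min(energy, 1.0),
            })
        prev_energy = energy
    return events


# Synthetic input: two frames of silence followed by two frames of a 440 Hz tone
# sampled at 8 kHz (values invented for illustration).
rate = 8000
silence = [0.0] * 512
tone = [0.8 * math.sin(2 * math.pi * 440 * n / rate) for n in range(512)]

events = map_audio_to_game(silence + tone)
for event in events:
    print(event)
```

With this input, a spawn event is produced only where the tone begins, since the steady continuation of the tone does not count as a new onset. A real framework would use proper spectral features and beat tracking on live input.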

Applying Recurrent Variational Autoencoders to Musical Style Transfer (Adriena Cribb, University of Pittsburgh) - Artistic style transfer refers to taking the style of one piece of art and applying it to another. While this problem has seen great progress in the image domain, it has been largely unexplored in the context of music. Adriena is building a single recurrent variational autoencoder that allows harmonic style to be transferred to any degree directly between two musical pieces. The ultimate aim is to produce deep learning methods for compositional style transfer and tools that allow musicians to explore novel modes of composition through the recombination of stylistic elements in different pieces of music.

Deep Learning of Musical Forms

Reverse-Engineering Recorded Music

Mentors: Professors Mark Bocko and Stephen Roessner (Electrical and Computer Engineering) and Darren Mueller (Eastman School of Music). Use signal processing algorithms to discover how the same recordings were remastered over time.

Web-based Interactive Music Transcription

Mentors: Professors Zhiyao Duan (Electrical and Computer Engineering) and David Temperley (Eastman School of Music). Building an interactive music transcription system that allows a user and the machine to collectively transcribe a piano performance.

The Prosody and Body Language of Effective Public Speaking

Mentors: Professors Ehsan Hoque (Computer Science), Chigusa Kurumada (Brain and Cognitive Science), and Betsy Marvin (Eastman School of Music). Measuring the visual features (e.g., smiling) and auditory features (e.g., speaking rate) that cause a speaker to be highly rated by listeners.