Researchers have access to more digital text than ever before, from websites to newspaper articles to books. This availability offers the potential to answer sweeping questions about the evolution of literature and language at previously unheard-of scales, so long as we can actually make sense of all the data we have. Research in natural language processing has given us powerful statistical techniques for modeling the behavior of text within a large collection of documents. However, using and interpreting such models can be challenging for those whose expertise lies outside statistics. In my research, I design, develop, and evaluate visual techniques that put statistical text analysis into the hands of researchers from a wide variety of backgrounds.

This summer, I hope to hire 2 students to take part in ongoing research in this vein. In particular, I am looking for students to help me with the following projects:

Character sonic signatures: Different characters within literature are sometimes said to have different voices, not just in the kinds of words they use but in the sound of their speech. For instance, in Shakespeare’s Othello, the title character is sometimes described as having slower, rounder speech than the quick, staccato dialogue of the villain Iago. Last summer, we found that we could detect such differences between characters algorithmically, and we built visualizations to help explore those differences. This summer, we will work with humanities scholars to analyze the differences we have found, and we will broaden our questions to compare characters across different authors, time periods, and more.

Visual tuning of statistical text models: Text scholars, including those in the humanities, are becoming increasingly practiced at incorporating statistical models into their analyses. However, although a variety of tools exist to help them explore such models, training a good statistical model on a body of text remains something between a black box and an art. In this project, students will design and develop visual techniques that open up pieces of this black box, helping researchers bring their domain expertise to bear on tuning the models they use.

The precise trajectory of these projects is open-ended and will be steered by the particular backgrounds and interests of the students involved. Potentially useful experience includes familiarity with statistical models, machine learning, visualization, or the digital humanities (though none of these are required!). Accepted students will be required to take a 1-credit independent study during the spring to prepare for their project.

Project: Mental Models of Home Networks (Amy Csizmar Dalal)

Empowering people to better understand, operate, and troubleshoot the increasingly complex computer networks in their homes is an active area of research in the fields of computer networks and human-computer interaction (HCI). Such cross-field research promises technical solutions that are user-friendly and that don’t assume a resident technical expert in every home. This last point grows more important as home networks become more sophisticated and ubiquitous, and as more people work, play, and run various aspects of their lives digitally from home.

While proposed solutions aim to give people more agency to solve their home networks’ operational issues themselves, such solutions often use highly technical language when presenting information. In this project, we'll examine the assumption that people understand these technical terms by studying the mental models people apply to their home networks. The methodology for this study includes semi-structured interviews with non-technical homeowners in a metropolitan area of the Midwest. The goal is to use insights from these interviews to design a larger, more comprehensive survey of the language that non-technical people use and understand when discussing home network performance, and ultimately to design more intuitive and effective tools for home network maintenance and troubleshooting.

The exact details of what students will be working on depend on the state of the project at the start of the summer as well as on student interests and backgrounds. However, tasks are likely to include some combination of the following:

1. Analyzing subjective data from existing interview transcripts. This includes transcribing and coding the data to identify trends and themes from the interviews.

I expect to hire 2-3 students for 6-8 weeks during the summer of 2018 for this project. Ideally, students should be available to participate in an independent study during spring term to read papers and learn the techniques they will be using this summer. At a minimum, students should have completed CS 111 by the start of spring term. CS 257 and CS 344 are helpful but not required.

How well do you understand Git, the version control system used both in our classes and by software developers around the world? This XKCD (https://xkcd.com/1597/) captures the relationship that many people seem to have with it. It's an amazingly cool tool, but it's weird and confusing. I want to better understand what makes it hard and help people learn it. To that end, students and I have built a tool called Elegit (find it at elegit.org), a work in progress that helps students use Git, with the particular goal of helping users see what is going on.

But it's still under development. It needs stability fixes, bug fixes, and new features; perhaps more importantly, we need to see what students learn by using it.

This project combines software development work with CS education research. The immediate goal is to dive into a fairly extensive backlog of software issues (https://github.com/dmusican/Elegit/issues). This is a fabulous chance to contribute to a meaningful software product in development and get your code out there in use. Additionally, we will test the system with students to assess its usability and to begin studying how Elegit changes the way students learn to understand Git. As a side effect, by working on the project you'll learn a lot yourself about how Git works!

Students who sign on to the project should be available to participate in an independent study during the spring of 2018 to begin working on the project.

Cancer is a disease resulting from the accumulation of genomic alterations that occur during an individual’s lifetime and cause the uncontrolled growth of a collection of cells into a tumor. These mutations occur as part of an evolutionary process that may have begun decades before a patient’s diagnosis. A better understanding of a tumor’s evolutionary history may yield important insight into how and why tumors develop, as well as which mutations drive their growth. While recent algorithmic progress has improved the inference of tumor evolutionary histories, this remains a very challenging task.

This summer, students in my group will investigate and characterize the practical limitations that different aspects of available DNA sequencing technologies impose on inferring tumor evolution. The exact details of what students will work on will depend on their interests and backgrounds and on how the project progresses before the start of summer. Aspects of the project that students are likely to work on include:

Performing computational analysis of real DNA sequencing datasets, using both existing software packages and code written by the student, to determine what complexities exist in a range of different tumor samples.

Creating simulated datasets that mimic the complexities of real data, based on analysis of real DNA sequencing datasets.

Students working on these tasks may gain experience working with large datasets, using large multi-core machines, and writing multi-threaded code, and will become familiar with a number of DNA sequencing analysis software packages and tools.

I expect to hire multiple (potentially up to 4) students for this project. Accepted students will work for up to 10 weeks during the summer of 2018. Ideally, students should be available to participate in an independent study during the spring of 2018 to read papers, familiarize themselves with related tools and concepts, and hold discussions to begin planning the project. Applicants should have completed CS 201 at a minimum. Students who have taken Computational Biology, Bioinformatics, or Algorithms are also strongly encouraged to apply. No specific biology background is required, just an interest in applying computational techniques to important biological problems.

Project: Computer-assisted proof system for tilings (Jed Yang)

Can a region be tiled by a given set of tiles? In general, this problem is computationally hard (NP-complete). Some specific sets of tiles are meaningful because of deep connections to other areas of mathematics. For example, counting tilings of triangular regions with a specific set of 3 tiles yields Littlewood-Richardson coefficients, numbers that occur in seemingly unrelated fields of mathematics. In my research program, I make small modifications to these tiles. Most of these variations produce tiles with uninteresting qualitative or quantitative behaviors. Recently, we discovered some tiles that are related to well-studied mathematical objects. However, the computational complexity of tiling with these special tiles is not yet known.
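To make the basic decision problem concrete, here is a minimal brute-force sketch in Python. It is only an illustration, not part of the project's software: it assumes square-grid tiles (polyominoes) rather than the triangular tiles mentioned above, and the names `can_tile` and `normalize` are hypothetical. It always fills the first uncovered cell in row-major order and backtracks on failure, so its running time can grow exponentially with the size of the region, the kind of blow-up one expects when attacking an NP-complete problem directly.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) on a square grid (assumption: not triangular tiles)


def normalize(shape: Iterable[Cell]) -> FrozenSet[Cell]:
    """Translate a tile so its first cell in row-major order sits at (0, 0)."""
    cells = sorted(shape)
    r0, c0 = cells[0]
    return frozenset((r - r0, c - c0) for r, c in cells)


def can_tile(region: Set[Cell], tiles: List[Iterable[Cell]]) -> bool:
    """Return True if `region` can be covered exactly by copies of `tiles`.

    Hypothetical helper for illustration. Each tile may be used any number
    of times; rotations and reflections must be listed as separate tiles.
    """
    shapes = [normalize(t) for t in tiles]

    def solve(uncovered: FrozenSet[Cell]) -> bool:
        if not uncovered:
            return True  # every cell is covered
        # Any tile covering the smallest uncovered cell must have its own
        # first cell there, so we only try placements anchored at it.
        r, c = min(uncovered)
        for shape in shapes:
            placed = {(r + dr, c + dc) for dr, dc in shape}
            if placed <= uncovered and solve(uncovered - placed):
                return True
        return False  # no tile fits here: backtrack

    return solve(frozenset(region))


if __name__ == "__main__":
    dominoes = [[(0, 0), (0, 1)], [(0, 0), (1, 0)]]
    rectangle = {(r, c) for r in range(2) for c in range(3)}
    print(can_tile(rectangle, dominoes))  # True
    # 4x4 board with two opposite corners removed: a classic impossible case.
    board = {(r, c) for r in range(4) for c in range(4)} - {(0, 0), (3, 3)}
    print(can_tile(board, dominoes))  # False
```

Anchoring each placement at the smallest uncovered cell avoids trying the same tiling in many different orders, but it does not change the worst-case exponential behavior.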

This summer, I would like to work with students to use computers to prove theorems about these (and other) tiles. For example, to prove that tiling with a specific set of simple tiles is NP-complete, one typically constructs gadgets and links them with wires to perform (universal) computation. I would like to automate this process. Specifically, you will develop software that creates gadgets usable in NP-completeness proofs, or even generates entire proofs directly. Besides proving new theorems, you will gain skills in algorithm design, implementation, and software development. Depending on your interests and background, we may also work on the more mathematical side of tiling theory.

I plan to hire 2-3 students for 5-7 weeks of research. CS 201 or its equivalent is appropriate preparation for this project; CS 254 (or other math background) is not required. However, students with strong interests in mathematics are encouraged to apply. Ideally, students should be available to complete a 1-credit independent study during the spring to read papers, become familiar with the background, and plan for the summer.