A scene from the 2018 Neurohackademy on Aug. 10, 2018, in the Alder Commons on the University of Washington campus. Alex Alspaugh/University of Washington

Each night, high-definition cameras mounted to telescopes collect terabytes of data about objects in the sky. Each day, scientists sequence the genomes of people, animals, plants and microbes for biomedical and evolutionary research. Each year, the Large Hadron Collider produces 30 petabytes of data on particle collisions.

Science has become a big-data endeavor. But scientists are not universally adept in “data science” — the computing and statistical skillsets needed to handle, sort, analyze and draw conclusions from big data. The shortage of know-how in data science can hamper research, medicine and even private industry.

Now a team from the University of Washington, New York University and the University of California, Berkeley has developed an interactive workshop in data science for researchers at multiple stages of their careers. The course format, called “hack week,” blends elements of traditional lecture-style pedagogy with participant-driven projects. The most recent hack week was a neuroscience-themed event held in July on the UW campus. As the team reports in a paper published Aug. 20 in the Proceedings of the National Academy of Sciences, participants rated the hack weeks as opportunities to learn about new concepts, foster new connections, share data openly, and develop skills and work on problems that will positively affect their day-to-day research lives.

Participants work on their projects at the 2018 Neurohackademy on Aug. 10, 2018. Alex Alspaugh/University of Washington

“The idea behind hack week was to bring together people who were interested in data science and give them a place to meet, talk and exchange ideas,” said lead and corresponding author Daniela Huppenkothen, associate director of the UW’s astronomy-focused DIRAC Institute. “But instead of a traditional format with experts lecturing nonexperts, this would allow participants to mingle more and teach one another.”

Huppenkothen was involved in the inaugural hack week event, “Astro Data Hack Week,” held at the UW in 2014. That event brought together big-data researchers in astrophysics and cosmology. Since then, the team has held four additional Astro Hack Week events, three “Neuro Hack Week” events for neuroscience and two “Geo Hack Week” events for the geosciences.

All hack week events share the same basic design and organizing principles. They usually begin with structured periods of instruction, then shift toward participant-driven, open-ended projects, along with peer networking and free discussion. The projects can resemble a hackathon, but with greater emphasis on collaboration and learning than on specific outcomes. Hack week participants tackle their projects in smaller groups, with organizers circulating to observe and provide feedback or encouragement.

The projects range from experiments that the participants brought from their home institutions to ideas that come up during the course. One project from the inaugural Astro Hack Week, for example, eventually became Stingray, a software project that provides algorithms for analyzing time-series data in astronomy. At last month’s Neurohackademy, a new two-week version of Neuro Hack Week, one team worked on developing common ways to analyze different types of MRI scans.

The events’ open-ended structure places greater responsibility on the organizers of each hack week.

Participants collaborating on chosen projects at the 2018 Neurohackademy on the UW campus. Alex Alspaugh/University of Washington

“A hack week takes a different kind of preparation, because you don’t have the security of ‘falling back’ on the structure of traditional talks and lectures,” said co-author Anthony Arendt, a research scientist with the UW Applied Physics Laboratory who has organized Geo Hack Week. “You have to set up ways to encourage participants at all levels of ability and comfort — creating a welcoming space for everyone to pitch ideas.”

Most hack weeks organized by the team cap the number of participants at 60. Organizers also strive to select participants to maximize diversity, including scientists of different abilities and backgrounds and at different stages of their careers. Participants also agree to abide by a code of conduct that emphasizes respect and positive interactions.

In surveys conducted after eight hack weeks, participants ranked the events positively as spaces to learn, teach, network and foster relationships. More than three-quarters ranked the hack weeks as successful learning experiences, while two-thirds reported teaching skills to someone else. This feedback was consistent across participants of different backgrounds, showing that the unique format of hack weeks helps all participants feel included, said Huppenkothen.

“Now we want other scientific communities to learn about our experiences and see how they might start organizing their own events,” said Huppenkothen. “We also want feedback from other communities — both good and bad — and to widen the dialogue about data science and skill development.”

Aftermath of a brainstorming session at the 2018 Neurohackademy. Alex Alspaugh/University of Washington

Their paper includes supplementary materials detailing the hack week experiences and advice for other groups interested in starting their own workshops.

Participants gave hack weeks high scores for promoting open-science principles, in which researchers publicly post and share their datasets, code and methods. Open-science principles are critical to addressing the challenges that researchers face in making their research more reproducible, said co-author Ariel Rokem, a data scientist with the UW eScience Institute and co-organizer of the recent Neurohackademy, along with Tal Yarkoni at the University of Texas at Austin.

“One of our goals with the hack week format is to elevate the quality of science being done,” said Rokem. “The best way to do that is to try out ideas and share what you’ve learned.”

Additional co-authors are David Hogg with the NYU Center for Data Science; Karthik Ram at the Berkeley Institute for Data Science at the University of California, Berkeley; and Jake VanderPlas at the UW eScience Institute. The research was funded by the National Institutes of Health; the University of Washington; New York University; the University of California, Berkeley; the Charles and Lisa Simonyi Fund for Arts and Sciences; and the Washington Research Foundation.