August, 2013

The Microsoft Research Connections blog shares stories of collaborations with computer scientists at academic and scientific institutions to advance technical innovations in computing, as well as related events, scholarships, and fellowships.

I recently sponsored an event in Manizales, Colombia, training biologists on .NET Bio and BioHPC, two projects that make computational research easier in the life sciences. As part of the training, Jarek Pillardy—the head of the Cornell Bioinformatics Facility (CBSU) at Cornell University—and some of his staff presented various aspects of BioHPC. I had the opportunity to sit down with Jarek, who is not only the developer of BioHPC but also a long-time user of the .NET Bio project. Here is a recap of that conversation.

Simon: You lead the CBSU—what activities does it support?

Jarek: CBSU is the Cornell University Bioinformatics Facility, and its mission is to support biological research with advanced computational infrastructure and bioinformatics tools and techniques. The facility’s main activities can be divided into maintaining extensive computational infrastructure configured for bioinformatics; providing easy access to the infrastructure through the web via BioHPC Web or interactively through BioHPC Lab; training, mainly through workshops and consulting; direct research collaborations, ranging from small projects to participating in major grants as co-principal investigators; and software and LIMS development.

Simon: What prompted you to develop BioHPC, and what does it do?

Jarek: BioHPC is our main way to deliver computational infrastructure to biologists. It is not easy for an experimental biologist to use computing tools directly and navigate the complicated maze of schedulers, command-line tools, data-storage methods, and other infrastructure. BioHPC simplifies both access to the infrastructure, through the web and interactively, and its management (hardware and software). We created BioHPC to make our lives easier and to provide services for many more researchers. BioHPC Web gives users a simple way to submit data for processing and to manage jobs and data. BioHPC Lab is a tool to organize access to interactive machines, reserve time, and manage associated resources, like storage and computing time. For us, it provides a convenient platform to deliver computational resources (hardware and software combined) and a set of tools to manage them.

Simon: Do you have any plans to extend the capabilities of BioHPC in the future?

Jarek: BioHPC is constantly evolving to meet the changing needs of bioinformatics and to adapt to technological changes. Currently, we support a diverse array of local and remote clusters, but we are planning to add the capacity to run computations in the cloud. We are in the final stages of adding Windows Azure to our supported computing infrastructure. We will also be adding new software.

Architectural overview—BioHPC schema

Simon: How do you see the Windows Azure cloud being used in bioinformatics?

Jarek: For direct research computing, I can see two main scenarios. First, there will be advanced users, running their own virtual machines. These probably will be a minority of users. Second, there will be researchers who access Azure resources via an intermediate service like BioHPC. This scenario will involve a lot of task-focused services (for example, analyzing population data, assembling and annotating sequences, or handling a particular software pipeline) running on Azure, with the end-user not even fully aware of that. Azure provides an opportunity to bring data closer to the computing infrastructure.

Jarek: I think BioHPC may deliver for them the same benefits it does for us: an easy-to-use tool that provides convenient access to infrastructure and simplified management. They are still in the process of setting up and organizing, and we are in close contact with them, providing consultation and help. BIOS’s mission for researchers in Colombia is very similar to our facility’s mission at Cornell, so our tools should be very useful to them. I hope they will be able to improve and expand BioHPC to meet their particular needs, which will make it much better.

As Jarek notes, BioHPC is a living, constantly evolving project, as is .NET Bio. If you’re a biological researcher, I encourage you to take a good look at these tools.

—Simon Mercer, Director of Health and Wellbeing, Microsoft Research Connections

Sixty students from Europe, the Middle East, and Africa participated in the 2013 PhD Summer School at Microsoft Research Cambridge.

The beginning of July is always a special time of year at Microsoft Research Cambridge as we welcome PhD students to our annual PhD Summer School. We began our eighth Microsoft Research PhD Summer School with a traditional afternoon tea served at Selwyn College—one of the 31 University of Cambridge Colleges—which also accommodated students for the week. PhD students from across Europe, the Middle East, and Africa joined us for a week filled with learning, networking, and mentorship in our new lab building, which opened just a few months ago.

“The School was an exciting showcase of research by Microsoft staff and the students themselves, and provided training in key research skills,” says Andy Gordon, co-manager of our Programming Principles and Tools group and part-time professor at the University of Edinburgh. “We were especially delighted to welcome the first cohort of PhD students in the Joint Initiative between the University of Edinburgh and Microsoft Research.”

This year’s diverse student body included 20 Microsoft Research PhD Scholars, as well as students from Max Planck Institutes in Germany, the Cambridge Computer Lab, and students associated with Microsoft Research’s collaborative research institutes: BSC-Microsoft Research Centre, Microsoft Research-Inria Joint Centre, and Microsoft’s Advanced Technology Labs in Egypt, Germany, and Israel.

The technical agenda included a stimulating mix of talks, hands-on demos, and poster sessions. Our research talks covered the wide spectrum of work we are conducting across the lab, including environmental science (“Modelling All Life on Earth. Yes, Really!”), computational biology (“Software for Programming Cells”), and cloud computing (“Cloud Computing—Big Data and Beyond”).

As in previous years, we complemented our research talks with a range of personal development talks. These included the all-time favorites “How to Write a Great Research Paper” and “How to Give a Great Research Talk” by Simon Peyton Jones, and “A Rough Guide to Being an Entrepreneur” by Raspberry Pi co-founder Jack Lang, as well as talks on “Strategic Thinking for Researchers” and “Intellectual Property at Microsoft.”

The Thursday afternoon keynote was a special highlight for the students: Christopher Bishop presented “Machine Learning: the Future of Computing?”, which was followed by a DemoFest where Microsoft researchers demonstrated their newest research projects to the students, who had the opportunity to try out new technologies and ask questions. The day ended with drinks in the sunshine and a formal dinner at Jesus College, where a crew of Microsoft staff, including laboratory director Andrew Blake and a number of Microsoft researchers, joined the students.

But we weren’t the only presenters at this year’s Summer School. PhD students displayed their research to dozens of Microsoft researchers during the three lunchtime poster sessions. They received valuable feedback on their research, and Sue Duraikan of Duraikan Training provided targeted poster training, giving individual as well as group feedback with special guidance for non-native speakers.

We also engaged the students in practical work. Students had the choice between a Windows Azure tutorial and a .NET Gadgeteer workshop. About 25 students participated in a .NET Gadgeteer hackathon, and the best project, by Istvan Haller of Vrije Universiteit Amsterdam, was rewarded with a hardware gift.

During the week, we hosted a number of social events to give students and staff a chance to relax, socialize, and network. Some of the students used the opportunity to visit the Royal Society Summer Science Exhibition in London, where Microsoft researchers presented the Technology for Nature exhibition.

After five busy days, we said goodbye to the Summer School class of 2013. We were pleased to receive positive feedback reflecting the students’ appreciation:

“Great experience, high quality talks, and great research!”

“Very friendly and welcoming. Felt right at home and had fun! The best thing is that the Microsoft team knows how to treat PhD students as researchers that matter.”

Microsoft Cambridge management and staff were equally positive about the School’s outcome: “The 2013 PhD Summer School seems to have been a wonderful success,” Andrew Blake comments. “It has brought together around 60 research students who have shown us some of the very exciting work they are engaged on, and it is clear that the next generation of researchers in computer science and related areas is full of ideas and promise for the future.”