The Microsoft Research Connections blog shares stories of collaborations with computer scientists at academic and scientific institutions to advance technical innovations in computing, as well as related events, scholarships, and fellowships.

Scientists can agree that there’s a lot of data out there, and that we could be using it more efficiently. Now the White House has asked for input on how to do just that.

Data from scientific research is important to a diverse array of user communities from researchers, governments, and companies to wildlife managers, transportation managers, hospitals, and teachers. As the quantity of data in individual and community collections grows, its potential value also increases but, unfortunately, so do the associated challenges of data access, privacy, storage, and archiving. These challenges are social, economic, and technical, and the solutions will require collaborative contributions from universities, federal agencies, companies, scientific societies, and other organizations.

Effective approaches to realizing the benefits of scientific data are likely to require many elements, including:

Providing incentives and rewards for sharing data

Creating and disseminating software tools and online services that enable users to find and analyze data of interest

Facilitating systems by which funding agencies and users can contribute to the costs of data storage, sharing, and analysis

Developing systems and metrics to determine when and how data is worth preserving and sharing

Microsoft believes that these are challenges worth tackling, and that coordinated efforts are urgently needed to advance our ability to curate, preserve, and use digital scientific data to maximize the societal and economic impact of research. Therefore, on January 12, 2012, Microsoft submitted our input in response to the White House Office of Science and Technology Policy (OSTP) request for information (RFI) on Public Access to Digital Data Resulting From Federally Funded Scientific Research.

The Microsoft response emphasizes two areas: Economic Models and Software Tools and Online Services. We argue that, to facilitate research and realize its societal benefits, nations should create environments in which innovation can occur around the critical elements that enable data sharing, retention, and use, and that the costs should be shared among the groups that benefit from the data and associated discoveries. In some cases, dissemination and use of specific data sets are necessary to meet high-priority scientific, policy, economic, or societal goals, and thus should be supported by relevant government agencies. In other cases, there are opportunities to create a tool or service infrastructure that enhances the value of data and allows the provider to monetize access at a level sufficient to cover the investment made in creating or maintaining the data archive. We emphasize that, in determining which data to share and how, it is important to recognize that consumers of a particular data set may be outside of the research community that created it (for example, in another scientific field or at a commercial enterprise). These consumers should still help define the value of the data and drive the creation of tools to facilitate its cross-domain use. They must also share in its maintenance costs. Overall, we stress the value that innovations in information technology, including emerging cloud services, can bring to facilitating data sharing and analysis and enabling collaborative, multi-disciplinary, and international science.

While the Microsoft response to the OSTP RFI on access to digital scientific data focuses on a few specific areas, it builds on collaborative work already done by the research community and federal agencies. Experts from Microsoft regularly participate in and support such efforts. In particular, we remain committed to the conclusions of the National Science Foundation’s Advisory Committee for Cyberinfrastructure’s Task Force on Data and Visualization and the Blue Ribbon Task Force on Sustainable Digital Preservation and Access. We also agree with many of the challenges described and conclusions reached in the National Science Board’s draft Data Policies Report released on January 5, 2012.

The above reports and activities focus on the policy side of realizing the value of scientific data. Microsoft is also working to create, demonstrate, and implement technical solutions to these challenges. In the book The Fourth Paradigm, the authors identify a range of opportunities where access to data is fundamentally changing the way science is conducted. Microsoft, in partnership with the academic community, is working to put these ideas into practice. Examples include WorldWide Telescope; the new earth-science data explorer, Layerscape; the Eye on Earth network for environmental maps; and data analytics tools such as Daytona and Excel DataScope.

—Elizabeth Grossman, Technology Policy Group, Microsoft Corporation

January 31, 2012, update: The White House Office of Science & Technology Policy (OSTP) has publicly posted all of the responses to the RFI.

Recaps of the top 10 news stories of the year—it’s a New Year’s tradition that rivals Dick Clark’s “New Year’s Rockin’ Eve” show. So who are we to buck convention? Therefore, without further ado, here are the top 10 Microsoft Research Connections blogs of 2011, as chosen by your clicks.

Who can resist building apps for the latest and greatest Kinect sensor? Apparently not the developers who are avid readers of our blog. So let’s raise a cup of cheer, or eggnog, to the intrepid innovators who are using the Kinect for Windows Software Development Kit (SDK) to push the boundaries of natural user interface applications.

A planetarium show plus a demonstration of the new earth-sciences applications of Microsoft Research’s WorldWide Telescope (WWT) took center stage at the California Academy of Sciences. If you thought turning your computer into a world-class telescope was cool, you’ll be blown away by WWT’s ability to create earth-science narratives.

The ancient Egyptians had nothing on us: using chemistry symbols in digital documents can be every bit as cumbersome as carving hieroglyphics into stone. And then came Chemistry Add-in for Word, which makes it easier for students, chemists, and researchers to insert and modify chemical information, such as labels, formulas, and 2-D depictions, from within Microsoft Word.

Research archivists, librarians, and others who have grappled with organizing and accessing voluminous research collections asked for it—and Microsoft Research Connections delivered: the 2.1 release of Zentity. A repository platform designed to manage research objects—such as journal articles, reports, datasets, projects, and people—as well as the relationships among them, Zentity supports arbitrary data models and provides semantically rich functionality that enables users to find and visually explore interesting relationships between elements.

Today, it seems that everything—from smart phones and tablets to PCs and supercomputers—is sprouting extra cores so users can do more. Can Microsoft Research Connections help create parallel code to make the most efficient use of these ubiquitous multi-core processors? Need you ask? A joint venture of the Barcelona Supercomputing Center and Microsoft Research Centre (BSCMSRC) is bringing together the expertise of hardware and software researchers to do just that.

Quality control: it’s vital in food inspections and DNA sequencing. Unfortunately, not all sequencing technologies produce reliable and accurate results, and experimental data will always contain varying rates of error. That’s where Sequence Quality Control Studio (SeQCoS) can help. A Microsoft .NET software suite designed to perform an array of QC evaluations and post-QC manipulation of sequencing data, SeQCoS generates a series of standard plots that illustrate the quality of the input data.

Every year, the Grace Hopper Celebration of Women in Technology brings the research and career interests of women in computing to the forefront. This past year was no exception, as some 2,000 attendees descended on Portland, Oregon, to hear about the latest research and explore the roles of women in computer science, information technology, research, and engineering. Microsoft Research Connections was there, too, offering support and free epiphytes (really)!

Chinese university students took the Kinect for Windows SDK and pushed it hard, applying the sensor’s depth sensing, voice and object recognition, and human motion tracking capabilities to topics ranging from education to commerce to culture and history. Their creative and elegant applications go far beyond traditional games, demonstrating Kinect’s potential in diverse areas.

Our blog readers are very interested in Kinect! And why not? Thanks to contributions from Microsoft Research, Kinect has state-of-the-art audio, skeletal-tracking, and facial-recognition capabilities. Microsoft built Kinect to revolutionize the way you play games and how you experience entertainment. But along the way, people started applying the “Kinect Effect” in ways we never imagined—from helping children with autism to assisting doctors in the operating room.

Drumroll please: the top-ranked Microsoft Research Connections blog explored—what else?—a game. But, surprisingly, it isn’t Kinect based! Instead, it’s a learning game that was developed in collaboration with the Rochester Institute of Technology. Called Just Press Play, the game helps students earn a digital reward for the ultimate achievement: collegiate success. Just Press Play encourages students to venture out of their comfort zone and get involved in all aspects of school—including (gasp) interactions with school faculty and staff.

So there they are: 2011’s most-read Microsoft Research Connections blogs. Why “Robots Invade Upstate New York” didn’t make the list is beyond us. Go figure. Happy New Year from your friends at Microsoft Research Connections!

Data mining has become one of the most critical research processes in this era of data-intensive science. There are, however, many areas of science where the usefulness of data mining is limited by the massive nature of the datasets. Consequently, scientists are desperately looking for new tools that can dig into the data faster and deeper. In the rapidly developing field of synoptic sky surveys, for example, transient signals from a variety of interesting astrophysical phenomena must be detected and characterized in (near) real-time. The resulting wealth of data is invaluable to researchers seeking new discoveries, but they need better computational methods to help them manage and analyze so much data.

I was privileged to give two talks during day two of the workshop. In “Discovery of Hidden Patterns in Data through Interactive Search,” I presented the Environmental Informatics Framework (EIF), a strategy and technology platform that the Microsoft Research Connections Earth, Energy, and Environment group developed to help advance data exploration in environmental research. I demonstrated Microsoft PivotViewer, a faceted search technology included in EIF that enables users to visually and interactively search and discover hidden patterns in massive data or image sets.

I was pleased to receive positive feedback from attendees about the work that Microsoft Research is doing for data-intensive sciences. As one participant noted to me in email, “I have to admit that I wasn’t aware of the work that Microsoft Research was doing, but I was very impressed with what I saw yesterday. The work you’ve been doing on data visualization can only be described as stunning!”

In “Building a Better Scientist,” my second talk of the day, I discussed how the fourth paradigm for data-intensive scientific discovery is changing the way scientists conduct research, and is, therefore, creating a need for a new generation of scientists with advanced computational mindsets. The presentation stimulated passionate discussions, and, as event chair George Djorgovski pointed out, it is a topic closely related to how fast and deep we can go with our data.

—Yan Xu, Senior Research Program Manager, Microsoft Research Connections