Compiler 7/12/13: China Rising and A Tree Grows in Data Science


CHINA'S LATEST SUPERCOMPUTER VICTORY

China's Milky Way 2 supercomputer was recently declared the fastest supercomputer in the world by industry scorekeeper Top500, the latest move in the increasingly international race for high-performance computing supremacy. Late last month, CI Senior Fellow Rick Stevens appeared on Science Friday, alongside Top500 editor Horst Simon, to talk about why that competition matters, and what the global push for faster computation will do for medicine, engineering, and other sciences.

"These top supercomputers are like time machines," Stevens said. "They give us access to a capability that won't be broadly available for five to ten years. So whoever has the time machine is able to do experiments, able to see into the future deeper and more clearly than those that don't have such machines."

The same time machine metaphor was picked up by the University of Chicago's profile of Mira, our local Top500 competitor, which was bumped down to #5 by the Milky Way 2's top ranking. But there's no shame in fifth-best, when fifth-best can run 10 quadrillion calculations per second, the equivalent computing power of 58 million iPads. CI Senior Fellow Gregory Voth is quoted about how access to such a world-class resource helps both today's and tomorrow's scientists.

“Having access to a computing resource like Mira provides excellent opportunities and experience for educating up-and-coming young scientists as it forces them to think about how to properly utilize such a grand resource very early in their careers,” Voth says. “This gives them a unique perspective on how to solve challenging scientific problems and puts them in an excellent position to utilize computing hardware being imagined now for tomorrow.”

WHY DATA SCIENCE MUST BE OPEN SCIENCE AND MORE FROM DSSG

The Data Science for Social Good fellowship has reached the halfway point, and the website is starting to fill up with interesting content about the projects. Some fellows have already produced tools for the community to use, such as Paul Meinshausen's interactive tree map of the City of Chicago's Data Portal. Instead of a cold, no-frills list of the datasets available for download by the public, Meinshausen's map uses color and shape to guide users quickly to the data they are seeking and to make rapid comparisons of dataset sizes. The visualization was popular enough that programmers in Boston and San Francisco quickly applied his code to their own cities' data portals, while another developer built a common map for every city that uses Socrata software to share its data.

Meinshausen also thoughtfully used the tool he developed as an illustration in his eloquent argument for why data science should also be open science.

While respecting privacy and confidentiality, our job is to work in the open as much as possible. It’s not enough to just have a “policy” of openness - we want our work to be as understandable and inspectable as possible. Our goal is to work in a way that invites replication, imitation, improvement, and even rejection.

OTHER NEWS IN COMPUTATIONAL SCIENCE

Scientists and institutions around the world are facing similar challenges in how to store and manage snowballing amounts of research data. In order to prevent hundreds of parallel wheel reinventions, the National Science Foundation and the University of Chicago recently hosted a workshop in suburban Washington, DC to share solutions for research data management, reports Jessica Stoller-Conrad. Among the scientists, librarians, and IT professionals in attendance was CI Director Ian Foster, who spoke about the potential of software-as-a-service tools such as Globus Online to "make it easier to share data than not to share data."