PSC Presents Sherlock, a YarcData uRiKA System for Big Data Analytics

The Pittsburgh Supercomputing Center Presents Sherlock, a YarcData uRiKA System for Unlocking the Secrets of Big Data

PITTSBURGH, November 7, 2012 – The Pittsburgh Supercomputing Center (PSC) and YarcData, a Cray (Nasdaq: CRAY) company, today announced the deployment of “Sherlock,” a uRiKA graph-analytics appliance from YarcData for efficiently discovering unknown relationships or patterns “hidden” in extremely large and complex bodies of information. Funded through the Strategic Technologies for Cyberinfrastructure (STCI) program of the National Science Foundation, Sherlock features innovative hardware and software, as well as PSC-specific enhancements, designed to extend graph analytics to scales not otherwise feasible.

Graph-analytics techniques have long been used by the government and are coming into wider commercial use. Sherlock will focus on extending these techniques to a wide range of scientific research projects.

“Sherlock,” says Nick Nystrom, PSC director of strategic applications, “provides a unique capability for discovering new patterns and relationships in data. It will help to discover how genes work, probe the dynamics of social networks, and detect the sources of breaches in Internet security.” Those diverse challenges, along with many others, he adds, have two important features in common: Their data are naturally expressed as interconnected webs of information called graphs, and data sizes for problems of real-world interest become extremely large.

“Until now, graph analytics has largely been impractical for big data,” says Nystrom. This is because, he explains, processing graph structures requires irregular and unpredictable access to data. On ordinary computers and clusters, nearly all the time is spent waiting for that data to move from memory to processors. Even more challenging, graphs of interest typically cannot be partitioned; their high connectivity prevents dividing them into subgraphs that can be mapped independently onto distributed-memory computers. These factors have precluded large-scale graph analytics, especially at the interactive response times that analysts need to explore data. “YarcData’s uRiKA,” says Nystrom, “overcomes that barrier through groundbreaking innovations in computer hardware and software.”

“Many current approaches to big data have been about ‘search’ – the ability to efficiently find something that you know is there in your data,” said Arvind Parthasarathi, President of YarcData. “uRiKA was purposely built to solve the problem of ‘discovery’ in big data – to discover things, relationships or patterns that you don’t know exist. By giving organizations the ability to do much faster hypothesis validation at scale and in real time, we are enabling the solution of business problems that were previously difficult or impossible – whether it be discovering the ideal patient treatment, investigating fraud, detecting threats, finding new trading algorithms or identifying counter-party risk. Basically, we are systematizing serendipity.”

The project complements ongoing leadership in data-intensive computing at Carnegie Mellon University (CMU). Randal E. Bryant, Dean of the School of Computer Science at CMU, notes, “We’re very pleased that the PSC will have this new capability for analyzing large-scale, unstructured graphs. Such data structures pervade many of the big data applications being investigated by researchers in such diverse areas as biology (e.g., the connectivity between molecules in a protein), networks (e.g., the structure of the world-wide web), and artificial intelligence (e.g., the relationships between different concepts). The uRiKA system will enable scientists to deal with far more complex graphs than would otherwise be possible.”

PSC customized Sherlock with additional nodes containing standard x86 processors to add valuable support for heterogeneous applications that use YarcData’s Threadstorm nodes as graph accelerators. This heterogeneous capability will enable an even broader class of applications, such as genomics, astrophysics, and structural analyses of complex networks. Sherlock runs an enhanced suite of familiar semantic web software, together with common programming languages, for easy access to powerful analytic functionality. PSC’s Data Supercell provides complementary, high-performance access to large datasets for ongoing, collaborative analysis.

Prototype projects, led by researchers from across the country, will use Sherlock for research including understanding the natural language of the Web, learning about human social networks involving different types of online and telephone interactions, cluster finding in astrophysics, and genome sequence assembly. For example, Bin Zhang, of the Fox School of Business at Temple University, notes the potential for Sherlock to expand his research into clustering in social networks: “With the help of Sherlock, I can finally observe the true size of social groups in real-world networks of millions to even a billion people. Researchers believe that social group size is larger for online social networks than for traditional groups, but so far it has been impossible to extract groups from large networks and visualize their structures. Sherlock can finally enable us to observe the structure of large social groups and even the whole network.” Additional projects will be introduced over time; more information is available at www.psc.edu/sherlock.

About PSC: The Pittsburgh Supercomputing Center is a joint effort of Carnegie Mellon University and the University of Pittsburgh together with Westinghouse Electric Company. Established in 1986, PSC is supported by several federal agencies, the Commonwealth of Pennsylvania and private industry, and is a partner in the National Science Foundation XSEDE program. More information is at http://www.psc.edu.

About YarcData: YarcData, a Cray company, delivers business-focused, real-time graph analytics that help enterprises gain business insight by discovering unknown relationships in Big Data. Early adopters include the Canadian government, Institute for Systems Biology, Mayo Clinic, Noblis, Sandia National Laboratories, and the United States government. YarcData is based in the San Francisco Bay Area; more information is at www.yarcdata.com.

About Cray Inc.: As a global leader in supercomputing, Cray provides highly advanced supercomputers and world-class services and support to government, industry and academia. Cray technology is designed to enable scientists and engineers to achieve remarkable breakthroughs by accelerating performance, improving efficiency and extending the capabilities of their most demanding applications. Cray's Adaptive Supercomputing vision is focused on delivering innovative next-generation products that integrate diverse processing technologies into a unified architecture, allowing customers to surpass today's limitations and to meet the market's continued demand for realized performance. Go to www.cray.com for more information.
