From TeraGrid to PetaGrid

In the not-too-distant future, the bridges you drive on may be equipped with thousands of tiny sensors that will enable engineers to monitor structural stresses and strains. Your doctor may be wearing a digital device that enables her to use your personal genomic data to assess the advisability of specific drug treatments. Your clothing may know enough to keep you cool or warm as the temperature changes. All of these not-so-futuristic scenarios involve data, sensors, and computing, and the capacity to combine all three to understand and respond to the world around us is an exciting challenge for the first decade of the Digital Millennium.

In the last issue of enVision, I was delighted to share with you news of the TeraGrid hardware and software infrastructure that will form the basis of a National Information Infrastructure. In August, the National Science Foundation awarded $53 million to SDSC, NCSA, Caltech, and Argonne National Laboratory to build and deploy a distributed terascale facility and its software infrastructure: the TeraGrid. The role of SDSC and NPACI in the TeraGrid is to lead efforts in data-oriented computing. With nearly a quarter petabyte of spinning disk at SDSC alone, it becomes possible to envision data-oriented applications that synthesize knowledge from immense collections of data in real time in order to track behavior and predict important phenomena. Such applications are increasingly critical to science and society in this millennium, and even more so since the tragic events of September 11.

The TeraGrid will provide terascale computing and data management facilities, as well as high-speed networking, forming the basis of the National Information Infrastructure. The TeraGrid will grow by extending laterally to encompass more nodes, by extending upward to form a grid of grids, and by extending downward to include sensors, sensor nets, and massive numbers of wireless personal digital devices that can serve as throwaway endpoints.

As the TeraGrid grows in size, heterogeneity, and speed, it will evolve to become a PetaGrid, where the sheer number, scale, and diversity of resources promise enormous potential and pose enormous challenges. The PetaGrid will encompass not just high-end, resource-rich TeraGrid nodes, but also low-end devices such as sensors and personal digital devices, which will become increasingly ubiquitous and provide critical sources of input, as well as output destinations, in the first decade of the Digital Millennium.

NPACI partners at SDSC and UCSD, collaborators at the California Institute for Telecommunications and Information Technology (Cal-(IT)2), industrial partners, and others are building a campus-wide early prototype of the PetaGrid, which will comprise a regional wired and wireless environment of sensors, personal digital assistants, large-scale computational and data management infrastructure, and visualization.
Applications being developed for this environment will include critical bioscience and environmental applications in which data must be analyzed and synthesized in real time to respond to natural and man-initiated disasters. Coupled with this integrated regional facility will be remote facilities through the TeraGrid, NPACI resources, and international efforts including global grid activities in Europe and Pacific Rim countries such as Japan and Australia.

Indeed, the TeraGrid will form the foundation of our National Information Infrastructure, and the PetaGrid will allow us to expand this critical vision to include a wider spectrum of new devices, collaborators, media, grids, and applications. This computational and information management platform will provide a critical tool to help address the challenges faced by science and society in our increasingly complex and connected world.