News » data collections
The University of Texas at Austin
http://www.utexas.edu/news/

Expanded Data Repository at Austin and Arlington Campuses to Improve Research Capacity for Entire UT System
Mon, 19 Dec 2011 21:53:22 +0000, Faith Singer-Villalobos
http://www.utexas.edu/news/2011/12/19/expanded_data_repository/

The Texas Advanced Computing Center (TACC) at The University of Texas at Austin today announced that it is working with The University of Texas at Arlington to deploy a data repository in January to increase connectivity, computing capacity and collaboration among all 15 institutions in The University of Texas System.

This project is part of the System’s cyberinfrastructure initiative, a $23 million project announced in December 2010. The expansion of TACC's existing Corral storage facility will provide 10 petabytes of storage for scientific data, replicated for protection at both sites, along with data collection management software for open science and clinical research data. The repository will greatly enhance researchers’ capabilities to share valuable scientific data in important research projects. TACC staffers will work with researchers to effectively use this resource.

"The new data storage repository will help us reach new heights in IT capacity and research computing," said Patricia Hurn, associate vice chancellor for health science research at The University of Texas System. "Using economies of scale to our advantage, this collaborative effort advances excellence across all our institutions, allowing them to grow in their computational approaches to research and improving their data warehousing and analysis needs."

"TACC is excited to support increased sharing of data by researchers across the state of Texas," said Jay Boisseau, director of TACC. "The explosive growth in the generation and capture of digital data from new sensors and instruments is making new scientific discoveries possible — this resource will help talented researchers collaborate more easily to make those discoveries."

Chris Jordan, leader of the Data Management and Collections group at TACC and chair of the storage committee for The University of Texas System cyberinfrastructure initiative, says the first phase of the project will be deployed in January. "The intent is to make the large pool of storage available to researchers for six months, and during that time encourage feedback and dialogue on what the needs might be for other technologies, such as data security, to meet the full range of research requirements throughout all 15 institutions."

The expanded data repository is composed of two identical installations — one in Austin and one in Arlington. "If a power outage takes out one of the two data centers or they become non-operational, the system can continue functioning," Jordan said. "There's a strong emphasis on data integrity and reliable operation under a wide range of failure conditions."

Researchers who need five terabytes of data storage or less will have free access, and researchers requiring more can purchase storage for $250 a terabyte per year. Some storage will also be set aside to support strategic, collaborative projects that enhance the leadership position of University of Texas System institutions.
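
The pricing above can be turned into a rough yearly estimate. The sketch below is illustrative only: the function name `annual_cost` and the `bill_only_excess` switch are hypothetical, and the announcement does not say whether the free five terabytes still apply once a researcher purchases more, so both readings are supported.

```python
FREE_TB = 5              # free allocation stated in the announcement
RATE_PER_TB_YEAR = 250   # dollars per terabyte per year, from the announcement


def annual_cost(tb_needed: float, bill_only_excess: bool = True) -> float:
    """Estimate the yearly storage cost in dollars.

    bill_only_excess is an assumption: the announcement does not state
    whether the first five terabytes stay free for larger allocations.
    """
    if tb_needed < 0:
        raise ValueError("tb_needed must be non-negative")
    if tb_needed <= FREE_TB:
        return 0.0
    billable = tb_needed - FREE_TB if bill_only_excess else tb_needed
    return float(billable * RATE_PER_TB_YEAR)


if __name__ == "__main__":
    for tb in (4, 5, 20, 100):
        print(tb, annual_cost(tb), annual_cost(tb, bill_only_excess=False))
```

Under the first reading, a 20-terabyte allocation would cost $3,750 a year. Under the second, it would cost $5,000.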

The overall cyberinfrastructure project, of which the $4 million data repository is one component, will provide funding for support staff to ensure that researchers across Texas can effectively use all of the advanced computing capabilities, including networking, central storage, data collection and high performance computing.

TACC Introduces New System for Data-Intensive Computing and Storage
Mon, 06 Apr 2009 16:36:12 +0000, Faith Singer-Villalobos
http://www.utexas.edu/news/2009/04/06/tacc_corral/

"Corral," a system for data-intensive computing and storage, is the newest resource to be deployed by the Texas Advanced Computing Center (TACC) at The University of Texas at Austin.

A partnership among TACC, DataDirect Networks (DDN) and Dell Inc., Corral went into friendly-user production on March 31 and is available to researchers and educators at The University of Texas at Austin. The resource will soon become available to a wider group of users, including UT System institutions and National Science Foundation TeraGrid users.

Corral will support database, file system and Web-based access, as well as other network protocols for storage and retrieval of data from local and remote sources. Corral's high-performance parallel file system, based on Lustre, will be accessible from TACC's world-class computational resources, Ranger and Lonestar. The system will also be accessible from Stallion, the world's highest-resolution tiled display, and from Spur, TACC's remote visualization system, enabling mathematical and visual analysis of petabyte-scale datasets. Corral will host Web applications and services for access to data from anywhere on the Internet.

"We support world-class science and engineering research, and we are now working with increasingly diverse applications from other domains," TACC Director Jay Boisseau said. "In both our science research support and in our projects from new communities—industry, humanities, etc.—we are seeing a rapidly growing need to be able to host, manage and organize massive data collections, and to support the development and availability of new types of data applications. We're excited to partner with DataDirect Networks and Dell to provide new capabilities for our growing user community."

Paul Bloch, president and co-founder of DDN, said, "DataDirect Networks' storage solutions, such as the S2A9900 ExaScaler system which TACC deployed, are designed for extreme performance, data reliability and scalable capacity, which lend itself to many applications in an HPC datacenter, such as long-term and fast-scratch data storage. We have a strong presence in high performance computing, and are proud to support seven of the top 10 fastest supercomputers in the world. We're honored that TACC has put its trust in us and our research computing storage technologies to support the Corral project."

"Dell has a long-standing commitment of supporting the global research community's efforts to solve major scientific problems with high performance computing," said John Mullen, vice president of Dell education, state and local government. "We are now extending that commitment to affordable, accessible HPC research storage solutions, such as Corral, through our partnership with TACC and DataDirect Networks. Going forward, we will continue to drive standards into the HPC ecosystem, making it simpler for scientists and researchers worldwide to collaborate, share information and address many of society's biggest challenges."

Chris T. Jordan, a senior operating systems specialist in TACC's Advanced Systems Group, said Corral complements TACC's system portfolio, enabling users to gain additional insights from the systems that are already in place. For example, a user can access all of Corral's storage capabilities from HPC systems Ranger or Lonestar, and from TACC's visualization systems, Spur or Stallion, Jordan said.

"We hope that people will use the TACC Visualization Laboratory to visualize data on Corral that may have been generated on Ranger," Jordan said. "Corral provides online storage at the petabyte scale—it's all online, accessible and high-speed so that researchers can store and use much more data as part of their computation or visualization."

Data collection projects that will use Corral include:

PECOS Engineering Simulation Project, The University of Texas at Austin—The Center for Predictive Engineering and Computational Sciences (PECOS) is a new Department of Energy-funded Center of Excellence within the Institute for Computational Engineering and Sciences at The University of Texas at Austin. The PECOS project will develop the next generation of advanced computational methods for predictive simulation of multiscale, multiphysics phenomena, and apply these methods to the problem of reentry of vehicles into the atmosphere. PECOS hopes to advance the science and modeling of atmospheric reentry and the science of predictive simulation. Corral will be used to process, manage and store the images and other data generated by the project, and will provide high-speed access to this data for researchers and members of the public anywhere in the world.

Herbarium Digitization, The University of Alaska Museum of the North: The Herbarium holds one of the world's premier collections of arctic and boreal plants. With support from the National Science Foundation, the Herbarium is taking high-resolution digital photographs of 230,000 pressed plants to capture data about the collection and to make these specimens more accessible for research and education. The images are archived as digital negatives, the most data-intensive file format, preserving all of the data captured by the camera. Making these images publicly available requires four terabytes of rapidly accessible Web storage. Corral will be used to process, manage and store the digital images and other data generated by the project, and will provide high-speed access to this data for researchers and members of the public anywhere in the world.

Center for Space Research (CSR), The University of Texas at Austin—CSR will use Corral for two important space-based projects—imagery data and geospatial data for emergency response operations, and high-precision gravity data processing. As part of CESAR (Cyberinfrastructure for Emergency Situation Assessment and Response), Corral will be used to rapidly access the 'framework' geospatial data needed for emergency response operations during natural and man-made disasters. Framework data are the most recent, high-resolution aerial and orbital imagery and elevation data sets. CSR will also use Corral to store the data sets collected during a major event, such as Hurricane Ike, for distribution to state and federal agencies, and universities performing disaster research. The Gravity Recovery and Climate Experiment (GRACE) is providing a continuous, multi-year record of the spatial and temporal variations in the Earth's mass through measurements of its gravity field, and has provided new insights into the evolution of the Earth's climate system. The group expects to collect a few terabytes of original data and 20 to 40 terabytes of analysis results. Corral will house the data online for rapid mission reprocessing and scientific analysis. In addition, Corral will host the output products online for analysis of multi-year data sets.

Institute of Classical Archaeology (ICA), Liberal Arts, The University of Texas at Austin—ICA will use Corral to preserve, protect and disseminate two dynamic datasets to the wider academic community and the public. The first dataset contains information gathered during an intensive field survey of ancient sites in the territory of Metaponto in South Italy where data were documented using GPS and incorporated with remote-sensing imagery into a geographic information system. The second dataset involves excavations in an area of the Greek, Roman and Byzantine city of Chersonesos in Crimea (Ukraine). These spatial and contextual datasets also contain extensive data produced in the course of specialist research into forensic anthropology and ancient agriculture and technology.

The Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), The University of Texas at Austin: The HETDEX project at McDonald Observatory is the first major experiment to probe dark energy, the mysterious force causing the expansion of the universe to speed up over time. Over three years, HETDEX will collect data on at least one million galaxies that are nine billion to 11 billion light-years away, yielding the largest map of the universe ever produced. The map will allow astronomers to measure how fast the universe was expanding at different times in history. The project will generate several tens of terabytes of data in a realm previously unexplored by astronomers, and the project itself will use only a small fraction of that data. TACC will archive the dataset for use by the wider astronomical community, and provide a public Web portal.

Some of these data collections are as small as five terabytes, while some are as large as 100 terabytes.

"As Corral fills up, we plan to expand it," Jordan said. "It's designed to extend TACC's infrastructure. We now have one unified system that can support all of these applications that can grow to meet future demands."

Technical Specifications

1.2 petabytes of SATA disk in a DataDirect Networks S2A9900 controller system, shared with other TACC systems via a parallel file system and via databases such as MySQL, PostgreSQL and SQL Server.

The disk system is composed of 1,200 one-terabyte drives, and the controller has eight InfiniBand connections to the server systems. The controller is capable of reading and writing data at up to 6 GB/sec.
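
As a rough check on these figures, the sketch below computes the raw capacity from the drive count and estimates how long one full pass over the disk would take at the stated peak rate. It assumes decimal units (1 TB = 10^12 bytes), a sustained 6 GB/sec, and no RAID or file system overhead, none of which the specifications above confirm. The function names are hypothetical.

```python
DRIVES = 1200            # drive count from the specifications
DRIVE_TB = 1             # terabytes per drive, from the specifications
PEAK_GB_PER_SEC = 6      # stated peak controller throughput

# Assumption: decimal (SI) units, as storage vendors commonly use.
GB = 10**9
TB = 10**12
PB = 10**15


def raw_capacity_pb() -> float:
    """Raw capacity in petabytes, ignoring RAID and file system overhead."""
    return DRIVES * DRIVE_TB * TB / PB


def full_scan_hours() -> float:
    """Hours to read every byte once at the peak rate (an idealized bound)."""
    total_bytes = DRIVES * DRIVE_TB * TB
    seconds = total_bytes / (PEAK_GB_PER_SEC * GB)
    return seconds / 3600


if __name__ == "__main__":
    print(f"raw capacity: {raw_capacity_pb():.1f} PB")
    print(f"full scan at peak rate: {full_scan_hours():.1f} hours")
```

Under these assumptions, 1,200 one-terabyte drives give the stated 1.2 petabytes, and a single full read at peak throughput would take a little over two days.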