HPCwire » DOE (http://www.hpcwire.com)
Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

NERSC Realigns for Enhanced Data Focus
http://www.hpcwire.com/2015/02/26/nersc-realigns-for-enhanced-data-focus/
Fri, 27 Feb 2015

The National Energy Research Scientific Computing Center (NERSC), one of the nation’s primary HPC facilities for scientific research, has implemented several organizational changes, which it says will help its 6,000 users “more productively manage their data-intensive research.”

NERSC’s Storage Systems Group will move under the Services Department, in order to foster greater synergy with the Data & Analytics Services Group. As part of this restructuring, Katie Antypas, who heads the Services Department, is now the NERSC Deputy for Data Science.

“Data science is a cross-cutting thrust for NERSC and Katie will be responsible for organizing our work in this area and furthering our data strategy,” explains NERSC Director Sudip Dosanjh. “This effort will require close collaboration between the Storage Systems and Data & Analytics Services groups at NERSC, in addition to other groups in NERSC, Computational Research Division and ESnet.”

Data is not just a buzzword; it is the emerging fourth paradigm of scientific discovery, and the reorganization positions NERSC to focus more resources on data and on how it is moved, managed, and analyzed.

“From our users’ perspective, this approach will provide a more coherent structure and result in improved tools and capabilities to help them manage and move data between the different layers of memory and storage,” says Antypas, who is also leading the Cori supercomputer initiative. “When you look at the architectures coming down the road, it’s evident that the lines between memory and storage are blurring. For example, in our newest system, Cori, there will be a Burst Buffer, a layer of flash storage between the system’s memory and file system, so it just makes sense that our Storage group and our Data and Analytics group will need to work together to make it and future services a success.”

NERSC is meeting data challenges in other ways too by joining forces with other national labs to keep pace with rapidly expanding data flows. Prabhat, head of NERSC’s Data & Analytics Services Group, references a new class of data challenges involving simulation, experimental, and observational data. “Our goal is to better utilize our existing infrastructure so we can plan and prioritize for handling these new modalities of data,” he says.

The Storage Systems Group, led by Jason Hick, will also have an expanded role in response to an increased demand for data services. The group supports NERSC’s science data portals. Also known as science gateways, the portals provide a mechanism for the dissemination of large datasets, such as observational data from telescopes and neutrino observatories.

The organizational changes reflect an evolving science landscape, says Antypas, marked by ever-larger datasets and the merging of experimental and simulation-based discovery.

ROSE Framework Blooms Toward Exascale
http://www.hpcwire.com/2015/02/12/rose-framework-blooms-toward-exascale/
Fri, 13 Feb 2015

One of the many ways that the Office of Advanced Scientific Computing Research (ASCR) supports the Department of Energy Office of Science facilities is by championing the research that powers computational science. A recent ASCR Discovery feature takes a look at how the DOE science community is preparing for extreme-scale programming. As supercomputers reach exascale speeds, likely in the mid-2020s, they will be capable of billion-way parallelism. To facilitate the difficult programming challenge of exploiting all those cores, computer scientists at Lawrence Livermore National Laboratory (LLNL) are working to automate the generation, analysis, debugging and optimization of complex application source code. The compiler infrastructure effort is part of the ROSE project based at LLNL’s Center for Applied Scientific Computing.

As code becomes more complex, human ability to interact with it directly becomes limited. Programmers start with user-friendly source code, which is fed to ROSE and transformed into source code that vendor-provided compilers turn into executable machine instructions. The code is then analyzed, debugged and further optimized using various tools, and ROSE translates the improved program back into source code. “This source-to-source feature enables even novice programmers to take advantage of ROSE’s open-source tools,” notes the feature piece.
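
To make the source-to-source idea concrete, here is a minimal sketch of what such a tool looks like, modeled on ROSE’s basic identity-translator pattern; the exact headers, namespaces and build flags depend on the ROSE installation, and this skeleton is illustrative rather than taken from the article:

    // A minimal ROSE "identity translator": parse the input, optionally analyze
    // or transform the abstract syntax tree (AST), then unparse it back to
    // source code and hand it to the backend compiler.
    #include "rose.h"

    int main(int argc, char* argv[]) {
        // Front end: build ROSE's AST from the input source files.
        SgProject* project = frontend(argc, argv);

        // A real tool would insert its analysis or transformation passes here.
        AstTests::runAllTests(project);  // sanity-check the AST

        // Back end: unparse the (possibly transformed) AST to source code and
        // invoke the vendor compiler to produce an executable.
        return backend(project);
    }

Everything a real tool does happens between the front-end and back-end calls, which is what lets ROSE hand cleanly transformed source back to an ordinary vendor compiler.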

The ROSE framework originated in the early 2000s with support from DOE’s Advanced Scientific Computing Research program (ASCR) and the National Nuclear Security Administration, and has since been used to write, rewrite, analyze or optimize hundreds of millions of lines of code.

The lead LLNL researcher on the ROSE project, Dan Quinlan, recalls how the name originated as a reference to Shakespeare’s Romeo and Juliet. It came about as he was trying to assure team members that the tool was meant to help their code achieve its full potential, not to change anything. His motto, “Your program under any semantics-preserving transformation is equivalent,” was an allusion to Shakespeare’s famous line, “a rose by any other name would smell as sweet.”

Livermore’s lab-directed research program has since received additional funds to use ROSE for new research activities. The focus now is on supporting the DOE’s push into exascale computing. To this end, the team is working with ASCR’s X-Stack Software Project. The name X-Stack calls up both “exascale” and the “stack” of software tools developers need to create computer software.

The X-Stack program team is aligned with another project called D-TEC, short for DSL (domain-specific language) Technology for Exascale Computing. The focus here is writing and test-running large, complex, but narrow-purpose software for anticipated supercomputer architectures. The ROSE group has contributed to nine DSLs that are in various stages of development.

A main challenge of exascale programming is the need to manage data strategically and to rely less on hardware-managed caching, with its associated data-movement, time and energy costs. The workaround is to employ software-managed caching, with selective use of hardware-managed caching on small, fast memory levels. Experts anticipate that exascale architectures will use many levels of memory, ranging from large and slow to small and fast.

“In specific exascale architectures, you are using many levels of memory, and you need to use software-managed caching to copy data from the larger and slower memory levels to the smaller but faster levels of memory to get good performance,” Quinlan says. “You can put just what you need where you want it, but the code needed to support software-managed caches is extremely tedious to write and debug. The fact that the code can be automatically generated (using ROSE) is one of the more useful things we’re doing for exascale.”
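
As a rough illustration of the pattern Quinlan describes (our sketch, not code from the ROSE project), the loop below stages tiles of a large array from slow, high-capacity memory into a small, fast scratch buffer, computes on the fast copy, and writes the tile back; the buffer and tile size are stand-ins for whatever fast memory level a given architecture exposes. It is exactly this tedious copy-in/compute/copy-back code that the ROSE team wants to generate automatically.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    constexpr std::size_t TILE = 4096;  // elements assumed to fit in fast memory

    // Scale a large array held in slow memory, one tile at a time.
    void scale_in_tiles(std::vector<double>& slow_data, double factor) {
        std::vector<double> fast_tile(TILE);  // stand-in for a fast scratchpad
        for (std::size_t base = 0; base < slow_data.size(); base += TILE) {
            const std::size_t n = std::min(TILE, slow_data.size() - base);
            // Fill the software-managed cache from slow memory.
            std::copy_n(slow_data.begin() + base, n, fast_tile.begin());
            // Compute on the fast copy.
            for (std::size_t i = 0; i < n; ++i) fast_tile[i] *= factor;
            // Write the tile back to slow memory.
            std::copy_n(fast_tile.begin(), n, slow_data.begin() + base);
        }
    }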

Is U.S. Falling Behind in Supercomputing and Exascale?
http://www.hpcwire.com/2015/01/29/u-s-falling-behind-supercomputing-exascale/
Thu, 29 Jan 2015

Few dispute the importance of supercomputing to U.S. competitiveness. The argument is around whether current government efforts – primarily through the Advanced Scientific Computing Research (ASCR) program within the U.S. Department of Energy (DOE) – are effective and sufficient or wasteful and excessive.

Yesterday, a panel of HPC experts testifying at a U.S. House of Representatives hearing (Subcommittee on Energy – Supercomputing and American Technology Leadership) argued the decline in U.S. supercomputer research spending is putting U.S. computer and competitive leadership at risk.

“In the past century the federal government financially supported two-thirds of the nation’s research and development activity but that has gradually declined to one-third. Industry, on the other hand, has increased its share from one-third to about two-thirds. The problem is that, because of financial market pressure for rapid returns, industry focuses largely on ‘D,’ not ‘R,'” said panelist, Norman Augustine, retired chairman and chief executive officer of Lockheed Martin Corp.

“The result has been that in terms of arguably the most significant measure of national research investment, research funding as a fraction of GDP, the United States has recently dropped from first to seventh place in the world. The extent of America’s disinvestment in research is such that America now ranks 29th among developed nations in the fraction of research that is governmentally funded. It is projected that within about five years China will surpass the U.S. in both research funding as a fraction of GDP and absolute funding,” said Augustine.

DOE operates 17 laboratories located throughout the country, the efforts of which are principally focused on energy research and the provision of weapons that underpin the nation’s nuclear deterrent. FY2015 funding for ASCR is $541 million.

Augustine contends that DOE laboratories, because they enjoy relatively stable funding, are well suited to “long-term, high-risk/high-payoff, often-large projects with applicability that may not be evident at their outset.” He cited support of research into commercial nuclear fusion and into hydraulic fracturing to produce shale gas as but two examples of such endeavors.

For the HPC community, these are familiar arguments, and they were echoed by the other panelists.

Just getting to the hearing was challenging for one of the panelists (Dr. Giles) as recent snow in Boston restricted flights out. He participated by video conference.

The practical matter for the HPC community is prying loose government funding. Energy Subcommittee Chairman Randy Weber (R-Texas) issued a statement in seeming strong support of ASCR:

“As we face the reality of ongoing budget constraints in Washington, it is our job in Congress to ensure that taxpayer dollars are spent wisely, on innovative research that is in the national interest, and provides the best chance for broad impact and long-term success. The basic research conducted within the ASCR program clearly meets this requirement. High performance computing provides a platform for breakthroughs in all scientific research, and accelerates applications of scientific breakthroughs across our economy.”

At least as interesting as the discussion of funding was the discussion around key technology challenges, including the importance of co-design principles (simultaneous algorithm, software and hardware development) in supercomputing, worries over hitting the fundamental limits of silicon, and the difficulties faced in achieving exascale computing systems.

Technology transfer was another concern cited. Moving applications and technology out from national supercomputing centers into the mainstream can be challenging. Panelist Dave Turek of IBM noted that the general rule today is that the national labs are five to seven years ahead of the broader commercial HPC market. Panelists were pressed on their thoughts for how tech transfer could be accelerated while steering clear of the problematic conflict-of-interest issues that complicate private-public collaborations.

On the whole, it was an interesting, if not entirely unfamiliar, conversation. Part of the purpose of the meeting, said Dr. Giles, was to renew interest in a bill passed by the House but not the Senate last year. Subcommittee member Randy Hultgren (R-Ill.) had introduced the bill, H.R. 2495, the American Super Computing Leadership Act, which eventually died in the Senate. Dr. Giles was an advisor on the bill.

Here are a few key points of the bill:

“…Amends the Department of Energy High-End Computing Revitalization Act of 2004 with respect to: (1) exascale computing (computing system performance at or near 10 to the 18th power floating point operations per second), and (2) a high-end computing system with performance substantially exceeding that of systems commonly available for advanced scientific and engineering applications.

Directs the Secretary of Energy (DOE) to:

(1) coordinate the development of high-end computing systems across DOE;

(2) partner with universities, National Laboratories, and industry to ensure the broadest possible application of the technology developed in the program to other challenges in science, engineering, medicine, and industry; and

(3) include among the multiple architectures researched, at DOE discretion, any computer technologies that show promise of substantial reductions in power requirements and substantial gains in parallelism of multicore processors, concurrency, memory and storage, bandwidth, and reliability.” A Library of Congress summary of the bill is available online.

A video archive of the session is available, as are copies of panelists’ written statements. Members of the subcommittee include: Randy Weber (R-Texas), Chair; Dana Rohrabacher (R-Calif.); Randy Neugebauer (R-Texas); Mo Brooks (R-Ala.); and Randy Hultgren (R-Ill.).

DOE Seeks to Mend HPC Talent Gap
http://www.hpcwire.com/2015/01/27/doe-seeks-mend-hpc-talent-gap/
Tue, 27 Jan 2015

Get a group of HPC stakeholders in a room and it won’t be long before they are bemoaning the talent shortage: the gap between the demand for a well-trained HPC workforce and the number of qualified candidates available to fill these positions. Despite the attention paid to this topic, the HPC talent gap remains a thorn in the side of a field that is increasingly understood to be synonymous with a nation’s leadership potential. And while the issue crosses industry and government boundaries, the Department of Energy (DOE) has additional reason to be concerned due to the significance of its mission, which includes such sensitive areas as cyber- and nuclear security, not to mention its role as a testbed for innovation and discovery.

Given the importance of scientific computation to the federally funded DOE centers, the DOE’s Office of Science set out to explore the issue further by charging the Advanced Scientific Computing Advisory Committee (ASCAC) with identifying the causes of the shortage and the solutions considered most likely to reverse the trend. At SC14 in New Orleans, ASCAC subcommittee chairperson Barbara Chapman of the University of Houston revealed key findings and recommendations, which are further detailed in the subcommittee’s 26-page report.

The study’s authors collected data from national laboratories, university computing programs and previous reports on workforce preparedness, noting several relevant trends that shed light on different dimensions of this challenge. The initial finding confirmed the prevailing suspicions that indeed all DOE national laboratories face workforce recruitment and retention challenges in computing sciences fields relevant to labs’ missions. The situation could become even more problematic as large numbers of DOE employees are expected to retire in the coming decade.

The major contributor to the talent shortage is the lack of computing science graduates, with US citizens, women and minorities being especially underrepresented. For example, foreign nationals currently account for more than half of the graduate students in Ph.D.-granting computer science programs. This leaves labs seeking candidates from an international pool, which extends the already-long lag times between when a position is posted and when it can be filled. The report notes that it takes 100 days to fill a DOE job versus 48 to 50 days to fill a similar position in industry. When US citizenship is a requirement, as is the case at the DOE National Nuclear Security Administration (NNSA) labs, it can take upwards of 200 days to fill a position.

The subcommittee further reported a lack of diversity in the talent pool. The percentage of women graduating with computing degrees is just 17.2 percent for computer science and 18 percent for all computing doctorates. Hispanic and African-American students comprise less than 4 percent of computing doctorate recipients.

Another factor, according to the study, is an uneven distribution of specialties, with “hot” topics like artificial intelligence and robotics being favored at the expense of a solid HPC foundation in algorithms, applied mathematics, data analysis and visualization, and high-performance computing systems. These skills are cross-disciplinary, requiring a mix of computing, math and science expertise. Although well-designed computational science degrees and specializations are popping up at institutions across the country, these are still ad hoc programs and not yet prevalent enough to significantly close the shortage.

One of the most effective tools for addressing these primary root causes, the lack of interest both in computing degrees and in the core HPC subject matter, is outreach and recruitment.

The subcommittee pointed to several already-established, DOE-facilitated programs as being critical to bolstering the computing sciences workforce. The list of successful outreach efforts includes a five-year program from the Nuclear Science and Security Consortium partnership. Focused on training a generation of nuclear scientists, the program has reached more than 100 students since it was established. The DOE Computational Science Graduate Fellowship (DOE CSGF) is another that is paying dividends, rated highly effective in multiple reviews, according to Chapman. The fellowship trains students in interdisciplinary knowledge and provides DOE lab experience. The subcommittee recommends expanding the DOE CSGF program and using it as a model for new fellowship programs in areas pertinent to DOE lab needs, such as exascale algorithms and extreme computing.

Given the shortfalls of existing academic programs in meeting the needs of current and future methodologies, such as exascale computing, the subcommittee recommends establishing a DOE-supported computing leadership graduate curriculum advisory group to publish curricular competency guidelines at the graduate and undergraduate levels, with the aim of influencing curriculum development efforts.

Other recommendations focused on the importance of boosting the DOE’s visibility on university and college campuses as well as the need to work with other agencies to “pro-actively recruit, mentor and increase the involvement of significantly more women, minorities, people with disabilities, and other underrepresented populations into active participation in CS&E careers.”

The subcommittee notes that the interesting career opportunities offered by the DOE national laboratories will be a natural draw with increased awareness, yet elements of lab culture could be made more appealing to today’s mobile generation. In this regard, the report recommends uniform measures across the DOE laboratories to facilitate incentives like ongoing relocation assistance, lifetime professional development, and a sabbatical program. In order to implement such strategies, the DOE would need to examine the laboratory funding model and its relationship to recruiting and retention.

“This is an issue for the whole supercomputing community,” says Chapman in a DOE feature article on the report. “Meeting the mission-critical workforce needs of the national laboratories will require leadership to address this lack of diversity and to design outreach programs to attract a more diverse student population.”

US Leadership Steps Up Supercomputing Commitment
http://www.hpcwire.com/2014/12/03/us-leadership-steps-supercomputing-commitment/
Wed, 03 Dec 2014

On November 14, US Secretary of Energy Ernest Moniz announced two new high performance computing (HPC) awards, the first program to set a solid stake in the ground for the attainment of US exascale computing. A video of that presentation is now available for public viewing.

As we covered previously, Secretary Moniz revealed that the DOE would be making $325 million available for the construction of two state-of-the-art supercomputers at the Department of Energy’s Oak Ridge and Lawrence Livermore National Laboratories. The initiative is an extension of the joint Collaboration of Oak Ridge, Argonne, and Lawrence Livermore laboratories, aka the CORAL collaboration, announced earlier this year.

Summit, to be housed at Oak Ridge National Laboratory, will provide at least five times the performance of the lab’s current leadership system Titan, said Moniz, while Livermore’s new computer, called Sierra, will deliver at least seven times more performance. And although details were not disclosed yet, Argonne’s CORAL award will be forthcoming.

In addition to the sizable $325 million supercomputing investment, Secretary Moniz also announced approximately $100 million in funding to develop extreme scale supercomputing technologies as part of FastForward 2, a joint research and development program between government labs and vendor partners Cray, Intel, NVIDIA and AMD.

Expected to come online in the 2017 timeframe, the two 150-petaflops-class systems represent a factor of five to seven speed-up over current top systems, but they must meet that challenge with only ten percent more energy. “We’re going beyond 10 MW for a single computer already, so energy efficiency will be a major focus [of the program],” noted the Secretary.

Moniz emphasized the importance of this sizable investment for applications critical to the United States and the world. Planned uses include weapons simulations that support nuclear non-proliferation, climate modeling, combustion, and engineering. Already, such government-industry collaborations have resulted in efficiently engineered systems, for example the so-called super-trucks, class A vehicles that have been made 60 percent more efficient thanks to HPC-powered modeling.

“We are going to see enormously important contributions across the science, energy and nuclear security activities of the Department,” said Moniz in closing. “Once again, we could not do it without the support of these and other key members of Congress who support the labs and more importantly the American science enterprise, that it remains the driver of innovation, economic development and security in this country.”

Following Moniz’s turn at the podium, Senator Lamar Alexander (R-Tenn.) took the opportunity to discuss the significance of the bipartisan initiative, noting with a smile that it was his wish that once again the world’s fastest supercomputer would reside in Oak Ridge, Tenn. But he added, “The important thing is not only the speed of the computer, but it’s the fact that we use it better. As a member of Congress, I like to see the cooperation of the three laboratories in combining our resources rather than competing. We’re not competing with each other, we’re working together to make this the place in the world with the best supercomputing.”

CORAL Signals New Dawn for Exascale Ambitions
http://www.hpcwire.com/2014/11/14/coral-signals-new-dawn-exascale-ambitions/
Sat, 15 Nov 2014

Just when it started to look as though the architectural course had been set for the next wave of large-scale supercomputers, today offered quite a shakeup to the standard.

And it’s not just the amount spent to turn a novel architecture into a pre-exascale reality, although to be fair, it’s rare indeed to see a lump $325 million deal from the Department of Energy to fund new systems with an extra $100 million added to support extreme scale technologies under the FastForward initiative.

Aside from the sheer investment figures, the fascinating part of what’s happening is architectural, and therefore important in terms of what this means for how centers think about energy consumption, the prioritization of extreme-scale scientific and security challenges, and, perhaps to some degree, the slightly less dominant position of the U.S. in terms of its national supercomputing capability.

While many expected these first two of the new pre-exascale systems to come out of the CORAL collaboration between Oak Ridge, Lawrence Livermore, and Argonne national laboratories to follow the trends set by Titan and other accelerated x86 machines, those expectations were upended by IBM in today’s announcement about a new class of systems sporting GPUs via a close collaboration among OpenPower members IBM, NVIDIA and Mellanox.

Before we delve into an early overview of the systems, it’s worth noting that the very status of IBM’s role in the future of supercomputing had been called into question over the last year, making this a rather surprising announcement in its own right. From selling off their core HPC-oriented server business to Lenovo to quietly bringing the Blue Gene era to a close, it seemed that their interests were shifting toward a more general Power-based approach for all datacenters—not just HPC with its unique subsets of system choices.

To be fair, though, this is still what they’re doing. The massive procurement is for systems that are not exactly distinct HPC offerings per se, but rather more advanced and forward-looking variants on the overall OpenPower push to upend Intel’s dominance. However, with the addition of key technologies from Mellanox and NVIDIA, specifically the latter’s NVLink interconnect, to the new processor generation, which we heard for the first time today is called “Power9,” IBM has found a way to maintain an edge at the high end while refining the Power approach for the wider datacenter market as these technologies mature and are put to the test at scale, and massive scale at that.

The result of all of this is two systems that will be installed in the 2017 time frame. Summit will be housed at Oak Ridge National Laboratory and will be dedicated to large-scale scientific endeavors ranging from climate modeling to other open science initiatives. The other, called Sierra, is set to be installed at Lawrence Livermore with an emphasis on security and weapons stockpile management.

Both are GPU-accelerated systems that pack their performance into fewer nodes thanks to NVIDIA’s Volta architecture, which, for those who follow these generations, is two steps beyond where we are now, with Pascal expected in 2016. The key here is the NVLink interconnect, which is set to push new limits in making these the “data centric” supercomputers IBM is espousing as the next step beyond machines that have traditionally been valued only for their floating point capabilities.

We will be exploring the technology in a companion piece that will immediately follow this one and offer a deeper sense of the projected architecture from chip to interconnect. However, to kick off this series, we wanted to provide a touchstone for these first inklings of what exascale-class systems might look like in the U.S. in the years to come.

One thing is for sure: these systems pack a lot of punch into far less space. The Summit system at Oak Ridge is expected to reach 150 to 300 peak petaflops, but according to ORNL’s Jeff Nichols, one of the most remarkable aspects of the system is how the lab was able to work with partners IBM, NVIDIA, and Mellanox to create an architecture that boils down to a much smaller number of nodes with far higher performance and a much larger shared memory footprint.

At this stage, Summit will be 5x or more the performance of Titan at 1/5 the size—weighing in at just around 3400 nodes.

“This shared memory capability and lower node count is important to our developers going forward,” he said. “I can say as a computational chemist myself that developers love having fewer nodes to manage and more shared memory per node to work with.”

The “data-centric” approach that IBM has wrapped around this announcement is another key feature of the Summit system, said Nichols. In addition to the 5x to 10x performance boost from accelerators, which are already in play at Oak Ridge National Lab on the Titan machine, the capability to manage vast amounts of complex simulation data is critical. “We can ingest more data, more varieties of data, and explore modeling and simulation data in new ways that we couldn’t do even with Titan,” he explained. “As we move toward exascale, and this is certainly an early step towards that, we do feel that we have a good path forward in terms of how we’ll develop and deploy future systems along this architectural path” with both computational and data-centric needs in mind.

NVIDIA’s Sumit Gupta told us today that each of these nodes is so powerful that four of them alone would make the Top500 today. “You probably need a couple of racks of servers to get into the Top 500, but GPU performance will advance so much that we’ll get that with just four nodes. The central reason why the largest supercomputers are using accelerators is that CPU alone is too much power. A 150 petaflop system today would be half the power of Vegas—and that isn’t going to improve much.”

Gupta added that NVLink, which we will explore in depth in a follow-up technical piece, is central because the CORAL collaborators wanted a fast processor but also required a data movement paradigm that would allow data to be handled quickly without extra hops. CPUs and GPUs connected traditionally over PCIe have been great for classical high performance computing, he noted, but high-throughput computing users at that scale need the processors to be able to move data efficiently from point to point.

These features are key for the weapons stockpile program that is central to national security, where the Sierra system will offer a massive increase in performance and efficiency at Lawrence Livermore. This machine is expected to offer in excess of 100 petaflops of peak performance.

As LLNL’s Mike McCoy said today, “Simulation is critical to our stockpile program—it’s critical for us to make sure we never have to return to nuclear testing. But our 3D weapons simulation codes involve 3D applications and multiple physics packages, and our major codes easily run over a million lines, not to mention the databases they employ. At the end of the day, key national security decisions are made based on these calculations, but the question is always, how do we know these systems are going to do the work we need?”

In answering his own question, he explained the value of the partnership among OpenPower members. “This is not an off-the-shelf approach—the partnerships are strong and we share the risk in development and deliver platforms that can rapidly come into production and serve our needs. This effort is achieved through a systems integration approach, and there will be tight integration between the vendors and code development teams, which is called codesign. This has, interestingly enough, been applied in the past and led to advances like Blue Gene/L. This partnership represents a huge opportunity to deliver these and future first-generation exascale systems.”

“We’ve displaced an Intel-based system at ORNL and we haven’t been there for a number of years. It’s a nice achievement for us,” said IBM’s Dave Turek in a conversation today. But the real value in this news is how it could represent the first seismic shift away from the FLOPS-centric approach to large-scale systems toward one that takes the problems of data to heart. “We are aided here not because of anything other than what we’re seeing in terms of the evolution of the marketplace, through direct measurement, of how necessary it is to simultaneously deal with analytics in concert with modeling and simulation. If you look at an example like seismic processing and you go back ten years, the bulk of the time would have been dedicated to the algorithm and making it faster, but what’s transformed the conversation is the radical influx of data. Now when you inspect the infrastructure that’s being deployed in examples like this, there’s a tremendous amount of mundane data sorting and managing that’s taking up the compute.”

Just as efforts like this have bolstered IBM’s supercomputing products over time, this new collaboration represents a shift for the company. IBM has in fact established an entirely new HPC roadmap, built around the concept of data-centric computing. With these systems, performance, data movement, memory, and overall footprint are balanced against the needs of the new generations of highly scalable codes under development now with assistance from NVIDIA and IBM.

Follow up with us during your SC travels over the weekend and on Monday for more detail about the architectural features we’ve been able to tease out of a few conversations with IBM, NVIDIA, Mellanox and others.

DOE HPC: A Change Is Coming
http://www.hpcwire.com/2014/11/10/doe-hpc-change-coming/
Mon, 10 Nov 2014

On June 18-19, representatives from six DOE HPC centers met in Oakland, Calif., for the DOE High Performance Computing Operational Review (HPCOR) to discuss the best way to support large-scale data-driven scientific discovery at the DOE national laboratories. Attendees were asked for their feedback on current and future requirements as well as challenges, opportunities and best practices relating to eight breakout topics. Their findings are now available in the form of a 56-page report.

The introduction to the HPCOR report begins with the assertion that “High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams.”

The reason, in a nutshell: the rise of big data.

The report continues:

“Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. The value and cost of data relative to computation is growing and, with it, a recognition that concerns such as reproducibility, provenance, curation, unique referencing, and future availability are going to become the rule rather than the exception in scientific communities.

“Addressing these concerns will impact every facet of facility operations and management. The optimal balance of hardware architectures may change. Greater emphasis will be given to designing software to optimize data movement relative to computational efficiency. Policies about what data is kept, how long it is kept, and how it is accessed will need to adapt. Data access for widespread scientific collaborations will become more important. Processes and policies that ensure proper and secure release of information will need to evolve to both maintain data protection requirements and meet future data sharing demands.”

The primary message of this review is that DOE HPC centers need to change the way they have traditionally operated. There were calls for greater collaboration, tighter integration, the need for standard metrics and benchmarks, as well as toolsets and best practices. By identifying common needs in the areas of training, data management and analysis, among others, the centers can coordinate and collaborate on solutions, the report suggested.

The June meetings were organized into eight breakout sessions focused on the following topics: system configuration; visualization/in situ analysis; data management policies; supporting data-producing facilities and instruments; infrastructure; user training; workflows; and data transfer. Here are just a few of the many relevant points that were raised:

On system configuration for data analytics:

Today, operationally, we think of HPC centers in terms of peak Flop/s. With the shift toward a data-intensive workload, the typical breakdown of compute versus I/O and storage will likely be different. Determining the appropriate ratio common to all centers is likely not useful because different facilities have different compute and analysis needs. However, the order in which system hardware is chosen may change to:

1. Determine the memory/core needed for workloads.
2. Determine the amount of SSD or persistent storage needed.
3. Determine the parallel file system and network speeds needed for data-intensive computing.
4. Allocate the remainder of the budget to Flop/s (CPUs, accelerators, many-core chips).

On data management:

The DOE facilities are taking an active role in helping to identify and shape policies and guidance to enable a data management infrastructure. Ultimately, data will be on an equal footing with computational simulations.

On infrastructure (specifically referring to the public cloud):

Some sites have deployed private cloud architectures effectively. However, public cloud offerings are not tailored toward “largest-scale” data-intensive and data analytics processing. Their use creates availability, reliability, performance, and security concerns for the national laboratory complex.

Preparing for Manycore
http://www.hpcwire.com/2014/10/13/preparing-manycore/

One of several insightful presentations to come out of the DOE Computational Science Graduate Fellowship program was delivered by Katie Antypas, Services Department Head, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory.

In “Preparing Your Application for Advanced Manycore Architectures,” Antypas gives a humorous and on-point overview of major architectural trends in HPC and talks about why they are happening and what users can do to start preparing their codes for the manycore era.

While the terms multicore and manycore do not have precise definitions, Antypas suggests that there is about a half order of magnitude difference in going from multicore to manycore. Manycore basically refers to an architecture with lots of lightweight cores (hint: lightweight means slow). “With manycore architecture, the number of cores is more important than the speed of any given core, and in the multicore era, you are still thinking about single cores, and single thread performance,” she says.

Antypas continues by addressing the factors that are contributing to the change of paradigms. The community is coming up against a number of walls, the most prominent of which are the memory wall, the power wall and the parallelism wall. For users, this means their jobs are harder because the onus is on them to change application codes to achieve high performance. Antypas goes over several sources of parallelism, including domain parallelism, thread parallelism, data parallelism, and instruction-level parallelism, and cites reasons to use each of these. Also discussed is the renewed focus on vectorization.
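
As a rough illustration of two of those parallelism sources (our sketch, not a slide from the talk), the loop below uses OpenMP to express thread parallelism across rows and data parallelism, via compiler vectorization, within each row; it assumes an OpenMP-enabled compiler, e.g. built with -fopenmp:

    #include <vector>

    // y[r][c] += a * x[r][c], over flattened rows x cols arrays.
    void scaled_add(std::vector<float>& y, const std::vector<float>& x,
                    float a, int rows, int cols) {
        // Thread parallelism: distribute rows across cores.
        #pragma omp parallel for
        for (int r = 0; r < rows; ++r) {
            // Data parallelism: ask the compiler to vectorize the inner loop.
            #pragma omp simd
            for (int c = 0; c < cols; ++c) {
                y[r * cols + c] += a * x[r * cols + c];
            }
        }
    }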

“Regardless of processor architecture, users will need to modify applications to achieve performance,” maintains Antypas.

The presentation also includes an overview of the upcoming NERSC system, Cori, which will employ the newest Intel Xeon Phi “Knights Landing” processors. Delivery is scheduled for mid-2016.

Antypas points out that knowing the peak performance (3 teraflops) is not a very helpful statistic, but the fact that Knights Landing is a self-hosted processor, not an accelerator, will be very good for users, who “already have a lot to worry about with finding more threading and parallelism and won’t have to worry about offloading to a coprocessor.”

Her one-hour presentation and slides are available online.

Report Pins HPC Progress to Software Scalability
http://www.hpcwire.com/2014/10/02/report-pins-hpc-progress-software-scalability/
Thu, 02 Oct 2014

A new report from the Council on Competitiveness (Council) explores how US government investment in HPC benefits America’s industrial and economic competitiveness. The Council on Competitiveness, with funding from the US Department of Energy, engaged Intersect360 Research to interview HPC-using US organizations across a wide range of industries.

Over a six month period, Intersect360 conducted 14 in-depth interviews and collected 101 comprehensive online surveys. The findings were published and released today in a 75-page report, entitled “Solve. The Exascale Effect: the Benefits of Supercomputing Investment for U.S. Industry.”

The Council, which is known for popularizing the sentiment “to outcompute is to outcompete,” views HPC as instrumental to the progress of the United States in science, engineering and business. The report reiterates the value that HPC brings to the collective table and identifies key areas of targeted investment that would provide the greatest benefits.

Key findings:

Two-thirds of US companies that use HPC say that “increasing performance of computational models is a matter of competitive survival.”

More than one-third of US industry representatives surveyed claim their most demanding high performance computing applications could utilize 1,000-fold increases in computing capability over the next five years.

One of the comments speaks directly to the difficulty humans have conceptualizing big numbers. “I tend to think in terms of 100x,” wrote one respondent. “However, if I view today’s compute power to what we used for modeling and simulation just a couple of decades ago, I have to believe that 1,000x will be needed. I just can’t get my head around this yet.”

Software scalability was cited as the most significant limiting factor to achieve the next 10-fold improvement in performance, and it was given as the second most significant limiting factor to reach a 1,000-fold improvement. Cost of hardware was the number one impediment to 1,000-fold improvement, and making the case to management also ranked as a significant limitation.

There is strong agreement that government investment in leading-edge HPC benefits US companies and industries, but often in non-quantifiable ways. Respondents also note that the links between government and industry need to be strengthened.

Several of the comments and responses from the long-form interviews centered on the idea of trickle-down, or the ripple effect, whereby once-elite technologies find their way into commodity systems. The biggest promise of exascale computing, viewed this way, is affordable petascale computing.

In the words of one commenter, “The fastest benefit that exascale is going to have is that the petaflop will become extremely inexpensive, so that every midsize company will be able to have a petaflop supercomputer. But industry doesn’t adopt the leading edge of HPC; they’re usually several ticks behind.”

One of the strongest refrains in the report was the pressing need for application software that will exploit hardware scalability. Recognizing that different users have different needs, the report suggests several program models for government consideration. These are targeted programs for different segments of the commercial space, broken down into in-house software; ISVs; open source; and entry-level HPC. With regard to entry-level adopters, the report contends “the ability to integrate HPC into the workflow is a bigger challenge than scalability.”

A New Era in HPC
http://www.hpcwire.com/2014/09/17/new-era-hpc/
Wed, 17 Sep 2014

A recent presentation that came out of the 2014 Computational Science Graduate Fellowship (CSGF) HPC workshop, held in July in Arlington, Virginia, holds that HPC is on the cusp of a new era.

In “Supercomputing 101: A History of Platform Evolution and Future Trends,” Rob Neely, Associate Division Leader, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, traces the history of high performance computing as defined by the dominant platforms, starting with mainframes and continuing through vector architectures, massively parallel architectures, and the emerging trends that will define the upcoming exascale era.

With the first slide, Neely identifies three major eras of computing: the mainframe era, the vector era, and the distributed memory (MPP) era. But there is a fourth, emerging era that has so far proved difficult to define.

“There is no one defining feature of this new era like there has been in the past,” says Neely. “This manycore era, for lack of a better term, is where we are now. It’s characterized by accelerators, lots and lots of simple cores. It’s really all about extracting parallelism from your applications.

“If you notice this curve actually bent upward a bit, it’s all due to our ability to scale out machines, to add more and more processors to these architectures instead of just relying on single processor performance increases (Moore’s law). The last twenty years we’ve gotten spoiled a little bit by the acceleration in capabilities in high-performance computing. Unless we do something very smart very soon, we are going to lose that acceleration.”

Keeping this progress going is crucial for scientific discovery, and it is an integral part of the DOE’s mission. In the talk, Neely reduces this to a simple equation.

The rest of the one-hour talk charts the history of supercomputing up through the current “manycore” paradigm. Highlights include the difficulty of parallelism, the need for scalability, the upcoming merger of HPC and data science (see minute 56:00), and the “exascale problem.” Because of the level of complexity involved in designing the next generation of hardware and software, support is building for codesign initiatives, which facilitate deep collaboration between application developers and vendors. DesignForward and FastForward are two such DOE programs.