HPCwire » Dell
http://www.hpcwire.com
Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

Dell Aims PowerEdge C-Series Platform for HPC and Beyond
June 30, 2015 | http://www.hpcwire.com/2015/06/30/dell-aims-poweredge-c-series-platform-for-hpc-and-beyond/

Dell has positioned its latest PowerEdge C-series platform to meet the needs of both traditional HPC and the hyperscale market. The recently hatched PowerEdge C6320 is outfitted with the latest generation Intel Xeon E5-2600 v3 processors, providing up to 18 cores per socket (144 cores per 2U chassis), up to 512GB of DDR4 memory and up to 72TB of flexible local storage.

HPCwire spoke with Brian Payne, executive director of Dell Server Solutions, to explore how the new PowerEdge C6320 fits in with Dell’s broader portfolio and approach to the widening HPC space.

With two Intel Xeon E5-2699 v3 processors, the new server offers a 2x performance improvement on the Linpack benchmark, delivering 999 gigaflops compared with 498 gigaflops from the previous-generation PowerEdge C6220 (outfitted with Xeon E5-2697 CPUs). The C6320 also achieved a 45 percent improvement on the SPECint_rate benchmark and up to 28 percent better power efficiency on the SPECpower benchmark.
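
As a quick check, the 2x claim follows directly from the quoted Linpack numbers:

```latex
\text{speedup} = \frac{999\ \text{GFLOPS (C6320)}}{498\ \text{GFLOPS (C6220)}} \approx 2.0\times
```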

The PowerEdge C6320 employs a “4in2U” design, meaning it has four independent server nodes in a 2U chassis, which offers a density that exceeds that of traditional rack servers, and is twice as dense as a 1U server, according to the company. “It also provides an interesting and unique balance of memory and storage and connectivity options,” Payne noted.

In the HPC sphere, Dell sees the C6320 as addressing pain points like scarce datacenter space, delivering double the density of a traditional rack server and allowing customers to scale more compute nodes per rack. Many of the datacenters in the HPC space have been engineered to take advantage of that density, meaning that they have the requisite power and cooling infrastructure in place, said Payne.

Beyond addressing the density, Dell recognizes that the HPC space is changing to become more heterogeneous, and there is burgeoning demand for acceleration technology coming from an ever-widening user group that includes technical computing, scientific research, financial services, oil and gas exploration, and medical imaging.

Customers with problems from these and other domains that lend themselves to being solved more efficiently by GPGPUs and Xeon Phi have the option to pair the PowerEdge C6320 with the accelerator-optimized PowerEdge C4130. Introduced back in Q4 of 2014, the PowerEdge C4130 is a 1U, 2-socket server capable of supporting up to four full-powered GPUs or Xeon Phis.

Dell says its PowerEdge C4130 offers 33 percent better GPU/accelerator density than its closest competitors and 400 percent more PCIe GPU/accelerators per processor per rack than a comparable HP system. A single 1U server delivers 7.2 teraflops and has a performance/watt ratio of up to 4.17 gigaflops per watt.
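
Taken at face value, those two peak figures imply a back-of-envelope power draw for such a node; this is an illustrative estimate only, since the best-case flops and flops-per-watt numbers likely come from different configurations:

```latex
P \approx \frac{7{,}200\ \text{GFLOPS}}{4.17\ \text{GFLOPS/W}} \approx 1{,}730\ \text{W}
```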

Dell works closely with the major coprocessor suppliers to align roadmaps and ensure that future developments can be deployed in a timely manner. Currently, the C4130 supports NVIDIA’s Tesla K40 and K80 parts; Intel’s Xeon Phi 7120P, 5110P and 3120P SKUs; and AMD’s FirePro line, including the S9150 and S9100 graphics cards.

Advanced seismic data processing is one of the segments benefiting from accelerator technology. Dell has already scored a win in this market by delivering a combination of the 4in2U form factor and the C4130 server to a customer in the undersea oil and gas space. The unnamed business was able to double compute capacity with 50 percent fewer servers, supporting new proprietary analytics, according to Dell.

Dell’s marquee customer in the academic space is the University of California San Diego, which relied on the new PowerEdge C-series for its Comet cluster. The new petascale supercomputer has been described as “supercomputing for the 99 percent” because it will serve the large number of researchers who don’t have the resources to build their own cluster. Deployed by the San Diego Supercomputer Center (SDSC), Comet leverages 27 racks of PowerEdge C6320, totaling 1,944 nodes or 46,656 cores, a five-fold increase in compute capacity compared with SDSC’s previous system.
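
The quoted figures are internally consistent with the 4in2U packaging described above, as a quick check shows (24 cores per node corresponds to two 12-core parts):

```latex
\frac{46{,}656\ \text{cores}}{1{,}944\ \text{nodes}} = 24\ \text{cores/node},\qquad
\frac{1{,}944\ \text{nodes}}{4\ \text{nodes/chassis}} \times 2\text{U} = 972\text{U} \;\Rightarrow\; \frac{972\text{U}}{27\ \text{racks}} = 36\text{U/rack}
```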

Payne noted that SDSC was able to get this cluster powered up and starting to run test workloads in under two weeks, months ahead of Dell’s general availability, which begins next month. Payne pointed to the packaging of the platform as a key enabler. “Instead of racking up four discrete rack servers, having those in a single chassis simplifies that process and can help with the speed of deployment,” he said.

“Our goal is to democratize technology and help the [HPC] industry move forward to drive innovations and [discovery],” he stated. “The way we can do that is by driving standardization and by bringing down the marginal cost of compute – to increase their productivity and also engage with them to understand the nuances and challenges that they have and adapt to those. In the case of San Diego Supercomputing Center, they had a timeline that didn’t necessarily line up with our product general release timeline and we found a way to adapt and respond to their timing needs to fulfill the demand for this latest platform.”

Payne added that Dell is opening up market opportunities beyond high-performance computing. The PowerEdge C6320 along with its embedded management software will be used as a host platform for hyper-converged systems such as Dell Engineered Solutions for VMware EVO: RAIL and Dell’s XC Series of Web-scale Converged Appliances.

By targeting the hyper-converged market, Dell was able to design a new capability into this product class: a management tool called iDRAC8 with Lifecycle Controller, which allows customers to rapidly deploy, monitor and update their infrastructure layer. Larger high-performance cluster users may have the means to build their own tools and capabilities; for everyone else, Dell is making this technology, previously available only in the mainstream PowerEdge lineup, part of its PowerEdge C-Series line. For those that don’t need this capability, Dell can still deliver the baseline capabilities without the added cost or complexity burden.

“We are seeing more applications of high-performance computing in mainstream industry, outside the domain of traditional national labs, traditional universities,” said Payne, addressing the symbiosis that is occurring at the interplay of HPC, enterprise and big data. “Going into R&D departments, in oil and gas and other segments that are building out big systems, you see some big data problems being treated very similarly to the way high-performance computing problems are solved.

“You have to think about the skill set and the staff in the IT department that is responsible for deploying and administering this infrastructure, and many times that staff is hosting and supporting a diverse set of workloads for the company – from email to database and now high-performance computing as well as some Web technologies. These folks were trained and accustomed to using server OEM tools to manage the infrastructure, and they rely on those rather than building their own. Now we have extended and given them something they are familiar with that makes it easier for them to take on a high-performance computing project.”

The new server starts at roughly $16,600 and includes the chassis and four C6320 nodes (each with 2x Xeon E5-2603 v3 CPUs, 2x 8GB DDR4 memory, 1x 2.5-inch 7,200rpm 250GB SATA drive, iDRAC8 Express, and a 3-year warranty). More details, including networking options, are available on Dell’s product page.

First HPC Service at University of Sydney Tackles Big Science Problems
June 15, 2015 | http://www.hpcwire.com/2015/06/15/first-hpc-service-at-university-of-sydney-tackles-big-science-problems/

A Dell cluster called Artemis is on track to solve problems important to Australia and the world. Commissioned by the University of Sydney, the new cluster is the university’s first high-performance computing (HPC) service.

Like research institutions everywhere, the University of Sydney is seeking to accommodate increasingly data-centric workflows. This new HPC service, established via a partnership with Dell Australia, was custom designed to facilitate access to big data for research across a range of disciplines.

“Artemis will enable researchers from diverse fields to perform state-of-the-art computational analysis and improve collaboration between research groups by providing a common set of tools and capabilities with consistent access mechanisms,” said NHMRC Australia Fellow, Professor Edward Holmes from the Charles Perkins Centre.

Artemis will be available at no cost to University of Sydney researchers and will support a number of scientific domains, including molecular biology, economics, mechanical engineering and physical oceanography.

One of the primary use cases for the 1,512-core Dell cluster is unlocking the secrets of infectious diseases, like the Ebola virus. Professor Holmes is part of a team that is sampling DNA data to track the spread of Ebola in West Africa. High-performance computers can boost processing and analysis times by an order of magnitude, which according to Dr. Holmes, enables real-time epidemic tracking, where previously Ebola researchers were limited to retrospective studies.

This real-time capability facilitates targeted emergency response strategies, integral to getting resources where they are most needed to help the affected population and ultimately block further transmission.

Artemis is a fully managed service that is housed inside a Dell datacenter and connected to the university network over 10 Gigabit Ethernet. The system comprises 56 standard compute nodes, two high-memory compute nodes offering 10 terabytes of fast DDR4 memory, and five GPU compute nodes, each with two NVIDIA Tesla K40 graphics processors. Each node contains two Intel Xeon E5-2680 v3 12-core processors for a total of 1,512 Haswell cores in all. The standard and high-memory nodes are based on Dell PowerEdge R630 servers while the GPU nodes are based on Dell PowerEdge R730 boxes. A 56Gb/s FDR non-blocking InfiniBand fabric connects all nodes and the high-performance Lustre file system.
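
The core count checks out against the node inventory:

```latex
(56 + 2 + 5)\ \text{nodes} \times 2\ \text{sockets} \times 12\ \text{cores} = 63 \times 24 = 1{,}512\ \text{cores}
```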

DDN, IBM Lead Large HPC Storage Supplier Pack
June 15, 2015 | http://www.hpcwire.com/2015/06/15/ddn-ibm-lead-large-hpc-storage-supplier-pack/

No surprise, storage requirements in HPC keep rising and solid state drives (SSDs) continue making inroads at the node level, according to Intersect360 Research’s annual HPC User Site Census: Storage survey released last month. Also notable, DataDirect Networks and IBM remain neck-and-neck atop a long list of suppliers in what is still a fragmented market.

“We are seeing great adoption of SSDs, which is reversing the trend toward diskless nodes in clusters, where now we are seeing SSDs used as those local disks. SSDs were reported in 21% of systems installed since the beginning of 2013,” said Addison Snell, CEO, Intersect360 Research. Very few of the systems with SSDs reported 100% usage, which suggests that most SSDs are being used as an additional tier between memory and traditional hard disks in order to improve storage-to-memory latencies.
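
The tiering idea Snell describes is simple enough to sketch. Below is a toy read-through cache in Python, purely illustrative and not from the report; the mount points and the copy-on-first-read policy are assumptions standing in for what real burst-buffer or caching software does:

```python
import os
import shutil

SSD_CACHE = "/mnt/ssd/cache"   # hypothetical node-local SSD mount
BULK_TIER = "/mnt/hdd/data"    # hypothetical slower bulk-storage mount

def read_block(name: str) -> bytes:
    """Serve a data block, promoting it to the SSD tier on first access."""
    cached = os.path.join(SSD_CACHE, name)
    if not os.path.exists(cached):
        # Cache miss: pull the block up from the slow tier once...
        os.makedirs(SSD_CACHE, exist_ok=True)
        shutil.copy(os.path.join(BULK_TIER, name), cached)
    # ...then subsequent reads happen at SSD (not HDD) latency.
    with open(cached, "rb") as f:
        return f.read()
```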

Snell noted two other prominent trends: Big Data represents a large and growing opportunity for HPC storage providers to apply their solutions to non-traditional HPC buyers; Complexity still exists in the storage stack and consolidation is necessary to improve productivity and ease of storage management. The latter should open up opportunities for integrated storage solutions.

Among the report’s key findings:

- Storage capacity continues its onward and upward trend. Total storage capacity at over 400 sites exceeded three exabytes, triple the total reported in 2012. While petabyte-scale installations are increasing, large storage systems account for only 9% of all storage systems reported; systems of 50 TB to 100 TB make up the largest segment, with a 30% share.

- Parallel file systems account for 44% of the 342 named storage management packages, about the same as last year. Most parallel file system usage was found in storage systems with capacities of 200 TB or more and in systems last modified in 2013 or later. Most storage management software in use at the surveyed sites (59%) was provided by the storage system vendor. GPFS and Lustre continue to be the most frequently mentioned packages, at 16% and 15% of systems, respectively.

- Ethernet, and in particular 10 Gigabit Ethernet, was the network protocol for almost 60% of storage systems. InfiniBand, however, captured a 47% share of storage systems installed in 2013 or later, suggesting it is competing successfully against the higher-speed Ethernet variants (10G, 40G, and 100G).

“We’ve seen InfiniBand continuing to grow as a storage interconnect, which is important for the ongoing future of InfiniBand. The pattern is similar to what happened with system interconnects, with InfiniBand replacing interconnects other than Ethernet. Ethernet stays pretty steady, and the market is moving toward the two interconnects covering the bulk of the market,” noted Snell.

Despite increasing storage capacity requirements and the flurry of acquisitions in recent years, no dominant vendor has surfaced.

The report notes, “Historically, the major enterprise storage vendors — EMC, NetApp, and Hitachi Data Systems (HDS) — have all maintained healthy shares of the HPC market based on volume and their overall presence in the market. With each company acquiring high-performance storage technologies (Isilon by EMC, LSI Engenio by NetApp, BlueArc by HDS), Intersect360 Research had expected these players to make gains in high-performance storage markets. This has not occurred, and recently, all three companies appear to be de-emphasizing segments oriented toward scalability and performance. This has created a market opportunity for others to gain share, particularly DataDirect Networks (DDN) and Seagate (which acquired Xyratex), though IBM and Panasas may see some benefit as well.” Released in May, the Intersect360 Research HPC Site Census Storage report is available at www.intersect360.com.

Tulane Accelerates Discovery with Hybrid Supercomputer
December 16, 2014 | http://www.hpcwire.com/2014/12/16/tulane-accelerates-discovery-hybrid-supercomputer/

The rich culture and distinctive charm of the city of New Orleans served as the backdrop for this year’s annual Supercomputing Conference (SC14). If you haven’t been before, residents of the Big Easy will urge you to visit the uptown campus of Tulane University. Renowned for its beautiful trees and landscaping, the university is also a prominent research facility with TOP500-level computing prowess.

Just in time for the fall semester, Tulane welcomed a new supercomputer named Cypress. Manufactured by Dell, the system illustrates just how far the city and Tulane have come since Hurricane Katrina ravaged the area in late August 2005.

As Michael Bernstein, Provost and Senior Vice President for Academic Affairs at Tulane University, shares in a video made in partnership with the computer vendor, “Katrina completely reoriented the relationship between the university and the city.”

“It is difficult to describe how close to death Tulane University was and the city of New Orleans was,” adds Laura S. Levy, PhD, Vice President of Research, Tulane University. “Thanks to the vision of our leadership and the hard work of lots of people, Tulane University led the city and rose out of the ashes.”

Charlie McMahon, Vice President of IT and Chief Technology Officer, goes on to describe the project in more detail.

“If we take a look at the state of high-performance computing at Tulane, when Katrina came through, it was essentially nonexistent,” he says. “As Tulane was looking at how to build an architecture, Dell became clearly the leading partner for us to figure this out.

“Dell leveraged their relationship with Intel, who in turn leveraged their relationship with Cloudera. For one of the first times ever in production, we are going to have an environment that takes a high-performance file system, Lustre, layers Hadoop on top of that and allows us to do big data analytics using Hadoop in an HPC environment. That’s what makes the approach Tulane is taking so unique.”
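
McMahon doesn’t spell out the mechanics, and production deployments typically rely on a dedicated Hadoop-on-Lustre adapter; but because Lustre presents a POSIX file system, the basic idea can be sketched by pointing Hadoop’s default file system at the Lustre mount instead of HDFS. A hedged illustration, where the mount point, jar name and paths are assumptions:

```python
import subprocess

LUSTRE_FS = "file:///mnt/lustre"  # hypothetical Lustre mount, exposed via file:// URIs

# Run a stock MapReduce example with Lustre standing in for HDFS by
# overriding fs.defaultFS; real deployments would use a Lustre adapter.
subprocess.run([
    "hadoop", "jar", "hadoop-mapreduce-examples.jar", "wordcount",
    "-D", f"fs.defaultFS={LUSTRE_FS}",
    "/mnt/lustre/wordcount/input",
    "/mnt/lustre/wordcount/output",
], check=True)
```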

Cypress will be used for a wide range of workloads including sea-level calculations, traumatic brain injury studies, and other data-heavy tasks, such as molecular docking in support of drug discovery. The high-throughput computing environment enables research that was not possible before, says one user, with projects that would have taken months or years reduced to a matter of days.

“This system allows users to move seamlessly between big data analytics and traditional high-performance computing capabilities, enabling research,” McMahon said in a release put out by the school. “We hope to demonstrate to the university that by using this supercomputing capability, our researchers are able to tackle bigger and more complex problems, to publish more papers and win more research grants.”

McMahon adds that Tulane has been contracted by the NFL Players Association to carry out long-term tracking of professional football players for whom traumatic brain injury is an occupational hazard.

Gabriel Feldman, J.D., director of the sports law program at Tulane University and co-director of the Tulane Center for Sport, says Cypress will help Tulane researchers determine the risks of playing sports – not only football, but all collision or contact sports. The effort seeks to facilitate a safer experience for everyone involved by identifying ways to treat and prevent such injuries.

TACC’s New Director Shares Strategy, System Futures
July 3, 2014 | http://www.hpcwire.com/2014/07/03/taccs-new-director-shares-strategy-system-futures/

Each of the national labs and supercomputing sites has defining characteristics or “personalities” that are most often driven by the user communities that exploit their computational resources. Certain centers are affiliated with particular missions or needs. Some tend to prefer architectures that maximize overall performance and size in a way that tops the Top 500 charts; others are conceived around specific application needs in energy, astrophysics, life sciences or other areas—and still others are known for making bold, diverse architectural decisions because their user bases are so varied.

Among those centers that fall into the last category is certainly the Texas Advanced Computing Center (TACC), which, over the last several years, has become a site to watch because of its constant string of innovative choices. With a very broad user base coming in from NSF-funded projects, the team has had to balance the desire for high performance, availability, efficiency and accessibility with its goal of exploring potentially disruptive supercomputing technologies.

For instance, the Stampede system was the first supercomputer to successfully blend GPUs and Xeon Phis to create a hybrid that allows users to test, optimize, and run their applications across different architectures for variable performance gains. Other machines, including most recently Wrangler, are dedicated to exploring the emerging class of data-intensive or “big data” problems in science that traditional supercomputers aren’t designed to tackle. This has meant the team behind machines like these has had to think beyond the normal course of system design and explore new technologies.

Leading the charge for all of these missions—and several of TACC’s most interesting systems (Stampede and Wrangler in particular) over the last several years—is Dan Stanzione. He was formally named Executive Director of TACC this week, following his long stint as deputy director, which began in 2009. Before that, he was actively involved in architectural and system design choices at other centers, including his role as founding director of the Fulton High Performance Computing Initiative at Arizona State University.

Stanzione has led some bold choices at TACC and is making system architecture diversity a central theme of his tenure going forward. Instead of just focusing on large-scale HPC and simulation resources (represented by Stampede), he is pushing big data and cloud computing as other initiatives. “We need an ecosystem of different kinds of systems to support the growing diversity of scientific computing workloads,” he told us this week. He said the team will be leveraging lessons learned on the scalable manycore architectural front from Stampede alongside those on the data-intensive side as represented by the Wrangler machine. “Our future systems will fuse these two system types—blending scalable manycore techniques with superfast IO.” Further, leveraging all of this using cloud models via their OpenStack-based Rodeo cluster for front-end applications when appropriate will add further possibilities for their many users.

To level set on the current supercomputing stable at TACC, Stanzione described the future of the #7-ranked Dell-built Stampede supercomputer, which is set for an upgrade cycle within the next year and a half with some variant of the upcoming Knights Landing architecture at the core. Since we weren’t able to determine details about the architectural selection (and since Intel hasn’t officially released dates and full specs for the coming self-hosted parts), we’ll have to wait until more is clear this summer. The team is also looking into possibilities for its next big system, which he says we’ll learn more about sometime this year.

Other important machines at TACC, including the interactive visualization cluster Maverick and Lonestar, which is now being used as a throughput-oriented system, are seeing solid utilization, but as it stands, the big clusters are saturated. “In this quarter alone the demand for Stampede was 6x what we had available,” Stanzione said. They will push some of those workloads over to Wrangler when it goes live at the beginning of 2015, but that system is dedicated to exploring non-traditional HPC problems, given its unique architecture and purpose-built design for handling large-scale graph algorithms and massive analytics across large datasets.

The key to these capabilities is one of the more interesting stories out there—even if details are still somewhat light. At the heart of Wrangler is technology provided by DSSD, the Andy Bechtolsheim startup that had a long life in stealth and then immediately went into the paws of EMC. We talked technical detail about the system with co-PI Chris Jordan back in 2013 if you’d like to review, but needless to say, seeing what’s possible with 100,000 flash chips stacked into an array and attached directly via PCIe will be interesting to watch. Stanzione was giddy about this machine in particular, noting that more details will emerge in the next couple of months.

The important thing, he says, is that having a dedicated data-intensive machine will let TACC handle a new class of applications that have never fit into their HPC portfolio to date. This includes a broad array of life sciences, economics, humanities and other projects that require analytics operations that aren’t a good fit for traditional supercomputers.

“It’s a great time to be alive in HPC,” Stanzione said. And it seems especially good times at TACC where ground will be broken on new facilities to host coming systems, workspaces for the systems groups, a visualization area, and much-needed office space to accommodate the additional 80 people the center is bringing into the fold.

“I want to continue with the successes we’ve had and push with our NSF, open science and HPC projects. I look forward to diversifying these to help more users across different disciplines and missions. We want to be a leader in HPC but also in data and cloud, pushing manycore and other new technologies.”

What Drives Investment in the Middle of HPC?
May 15, 2014 | http://www.hpcwire.com/2014/05/15/drives-investment-middle-hpc/

When it comes to covering supercomputers, the most attention falls on the front runners on the Top 500. However, a closer look at the tail-end of the rankings reveals some rather interesting use cases—not to mention courses of development, system design, and user-driven requirements for future build out.

The University of Florida is home to one expanding system, which rests just at the cutoff of the top supercomputing rankings at #493. The university’s Director of Research Computing, Dr. Erik Deumens, tells us the real purpose of the system is to support as many diverse applications as possible with as few queue barriers as possible. While this is a familiar claim no matter the size of the site, the team has gone to great lengths to ensure that ongoing development of their flagship system, called HiPerGator, is fed solely by user demand.

It might not be surprising, then, at least to those in research computing, that demand for the latest generation of processors with a 10 or 20 percent performance jump is far less critical than simply being able to onboard an application without a long queue and run it in a reasonable amount of time. But meeting that need requires some serious thought about capacity, scheduling, and diverse application requirements. In other words, for those tuning in for the ultra-high-performance computing story, this isn’t the most exciting tale, but there are important lessons to be learned from his team’s experience working with a broad range of applications and over 600 users to find out what really makes a fully functional system—all based on what amounts to an “economic” decision-making process for their HPC investments.

In essence, the economics of demand determine the spending decisions at the University of Florida and several other similar centers. This isn’t so different from the large scientific computing sites in theory, except that user requests trump all—including power or other considerations. “If the users are asking for the latest novel technology but it’s not the most efficient, we aren’t going to deny them what they need for their research,” says Deumens. In the case of HiPerGator, the university funds the system and staffing so that individual researchers can use their grants to buy a desired number of cores for their jobs. Flexibility is built into the “purchase”: users can exceed what they requested, by as much as 10x when needed, to avoid added complexity in scheduling and managing their jobs. Deumens and team use Moab and Torque to handle the many requests, in addition to offering the capability for more sophisticated users to fine-tune their requests according to the mix of available architectures. The system tends to run under its maximum capacity at all times so that there are no long wait times, delivering the one thing researchers want: timely (if not immediate) access to computational resources that run in the anticipated timeframe. And essentially, says Deumens, everyone is happy.
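
For readers unfamiliar with that stack, a minimal sketch of what a user-side Torque submission looks like follows; the job name, resource counts and script body here are hypothetical, not taken from UF’s configuration:

```python
import subprocess

# Hypothetical Torque/PBS job script; #PBS directives request resources
# that the scheduler (Moab) weighs against the cores a group has purchased.
job_script = """#!/bin/bash
#PBS -N demo_job
#PBS -l nodes=4:ppn=16
#PBS -l walltime=08:00:00
cd $PBS_O_WORKDIR
mpirun ./my_simulation
"""

# qsub reads the job script from stdin and prints the assigned job ID.
result = subprocess.run(["qsub"], input=job_script,
                        capture_output=True, text=True, check=True)
print("submitted as:", result.stdout.strip())
```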

For some background, the HiPerGator system in its original incarnation (announced last year) offered over 16,000 AMD “Abu Dhabi” cores with Dell underpinnings, a 2.88-petabyte Terascala-built Lustre-based storage system and Mellanox InfiniBand throughout. They’ve since added an additional round of cores from pre-existing systems (both Intel and AMD), bringing their HPC core count to over 21,000. There is also a set of nodes providing a total of 80 GPUs, with more planned for the future—along with the possibility of Xeon Phi cores—as they plan for the build-out to be completed by this time next year. “There are always exceptions but most of our users don’t care what processor generation they’re running on. They just want to get their work done.” All the while, his team keeps careful track of what users are looking for in terms of new or existing hardware, and they use this information to tally what they ask vendors for during each year’s hardware and software buying cycles.

To put this in context, when the original HiPerGator emerged, there were a total of 8 GPUs available to researchers, which they bought simply to support the mission of a semester-long class that required them for special projects. However, once researchers at the university knew they were available, they began experimenting with porting codes, including AMBER on the molecular dynamics front. These development activities led the application teams to desire full production runs, which required more GPUs. And so their unexpected influx of GPU nodes occurred organically. This is the exact type of case that will feed how the next generation of their system develops—actual user interest means more “purchases” from researchers, but to keep their one main goal of providing solid resources without the wait times, they’ll make sure to supply ample nodes with whatever the research community seems to desire.

Deumens and team are taking those desires on the road in the next months. They’re currently in the midst of looking for vendors to help them supply the needs of HiPerGator 2, which again, is slated for this time next year. He gave us a sense of what works—and doesn’t—when it comes to supporting research at a university that wants to become a top tier research center based on its HPC capabilities.

First, he says, there have been some successes in their approach to scheduling. It used to be a manual process, but has been eased through their Moab and Torque engines. Further, he highlighted the increasing role of Galaxy, the open source scientific gateway project for creating, tracking and sharing scientific workflows, which has taken off in the biosciences community. He also says that for a research center their size, the more cores they have available, the better. While some of their users can take advantage of the InfiniBand fabric and run MPI or SMP jobs, in the end it’s all about getting up and running.

The other element that has worked for research teams at the University of Florida is having a stable, strong storage system like their Terascala solution, which is capable of handling massive data flows—an increasing problem for all scientific computing sites as data demands scramble to meet the computing capacity that is available.

What’s missing from their system is something that will be difficult for any of the vendors who supply the next iteration of the machine next year. And it’s something we’ve heard from much larger centers. There is a dramatic need to make a “super app” of sorts that turns a researcher’s desktop machine into a direct link to the supercomputing site, handling scheduling, data movement, and output in a seamless, portable interface. While this seems like it might be easy in this era of web-based interfaces for everything, it’s what’s really missing for centers designed around simply serving scientific users—and something that he and his team will continue to look toward in the coming years.

It was interesting to listen to the difference in concerns about power, performance and ease of access from the perspective of a much smaller HPC site than the top-ten system managers we so often talk to. Power is always a concern, of course, but at smaller scale, where exascale is something for the DoE and other government labs internationally to worry about, the problems of real-world daily operations boil down to one simple factor—make a supercomputer easy to use, quick to load into, and predictable in its time to result. A humbling reminder after so many conversations about eking performance out of the hottest processors, largest systems and biggest power footprints on the planet.

How the iForge Cluster is Manufacturing Results for Big Industry
February 26, 2014 | http://www.hpcwire.com/2014/02/26/iforge-cluster-manufacturing-results-big-industry/

When one thinks of major manufacturing companies, including Boeing, Procter & Gamble, John Deere, Caterpillar, Dow, GE and others, from a systems and software perspective there is little doubt that the competitive edge lies in high performance computing. But for many of these companies, it’s not simply a matter of plugging engineering codes into high core-count, accelerated supercomputers to magically realize better results.

Finite element analysis, computational fluid dynamics and homegrown codes at the largest manufacturing companies have their own unique system needs—but tend to run inside daily workflows where experimentation with new architectures and approaches is pushed down the chain due to competing demands from across the organization. According to Merle Giles, who leads the private sector program and economic development initiatives at the National Center for Supercomputing Applications (NCSA), most of the common engineering applications tend to hum nicely at around 1,000 cores. They don’t tend to require acceleration but do need major memory to handle decomposition and other critical elements.

So while Giles and his team at NCSA have access to Blue Waters, core counts and scientific application performance are secondary. For the users he’s targeting–those with commercial and mission-critical home-cooked engineering codes–such a massive resource might not have the specific appeal of another far smaller (but far more targeted) option: A pared down, but finely tuned cluster specifically built to address the experimental needs of the “power users” at leading manufacturing companies. Giles’ team has such a hardware resource…and they’re also able to collect the varied expertise across both NCSA and the University of Illinois to bring world class support to bear as well.

To put this difference between system needs in context, consider that memory-tied engineering codes on Blue Waters with its 64 GB of RAM on a single node might do reasonably well, but take a much smaller cluster, in this case, the iForge system that Giles and his team operate in the Digital Manufacturing Lab at NCSA, and these codes can sing through decomposition on 256 GB instead.

The iForge cluster has been benchmarked for common CFD and FEA applications against the mighty Blue Waters for confirmation—which has further bolstered Giles and team’s mission to keep pushing the edges of what’s useful for the manufacturing companies they’re serving with a private cluster reserved for the experimental “power users” from the big companies listed above. “We want to be complementary to Blue Waters, not redundant,” Giles explained.

If you haven’t heard of iForge, it’s because although it’s been around for the last three years (and churning a profit for Giles to pump directly into the program with more—and more interesting—cores), it’s not part of the more publicized, publicly funded efforts one might expect out of a university or national lab/supercomputer center setting. You also haven’t seen it in any LINPACK or other publicized benchmarking runs. Giles says this is because it’s optimized for these users to test and deploy their mission-critical code using some of the newest hardware. For instance, iForge was one of the early recipients of Sandy Bridge when it was available, and already sports one of the just-released 15-core Intel Xeon E7-4890 v2-based nodes—for now.

The goal is simple: let the power users from the high end of digital manufacturing hop on board to take new architectures for a spin, optimize their codes and evaluate them against their existing infrastructure to better understand upgrade or rip-and-replace ROIs without burdening their own in-house clusters. These users can take valuable lessons about how their code scales and operates and make choices, and in turn, Giles and team can turn over valuable feedback to system vendors. They can also use these cores for production runs, which the team charges for and which keep the center profitable, supporting the endless cycle of refreshes and system expansion. And speaking of the system…

We were able to receive a number of deep details about the evolution of iForge from Evan Burness, program manager for the private sector and economic development program at NCSA. Burness narrated the journey of cutting-edge hardware for us, including details about their Intel (and for a while, Opteron) environment, which was slung together by Dell with DDN storage and QDR InfiniBand. As Burness described:

“We started iForge in Q3 2011 with Intel’s “Westmere” (116 EP nodes, 3 EX nodes) and AMD’s “Magny Cours” (2 nodes) architectures. That system had 1,584 cores total. In 2012 we upgraded to Intel’s “Sandy Bridge” (128 EP nodes) architecture when it was released to market (Q3 2012), and at the same time increased our AMD node count from 2 to 18 and upgraded the processors to “Interlagos” (just like the CPUs on Blue Waters). Through the upgrade, we increased the node count from 121 to 146, and the core count to 2,624.” As a side point, he counts four Opteron processors in a node as 32 cores, rather than the 64 AMD might cite, since the “16-core” Opteron chips based on Interlagos and Abu Dhabi only have 8 floating-point units each, which is what really matters for their work.
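
In those Bulldozer-family chips, two integer cores share one floating-point unit, which is why the accounting halves the core count for FP-heavy HPC work:

```latex
4\ \text{sockets} \times 8\ \text{FPUs/socket} = 32\ \text{FP cores}\quad(\text{vs. } 4 \times 16 = 64\ \text{integer cores})
```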

These constant upgrades were part of the master plan for the project—and will continue to be so since the goal is to continue offering new architectures for manufacturing users to test and explore.

Burness says they intended to upgrade again in mid-2013 (this time through Intel’s early access program), but vendor delays pushed that back to January of 2014. At that time, they upgraded to Intel’s “Ivy Bridge” line (144 EP nodes) and AMD’s “Abu Dhabi” line. The Ivy Bridge nodes feature the 10-core variants (Xeon E5-2680 v2), with 96 of the nodes featuring 64 GB of RAM for CFD workloads and 48 featuring 256 GB of RAM for finite element analysis workloads.

“We also added one (1) Ivy Bridge-EX system directly from Intel, who saw iForge as a particularly good platform at which to throw this new technology given how industry brings real-world development and production problems to this system,” Burness explained. These parts were only released by Intel in December and January. iForge is now one of the precious few systems we can get any details on that features 4x 15-core Xeon E7-4890 v2 CPUs, for a total of 60 cores and 120 threads. Burness and team have also augmented the server with an extra 1.5 terabytes of memory and multiple InfiniBand connections.

In addition, he said, they upgraded their network: PCIe gen 3.0 on the Ivy Bridge nodes increased the usable bandwidth of the QDR InfiniBand fabric from 25.6 Gb/sec to 32.0 Gb/sec, “all while maintaining the lowest possible latency (even lower than FDR).” Burness says that coupled with all of this, they’re also adding 8 instances of Windows and GUI’d Red Hat Linux in order to provide the “desktop computing” experience for users who need to do so much inside their engineering workflows to support batch-processing HPC. “Here, think of the need to run CAD and CAM workloads at one’s computer and then send the files to an HPC cluster. Doing so becomes inherently tougher as the simulated models become more complex and the data sizes grow larger. Having one integrated environment for the entire workflow is a big productivity booster for our industry partners.”
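
Those bandwidth numbers line up with the link arithmetic: QDR InfiniBand signals at 40 Gb/s with 8b/10b encoding, so the fabric itself tops out at 32 Gb/s of data, and the earlier 25.6 Gb/s figure is consistent with a PCIe 2.0 x8 host connection acting as the bottleneck, which PCIe 3.0 removes:

```latex
\text{QDR 4x: } 4\ \text{lanes} \times 10\ \text{Gb/s} \times \tfrac{8}{10} = 32\ \text{Gb/s}
```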

Burness says that, “Throughout all 3 years of operation, iForge has been supported by a GPFS filesystem from IBM running on a DDN SFA-10000 storage system. We pack our storage servers with 192 GB of RAM/server in order to maximize the amount of caching/buffering to RAM, which can really improve performance for I/O intensive applications.” He noted that “A big part of the design focus is on producing a system every year that is as fast or faster than all but a very elite and small number of companies would be able to build and support for themselves (exceptions would be General Electric, BP, etc.).”

Giles put all of this in real-world terms by referencing a use case with one of the large manufacturers when they first received the early Sandy Bridges. He said that at the time, many of their partners weren’t sure how to step up to Sandy Bridge, despite its promise (including AVX capability) for engineering applications. By allowing them to experiment and hit full production runs on the system, Giles says they were able to completely change their workflows, validate the usefulness of the architecture, and push their normal 128-core workflow into 256-core territory. This isn’t something they would have been able to do on their home machines, which are “artificially dumbed down” to support a broader, more policy-based approach to handling daily work.

This work on iForge will all be overseen by UI Labs, a separate nonprofit organization based out of the university, where it can better leverage academic resources as well as those found at NCSA. This ties in with the announcement this week of a $70 million “grant” (which Giles describes as more of a matching-funding arrangement similar to NDEMC) for digital manufacturing. This Department of Defense-led project will drive additional matched funds on the order of around $250 million from a number of manufacturing companies and other institutions.

As Burness concluded, “The government funded model of a system that runs in the same configuration for 3-5 years is not good enough for our power users from industry, as they have an insatiable appetite for speed and performance. In addition, we do a lot in design process to ensure a much higher level of uptime than many other HPC systems. A big part of that is our use of the GPFS filesystem. Though it must be licensed and is not faster than Lustre, it is WAYYY more reliable and easier to administer. It’s a huge part of the reason we’re able to achieve 99% uptime on iForge, which is a reliability level that industry demands.”

Dell Conjures Startup Energy, Outlines Future Direction
December 12, 2013 | http://www.hpcwire.com/2013/12/12/dell-conjures-startup-energy-outlines-future-direction/

“If you can get a group of really talented people together and unite them around a challenge and have them work together to the best of their abilities, then a company will achieve great things,” said Tesla Motors and SpaceX CEO, Elon Musk this morning during the Dell World 2013 kickoff while sitting next to a passionately nodding Michael Dell.

Musk served as the perfect anchor keynote for an event where the emphasis was clearly on trying to conjure the culture and excitement of a startup in the midst of a grand-scale challenge. In Dell’s case, that major hurdle is moving the needle from what shareholders expected to what is possible in a market that requires dramatic differentiation on price, performance and actual innovation.

Bright lights and the energy of Austin aside, there was a sense in the room today of genuine excitement from company leaders—a sense that was confirmed during a few meetings with key leaders in the workstation and government/federal business. Michael Dell roused the troops with an impassioned introduction to the “new” company, which he said is starting to feel like the world’s largest startup. This was, after all, the first event to show what a privately held Dell looks like. Whether or not that’s dangerous for a company that has invested many millions of dollars in building its software portfolio remains to be seen, but Dell did deliver some noteworthy announcements today that stretch across the divides (specifically between HPC and the rest of the IT world) and lend credence to some of its ambitious statements about reinventing itself.

One of the priorities for Dell going forward will be to continue integrating their numerous acquisitions from the last couple of years while still looking outward for companies that bring unique promise to the ecosystem. For instance, Dell spelled out how the investment arm at the company has pushed through $300 million for a new Strategic Innovation Venture Fund, which will give Dell an in to early and growth-stage companies in a wide array of areas, including cloud, big data, storage and datacenter-driven technologies. This is separate from last year’s announcement of their $60 million Fluid Data Storage Fund.

And while it seems like no event is complete these days without talking “big data” to death, the cloud computing phrase was bandied about just as often as the word “disruptive.” Dell pushed its cloud story as the end-to-end solution that many of its customers are seeking, both in its discussions with end users and in its string of announcements around clouds.

The goal is to make cloud computing more robust for enterprise adoption, one Dell is hoping to reach through more strategic technology partnerships. Among the many announcements today was news of a new partnership with Red Hat to help push OpenStack private clouds into enterprise settings. This makes Dell the first company to OEM Red Hat Enterprise Linux OpenStack Platform, which it will deliver via a dedicated division in its cloud services camp.

Further extending their vision of more diverse enterprise workloads moving into the cloud, in both the public and private senses, Dell also announced that they’ll be widening their Cloud Partner network to include Microsoft’s Azure cloud. This means building on their existing portfolio of Application Services for Windows in addition to offering more choice for those who want to decide between cloud scenarios (private, on-demand, etc.). At the core of this and their other cloud news is their own Dell Cloud Manager, which is designed to let users manage applications between private, public and hybrid setups at all stages.

The Cloud Partner network was also extended to capture Google, which was the subject of yet another announcement today around the duo’s mission to make cloud computing easier for enterprise users. Dell plans to make the Google Cloud Platform and its compute, application services and storage offerings available through their partner program.

While HPC, workstations and research computing in general are but a sliver of the “disruptive” vision Michael Dell spelled out today, the company is hoping that freedom to innovate will push it into a new era where it can capture the imagination like a startup while performing like a solid partner for its many customers in healthcare, media and entertainment, government and beyond.

Dell Measures ‘Ivy Bridge’ Server’s Performance Edge
November 12, 2013 | http://www.hpcwire.com/2013/11/12/dell-measures-ivy-bridge-servers-performance-edge/

Since Intel introduced its Xeon Processor E5-2600 v2 product family (code named “Ivy Bridge-EP”) in September, system makers, application specialists and other end users have been interested in how the new parts stack up to previous-generation “Sandy Bridge” processors for a variety of HPC workloads.

In the parlance of Intel’s tick-tock development scheme, Sandy Bridge was the “tock,” a new microarchitecture, while Ivy Bridge was the “tick” – a shrinking of process technology (from 32nm to 22nm) that extracts more efficiency and computing density from the platform. Intel has stated that the new server chips, which pack up to 12 cores and 24 threads, reduce power consumption by as much as 45 percent and deliver up to 50 percent more performance across a variety of workloads.

For those seeking additional information about the actual performance benefits of the “Ivy Bridge” Intel Xeon E5-2600 v2 series processors over the “Sandy Bridge” Intel Xeon E5-2600 series processors, a report from Dell offers some valuable insight.

Published on Dell’s blog page and authored by Dell systems engineers Ranga Balimidi, Ishan Singh and Rafi Ikbal, the report compares the performance of the Intel Xeon E5-2600 v2 processors against the previous E5-2600 series processors across four HPC applications: LINPACK, STREAM, NAS Parallel Benchmarks (NPB), and the Weather Research and Forecasting (WRF) Model.

To test single-node performance, two PowerEdge R620 servers were commandeered, one outfitted with dual 8-core E5-2665 processors and the other with dual 12-core E5-2695 v2 processors. To maintain consistency, processors of the same frequency and wattage were used.

Processor specs:

- Dual Intel Xeon E5-2665, 2.4GHz, 8 cores, 115 watts
- Dual Intel Xeon E5-2695 v2, 2.4GHz, 12 cores, 115 watts

The dual-node tests were performed on two-node clusters connected back-to-back with FDR InfiniBand. To enable the 32-core cluster comparisons, the Intel Xeon E5-2600 v2-based cluster was brought from 48 to 32 cores by turning off four cores per processor for the relevant tests. The same technique was used to reduce the 24 Ivy Bridge cores to 16 cores for the single-node performance testing.
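
The report doesn’t say how the cores were disabled (BIOS settings are the usual route); on a running Linux node the same core-count matching could be approximated through CPU hot-unplug, sketched below. The core IDs are hypothetical placeholders, and the operation requires root:

```python
# Approximate the report's core-count matching by taking logical CPUs
# offline via Linux's hot-unplug interface (run as root). Which IDs map
# to which physical cores is machine-specific; these are placeholders.
CORES_TO_DISABLE = [8, 9, 10, 11, 20, 21, 22, 23]

for cpu in CORES_TO_DISABLE:
    with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
        f.write("0")   # "0" offlines the CPU; write "1" to bring it back
```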

In the final analysis, the “Ivy Bridge”-based cluster delivered a significant performance boost over the otherwise identically outfitted “Sandy Bridge”-based cluster. The report’s authors attributed the improvements to the “increase in number of cores, larger L3 cache and dual memory controller.” Embarrassingly parallel applications like NPB-EP did especially well.

UCSC Ramps Up Astrophysics Work with New HPC Gear
August 7, 2013 | http://www.hpcwire.com/2013/08/07/ucsc_ramps_up_astrophysics_work_with_new_hpc_gear/

A new 60 teraflops supercomputer and 1 petabyte high speed storage system recently installed on the campus of the University of California at Santa Cruz will give astrophysicists at the college the computational and storage headroom they need to model the heavens like never before.

“Hyades,” as the new supercomputer system is known, is composed of 376 Intel Sandy Bridge processors, eight NVIDIA K20 GPUs, three Intel Xeon Phi 5110P accelerators, and 13 terabytes of memory. The Dell computer is 10 times as powerful as “Pleiades,” the system it replaces, yet it occupies the same space in the UC High-Performance AstroComputing Center (UC-HiPACC) and uses the same amount of electricity, according to a Phys.org story.
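
The 60-teraflops figure is plausible for the CPU partition alone if “376 processors” is read as sockets and one assumes common 8-core Sandy Bridge parts at roughly 2.6 GHz with 8 double-precision flops per cycle; these per-chip details are assumptions, not from the article:

```latex
376 \times 8\ \text{cores} \times 2.6\ \text{GHz} \times 8\ \text{FLOPs/cycle} \approx 62.6\ \text{TFLOPS}
```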

The system, which is named after a cluster of stars that makes up the head of the bull in the constellation Taurus, will be used to model space events, such as exploding stars, black holes, magnetic fields, planet formation, the evolution of galaxies, and what occurred after the big bang.

Perhaps more impressive than Hyades is the high-performance, 1,000-terabyte storage system it’s paired with. Based on Huawei’s Universal Distributed Storage (UDS) system, the ARM-based array is similar to a system Huawei installed at CERN to store data from the Large Hadron Collider, and is expected to become one of the largest repositories of astrophysical data outside of national facilities.

The massive storage array is needed because supercomputer simulations generate so much data that it must be analyzed after the fact, Joel Primack, Professor of Physics at UCSC and Director of UC-HiPACC, told Phys.org.

“The Huawei system will be used to store our astrophysics results, not only from Hyades but also from simulations that we run at the big national supercomputing facilities, such as at NASA Ames or Oak Ridge National Laboratory,” Primack said. “Those facilities can only store the results for a limited time, and they also restrict access to them. Now, with the Huawei storage system, we can put our results on a local server.”

Hyades cost $1.5 million, and was funded by a combination of a grant from the National Science Foundation (NSF) and contributions collected on campus. Dell and Intel also chipped in with discounts on hardware. The supercomputer will be used by the Theoretical Astrophysics at Santa Cruz (TASC) group, which includes about 20 faculty and more than 50 postdoctoral researchers and graduate students in four departments, including Applied Math and Statistics, Astronomy and Astrophysics, Earth and Planetary Sciences, and Physics.

The Huawei UDS storage system is on loan to the Center for Research in Storage Systems (CRSS) at the UCSC’s Baskin School of Engineering. The CRSS, which is a joint academic/industry research center supported by the NSF, will be studying the performance of the storage system.