processors – HPCwire
https://www.hpcwire.com
Since 1987 – Covering the Fastest Computers in the World and the People Who Run Them

Cavium Unveils ThunderX2 Plans, Reports ARM Traction is Growing
May 31, 2016
https://www.hpcwire.com/2016/05/31/cavium-unveils-thunderx2-plans-reports-arm-traction-growing/

Cavium yesterday rolled out plans for ThunderX2, the next generation of its ThunderX line of ARM system-on-a-chip (SoC) processors. Given all of the recent noise around Intel’s Broadwell and Knights Landing chips, the announcement is a reminder that the ARM camp continues to make what it believes is steady progress into the x86-dominated landscape. Cavium made the announcement at Computex, being held this week in Taipei.

Besides detailing plans for the new chip – based on the latest ARMv8-A architecture, built on a 14nm FinFET process, and featuring up to 54 cores and significantly expanded memory and I/O capability – Cavium took pains to review ecosystem growth and market traction around the original ThunderX offering. One reason for this is the relative reluctance of OEMs and ODMs to talk publicly about their ARM efforts.

“No one wants to annoy the 800-pound gorilla. Neither do we,” said Gopal Hegde, VP/GM of Cavium’s Data Center Processor Group. He was referring, of course, to Intel, which dominates the server landscape with its x86 lineup.

Nevertheless, the burgeoning capabilities of the ThunderX line and its potential cost savings will pit Cavium – and the rest of the ARM camp – against Intel in many situations. HPC is one of Cavium’s targets, and it says CFD codes, for example, will run better on ThunderX2. While the announcement was ostensibly aimed at the enterprise datacenter and cloud, the HPC ambitions were also clear – the new, beefier core could find a home in many HPC workflows.

Availability of ThunderX2 is still far off, scheduled for Q1 or Q2 2017. That said, the new SoC is a significant upgrade that will provide a 2x-3x performance improvement over ThunderX along with enhanced power management features, according to Cavium. Among the many additions are full support for out-of-order execution (OOO) per socket and a doubling of cache size.

Generally speaking, Cavium chips aren’t intended to compete with Intel’s latest and greatest, emphasized Hegde. Cavium rounded up statements of support from many key constituents in the ARM community to accompany the announcement. Two examples:

ARM Ltd. “The Cavium ThunderX2 will expand the market opportunity for ARM-based server technologies by addressing demanding application and workload requirements for compute, storage, networking and security. ThunderX2 demonstrates Cavium’s ability to deliver a combination of innovation and engineering execution, and the new product family increases the momentum for server deployments powered by ARM processors in large scale data centers and end user environments,” said Simon Segars, CEO, ARM.

Gigabyte (motherboards). “Gigabyte has developed and is already shipping a range of Cavium ThunderX-based server products to customers in the US, Europe and Asia. We are seeing strong demand for these ARM-based platforms – especially from cloud service providers. ThunderX2 represents a leap ahead in terms of overall performance and connectivity,” said Alex Liu, Head of Product Marketing, GIGABYTE.

Other testimonials were provided, notably from AMI, Canonical (Ubuntu), E4, FreeBSD, Linaro, Red Hat, and SUSE. No doubt this is typical marketing practice, but given the number, it has the feel of a deliberate show of strength for ARM in the server space. Full comments from all are available in the release.

“We think the ecosystem is already well developed,” said Hegde. Indeed, most of the needed pieces are in place and with the availability of silicon from a growing number of sources, not just Cavium, it will be interesting to see how the market develops.

Many prominent OEMs, says Hegde, are flying under the radar in terms of talking about ARM but are nevertheless working with customers on ARM projects. He cites, for example, Cray, which has provided an ARM development platform to customers. Steady improvement in 64-bit ARM offerings, he says, is changing attitudes – a point echoed in the Segars statement quoted above.

Cavium is taking advantage of the process shrink (from 28nm to 14nm) to improve power consumption and power management capabilities, which it says will yield a 30 percent power savings compared to the first-generation ThunderX. One new element is support for dynamic voltage and frequency scaling (DVFS), along with increased granularity in other power management controls.

The plan now is to offer ThunderX2 in four distinct flavors:

Compute (ThunderX2_CP): Optimized for cloud compute workloads such as private and public clouds, web serving, web caching, web search, and commercial HPC workloads such as computational fluid dynamics (CFD) and reservoir modeling. This family supports multiple 10/25/40/50/100 GbE network interfaces and PCIe Gen3 interfaces. It also includes accelerators for virtualization and vSwitch offload.

As noted, Cavium tends to avoid head-to-head comparisons with top-of-the-line Intel products, in part because it is targeting more cost-sensitive enterprise applications stretching from fairly standard workflows up into more HPC-like domains. Cavium did, however, provide a comparison against Intel’s E5-2690v3 for cloud workloads.

Processor Diversity on the Rise, Reports Intersect360
November 12, 2015
https://www.hpcwire.com/2015/11/12/processor-diversity-on-the-rise-reports-intersect360/

Intel x86 processors continue to dominate HPC servers while the number of cores per processor keeps rising – perhaps no surprises there. Also somewhat anticipated, the amount of memory per core, per processor, and per node is rising. These are the top-line results of Intersect360 Research’s latest HPC sites survey on processor use.

A bit more intriguing were the findings that ARM is generating some early interest – “for the first time in our surveys we note the appearance of an HPC system based on the ARM architecture” – and that accelerators’ share of the systems market slipped very slightly after four years of steady growth. The latter may be just a pause. Nvidia (NASDAQ: NVDA) still owns the market, but Intel (NASDAQ: INTC) is making inroads. Intersect360 does suggest the high tide of standardized processors may be starting to ebb.

“Rather than just look at microprocessors, you have to look at all of the different multi-core and manycore options. The majority of that has been Nvidia GPUs so far. We’re now seeing close to half of the HPC market having at least some accelerator or manycore component attached. When we get to [Intel] Knights Landing, that’s not strictly just an accelerator. There’s going to be a great deal of diversity in the market, potentially a move back toward specialization away from what you might have called standardization or commodity,” said Intersect360 Research CEO Addison Snell.

The report’s key findings, in Intersect360’s words, include:

Brand disloyalty – Despite Intel’s growing share of the x86 processor market in HPC (and elsewhere), we believe customers still treat these components as commodities and will switch vendors when price/performance for their particular applications warrants a switch.

Multi-core processors – The single-core processor is nearly extinct. The most common processor in HPC is the eight-core CPU, but it is unclear what will be the most popular configuration in the future.

Memory requirements – Memory per core continues to rise. This could be an attempt to alleviate the more significant bottleneck of external storage, or a result of running applications with larger datasets.

Accelerators – No longer an afterthought for HPC users, accelerators are being used for production work across an array of applications. We expect that within a year or two the majority of new systems will be equipped with accelerators of some sort.

This latest Intersect360 Research report, HPC User Site Census: Processors, released in late October, examines a wide range of processor deployments by system type (HPC clusters, SMP, MPP, servers) as well as trends in memory and accelerator use. The most recent surveys were conducted in the second quarter of 2015. In addition, Site Census data obtained in 2013 and 2014 was used to aggregate additional information across all sites. The report presents results from a total of 781 HPC systems at 453 sites.

Despite the slight dip in accelerators’ share of systems, they are now entrenched in HPC, according to the report: Current offerings “from Nvidia, Intel and AMD can deliver well over a peak teraflop of double precision floating point performance, delivering more than five times the peak performance of a high-end Power or x86 CPU. FPGA performance is more difficult to measure, although in recent years, much more powerful implementations (some with conventional ARM processors embedded into the chip) have become available. According to our surveys, approximately a third of HPC systems operating today contain some sort of accelerator,” reports Intersect360.

About half of all accelerated systems today have 20 or more accelerators, notes Intersect360; two years ago the figure was ten. Intersect360 contends this overall growth indicates more codes are able to take advantage of accelerator technology and are likely being run in production environments rather than just for test and evaluation.

The other main accelerator findings reported include:

Nvidia remains the dominant supplier, with a footprint in 77.8% of accelerated systems. That represents a slight reduction of market share from two years ago, when it supplied 85.3% of those systems.

Intel is the second most popular supplier with 11.3% of accelerated systems, a three-fold increase from our 2013 data. Given that its Xeon Phi coprocessor line has only been available since 2012, Intel has managed to capture a significant share of the accelerator market in a relatively short period of time.

AMD (NASDAQ: AMD) has captured 1.8% of the market, compared to 0.5% in 2013. A renewed focus on its FirePro server GPU line over the last two years may account for AMD’s larger representation in our latest surveys.

The share of accelerated systems in those most recently deployed/modified (2014+) declined slightly for the first time in four years.

Intersect360 had predicted that Intel’s entrance into the market in 2012 would validate the accelerator model of computing for HPC and help diversify the market. That certainly seems to be the case. According to the report, early users of Intel’s Xeon Phi platform had interest in accelerators but resisted adopting GPU technology. Looking ahead, Intersect360 expects Intel to capture additional market share from Nvidia over the next two years.

Increasing use of memory at the node, processor, and core levels remains a trend. Part of the reason, suggests Intersect360, is the desire to relieve I/O bottlenecks. Likewise, the sheer growth in the number of cores per processor is driving more memory use – a trend that seems likely to continue.

Defining Scalable OS Requirements for Exascale and Beyond
October 5, 2015
https://www.hpcwire.com/2015/10/05/defining-scalable-os-requirements-for-exascale-and-beyond/

Over the past couple of decades two primary trends have driven system software for supercomputers to become significantly more complex. First, hardware has become more complex. Massive inter-node parallelism (100,000+ nodes), increasingly large intra-node parallelism (100+ hardware threads), wider vector units, accelerators, coprocessors, etc., have required that system software play a larger role in delivering the performance available from this new hardware. Second, applications have changed. Historically, extreme-scale high-performance computing (HPC) applications were stand-alone executables that were bulk synchronous, spatially and statically partitioned, and required minimal system services.

As the community moves towards exascale, applications are being integrated into workflows, require big data and analytics, are incorporating asynchronous capabilities, and demand an increasingly rich set of libraries, runtimes, and system services. As part of providing comprehensive system services, the compute node operating system is being integrated into the control system, which is sometimes referred to as the global operating system. While providing a complete set of system services is important, this article focuses on the challenges and needs of the Operating System (OS) on the compute node. Figure 1 shows the “left to right” model typical in HPC systems, the control system, and the node-local OS. We describe how these trends are changing the requirements and hence design of the HPC compute node OS, and describe promising directions for how these challenges will be met for exascale computing.

Background: In addition to the above challenges, the compute node OS, hereafter just OS, must address an additional challenge. There has been a debate in the software community about whether a revolutionary or an evolutionary approach is needed to achieve exascale. We contend both are critical, and that the real challenge for system software to get to exascale and beyond is figuring out how to incorporate and support existing computation paradigms in an evolutionary manner while simultaneously supporting new revolutionary paradigms. The OS must provide this capability as well.

Historically, two designs have been used for operating systems. One is to start with a Full-Weight Kernel (FWK), typically Linux[i], and remove features so that it will scale up across more cores and out across a large cluster. Another approach is to start with a new, Light-Weight Kernel (LWK) and add functionality to provide a familiar API, typically Linux.

Requirements: Linux – or more specifically the Linux API, including glibc, and the Linux environment (/proc and /sys) – is important for supporting the evolutionary aspect and for addressing the described complexity needs. There is a set of classical needs that are interrelated and must be met, including low noise, high performance, scalability for capability computing, and user-space access to performance-critical hardware, e.g., the network. There is a set of emerging needs that includes the ability to handle asynchrony, manage power locally and globally, handle reliability, provide for overcommit of software threads, and interact effectively with runtimes. The classical needs allow applications to achieve high performance, while the emerging needs provide for higher productivity and support of new programming and execution models.

A key requirement for an exascale OS kernel is nimbleness: the ability to be modified quickly and efficiently to support new hardware and to provide targeted capabilities for HPC libraries, runtimes, and applications. This is the opposite of the requirement for a general-purpose OS, whose success is based on broad use with known interfaces. High-end HPC systems – those that will first achieve exascale and beyond – push the edge of technology out of necessity and introduce new hardware capabilities that need to be utilized effectively by high-end HPC software. As an example, a decade ago large pages were integrated into CNK, Blue Gene’s LWK, in about six months, while large-page support in major distributions of Linux took significantly longer and remains an ongoing effort. The reason is that CNK’s limited application domain allowed many simplifying assumptions. New hardware technology will be required to achieve exascale computing, and applications will need to aggressively exploit the new technology. Thus, what is needed is an approach that preserves the capability to support existing interfaces (evolutionary) while providing targeted and effective use of new hardware (revolutionary) in a rapid manner (nimbleness).
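As background on the large-page mechanism in that example, the minimal C sketch below shows how an application on a present-day Linux system requests huge pages explicitly – a generic illustration assuming the administrator has pre-reserved huge pages (e.g., via /proc/sys/vm/nr_hugepages), not CNK or Blue Gene code:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Illustrative only: request a 2 MB huge-page-backed mapping on Linux.
 * Falls back to ordinary 4 KB pages if no huge pages are reserved. */
int main(void)
{
    size_t len = 2 * 1024 * 1024;   /* one 2 MB huge page */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");    /* likely: no reserved huge pages */
        buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);   /* fall back */
        if (buf == MAP_FAILED) return EXIT_FAILURE;
    }
    /* ... use buf as a normal allocation ... */
    munmap(buf, len);
    return EXIT_SUCCESS;
}
```

The point of the CNK comparison is that a narrow-purpose kernel could simply back all application memory with large pages, with none of this negotiation.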

Approaches: The historical approaches of adding features to an LWK or trimming an FWK have additional weaknesses when trying to simultaneously support revolutionary and evolutionary models while achieving high performance in an increasingly complex and rich environment. LWKs have been shown to exhibit low noise, which allows high scalability. They have also been able to target the specific needs of HPC applications, allowing higher performance. As the community moves to exascale, the need to leverage specific hardware and tailor OS services to application needs will become more important.

Three classes of approaches are emerging to overcome these weaknesses.

The first is to continue to use Linux as the base and use containers to limit the interference between multiple applications, thereby allowing different applications (often a classical HPC application and an emerging one, e.g., analytics or visualization) to share a node’s resources while minimizing the effect on the classical HPC application. Containers provide a virtual environment in Linux that gives the appearance of isolated OS instances. In the Linux community there is considerable excitement and work involving containers, and HPC may be able to leverage this broader base of activity. The challenge with the container approach is that Linux remains underneath, and any fundamental problem with Linux itself remains.
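As a concrete illustration of the primitive containers build on – a minimal sketch of the underlying Linux system call, ours rather than anything from the projects named here – a process can detach itself into private namespaces with unshare(2); container runtimes combine this with cgroups and cpusets to partition a node’s resources:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Minimal sketch: detach this process from the host's mount, IPC, and
 * UTS namespaces -- the kernel facility container runtimes compose
 * (with cgroups/cpusets) to isolate co-located applications.
 * Typically requires root or CAP_SYS_ADMIN. */
int main(void)
{
    if (unshare(CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS) != 0) {
        perror("unshare");
        return EXIT_FAILURE;
    }
    /* From here on, mounts, System V IPC, and the hostname seen by
     * this process are private; a real runtime would now remount a
     * root filesystem, apply resource limits, and exec the app. */
    printf("running in private namespaces (pid %d)\n", (int)getpid());
    return EXIT_SUCCESS;
}
```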

The second approach is virtualization. A virtualized platform on which either an LWK or a Linux kernel can run provides either high performance or the features of a more general-purpose OS. It is important to ensure that the cost of virtualization, especially for the LWK, is kept to a bare minimum. This approach in isolation presents problems for simultaneous use of the LWK and FWK by an application, but it could be combined with the multi-kernel approach below.

The third approach is to run multiple kernels simultaneously on a node. This has been an area of intense effort over the last several years, and many projects – including McKernel, FusedOS, Nix, Tessellation, Popcorn Linux, and mOS – are exploring this path. We will describe mOS as an example. The vision is to run an LWK on the majority of the cores to achieve high performance and scalability, while running Linux on one or a small number of cores to provide Linux compatibility. From the application’s perspective, it achieves the performance of an LWK but appears to be Linux.

Figure 2 depicts the fully generalized mOS architecture for the research direction we are exploring in the multiple-kernels space. While the figure depicts the full generality, we expect most instantiations to run a single application on a single LWK. A standard HPC Linux runs on a given core (or cores); an LWK (or LWKs) runs on the rest of the cores. On any given LWK, one or more applications may run. As mentioned, the expected scenario is to run Linux on one core and one application on one LWK on the rest of the cores. When the application makes a system call, it is routed to the OS node (via arrow 1b) if it is a file I/O operation, or to the LWK on the core that made the call (via arrow 1a). The LWK handles performance-critical calls. If the call is not implemented by the LWK, the LWK forwards it to Linux (via arrow 2) to be serviced. Linux services the call and returns to the LWK, which in turn returns back to user space on the original core. With this methodology, the application achieves the high performance and scalability offered by an LWK while retaining the Linux environment. We have worked out an architecture for mOS and have early prototype code that is allowing us to confirm several of the architecture decisions we made.
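To make the routing concrete, below is a schematic C rendering of the triage just described. All names and syscall classifications in it are invented for illustration – this is not mOS source code, merely one way to express the policy:

```c
#include <stdio.h>

/* Schematic sketch of the multi-kernel system-call triage described
 * above; all identifiers are illustrative, not actual mOS interfaces. */
enum target { HANDLE_ON_LWK, FORWARD_TO_LINUX, FORWARD_TO_OS_NODE };

/* Stand-ins for the real classification tables. */
static int is_file_io(long nr)     { return nr == 0 || nr == 1; }   /* e.g. read, write  */
static int lwk_implements(long nr) { return nr == 9 || nr == 11; }  /* e.g. mmap, munmap */

static enum target route_syscall(long nr)
{
    if (is_file_io(nr))
        return FORWARD_TO_OS_NODE;   /* arrow 1b: ship file I/O off-node      */
    if (lwk_implements(nr))
        return HANDLE_ON_LWK;        /* arrow 1a: performance-critical path   */
    return FORWARD_TO_LINUX;         /* arrow 2: the Linux core services the
                                        call, then control returns via the
                                        LWK to user space                     */
}

int main(void)
{
    long calls[] = { 0, 9, 39 };     /* read, mmap, getpid (x86-64 numbers) */
    for (int i = 0; i < 3; i++)
        printf("syscall %ld -> route %d\n", calls[i], route_syscall(calls[i]));
    return 0;
}
```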

Conclusion: System software for exascale systems is of necessity becoming more complex. The compute node OS – and how it supports the compute node runtimes and interacts with the global control system – will play a critical role in allowing us to achieve exascale and beyond. To be evolutionary and revolutionary simultaneously, the OS must meet both the classical and the emerging HPC requirements. A promising direction that several groups are exploring to address these needs is running multiple operating system kernels on a node simultaneously. While significant challenges remain and innovative work is still needed on the OS front, there is confidence in being able to get the community well beyond exascale computing.

Author Bio:
Dr. Robert W. Wisniewski is an ACM Distinguished Scientist and the Chief Software Architect for Extreme Scale Computing and a Senior Principal Engineer at Intel Corporation. He has published over 60 papers in the areas of high performance computing, computer systems, and system performance, and has filed over 50 patents. Before coming to Intel, he was the chief software architect for Blue Gene Research and manager of the Blue Gene and Exascale Research Software Team at the IBM T.J. Watson Research Center, where he was an IBM Master Inventor and led the software effort on Blue Gene/Q, which was the fastest machine in the world on the June 2012 Top 500 list and occupied four of the top 10 positions. Prior to working on Blue Gene, he worked on the K42 scalable operating system project targeted at scalable next-generation servers and the DARPA HPCS project on Continuous Program Optimization, which utilizes integrated performance data to automatically improve application and system performance. Before joining IBM Research, and after receiving a Ph.D. in Computer Science from the University of Rochester, Robert worked at Silicon Graphics on high-end parallel OS development, parallel real-time systems, and real-time performance monitoring.

[i] Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.

IBM, NVIDIA and Mellanox Launch Design Center for HPC
July 2, 2015
https://www.hpcwire.com/2015/07/02/ibm-nvidia-and-mellanox-launch-design-center-for-hpc/

Today’s launch by IBM, NVIDIA, and Mellanox of a new POWER Acceleration and Design Center in Montpellier, France, ratchets up their campaign to attract a wider developer community to the OpenPOWER platform and their efforts to build momentum for the march into the Intel-dominated HPC landscape.

The Montpellier-based center is the second of its kind, complementing the center announced in November at the Jülich Supercomputing Center in Germany. IBM has tens of client centers around the world, and a rollout of similar design centers at other IBM sites seems likely. Technical experts from IBM, NVIDIA and Mellanox will help developers take advantage of OpenPOWER systems, leveraging IBM’s open and licensable POWER architecture with the NVIDIA Tesla Accelerated Computing Platform and Mellanox InfiniBand networking solutions.

“[These centers] are crucial for engagement with the developer community and our clients. If you look at where we are going with OpenPOWER at this point, it is trying to get the entire ecosystem of high performance computing, machine learning, data analytics, enterprise computing software developers, and ISVs onto the POWER platform,” Sumit Gupta, IBM vice president of HPC and OpenPOWER Operations, told HPCwire in a pre-release briefing.

Not surprisingly, the OpenPOWER camp has broad goals. “If you look at the architecture NVIDIA and Mellanox are building with us, it is of course about scaling an application to hundreds and thousands of servers, but it’s also about taking massive advantage of a single server. Everything we do about scaling at the 100 petaflops level also helps the departmental cluster,” he said.

Quoted in the official announcement this morning, Stefan Kraemer, director of HPC business development, EMEA, at NVIDIA, said, “Increasing computational performance while minimizing energy consumption is a challenge the industry must overcome in the race to exascale computing. By providing systems combining IBM Power CPUs with GPU accelerators and the NVIDIA NVLink high-speed GPU interconnect technology, we can help the new Center address both objectives, enabling scientists to achieve new breakthroughs in their research.”

“The new POWER Acceleration and Design Center will help scientists and engineers address the grand challenges facing society in the fields of energy and environment, information and health care using the most advanced HPC architectures and technologies,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “Only Mellanox offloads data movement, management and even data manipulations (for example, MPI collective communications), which are performed at the network level, enabling more valuable CPU cycles to be dedicated to the research applications.”
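To make the offload point concrete, the generic MPI sketch below shows the kind of collective operation in question: on fabrics with collective offload, the reduction tree can execute in the switches and adapters rather than on host cores. This is an ordinary MPI example for illustration, not Mellanox-specific code:

```c
#include <mpi.h>
#include <stdio.h>

/* Generic illustration: a global sum via MPI_Allreduce. With network
 * collective offload, the reduction runs in the fabric, freeing host
 * CPU cycles for the application while ranks wait on the result. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank;    /* each rank contributes one value */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over ranks = %g\n", global);
    MPI_Finalize();
    return 0;
}
```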

“Our launch of this new Center reinforces IBM’s commitment to open-source collaboration and is a next step in expanding the software and solution ecosystem around OpenPOWER,” said Dave Turek, IBM’s Vice President of HPC Market Engagement. “Teaming with NVIDIA and Mellanox, the Center will allow us to leverage the strengths of each of our companies to extend innovation and bring higher value to our customers around the world.” (see Is IBM Getting Openness Right?)

Come One, Come All?

According to Gupta, the center is open for business now. Formally, it is available to clients and all OpenPOWER members, and it is free to academia. He was quick to add, “We are also proactively reaching out to the research community and application community. We have a very large application engagement team and welcome anyone who wants to work with us. Just get in touch with IBM – we’re happy to engage.”

You can see his email inbox filling quickly.

The design center differs from the SuperVessel developer program (see IBM Introduces SuperVessel), which was launched in March in China. SuperVessel is cloud-based (on OpenStack), while the Montpellier center is a traditional brick-and-mortar facility with an HPC cluster. Leading-edge technology and expertise will be available from all three collaborators. Besides benefiting the developer and client community, Gupta notes, it will enable NVIDIA, Mellanox and IBM to get faster feedback on how their technologies work together.

As one would expect, competitive zeal runs hot in the battle between the Intel and OpenPOWER camps.

In characterizing POWER’s advantage, Gupta said, “The [POWER] CPU core is higher performance than an x86 core. We’ve been able to clock some of our cores up to 4GHz. Every core has eight threads while x86 only has two threads per core. On four processor sockets we can have 96 threads where on x86 you would have at best 24 threads. Our memory bandwidth is three times higher than x86 in most HPC applications. We can connect up to a terabyte of memory to any of our processor sockets, which again enables us to operate on bigger data[sets].”

Intel would no doubt dispute the specific advantages and the battle of specs is likely to continue. What’s most important for OpenPOWER is winning support from a sufficient portion of the developer community and porting key HPC applications to the platform.

Gupta said, “One of the key metrics we look at is the number of developers that are adopting our platform. I think that’s critical for us to measure and there’s many ways to do it. You look at the number of developers but you also look at the number of applications that get onto the POWER platform and then of course you look eventually at the number of customers that are using your product.”

Is IBM Getting Openness Right? Yes, Says GM Doug Balog
June 29, 2015
https://www.hpcwire.com/2015/06/29/is-ibm-getting-openness-right-yes-says-gm-doug-balog/

The annual Red Hat Summit, held in Boston last week, is something of a revival tent for open source, where the pulpits are plentiful and so are smiling believers. Indeed, it’s hard to dispute the powerful innovation springing from open source. The RH Summit, which started 11 years ago as a modest celebration of open source (mostly Red Hat Linux), has since mushroomed into a boisterous, expansive technology ecosystem showcase.

So it’s interesting when a senior IBM exec turns up in a keynote slot. Big Blue’s heritage, at least at the high end, was for years dominated by proprietary architecture. No longer, said Doug Balog, general manager of IBM Power Systems. The founding of OpenPOWER roughly two years ago, the sale of IBM’s x86 business, and the sprint away from the formidable but proprietary Blue Gene (and re-embrace of the battle-tested mainframe) are all part of IBM’s about-face.

Balog is smack in the middle of proving IBM’s commitment to open source and community development. “It’s not an accident I’m here or that IBM is a major sponsor,” he said.

Message to Red Hatters

“The message we’re bringing to this audience, is we are bringing higher value to customers where we can help clients do amazing things. We’re able to do that by supporting an open approach and that’s kind of unique versus some of my competitors in the systems business who are taking commodity infrastructure and swapping the software available and really not bringing in any differentiation. They are simply providing the recipe from Intel in a lot of ways.

“Our value is we have differentiated systems – we have POWER systems, we have storage systems, we have mainframe systems. Those deliver unique capabilities but still leverage open technology to do it. We think that’s the right recipe,” said Balog.

According to the official bio, Balog is “responsible for all facets of the Power Systems’ business including strategy, architecture, operations, technology development and overall financial performance. He is also a current member of IBM’s Performance Team and a recent member of the Strategy Team, which focus respectively on tactical execution and the strategic direction for the IBM Corporation.” Probably doesn’t get much sleep.

Balog’s IBM counterpart is Ken King, general manager of OpenPOWER at IBM. King’s organization is separate from Balog’s but closely related, because both are based on the POWER8 chip. Both King and Balog report to Tom Rosamilia, senior vice president of IBM Systems.

While at RH Summit, Balog talked with HPCwire on a range of issues including how and why openness informs the new IBM; what the emerging HPC strategy is; how big data and mobility are transforming enterprise computing and why that’s good for IBM’s mainframes (z Systems); and Big Blue’s nascent cloud-based outreach to OpenPOWER developers. He also couldn’t resist taking a few light jabs at Intel and the ARM camp.

Bucking Technology Headwinds

“It really started two years ago with a conversation with Google, Mellanox, NVIDIA and Tyan (a motherboard company). Up until that point, systems generally ran faster year after year because processor speeds kept advancing,” he said. The group fretted about the slowing rate of performance gains for applications, as well as the prospect of brutal commoditization caused by stagnant innovation.

“We said, there’s this great model called open source software. Can’t we take that model and adapt it to open source hardware, and incorporate that same community approach to innovation? Yes, at the end of the day IBM will pick pieces from that innovation [for their] products, but others will too. It’s going to create choice in the marketplace and innovation, not commoditization,” said Balog.

Perhaps not a revolutionary idea – they had a model – but giving away proprietary advantage and its associated higher profit margins isn’t easy if you’re not forced to. That’s kind of the point of the free market. Views vary on how successful and how open the new initiative is, but the OpenPOWER Foundation is growing: the gang of five has expanded to about 140 members at present, according to Balog.

“Jim Whitehurst (Red Hat CEO) and I were discussing his new book (The Open Organization: Igniting Passion and Performance, published in June) and talking about what‘s needed for openness. First, you have to form a [substantial] community because we have seen plenty of attempts at openness where it ended up being a set of family members getting together and nobody else joined. There’s always more to do but I feel good about the community we have built so far.”

“The next phase is getting a community to bring innovation to the space and we are starting to see that. At the [first] OpenPOWER summit in San Jose [in March] there were 15 companies who brought POWER-based motherboards, all very different, all very innovative, all really targeted at the cloud or HPC companies,” said Balog (see IBM’s First OpenPOWER Server Targets HPC Workloads, HPCwire).

Show Me The Money

“Now we have to start to move in the monetization direction. How do you take this community, take this innovation, and start to transform it into products and sales? How do [OpenPOWER] members who want to build on top find that balance between innovation and openness and turn it into monetization? We are all publicly traded companies, so at the end of the day we want to see some sales.”

Balog cited the addition of Sumit Gupta as an example. “He was a perfect fit to come in and lead our HPC OpenPOWER business. He’s well connected in the industry and has a great business mind, so that’s the role he has. He’s the guy Ken King and I look to and say, OK, you’ve got two big wins – where’s the next five or ten, or whatever the number is, and where do we go from here?”

It doesn’t hurt that Gupta has extensive knowledge of accelerators which play a critical role in IBM Power Systems plans for HPC (see What IBM says about Gupta’s role, HPCwire).

“So [moving away from Blue Gene] was one of the strategy changes made two years ago. Historically we had built these wonderful engineering marvels called Blue Gene systems – beautiful, well-engineered machines, but they are monstrous in size. Quite custom, you would agree. In the best cases we shipped a few systems and didn’t lose money; in most cases we lost money. It’s a tough market.”

“We said, why don’t we have an HPC strategy that takes our standard one- and two-socket systems and bucks those up with accelerators, either CAPI-attached (POWER8 Coherent Accelerator Processor Interface) – CAPI plays a big role here – or NVIDIA-attached with NVLink. I think accelerators are a big opportunity here, and that’s not just IBM hardware but Altera and Xilinx and NVIDIA, etc.

“That’s the kind of HPC system we’ve targeted, so it’s much more about a rack full of scale-out systems with accelerators. Obviously you need the code optimizations, so you’ve got to pick your target industries where they are willing to take their code and start heading down a path of leveraging accelerators. Some have seen that light already and some haven’t yet, but I think more and more are,” Balog said.

Mainframe Revival

One beneficiary of all the openness changes is the mainframe. It’s always been around, but it has also absorbed its share of condescension over the years. Porting Linux to IBM’s z Systems line is opening doors to new uses. Container technology, Hadoop, and a myriad of other ‘open’ technologies are becoming accessible on mainframes.

“Think about the evolution and transformation of IBM – not as a company, though we could talk about that, but from a systems business – and how this aspect of openness is really transcending the way platforms like the mainframe continue to drive growth, [especially in large enterprise environments.]

“[The mainframe] has been around for 50-plus years. Obviously, it’s very different today than it was 50 years ago, although you still run the same apps from 50 years ago, and that’s one of the miraculous aspects of the mainframe platform: the commitment to architecture continuity while enabling new work. One of the biggest drivers, in addition to Linux [on the mainframe], is mobile transactions for z Systems [such as in financial services]. I mean, who actually goes to a bank these days?” said Balog.

IBM’s key HPC targets are unsurprising – government, oil & gas, financial services and, increasingly, life sciences. Balog emphasizes that it’s a much more economical way to go: take what’s already in the portfolio and bring in accelerators.

It will be interesting to watch how IBM fares on the TOP500 list in coming years. IBM had 153 systems (roughly 30 percent) on the TOP500 list last November, including four in the top ten. IBM does have a couple of big recent HPC wins, one with the DOE and the other with the Science & Technology Facilities Council (STFC) in the U.K.

“We’re not really that focused on being in the TOP500. It’s not that we are shying away from those opportunities, but it’s not about the number or the score anymore. A lot of our focus is on the sweet spot for POWER and the marriage of HPC in those industries where it really is about the data analytics. That’s where the POWER architecture shines through, even with this addition of the accelerator model. As you can see from the accelerator [possibilities], our approach is quite different from Intel’s in that it’s an open approach. All are welcome to bring their best acceleration technology. We didn’t see the need to go spend $16B,” said Balog.

OpenPOWER, of course, isn’t the only processor-based ecosystem out there. Intel remains the giant everyone aims for – to say it has been less than spectacularly successful would be just plain wrong. On the other hand, the market does seem hungry for, or at least open to, more choice. Stir in the slowing of chip advances (i.e., the much discussed demise of Moore’s Law) and growing worry over power consumption, and the potential opportunity for non-Intel contenders suddenly seems more realistic.

The ARM camp is one contender. It’s been a huge winner in the mobile device space, where ARM’s reduced power requirements are critical. Traction in the server market has been slow, but the release of a 64-bit design (ARMv8) is making matters more interesting. Just a week ago, the Mont-Blanc project at the Barcelona Supercomputing Center (BSC) fired up a prototype running on ARM, suggesting it is possible to get high performance from the architecture. Mont-Blanc is exploring a more energy-efficient approach to achieving exascale computing.

ARM Needs a Body of Support

“We keep asking ourselves about it. Our view of ARM is that it had promise – we were watching it – but I think we’ve seen a lot of the ARM server companies start to fold up tent and move away from it. Part of it is a weak core design, if I could call it that. I don’t mean that disparagingly; it’s just what it is. That’s why it is in all of our mobile devices.

“It hasn’t built the server ecosystem that clients might want to look at. So we just haven’t seen it mature at the pace it might have been able to mature and it’s been around for a while. You know AMD recently sort of declared they are moving away from it,” said Balog.

HP’s Moonshot server line has an ARM-based model, which has seen a few wins – a notable recent one at the University of Utah. Part of the challenge is to interest the developer community. Balog asked wryly about Moonshot, “But has HP ever delivered much of the ARM stuff, really? They quickly touted that and went right back to [saying] Moonshot is a bunch of Intel servers.”

“Could ARM and POWER partner? I don’t know. We continue to ask ourselves that question. We’ll see. It’s low power. Do we think about power issues in the OpenPOWER community and what’s down the line? Sure. [But so far] we aren’t seeing power consumption as a major issue to deployment. It is the balance between: do you take a slightly stronger core that delivers oodles of performance benefits over an ARM or an Intel processor and therefore has a little more energy consumption, or do you go with a lot of systems with a really weak core.

“We continue to watch the space. We’re all for openness, so I think they help chip away at Intel at the low end and we chip away at the middle and high end. The market wants choice, and that’s sort of the fundamental thing we hear from the cloud companies,” said Balog.

Developer Outreach

Clearly a big challenge – and directly related to IBM’s presence at the Red Hat Summit – is engaging the open source developer community. Developers need to be convinced of IBM’s commitment, and they need to be able to play on the POWER architecture. To some extent, IBM’s efforts there remain nascent, but they are growing.

“[Developer outreach] is more and more cloud-based, no surprise, by providing access to POWER infrastructure in the cloud. [The idea] is to leverage the benefits of a cloud versus [forcing] everybody to have their own server on their desktop. A couple of weeks ago in China we launched an open developer Linux platform in the cloud with accelerators, called SuperVessel, and it’s come one, come all (see IBM Introduces SuperVessel, HPCwire).

“You can do development, try out accelerators, try out POWER, try Linux on POWER, and it’s available free. We will expand it to the rest of the world over time. We have some things in the POWER development platform today – we have some slight differences – but as POWER goes into software, which it will here very soon, there will be Linux on POWER, and that will be another opportunity for us to provide free connection to developers.”

Tracking the Trajectory to Exascale and Beyond
June 8, 2015
https://www.hpcwire.com/2015/06/08/tracking-the-trajectory-to-exascale-and-beyond/

The future of high performance computing is now being defined, both in how it will be achieved and in the ways in which it will impact diverse fields in science and technology, industry and commerce, and security and society. At this time there is great expectation but much uncertainty, creating a climate of opportunity, challenge, and excitement. It is within this context of forging a future of computation in the crucible of innovation that we have been invited by HPCwire to host an ongoing series of informational articles tracking this trajectory to exascale computing and beyond. The answers are not yet established, but the possibilities are emerging, and the path or paths to be traversed toward these goals are only now coming into view.

It will be our pleasure over the ensuing months to guide this series of news articles, editorials, interviews, discussions, and perhaps some debates to provide and stimulate an open forum of consideration and dialog within these virtual pages and broad readership. We hope you will join us on this voyage of exploration as we illuminate and explore the rapidly evolving field of exascale computing towards the new frontiers of capability and discovery.

Dr. William Gropp

Even as we casually interject the term “exascale,” we as a community have inadequately defined or determined its meaning, at least in a specific and widely adopted way. Is it achieving 1 exaflops Rmax on the Linpack benchmark (HPL), or is it a thousand times the capability of current-generation petaflops-class systems? Is it merely a single point on a many-thousand-fold progression of performance (more than four such thousand-fold steps so far in the lifetime of a single individual), or rather a trans-performance regime spanning a range of achievement across the three orders of magnitude from an exaflops to the ethereal heights approaching a zettaflops, a term rarely employed even now? Is it even about flops (floating point operations per second)? In the age of “big data,” graph processing, and embedded and mobile computing, it is apparent that floating-point operations are not the only important measure of performance. Integer operations, memory references, and data handling may be at least as important.

Dr. Thomas Sterling

For systems of the future (even the biggest ones now) there is great concern about total energy usage and power demand. Although subjective, one asserted threshold of pain is anything beyond 20 megawatts. Yet Tianhe-2 already surpasses that when cooling is included, and commercial data centers already consume more than 100 MW. A 20 MW limit imposes an average energy budget of about 20 picojoules per floating point operation at exascale, whereas today’s most “green” systems achieve only a few gigaflops per watt. A rule of thumb is that a megawatt sustained for a year costs approximately $1M. This is only one of several factors that will challenge practical computing in the exascale performance regime and era. We will consider over the succeeding weeks and months a post-modern view of performance and productivity, and even other less familiar properties such as portability and generality.
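To make that energy arithmetic explicit – a back-of-the-envelope check, assuming a sustained exaflops and, for the cost figure, a typical industrial electricity rate of roughly $0.11/kWh:

```latex
\[
\frac{20\ \text{MW}}{10^{18}\ \text{flop/s}}
  = \frac{2\times 10^{7}\ \text{J/s}}{10^{18}\ \text{flop/s}}
  = 2\times 10^{-11}\ \text{J/flop}
  = 20\ \text{pJ/flop},
\]
```

which is equivalent to demanding 50 gigaflops per watt, roughly an order of magnitude beyond today’s greenest systems. Similarly, 1 MW sustained for a year costs about 1,000 kW x 8,760 h x $0.11/kWh, or roughly $0.96M – the source of the $1M-per-megawatt-year rule of thumb.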

But even more important than the what is perhaps the why. How often does one hear the question, “Do we really need exaflops?” Over the following months we will invite experts to document diverse compelling cases where exascale computing is not only useful but essential for critical breakthroughs. Such politically charged domains as climate change demand degrees of resolution in time, space, and phenomenology to clarify, refine, and ultimately determine the validity of models as well as their implications for anthropogenic CO2 contributions.

On the brighter side (perhaps literally) is the potential impact of exascale computing on the ultimate realization of such alternatives to fossil-fueled energy sources as controlled fusion. Here computing may not only determine feasible designs for this potentially ultimate source of power but also be critical to the real-time control that makes it possible. More than powering civilization on Earth, the same technologies may enable projecting advanced human civilization into the solar system and beyond by the end of the 21st century. Composite materials, microbiology, medical diagnosis, design optimization, and even machine intelligence may all yield to computing in the exascale era. These and other application domains will be explored and discussed throughout this series. Beyond justifying the creation of exascale platforms, such detailed discussions will help determine their design and operational properties and how to achieve them.

The challenge of realizing exascale computing is not just about putting together enough hardware, or getting the energy down, or creating a new parallel programming language, or crafting new algorithms and applications. It is all these things and more, and they are all interrelated in important and nuanced ways. There may not even be a single solution but rather a number of different design points, both because of various opportunities and ideas and because there are differences in the usage profiles of application workloads and their resource requirements. It is also about responsible progress: sustaining not just future application codes but literally decades of legacy programs upon which there is heavy dependence for agency mission-critical problems, basic and applied science, and industrial and commercial applications. This challenge of innovation with continuity is one of the great problems faced by the community, and it will be discussed throughout this HPCwire series of articles on exascale computing.

Advances in device technology will be essential in enabling future computing opportunities but will also be challenging. Semiconductor feature size is expected to shrink to 5 nanometers by the end of this decade, yielding perhaps a density growth of about an order of magnitude. Yet this also reflects the approaching end of Moore’s Law, and even then power consumption demands may limit the practical use of the full capabilities of chips and full systems. There is promising work in all of these areas, and we will explore these innovative approaches to the hardware needed for exascale systems. Other factors that have confronted system design and usage in the past also challenge the future of exascale: parallelism, latency of local and global access, memory hierarchies, overheads for control, and contention for shared resources. We will explore these opportunity-versus-challenge tradeoffs and possible strategies for optimizing within the design space, now and in the exascale future, in the pages of this HPCwire series.

Exascale is not just about the very biggest computing systems; it is about extreme capabilities at many scales. Perhaps the most exciting promise of exascale is the ubiquitous availability of petaflops-capable computing in the next decade. A single rack with a power consumption of 50 kilowatts will be able to deliver 1 petaflops well within 10 years. Thus exascale technology, which may at full capability sit on the raised floors of national centers worldwide, will also put petaflops in the hands of most scientists, academics, and industry product developers. These systems are likely to cost on the order of $250K, well within the budget of many user domains. They will serve as end computing platforms but also as training grounds for those who will need more computing power to solve their biggest problems.
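A quick consistency check on that single-rack projection, using only the figures just quoted:

```latex
\[
\frac{10^{15}\ \text{flop/s}}{50\ \text{kW}}
  = \frac{10^{15}\ \text{flop/s}}{5\times 10^{4}\ \text{J/s}}
  = 2\times 10^{10}\ \text{flop/J}
  = 20\ \text{Gflops/W},
\]
```

that is, 50 pJ per operation – within a small factor of the 20 pJ/flop exascale budget discussed above, which is why the petaflops rack and the exaflops machine are expected to arrive on roughly the same technology curve.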

The motivation of this new publication series is to build a bridge between the general HPC community and the industrial, academic, and government experts who are dedicated to realizing this exciting dream of practical exascale computing. Over the next months, you will see invited articles, interviews, editorials, and news briefs that will lay out the path even as the journey has begun. We the editors will serve as guides through this complex and changing space of discovery. We solicit questions and comments from our readership to help improve the discourse and story. We are delighted to have the opportunity to serve in this capacity and thank HPCwire for their support and encouragement in doing so.

Fujitsu Targets 100 Petaflops Supercomputing
August 12, 2014
https://www.hpcwire.com/2014/08/12/fujitsu-targets-100-petaflops-supercomputing/

On Monday at the Hot Chips 26 conference, Fujitsu revealed details about its upcoming SPARC64 XIfx chips, which the company is counting on to reach the 100-petaflops barrier and revitalize its HPC game. Of the top 10 fastest supercomputers on the current TOP500 list, only one is a Fujitsu system. But with the new 32-core SPARC chips touting more than one teraflops of double-precision performance, Fujitsu has the makings of a winning strategy for reaching beyond the petascale era.

When it comes to the supercomputing arms race, the Japanese chip and server maker should not be underestimated. It was the company’s eight-core SPARC64 VIIIfx processor and custom Tofu interconnect that catapulted the K supercomputer to a record-setting 10 petaflops of LINPACK performance. The new SPARC64 XIfx chip is a follow-up to the SPARC64 IXfx, used in the company’s PrimeHPC FX10 machines. The SPARC64 X chip powered general-purpose servers but did not have an HPC counterpart.

The new processors have 32 main processing cores and two so-called “assistant” cores, yielding a peak performance of 1.1 teraflops. Compared to the SPARC64 IXfx parts, the new chips have 3.2 times the double-precision floating point performance and 6.1 times the single-precision. The assistant cores will be helpful for avoiding OS jitter and for non-blocking MPI functions, according to Fujitsu.

The post-FX10 system (as yet unnamed) will be constructed of highly integrated components in a high-density package. Company slides depict racks with 216 nodes (there are 12 nodes in a chassis – one CPU per node – and 18 chassis in a rack), meaning that a peak performance of more than 100 petaflops will be possible in a mere 463-rack footprint. To put the capability of the new machine in perspective, the performance of just one 12-node chassis corresponds to that of an entire cabinet of the K computer.
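The rack arithmetic checks out against the quoted figures:

```latex
\[
12\ \tfrac{\text{nodes}}{\text{chassis}} \times 18\ \tfrac{\text{chassis}}{\text{rack}}
  = 216\ \tfrac{\text{nodes}}{\text{rack}}, \qquad
216 \times 1.1\ \text{TF} \approx 237.6\ \tfrac{\text{TF}}{\text{rack}},
\]
\[
463\ \text{racks} \times 237.6\ \tfrac{\text{TF}}{\text{rack}} \approx 110\ \text{PF},
\]
```

comfortably above the 100-petaflops target.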

Each water-cooled chassis includes multiple Micron Hybrid Memory Cubes, which stack system memory in a 3D structure to deliver higher throughput and superior power utilization. The new Tofu 2 interconnect increases link bandwidth by 2.5 times, supporting 12.5 gigabyte-per-second bidirectional communication links. Fujitsu reports that the entire software stack has also been enhanced for the successor to PrimeHPC FX10. Despite the new features, the post-FX10 system maintains an architecture similar to both K and FX10 to promote application compatibility, according to the company.

A 100-petaflops system would have roughly twice the peak performance of the world’s current fastest computer, China’s Tianhe-2 machine, which achieves a peak of 54.9 petaflops (and a LINPACK result of 33.86 petaflops) using a hybrid approach that combines Intel’s Xeon CPUs and Xeon Phi coprocessors.
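As an aside on those two Tianhe-2 numbers, the ratio of sustained LINPACK to peak is the usual measure of delivered efficiency:

```latex
\[
\frac{33.86\ \text{PF}}{54.9\ \text{PF}} \approx 0.62,
\]
```

that is, about 62 percent of peak on the benchmark – a reminder that headline peak figures, including the 100-petaflops target above, overstate what applications will actually sustain.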

Fujitsu’s plan is to keep evolving the K computer architecture all the way to the exascale horizon, which the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) has targeted for the year 2020. The strategy involves co-design with selected target applications and innovations in the system software stack. Fujitsu is also working with both MEXT and RIKEN, the agency that operates the K computer, on a plan to drive exascale development.

Alternatives to x86 for Physics Processing

The global distributed computing system known as the Worldwide LHC Computing Grid (WLCG) brings together resources from more than 150 computing centers in nearly 40 countries. Its mission is to store, distribute and analyze the 25 petabytes of data generated each year by the Large Hadron Collider (LHC) at CERN, the European Laboratory for Particle Physics, in Geneva, Switzerland. Projects of this magnitude require significantly more computational resources than any single facility can deliver, hence the need for a multi-organizational, international grid computing system. This infrastructure supports the science that makes discoveries like the Higgs boson possible.

Even more capacity will be required going forward: it is predicted that data volumes must grow by two to three orders of magnitude for the scientific instrument to realize its full potential. Keeping LHC computing relevant in the coming years will therefore require significant advances on the hardware side. Starting around 2005, processors began hitting their scaling limits, owing mostly to their tremendous power demand, and this challenge has driven interest in processor architectures beyond the general-purpose x86-64.
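
In concrete terms, scaling the 25 petabytes per year quoted above by two to three orders of magnitude lands in exabyte territory, as the short calculation below shows.

```python
# Scaling the quoted 25 PB/year by two to three orders of magnitude.
current_pb_per_year = 25
for orders in (2, 3):
    future_eb = current_pb_per_year * 10**orders / 1000
    print(f"{orders} orders of magnitude: {future_eb:g} EB/year")
# 2 orders -> 2.5 EB/year; 3 orders -> 25 EB/year
```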

This situation has inspired an international team of distinguished scientists to examine the viability of the ARM processor and the Intel Xeon Phi coprocessor for scientific computing. They have written a paper describing their experience porting software to these processors and running benchmarks with real physics applications, with the goal of assessing whether these chips can be used for production physics processing.

For the ARM investigation, the test setup included two low-cost development boards, the ODROID-U2 and the ODROID-XU+E, each sporting eMMC and microSD slots, multiple USB ports and 10/100Mbps Ethernet with an RJ-45 port. Each uses a 5V DC power adaptor.

The authors write that “the processor on the U2 board is an Exynos 4412 Prime, a System-on-Chip (SoC) produced by Samsung for use in mobile devices. It is a quad-core Cortex A9 ARMv7 processor operating at 1.7GHz with 2GB of LP-DDR2 memory. The processor also contains an ARM Mali-400 quad-core GPU accelerator, although that was not used for the work described in this paper.”

They continue: “The XU+E board has a more recent Exynos 5410 processor, with 4 Cortex-A15 cores at 1.6GHz and 4 Cortex-A7 cores at 1.2GHz, in ARM’s big.LITTLE configuration, with 2GB of LPDDR3 memory, as well as a PowerVR SGX544MP3 GPU (also not used in this work).”

For the Phi investigations, the team created a basic HEP software development environment to support application and benchmark tests that can run directly on the Phi card. The setup employed a Xeon Phi 7110P card attached to an Intel Xeon host with 32 logical cores.

The paper delves further into the hardware and software specifics of each test environment, as well as the various challenges and limitations the team encountered. There is also a discussion of experimental results and general tools support. The authors make the point that “when comparing and optimizing for various architectures, understanding the performance obtained in detail is as important as obtaining overall benchmark numbers.”

As might be predicted, single-core performance is much lower for the ARMv7 processor than for traditional x86 processors, but performance per watt is much better on the ARM chips. The authors conclude that “the potential for use in scientific (general purpose) computing is clear.” They also report “successful ports of both the IgProf profiler and the DMTCP checkpointing package to ARMv7.” Despite these positive initial tests, more work is needed before there is a clear answer on the benefits of these alternative architectures for HEP computing.
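
The metric behind that conclusion is throughput normalized by power draw. The sketch below shows only the shape of the comparison; the event rates and wattages are hypothetical placeholders, not the paper’s measurements.

```python
# Performance-per-watt, the metric behind the ARM conclusion. The event
# rates and power draws below are HYPOTHETICAL placeholders, not the
# paper's measurements; only the shape of the comparison is the point.
def events_per_joule(events_per_second: float, watts: float) -> float:
    """Throughput normalized by power draw."""
    return events_per_second / watts

x86_score = events_per_joule(events_per_second=100.0, watts=95.0)  # placeholder
arm_score = events_per_joule(events_per_second=12.0, watts=4.0)    # placeholder

print(f"x86: {x86_score:.2f} events/J, ARM: {arm_score:.2f} events/J")
# With these placeholders the ARM board wins per joule even though its
# absolute single-core throughput is far lower.
```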

The paper describing this research has been submitted to the proceedings of the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP13), held in Amsterdam.

New Hope for Graphene-based Logic Circuits

For more than a half century, computer processors have increased in power and shrunk in size at a phenomenal rate, but the exponential advances described by Moore’s law are winding down. Electronics based on silicon complementary metal–oxide–semiconductor (CMOS) technology are coming up against the physical limitations of the nanoscale. Currently, there is no technology ready to take the place of CMOS, but a number of candidates are on the table, including graphene, a one-atom-thick layer of graphite. Research suggests this incredibly strong and lightweight material could provide the foundation for a new generation of nanometer-scale devices.

Figure: scanning electron microscopy image of the graphene device used in the study; the scale bar is one nanometer.

As an excellent conductor of heat and electricity, graphene is a promising electronics substrate, yet other characteristics of this material have stymied its progress as a silicon alternative. To address these limitations, researchers at the University of California, Riverside have taken a completely new approach.

Semiconductor materials have an energy band gap, which separates electrons from holes and allows a transistor to be completely switched off. This on/off switch enables Boolean logic, the foundation of modern computing.

Graphene does not have an energy band gap, so a transistor implemented in graphene will be very fast but will suffer high leakage currents and prohibitive power dissipation. So far, efforts to induce a band gap in graphene have been unsuccessful, leaving scientists to question the feasibility of graphene-based computational circuits.

But Boolean logic is not the only way to process information. The UC Riverside team showed that it is possible to construct viable non-Boolean computational architectures with gapless graphene. Their solution relies on a specific current-voltage characteristic of graphene devices: negative differential resistance. The researchers demonstrate that this intrinsic property appears not only in microscopic-size graphene devices but also at the nanometer scale, a finding that could set the stage for the next generation of extremely small and low-power circuits.
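
Negative differential resistance simply means a stretch of the current-voltage curve where current falls as voltage rises, i.e. dI/dV < 0. The toy model below illustrates the idea with a generic N-shaped cubic; it is not a model of the actual graphene devices in the study.

```python
# Toy N-shaped current-voltage curve illustrating negative differential
# resistance (NDR): the region where dI/dV < 0. A generic cubic, NOT a
# model of the graphene devices in the study.
import numpy as np

v = np.linspace(0.0, 2.0, 201)
i = v**3 - 3.0 * v**2 + 2.5 * v       # N-shaped toy characteristic

di_dv = np.gradient(i, v)
ndr = v[di_dv < 0]
print(f"NDR region: ~{ndr.min():.2f} V to ~{ndr.max():.2f} V")
```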

“Most researchers have tried to change graphene to make it more like conventional semiconductors for applications in logic circuits,” Alexander Balandin, a professor of Electrical Engineering, said. “This usually results in degradation of graphene properties. For example, attempts to induce an energy band gap commonly result in decreasing electron mobility while still not leading to sufficiently large band gap.”

“We decided to take an alternative approach,” Balandin continued. “Instead of trying to change graphene, we changed the way the information is processed in the circuits.”

Mapping the Energy Envelope of Multicore ARM Chips

Bigger is not always better in the world of supercomputing. While data scientists almost always desire more computational throughput, the key question is how best to deliver that: through traditional, power-hungry X64 processors, or through the cheap, low-power ARM processors that drive smartphones and tablets? The answer is not always clear.

The ARM architecture is estimated to power more than 90 percent of smartphones, and a good chunk of the world’s tablets too. To sate the desire for ever-faster devices, ARM Holdings has funneled more resources into the development of its 32-bit ARM architecture, with the hopes of boosting performance (memory especially) while minimizing electricity consumption and heat.

This keen interest in the ARM architecture has garnered the attention of the HPC community, which is always sensitive to power consumption and cooling issues. Several HPC companies and supercomputer projects have started migrating to the ARM architecture; the Barcelona Supercomputing Center, for example, is developing a supercomputer based on ARM Cortex-A9 systems.

Instead of diving headfirst into the ARM pool, however, smart HPC system builders need a way to predict whether ARM-based systems will, in fact, deliver the expected benefits in power consumption.

To that end, researchers at the National University of Singapore’s Department of Computer Science recently wrote a paper that sheds light on the balance between processing, memory, and network I/O on the one hand, and energy consumption in the latest multicore ARM architectures on the other. The paper was published by Sigmetrics, a special interest group that promotes the evaluation of computer system performance.

First, researchers Bogdan Marius Tudor and Yong Meng Teo developed a model that predicts the execution time and energy usage of an application for different numbers of cores and clock frequencies. This gives the user the ability to select the configuration that maximizes performance without wasting energy.
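
The model itself is not reproduced here, but the selection step it enables is simple to sketch: search the grid of (cores, frequency) configurations with a predictor and keep the lowest-energy one that still meets a performance target. The predictors below are hypothetical stand-ins, not the authors’ actual model.

```python
# Sketch of the configuration-selection step such a model enables: pick
# the (cores, frequency) pair with the lowest predicted energy that
# still meets a deadline. predict_time/predict_power are HYPOTHETICAL
# stand-ins, not the authors' model.
from itertools import product

def predict_time(cores: int, ghz: float, work: float = 100.0) -> float:
    return work / (cores * ghz)          # idealized linear scaling

def predict_power(cores: int, ghz: float) -> float:
    return 0.5 + cores * 0.4 * ghz**2    # base draw + dynamic (P ~ f^2 proxy)

def best_config(core_options, freq_options, deadline_s):
    best = None
    for cores, ghz in product(core_options, freq_options):
        t = predict_time(cores, ghz)
        if t > deadline_s:
            continue                     # too slow for the deadline
        energy_j = predict_power(cores, ghz) * t
        if best is None or energy_j < best[0]:
            best = (energy_j, cores, ghz)
    return best

# More cores at a lower clock can be the energy winner:
print(best_config([1, 2, 4], [0.2, 0.5, 1.0, 1.4], deadline_s=60.0))
```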

Second, the researchers tested that model against three types of applications: HPC, Web hosting, and financial workloads. The tests show that the model can deliver a configuration of core counts and clock frequencies that reduces power consumption by 33 percent without impacting performance.

But in the final analysis, smaller is not always better. “We observe that low-power multicores may not always deliver energy-efficient executions for server workloads because large imbalances between cores, memory and I/O resources can lead to under-utilized resources and thus contribute to energy wastage,” the authors conclude. “Resource imbalances in HPC programs may result in significantly longer execution time and higher energy cost on ARM Cortex-A9 than on a traditional X64 server.”

If the ARM architecture is to make significant inroads into the HPC world, it will need enhancements in memory and I/O subsystems, the authors say. These enhancements are expected to be delivered in the ARM Cortex-A15 and the upcoming 64-bit ARM Cortex-A50 families, they say.