Posted
by
timothy
on Monday August 08, 2011 @06:30PM
from the back-to-the-amd-k6-2-for-now dept.

An anonymous reader writes "IBM has terminated its contract with NCSA for the petascale Blue Waters system that was expected to go online in the next year. The reason stated was that NCSA found IBM's technology 'was more complex and required significantly increased financial and technical support by IBM beyond its original expectations.' The IT community is now wondering if NCSA will be renting out space in the new data center that is being built to house Blue Waters or if they will go with another vendor."

My experience with Microsoft in academia is that they like to get us onto programs that are a lot cheaper than paying for the software normally, but are based on a subscription tied to the size of the whole institution rather than paying for each individual machine.

The result is that there is no motivation to gradually migrate away from MS software, since the only way to reduce the amount paid to MS would be to virtually eliminate MS software from the institution (which is not realistically going to happen).

I'm still waiting for them to find out about Microsoft's [business model].

It might be interesting to look through the flock of Microsoft patents (thousands? millions?) with the idea of listing the patents for things published by NCSA people. More generally, how many patent violations there will be in the new super-computer, and how much will NCSA have to pay for licenses to use the things discovered/invented by their own researchers?

And how many companies in addition to Microsoft will be filing infringement suits against the NCSA? Yeah, we know that IV will be there, but how

> The reason stated was that NCSA found IBM's technology 'was more complex and required significantly increased financial and technical support by IBM beyond its original expectations.'

As usual the /. summary is misleading at best. The actual language used was:

The innovative technology that IBM ultimately developed was more complex and required significantly increased financial and technical support by IBM beyond its original expectations. NCSA and IBM worked closely on various proposals to retain IBM's participation in the project but could not come to a mutually agreed-on plan concerning the path forward.

Other tidbits from the real press release are that IBM terminated the contract, no

If you read between the lines, we are of course back to the point of the headline:

a) IBM wanted to significantly increase the price ("required significantly increased financial support...", which they would of course have passed on), which they could not get through. So they declined delivery under the initially contracted conditions.

Since the sides couldn't come to terms, IBM took a huge hit by terminating the contract. Yeah, they get their hardware back, but it's probably not very easy to sell to anybody other than NCSA. And they have to return all the money, which means they did a lot of engineering work for $0, once again with few prospects of monetizing the work in a future deal.

One of the big problems here is that this system was a one-off that was never meant to be one. IBM developed the system under the DARPA HPCS contract. They made a very capable system that is also very expensive. They hoped to sell a bunch of them; it looks like they sold just one. As such, all of the engineering costs are being amortised across a single machine. They couldn't leverage a bunch of smaller systems at other customer sites to stabilize the technology before deploying the monster big one at NCSA. Some

Not sell a system as big as Blue Waters, but ones using the same technology. The Power 755, of which Blue Waters was supposed to be the prime example, is very powerful per node, has a lot of bandwidth within a node and between nodes, and could be quite useful in much smaller configurations. Tim Morgan at The Register indicates that IBM will still be selling smaller configurations of this machine. It's just hard to keep up that level of per-node performance across so large a machine, for the agreed-upon cost.

Pretty surprising development, given the length of time that IBM and NCSA had been working on this. Dropping a contract like this essentially calls into question IBM's costing on future contract bids, so it's not something that they'd do lightly. It'll be interesting to see the scuttlebutt that comes out afterward to see how much of this was technical shortcomings and how much pure financial considerations from IBM. Maybe since IBM already got their big publicity for Power7 from Watson, they're being more profit-conscious on future Power systems so they don't tie themselves to margins that are too low.

From the NCSA side, there will certainly be a fallback of some sort - NSF and NCSA are already working out those details according to recent reports. I'd guess that they go with a large Cray XE6 system, given that a pretty sizeable version of that system is already being stood up and ironed out (the Sandia/Los Alamos Cielo system), and Cray has a lot of history successfully standing up big systems (e.g. ORNL Jaguar, Sandia Red Storm, etc.). SGI Altix is the other alternative, I guess, and there's a pretty big one up at NASA now, though that'd probably be a riskier proposition than Cray IMO, and I expect that NCSA and NSF are going to be pretty risk averse on following up on this.

I'm sure Cray can get up to speed in this time frame. They've done it before for the Jaguar deployment. However, if they go with Cray, why install it at NCSA? The NSF already has a big Cray (Kraken) running at the University of Tennessee. Why not just upgrade the existing Cray? They already have the bugs worked out; they would just have to add more cabinets and probably upgrade the processors.

1) Kraken is an XT5, not an XE system - the associated changes of an upgrade from XT to XE would be very large.
2) NCSA already has a big machine room (that they just built) to support that scale of system. Does ORNL have enough additional power and cooling capacity to support Keeneland, Jaguar, and growing Kraken by an order of magnitude in size?
3) ORNL is already installing Keeneland, an NSF Track 2 system, this coming year.
4) The larger politi

NSF already has a big Cray XT5: Kraken at UofTenn. So the risk-averse would probably say get a next-generation XE6. Cray has announced an integrated GPGPU option, so NCSA could get a few cabinets of GPUs to play with, but integrated into a more traditional x86 super. The fact that NSF is already familiar with the machine could make this less risky.

However, this machine is not run by NSF, it's run by NCSA, who have no recent experience with Crays. Mostly they've been running whitebox clusters. They had SGI s

Yes. Good find. However, that sort of system plays to the Altix's strengths. You program it like it's an SMP: you have one coherent memory space and several hundred processor cores. This is the perfect use of an Altix. Of course SGI would rather you use your pre/post-processing Altix next to a big ICE cluster, rather than a big IBM.

Absolutely. RIKEN in Japan got torn a new one when Fujitsu blew out the schedule (thereby jacking up the price) of the "K computer" by a couple of years, but being the ever trusting society Japan is, nobody made a fuss. Even talking about cancelling such a project would have been considered the height of rudeness, not to mention an admission of incompetence.

It's great to see academic institutions stand up for a change instead of just bending over and taking it.

IBM does need to drop the price of Blue Gene, BUT Blue Gene is absolutely awesome to work on (I use Intrepid). Almost all of the rest of the "supercomputers" out there, like Cray's, are basically just PC clusters.

Seriously? That's the first time I've heard that. What do you like about it? The buggy toolchain and CNK? The joys of (sort-of) cross-compiling? The I/O bottlenecks? The blazing fast (for 1999) CPUs?

The only way I can see BG/P being a useful machine is either:
1) All you need to do is run LINPACK
2) You're booting Linux on the compute nodes (in which case a commodity Linux cluster would probably be a lot cheaper)

What were you trying to run on there, a web server?
One of the advantages of Blue Gene is precisely that its compute nodes do not run some full-featured OS that gets in your way. As an HPC platform, the Blue Gene line is pretty much unrivaled in terms of energy efficiency and reliability.

If your code is pure MPI C or Fortran, then the BG is a decent idea. Remember, the original name of the machine was "QCDOC", or "QCD On a Chip" - if you're running QCD, it rocks. Other things, not so good. Let's say you have a big code in Java and you want to run it on your Blue Gene. Well, you're screwed - there's no JVM for the worker nodes. Let's say you have a big code in Perl (and don't laugh - Perl is what about half of computational biology gets done in). That's a problem, because there's no OS

For forty years I dreamed passionately of having my ultimate computer at home. The Apple ][ was my first "workstation," and I invested heavily and actually had two floppy drives. Then I wanted to wire wrap myself an 8086 multitasking computer. Then I had to have an IBM PC/AT. But I knew in my heart that there were these special "expensive" machines called "workstations" that ran on some strange OS called UNIX. I discovered the RISC philosophy, and began dreaming of owning a RISC workstation. I found out abou

I hope we'll have a thread here rehashing how the Mosaic browser was developed at NCSA in the early '90s by a group of grad students informally led by Marc Andreessen, and how the university sued after Andreessen and most of the original team took off for Silicon Valley to form Netscape.

Netscape was a crime against the internet, and especially against web developers of the late '90s and early 2000s. If you ever had to design a form for Netscape 4.7 you know what I mean - having textboxes that can only be sized in characters is a significant pain. And I won't even talk about layers, because my blood pressure is already getting too high.

Netscape the company was a crime against the Internet. Their aim was to introduce proprietary tags into Navigator and serve up those proprietary tags with their server technology. They were a genuine threat to Microsoft. That doesn't absolve Microsoft for crushing them, but it explains it. And things wouldn't automatically be 'better' if Netscape had won 'the browser war.' We wouldn't have Mozilla in its present state. And I would really miss my SeaMonkey.

Heh. I downloaded and installed NCSA Mosaic about twenty minutes ago, and unfortunately it no longer appears to work on Windows 7. I don't know if there's something missing in the TCP/IP stack, something in the Windows Socket Services implementation, or what, but it crashes on trying to load URLs. And yes, I did add the "http://" to the front of the URL like you used to have to do.

It took a custom CPU to knock out the Tianhe (GPU-based) supercomputer. Did IBM plan to use an existing POWER chip, or were they trying to develop a new Cell-like (or other boutique) processor? IBM keeps saying that the future of Cell isn't dead. I wonder if NCSA thought they'd get more bang for their buck with a GPU-based solution?

My experience with IBM is that every new software or equipment setup is painful, complicated and goes over-budget, but once things are up and running, it is rock-solid, so in the long run it is still the vendor I would trust the most for enterprise projects. Knowing them, I always take into account the extra oil and time that will be needed to make things go smoothly at first.

This is very different from a vendor like Dell, who takes good care of its new customers (especially the ones with deep pockets) and makes sure that the delivery is on time and on budget, but after a while problems start to appear (wrong firmware, obsolete drivers, etc.), and pretty soon they tend to ignore you if they feel you won't bring new business in the next quarter.

In this case with the NCSA thing, it's a typical situation where budgets have no room for the fudge factor because the organization has a price-driven selection process, which is wrong.

> In this case with the NCSA thing, it's a typical situation where budgets have no room for the fudge factor because the organization has a price-driven selection process, which is wrong.

As in they don't have an infinite slush fund to tap into? That would be most organizations.

You'd think by now IBM would know how to develop a specification, price it, and honor the contract price. I have to, and I've only been in business 7 years. Yeah, once in a while I take a haircut, but that's called honoring your contr

A price-driven selection is an incentive for bidders to lowball, and this only leads to nightmares for both parties. It's a silly practice based on obsolete purchasing rules (such as requiring three quotes for any important purchase - which over the long run drives off the vendors who usually don't win; those could be a very good match in a specific situation, but after a while they won't even bother trying to win the business because they know that most of the time they are contacted just to make

It appears to be the latter. The spec is available here [nsf.gov].
NCSA negotiated a system with IBM, proposed it to NSF under the above linked RFP, went through a peer-reviewed awards process, negotiated an award with NSF, and started working on the delivery and other aspects with IBM and NCSA's other partners. Something went wrong in the last several months, and IBM's pull out was the result. I doubt that there is any more money to be found, and all parties knew what was asked of them in order for the project to b

To be fair, Dell is limited in driver support by what their vendors provide. You can reasonably expect your hardware to be supported until the next version of Windows is released. At that point, if the drivers aren't compatible with the new version of Windows, you will be upgrading your hardware. Pretty much an x86/x86_64 given.

Hardware companies don't make any money maintaining drivers for 4 year old hardware for which they will never see revenue again. Their margins are so thin there's no way they could aff

> To be fair, Dell is limited with driver support by what their vendors provide.

When you have some equipment installed and "certified" by Dell, you don't expect them to use obsolete drivers while there are three or four more recent versions on their own website. This happened to me twice, and almost a third time, but by then I knew the drill, so when the setup was completed I asked for a complete driver inventory and did the comparison with the available versions myself - thankfully I caught them before they