On Wednesday 20 June 2001 15:27, Mike Harrold wrote:
> Martin Dalecki wrote:
> >
> > Blah blah blah. The performance of the Transmeta CPU SUCKS ROCKS. No
> > matter
> > what they try to make you beleve. A venerable classical desing like
> > the Geode outperforms them in any terms. There is simple significant

Let's just say I haven't exactly been thrilled with the performance of the Geode samples we've been using at work. I have a 486 at home that outperforms this sucker. Maybe it's clocking itself down for heat reasons, but it really, really sucks. (Especially since I'm trying to get it to do SSL.)

And yes, we're thinking about Transmeta as a potential replacement for the next generation hardware. We're also looking around for other (x86 compatible) alternatives...

> > Well the actual paper states that the theorethical performance was "just"
> > 20% worser then a comparable normal design. Well "just 20%" is a half
> > universe diameter for CPU designers.

In the case of Transmeta, that's in exchange for a third processor core, which is probably worth something.

20% is only about 3 months of Moore's law. 90% of processor speed improvements over the past few years have been die size shrinks. You could clock a 486 at several hundred MHz with current manufacturing techniques, and get better performance out of it than low end Pentiums. (Somebody did it with a bottle of frozen alcohol and got themselves injured, but was getting quite a nice Quake frame rate before the bottle exploded.) And that's not counting the fact that a Pentium has twice as many pins to suck data through...
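
For anyone who wants to check the "3 months" figure, here is a rough sketch. The doubling period is an assumption (people quote anything from 12 to 24 months for Moore's law), and `months_to_gain` is just a name made up for this example.

```python
import math

def months_to_gain(speedup, doubling_months):
    """Months for performance to grow by `speedup` (e.g. 1.2 for +20%)
    if performance doubles every `doubling_months` months (assumed)."""
    return doubling_months * math.log2(speedup)

# Try a few commonly quoted doubling periods (all of them assumptions).
for doubling in (12, 18, 24):
    months = months_to_gain(1.2, doubling)
    print(f"doubling every {doubling} months: +20% takes {months:.1f} months")
```

With a 12 month doubling period this comes out at a little over 3 months; with 18 months it is closer to 5. Either way it is a matter of months, not years.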

And I repeat, if you're clocking the processor over 10x the memory bus rate, your cache size and your memory bus become fairly important limiting factors. (Modern processors are much more efficient about using the memory bus, doing bursts and predictive prefetches and all. But that's a separate issue.)
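
A back-of-the-envelope model of that point, as a sketch only. It uses the textbook "base CPI plus stall cycles" formula; every number below (miss rate, memory references per instruction, miss penalty in bus cycles) is an illustrative assumption, not a measurement of any real chip.

```python
def effective_cpi(base_cpi, miss_rate, mem_refs_per_instr,
                  miss_penalty_bus_cycles, clock_multiplier):
    """Average core cycles per instruction once cache-miss stalls are included.

    A miss costs a fixed number of *bus* cycles, so its cost in *core*
    cycles grows with the core-to-bus clock multiplier.
    """
    miss_penalty_core_cycles = miss_penalty_bus_cycles * clock_multiplier
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_core_cycles

def instructions_per_bus_cycle(clock_multiplier, base_cpi=1.0, miss_rate=0.02,
                               mem_refs_per_instr=0.3, miss_penalty_bus_cycles=5):
    """Throughput normalised to the memory bus clock (defaults are assumed values)."""
    cpi = effective_cpi(base_cpi, miss_rate, mem_refs_per_instr,
                        miss_penalty_bus_cycles, clock_multiplier)
    return clock_multiplier / cpi

for mult in (1, 5, 10, 20, 40):
    print(f"{mult:>2}x bus clock -> {instructions_per_bus_cycle(mult):.2f} instr/bus cycle")
```

With these made-up numbers, going from 10x to 20x the bus clock gives well under 2x the throughput, and the curve keeps flattening. A bigger cache (lower miss rate) or a faster bus (lower penalty) is what moves the ceiling.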

Look at the Pentium 4. Almost all the work done there was simply so they could clock the sucker higher, because Intel uses racy logic in their designs and had to break everything down into really small pipeline stages to get the timing tolerances into something they could manufacture above 1 GHz. It's AT LEAST 20% slower per clock than a PIII or Athlon. It's all noise compared to manufacturing advances shrinking die sizes and reducing trace lengths and capacitance and all that fun stuff...
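
To put the per-clock penalty in perspective, a tiny sketch of the break-even arithmetic. It assumes throughput is simply work-per-clock times clock rate, which ignores memory effects; `break_even_clock_ratio` is a name invented for this example.

```python
def break_even_clock_ratio(per_clock_penalty):
    """How much faster a chip must be clocked to match a rival, given that it
    does `per_clock_penalty` (e.g. 0.20 for 20%) less work per clock.

    Assumes throughput = work_per_clock * clock_rate and nothing else.
    """
    return 1.0 / (1.0 - per_clock_penalty)

ratio = break_even_clock_ratio(0.20)
print(f"needs {ratio:.2f}x the clock rate to break even")
```

So a design that is 20% slower per clock has to run 25% faster just to draw level, which a process shrink can deliver fairly quickly.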

> So what? Crusoe isn't designed for use in supercomputers. It's designed
> for use in laptops where the user is running an email reader, a web

Not just that, think "cluster density".

142 processors per 1U, air cooled, running around 600 MHz each. The winner hands down in MIPS per square foot. (Well, I suppose you could do the same thing with ARM, but I haven't seen it on the market yet. I may not have been paying attention...)

> browser, a word processor, and where the user couldn't give a cr*p about
> performance as long as it isn't noticeable (20% *isn't* for those types
> of apps), but where the user does give a cr*p about how long his or her
> battery lasts (ie, the entire business day, and not running out of power
> at lunch time).

Our mobiles aren't (currently) battery powered, but a processor that doesn't clock itself down to 46 bogomips when it's running without a fan is a GOOD thing if you're trying to pump encrypted bandwidth through it at faster than 350 kilobytes per second. (The desktop units are getting 3.5 megs/second running the same code...)

> Yes, it *can* be used in a supercomputer (or more preferably, a cluster
> of Linux machines), or even as a server where performance isn't the
> number one concern and things like power usage (read: anywhere in
> California right now ;-) ), and rack size are important. You can always
> get faster, more efficient hardware, but you'll pay for it.

It's still not power, it's heat. You can run some serious voltage into a rack pretty easily, but it'll melt unless you bury the thing in Fluorinert, which is expensive. (Water cooling of an electrical appliance is NOT something you want to be anywhere near when anything goes wrong.)

Processors in a 1U are tied together by a PCI bus or some such. The latency going from one to another is very low. Processors in different racks are tied together by Cat 5 or Myrinet or some such, and have a much higher latency due to speed of light concerns. A tightly enough coupled cluster can act like NUMA, which can deal with a lot more applications than high-latency clustering can. (There hasn't been as much push for research here because it's been too expensive for your average grad student to play with, but now that the price is coming down...)
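
A rough sketch of the speed-of-light floor on that latency. The velocity factor of 0.66 (signal speed as a fraction of c in cable) and the cable lengths are assumptions picked for illustration, and this only covers wire propagation: real interconnects add serialization, switching and software overhead on top.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def one_way_delay_ns(length_m, velocity_factor=0.66):
    """Propagation delay in nanoseconds over `length_m` metres of cable.

    `velocity_factor` is the assumed signal speed as a fraction of c.
    """
    return length_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor) * 1e9

def delay_in_cycles(length_m, clock_hz, velocity_factor=0.66):
    """The same delay expressed in clock cycles of a CPU running at `clock_hz`."""
    return one_way_delay_ns(length_m, velocity_factor) * 1e-9 * clock_hz

# Assumed distances: ~0.3 m inside a 1U box, ~10 m between racks.
for label, metres in (("inside a 1U", 0.3), ("rack to rack", 10.0)):
    ns = one_way_delay_ns(metres)
    cycles = delay_in_cycles(metres, 600e6)
    print(f"{label}: {ns:.1f} ns one way, about {cycles:.0f} cycles at 600 MHz")
```

Under these assumptions the wire alone costs tens of cycles between racks versus about one cycle inside the box, before any protocol overhead is counted.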

> Remember, the whole concept of code-morphing is that the majority of
> apps that people run repeat the same slice of code over and over (eg,
> a word processor). Once crusoe has translated it once, it doesn't need
> to do it again. It's the same concept as a JIT java compiler.

Except code morphing's translation happens about when you suck stuff in from main memory into the L1 or L2 cache, which is happening WAY slower than the inside of the processor is dealing with anyway, so basically it gives the processor extra work to do exactly when it's likely to be spending time on wait states...
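
The "translate once, reuse afterwards" idea from the quoted text is basically memoization keyed on the address of a code block. Here is a toy sketch of just that idea. It is NOT how Crusoe's code morphing is actually implemented; `TranslationCache` and `toy_translate` are hypothetical names, and the "translation" is a stand-in that upper-cases opcodes.

```python
class TranslationCache:
    """Toy model: translate a block of foreign code on first use, then reuse it."""

    def __init__(self, translate):
        self._translate = translate   # function: source block -> native block
        self._cache = {}              # block address -> translated block
        self.translations = 0         # how many times we paid the translation cost

    def fetch(self, block_addr, source_block):
        # Only translate on a cache miss; hot loops hit the cache every time after.
        if block_addr not in self._cache:
            self._cache[block_addr] = self._translate(source_block)
            self.translations += 1
        return self._cache[block_addr]

def toy_translate(block):
    """Hypothetical stand-in for real x86-to-native translation."""
    return tuple(op.upper() for op in block)

tc = TranslationCache(toy_translate)
loop_body = ("load", "add", "store")

# A hot loop: the same block is fetched 1000 times.
for _ in range(1000):
    native = tc.fetch(0x1000, loop_body)

print(native, "translated", tc.translations, "time(s)")
```

The translation cost is paid once per block no matter how many times the loop runs, which is why the overhead matters most for code that is touched only once.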

> /Mike - who doesn't work for Transmeta, in case anyone was wondering... :-)