quote:As with K8, K8L will have 3 ALUs (arithmetic logic units) and 3 AGUs (address generation units). Combined with cache enhancements and the new ability to reorder loads, K8L has a shot at outpacing Core in integer performance.

No. Because Core Duo(Yonah) with inferior decoder configuration, inferior memory bandwidth(which won't matter a lot but will make slight difference) and platform, still manages to outperform the current K8's. The Pentium M, which is even worse than Core Duo(slightly) still manages to outperform the K8's in integer. Now put Core with integrated memory controller, and comparison will look like Core Duo against Athlon XP.

Core microarchitecture will exceed K8's in general integer architecture, and at least equal in K8L's ability. Integer superiority is still gonna be there, K8L will be faster than Core in FP and SSE performance because of low latency integrated memory controller with lots more real-world bandwidth(well that depends on how AMD implements SSE, Intel may still have an advantage if AMD puts a poor implementation like they did with Athlon XP's SSE, or at least it looked poor). Reply

If ~33% of all instructions are Loads, and K8 pretty much totally lacks the ability to reorder Loads, adding that feature could substantially boost performance. It definitely "has a shot" at beating Core, but it may also fall short. Anyone making blanket statements one way or the other - i.e. it *will* beat Core, or it *won't* come close - needs to take a step back and check what they really know and what they are just assuming.

At present, AMD is saying K8L is going to have the ability to reorder Loads. They might only do minor reordering, or they might go so far as to have something similar to Conroe's memory disambiguation. Given that AMD hasn't done a major update to K8 in over 3 years (no, DDR2 controller and going dual core don't really count as major updates to the underlying architecture), K8L could be a lot of things. It migth only match Core Duo 2 on a clock-for-clock basis; it might fall short; it might even come out ahead. Also, there has been no indication that Intel is seriously planning on-die memory controller in the near future, probably to continue to protect their chipset market.

Personally, I really hope AMD manages to basically match CD2 performance, because runaway performance leads don't help the consumer. In the end, theoretical integer, PF, SSE, etc. performance isn't as important as real-world application performance. Right now, it's just too soon to declare a victor in the Core Duo 2 vs. K8L match-up. CD2 vs. K8 is already pretty much a done deal, though, and there's no indication that AMD will be able to come out on top in that rivalry. K8L is their "counterattack", and that's the architecture that needs to compete with CD2. Reply

quote:If ~33% of all instructions are Loads, and K8 pretty much totally lacks the ability to reorder Loads, adding that feature could substantially boost performance. It definitely "has a shot" at beating Core, but it may also fall short. Anyone making blanket statements one way or the other - i.e. it *will* beat Core, or it *won't* come close - needs to take a step back and check what they really know and what they are just assuming.

It's easy to see the performance in integer against Core. Core has ability to reorder loads, but Core Duo is in same situation as K8, it doesn't really have the ability either. Other than that, on the basic block diagram, K8 is superior architecturally to Core Duo, yet Integer performance is somewhat better on Core Duo. The difference probably goes deeper than that. One of the articles mention K7/K8 has similar technique to Intel's micro op fusion. It could be Intel's is much better, etc. If a K8 with substantially better microarchitecture(+ODMC) can't beat integer performance of Core Duo, will K8L with basically same microarchitecture(or may be worse) beat Core?? It's simple to see it probably won't. Reply

I totally agree that the "direct connect" is the most desirable way but I cannot help but think AMD is somewhat daydreaming. That is, what's showing in the slides seems way ahead of today's "practicality".

I mean, we've had this PCI Express which has been strongly pushed by core logic vendors, but so far all we practically have are video cards. I sometimes think all these mobo makers pay more attention to "asthetic" point when they design PCI-E slots so the boards look prettier. (lol)

If my understanding is correct, AMD will introduce a new type of slot, HTX, on motherboards. Will other technology/market follow? Or will it just give another chance to graphics card manufacturers to push us to buy new cards? On today's desktop boards, basically everything is "integrated", sans video. I know that a video card has its own core and frame buffer, and transfers data via Hyper Transport, but if a physics card can utilize the HTX, what stops a video card from connecting directly to CPU, without passing the core logic or system memory?

I think this will also be closely related to the available bandwidth of HTX per CPU core (or cores), and I can't really think of any add-in board that'll prioritize the bandwidth other than video cards, (OK and the physics cards) even though the HTX will be an open standard. (look at the lazy/lame Creative)

A very desirable case would be where storage (hard disks) can take advantage of this "direct" connection but then again there is a such thing called "memory", so my imagination stops there. (maybe solid-state/I-Ram type of storage can make use of the HTX? Then what's the use of memory? Taking care of I/O?) Talking about I/O, I just thought it'd be interesting to see keyboards/mice connect to CPU via HTX. (Sorry I couldn't resist)

All in all, like the article says, this roadmap seems just too broad/ambiguous/futuristic. I'm not a CPU engineer so my thinking could be totaly off, though. If so, please enlighten.

quote:At a lower level, we have a block diagram of the compute core for K8L CPUs. Again, this diagram is a bit oversimplified, but we can see a few key features of the architecture. On the FP side, the CPU is able to handle 2x128-bit floating point or SSE operations per clock. While this isn't quite as flexible as Intel's Core with its 3 SSE units, AMD's K8L will be able to handle 4 double precision floating point operations per clock. . (Current K8 chips can only do 1x128/2x64-bit SSE instructions per clock.)

I'm a little confused. IIRC Core2 can do 2x 128 bit operations, each of which can be an add or multiply, but only one of which can be a load. AMD is restricting the actual operations to just 1 add and 1 multiply, but is removing the restriction on loads? So they'll be better able to feed the vector units then Intel, but have less flexibility once they've loaded?

That doesn't make a whole lot of sense to me. I'd think if their SSE implementation was less agressive, they would not have added more load units to feed it. Has AMD confirmed that there are only 1 add and 1 mult unit? Or is this a case of Intel designing a nice backend and not providing the front end resources to keep it fed? Reply

However intel's C2 frontend(from L2 up) is far superior to AMD's. And was such since Banias. Also intel's backend(execution units) is now on par but only recently Yonah and older were inferior to AMD's brute force 3-issue backend.

AMD has kinda ingeniously hidden poor backend by IMC however for streamed(desktop) pseudo-random loads intel's huge cache structures mitigated this so they are forced to improve frontend(hard to do) and do some backend optimizations(easy) on the way. Well, they kinda knew they will have to do this since the 90's, they have just chosen to implement IMC and cater to the core itself in the next iteration.

On the 2load units - without them the maxFLOPS would be n, real one x. With them(load units are relatively simple and low power compared to FPU's) they've got MmaxFLOPS around 2n AND real achievable one(IMHO) in the 1.2x~1.5x range. Pretty good ROI for the one added load unit. Reply

quote:However intel's C2 frontend(from L2 up) is far superior to AMD's. And was such since Banias. Also intel's backend(execution units) is now on par but only recently Yonah and older were inferior to AMD's brute force 3-issue backend.

Could you explain how?

quote:On the 2load units - without them the maxFLOPS would be n, real one x.

No it wouldn't. CPUs have registers, so the number of load units has nothing to do with FLOPs. You could have just one load unit and still sustain an arbitrary number of FLOPs, provided you didn't mind using the same registers over and over again, which I suppose could be the case if you're doing an iterative approximation of a value.

quote: With them(load units are relatively simple and low power compared to FPU's) they've got MmaxFLOPS around 2n AND real achievable one(IMHO) in the 1.2x~1.5x range. Pretty good ROI for the one added load unit.

I don't think loads count as FLOPs, even if you're loading things to be used in FP operations, so having more load units doesn't increase max FLOPs. Reply

But you do have to remember that AMD implimented a propreitary coherent HT for use in SMP systems. They haven't always been open, even if their method was implimented on top of an open standard.

I do agree that general use of HyperTransport makes I/O much easier on many levels, and was a very good move for AMD. And now that they are opening up cHT, some really cool things can happen -- if the industry is ready. :-) Reply

[2001]
AMD has disclosed HyperTransport technology specifications under non-disclosure
agreement (NDA) to over 170 companies interested in building products that incorporate
this technology.
Multiple partners have signed the license agreement for HyperTransport technology,
including, among many others:

AMD is releasing the specifications to an industry-supported non-profit trade association in the fall of 2001.
The HyperTransport Consortium will manage and refine the specifications, and
promote the adoption and deployment of HyperTransport technology. It is also expected
to consist initially of a Technical Working Group and a Marketing Working Group.
Subordinate task forces will do the work of the consortium. Anticipated technical task
forces include:
Protocol Task Force
Connectivity Task Force
Graphics Task Force
Technology Task Force
Power Management Task Force
Information on joining the HyperTransport Technology Consortium can be found at
this website: http://www.hypertransport.org">http://www.hypertransport.org

San Jose, Calif., July 24, 2001 -- A coalition of high-tech industry leaders today announced the formation of the HyperTransport™ Technology Consortium, a nonprofit corporation that supports the future development and adoption of AMD's HyperTransport I/O Link specification.

Direct Connect Architecture (DCA 1.0 and 2.0 are published standards).

HTX is a published standard.

Some questions for you to ask the AMD engineers:

I'm still interested to obtain pinouts of AM2 and F1207 sockets to establish how many HT links they can support.

From 4x4 it looks like AM2 *MIGHT* support TWO HT links (one to other processor, one to the tunnel chip.

I note 4x4 is slated for 2006 launch.

Hope to see those boards real soon ;-) I assume you can populate one socket and put the other proc in there later when you have more money ;-)

I would like to see HTX appear on some 4x4 or AM2 boards but doubt it will happen.

However, on the "acceleration technology" I would like 4x4 to support the so-called "socketfiller" type where you drop in a xilinx fpga onto the socket. That would give a cheap 1cpu + 1fpga system. Hopefully acceleration is not precluded just cos its not opteron and not 1207.

Now thinking of opterons, I want to know the pinout of socket F. I want to count the HT link support built in to the socket. If its only 3 HT links that would force a socket change to do 4 links.

I think any HT compliant chipset can be used with any HT compliant part. Thats why Apple can use AMD's PCI-X bridge designed for Opteron processors on their older G5 systems. The chipset supports HT, thats all you need.

*IF* an am2 socket can indeed support TWO HT links, then the SECOND processor could use its spare link to connect to yet another I/O interface chip/chipset.

This would give opportunity for innovative 4x4 boards to add additional I/O, more pcie links, or an HTX slot.

Please can we verify:

How many links are available on AM2, and howmany links are available on FX62, and how many links are available on lower AM2 chips. I suspect the lower ones only have one HT link which would make them unsuitable for 4x4 operation. Please confirm.
Reply

There's a few of ways 4x4 could work with only the one HTT link in the socket.

1. AMD could enable a second chip-to-chip HTT link using pins/lands on top of the cpu, or some sort of edge connector, with a pcb which bridges the two.

or

2. AMD could be splitting the HTT link into 2 8-bit links. One to the chipset, one to the 2nd processor. Heck, if the chipset is smart enough the leftover 8 bit link could go back to the chipset, resulting in the equivalent bandwidth between chipset and processors as a 'standard' dual opteron rig, just less between the processors. For desktops, 8 bit is probably enough.

and of course if you're talking custom chipset, that leaves

3. The chipset has dual CHT links, one to each processor, and acts like a traditional dual FSB chipset.

Conroe is already obsolete because K8L will grind it into the dirt. Anyone who buys a Core 2 Duo this year is wasting their money because AMD's K8L is better in every way. There's no point upgrading now unless you are stuck with a rubbish last-generation netburst processor like a Northwood or Prescott, because it's clear that K8L will totally annhilate Core 2 Duo and its successors. But if Intel fanbois want to waste their money on Core 2 Duo, that's fine. A little bit of competition from Intel before AMD strike their devestating counter-attack next year will ensure AMD don't cut corners in the K8L design.

The best way to sum up the next year in CPUs is: Intel manage to gain a slight lead in the second half of 2006 and early 2007, but after that it will be AMD r0><0rs and Intel is teh su><0rs again!!!111

P.S. The above is not meant to be taken entirely seriously ;) though I do believe from what we've seen that K8L should be a bit ahead of what Intel have next year, if nothing else because of its integrated memory controller. Reply

More than two graphic cards will improve performance (over just two) only in the most insane resolutions (like 3000 by 2000 pixels). As for the use of four cores, there certainly exists - just not in the games right now. Even two cores won't bring a big boost in game performance - as of now. Who knows, maybe games ported from PS3 and XBox 360 will use them (hopefully) Reply

I can kind of figure it out for myself but I wanted to make sure - what is cache coherency? Either way, Torrenza looks very interesting and very promising. Not only will AMD be delivering good competitive performance, but has a chance to unlock a whole new path in bringing a new standard through integrated computing.

Hope you find out the goods for us Derek. Keep up the good work! Reply

To go with the shortest description, cache coherency is the catchall term for methods used to organize and inform multiple processors of cache changes in multiprocessor/multicore systems. Because both processors can work on the same data set at once, if one changes the data, the other needs to be intelligently informed about this, otherwise it will likely do something incorrectly. Reply