Oracle Blog

CMT musings

The problem with the world

My father-in-law says that the problem with the world is more people writing than reading. He has been saying it for decades. “Who wrote that?” I would riposte then, in my impertinent days before the Internet. We fight this battle one book at a time, or more. My father, for example, has multiple books in progress in different rooms. No, I haven't told them about blogs, and how blogs exacerbate the world's malady now that any impertinent soul like me can write and publish.

The Web started fine, with no writer surplus while pages were predominantly read-only static content. Mundane stuff did not deserve to be on the net; I was homepageless for many years. Because static content was concentrated, browsers kept fetching the same content over time, so Web caches made sense. Caching content at large aggregation points, like corporate Internet access proxies and service providers, saved bandwidth and shortened response times. If you use something often, keep it close to you. What a concept... Processors keep cache lines in fast on-chip memory, operating systems keep file system caches in system memory, and restaurants keep the most popular dishes pre-cooked, ready to heat and serve.

Except for mutual fund fees, which are damn predictable regardless of future performance, past behavior may not be a good predictor for the future, warns the prospectus. Such warnings fit the Web cache case, though. Web usage evolved to include much more dynamic content. Content is now highly customized to our identities, and cannot be days old. Auction sites, brokerage houses, and blogs demand content that is personal and timely. The impact on infrastructure is simple: it drives more bandwidth and end-point capacity so that this content can be assembled and served fresh. The Moore-Shannon match I described in a previous post gives us more endpoint and channel capacity in the servers and the plumbing that make up the net. I mean no disrespect by skipping the sophisticated distributed caching and tiered processing that also make up the net; I am exaggerating to highlight how brute force caching is becoming less useful.
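To make the point concrete, here is a minimal sketch of a brute force cache facing the two kinds of traffic. Everything in it is an assumption made for illustration: the LRU policy, the URL shapes, the 100 users, the 10 pages and the 50-entry cache are invented, not measurements of any real proxy.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a request stream through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return hits / len(requests)


# Static web (hypothetical): 100 users all fetch the same 10 pages.
static_requests = [f"/page/{p}" for user in range(100) for p in range(10)]

# Dynamic web (hypothetical): the same 10 pages, but every response is
# customized per user, so the cache key effectively includes the identity.
dynamic_requests = [f"/page/{p}?user={user}" for user in range(100) for p in range(10)]

static_rate = lru_hit_rate(static_requests, capacity=50)
dynamic_rate = lru_hit_rate(dynamic_requests, capacity=50)
print(f"static hit rate:  {static_rate:.2f}")
print(f"dynamic hit rate: {dynamic_rate:.2f}")
```

In this toy setup the shared pages are almost always served from the cache, while the personalized ones never are, because no two requests ask for the same thing. Real traffic sits somewhere in between, which is where the sophisticated caching I skipped earns its keep.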

As you might expect, brute force caching is not the best culinary choice either. We sent men to the moon but haven't made a reheated pizza that tastes the same, and in spite of decades of civil aviation, the smell of pre-brewed airline coffee is as cruel a torture as coach class legroom. Stashing pre-cooked dishes in a BIG refrigerator is brute force cuisine. I am here to advocate the Big Oven approach instead.

Incidentally, we faced the same choice when we created our CMT UltraSPARC T1 processor: allocate more transistors and power to caching, or save them for the processing resources themselves? Larger caches in processors ARE the brute force approach. In our case, just like most restaurants, we were optimizing for throughput (and particularly throughput per Watt), and there was a better solution: vertical threading. By making each of the eight cores in the UltraSPARC T1 vertically threaded, AND by having a wide memory interface (23 Gbytes/sec bandwidth), the cores can keep retiring instructions in the face of long latency memory accesses, which is exactly the same problem tackled with caches. Long latency memory accesses are like cooking steps in culinary recipes: you must wait for the oven to do its thing, and that takes a while.

Cautious customers and, as of our product launch, some competitors ask how an eight-core CMT can perform with just a 3 Mbyte L2 cache. Isn't our CMT like eight processors in a single socket? Shouldn't it have eight times the cache of traditional processors to keep them individually busy? Well, the whole point is that CMT addresses the memory latency problem through vertical threads, and this makes it much less sensitive to cache size because it is less sensitive to cache misses in the first place. Instead of a large refrigerator full of pre-cooked dishes to be heated and garnished by a single overworked cook, we installed a large oven and hired eight nimble cooks. The cooks were taught how to handle four orders at a time (just like my father does with books), and whenever one of these orders goes in the oven they switch immediately to one of the other three that is not in the oven. That is why the large 23 Gbytes/sec oven is important: it holds up to four orders for each of the eight cooks at the same time.
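The kitchen can be sketched as a toy simulation of a single cook. This is an illustration under loud assumptions, not a model of the actual UltraSPARC T1 pipeline: the function name, the one-minute ticks, the 1 minute of prep and the 3 minutes in the oven are all invented, and the oven is assumed to always have room.

```python
def cook_busy_fraction(orders, prep_minutes, oven_minutes, total_minutes):
    """Fraction of time one cook spends working rather than waiting on the oven.

    Each order alternates between `prep_minutes` of the cook's attention and
    `oven_minutes` in the oven. The cook handles one order per minute and
    switches to another order whenever the current one goes into the oven.
    """
    prep_left = [prep_minutes] * orders   # prep minutes still needed per order
    oven_done_at = [0] * orders           # minute at which each order leaves the oven
    busy = 0
    for minute in range(total_minutes):
        ready = [i for i in range(orders) if oven_done_at[i] <= minute]
        if not ready:
            continue                      # every order is in the oven: the cook idles
        i = ready[0]
        busy += 1
        prep_left[i] -= 1
        if prep_left[i] == 0:
            # Prep is done: the order goes into the oven, and the next prep
            # step for it can only start once the oven step has finished.
            oven_done_at[i] = minute + 1 + oven_minutes
            prep_left[i] = prep_minutes
    return busy / total_minutes


for n in (1, 2, 4):
    fraction = cook_busy_fraction(n, prep_minutes=1, oven_minutes=3, total_minutes=400)
    print(f"{n} order(s) per cook: busy {fraction:.0%} of the time")
```

With these made-up numbers, a cook juggling one order works a quarter of the time, and a cook juggling four never waits. That agrees with the usual back-of-the-envelope estimate min(1, orders × prep / (prep + oven)), and it is the sense in which a long oven time (a cache miss) hurts much less once there are other orders to switch to.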

We explained this approach to cautious customers through architecture and modelling data, but the most convincing step, in computers as in food, is testing and tasting. Trust the explanations but test the product. Two weeks back I heard about the “try and buy” program. Evaluate the CMT box and only buy it if you want to keep it. Not many restaurants go that far. Some won't even let you in the kitchen to look inside those BIG refrigerators. I am not sure the “try and buy” program will satisfy competitors. After all, competitors react in one of two modalities: disparage or embrace. Questioning the cache size is an example of the disparage modality. The embrace modality was used by another competitor, first claiming they already had multi-core vertically threaded network processors, and more recently announcing CMT plans themselves.

Network processors (aka NPUs) are indeed vertically threaded processors. Unlike NPUs, the UltraSPARC T1 is a vertically threaded general purpose processor, with all the software development advantages of standard tools and languages, full memory protection, virtual memory, cache coherency across cores (at L1 and of course the shared L2), arbitrarily large program memory, and no collaborative thread yielding constraints. The UltraSPARC T1 is a good foundation for I/O and network facing workloads without the programming quirks of network processors. Competitors arguing that they already have CMT technology is akin to comedian Benny Hill's reaction when told about neutron bombs that destroy people without damaging their buildings: “Oh, we already have them in England, we call them mortgages.” That is how similar they are...

As for embracing CMT as their future direction, that would be just flattering.