11 Comments

I can't help but wonder what advantages this would have over Intel's existing open source TBB. Threading Building Blocks 2.0 seems to be a pretty robust runtime library that does the hard work on some of the things that are currently most difficult to manage...

Plus Intel's companion tools are awesome. Expensive, but pretty nice.
TBB is also open source and multiplatform friendly.
I'm also curious to learn more about real world TBB experiences.

Anyway, it's good to hear more work is being done on this stuff and different approaches will always help.

We have a long way to go, because as it is, quad cores and above will not really be leveraged the way everyone assumes they will be. It will take more than market saturation of multicores for them to be used efficiently. Unless an application is manipulating data streams with an easily splittable workload (encoding, rendering, compression, etc.), you really won't see the kind of workload granularity needed to truly leverage multicores. Without some advances, cores beyond two will offer diminishing, close to negligible, gains in mainstream applications for quite a while.
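One common way to put numbers on that diminishing-returns argument is Amdahl's law. The comment does not name it, so treat this as an outside formalization rather than the commenter's own model. The sketch below assumes a fixed fraction of the runtime can be spread perfectly across cores, and the 50% and 95% figures are made-up illustrations, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction is the share of single-core runtime that can be
    spread across cores; the rest is assumed to stay strictly serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Both fractions are illustrative assumptions, not measured values.
    for label, fraction in (("half parallel", 0.50), ("stream-like", 0.95)):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(fraction, n):.2f}x" for n in (1, 2, 4, 8)
        )
        print(f"{label:14s} {row}")
```

With the assumed 50% parallel fraction, two cores give about 1.33x, four give 1.60x, and eight give about 1.78x. With the assumed 95% fraction (closer to an encoding or rendering job), the same core counts give about 1.90x, 3.48x, and 5.93x.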

I hope to hear more about advances on this in future articles, because as it is, quad looks like somewhat of a wall for any real performance benefit. Anything above that will really just serve as a good heater unless it's for a very specific application. Reply

LWP isn't intended to compete with TBB; rather, it augments it (at least to the extent that TBB is beneficial on an AMD chip). TBB is compiler and library help, an essential part of extracting maximum performance, but it doesn't include anything for application profiling. LWP is the final link there: once TBB has taken you as far as it can, you break out the profilers and start looking at what your code is doing that could be causing the remaining performance bottlenecks. Reply

I think one of the points he is making is that this is not really a methodology or instruction set change used for multithreading, but rather a HW-based profiler that is only useful during product development.

Profiling is not new, so while AMD is proposing some unique instructions to get a real-time peek at the architectural state (profiling), it does not directly speed up multithreading through some new or novel algorithm. Basically, what AMD is proposing is already done at the software level to an extent. Reply

Yeah, while profiling is necessary, especially for multithreaded apps, for optimizing and finding overhead, stalls, IPC issues, synchronization contention, etc., I'm far more interested in the core issues of leveraging cores for better parallel execution. Rather than a better topical ointment, we need to address the core cause. Initially I had thought AMD was doing more than just profiling with the extra architecture, but that was due to my silly skimming.

I also mentioned TBB because I think it may be worthy of an article someday; it is an intriguing route to the core problems developers will face in multithreading. I'm curious how well it works, based on engineers' feedback, something AnandTech would have access to. Making a thread and throwing it to another core today is all well and good for removing thread contention for primary threads and giving other threads more CPU budget, but lower-level multithreading and parallel advantages are currently limited for most types of apps due to inherent limitations. That's the real core issue in multithreading: improving scaling and fully leveraging several cores.

I'm interested to see what AMD's methods may bring in improved performance, and maybe ways to gather a few more tidbits of data. Much still depends on the software element: how it will present the data and how robust it will be during development. Who knows, maybe they are putting the horse out first and bringing in the cart next, which would be the right order of things. Having a HW profiler in place first would make it easier to produce and test a HW-based multithreading optimization approach, which would be stellar someday down the road. Reply

Maybe LWP can lessen overhead with HW and even be more precise, as you say. There is a good bit you can retrieve in software, though, so it will be interesting to see what I am missing; maybe some deeper-level CPU cache usage metrics may be beneficial. Time will tell. Reply

Dude, do you even know what software profiling *is*? I have no idea what TBB *is*, but I can tell you with 99.9% certainty it is not even remotely related (except perhaps that it is a set of instructions).

Direct quote from Wikipedia:

"A profiler is a performance analysis tool that measures the behavior of a program as it runs, particularly the frequency and duration of function calls. The output is a stream of recorded events (a trace) or a statistical summary of the events observed (a profile). Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, operating system hooks, and performance counters. The usage of profilers is called out in the performance engineering process."

Thank you for the definition. I just finished saying I have used Intel's profilers to profile applications, so I may know.

As to TBB, it's a runtime tool for making threaded applications more efficient and easier to produce, not a profiler. I initially thought AMD's approach was going to be more than just profiling, but that's what I get for skimming articles. Reply
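To make the distinction concrete, here is a rough sketch of the kind of job a threading runtime takes on: cutting an easily splittable workload (compression, as mentioned earlier in the thread) into chunks and handing them to a pool of workers. This is not TBB and uses none of its API; it is a Python standard-library stand-in for the same pattern. The chunk size and worker count are arbitrary assumptions, and any real speedup depends on `zlib` releasing the interpreter lock while it compresses, which as far as I know CPython does.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split(data: bytes, chunk_size: int) -> list:
    """Cut data into independent chunks of at most chunk_size bytes."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_chunks(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> list:
    """Compress each chunk on a worker thread, preserving chunk order."""
    chunks = split(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the output can be
        # decompressed and concatenated to recover the original bytes.
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(blobs: list) -> bytes:
    """Reverse compress_chunks by decompressing and joining in order."""
    return b"".join(zlib.decompress(blob) for blob in blobs)


if __name__ == "__main__":
    payload = b"multicore " * 500_000  # illustrative data only
    blobs = compress_chunks(payload)
    assert decompress_chunks(blobs) == payload
    print(f"{len(blobs)} chunks, {sum(map(len, blobs))} compressed bytes")
```

Compressing chunks independently usually costs some compression ratio compared with one pass over the whole input. That trade-off is part of why this kind of workload splits easily while most mainstream application logic does not.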