Description

In Visual Studio 2010 Beta 1, you were introduced to new analysis and profiling capabilities (Parallel Profiling and Performance Tools) designed to make concurrency understandable and, ultimately, debuggable. Today, with the release of
Visual Studio 2010 Beta 2, we introduce an updated and significantly more capable concurrency visualization and profiling tool which is available with other profiling features in Visual Studio 2010 Premium and Ultimate. What does it do, exactly? How does
it work? What's new?

Here, Architect Hazim Shafi, Dev Lead Sasha Dadiomov and PM Bill Colburn tell us all about the Concurrency Visualizer Profiling Tool, including a demo. So, fire up Beta 2, spin up some threads and visualize concurrency. You should profile an already-existing
application that employs concurrency and, perhaps for the first time, get to see what your concurrent code is
actually doing at run time.

Hi, sorry, but this is the least informative webcast I've seen on "going deep" so far, and I watched quite a few. And it's really surprising since you have 3 developers in the room with the camera and they keep talking marketing...

First off, all this functionality has existed for years in Intel Thread Profiler for native applications. Additionally, Thread Profiler displays transitions (a transition from thread1 to thread2 is when thread1 leaves a critical section and thread2 acquires
access to it by acquiring the synchronization primitive). From what I understood, VS 2010 shows you that the thread was idle/waiting on a sync primitive, but it does not show you which thread needed to release the mutex so the current one can advance. Another
feature, or rather a whole analysis engine, is Critical Path analysis, which, as I understood it, is also missing from the VS 2010 profiler.

From the "going deep" host I was expecting questions like:

How large is the sampling, instrumentation and tracing overhead? Are there cases where it skews application behavior, and how would you fix that?

Why sample on context switches only? Why not use time-based sampling and sample, say, every 10 ms? The time quantum for a thread is quite large (~20 ms), and from what I understood the callstack is only collected on each context switch event. So really the tool
does not tell you what was going on _inside_ the quantum, when the thread was really working. BTW, what if a thread is doing some CPU-intensive work and there's no over-subscription? The thread scheduler will keep this thread running for as long as it can without
any context switches, and therefore there will be no callstacks.

Is there support for new and cool threading features of VS 2010 - Asynchronous Agents, task-based parallelism with PPL and TPL? And why not?

And the little things, like: why do I need to scroll down to the active legend if it's so informative, important and interactive? If you know that everyone will want to look at the graphical timeline after data collection, why is that checkbox not "on" by
default? And so on...

The story about helping the codec people was a lot of fun! I might be missing something, but it sounded like the codec developers could not figure out that they could simply time their "encode" and "decode" functions running on a stream/image loaded in memory, then take the
inverse and guess the only reasonable explanation for the difference between 90 and 24 FPS. Instead they decided to substitute a GUI tool for thinking. Cool.
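
For what it's worth, the manual measurement described above might look roughly like the sketch below. `seconds_per_call` and `frames_per_second` are hypothetical helper names, and the `work` callback stands in for whatever encode or decode call is being timed on data already loaded in memory; none of this comes from the codec in the story.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock seconds per call of `work`,
// measured over `iterations` back-to-back calls.
double seconds_per_call(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}

// Hypothetical helper: frames per second is the inverse of seconds per frame.
double frames_per_second(double seconds_per_frame) {
    assert(seconds_per_frame > 0.0);
    return 1.0 / seconds_per_frame;
}
```

Comparing `frames_per_second(seconds_per_call(decode_one_frame, 100))` (with `decode_one_frame` being your own function) against the frame rate of the whole pipeline would show how much time is lost outside the decode call itself.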

Great! Looking forward to hearing more about the Profiler, it really did look like a good starting point. I do realize that half of my questions can be answered with "well, this is the ETW limitation/purpose", but still I was wondering if there's a plan
to implement EBS or TBS to provide answers to some more complex questions that arise during performance analysis. Support for TPL and PPL is something I'd very much like to see implemented.

First, let me say that I will follow up with a more detailed walkthrough of the product asap, so keep an eye out for it. Let me address some of your questions:

1. Profiling Overhead: This is of course application dependent and also platform dependent. For CPU-intensive phases of your application the overhead is negligible. Because tracing involves I/O, it can interfere with your I/O intensive applications. On
some platforms, collecting callstacks is more expensive (e.g., x64 vs. x86). That's due to the calling conventions that are being used and the information necessary to walk the stack. I don't know of any profiling tool with zero impact, but this has thus
far not been a source of feedback from customers. Do you have data to the contrary?

2. We actually sample on both context switches and at regular time intervals (1ms). So, we provide you with data about why threads blocked and where as well as data about what threads are doing when they're executing. You can get at the sample profile
data by clicking on the "Execution" legend entry or by clicking on the execution segments in the time line. When you do the latter, we show you the sample callstack and give you a visual hint to where that sample was taken. Does this help?

3. We do have some support for PLINQ, TPL, and PPL. We show markers for PLINQ queries and for some PPL and TPL parallel constructs, which let you identify the regions where they are executing so you can focus your tuning on them. Try it out! For PPL, you
have to opt into this feature by calling the Concurrency::EnableTracing()/DisableTracing() functions.
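
As a rough illustration of the PPL opt-in mentioned above, a sketch might look like the following. It assumes the functions are declared in `<concrt.h>` and that `Concurrency::parallel_for` comes from `<ppl.h>`; `square_all` is a hypothetical example function, and the non-MSVC branch is only a serial stand-in so the snippet builds elsewhere. Later toolsets may flag the tracing calls as deprecated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _MSC_VER
#include <concrt.h>  // assumed home of Concurrency::EnableTracing / DisableTracing
#include <ppl.h>     // Concurrency::parallel_for
#endif

// Hypothetical example: square the numbers 0..n-1, in parallel where PPL is available.
std::vector<int> square_all(int n) {
    assert(n >= 0);
    std::vector<int> out(static_cast<std::size_t>(n));
#ifdef _MSC_VER
    Concurrency::EnableTracing();   // opt in before the parallel region
    Concurrency::parallel_for(0, n, [&](int i) {
        out[static_cast<std::size_t>(i)] = i * i;
    });
    Concurrency::DisableTracing();  // opt out once the region is done
#else
    // Serial fallback for compilers without PPL (an assumption for portability).
    for (int i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] = i * i;
    }
#endif
    return out;
}
```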

4. I had a hard time parsing your comment about the active legend and the checkbox. Can you elaborate more? If this is a usability related question, I am very well aware of some of the warts in the product. We spent a huge amount of time improving the
tool from that perspective. Look at our CTP, Beta 1, and Beta 2 bits and you'll agree that we've come a long way. We've also made significant investments in usability studies and worked with designers. We've learned a lot during this process and hope that
we can avoid some of the pitfalls in future releases.

Finally, Intel's Thread Profiler is very different from our tool in both methodology and the diagnostic information provided, and our tool is also fully integrated with the development experience. I don't want to get into a competitive analysis here,
so choose what you find useful for your needs.

Hi, thanks so much for all the info and especially the link to the paper on MSDN, this is exactly the level of details I was looking for. Had a couple of follow up questions after I read it.

About transitions (the lines that connect a blocking segment with an execution segment on another thread): in the paper you say "When this visualization is visible, it illustrates ...". I was wondering why it would not be visible. Were you referring to the case
of an uncontended critical section, or is there more to it?

Regarding PPL support: in the case of nested parallel_for-s, or when two master threads start two parallel_for algorithms in parallel, would it be possible to recognize on the timeline which thread is executing which parallel_for exactly? Or would I only
see markers from one parallel_for, or from the inner-most parallel_for-s?

Sorry, my comment about a checkbox was on usability and it refers to the bottom checkbox on Page 1 of 3 of the Performance Wizard (Figure 8 in your paper). In the demo you showed it was not "on" by default and I was wondering why would one run the Concurrency
analysis from the VS 2010 GUI if not to see the visualized timeline.

Anyway, the Profiler really looks like a great Tool, I'm definitely giving it a try!

"Thread transitions" are depicted via the Thread Ready Connector. This is only shown when an unblocking event occurred on another thread in the same process, which is why it isn't always visible.
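
To have something small to profile for this case, a sketch like the one below sets up the situation described: one thread blocks, and a different thread in the same process performs the unblocking event. It uses standard C++11 threads for portability (an assumption; a VS 2010 era build would need Win32 or ConcRT primitives instead), and `run_handoff` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical test scenario: returns the value the waiting thread saw after it was woken.
int run_handoff() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int payload = 0;
    int observed = -1;

    // This thread blocks until another thread sets `ready`.
    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return ready; });
        observed = payload;
    });

    // This thread performs the unblocking event after a short delay.
    std::thread signaler([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        {
            std::lock_guard<std::mutex> lock(m);
            payload = 42;
            ready = true;
        }
        cv.notify_one();
    });

    waiter.join();
    signaler.join();
    assert(observed == 42);
    return observed;
}
```

Going by the explanation above, the waiter's blocked segment is where a connector to the signaler thread would be expected; an uncontended lock with no waiting thread would give the tool nothing to connect.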

For PPL support, the Concurrency Visualizer does not depict nesting, nor does it depict which threads were involved in a parallel for loop. However, markers will be shown for all parallel loop iterations.

Regarding the check boxes in the performance wizard, the lower check box isn't on by default because there is another profiling tool related to concurrency. In addition to the Concurrency Visualizer data, the Visual Studio profiler presents
contention data, which can be viewed after checking the first box.