Data Driven Threading Design

Intel® Advisor XE is a threading prototyping tool for C, C++, C# and Fortran software architects. Quickly model and compare the performance scaling of different threading designs without the cost and disruption of implementation. Find and eliminate data sharing issues during design when they are less expensive to fix. Model the performance impact of any added synchronization and project the scaling on systems with larger core counts.

Intel Advisor XE's suitability analysis gives you a performance estimate before you invest significant effort in implementation. Implement only the options that have a high return on investment.

Profiling identifies where an application will benefit most from parallelism

Compare the effort and benefit of alternatives before you invest in implementation

Quotes:

"Intel® Advisor XE has been extremely helpful in identifying the best pieces of code for parallelization. We can save several days of manual work by targeting the right loops. At the same time, we can use Advisor to find potential thread safety issues to help avoid problems later on."Carlos Boneti, HPC software engineer, Schlumberger

“Intel® Advisor XE has allowed us to quickly prototype ideas for parallelism, saving developer time and effort, and has already been used to highlight subtle parallel correctness issues in complex multi-file, multi-function algorithms which may eventually have caused lengthy debugging activities later in our code development. We plan to continue to use the tool as we extend our research into more complex algorithms and begin to take further advantage of the thread parallelism and performance of the Intel Xeon and Xeon Phi product family.”Simon Hammond, Senior Member of Technical Staff, Scalable Computer Architectures, Sandia National Laboratories

“It's powerful to have the performance and correctness analysis combined together in Intel Advisor XE. Experimenting with different parallelization techniques is much faster and more convenient.”Arthur Yuldashev, Senior Lecturer, Ufa State Aviation Technical University

Workflow For Effective Design

Intel Advisor XE's workflow panel guides you through the steps to successfully add effective threading to your application. Implementation is delayed until step 5. This lets you use the active code base and continue to release product updates during the design phase (steps 1-4).

1. Survey – Search for Parallel Sites

Start by measuring your app to see where it will benefit from parallelism.

2. Annotate Your Source

3. Check Performance - Compare Alternative Designs

Check the performance and scalability of the design you sketched out. Is it as fast as you thought? How will it scale on systems with more processors? Intel Advisor XE projects scalability and helps you evaluate alternative threading designs. Understand potential load imbalance, lock contention and the impact of runtime overhead.

4. Check Correctness

Do you have any data sharing problems that can lead to deadlocks or races? Find and fix them now before you implement the parallelism.

Intel Advisor XE gives you a list of errors and shows you a snippet of the code at all the related code locations, with click-through navigation to the actual source location.

5. Implement - Choose A Parallel Programming Model

Intel Advisor XE gives you a choice of open and industry standard parallel programming models. OpenMP is compatible with legacy code.

Intel® Threading Building Blocks (Intel® TBB) has a rich set of abstractions. Intel Cilk Plus is simple. Microsoft TPL* is designed for C#.

Parallel Programming Models

C, C++

Intel® TBB

Intel® Cilk™ Plus

OpenMP

C#

Microsoft TPL*

Fortran

OpenMP

Simplify

Ideally Intel Advisor XE would automatically add threading to your application and make it run faster. Unfortunately, the technology for doing this well is beyond the current state of the art. Instead, Intel Advisor XE is designed for a software architect or developer. It automates the analyses required and gives the developer the information needed to productively add effective threading to an existing application.

Explore Alternatives Before Implementation

Looping through this design process lets you quickly explore alternatives and make tradeoffs. Intel Advisor XE helps you make better design decisions by accurately projecting performance and identifying synchronization errors. You get faster parallel performance while avoiding costly design errors.

Parallelize Without Adding Schedule Risk

The delayed implementation workflow means that the threading design, performance projection and error analysis can proceed in parallel with normal release development. Annotations describe your design to Intel Advisor XE letting it give you better performance and correctness information, but make no changes to your compiled code. You can continue to release product updates during the design of your parallelism. All your test cases continue to work. Verify that your application is stable and correct before you implement parallelism.

Cut and Paste Templates

Once the design is final, Intel Advisor XE provides cut and paste templates to implement threading using a one of several parallel programming models.

Productive Parallel Programming Models

The Intel® Parallel Studio XE products include a selection of productive parallel models such as Intel® TBB and Intel® Cilk™ Plus. Implementing parallelism at a higher level yields scalable and reliable parallelism with fewer lines of code. Task-based algorithms, containers and synchronization primitives simplify parallel application development. The task scheduler for Intel® TBB and Intel® Cilk™ Plus dynamically maps tasks to threads to load balance, preserve cache locality and increase performance. This lets you develop faster and deliver more performance with less code to maintain.

Technical Specifications

What’s New

Iteration Space Modeling & Info Zone
Move the sliders to see what happens when you change the number and duration of tasks. It shows a high-level break-down of parallelism performance losses: imbalance, contention and parallel runtime overheads.

More Tech Articles

Documentation & Tutorials:

Supplemental Documentation

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.

I am running advixe-cl for the first time. If I just type
advixe-cl ... where ... are the arguments
the program hangs and nothing comes out. If I type
sudo advixe-cl ...
It works, but only when I put in the sudo password.
If I type su and then the superuser password it also works.
Now if I type advixe-gui as a normal user only a small screen title Intel Threading Assistant opens
, but nothing else happens. It completely open up if I type sudo advixe-gui and the the sudo password or
if I type su and the superuer password.
Now what is going on here?
I am attaching a snapshot of the files after a ls -al command. They all
all root root. So must i be root to run advixe-cl,-gui?
I installed eval version as root; it was recommended.
Any help appreciated. Thanks in advance.
Newport_j

I am running advixe-cl for the first time. If I just type
advixe-cl ... where ... are the arguments
the program hangs and nothing comes out. If I type
sudo advixe-cl ...
It works, but only when I put in the sudo password.
If I type su and then the superuser password it also works.
Now if I type advixe-gui as a normal user only a small screen title Intel Threading Assistant opens
, but nothing else happens. It completely open up if I type sudo advixe-gui and the the sudo password or
if I type su and the superuer password.
Now what is going on here?
I am attaching a snapshot of the files after a ls -al command. They all
all root root. So must i be root to run advixe-cl,-gui?
I installed eval version as root; it was recommended.
Any help appreciated. Thanks in advance.
Newport_j

When I type echo $PATH on my 64 bit Centos 6.3 system, I get
/home/james/advisor_xe_2013/bin64:...
which means that I am using the Parallel Advisor 64 bit binaries.
This happend when I typed:
source advixe-vars.sh
It is undesrtandable because my version of linux is 64 bit Centos 6.3.
Sometimes I may want to analyze a 32 bit executable so do i have to change the PATH to
32 bit binaries then? Can I still use what is shown in the PATH as the 64 bit binaries?
Any help appreciated. tahnks in advance.
Newport_j

i just installed an eval copy oof intel Parallel Advisor and I have just one question
Does it also work with gnu gcc compiler or just the Intel icc compiler?
Can it work with other compilers created by other software companies?
Any help appreciated. Thanks in advance.
Newport_j

Intel® Advisor XE Frequently Asked Questions

Choose a topic:

What is Intel® Advisor XE?

Intel Advisor XE is a set of tools that helps software architects and lead designers add effective parallelism to an application in order to improve performance on multi-core systems. It lets you quickly prototype a parallel design, project its scalability and identify synchronization issues.

Does Intel Advisor XE automatically parallelize my code?

No. However, it does help you get to the point where you can parallelize code confidently and effectively.

What are some key things I can learn about my program using Intel Advisor XE workflow?

Intel Advisor XE helps you identify portions of your code which may be good candidates for parallelization. It can also help you determine when it is not a good idea to parallelize a piece of code. A number of our users report this as one of the biggest benefits. A good forecast of where investment in parallelism will pay off lets you focus development on places with the most impact. Of course you could approximate some of this manually using a profiler, but Intel Advisor XE projects scalability across a large number of core counts, models the cost of synchronization and finds synchronization errors. You get a better forecast with less work.

Intel Advisor XE has several unique advantages because of its delayed implementation workflow:

Easier debugging. It works by annotating and modeling the "serial" code. Any change you make with Intel Advisor XE does not affect the correctness of your application. It will not crash due to "bugs" in your parallel implementation. If you use OpenMP directly, it's possible that "bugs" in your implementation may cause your program to behave incorrectly, generate the wrong answers, not converge and maybe even crash. Using a parallel break-point debugger on incorrect parallel programs is not an easy task. Many bugs are "random" in nature, and even when detected usually show you the symptom of the problem not the root cause.

Parallelize without adding schedule risk. Annotations describe your design to Intel Advisor XE letting it give you performance and correctness information, but make no changes to your compiled code. You can continue to release product updates during the design of your parallelism. All your test cases continue to work. Verify that your application is stable and correct before you go to the final step – implementing parallelism.

Performance estimates with less work. Intel Advisor XE minimizes the amount of work you need to do to get the first estimates of potential performance improvements. Due to Amdahl's law, we know that we must focus our parallelism on the parts of the application in which the most CPU time us used. Intel Advisor XE lets you quickly get this answer, whereas with OpenMP, your program may not run correctly, or may not even run at all, so you would need to spend significant time debugging just to get to the answer that you are starting at the wrong place in your application.

Compare optimizations. Intel Advisor XE can tell you how to improve your parallel model, based on sensitivity analysis of the model to 5 different kinds of parallel overhead. Intel Advisor XE by default gives you an evaluation of how your application may run under a default programming model. It also calculates how it would run under 5 different kinds of optimizations. If any of these optimizations give a substantially better answer, you know which aspect of your parallel model to tune, in order to try and achieve the benefits of the optimization.

Workflow guides people new to parallel design. You may be an expert, but others on your team may not. Intel Advisor XE gives you the opportunity to train your entire team on effective parallelization strategies so not only are you more productive (others can use Intel Advisor XE to answer their questions instead of asking you), but your entire team becomes more productive too.

Analyze scalability. Intel Advisor XE annotations allow you to see the performance of you proposed parallel solution for various core counts via the Suitability analysis. If you have the luxury of being in the design phase of adding parallelism, you can make sure your solution will scale to more cores when they are available. If you go straight to parallelism via OpenMP, TBB, etc. you will only see the performance on the hardware available but won’t have an indication of how your parallel implementation will scale.

Can Intel Advisor XE be used on applications for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)?

Yes. Intel Advisor XE currently does not directly support applications compiled for Intel® MIC Architecture execution, but it can be a powerful tool in the process of creating these applications. The first step in creating an application for Intel® MIC Architecture is to identify the potential parallelism, and to validate that this parallelism is expressible on the host platform. This is exactly the problem for which Intel Advisor XE was created.

Can you run Intel Advisor XE on an MPI application?

Yes. Intel Advisor XE currently does not understand MPI parallelism, but can be used to add thread level parallelism to an MPI application.

Intel Advisor XE is designed to analyze serial code to help you understand how to optimize it through parallelization. Each rank of an MPI program is really a serial application. Therefore by using the Intel Advisor XE methodology on this code you can determine if additional parallel opportunity is present which you might be able to exploit. Intel Advisor XE does not analyze your MPI API usage, it basically ignores it. To use Intel Advisor XE on an MPI program you annotate the sequential code between MPI API calls the same as you would any other serial code section. Then you launch the advixe-cl command-line tool via the mpirun command. Each MPI rank on which the advixe-cl command is applied will generate a separate output result. You can manually copy this output result back to your host machine and then use advixe-gui to open and view this result.

Intel Advisor XE is meant as the first tool to help you understand the structure and behavior of your serial application as you work to refactor it into parallel form. This process is composed of multiple steps. Each of these steps is necessary to accomplish in order to have a successful result. Intel Advisor XE helps you with the first of these steps, and then you transition to using Intel VTune Amplifier XE, and Intel Inspector XE for the final steps.

In a practical parallel program you must identify sufficient computation to execute in parallel to achieve you performance goals. If you do not do this, then additional restrictions (such as problems caused by limited memory bandwidth, or limited cache size) are mostly irrelevant. Intel Advisor XE evaluates the theoretical upper bound of available parallelism taking only code partitioning into independent tasks into account. With this partitioning, Intel Advisor XE allows you to understand if this exposes sufficient parallelism to achieve reasonable performance, and helps you understand the sensitivity of your proposed parallel model to scheduling overhead found in parallel frameworks, and helps you understand if you need to fix any data declaration, or synchronization issues which would otherwise cause your application to generate incorrect results when run in parallel.

Once you have a model of a parallel execution as expressed via annotations that is both suitable for possible performance, and for correctness, then you can express this via the parallel framework. With the code refactored to use a parallel framework, you no longer have a serial application, and you are done using Intel Advisor XE. At this point you evaluate your newly written parallel application using Intel VTune Amplifier XE to see how it maps to actual parallel hardware (including memory bandwidth and cache size effects). And you can use Intel Inspector XE to evaluate correctness and synchronization issues that may not have been present in a model which was strictly defined by relaxed-sequential execution.

How does Intel Advisor XE deal with already parallel applications? How do OpenMP/MPI co-exist with Intel Advisor XE annotations?

Intel Advisor XE models serial code. In any application serial or parallel, almost all of the code executed is serial. For example, in an application that has multiple threads, each thread is actually equivalent to a serial program. Intel Advisor XE separates each of these "concurrent" threads of execution into a separate serial analysis.

In practice, it is usually most effective and understandable to disable any explicit parallelization in your application before using Intel Advisor XE. This is usually done by setting the number of threads to be 1, such that any existing parallelized code is mapped to a single thread of execution.

If this can be done, then you really do have a serial program, and then your task in using Intel Advisor XE is in trying to identify a new opportunity which has not already been parallelized. This might be a serial code region between two already parallelized regions. Or it might be hierarchical parallelism, where you are adding another level of parallelism to an already parallelized region. For example, you might have parallelized an inner-loop in a function, but you find that it is also productive to parallelize an outer-loop to increase the resulting scalability of the function.

How does Intel Advisor XE model an already parallel region?

Intel Advisor XE does not model already parallel regions. It ignores the fact the program may already be parallel, and instead treats each sequential thread of execution separately.

Does Intel Advisor XE model a Parallel Framework accurately?

No, but it does use the information to estimate scheduling and synchronization overhead. Intel Advisor XE's job is not to perfectly predict the performance of your final parallel program after you finish the refactoring. Intel Advisor XE's purpose is to help you make decisions to guide you along the path to create an effective parallel program.

In particular Intel Advisor XE is always using heuristics and simplifying assumptions to make what it does practical. For Suitability, it approximates the tasking structure to only focus on the critical (usually large tasks), and it uses statistics to model the smaller tasks. It uses a theoretical "ideal" model of a parallel machine in order to do a quick evaluation of your program to tell you if you are on the right track.

Intel Advisor XE lets you choose a Parallel Framework because:

This lets you document your expectation of what Parallel Framework you will ultimately use.

We use the Parallel Framework to estimate scheduling and synchronization overhead.

Different parallel frameworks typically have different costs due to the way they are implemented. The task dispatch cost in TBB is slightly higher than in Cilk because TBB cannot use special code-generate strategies that are possible with a language based approach such as Cilk. System API's (such as CreateThread, or POSIX routines like mutexes) may be much higher overhead than specially optimized routines in a parallel run-time.

The main purpose of the Suitability view is to help you understand the sensitivity of your chosen parallelism model to various overhead. You see these results in two different ways:

When you use the check-marks (on/off), you either consider or do not consider 5 different types of overhead which apply to your parallel model.

When you switch to a different parallel framework, you choose a new set of overheads to apply to the models.

Quite often, if your parallelism model is not sensitive to overhead, then you will see almost identical results from each of the framework choices.

However, in the cases where it shows a big difference, this is either telling you that you should use a parallel framework with more efficient task-dispatch and synchronization, or you should continue to work to refine your parallelism model so that it is not as sensitive to the overhead of the parallel framework.

Why does Intel Advisor XE use annotations as its modeling language instead of parallel framework syntax?

An advantage of Intel Advisor XE's modeling language is that it is a simplification over what would be required in a parallel framework to express a fully correct parallel application. You don't need to get it all right at the same time, you can use the annotations incrementally, starting with an incomplete, incorrect annotated parallel model, and over time refine it to be more accurate and complete. Intel Advisor XE also does not require extensive refactoring that might be necessary with Threading Building Blocks. Deferring this refactoring until you are confident you have a good solution proposed allows you to explore many different alternative solutions before picking the right one for your application.

Annotations describe your design to Intel Advisor XE, but make no changes to your compiled code. You can continue to release product updates during the design of your parallelism. All your test cases continue to work. Verify that your application is stable and correct before you go to the final step – implementing parallelism.

No. This would be a good research topic in automatic code refactoring. Our focus with Intel Advisor XE is to automate the analysis that will help a developer productively add parallelism. We’d like to do it all “automatically”, but there are still some things people do better.

Will Intel Advisor XE find all threading errors in my program when using correctness analysis?

Though it is likely that Intel Advisor XE can find any memory or threading errors present in the code paths exercised in some applications at the highest levels of analysis, there is no guarantee that Intel Advisor XE will find all threading errors in your program.

Do I have to recompile and rebuild my programs to use Intel Advisor XE?

As part of the Intel Advisor XE workflow, you will need to recompile and rebuild your programs after adding annotations and prior to running Suitability (a release build is recommended) and prior to running Correctness (a debug build is recommended). We recommend use of debug binaries or release binaries with debug information, so that error reports can be attributed to source code locations.

No, current Intel Advisor XE does not support process attach or detach. In order to find multithreaded bugs in your process, Intel Advisor XE has to monitor synchronization events and related annotations from the start of the program.

Of course, you can always stop Intel Advisor XE manually at any time after analysis starts.

Intel® Advisor XE

Getting Started?

Click the Learn tab for guides and links that will quickly get you started.