Summary

You have completed the Finding Hotspots on the Intel® Xeon Phi™ Coprocessor tutorial. Here are some important things to remember when using the Intel® VTune™ Amplifier to analyze your code for hotspots:

Step

Tutorial Recap

Key Tutorial Take-aways

1. Prepare for analysis

You configured the build script to copy the matrix application to the card after each recompilation.

You built the target application with the Intel C++ compiler, ran it on the card via PuTTY*, and recorded a performance baseline.

You created a VTune Amplifier project and specified the script as the application to launch.

Create a VTune Amplifier project and use the Project Properties: Target tab to choose and configure your analysis target.

VTune Amplifier starts target applications from the host. It cannot start an application directly on an Intel Xeon Phi coprocessor card.

To copy the target native application to the card, you may either add the copy action to the build script or mount the host directory so that the binary is visible on the Intel MIC architecture target.

To run native applications on the Intel Xeon Phi coprocessor card, use ssh tools. See the Choosing a Target on Intel® Xeon Phi™ Coprocessor online help topic for other options.

Use the Analysis Type configuration window to choose, configure, and run the analysis. You may choose between a predefined analysis type like the Hotspots type used in this tutorial, or create a new custom analysis type and add events of your choice. For more details on the custom collection, see the Creating a New Analysis Type topic in the product online help.

See the Details section of an analysis configuration pane to get the list of processor events used for this analysis type.

2. Find hotspots

You ran the Hotspots analysis, which measures the CPU time spent in each program unit of your application, and identified the following hotspots:

Identified the function that took the most CPU time and had the highest event count and CPI Rate. This function is a good candidate for algorithm tuning.

Identified the code section that took the most CPU time to execute.

Start analyzing the performance of your application from the Summary window to explore the event-based performance metrics for the whole application. Mouse over help icons to read metric descriptions. Use the Elapsed time value as your performance baseline.

Move to the Bottom-up window and analyze the performance per function. Focus on the hotspots, that is, the functions that consumed the most CPU Time. In the initial sort, they are located at the top of the table. Use the CPI Rate metric to understand the efficiency of your code. If the metric value exceeds a threshold, the VTune Amplifier highlights it in pink as a performance issue. Mouse over a highlighted value to read the issue description and see the threshold formula.

Double-click the hotspot function in the Bottom-up pane to open its source code and identify the code line that took the most CPU Time and accumulated events.
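As a reminder of what the CPI Rate metric expresses, the sketch below computes cycles per instruction in the conventional way: clockticks divided by instructions retired. The function name `cpi_rate` and its parameters are hypothetical and chosen for illustration; the exact processor events and the threshold that VTune Amplifier uses are shown in the tool itself and are not reproduced here.

```cpp
#include <cstdint>

// Hypothetical helper: cycles per instruction (CPI) from two raw counts.
// A lower value means more instructions complete per clock cycle.
double cpi_rate(std::uint64_t clockticks, std::uint64_t instructions) {
    if (instructions == 0) {
        return 0.0;  // avoid division by zero when no instructions were counted
    }
    return static_cast<double>(clockticks) / static_cast<double>(instructions);
}
```

A function with a high CPI and a large share of CPU Time spends many cycles per instruction, which often points to stalls such as the memory access issue found in this tutorial.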

3. Eliminate hotspots

You solved the memory access issue for the sample application by interchanging the loops, which reduced the execution time. You also used the Intel compiler to enable instruction vectorization.

Consider using the Intel compiler to vectorize instructions. Explore the compiler documentation for more details.
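The loop interchange described in this step can be sketched as follows. This is a minimal illustration, assuming square matrices stored in row-major order in a flat `std::vector<double>`; the names `Matrix`, `multiply_ijk`, and `multiply_ikj` are hypothetical and are not taken from the tutorial's matrix sample.

```cpp
#include <cstddef>
#include <vector>

// Assumption: an n x n matrix stored row by row in one contiguous vector.
using Matrix = std::vector<double>;

// Original loop order (i, j, k): the innermost loop reads b down a column,
// so consecutive iterations touch addresses n elements apart.
void multiply_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Interchanged loop order (i, k, j): the innermost loop now walks b and c
// along a row, so memory is accessed with unit stride.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];  // invariant in the inner loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Both functions accumulate into `c`, so the caller is expected to zero it first, and both produce the same result. The unit-stride inner loop in the second version is generally friendlier to caches and is also the kind of loop a compiler can more readily vectorize, although whether it does so depends on the compiler and its options.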

4. Check your work

You ran the Hotspots analysis on the optimized code and compared the results before and after optimization.

Perform regular regression testing by comparing analysis results before and after optimization. From the GUI, click the Compare Results button on the VTune Amplifier toolbar. From the command line, use the amplxe-cl command.

Next step: Prepare your own application(s) for analysis. Then use the VTune Amplifier to find and eliminate hotspots.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.