IO Issues: Remote Socket Accesses

This recipe uses the General Exploration analysis of the Intel® VTune™ Amplifier to check a DPDK-based application for potential misconfiguration problems on a multi-socket system. The same approach can be used for any I/O-bound workload.

Note

The optimization technique used in this recipe relies on the Intel® Data Direct I/O Technology (Intel® DDIO), which is a feature of the Intel® Xeon® processor E5 family and Intel® Xeon® processor E7 v2 family. Intel DDIO lets an I/O device talk directly to the processor cache without accessing main memory. This feature is enabled by default and is transparent to software.

Local socket: the I/O device is attached directly to the socket where the I/O data is consumed or produced.

Remote socket: the I/O device and the core consuming or producing the data belong to different sockets. I/O data has to traverse the Intel QuickPath Interconnect (Intel QPI) to reach the consuming core.

The figures below illustrate an I/O flow in the local and remote socket topologies:

Local socket

Remote socket

DPDK pins each polling process to a specific core. It is therefore best to pair only cores and ports that belong to the same socket, which reduces latency and maximizes bandwidth by taking advantage of Intel DDIO. However, a complex system with a large number of sockets, cores, and Ethernet devices can easily end up with a configuration that uses Intel DDIO non-optimally.
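The core-to-port pairing described above can be checked before launching the application. The following is a minimal sketch, not part of the original recipe, that assumes a Linux system exposing the standard sysfs files `/sys/bus/pci/devices/<address>/numa_node` and `/sys/devices/system/node/node<N>/cpulist`, and assumes that one NUMA node corresponds to one socket. The function names and the `sysfs_root` parameter are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a node without CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def device_numa_node(pci_address, sysfs_root="/sys"):
    """Return the NUMA node a PCI device reports (-1 if the platform gives none)."""
    path = Path(sysfs_root) / "bus/pci/devices" / pci_address / "numa_node"
    return int(path.read_text().strip())


def node_cpus(node, sysfs_root="/sys"):
    """Return the set of CPU ids that belong to the given NUMA node."""
    path = Path(sysfs_root) / "devices/system/node" / f"node{node}" / "cpulist"
    return parse_cpulist(path.read_text())


def remote_cores(pci_address, polling_cores, sysfs_root="/sys"):
    """List the polling cores that are NOT on the same NUMA node as the device."""
    node = device_numa_node(pci_address, sysfs_root)
    if node < 0:
        return []  # locality unknown, nothing can be concluded
    local = node_cpus(node, sysfs_root)
    return sorted(core for core in polling_cores if core not in local)
```

For example, `remote_cores("0000:81:00.0", [2, 9, 10])` would return the subset of cores 2, 9, and 10 that sit on a different node than the device at that (illustrative) PCI address. An empty list suggests that the pairing is local, assuming the platform reports the device's node correctly.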

This recipe demonstrates how to detect remote socket accesses with the Intel® VTune™ Amplifier.

Analyze Remote Cache Usage

By default, the collected result opens in the General Exploration viewpoint. Start with the Summary window and focus on the Remote Cache metric, which is a basic indicator of a potential misconfiguration. This metric shows the percentage of clockticks spent getting data from the remote cache.

In the perfect case (local socket), the Remote Cache metric is equal to zero:

A non-zero Remote Cache metric typically signals that a core was accessing the remote LLC. For the remote socket configuration, the Remote Cache metric value is 100% and VTune Amplifier flags it as a performance issue.

For further analysis, switch to the Memory Usage viewpoint and explore the Remote Cache Access Count metric that shows how many LLC misses were serviced by the remote cache. A high value of this metric indicates that a core and an I/O device were running on different sockets.

Compare metric values for the remote socket configuration:

And for the local socket configuration:

Identify Cores Accessing Remote Cache

To find out which cores accessed the remote cache, switch to the Bottom-up window in the Memory Usage viewpoint and choose a Core grouping level for the grid:

Note that the Remote Cache column is collapsed by default. Click the ">>" control on the right side of the column name to expand its child columns. The metric hierarchy in the columns is the same as the metric hierarchy in the Summary window; in this case, it starts with the Memory Bound group.
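Once the per-core values have been read off the grid, the cores worth re-pinning can be ranked with a few lines of code. The sketch below is only an illustration: it assumes the Remote Cache Access Count values were copied by hand into a dictionary keyed by core id, it does not use any VTune Amplifier API, and the function name and `threshold` parameter are hypothetical.

```python
def cores_hitting_remote_cache(counts_by_core, threshold=0):
    """Return (core, count) pairs whose remote cache access count exceeds
    the threshold, ordered from the highest count to the lowest."""
    flagged = [
        (core, count)
        for core, count in counts_by_core.items()
        if count > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sample values for illustration only, not measured data.
sample = {0: 0, 1: 1200, 2: 35}
print(cores_hitting_remote_cache(sample))
```

Cores at the top of the resulting list are the first candidates to compare against the socket of the I/O device they poll.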