Intel® AVX2, RTM, BMI1, and BMI2 instructions, introduced on the 4th Generation Intel® Core™ processor family formerly known as Haswell microarchitecture.

The ADOX/ADCX instructions, introduced on the 5th Generation Intel® Core™ processor family formerly known as Broadwell microarchitecture.

Support for Hardware Lock Elision introduced on the 4th Generation Intel® Core™ processor family formerly known as Haswell microarchitecture.

Support for Restricted Transactional Memory introduced on the 4th Generation Intel® Core™ processor family formerly known as Haswell microarchitecture.

Support for the Intel® Secure Hash Algorithm (Intel® SHA) extensions, present on the Intel® Atom™ processor formerly known as Goldmont microarchitecture.

Related useful materials:

More information about the Intel® SHA extension is available here and includes a sample test application.

Intel is releasing this Intel® SDE so that developers can gain familiarity with our upcoming instruction set extensions. Intel® SDE can help ensure software is ready to take advantage of the opportunities created by these new instructions in our processors. We hope that developers will explore the new instructions using the currently available compilers and assemblers.

Intel SDE is built on the Pin dynamic binary instrumentation system and the XED encoder-decoder. Pin controls the execution of an application. As it builds traces for execution, Pin examines each static instruction in the application approximately once. This process is called instrumentation. During it, Pin asks Intel® SDE, for each instruction encountered, whether that instruction should be emulated. If the instruction is to be emulated, Intel® SDE tells Pin to skip over that instruction and branch to the appropriate emulation routine instead. It also tells Pin how to invoke that emulation function, which arguments to pass, and so on.
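The instrument-once, dispatch-many flow described above can be modeled in a few lines. This is a toy Python sketch, not Pin's or Intel SDE's API. The `ToyInstrumenter` class, the `(address, iclass)` trace format, and the rule "emulate iclasses the host does not run natively" are all illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Hypothetical trace record: (static instruction address, XED-style iclass name).
Instr = Tuple[int, str]
Emulator = Callable[[dict], None]


class ToyInstrumenter:
    """Toy model: decide native vs. emulated once per static instruction."""

    def __init__(self, native_iclasses: Set[str], emulators: Dict[str, Emulator]):
        self.native = set(native_iclasses)  # iclasses the host CPU runs natively
        self.emulators = emulators          # iclass -> emulation routine
        self.plan: Dict[int, Optional[Emulator]] = {}  # address -> routine, or None for native
        self.instrument_calls = 0
        self.log = []

    def _instrument(self, addr: int, iclass: str) -> None:
        # "Instrumentation" time: runs once per static instruction.
        self.instrument_calls += 1
        if iclass in self.native:
            self.plan[addr] = None
        elif iclass in self.emulators:
            self.plan[addr] = self.emulators[iclass]
        else:
            raise ValueError(f"no emulation routine for {iclass}")

    def run(self, trace: Iterable[Instr], state: dict) -> None:
        # "Analysis" time: runs for every dynamic execution.
        for addr, iclass in trace:
            if addr not in self.plan:
                self._instrument(addr, iclass)
            routine = self.plan[addr]
            if routine is None:
                self.log.append(("native", iclass))
            else:
                routine(state)  # branch to the emulation routine instead
                self.log.append(("emulated", iclass))
```

In this model, a loop of three static instructions executed three times triggers three instrumentation decisions but nine dispatches.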

Intel SDE queries CPUID to figure out what features to emulate. It also modifies the output of CPUID so that compiled applications that check for the emulated features are told that those features exist.
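CPUID reports features as bits, so "modifying the output of CPUID" amounts to setting extra bits. The Python sketch below decodes CPUID leaf 7 (subleaf 0) EBX for the features named on this page and ORs in the emulated ones. The bit positions come from the Intel SDM. The `spoof_leaf7_ebx` helper is a hypothetical illustration of the idea, not Intel SDE's implementation.

```python
# CPUID.(EAX=07H, ECX=0):EBX feature bits for the extensions discussed on this page.
LEAF7_EBX_BITS = {
    "BMI1": 3,
    "HLE": 4,
    "AVX2": 5,
    "BMI2": 8,
    "RTM": 11,
    "ADX": 19,  # ADCX/ADOX
    "SHA": 29,
}


def decode_leaf7_ebx(ebx: int) -> set:
    """Return the names of the features whose bits are set in EBX."""
    return {name for name, bit in LEAF7_EBX_BITS.items() if (ebx >> bit) & 1}


def spoof_leaf7_ebx(host_ebx: int, emulated: set) -> int:
    """Hypothetical helper: report emulated features as present on top of the host's bits."""
    ebx = host_ebx
    for name in emulated:
        ebx |= 1 << LEAF7_EBX_BITS[name]
    return ebx
```

For example, consider a host that reports only BMI1 (`0x8`), with AVX2 and BMI2 emulated. In this model, the application would see `0x128`.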

Intel® SDE comes with several useful emulator-enabled Pin tools and the XED disassembler:

The basic emulator

The mix histogramming tool: This Pin tool can compute histograms by any of: dynamic instructions executed, instruction length, instruction category, and ISA extension grouping. This tool can also display the top N most frequently executed basic blocks and disassemble them.
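Conceptually, the mix tool's histograms are counters keyed by different properties of each dynamically executed instruction. The Python sketch below assumes a hypothetical per-instruction record (`Exec`). It weights each basic block by the number of dynamic instructions attributed to it. Intel SDE's actual record format and its definition of block weight may differ.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Exec(NamedTuple):
    """One dynamically executed instruction (hypothetical record format)."""
    block: int    # address of the enclosing basic block
    iclass: str   # XED iclass, e.g. "MOV"
    length: int   # encoded length in bytes
    isa_ext: str  # ISA extension, e.g. "BASE" or "AVX2"


def mix(trace: Iterable[Exec], top_n: int = 20):
    by_iclass, by_length, by_ext, by_block = Counter(), Counter(), Counter(), Counter()
    for e in trace:
        by_iclass[e.iclass] += 1
        by_length[e.length] += 1
        by_ext["*isa-ext-" + e.isa_ext] += 1  # star-prefixed meta-category row
        by_block[e.block] += 1
    top_blocks: List[Tuple[int, int]] = by_block.most_common(top_n)
    return by_iclass, by_length, by_ext, top_blocks
```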

The debugtrace ASCII tracing tool: This versatile tool is useful for observing the dynamic behavior of your code. It prints the instructions executed, and also the registers written, memory read and written, etc.

The footprint tool: This simple tool counts how many unique 64-byte chunks of data were referenced during the execution of the program.
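The footprint computation reduces to counting distinct 64-byte-aligned chunks. Below is a minimal Python sketch. It assumes each access is given as an `(address, size)` pair. It also assumes that an access straddling a chunk boundary counts every chunk it touches, which may not match how the tool treats split accesses.

```python
from typing import Iterable, Tuple


def footprint(accesses: Iterable[Tuple[int, int]], chunk: int = 64) -> int:
    """Count distinct chunk-aligned blocks touched by (address, size) accesses."""
    chunks = set()
    for addr, size in accesses:
        first = addr // chunk
        last = (addr + size - 1) // chunk  # last byte of the access
        chunks.update(range(first, last + 1))
    return len(chunks)
```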

The XED command-line tool, which can disassemble PE/COFF or ELF binary executables.

Installation

Download and unpack the appropriate kit for your platform. Set your PATH variable to point to that directory. You can also refer to the tools in the kit using full or relative paths. Do not rearrange the files or subdirectories in the unpacked kit. If you want to move the kit directory, move everything.

Windows*: If you are using WinZip, it puts the proper permissions on the unpacked files. However, if you are using Cygwin's tar command to unpack the Windows* kit, you must execute "chmod -R +x path-to-kit" on the unpacked kit (where "path-to-kit" is the unpacked kit directory name).

Mac*: Intel SDE uses the Mach taskport APIs. By default, using these APIs requires user authentication once per GUI session. To allow Pin/SDE to run without this authentication, you need to disable it by configuring the machine to auto-confirm takeover of the process, as described below.

Important options are the short and long help messages. To see the short help message:

path-to-kit/sde -help

And to see the very long help message:

path-to-kit/sde -long-help

In the help messages, the command line options are often displayed using underscores between words, but dashes may be used instead of underscores. Often the Intel® SDE help messages and this web page will refer to command line options as "knobs" for historical reasons. The short help message contains some top level analysis tools knobs as well as the list of supported CPUs.

Emulate Everything Mode

Windows*: The Windows* kit provides a file called sde-win.bat that runs a cmd.exe window under the control of Intel® SDE. You can make a shortcut to it and place that shortcut on your desktop. Everything run from that window runs under the control of Intel® SDE, so you may experience a slowdown even when you are not emulating anything. All it really does is:

path-to-kit/sde -- cmd.exe

OS X* or Linux*: You can run your favorite shell under the control of Intel® SDE:

path-to-kit/sde -- /bin/tcsh

And everything you run from there will be run under the control of Intel® SDE.

Running the Histogram Tool

Use the -mix option to generate the instruction mix histograms by opcode (XED iclass, the default) or by instruction form (iform):

path-to-kit/sde -mix -- user-application [args]

As of version 4.29, the instruction length and instruction category histograms are always included.

Notes:

The ISA extension histogram is also always computed and printed as star-prefixed rows in the histograms. ISA extensions are groups such as BASE, X87, MMX, SSE, SSE2, and SSE3. This is useful for seeing which instruction set extensions your application uses.

The dynamic statistics are recorded and emitted several ways: (1) per-thread, (2) per function per thread, and (3) summed for the entire run. Instruction counts by function are also emitted if symbols are found for the application.
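The three levels of aggregation can be sketched as counters keyed at different granularities. In this hypothetical Python sketch, each record is a `(thread_id, function, iclass)` tuple, which is not Intel SDE's actual format. The whole-run total equals the sum of the per-thread counts.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple

Record = Tuple[int, str, str]  # (thread id, function name, iclass) -- hypothetical


def aggregate(records: Iterable[Record]):
    """Count iclasses per thread, per function per thread, and for the whole run."""
    per_thread: DefaultDict[int, Counter] = defaultdict(Counter)
    per_func: DefaultDict[Tuple[int, str], Counter] = defaultdict(Counter)
    total: Counter = Counter()
    for tid, func, iclass in records:
        per_thread[tid][iclass] += 1
        per_func[(tid, func)][iclass] += 1
        total[iclass] += 1
    return per_thread, per_func, total
```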

The output is written to a sde-mix-out.txt file in the current directory. The output file name can be changed using the -omix option:

path-to-kit/sde -mix -omix foo.out -- user-application [args]

The top 20 basic blocks are always printed in the output with their execution weights.

"-top_blocks N" changes the number of blocks printed from 20 to the N you specify.

Iforms: "Iform" is the XED term for variants of instructions. In a simple world they would be things like reg/reg or reg/mem, but things are more complicated in general. The iform names come from XED. Consider them experimental and subject to change. To see histograms by the more detailed iforms, use the "-iform" command line option.

Mix Accounting

The rows in the mix output histograms come in two flavors. The rows that begin with "*" are meta-categories which sum up the data in different ways. Here are descriptions of some of the meta categories:

*scalar-simd: Anything with the XED_ATTRIBUTE_SIMD_SCALAR attribute, including AVX and SSE operations. Instructions that operate on one vector element and whose iclass name ends with "SS" or "SD" have this attribute.
*sse-scalar: Any SSE instruction with XED_ATTRIBUTE_SIMD_SCALAR.
*sse-packed: Any SSE instruction without XED_ATTRIBUTE_SIMD_SCALAR.
*avx-scalar: Any AVX instruction with XED_ATTRIBUTE_SIMD_SCALAR.
*avx128: Any AVX instruction with a 128b vector length but without XED_ATTRIBUTE_SIMD_SCALAR.
*avx256: Any AVX instruction with a 256b vector length.
*avx512: Any AVX instruction with a 512b vector length.
*mem-atomic: Atomic memory operations.
*stack-read: Stack reads.
*stack-write: Stack writes.
*iprel-read: IP-relative memory reads.
*iprel-write: IP-relative memory writes.
*mem-read-1: Memory read, 1 byte.
*mem-read-2: Memory read, 2 bytes.
*mem-read-4: Memory read, 4 bytes.
*mem-read-8: Memory read, 8 bytes.
*mem-write-1: Memory write, 1 byte.
*mem-write-2: Memory write, 2 bytes.
*mem-write-4: Memory write, 4 bytes.
*mem-write-8: Memory write, 8 bytes.
*isa-ext-BASE: The "BASE" ISA extension (a generic group of instructions; BASE includes many of the older instructions).
*isa-ext-LONGMODE: The set of instructions added with Intel64. These may be 32b or 64b instructions.
*isa-set-I186: ISA "set" is a categorization of instructions in the BASE ISA extension. I186 includes instructions introduced on the 80186 processor.
*isa-set-I386: ISA "set" is a categorization of instructions in the BASE ISA extension. I386 includes instructions introduced on the 80386 processor.
*isa-set-I486REAL: ISA "set" is a categorization of instructions in the BASE ISA extension. I486REAL includes instructions introduced on the 80486 processor and valid in real mode.
*isa-set-I86: ISA "set" is a categorization of instructions in the BASE ISA extension. I86 includes instructions introduced on the 8086 processor.
*isa-set-LONGMODE: ISA "set" is a categorization of instructions in the LONGMODE ISA extension. LONGMODE includes instructions introduced with Intel64 mode.
*isa-set-PENTIUMREAL: ISA "set" is a categorization of instructions in the BASE ISA extension. PENTIUMREAL includes instructions introduced with the Pentium and valid in real mode.
*isa-set-PPRO: ISA "set" is a categorization of instructions in the BASE ISA extension. PPRO includes instructions introduced with the Pentium Pro.
*lock_prefix: Instructions with a 0xF0 LOCK prefix.
*rep_prefix: Instructions with a 0xF3 REP prefix.
*repne_prefix: Instructions with a 0xF2 REPNE prefix.
*osz_prefix: Instructions with a 0x66 (operand-size override) prefix.
*rex_prefix: Instructions with a REX prefix (includes the following four cases). REX prefixes can also be used without any of the following four bits set.
*rexw_prefix: Instructions with a REX prefix with the REX.W bit set.
*rexr_prefix: Instructions with a REX prefix with the REX.R bit set.
*rexx_prefix: Instructions with a REX prefix with the REX.X bit set.
*rexb_prefix: Instructions with a REX prefix with the REX.B bit set.
*one-memops: Instructions with one memory operation.
*two-memops: Instructions with two memory operations.
*disp_only: Instructions with a memory operation that addresses memory without using a base register or index register, just a displacement.
*base_index: Instructions with a memory operation that addresses memory using a base and index register, but without a displacement.
*base_index_disp: Instructions with a memory operation that addresses memory using a base, index, and displacement.
*scale_1: Number of instructions with scale=1 for the index register.
*scale_2: Number of instructions with scale=2 for the index register.
*scale_4: Number of instructions with scale=4 for the index register.
*scale_8: Number of instructions with scale=8 for the index register.
*memdisp8: Memory operations with 8-bit displacements.
*memdisp32: Memory operations with 32-bit displacements.
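Several of the meta-categories above follow directly from instruction encoding. The Python sketch below derives the REX-prefix rows from the REX byte, which ranges from 0x40 to 0x4F with W, R, X, and B in bits 3 to 0. It derives the addressing-form rows from a memory operand's components. The operand representation (`base`, `index`, `scale`, `disp_bytes`) is a hypothetical simplification, and the sketch ignores RIP-relative and other forms.

```python
from typing import Optional, Set

REX_BITS = (("rexw_prefix", 3), ("rexr_prefix", 2), ("rexx_prefix", 1), ("rexb_prefix", 0))


def rex_categories(prefix: int) -> Set[str]:
    """Map a prefix byte to the *rex* meta-categories it would count toward."""
    if not 0x40 <= prefix <= 0x4F:
        return set()  # not a REX prefix
    cats = {"rex_prefix"}  # counted even when no W/R/X/B bit is set
    for name, bit in REX_BITS:
        if (prefix >> bit) & 1:
            cats.add(name)
    return cats


def memop_categories(base: Optional[str], index: Optional[str],
                     scale: int, disp_bytes: int) -> Set[str]:
    """Classify one memory operand; disp_bytes is 0 (none), 1 (disp8), or 4 (disp32)."""
    cats = set()
    if base is None and index is None:
        cats.add("disp_only")
    elif base is not None and index is not None:
        cats.add("base_index_disp" if disp_bytes else "base_index")
    if index is not None:
        cats.add(f"scale_{scale}")
    if disp_bytes == 1:
        cats.add("memdisp8")
    elif disp_bytes == 4:
        cats.add("memdisp32")
    return cats
```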

Checking For Bad Pointers and Data Misalignment

Two of the more common errors when bringing up new code are (a) dereferencing bad pointers, either null or pointing to inaccessible memory, and (b) misaligned data accesses. Intel® SDE has features to help identify these situations in programs.

The options for the pointer checker are:

-null_check [default 0]
Check memops for null references.
-null_check_out [default sde-null-check.out.txt]
Output file name for -null-check.
-ptr_breakpoint [default 0]
Make the ptr checker raise application break point on errors.
-ptr_check [default 0]
Wild pointer checker. Checks memops for accessibility.
-ptr_check_out [default sde-ptr-check.out.txt]
Output file name for -ptr-check.
-ptr_check_warn [default 0]
Make the ptr checker warn on errors. Default is to die on errors.
-ptr_raise [default 0]
Make the ptr checker raise exception on errors. Default is to use
PIN_SafeCopy so that errors are ignored in analysis routines.

The alignment checker can give profiles of data alignment throughout the program, as well as report when and where data accesses are misaligned.
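Below is a minimal Python sketch of two properties an alignment profile might report for each access. It assumes natural alignment, meaning the address is a multiple of the access size, and 64-byte cache lines. These are assumptions for illustration, not the checker's actual rule set.

```python
def alignment_report(addr: int, size: int, line: int = 64) -> dict:
    """Classify one data access of `size` bytes starting at `addr`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return {
        # natural alignment: address is a multiple of the access size
        "misaligned": addr % size != 0,
        # first and last byte fall in different cache lines
        "splits_cache_line": addr // line != (addr + size - 1) // line,
    }
```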

Running the ASCII Tracing Tool

path-to-kit/sde -debugtrace -- user-application [args]

By default, the output is written to a file named sde-debugtrace-out.txt in the current directory. There are many options; run 'sde -debugtrace -thelp' to see the choices. The tool prints the registers and flags modified by each instruction, as well as the memory values read and written.

Using the Chip-check Feature

Starting with version 2.94, Intel® SDE includes a filtering mechanism to restrict executed instructions to a particular microprocessor. This is intended to be a helpful diagnostic tool for use when deploying new software. In the output of "sde -thelp" there is a section describing the controls for this feature:

To list all the chips that Intel® SDE knows about, you can use "sde -chip-check-list". To limit instructions to the processor codenamed Westmere, use "sde -chip-check WESTMERE -- yourapp". By default, Intel® SDE emits warnings to a file called sde-chip-check.out and also to stderr (if the application has not closed stderr). This behavior can be customized using the above knobs.
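Chip-check can be thought of as filtering each executed instruction's ISA extension against a per-chip allow list. In this Python sketch, the `CHIP_ISA` table and its extension labels are illustrative assumptions, namely a partial Westmere list. They are not Intel SDE's internal data.

```python
from typing import Iterable, List, Tuple

# Illustrative, partial allow list; Intel SDE's real chip tables are far more detailed.
CHIP_ISA = {
    "WESTMERE": {"BASE", "X87", "MMX", "SSE", "SSE2", "SSE3", "SSSE3",
                 "SSE4.1", "SSE4.2", "POPCNT", "AES", "PCLMULQDQ"},
}

Record = Tuple[int, str, str]  # (address, iclass, ISA extension) -- hypothetical


def chip_check(trace: Iterable[Record], chip: str) -> List[Record]:
    """Return the executed instructions that the given chip would not support."""
    allowed = CHIP_ISA[chip]
    return [rec for rec in trace if rec[2] not in allowed]
```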

Using Intel® Transactional Synchronization Extensions (Intel® TSX)

Intel® TSX has two primary components: Restricted Transactional Memory (RTM) and Hardware Lock Elision (HLE). Both technologies are supported in Intel® SDE as of version 6.1.

The RTM and HLE options are as follows. You can see this in the long help output emitted when running "sde -long-help".

RTM has four modes: disabled, abort, full, and nop. Disabled is the default because emulating RTM in software is very slow. The abort mode always aborts upon executing XBEGIN. Full is the RTM-enabled mode, and nop treats the RTM instructions like NOPs. Please see this blog posting for more information on how to use RTM.
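The practical effect of the modes is which path a typical RTM-with-fallback routine takes. The Python model below is a sketch with these assumptions:

- `xbegin` stands in for the `_xbegin()` intrinsic, which returns `_XBEGIN_STARTED` (`~0u`, i.e. 0xFFFFFFFF) when a transaction starts.
- In abort mode, the model returns a status of 0.
- In nop mode, the model falls through into the transactional path with no isolation.
- The disabled mode is not modeled.

```python
XBEGIN_STARTED = 0xFFFFFFFF  # value of _XBEGIN_STARTED (~0u) as a 32-bit unsigned


def xbegin(mode: str) -> int:
    """Toy model of the status the code after XBEGIN observes in each SDE RTM mode."""
    if mode in ("full", "nop"):
        # full: a real transaction starts; nop: execution just falls through (model assumption)
        return XBEGIN_STARTED
    if mode == "abort":
        return 0  # every XBEGIN aborts; status 0 = no cause bits set (assumption)
    raise ValueError(f"mode {mode!r} is not modeled here")


def increment(counter: dict, mode: str) -> str:
    """Common pattern: try a transaction, otherwise take the lock-based fallback path."""
    if xbegin(mode) == XBEGIN_STARTED:
        counter["value"] += 1  # transactional path (XEND would follow here)
        return "transactional"
    counter["value"] += 1      # fallback path; real code would acquire a lock first
    return "fallback"
```

Under this model, abort mode forces every call down the fallback path. That makes it a cheap way to check that the fallback code is exercised at all.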

Intel SDE provides statistics about the use of RTM and HLE during the execution of your program:

Debugging Emulated Code

Intel SDE provides support for debugging applications that contain emulated code. A description of using the system debugger is available here.

Intel AVX and Intel SSE Transition Checking

It is recommended that a VZEROALL or a VZEROUPPER be inserted between code that uses Intel SSE and code that uses 256b Intel AVX instructions. Intel SDE can check for Intel SSE instructions followed by Intel AVX instructions without an intervening zeroing instruction, and vice versa.

Use the "-ast" Pin tool knob to enable this check.

Use "-oast filename" to specify a filename other than "avx-sse-transition.out". When using the "sde" driver, -oast implies -ast.
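The check described above can be modeled as a small state machine over the sequence of vector instructions. This Python sketch uses hypothetical labels: `"sse"`, `"avx256"`, and `"vzero"` for VZEROUPPER/VZEROALL. It reports each switch between SSE and 256-bit AVX code that has no zeroing instruction in between. It is a simplification, not Intel SDE's exact algorithm.

```python
from typing import Iterable, List, Optional, Tuple


def find_transitions(kinds: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (position, 'from->to') for each SSE/AVX switch lacking a zeroing instruction."""
    last: Optional[str] = None
    events = []
    for i, kind in enumerate(kinds):
        if kind == "vzero":
            last = None  # VZEROUPPER/VZEROALL resets the tracked state
        elif kind in ("sse", "avx256"):
            if last is not None and last != kind:
                events.append((i, f"{last}->{kind}"))
            last = kind
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
    return events
```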

All the other Intel SDE knobs, such as -mix, are also available in replay mode.

Using Intel SDE For Emulating Control-flow Enforcement Technology

Intel control-flow enforcement technology (CET) was described in a technology preview in the ISA extensions page.

Intel SDE now provides a way to emulate the user-space aspects of this technology. It can also check the readiness of software compiled with CET stack checks or CET indirect-branch checks. Intel® SDE supports running the application on existing hosts (Linux and Windows). It also provides ways to reduce false reports caused by running with the system's legacy runtime libraries, which were not compiled with CET.
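The two CET checks that matter for user-space readiness can be illustrated with a toy model:

- A shadow stack that CALL and RET update alongside the ordinary stack.
- An indirect-branch rule requiring the target to begin with ENDBR64, encoded F3 0F 1E FA.

This Python sketch is illustrative only. The class and function names are hypothetical, and it ignores details such as the handling of legacy libraries.

```python
from typing import List


class ControlProtectionError(Exception):
    """Stands in for the #CP fault that a CET violation would raise."""


class ShadowStackModel:
    def __init__(self) -> None:
        self.stack: List[int] = []   # ordinary stack (writable by buggy or hostile code)
        self.shadow: List[int] = []  # shadow stack (only CALL/RET update it)

    def call(self, return_addr: int) -> None:
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self) -> int:
        addr, expected = self.stack.pop(), self.shadow.pop()
        if addr != expected:
            raise ControlProtectionError(f"return to {addr:#x}, expected {expected:#x}")
        return addr


ENDBR64 = bytes.fromhex("f30f1efa")


def check_indirect_branch(target_code: bytes) -> None:
    """IBT-style rule: an indirect CALL/JMP target must start with ENDBR64."""
    if not target_code.startswith(ENDBR64):
        raise ControlProtectionError("indirect branch target lacks ENDBR64")
```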

System Configuration

Linux

On Ubuntu systems, the Yama feature prevents processes from using ptrace to attach to their parent process. Intel SDE relies on this kind of ptrace attach to inject itself into the process. To disable this Yama restriction on the system, run the following as root:

# echo 0 > /proc/sys/kernel/yama/ptrace_scope

This change lasts only until the next reboot. To make it permanent, add it to the system's init scripts.
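Before launching Intel SDE, a script can check whether Yama is likely to block the default injection mode. This Python sketch assumes, as described above, that only a `ptrace_scope` value of 0 lets a process attach to its parent. It treats a missing file as "Yama not present". That result says nothing about other security modules that might also restrict ptrace.

```python
from pathlib import Path

YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def yama_allows_parent_attach(path: Path = YAMA_PTRACE_SCOPE) -> bool:
    """True if Yama is absent or ptrace_scope is 0 (classic ptrace permissions)."""
    try:
        scope = int(path.read_text().strip())
    except FileNotFoundError:
        return True  # Yama not present; other security modules are not checked here
    return scope == 0


if __name__ == "__main__":
    if not yama_allows_parent_attach():
        print("Yama restricts ptrace; as root run: echo 0 > /proc/sys/kernel/yama/ptrace_scope")
```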

Mac

Intel SDE uses the taskport API to inject itself into the application process, in both attach mode and launch mode. This results in a popup window asking you to confirm that it is allowed to take control of another process. This happens only the first time Intel® SDE is used in a GUI session. However, in non-GUI sessions (e.g., an SSH session), the popup never shows up and the injection fails immediately. To cope with this issue, perform a one-time configuration on the machine so that the OS does not try to show this popup and auto-confirms the takeover of the process.

You need to perform the following procedure:

System Integrity Protection

System Integrity Protection is a security technology in macOS* El Capitan (10.11) and later, which restricts the root user account and limits the actions that the root user can perform on protected parts of the Mac operating system.
System Integrity Protection includes protection for these parts of the system:

To run SDE on applications that are protected by System Integrity Protection, you must disable it.

To disable or enable SIP, follow the instructions at the end of the following article: How to modify SIP

To learn more about what System Integrity Protection is and its impact, read the following article: Apple* SIP

Code Sample: AES-128 Encryption and Decryption Routines

This sample code provides a set of C routines that demonstrate encryption and decryption routines using AES-128 in ECB mode. Read the End User License Agreement before you download the code samples.

Intel SDE tries to accurately set the MXCSR exception flags, but unmasked floating point exceptions are not supported.
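Because unmasked floating-point exceptions are not supported, it can help to check an MXCSR value for unmasked exceptions before running under Intel SDE. In MXCSR, bits 0 to 5 are the sticky exception flags (IE, DE, ZE, OE, UE, PE), and bits 7 to 12 are the corresponding mask bits. The power-on default, 0x1F80, masks all of them. The Python helper below is a sketch that only decodes a given value; reading the live register would require `_mm_getcsr()` in C.

```python
from typing import Set, Tuple

# MXCSR exception flags (bits 0-5); each mask bit sits 7 positions higher (bits 7-12).
MXCSR_FLAGS = ("IE", "DE", "ZE", "OE", "UE", "PE")


def mxcsr_status(mxcsr: int) -> Tuple[Set[str], Set[str]]:
    """Return (raised exception flags, unmasked exceptions) for an MXCSR value."""
    raised = {name for i, name in enumerate(MXCSR_FLAGS) if (mxcsr >> i) & 1}
    unmasked = {name for i, name in enumerate(MXCSR_FLAGS) if not (mxcsr >> (i + 7)) & 1}
    return raised, unmasked
```

A non-empty second set means the configuration relies on unmasked exceptions, which Intel SDE does not support.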

What happens when my program dereferences inaccessible memory for emulated instructions?

Intel SDE will crash. You can use "sde -ptr-check -- your app" to get a more verbose error message. Alternatively, you can use "sde -trace-execute -- your app" to get a dump of the instructions executed in your task and find out which instruction was executed last. Then you can use "sde -debugtrace -- your app" to look for the last write to the registers involved in that instruction's effective address computation.

There is a known problem with using Intel® SDE on Linux* systems that prevent ptrace attach via the sysctl /proc/sys/kernel/yama/ptrace_scope. In this case, Pin cannot use its default (parent) injection mode. To resolve this, run the following echo command as root (SDE itself does not need to run as root):

# echo 0 > /proc/sys/kernel/yama/ptrace_scope

Primary Technology Contact

Ady Tal: Ady is a senior software engineer in Intel Software and Services Group. Ady joined Intel in 1996. Ady works on emulation of new instructions in support of the compiler, architecture and the enabling teams.


The segmentation fault that you are getting is not related to the SIGUSR1 signal (that was shown in GDB). We have already received reports of issues with running Pin instrumentation, and therefore Intel SDE, on VMs. We are working to resolve these issues and plan to update the download area with new kits.

I am getting a segmentation fault with the three most recent versions of SDE running on Debian Jessie in VMware. The segmentation fault happens in SDE and is not related to any specific application I am running.

It used to run just fine before. I think the issue might be related to some system update.

This is what I get from gdb:

Program received signal SIGUSR1, User defined signal 1.
0x00007ffff6e20c1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/sde-external-8.5.0-2017-06-08-lin/intel64/pin_lib/injector/libc-dynamic.so

I checked it, and indeed SDE modifies the flags when emulating the PEXT instruction. This is not a problem when the host CPU supports the instruction, because in that case the instruction is executed on the host CPU and not emulated.

Apologies if you have seen this already. If you are attending PLDI this month, please consider attending PinPlay tutorial there.

In this tutorial, we will talk about, among other things, generating pinballs on Windows/macOS with Intel SDE and replaying them on Linux. This opens up the possibility of analyzing Windows/macOS application snippets on Linux. We will also discuss simulation region selection (PinPoints) and replay-debugging (DrDebug), which now work with Intel SDE (Linux only).