Event Tracing for Windows (ETW) aka xperf is an amazing tool for investigating the performance of Windows machines – I’ve blogged about it many times and it’s helped me find some amazing issues. But recording ETW traces has always been tricky. Microsoft’s wprui.exe showed some potential, but is ultimately missing some features and often gets tripped up by ETW performance bugs.

The fallback plan is to use carefully crafted batch files to record ETW traces. These offer flexibility but… uggh. Batch files? Really? Are we savages?

I finally got annoyed enough by this situation to create UIforETW. This is a tool that records ETW traces, works around ETW performance bugs, allows configuration of trace recording options, works as a trace management UI, and more.

“Sounds great! How much is it?”

I’m pleased to announce that UIforETW has been released on github. Download it, build it, hack on it, fork it, or contribute.

This post has been updated several times to document the many great new features that have been added since the original release. For more information on UIforETW and ETW/xperf tracing see ETW Central.

What does it look like?

Using UIforETW is pretty straightforward. It has a few buttons to start/stop tracing and record traces. A few check boxes to control tracing options. A couple of combo boxes, a list of traces, and a text box for entering notes about traces. That’s pretty much it.

How do you use it?

The simplest way to get started is to download etwpackage.zip from the UIforETW releases page on github. When you run etwpackage\bin\UIforETW.exe it will install the necessary versions of the Windows Performance Toolkit (and unlike VS 2015 it knows which versions work on Windows 7). The process of installing UIforETW and recording a trace is documented in Xperf Basics: Recording a Trace (the ultimate easy way). It’s designed to be simple enough for non-experts to use, so send your customers to that page.

You can also get the UIforETW code from github and build it if you want. Pull requests are welcome – just read the CONTRIBUTING file in the main directory.

Once you’ve run UIforETW click Start Tracing. By default this will trace to circular buffers in memory, so you can leave this running 24×7, with minimal impact on your computer’s performance. Go off and use your computer and the next time something happens that you want to investigate just type Ctrl+Win+C to save the buffers to disk. Tracing to circular buffers will continue.

You can rename the just-saved trace (F2) to remind you why you recorded it, then double-click (or type Ctrl+E) it to launch Windows Performance Analyzer (WPA). See the trace list context menu for more options. I also like to use the trace information area to record notes about my analysis.

Note that UIforETW has not been tested on 32-bit Windows. It probably works, but who knows? Caveat luddite, and really, caveat anyone using UIforETW – no warranties and all that.

Trace management

I record a lot of ETW traces. The investigative reporting posts that I do almost always require one or more ETW traces, and there are many more that I record while optimizing Chrome, or whatever I happen to be doing for a living. If I’m not careful then my notes on what I found in a particular trace will end up scattered through dozens of e-mails, google docs, and hangouts discussions. Therefore I think that the trace management features of UIforETW are crucial. There are a few components to this:

ETW trace compression is on by default in UIforETW and it makes traces 5-10x smaller. This lets me retain more ETW traces. This only works on Windows 8 and above, but Windows 10 is a free upgrade. This feature is effectively missing from wprui, which is bizarre.

Trace information is an edit field that lets me record notes about what I have found in a trace. It’s a tiny detail, but I find it invaluable. It means that I can easily refresh my memory as to why I recorded a trace and what I found in it. Trace information is stored in a text file with the same base-name as the trace, and is automatically loaded and stored as you navigate through traces. Trace information can also be prepopulated with important details – for Chrome developers I prepopulate it with a list of Chrome process IDs by type. I recommend recording important conclusions in the trace information with call stacks, process and thread IDs, and time ranges. You’ll thank me.

Trace naming is important in order to know when I recorded a trace and why. UIforETW defaults to naming traces based on the date, time, and username (and unlike wprui it names them so that they sort sensibly!) The username is handy when receiving traces from customers. UIforETW supports renaming traces (use the context menu or select a trace and press F2) and it renames the trace information text file at the same time.

Bonus features

Finding the interesting part of an ETW trace is crucial to a successful investigation, and user input is often a key indicator – many bugs start with “it hangs when I click”, or “the frame rate drops when I type ‘w'”. Therefore UIforETW includes an input logger which inserts ETW events into the trace for every mouse-click, mouse movement, or key press. The default mode anonymizes alphanumeric key presses for privacy, but you can switch to full-Monty mode when necessary.

WPA defaults to showing you a blank screen. If you try to display generic events it overwhelms you with irrelevant rubbish, and if you show the CPU Usage (Sampled) table it defaults to hiding the call stacks. UIforETW solves all of these problems by automatically setting up a WPA startup profile. Even the graphs that don’t show initially have new configurations – look for Randomascii in the name. If you don’t like the startup profile you can always change it, but I think it’s a better startup experience than the default. If the startup profile doesn’t copy over automatically, go to the Settings dialog and click the appropriate button. Because WPT 10 changed the startup format UIforETW ships with separate profiles for WPT 8.1 and WPT 10. These are updated occasionally so don’t forget to copy new ones occasionally.

UIforETW also ships with source and binary for a tool that retrieves PDBs and PE files from symbol servers, the latest version of my ETW providers DLL to emit custom events, a sample ETWEventDemo project that shows how to emit these custom events from your software, and various batch files, Python scripts, and other ETW related extras.

UIforETW automatically registers the Multi* provider set found in ETWProviders*.dll, configures the DisablePagingExecutive registry setting for 64-bit stack walking, and generally makes sure everything is going to work.

UIforETW will automatically register the Chrome ETW providers if any chrome tracing categories are selected in the Settings dialog – this requires Chrome 46 or higher and enabling exporting of tracing events to ETW from chrome://flags.

UIforETW can optionally monitor the working sets of specified processes. Just enter a semi-colon separated list of process names and details about the working sets (including PSS and private working set if Expensive working set monitoring is checked) will be recorded to the trace once a second and can be graphed in WPT 10 – on Windows 8 and above a built-in OS provider gives similar information for less CPU cost.

UIforETW will monitor battery status (percent full and drain/charge rate) when applicable, and if Intel Power Gadget is installed will also monitor process/GPU/package power draw, frequency, and temperature – handy for investigating thermal throttling or power efficiency. This can all be graphed in WPT 10.

UIforETW will monitor the actual CPU frequency (in addition to what Windows and Intel Power Gadget say). This allows detection of thermal throttling.

More details

At this point you should be able to use UIforETW. It’s not really complicated, and there are tooltips for all of the buttons. But, if you want to know all of the hidden features and raison d’être then by all means read on. Here is an explanation of most of the buttons in UIforETW:

Compress Trace: If you are on Windows 8 or higher then this check box will be available. If it is checked then the -compress flag will used during the trace merge step. This makes traces 5-10x smaller, but the compressed traces cannot be loaded on Windows 7 and below. If you record traces without compression then you can use the trace list context menu to compress traces later, either individual traces or all traces.

Context switch call stacks: When this is checked (which it normally should be) then two call stacks will be recorded on every context switch – one for the thread being switched in, and one for the thread that ‘readied’ the new thread. This normally has a minimal overhead but can significantly alter performance when profiling code that does hundreds of thousands of context switches per second. These context switch call stacks are vital when doing idle-thread-analysis – see the CPU Usage (Precise) documentation for more information, so only uncheck this if necessary.

CPU sampling call stacks: When this is checked (which it normally should be) then every sampling interrupt will record a call stack on every CPU. This has a fixed and relatively minimal overhead. These sampling call stacks are vital when investigating excess CPU consumption – see the CPU Usage (Sampled) documentation for more information.

Fast sampling: When this is checked the CPU sampling speed will be set to 8 KHz instead of the default value of 1 KHz. This can be important when you need to investigate a single slow frame (giving you ~133 samples per 60 Hz frame instead of ~17) but is not needed for investigating longer periods of time. The higher sampling rate makes traces larger or makes circular-buffer traces cover a shorter period of time.

GPU tracing: When this is checked GPU data will be recorded. GPU usage can be shown in WPA (under the Video category) and deeper analysis can be done in GPUView, which can be launched from UIforETW with Ctrl+G or through the trace-list context menu.

Input tracing: If this is set to Private or Full then all mouse move and mouse-click events will be recorded in the trace and will show up in the Generic Events graph. If set to Full then all key presses will be fully recorded, whereas if set to Private all letters will be recorded as ‘A’ and all numbers will be recorded as ‘0’.

Tracing types:

Circular buffer tracing: In this mode tracing goes to in-memory circular buffers and can run continuously. When Save Trace Buffers is clicked or Ctrl+Win+C is typed then the buffers are saved to disk. The last ~5-500 seconds (depending on activity levels) will be recorded and traces will never be excessively large. This mode is ideal for “retroactive profiling” where you decide to save the trace after inadvertently triggering a performance problem.

Tracing to file: In this mode tracing goes to files on disk. This allows tracing of arbitrarily long periods of time, but it can create arbitrarily huge and unwieldy traces. This mode is ideal when you want to capture all of a particular scenario, but you have to remember to start and stop tracing at the right times.

Heap tracing to file: In this mode all heap allocations for the specified process names are recorded to the trace. Heap tracing only applies to processes that are started after tracing begins. This mode tends to have an enormous data rate and is only suitable for recording short periods of time.

Show commands: UIforETW works by running xperf.exe to stop and start tracing. You can see the exact commands run by checking this box. If you encounter problems with UIforETW then this can be a helpful diagnostic tool. When this is checked the time to run various stages of saving a trace is also displayed.

Settings…: This brings up the settings dialog which allows additional configuration. Currently the options are:

WS monitored processes: Semi-colon separated list of processes whose working set should be recorded once a second.

Expensive working set monitoring: This enables monitoring of PSS and private working set of the WS monitored processes. Note that this can consume 100-200 ms of CPU time, so use with care. This feature requires running the 64-bit version if running 64-bit Windows.

Chrome developer: When this is checked the default symbol path will include Chrome’s symbols, and some Chrome specific symbol processing and trace analysis will be done.

Auto view traces: When this is checked the trace viewer will be launched as soon as a trace is recorded.

Stacks on heap tracing: Check this to include stacks in heap tracing. This is normally desirable, but is expensive.

VirtualAlloc stacks always: Recording VirtualAlloc call stacks allows lightweight memory profiling that can often be left on all the time. Recommended.

Chrome tracing categories: These check boxes correspond to some of the commonly used tracing categories in Google’s Chrome web browser. This requires Chrome 46 or higher and –trace-export-events-to-etw on the command-line.

For a quick reminder of what a particular button does just hover the mouse over it and wait for a tool-tip to appear.

Here is an explanation of most of the items in the Traces context menu:

Open trace in 8.1/10 WPA: This launches the selected trace in that version of WPA

Open trace in GPUView: This launches the selected trace in GPUView

Delete trace: This sends the selected trace and the associated trace notes text file to the recycle bin.

Rename trace: This lets you rename the selected trace and the associated trace notes text file from within UIforETW. By default the date/time portion of the trace is excluded from renaming – use Shift+F2 to allow renaming all of the trace name.

ETW compress trace (8.0+): This will run the ETW “-merge -compress” step on the selected trace which will dramatically shrink traces that have not previously been compressed. This requires Windows 8 and above.

Zip compress trace: This will, some day, compress traces into .zip files that UIforETW will transparently view. But this is currently unimplemented.

Compress all traces: This will run the ETW compress trace step on all traces, saving the results if it makes the traces smaller.

Zip compress all traces: This will run the (unimplemented) Zip compress trace step on all traces, saving the results if it makes the traces smaller.

Trace path to clipboard: This copies the full path to a trace to the clipboard. Type Ctrl+Shift+C to copy just the file part.

ETW bug avoidance

It is ironic that I seem to keep finding performance bugs in the ETW trace recording and analysis. As much as possible UIforETW works around these ironic bugs.

In 2012 I found that symbol loading failed for some WPA users, and was excruciatingly slow (150x slower than normal!) for others. When symbol loading failed it was because either dbghelp.dll or symsrv.dll were not in the path. Profiling showed that when it was slow it was because a slow version of dbghelp.dll was in the path. The fix is to copy known-good versions of these DLLs to the xperf directory. The UIforETW settings dialog has a button to do this. I don’t currently have either of these problems so I can’t fully test the fix, so if you hit these problems then try Copy symbol DLLs and let me know if it helps.

In 2014 I found that some symbols (notably the extremely large symbol files for Chrome) took ~25 minutes to load into WPA. Profiling of WPA showed that this time the problem was in msdia120.dll. The fix was to download the Chrome symbols, strip the private data, convert them to WPA’s .symcache files, and only then launch WPA. UIforETW does all this automatically, which saves about 24.5 minutes each time I profile a new version of Chrome. If you don’t have the Chrome symbol server enabled then the StripChromeSymbols.py script does nothing. This problem could occur on any large symbol files. The stripping step can be manually run from the traces-list context menu. This script now also fixes a bug in the 10.0.14393 (Windows 10 Anniversary Edition) that stops it from downloading symbols from Chrome’s symbol server.

In 2015 I hit a problem when recording ETW traces. During the merge step I found that xperf.exe was spending all of its time doing uncached reads from Amcache.hve. This added a six minute delay to recording each trace! To avoid this UIforETW tries to rename this file before running the merge step, and then rename it back at the end.

I’ve also recently hit a bug where when recording a trace from circular memory buffers the xperf.exe process does millions of one MB reads from a relatively small file. These reads are fully cached, but the huge volume of reads still causes multi-minute hangs. I have no workaround for this other than to trace to a file. Let me know if you hit this bug so that we can figure out what triggers it.

WPT 10 fails to record dbgid information on Windows 7 so UIforETW automatically invokes the WPT 8.1 version of xperf on Windows 7.

WPT 8.1 can’t change the CPU sampling frequency on Windows 10 so UIforETW automatically uses the WPT 10 version of xperf for this on Windows 10.

None of these bugs affect everybody, but many people are affected by some of them. If you hit one of these and UIforETW doesn’t work around it then UIforETW may print some helpful suggestions, or you can record a trace of xperf recording a trace using metatrace.bat.

Needed features

Currently the only supported form of trace compression is what is built in to Windows 8 and above. If you are recording or viewing traces on Windows 7 you are out of luck. It would be great if somebody implemented the context-menu options to compress traces into .zip files. The idea would be to compress each trace into a .zip file with the same name and then delete the .etl file. The .zip file will still show up in the trace list, and UIforETW would need to automatically extract the trace on demand

UIforETW is built with MFC and, in order to keep the binary small, it uses the DLL version of MFC. This means that it won’t run on any machines that don’t have these DLLs installed. A DLL installer is needed and I haven’t gotten around to writing it. This was solved by using github releases

Contributions will need code-reviews and approval, and contributors must sign the Google Individual Contributor License Agreement – see the CONTRIBUTING file for details. Apologies for the red tape, and thanks to those who have contributed already (see the AUTHORS file and the history).

In closing

I’m very pleased with this project. It came together quickly and it solves the problems that I care about. I hope that it is useful for others.

As always, if you want to learn more about how to use ETW I recommend my series of training videos.

Advertisements

Share this:

Like this:

LikeLoading...

Related

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs.
I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8.

59 Responses to UIforETW – Windows Performance Made Easier

Brilliant stuff, is this reworking of your vtrace work at Valve or just a start again and do it properly?

Every time I use WPR I always check your blog for more details, wondering why my computer and Chrome tabs were taking ages to load recently, turned out to be 2 Mcafee AV .sys files and ReadyBoost on a usb drive, all thanks to you Bruce!

To see the providers make sure that Show commands is checked. As for changing the UI for that has not been written. I would imagine a grid of check-boxes for kernel providers (with some buttons to set various defaults) and a way to enter arbitrary user-mode providers. It really depends on what flexibility is needed, because there’s flexibility also means potential complexity.

In the “always-on” mode, I see quite a bit of perf impact. For example, opening a regular command windows and performing a directory listing (dir) command, the output is visibly laggy.

I’m suspecting this is due to many OS kernel calls (a quick WPA trace inspection indicates calls into ntoskrnl.exe through kernel32.dll). I’m assuming this comes with a context switch (?) and so I found that turning “Context switch call stacks” removed the lag.

I do a lot of command-line work, so this pretty much prevents me from turning on context switch call stacks in the “always-on” circular buffer mode. Is there an inbetween setting that I could use to do some “light” idle-thread-analysis when necessary?

Interesting. While some overhead is expected I have never seen it to be perceptible, and I have left tracing on for thousands of hours.

Conveniently enough, the sampling profiler records *everything* means that it will show you where the overhead is coming from. Typically it will show time spent in call-stack collection in SetEvent or WaitForSingleObject.

Also, since the overhead is usually just proportional to the number of context switches you can use the CPU Usage (Precise) data (sorted by count) to see where the context switches are. Maybe some code is doing something foolish.

When “Context switch call stacks” is checked then “CSWITCH+READYTHREAD” call stacks are recorded. CSWITCH records a stack on the thread being switched in, and READYTHREAD records a stack on the thread that readied that thread. READYTHREAD is needed less frequently, so you could see if removing it helps.

But, unless you have a slow computer there really shouldn’t be lag. Maybe share a pair of traces of “dir /s”, one with and one without context switch callbacks.

Looks like this could be related to ConEmu using the client/server subsystem (csrss.exe) to communicate to cmd.exe? I get a lot of context switch counts there. I couldn’t reproduce the issue anymore in a vanilla cmd shell.

The trace without stacks did not capture the entire scenario. The CPU usage only covered about 19 s. You should shorten the scenario so that it fits — do a “dir /s” on a smaller directory. That way the full scenario can be compared in both traces.

Ideally use a batch file and “xperf -m Starting dir /s” to mark the start of the scenario, and ditto for the end. It would also be good to know the number of files being listed, for better context, and how long it takes without tracing, with light tracing, and with full tracing.

On the no_stacks trace I see about 10,000,000 context switches over the 19 s of activity recorded, so 250,000 per second (to cmd.exe, conhost.exe, and csrss.exe). That is enormously excessive. That is your problem. That is bad behavior. That is also why the trace covered such a short time period — a high data rate.

The context switch rate is orders of magnitude lower in the trace with context switch call stacks. The sampling data didn’t clearly attribute it to the context switch call stacks, but if you try doing 250,000 context switches per second then you have to expect performance issues, and recording two call stacks on each context switch will dramatically harm perf.

Equally worrisome is that your system never goes idle. On the no_stacks trace I see ~30,000 context switches per second to the three processes listed above). I just tested on my machine (a trace recorded while typing this and while WPA was loading symbols) and it shows an order of magnitude fewer context switches. A truly idle process should have zero context switches, or perhaps a few hundred for background activity.

I didn’t do a detailed analysis but my off-the-cuff analysis is that ConEmu64 is doing crazy things and it should stop. If it can’t go completely idle when idle and handle “dir /s” more sanely then I would point this out to the developers.

I realize you are already heavily invested in WPR/WPA and your own UIforETW, but it may still be useful for you to check out the ‘PerfView’ tool (web search will find it), as well as my blog entries on it). Like UIForETW it tries to simplify the collection of ETW (but it is a viewer as well), but its real goal diverges a bit from yours because it cares a lot about profiling .NET code well. Still, it will collect ETW traces that WPA or your tools will consume (it event Zips them as you suggested), and knows how to turn on the heap allocation events as well. Since it deploys as a single XCOPY EXE, it is MORE convenient than WPR if you need to collect on a ‘foreign’ machine (even if you decide to view the data with WPA). It also has some nice features for monitoring the trace and stopping when something ‘interesting happens (see /StopOn* options). You can stop on Exceptions, requests taking too long, ETW durations being too long or simply ETW events. Also it has some rather unique analysis features (grouping and entry point grouping), which can make analysis (especially diffing), quicker and more fruitful. WPA certainly makes prettier graphs, but the grouping features of PerfView are why I keep using it first (it also has a more sane symbol loading policy (lazy unless local)). Finally This is all built upon a library (TraceEvent (.NET)) that is available via a Nuget package so you an build your own high performance analyzers with relative ease.

I may be worth your while to take it for a whirl. There are a bunch of tutorial videos, and you can be experimenting with it in literally a minute from now.

Bruce,
This is an excellent tool.
Question: the scenario we are currently facing is a bit more complicated. We are trying to pinpoint the cause behind sudden CPU spikes that pretty much bring the machine to a halt. Something gets stuck in a loop and starts consuming all the CPU cycles, which causes black screens that can only be “fixed” by a reboot. We were thinking of running a trace to record the call stacks (we have a few processes in mind one of which is likely the culprit), but the problem is the trace data would get corrupted / remain unflushed if the server is forced to shut down abnormally during the reboot. So the only way to achieve this, per my knowledge, is by running the trace remotely. Is it something that is possible to do with WPR?

That sounds nasty. A ‘normal’ CPU spike should never render the machine unusable, so either there is some horrible lock-contention with priority-inversion on system locks, or else some sort of weird driver bug.

A couple of options come to mind:
1) Customize UIforETW so that it can detect this odd behavior and can flush the trace to disk when it detects it, hopefully before you reboot the machine.
2) Customize UIforETW so that it is always tracing to disk, with a new trace being started every 60 seconds or so. You could always get it to only retain the last 20 traces. That way when the problem happens you should just have to wait five minutes and then be sure that a few traces have been saved.

But, if the machine is in a bad enough state then none of this will work because the trace won’t get saved.

Hooking up a kernel debugger from another machine may be your best bet. Not fun, but effective.

Another option for analyzing ETW traces is the Media eXperience Analyzer, or MXA. This tool apparently allows you to have multiple streams of data (perhaps video, audio, or power measurements) synchronized with the ETW data stream. I have little experience with this tool – and it is very different from WPA – but it could be important in some scenarios.

If I had a Win2008R2 sp1 64 bit vps server that was used for derivatives trading, could I use this UIforETW to help see where potential performance issues are coming from? How would i enable compression? I have WinRar 4.x or 5.x installed

Do you know what the main differences are between WPR, Xperf, Logman, etc? They all seem to be “Controllers” for ETW. Is any one more powerful than the others? Is WPR just a wrapper for Xperf and Xbootmgr (for boot traces)?

I don’t know much about logman, but I think you are correct that these (and UIforETW) are all just controllers for ETW. ETW is baked into the OS, and these tools turn on the features of interest and direct the data to a file for later examination.

xperf is probably the most powerful, but it is possible that some capabilities are only available to other tools.

I don’t know where the “count of stutters” data is coming from. I see that you’ve uploaded two ETW files but without knowing more about them – i.e.; how they were recorded, what was happening when you recorded, what driver versions you were testing, etc., it is difficult to do any analysis.

One thing to look for would be CPU usage inside the NVIDIA drivers, in the process being profiled. There will usually be a couple of graphics driver threads, and some driver overhead in other threads.

UIforETW would need a few tweaks to get it to build and run on ARM64, but it shouldn’t be anything too difficult. The assembly language files need to not be compiled, and the code that calls them needs appropriate #ifdefs. And I have probably made some x86 specific assumptions about paths. But, it should be pretty straightforward.

Every time I try to save a trace I get “xperf: error: C:\Users\user\Documents\etwtraces\2017-01-28_14-27-18_user.etl: The system cannot find the file specified. (0x2).
Process exit code was 80070002 (2147942402)”

What type of tracing are you using? Circular buffer tracing or tracing to file? If you turn on ‘show commands’ and include the full output then that will be helpful because right now I don’t know what command is failing.

A setting to “Use other kernel logger” is available if you download and build UIforETW – it hasn’t yet made it into one of the releases. This option might help. But, I don’t know. ETW failures can be quite cryptic.

Circular – here’s the full output “xperf.exe -flush “NT Kernel Logger” -f “C:\Users\user\AppData\Local\Temp\kernel.etl” -flush UIforETWSession -f “C:\Users\user\AppData\Local\Temp\user.etl”
The trace you have just captured “C:\Users\user\AppData\Local\Temp\kernel.etl” may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
The trace you have just captured “C:\Users\user\AppData\Local\Temp\user.etl” may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
Trace save took 2.2 s
Merging trace…
xperf.exe -merge “C:\Users\user\AppData\Local\Temp\kernel.etl” “C:\Users\user\AppData\Local\Temp\user.etl” “C:\Users\user\Documents\etwtraces\2017-01-28_21-26-12_user.etl” -compress
xperf: error: C:\Users\user\Documents\etwtraces\2017-01-28_21-26-12_user.etl: The system cannot find the file specified. (0x2).
Process exit code was 80070002 (2147942402)
Trace merge took 0.2 s
Finished recording trace.”

The crucial information from the verbose output is that the error occurred during the merge process. The “xperf.exe -flush” command saves the contents of the kernel and user-mode buffers to disk, to user.etl and kernel.etl. Then “xperf.exe -merge” merges those two trace files into the final result. It also grabs a bunch of machine information as well – it is a very important step. But, it should also be a very reliable step – it’s just combining to ETW files that were previously saved.

The error message suggests that the *output* file cannot be found. That’s odd. I don’t know what that means.

I’d try running procmon while recording the trace in order to see what is happening. Also make sure that the ‘etwtraces’ directory exists.

You should also upgrade to the latest version of UIforETW. The ‘kernel.etl’ and ‘user.etl’ file names are dangerously generic and can conflict with other ETW tracing loggers. The latest versions (since Nov 20, 2016) use UIForETWuser.etl and UIForETWkernel.etl.

I’m afraid I can’t tell what is going on. However, I have a few suggestions:

One idea would be to switch from Circular buffer tracing to Tracing to file. Maybe that will work better.

Another suggestion would be to run the command invocations that are displayed when “Show commands” is checked in an elevated command prompt. Make sure that UIforETW is not running and that xperf is in the path, and manually run the commands, one at a time. This should make it easier for you to diagnose the problem.

I’ve watching your Wintellect videos and see you have code for an ETWProvider demo. Where can I download the code? We’re beginners trying to implement ETW for our code and would like to create custom events. Since it’s been a couple of years since you’ve created the videos, is this still the best starting point or do you have updated instructions/demos for creating custom ETW providers? Thanks