TL;DR – be wary of buying a Lenovo laptop or any other laptop that uses a Synaptics touch pad until Synaptics ships a fixed driver. Their driver has a memory leak and they have a battery-life bug that causes Windows to repeat the same system-scan once a second. So far Synaptics has failed to respond so the wait for a fix must be assumed to be infinite.

Update: October 15, 2017: an updated driver for my laptop was released September 11, 2017, six days after I published this post. However I have yet to see the driver on Windows Update, Lenovo’s System Update, or in any communications from Lenovo or Synaptics. So, I don’t know if or how ‘normal’ customers are supposed to get this fix. I only found out through the kindness of a stranger.

Now for more details to back up my claims…

My new laptop has great battery life and I want to keep it that way so if I notice something consuming CPU time for no good reason I investigate. If it’s a web page with battery-wasting ads I close it, and if it’s a process that keeps doing background work I’ll kill it.

So, when I noticed that Task Manager said that WmiPrvSE.exe was consuming ~0.7% of my CPU time (~5.6% of a core) continuously I investigated.

I shut down all visible processes and a few invisible ones but the CPU usage continued, so I grabbed an ETW trace. It showed WmiPrvSE.exe was doing about 200 ms of work, then sleeping for one second, and then doing the work again. This means that WmiPrvSE.exe was actually using more CPU time than Task Manager had indicated – a full 16% of a core (200 ms every 1200 ms). The image below shows five seconds of activity on my system which is blissfully idle except for WmiPrvSE.exe:

The next thing I did was look at the CPU Usage (Sampled) data to see where the time was going. A flame graph gave me the best visualization. I know practically nothing about WMI but it seems to me that if Process::LoadCheapPropertiesNT is consuming most of your CPU time then maybe ‘cheap’ is the wrong adjective to use. And if SetDateTime is a major cost then it might be that you are repeatedly loading libraries and strings and, I dunno, maybe this code should be ten times less costly:

Poking around at the trace a bit more I also noticed that tzrez.dll is opened 676 times whenever WmiPrvSE.exe springs to life and tzrez.dll.mui is opened 338 times, which would explain the LoadLibrary and LoadString costs which make SetDateTime so expensive. It looks like there are some easy optimization opportunities here for Microsoft. But, why is this code springing to life every 1.2 seconds?

As somebody with no WMI expertise I started sharing information on twitter and it worked brilliantly. Koby Kahane gave me several crucial hints including his interpretation of the flame graph above:

Who is doing this query?

Normally I use context switch data (CPU Usage (Precise)) to to track interactions between processes, as one process readies the next, but this time that just told me that WmiPrvSE.exe was being readied (woken up) by svchost.exe (3228) running Windows Management Instrumentation. That merely let’s me restate the question as “who is asking the Windows Management Instrumentation svchost to request this query?” The next suggestion from twitter was this:

Unfortunately that didn’t help. The type of query that had been set up was a fire-and-forget query where the WMI service automatically repeats it indefinitely. This seems like a dangerous option to give developers, especially since there are better alternatives:

unsecapp.exe readies four different processes: itself, svchost.exe (3228), System, and SynTPEnh.exe. One of this must be the process that created the polling WMI query, and the only one it can be is SynTPEnh.exe, signed by Synaptics Incorporated.

I killed the SynTPEnh.exe process but it came back to life like a digital zombie. So I killed its friends (SynTPEnhService.exe and SynTPHelper.exe) and then killed it and this time it stayed dead, and WmiPrvSE.exe finally stopped its work!

SynTPEnh.exe is the Synaptics TouchPad 64-bit Enhancements, and this is not the first time I’d noticed it misbehaving. Just one month earlier I had complained that it was using 662 MiB of private working set, and when investigating its new bad behavior it was consuming 942 MiB:

The search for an easier way

The whole investigation process for this CPU consumption seemed unreasonably messy. It bothers me that a process can create a polling query like this without having to leave a clearer calling card. It was pointed out that you can open up Event Viewer, go to Applications and Services Logs, Microsoft, Windows, WMI-Activity, Operational, then look for likely queries around the time that you rebooted. Here’s what I found:

The ClientProcessID identifies which process made the query. I’m not sure I could have found the right query if I didn’t already know what process ID I was looking for. If this is the easy way of tracking down who made these battery-depleting queries then maybe it needs to be made easier.

How long, has this been going on

I’d had my laptop for almost three months when I noticed this so naturally I wondered why I hadn’t noticed it before. I’m enough of a pack rat that I had a trace from when I first got the laptop (from investigating resume-from-standby hangs caused by a buggy Intel driver, sigh…) and I could see that this bad behavior was not showing up in that trace. I then looked at the version number of SynTPEnh.exe in that trace (ETW traces contain a ton of useful information) and it was 19.3.4.75. The badly behaved version was 19.3.4.83, so clearly a driver upgrade had happened and that caused this problem. I then found the driver install entry that showed exactly when the bad driver arrived – apparently it took me more than six weeks to notice:

What about that memory leak?

When I’d first noticed that SynTPEnh.exe was leaking memory I couldn’t reproduce it, but this investigation gave me a new idea: maybe the new-process notifications were related to the memory leak. I happened to have a process creation stress test lying around from investigating a Windows 10 performance problem so I ran it. To nobody’s surprise running it caused the SynTPEnh.exe private working set to increase by about 20 MiB – about 20 KiB per process (technically ~20.5 KiB per process because base-2 is so awesome).

Since the memory leak is proportional to how many processes are created I can safely assume that most users will not see as much memory leaked as I have. You really have to build Chrome or run ProcessCreateTests.exe if you want to leak a huge amount of memory. On the other hand, most consumers don’t have 32 GiB laptops and can’t afford to leak large amounts of memory. Here are some more leak statistics:

In other words, the memory leak was exacerbated by being a Chrome developer and by investigating the Windows 10 performance problem. It’s safe to say that I was hit worse than anyone. But it still needs fixing.

Next steps

I also contacted Lenovo since they are the maker of my laptop. They responded but I’m not sure how seriously they are taking the issue and I don’t feel like experimenting with driver versions that “might” fix the issue.

Microsoft suggested that they could use shimming to fix the issue. And they encouraged me to file a feedback report, which I did. I’m not thrilled with shimming as a solution, but unresponsive developers is the perfect use-case for it.

I’m not the first person to notice this. The bug was reported to Lenovo on May 30th and May 31st. Various suggestions were made but the issue can’t be considered fixed until an automatic update goes out to all customers.

For my own sanity I used the Roll Back Driver functionality to go back to a better version:

So, my laptop battery life is now back to normal, until the next update comes along. But a few million unwary customers are still wasting power and memory until this issue is fixed.

Share this:

Like this:

LikeLoading...

Related

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs.
I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8.

What would you suggest, event wise, be added to make tracking down the initiators of WMI notification queries easier? I guess adding the PID of the notification query initiator to the event for the “polling query” done every interval would go a long way.

I’d like to see a WMI event issued every time that the query happens. That would make finding the culprit in an ETW trace or in event viewer much simpler. Only emitting these events when the query is initiated seems perverse, at least in this context.

Have you tried searching e.g. “synaptics engineer touchpad” on LinkedIn? This is what I usually end up doing when I’m unable to reach the relevant people through normal support channels. I messaged someone (Perry) on your behalf, I’ll let you know if they respond!

Reply from Perry: “I got a report back from the driver team that the issue was fixed a while ago in version 19.3.4.99 of the driver. I’m not sure if it is currently available on Windows Update or not, but users should be able to get an updated driver by contacting Lenovo.”

The issue isn’t resolved until the fixed driver is on Windows Update for everyone. Contacting Lenovo/Dell/whomever is a non-starter because the vast majority of users have no way to realize that they have a problem that requires contacting Lenovo.

Thanks for this blog post- I knew I wasn’t going crazy when my laptop suddenly started having awful battery life. If I had a dollar for every time I ran powercfg /batteryreport to try and figure out what the heck was going on, I’d be rich.

It’s still not fixed for my Lenovo P51, but the fixes for other systems are an encouraging sign (as long as they go out through an auto-update system). It is also frustrating that Lenovo couldn’t tell me the state of this bug when I asked, and Synaptics never even responded.

The badly behaved version was 19.3.4.83, so clearly a driver upgrade had happened and that caused this problem.

This is exactly what I was afraid of when Microsoft announced their plans to have Windows 10 update drivers automatically. Microsoft Update has a long history of delivering (obviously) bad drivers; I once had Windows 8 try to update my Bluetooth driver with one that didn’t even have “Atheros” spelled correctly.

My HP Envy 17 laptop (3 years old now…) has a Synaptics trackpad and the same rubbish 64-bit extension thing. I too found over the years that occasionally it’d go bad for a while and then not. I just got used to killing it in Task Manager, and if it was particularly bad, finding the scheduled task that started it on logon and disabling that for a while.
Rubbish 😦

My new Thinkpad has a Synaptics trackpad but it’s a precision one so it doesn’t seem to need the same “helpers”.

From what I could tell, the “helper” thing was enabling two finger scroll, pinch-zoom, etc. That seems like something pretty essential to the driver. I can only imagine it wanted to know about all processes so it could hook sending them windows messages, etc. Surely there’s a more efficient way though 😦

Bah. Lenovo have no idea about mouse or keyboard drivers. My Yoga, according to them, needs no f-keys (their function keys are repeatedly enabled, even if you select f-keys), no escape key at all (no way to have f-keys and escape key at the same time, without a scancode hack), more bloatware than a Samsung phone, and more annoying popups from their monitoring systray aops than a pub’s poker machine giving away free spins…