Slow Startup with Multiple 'Starting' Services After Malware

I had an interesting problem the other day with a server (Windows Server 2003 Standard) at a small business (6 users total): a very long startup time. The server in question is a standalone domain controller (DC) as well as a database/application server and file/print server. Terminal Services is installed and configured but rarely used, mostly for access from outside the office. Database and domain services (authentication) became available fairly quickly, as did console logins via UltraVNC (uVNC), probably 15 to 20 minutes after startup. It took more than an hour, however, before Terminal Services/Remote Desktop was available.

Digging around on the console to track down the source of the problem, I found multiple services listed as “Starting”. All of them were left behind by malware whose actual infection had already been cleaned out. I suspect that these services, which could never finish starting, were delaying the startup of other services, though in this case I’m not planning to set up a test system to verify that.

In the rest of this post I’ll give a bit more detail on the scenario, what I found, what was needed to clean it out, and a few more notes on what I suspect was happening.

Scenario

We haven’t been working with this client for very long, so I’m not sure exactly when these infections were cleaned out. The antivirus software (VIPRE Antivirus from Sunbelt Software) was configured to keep logs for only a few weeks (now corrected), so all I can say is that it was at least a month ago. I’m not even certain whether the infections were removed by the current antivirus package or by a CD-based virus scan when we first started working with them. I know they had a rash of Conficker in the office, so this may have been the aftermath of that.

VIPRE and other tools all reported the system as clean even with the service entries in place, because the files they referenced no longer existed. I suspect that nobody noticed the slow startup times because A) the user-facing services (database, authentication) came up fairly quickly, B) the system was generally not restarted while staff were in the office, and C) it’s a server, so it really isn’t restarted that often. In this case an overnight power outage had outlasted the battery backup, and the system had shut down by the time staff arrived in the morning.

Findings

In general, I didn’t find any direct indication of what was causing the slowdown; there was nothing relevant in the Event Log. The suspicious services were all running under “svchost -k netsvcs”, which is not surprising: that svchost group hosts multiple services loaded from DLLs (see the TechNet article in the resources list at the end of this post). The list of services run as part of netsvcs is stored in the netsvcs value of HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\SvcHost (not in the subkey of the same name); in my case the malware entries were at the end of the list.
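To see what svchost will try to load as part of netsvcs, the value can be dumped with `reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\SvcHost" /v netsvcs` and the tail of the list inspected. The following is a minimal Python sketch, not a tool used in this cleanup, that extracts the entries from saved `reg query` text. It assumes reg.exe prints REG_MULTI_SZ data on a single line with a literal `\0` between entries. `parse_reg_multi_sz` and `read_netsvcs_winreg` are hypothetical helper names; the second uses the standard `winreg` module, which exists only on Windows (and recent Python 3 releases don't run on Server 2003, so the text parser can be used on output copied to another machine).

```python
# Key named in the post; netsvcs is a REG_MULTI_SZ value under it, not a subkey.
SVCHOST_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\SvcHost"


def parse_reg_multi_sz(reg_output: str, value_name: str = "netsvcs") -> list[str]:
    """Return the entries of a REG_MULTI_SZ value from saved `reg query` text."""
    for line in reg_output.splitlines():
        # Value lines look like: <name> <whitespace> REG_MULTI_SZ <whitespace> <data>
        parts = line.split(None, 2)
        if (
            len(parts) == 3
            and parts[0].lower() == value_name.lower()
            and parts[1] == "REG_MULTI_SZ"
        ):
            # Assumption: reg.exe shows a literal backslash-zero between entries.
            # Empty strings left by trailing terminators are dropped.
            return [entry for entry in parts[2].rstrip().split("\\0") if entry]
    return []


def read_netsvcs_winreg() -> list[str]:
    """Windows only: read the netsvcs value directly through winreg."""
    import winreg  # standard library, but only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SVCHOST_KEY) as key:
        value, value_type = winreg.QueryValueEx(key, "netsvcs")
    if value_type != winreg.REG_MULTI_SZ:
        raise ValueError("netsvcs is not a REG_MULTI_SZ value")
    return list(value)
```

Looking at the last few entries (`entries[-5:]`) is a quick way to spot names appended at the end; comparing the list against one from a known-clean server of the same version is another option.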

The short (key) names of the services were randomly generated, but the display names were reasonable-sounding fakes, and the descriptions were copied from other services. The services hung in the “Starting” state, and until they finally failed, other services weren’t starting either, even though none of those services depended on them.
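Services shown as “Starting” are reported by sc.exe with the state START_PENDING, and sc also shows the short key name, which is what matters here. A minimal sketch, assuming the layout printed by `sc queryex type= service state= all` (a `SERVICE_NAME:` line per service followed by an indented `STATE : 2  START_PENDING` line), that lists the stuck services from captured output; `stuck_starting` is a hypothetical helper name:

```python
def stuck_starting(sc_output: str) -> list[str]:
    """Return short names of services whose STATE line shows START_PENDING."""
    stuck: list[str] = []
    current = None
    for raw in sc_output.splitlines():
        line = raw.strip()
        if line.startswith("SERVICE_NAME:"):
            # Each service block starts with its short (registry key) name.
            current = line.split(":", 1)[1].strip()
        elif line.startswith("STATE") and "START_PENDING" in line and current:
            stuck.append(current)
    return stuck
```

For example, redirect the sc output to a file on the server and pass its contents to `stuck_starting`; the resulting names can then be matched against the netsvcs entries and the keys under the Services branch of the registry.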

Setting the problem services to Disabled was not possible because access to their registry keys was denied. Normally, Sysinternals’ Autoruns is a simple way to get rid of unwanted services, but because only the System account had access to those registry keys, the version of Autoruns already on the system didn’t show them (I have not checked whether newer versions detect this problem).

Resolution

Identifying the specific service entries in the Registry (under HKLM\SYSTEM\CurrentControlSet\Services) wasn’t hard: I was looking for keys with no visible subkeys (no plus sign next to them) and with randomly generated names. It helps to have a good feel for how names get shortened and abbreviated; just because a name doesn’t make sense at first doesn’t mean it’s actually random. The keys in question also all appeared to have no values (the values, like the subkeys, were hidden by the restrictive permissions), and their permissions granted access to the System account only.
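The permission symptom described above can be checked in code: an entry whose service key an administrator cannot even open is worth a closer look. It doesn’t prove anything on its own, so the judgment about random-looking names still applies. A minimal sketch, assuming the netsvcs entries are already in a list and that the opener raises Python’s PermissionError for access denied and FileNotFoundError for a missing key (which is how `winreg` surfaces Windows errors 5 and 2 in Python 3). `classify_entries` and `winreg_open_service_key` are hypothetical helper names, and the second works only on Windows.

```python
from typing import Callable

# Standard location of service configuration keys.
SERVICES_KEY = r"SYSTEM\CurrentControlSet\Services"


def classify_entries(
    names: list[str], open_key: Callable[[str], None]
) -> dict[str, list[str]]:
    """Group service names by whether their registry key can be opened.

    open_key(name) must raise PermissionError when access is denied and
    FileNotFoundError when the key does not exist.
    """
    result: dict[str, list[str]] = {"readable": [], "access_denied": [], "missing": []}
    for name in names:
        try:
            open_key(name)
        except PermissionError:
            result["access_denied"].append(name)
        except FileNotFoundError:
            result["missing"].append(name)
        else:
            result["readable"].append(name)
    return result


def winreg_open_service_key(name: str) -> None:
    """Windows only: try to open a service key for reading, then close it."""
    import winreg  # standard library, but only available on Windows

    path = SERVICES_KEY + "\\" + name
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, winreg.KEY_READ):
        pass
```

Passing the opener in as a parameter keeps the grouping logic testable off Windows; on the server itself it would be called as `classify_entries(entries, winreg_open_service_key)`.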

Removing the service entries from the registry manually was simple: it just required changing the permissions on the affected keys to grant Full Control to an administrative account and then deleting them. In this case the permissions were inherited all the way down, but I have encountered situations in the past where permissions had to be set on the key and then separately on its child keys. The names were also removed from the netsvcs value in HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\SvcHost (they are in the value, not in the subkey of the same name).
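Editing the netsvcs value amounts to filtering a list of strings and writing it back. A minimal sketch, assuming the current entries are already in a Python list (for example from `winreg.QueryValueEx`); names are compared case-insensitively because registry key names, and therefore service names, are case-insensitive. `remove_entries` and `write_netsvcs_winreg` are hypothetical helper names, and the write-back function works only on Windows.

```python
# Key named in the post; netsvcs is a REG_MULTI_SZ value under it.
SVCHOST_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\SvcHost"


def remove_entries(netsvcs: list[str], to_remove: list[str]) -> list[str]:
    """Return a new list without the given names (case-insensitive, order kept)."""
    drop = {name.lower() for name in to_remove}
    return [name for name in netsvcs if name.lower() not in drop]


def write_netsvcs_winreg(entries: list[str]) -> None:
    """Windows only: overwrite the netsvcs value with the cleaned list."""
    import winreg  # standard library, but only available on Windows

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, SVCHOST_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, "netsvcs", 0, winreg.REG_MULTI_SZ, entries)
```

Exporting the key first (`reg export "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\SvcHost" svchost-backup.reg`) gives a way back if the wrong name is removed. This only covers the netsvcs value; the service keys themselves still need their permissions changed and to be deleted as described above.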

It’s also been pointed out that the slow startup could have been caused by an out-of-sync RAID array rebuilding in the background. I’ll confess that I didn’t check that at the time, though I didn’t see any mention of RAID issues in the Event Log.
The RAID array in the system is healthy, has a hot spare, and is in the configuration I expected (the hot spare is still the highest-numbered drive, indicating that it has not been used).
I’ll be checking to confirm that the RAID controller in that system does in fact log such events.
