Author: mark

I recently ran into a server on a site that had seemingly stopped servicing any DHCP requests. After a little digging, the System event log showed when the DHCP service first started throwing errors, the error being EventID 1059, “The DHCP service failed to see a directory server for authorisation.” So DHCP seemed to be having trouble talking to a domain controller, which was odd, as the server was a domain controller itself. A quick check with dcdiag returned an error as well.

It appeared that the server had stopped servicing LDAP requests too, and the word “capability” was spelt incorrectly in the error returned.

At about the same time the DHCP service started reporting problems contacting a domain controller, the Group Policy client also started reporting that it was unable to find a DC. That would be expected, as it was trying to contact the server itself and failing in the same way. The Directory Service event log was complaining about replication, but not much else. Checking replication with repadmin /replsummary again threw an error about communication via LDAP.
Running the same command from another machine seemed to show the RPC server was down on the server, which it wasn’t; the service was up. So I checked the RPC ports with a quick netstat -no and was greeted with tens of thousands of ports all in a TIME_WAIT state. That would explain things: if there are no RPC ports available, various things will start to break. Googling “Ports not closing TIME_WAIT” led me to a hotfix from Microsoft, “All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days from system startup in Windows Vista, in Windows 7, in Windows Server 2008 and in Windows Server 2008 R2”.
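With tens of thousands of lines of netstat output, counting the states is easier than scrolling through them. The following is a minimal Python sketch, not something I ran at the time, that tallies the State column from text in the `netstat -no` layout. The function name and the sample connections are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from text in the `netstat -no` layout."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five columns: Proto, Local, Foreign, State, PID.
        # Header lines and UDP rows (which have no State) are skipped.
        if len(fields) >= 5 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


# Hypothetical sample output, not taken from the affected server.
sample = """\
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:49155         10.0.0.9:135           TIME_WAIT       0
  TCP    10.0.0.5:49156         10.0.0.9:135           TIME_WAIT       0
  TCP    10.0.0.5:389           10.0.0.7:50211         ESTABLISHED     612
"""

for state, count in count_tcp_states(sample).most_common():
    print(state, count)
```

On a live machine the same function could be fed the real output, for example captured with `subprocess.run(["netstat", "-no"], capture_output=True, text=True).stdout`.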

A little further Googling showed that this problem is not limited to Microsoft, with various other vendors and products mentioned as affected, such as Avaya, Brocade, Cisco, EMC, QLogic and VAX/VMS.

Basically, a 32-bit counter used to record uptime will cause this problem when it overflows. If you record a tick for every 10 msec of uptime, a 32-bit counter will overflow after approximately 497.1 days. A 32-bit counter can hold 2^32 values, so it can count 4,294,967,296 ticks. Because a tick is counted every 10 msec, there are 8,640,000 ticks per day (100*60*60*24). Dividing one by the other, the counter overflows after 497.102696 days.
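The arithmetic above is easy to check for yourself. Here is a short Python sketch of the same calculation; the variable names are mine, and the 10 msec tick and 32-bit width are the figures quoted above.

```python
TICK_MS = 10        # one tick per 10 msec of uptime
COUNTER_BITS = 32   # width of the uptime counter

ticks_per_day = (1000 // TICK_MS) * 60 * 60 * 24   # 8,640,000
max_ticks = 2 ** COUNTER_BITS                      # 4,294,967,296
days_to_overflow = max_ticks / ticks_per_day

print(f"{days_to_overflow:.6f} days")   # prints "497.102696 days"
```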

So all that was left was to patch the thing. While a hotfix is fine, this one is fairly old, and I did wonder if it had been included in standard Windows updates. Helpfully, Microsoft advise that if you have the MS12-032 security bulletin installed then the hotfix is not needed, suggesting it’s included in the security patch.

Again though, that’s a pretty old patch itself, and I guessed it must have been rolled into a standard patch at some point. A quick search of the Microsoft Update Catalog for MS12-032 will show that update and all updates that supersede it, so you can then check whether that update KB or any superseding update KB is installed on your system.

If they’re installed, you should be covered against this. If not, keep an eye on your systems, as once they get past 497.1 days of uptime you may find that some services start to fail, like ADDS and other dependent services.
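To make “keep an eye on your systems” a bit more concrete, here is a minimal Python sketch of the check itself: given a last boot time, work out how many days remain before the 497.1 day mark and flag anything inside a warning window. The host names, dates and the 30 day window are all hypothetical, and how you collect the boot times is left out.

```python
from datetime import datetime, timedelta

# Days until a 32-bit counter of 10 msec ticks overflows (about 497.1).
OVERFLOW_DAYS = 2**32 / 8_640_000


def days_until_overflow(last_boot: datetime, now: datetime) -> float:
    """Days left before uptime reaches the overflow point (negative if past it)."""
    uptime_days = (now - last_boot) / timedelta(days=1)
    return OVERFLOW_DAYS - uptime_days


def needs_attention(last_boot: datetime, now: datetime, warn_days: float = 30) -> bool:
    """True when the host is within warn_days of the overflow point, or past it."""
    return days_until_overflow(last_boot, now) <= warn_days


# Hypothetical hosts and boot times, purely for illustration.
now = datetime(2016, 5, 1)
hosts = {
    "DC01": datetime(2015, 1, 1),   # 486 days of uptime
    "DC02": datetime(2016, 1, 1),   # 121 days of uptime
}

for name, booted in hosts.items():
    remaining = days_until_overflow(booted, now)
    flag = "CHECK PATCHES" if needs_attention(booted, now) else "ok"
    print(f"{name}: {remaining:.1f} days to overflow, {flag}")
```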

Below is a basic script I wrote for PowerShell to get the uptime of servers, domain controllers in this example, to see if any were approaching the 497 day mark, so they could then be checked off in WSUS for the right patches. WSUS is your friend here when it comes to rolling out fixes like this.

Obviously the lesson here is to keep your servers updated, but as we all know there are times when that’s not possible. In that case, this should at least help you find and fix the ones that might be affected.

So, after running into this problem, I was initially sceptical of what the cause may be. I’d seen talk around that Macs didn’t like their home folders to be part of an Active Directory domain that ends in the pseudo TLD of “.local”, but I never quite believed that this would be the cause.

Basically, symptoms would be that the machine will fail to log in using the domain credentials, and will just say something generic sounding like “Unable to login to the account, an error occurred”. After lots of testing and fettling with both the Mac and the domain settings (This was a new domain being provisioned for a specific event, and I wouldn’t suggest you just generally tinker with your domain controller configurations), it was found that the account could be logged in if the home drive was disabled in AD. In my case the home drive path was a location within a DFS namespace, but even a direct share on a file server gave the same results.

So, I spun up a new domain on a separate server (oh the joys of virtualisation) and this time gave the domain a .net TLD, with the home drive specified in the same way within a DFS namespace. Surprisingly, the account logged in first time once the Mac had been rebound to the new domain. Some further fiddling was required with the domain controllers to make sure that they were responding to all requests with FQDN responses as opposed to NetBIOS ones. The details on how to do this via PowerShell or a direct registry hack are linked. After these changes have been made a reboot of the server will be needed, but then they should respond with FQDN addresses for both DFS referrals and targets.

At this point, the whole thing should work, and as usual, I hope this saves someone some time in figuring this out.

I noticed today that in the admin view of VMware Update Manager, some of the custom VIB sources I had added were showing as “Not Connected”. This was my custom location for HP VIBs, http://vibsdepot.hp.com/index.xml, as I use the HP image on the hosts in this vCenter. When I forced VUM to check the URL again, it came back as “Not Connected” again. So I thought I would try loading the XML file in a browser, which presented me with a lovely little “notification”.

I say “notification” as what they’ve done is use a redirect to point you to a different URL, which then contains the message that you must use a different URL now.
The new HP VIB URL is https://vibsdepot.hpe.com/index.xml and note the https rather than http.

Adding the new updated URL to the XML file gets us right back into a connected state.

This has obviously been done following the HP and HPE split that was announced a few years ago, which is only just starting to have consequences for things like this.

I just thought I’d post about this, as it’s something I’ve come up against recently: how to disable deduplication on a volume on Server 2012, 2012 R2 or 2016 and inflate the data back to its original form. In this example, the volume in question is E:

So let’s start with step one: DO NOT DISABLE DEDUPLICATION ON THE VOLUME.
If you disable dedup on the volume first, you simply stop new data being processed, rather than rehydrating your already deduplicated data.

So with that in mind, step two is to run the following command in PowerShell:
Start-DedupJob -Type Unoptimization -Volume E: -Full

When that job has completed, which you can check with the Get-DedupJob command, you’ll then find that deduplication has been disabled on the disk. Since there’s still the garbage collection job to run, we need to rather counter-intuitively turn dedup back on for the volume with the following command:
Enable-DedupVolume -Volume E:

Once this is done, the next step is to run the following command to start garbage collection on the volume:
Start-DedupJob -Type GarbageCollection -Volume E: -Full

The final step, after that job has finished, is to turn off dedup on the volume with the following command:
Disable-DedupVolume -Volume E:

And that should save you any unnecessary drama.

Note: When all this is done, the volume will still show in some places, like Server Manager, sat at a 0% deduplication rate, which is fine, as we’ve turned it off. I would guess this is just a bug, but it seems that once a volume has been touched by the deduplication processes, it never goes back to a blank value for dedup rate.

Just a note, if you want to skip the narrative, the fix is at the bottom of the post, but if, like my GCSE maths teacher, Mrs Williams, you want me to show my working out, keep reading.

I’d been having some trouble installing .NET 3.5 on a Windows 8.1 machine for a while, seeing the same error no matter how I attempted the install. Turning the feature on through Windows Features just threw a generic error, which was of little help. Trying the same action directly on the command line via dism.exe gave some detail in the dism.log file.

So, I ran dism.exe /online /enable-feature /featurename:NetFX3 on the command line and then checked the result in the dism.log file, located at C:\Windows\Logs\DISM\dism.log

The two interesting lines in this are shown below:
DISM DISM Package Manager: PID=4564 TID=796 Failed while processing command enable-feature. - CPackageManagerCLIHandler::ExecuteCmdLine(hr:0x800f0922)
DISM DISM Package Manager: PID=4564 TID=796 Further logs for online package and feature related operations can be found at %WINDIR%\logs\CBS\cbs.log - CPackageManagerCLIHandler::ExecuteCmdLine

The CBS log file revealed a little further info in the following line;

The pearl of wisdom from these was to run lodctr /r from the command line, and then re-run the install.

Success, it worked. I hope this at least proves helpful for someone else, as late at night, trying to fix this for someone who had a deadline looming to get some machines set up, it was a real problem to figure out.

Just to clarify something that people should be aware of: the Group Policy Preferences processing order. Within each CSE the settings are applied starting at number one and working down from there. I know it sounds obvious, but the documentation generally says “starting with the highest”, which I think leaves room for confusion, as “the highest” could mean it finishes with one, especially in the context of Group Policy, where the last setting applied wins.
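To illustrate why the direction matters, here is a small Python sketch of the ordering as described above: items are applied from number one upwards, and a later item overwrites an earlier one that targets the same setting. This is only a model of the behaviour, not how the CSE is implemented, and the setting names and paths are made up.

```python
def apply_preference_items(items):
    """Apply (order, setting, value) items starting at order 1; later items overwrite."""
    effective = {}
    for _order, setting, value in sorted(items, key=lambda item: item[0]):
        effective[setting] = value
    return effective


# Hypothetical preference items, listed out of order on purpose.
items = [
    (3, "Drive H:", r"\\fs03\home"),
    (1, "Drive H:", r"\\fs01\home"),
    (2, "Drive S:", r"\\fs01\shared"),
]

for setting, value in apply_preference_items(items).items():
    print(setting, "->", value)
```

Here item 1 is applied first and item 3 last, so the value from item 3 is the one left in place for the H: drive.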

While I was trying to update an iLO from version 2.00 to 2.27 on an HP DL360 G6, I was seeing the firmware update fail in Windows, with an error saying that the hardware installed in the server was incorrect. Installing the firmware through the iLO web page itself just failed, appearing to time out when attempting to upload the image.

The fix for this, oddly enough, was to use Firefox. When doing the same update to the iLO in Firefox, the image did get uploaded and the firmware update then completed without any problems.

If you’ve ever been in a situation where you have a service falling over with no obvious cause, it might be some other service running under the same svchost process causing the failure. As it turns out the Microsoft Performance Team have a very handy guide on svchost troubleshooting.

This covers how to isolate the suspected service into its own process, even going as far as running it with its own svchost process, so it’s easier to see if it really is the service you suspect causing the problem, or something else. In my case I was trying to pin down a crash with the lanmanserver service, and this was very useful.

This policy setting allows you to specify the period of inactivity before Windows transitions to sleep automatically when a user is not present at the computer.

If you enable this policy setting, you must provide a value, in seconds, indicating how much idle time should elapse before Windows automatically transitions to sleep when left unattended. If you specify 0 seconds, Windows does not automatically transition to sleep.

If you disable or do not configure this policy setting, users control this setting.

If the user has configured a slide show to run on the lock screen when the machine is locked, this can prevent the sleep transition from occurring. The “Prevent enabling lock screen slide show” policy setting can be used to disable the slide show feature.

What I want to know is how on earth the system determines when it’s unattended. What if you’re watching a full screen video, is that unattended? What if you’re just running an Excel calculation, is that unattended?

I can find very little information, none in fact, on the Internet on how this is determined, but if anyone knows, please share.

I ran into a little problem today where I needed to add multiple DNS servers as name servers to multiple DNS zones all in one go. This is essentially adding NS resource records to a zone, but doing it for multiple zones at once. Yes, I could have done them manually, but that’s boring and time consuming. So, here’s a quick one-liner that does the trick; obviously substitute in your DNS server and name server FQDNs in the correct places. If it fails for any reason it will continue on, but report the zone it failed on.