Or at least not entirely useless

Many moons ago I decided to get serious about client health. First, because I kept finding endpoints that didn’t have the ConfigMgr client. Second, because those that did have the client were failing to install updates. In this post I’m going to walk through how I implemented a health script for my organization. I want to state early on that you may not agree with all my choices and that’s fine. I’m a very special kind of broken that sometimes makes me draw hard lines where others might not. Another important factor to understand is that we manage all of our servers with ConfigMgr, which led to some learning opportunities.

Sophie’s Choice

The first problem is of course what health script to use. The primary candidates are Jason Sandys’ Client Startup Script and Anders Rødland’s ConfigMgr Client Health Script. I’ve read through both scripts fairly extensively and both are absolutely excellent. I initially implemented Jason’s script in our lab as a startup script and it made some good headway. Jason’s script is VBS, which is a big plus because it will run on basically any OS that calls itself Windows. After a while, however, there were some remediation steps I wanted to try to automate. About that same time Anders released his script and was iterating at a pretty fast pace. So I dove into the code and decided it was something I could not only use but also improve in small ways. When I did, Anders was kind enough to merge those changes.

To reiterate, both scripts are great and using either of them is infinitely better than neither.

Startup, Logon, or Something Else?

When we initially set up Jason’s script we did so using Group Policy to run it at startup according to its namesake. However, we quickly found that to be a bad idea for our servers. If the ConfigMgr client isn’t installed or isn’t working for whatever reason then the device isn’t getting patched. If it’s not getting patched, it’s not getting rebooted. If it’s not getting rebooted, it’s not running the health script that would fix all of that.

We then followed Anders’ guide to configure it as a logon script but ended up with similar results. Taken as a whole, most servers don’t get logged into all that often. Crucially, there are plenty of forgotten servers whose purpose no one can even explain anymore. These devices are ripe for going out to lunch with no one but security noticing.

The above experience forced us to realize that we needed to run the script on a specified schedule rather than on startup or logon. At the same time, we didn’t want all our servers running the script at the exact same time. My organization’s VMware guys already hate me enough (they tell me it’s my face) and I wasn’t eager to give them even more reason to do so. Our solution was to use Group Policy to schedule the script to run daily on all devices but with a randomized start time.

The trigger of the scheduled task is daily at 8 AM with an 8 hour random start time.

Note: Make sure the preference action is set to Create. If you use Replace it will reset the randomized start time every time group policy is evaluated. That may prevent the script from running at all, or at the very least make it run much less frequently than intended.
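To get a feel for what the random start time buys you, here is a rough simulation in Python. It is only a sketch: it assumes each device draws its delay uniformly from the 8 hour window (I have not verified how Task Scheduler actually picks the delay), and the function names and device count are made up for illustration.

```python
import random
from collections import Counter
from datetime import datetime, timedelta


def randomized_start(base, max_delay, rng):
    """Return base plus a uniformly random delay between 0 and max_delay."""
    # Assumption: a uniform draw; the real scheduler may behave differently.
    delay_seconds = rng.uniform(0, max_delay.total_seconds())
    return base + timedelta(seconds=delay_seconds)


def starts_per_hour(device_count, base, max_delay, seed=0):
    """Count how many simulated devices start in each clock hour."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    hours = Counter()
    for _ in range(device_count):
        hours[randomized_start(base, max_delay, rng).hour] += 1
    return hours


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 8, 0)  # arbitrary date, 8 AM trigger
    window = timedelta(hours=8)        # 8 hour random start window
    for hour, count in sorted(starts_per_hour(2000, base, window).items()):
        print(f"{hour:02d}:00  {count}")
```

Under that assumption the starts land roughly evenly between 8 AM and 4 PM instead of piling up at 8 AM sharp, which is the whole point of the random delay.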

UNC Path? I Don’t Need No Stinking UNC Path

One of the things I wanted to avoid was needing to read or write to any kind of centralized folder. The typical way to run these scripts is to place them on a UNC path and then make sure every device has read access to it. I’m not a huge fan of that configuration. It makes my health check dependent upon reaching a particular path and every device is going to reach out every single time it runs the script. In the case of logging it means giving every device write access to that folder. The majority of our devices are workgroup devices so that meant totally wide open folder permissions. My security team shot that down with extreme prejudice. Beyond my general aversion to this kind of centralization I wanted the script itself to be local on every box to make it as easy as possible to run manually as part of our support process. Sure, you could go to the UNC path but we found having it local easier. We also created some simple batch files to easily run the script silently or verbosely. To get the script locally on every device we again used Group Policy.

The source files point to the policy’s own folder. This is conveniently replicated to all our domain controllers. So if you have remote DCs it’s about as close to distributed as you’re going to get. If you ever need to update the files you can simply modify the policy to disable the ‘Apply once and do not reapply’ setting, wait for the policy to replicate, then re-enable the setting. Below is an image of the files we’re copying to the devices. You’ll notice that I’ve included ccmsetup which is all you need to kick off the client install. The downside is that this should be updated when you update ConfigMgr itself. Note that if you want to distribute updates as well you need to create separate file policies as the wildcard used above will not recurse into subdirectories. In our case we have additional file policies for Win 7/2008 R2 servicing stack and Windows Update Agent updates.
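The wildcard behavior is easy to trip over, so here is a small Python analogy. It is not how Group Policy Preferences is implemented; it only mimics the behavior described above, where a wildcard picks up the files sitting directly in the source folder and skips anything in subfolders. The function name and the file names in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def copy_top_level(source: Path, dest: Path, pattern: str = "*") -> List[str]:
    """Copy files matching pattern from source to dest, without recursing."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(source.glob(pattern)):
        if entry.is_file():  # subfolders match "*" too, but are skipped
            shutil.copy2(entry, dest / entry.name)
            copied.append(entry.name)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "policy_files"
        (src / "Updates").mkdir(parents=True)
        (src / "health_script.ps1").write_text("# placeholder")
        (src / "Updates" / "update.msu").write_text("placeholder")

        print(copy_top_level(src, Path(tmp) / "device"))
        # The Updates subfolder needs its own pass, like a separate file policy.
        print(copy_top_level(src / "Updates", Path(tmp) / "device" / "Updates"))
```

The second call plays the role of the additional file policy: each subfolder gets its own source and destination pair.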

Here’s our configuration file for anyone interested. Notice how I’m using PowerShell environment variables in the config file. This was one of the enhancements I made to the script. It will attempt to evaluate these paths so that you do not need to hard-code them.
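For anyone curious what evaluating those paths amounts to, here is the idea sketched in Python. This is not the code from Anders’ script, which is PowerShell; it assumes the config values use the `$env:NAME` form, and the function name and the example path are hypothetical.

```python
import os
import re

# Matches PowerShell-style tokens such as $env:ProgramData
_ENV_TOKEN = re.compile(r"\$env:([A-Za-z_][A-Za-z0-9_]*)")


def expand_env_tokens(value, environ=None):
    """Replace $env:NAME tokens with values from the environment.

    Names are matched case-insensitively, as on Windows. Tokens with no
    matching variable are left untouched rather than blanked out.
    """
    env = os.environ if environ is None else environ
    lookup = {key.upper(): val for key, val in env.items()}

    def replace(match):
        return lookup.get(match.group(1).upper(), match.group(0))

    return _ENV_TOKEN.sub(replace, value)


if __name__ == "__main__":
    # Hypothetical config value and environment, for illustration only
    fake_env = {"ProgramData": r"C:\ProgramData"}
    print(expand_env_tokens(r"$env:ProgramData\ClientHealth\config.xml", fake_env))
```

Leaving unknown tokens in place is a deliberate choice in this sketch: a path that still contains `$env:` is an obvious clue in a log, while an empty string silently turns into the wrong folder.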

Logging? Who Needs Logging?

When we first implemented Anders’ script it supported writing its results to log files on a centralized UNC share or to a database. In both cases this meant giving write access to every device in your organization. As I mentioned above, my security team had a good laugh at this and when they wiped the tears from their eyes they politely told me ‘no’. Only they used much more colorful language when doing so. Beyond this, the logical place to put such a database would be on the SQL server used by ConfigMgr. However, if you’re using the SQL Standard license that is included with ConfigMgr then doing so is a pretty clear violation of that license based on the recently released FAQ on that topic. I have seen some people put their faith in wishful thinking on this front and decide that it’s ‘ok’ because it helps ConfigMgr. I strongly disagree; if you need convincing go read the feedback on the article for some pretty explicit clarification.

For the reasons above I have not centralized logging. That being said, the most recent version of Anders’ script solves these problems. First, he has created a custom web service that can be used to interact with the database. Instead of every single device needing write access, you now only need a service account for the web service to run under that has the appropriate DB permissions. Further, Anders has clarified that SQL Express is perfectly fine for this use case and it obviously doesn’t need a lot of resources. I plan on investigating this in the near future and highly recommend you do likewise. Being able to use the reports would be a huge bonus.

Stay On Target

Once I had all of this figured out we had to decide how to target this. It seems simple enough but I found myself having to defend my position a lot. In fact, I had to argue with Microsoft itself. My position being: this should apply to every device in the domain because it doubles as our client push. We regularly find endpoints that have not been created using our supported OSD methods and therefore lack the ConfigMgr client. Trying to target a specific OU or security group wasn’t going to help that problem. The whole point is that someone did something stupid and we aren’t going to know about it until security flags the box as vulnerable. As always, your mileage may vary. We only have a single class of devices that are exempt from being managed by ConfigMgr so it was easy to exclude them with a WMI filter on the policy.

The End

So there you have it, that’s how we got semi-serious about improving client health. It absolutely worked and in the first few days we saw a significant number of clients appear in the console. I’m absolutely positive not everyone will agree with my implementation method and that’s just fine. I offer this as merely one option among many and it’s what worked for my particular organization with its particular quirks based on a multitude of variables. With any luck, someone will find this approach useful for their organization.

It runs as system which I think means the other setting isn’t enabled. If not … I don’t remember and am no longer at that org to check. CCMSETUP is there to install the client if it’s not there or the script determines it needs to be re-installed. We used this technique as our way of enforcing that the client exists on every device.

Thanks – In the original script, the config.xml had a client > share property to point to the client install files. In which case you must have modified the script to point to the local ccmsetup, I’m guessing.

You don’t have to, that’s just a choice I made.
Assuming the device is joined to a domain then Local System (or really the Machine Account) can access UNC paths just fine … it’s just a matter of permission. With the correct permissions on the policy your endpoints can run the script from the same policy folder I referenced above. In fact, that’s how they’re pulling down the files in my scenario.

Great solution. I’ve been eyeballing Anders clienthealth for a while and your method got me moving on it finally. Did have a few differences. I was unable to get the GPP Copy to work using %PROGRAMDATA% variable so called out C:\ProgramData path. Guessing since this is still 2008R2 functional level. I put the files in netlogon with the client install for Jasons script vs in the GPO itself as you have done. Easier to maintain IMO. For the Updates I have that sourcing from netlogon via UNC as well. Added Windows 10 1809 folders for once it detects those. Ended up setting up Webservice as well so need to push WMF5.1.

You couldn’t use a CI as the only distribution method because that creates a Catch-22: you need a healthy client to run the CI to make sure the client is healthy. So if you used a CI it would only be in addition to some other method that makes sure the script gets onto devices that are missing the client or are broken. Now, what you could do is use a CI to report the results of the health script and thus get some level of reporting.

thanks for this. i’ve thought about this a few times and never looked into getting a script like this to run on the servers. this should help my workstation clients out a ton as well as most users do not logon/off very frequently.

Good question. For those I deliver the health script as an application. I should have outlined that part because it’s a significant portion of our environment as well. That’s not ideal, obviously, since you need a healthy client to install the health check. So making it part of OSD is important.