

If, like me, you find yourself working for a client using a profile/user environment management tool such as Liquidware’s ProfileUnity, you may come up against the problem whereby user-defined keyboard layouts repeatedly refuse to persist between sessions when using Windows 7 or Windows Server 2008.

To put it into context, a user logs on and decides to change their keyboard layout from English (UK) to French (FR) or German (DE) and happily works for the duration of their session. The next day, having previously logged off their non-persistent desktop/RDSH, they log on again only to find that their keyboard layout has reverted to English (UK). This goes on for a few days before the users get irate and find themselves experiencing Groundhog Day.

In my case, the client had implemented ProfileUnity having migrated away from VMware Persona (which is a slightly glorified version of MS roaming profiles) where the problem didn’t previously exist and logons reliably retained keyboard layout preferences. It’s a hard sell to tell a user when you introduce a new product that it has a load of new features but that it falls short on core features that were previously taken for granted.

Having spent quite a few hours hunting around for solutions and testing numerous configurations without success, I logged a ticket with support. They were really helpful, but between us we came to the conclusion that this was a limitation of the way the OS handles keyboard layouts when profile management tools are involved. I was now at the point of giving up. From a sequencing perspective, ProfileUnity was successfully retaining and restoring the personalised keyboard layouts, but the registry keys were being restored too late: Windows sets the keyboard layout before it hands off to ProfileUnity, so the layout injected by ProfileUnity simply wasn’t being honoured.

At that time I only had a few options left:

1) Use GPOs configured with registry preferences that force the user’s keyboard layout to a specific language, based upon item-level targeting driven by the user’s AD group membership (GPOs are processed early enough for the OS to honour the values).

2) Configure multiple mandatory profiles, each pre-configured with the required keyboard layout, and deliver them to users, again based upon AD group membership with item-level targeting.

3) Implement ProfileUnity ProfileDisk. This would keep the entire ntuser.dat file on a VHD that is mounted at logon. This would certainly overcome the problem, but unfortunately would introduce a number of limitations that Portability overcomes (Portability is the method used by ProfileUnity to selectively save and restore certain aspects of the profile).

As I’d discounted option 3, in the case of 1 and 2, this would mean that a user would lose their ability to retain and selectively choose their keyboard layouts and require administrative input to ensure they are placed in the right AD group. Not ideal by any stretch, but a compromise on at least providing a user with a consistent keyboard layout.

Before throwing in the towel, I finally stumbled across the following command from Microsoft:

control.exe intl.cpl,, /f:"C:\keyboard.xml"

Having read the documentation, when the command is combined with an appropriately configured XML file, it looks to be able to change the keyboard layout dynamically.

Excellent I thought! So I created an XML file (called keyboard.xml) with the following content in an attempt to flip the language to Swedish and ran the command above from Start –> Run:
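The exact file contents aren’t reproduced here. As a rough sketch only, and assuming the GlobalizationServices unattend format that Microsoft documents for intl.cpl, a file of this kind could be generated with a few lines of Python. The function name and the Swedish input locale ID (041d:0000041d) are assumptions for illustration, not the original file.

```python
from xml.etree import ElementTree as ET

# Namespace used by the intl.cpl unattend XML format (assumed from Microsoft's documentation).
GS_NS = "urn:longhornGlobalizationUnattend"


def build_keyboard_xml(input_locale):
    """Return XML text that adds one input locale for the current user and makes it the default."""
    return (
        f'<gs:GlobalizationServices xmlns:gs="{GS_NS}">\n'
        '  <gs:UserList>\n'
        '    <gs:User UserID="Current"/>\n'
        '  </gs:UserList>\n'
        '  <gs:InputPreferences>\n'
        f'    <gs:InputLanguageID Action="add" ID="{input_locale}" Default="true"/>\n'
        '  </gs:InputPreferences>\n'
        '</gs:GlobalizationServices>\n'
    )


SWEDISH = "041d:0000041d"  # assumed input locale ID for the Swedish keyboard

xml_text = build_keyboard_xml(SWEDISH)
ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed XML
print(xml_text)
```

Saving the printed text as C:\keyboard.xml would give the command above something to read; swapping the locale ID is the only change needed for another layout.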

At first, it didn’t appear to do anything, so I clicked Start –> Run again, only to find that in the context of the Run dialog, I was in Swedish! Great result, but I was only in Swedish for that one specific window. If I clicked on any other window, it would revert to English (UK).

I suddenly had a brainwave and figured that if I ran the same command during the logon process, using the “Application Launcher” mechanism within ProfileUnity (to let me run this when I want) it could force the correct application of the required keyboard layout before the Windows Shell is invoked.


A client of mine recently reported regular occurrences of high CPU usage in their virtual desktops as a consequence of TrustedInstaller appearing. They couldn’t fathom out why it would crop up and it seemed to be at random.

It didn’t take long to track down offending machines as their performance metrics in vCenter would put their CPU usage at the top of the list so as soon as one cropped up, I started remotely querying the machine and trawling through the Application Event Logs.

Each of the VMs was a 2 vCPU Windows 7 machine and, as can be seen in the image above, the process would effectively take an entire vCPU (50% of the total available to the machine in this case).

After doing so for about 5 machines and correlating the CPU Time for the executable back to the Application Event Log time (in the example above, I worked back 49 minutes), they all pointed back to the installation of a Bomgar Support Customer Client, which was being installed while the support team were helping customers with other issues. The irony was that once the support engineer had resolved the user’s issue, they were effectively leaving the user with half of their CPU, because the installation of the remote control agent triggered TrustedInstaller.exe.

I validated my findings by asking to be sent a remote support request and lo and behold the same problem appeared. At least now we had a way to make the problem repeatable. After that, I ran another user permitted executable that triggers the Windows Module Installer service (TrustedInstaller.exe) and precisely the same problem appeared. I now knew it wasn’t limited to the support tool, but a generic problem at the OS layer.

Knowing that TrustedInstaller writes to the following log file C:\Windows\Logs\CBS\CBS.log, I took a look inside to find there were no specific issues reported other than a mention of “Scavenge”

This task, however, never seemed to stop: despite leaving a machine for 24 hours, the service never got to the point where it could terminate gracefully. At that point I realised something was possibly out of place with respect to the Windows Update process and, in my frantic search, stumbled across the Windows Update Troubleshooter.

After hitting “Close” on the troubleshooter, I noticed that TrustedInstaller.exe was still running so left the computer for about 10 minutes at which point the process closed. I went back to the CBS.log file and found that a subsequent Scavenge job had run again but this time had completed.
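Checking CBS.log for Scavenge entries across several machines gets tedious by hand. A minimal sketch of how the check could be scripted is below; the helper name is hypothetical, and the only assumption about the log format is that the relevant lines contain the word “Scavenge”.

```python
from pathlib import Path

# Log file written by TrustedInstaller, as named in the post.
CBS_LOG = Path(r"C:\Windows\Logs\CBS\CBS.log")


def scavenge_entries(lines):
    """Return the lines that mention 'Scavenge' (case-insensitive), without trailing newlines."""
    return [line.rstrip("\n") for line in lines if "scavenge" in line.lower()]


# Only reads the real log when run on a machine where the file exists.
if __name__ == "__main__" and CBS_LOG.exists():
    with CBS_LOG.open(encoding="utf-8", errors="replace") as handle:
        for entry in scavenge_entries(handle):
            print(entry)
```

Comparing the timestamps of the first and last entries printed gives a quick feel for whether a Scavenge job ever finished.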


I’ve come across this problem under a few scenarios before so thought I’d document the how and why of the problem here.

After configuring a new Desktop Pool which was designed to be a failover Pool for another Site, the first thing that was supposed to be done was to disable it. Unfortunately, when trying to do this basic operation, the following error message was displayed:-

“Error. Not all desktop pools were disabled due to an error”

Having seen this before, it transpires that it comes about because 3D Rendering is enabled on the pool, whilst “Allow users to choose protocol” is still set to Yes. I’m advised by VMware GSS that this is a regression bug (one that was fixed before but has crept back in)

Technically speaking, this isn’t a valid configuration as the documentation states it’s not allowed. However, the setup wizard allows you to save the configuration that way and the error message gives no hint of the real root cause. To figure this out, I found the exact error within the logs on the Connection Server:

C:\ProgramData\VMware\VDM\Logs\log-xxxx-xx-xx.txt (where xxxx-xx-xx is the date)


I got a call the other day advising that AppVolumes was “broken” when trying to update an existing AppStack. I quickly hopped onto the environment and noticed the Activity Log showed the error “Update Appstack xxx – Failed 5 times” as can be seen in the image below.

Knowing my way around the log files pretty well, I took a look at the Server logs and noticed the following error:-

What this came down to was that the update was trying to use a filename that already existed on the underlying datastore, left over from a previously updated version. So, to fix this, I changed the name of the AppStack as part of the update process. As soon as the AppStack name was changed, the error message was no longer presented and the update became available to be provisioned for use.

I’m going to raise a Feature Request with VMware and ask them to include a level of error checking to ensure that duplicate file names cannot be entered and at least produce an error message when trying to commit an updated stack rather than waiting for the process to fail and having to trawl through the Server logs.

VMware have come on leaps and bounds in a bid to try to compete with the long standing success of Citrix in application delivery. However as a product looking to challenge the market, there are still some significant product limitations that affect its ability for uptake in the Enterprise environment.

I was looking to deploy a basic published application so that the local device would automatically launch the remote application in preference to the older version of the same application installed locally. During the deployment process, the FTAs (File Type Associations) were updated in order to force the file to open in the published application. Initially this proved to be successful when the source file (an .xlsx file) originated on the local device’s C: drive.

It was only when I subsequently tested the opening of a file that was located on the Desktop or in My Documents that I was presented with the error:-

“There was an unexpected problem opening the file. The file may be missing or have misconfigured permissions. Contact your administrator for more information”

In the first instance, I thought there was something wrong in my configuration and deployment, but upon reviewing the log file on the Server hosting the published application, it turns out that the error translated into something more sinister:-

As you can see from the log excerpt above, the process appeared to be trying to open the file using CDR (client drive redirection) by prefixing the file path with \\tsclient, as it would normally do successfully if the file were located on the local client device. The catch here, however, was that the file didn’t reside on the local device. It actually sat in a redirected folder, courtesy of standard Microsoft Folder Redirection Group Policy pointing to a DFS-hosted network share referenced by a UNC path.

I tried a few other tests to make sure it wasn’t just a problem with the folder redirection, so opened up a direct UNC path to a file sat on the DFS root \\domain.name\DFS\FileShare\Text.xlsx and the same problem occurred.
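To make the distinction concrete, here is a small sketch of the two cases. It is not how Horizon implements CDR; the function name and the \\tsclient\<drive letter> form of the rewritten path are assumptions for illustration. It only shows that a drive-letter path has something to map onto the client, whereas a UNC path has no client drive at all.

```python
import ntpath


def client_redirected_path(path):
    """Return an assumed \\\\tsclient form for a local drive path, or None for a UNC path."""
    drive, rest = ntpath.splitdrive(path)
    if drive.startswith("\\\\"):
        # UNC path (\\server\share\...): there is no client drive letter to redirect.
        return None
    if len(drive) == 2 and drive[1] == ":":
        # Drive-letter path such as C:\Temp\file.xlsx
        return "\\\\tsclient\\" + drive[0].upper() + rest
    return None


print(client_redirected_path(r"C:\Temp\Text.xlsx"))
print(client_redirected_path(r"\\domain.name\DFS\FileShare\Text.xlsx"))
```

The second call returns None, which mirrors the failure: a file on a DFS share, or in a redirected Desktop or My Documents folder, never lived on a client drive in the first place.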

After I raised a ticket with GSS, they suggested I map the DFS root with a drive letter and then permit the pass-through of that drive letter to the published application, which did of course work. However, this isn’t practical and is in fact inefficient, as the file is pulled from the network share to the client and then back from the client to the published application. Realistically, no Enterprise environment should be expected to change its drive letter standards to work around a feature that I believe should be native to the product. Citrix XenApp and Microsoft RemoteApp can handle this with relative ease, so I’m really surprised VMware have dropped the ball on something so simple, and for so long, given that application publishing has been in their portfolio since 2014!

The latest news I’ve heard is that this is now in product development with the aim to deliver in Q2 of 2017. If you are considering VMware Application Publishing and rely on redirected files and folders with UNC paths, I would urge you to be extremely mindful that this could be a serious blocker for the uptake of this platform, both on-premises and in the cloud.


As the client had sufficient swing space in terms of compute, it permitted the deployment of a parallel implementation, allowing a clean slate to be deployed where required. When performing project work like this, it is always my preference to work this way so as to provide a rollback, allow sufficient testing time for environmental comparisons and avoid bringing in “customisations” that grew with the legacy environment and are no longer applicable in the new one.

The underlying Hypervisor environment and swing space hosts were built from scratch and provided with sufficient storage to meet the needs of the deployment. The additional components were built out as follows:

EUC:
AppVolumes 2.11 to meet the requirement for Instant Clones (2.10 doesn’t work with Instant Clones)
Horizon 7.0.1 (latest release at the time of deployment)
Access Points 2.5.2 (latest supported release at the time of writing)

Where supported and required, virtual KEMP Load balancers were used internally in front of the Connection Servers and AppVolumes Managers as well as externally in the DMZ for the Access Points.

I won’t go into the detail of the installation as there are many online tutorials on how this is done but I will elaborate on some of the challenges and compatibility issues faced on the way.

Instant Clones is not compatible with VMware Persona management (Page 16, under restrictions). As the initial aim was to deploy Horizon 7 to ensure support compliance, there was no quicker way than to retain Persona as UEM would take far more planning and effort than timeframes would allow. UEM would follow as a later sub-project so this took Instant Clones off the table immediately.

Scope creep resulting in Office activation prompts with certain AppStacks. During the deployment of Horizon 7, Microsoft Lync was introduced, which meant that in addition to running Microsoft Office 2010 in the base image, components of Office 2013 were now present too. As the AppStacks had originally been sealed with only an Office 2010 KMS key present, applications delivered by AppVolumes needed to be re-provisioned wherever there was a KMS overlap. Until then, the OS would try to re-activate Office applications post logon, which caused no end of grief as we muddled through 100+ AppStacks to eliminate those that would present the activation prompt.

Sessions would randomly drop and users were unable to reconnect when using PCoIP. One of the most bizarre problems was random session disconnects which left users unable to reconnect. If the user switched to using HTML access, they were able to reconnect to their session, save their work and log off to create a new PCoIP session. Eventually, after running Process Monitor against a faulting virtual machine, I identified the fault as being down to excessive logging by the TPAutoconnect service, which would result in the log files filling the non-persistent disk. This only happened with users who either worked remotely or had printers attached to their endpoint device (there were very few, as 90% of the user base were on Zero client devices). To date, there hasn’t been a resolution to the growth in logging levels, but as a workaround, persistent disks were removed and the OS disk size increased. It has been noted that there is an uplift in I/O and an SR is still active with GSS.

AppVolumes 2.11 – bug with filter driver causing the error “Item not found” and “Could not find this item”. We have scripts that create and delete files to validate certain conditions for virtual desktops but, randomly after introducing AppVolumes 2.11, the log files would throw up an error. We could also reproduce this problem by manually creating a file on the C: drive, then deleting it, resulting in the same Windows error. VMware subsequently issued me with the AppVolumes 2.11.1 patch release to allow us to continue with deployment.

Error occurred during vCenter operation. This error occurred when a default linked clone Desktop Pool was configured with 3D rendering (any option) and the “Allow users to choose protocol” option was changed to “Yes”. Bizarrely, the wizard permitted this configuration but any further actions on the pool resulted in the error. A quick check of the logs on the Connection Server revealed an error stating that it was an invalid configuration (as per the documentation), but why this couldn’t be translated into a meaningful message in the console, I’m not quite sure.

P25 Zero Clients: the primary monitor either does not correctly detect the resolution (going into 1024×768 rather than 1920×1200) or, during login, the screen goes black, returns, hangs on Welcome, goes back to black and eventually returns to display the desktop. (I’m guessing the sequence of events is part of the sync that takes place as it detects the primary/secondary monitor resolution.) This was random, and upgrading to Horizon View Agent 7.0.2 eventually appeared to resolve the problem.

Remote sessions would disconnect after 10 hours – this came down to a hardcoded and unchangeable value in VMware Access Point 2.5.2. This has since been resolved in VMware Access Point 2.7.2, which has now been deployed, as users would work for 12+ hours and the hardcoded value was not acceptable.


I’ve been off the radar for a few months as I’ve been working with a client to migrate from the now unsupported VMware View 5.3. The client had an ELA with VMware, had recently made an investment in an uplifted on-premises compute environment, and their capability and non-functional requirements were met by VMware. It therefore made the most sense from a cost perspective to provide them with the most recent version of VMware’s EUC platform, deploying desktops as they currently do to permit delivery in the shortest time frame by maintaining the existing desktop image, but with the adjustments needed to complement the upgrade.

Before embarking upon the challenge that lay ahead, there were some immediate hurdles to consider:

I thought I’d see if I could break my environment again by starting from vSphere 6.0U1 and attempting to perform a completely hassle free upgrade to VSAN 6.2. My previous two posts have come back with problems so 3rd time lucky right?…

As before, I upgraded the VCSA (which was running as a VM within the environment I’m upgrading) from 6.0U1 to U2, then upgraded the hosts and rebooted everything to give me a clean slate. I migrated across a couple of VMs from another environment and did some typical admin tasks such as cloning, vMotioning, powering down etc.

I hit the magic “Upgrade” button under VSAN Management and thankfully didn’t get any Task related errors. However, out of nowhere, the following error appeared : Cannot Upgrade the Cluster: Object(s) xxx.xxx.xxx are inaccessible in Virtual SAN.

Wow – third time lucky and a different error! This VSAN upgrade process is starting to become frustrating (umm, how much testing took place before it became GA?).

Anyhow, having seen a similar error, albeit displayed in a different way in this post, I figured I’d see what objects this particular upgrade failure attempt related to. As before, I SSH’d to the VCSA, RVC across to the Ruby Shell (other post has more instructions) and then ran:-

vsan.object_info BRAINS/ fcfbcd56-5731-55b0-42bb-0c4de9cd75c8

This time around, the objects seemed to be complaining about the lack of a Policy Entry within the CMMDS and LSOM Object Not found:-

As the objects were a) not found and b) reported to have a usage of 0.0GB, I figured I’d get rid of them! So I SSH’d across onto an ESXi host (yes, an ESXi host; people have tried this on the VCSA and it just doesn’t work) and ran the following command, replacing the Object ID where necessary. (Note this is a homelab; proceed with caution.)

The output reported “Successfully deleted” so I re-ran the upgrade and everything completed successfully.

While the battle to perform a seamless upgrade has been taking place, I have been made aware of a bug, currently being written up as a KB article, where the deletion of objects leaves orphaned objects behind that result in problems such as these. As soon as I have more information, I’ll link it to the posts.

As per my previous post, I’ve had a few niggles getting VSAN 6.2 to work in my lab environment so I rebuilt things from scratch to try and see if the second attempt was less troublesome.

I started with vSphere 6.0 U1, built the VCSA, added three hosts on 6.0U1 and installed Horizon View 6.2. During the configuration of Horizon I elected to allow the first Desktop Pool to use the vsandatastore, which subsequently creates associated Storage Policies to match the requirements of the various components for View (Replicas, OS Disks etc).

After spinning up a couple of Win 7 Virtual Machines, I embarked upon the same 6.2 upgrade. This time, I didn’t experience any “failed to realign following object” errors when attempting to upgrade to VSAN 6.2 as per this post, but I did receive the error “A general system error occured: Failed to evacuate data for disk uuid 522b9a6e-093b-a6c0-01b8-a963ac325bed with error : Out of resources to complete the operation”

I realised my mistake here: as this is a 3 node cluster with FTT set to 1, every host is already needed to hold the object components, so there was nowhere to evacuate data to. I subsequently went into the VSAN Storage Policy and attempted to drop the vSAN default policy FTT to 0 (accepting that my environment would be at risk until I switched it back) and applied the newly defined Storage Policy, only to find that everything was suddenly reporting as “Not Applicable”. I checked the Resynchronize Dashboard status within VSAN and noticed that it was still churning away applying the newly configured Storage Policy. Once the resync completed, I tried the upgrade again, but this failed! I SSH’d over to the VCSA and then into the Ruby Console:-

rvc administrator@vsphere.local@localhost

I navigated to the folder /localhost/Datacenter/computers (using cd /localhost/etc) , and ran the following command (where BRAINS is the name of my Cluster)

vsan.ondisk_upgrade --allow-reduced-redundancy BRAINS/

The upgrade proceeded and eventually completed (wow, took 8 hours)

After completing the upgrade, I realised my mistake. As per the first section of this post, I’d installed Horizon View, which created a bunch of VSAN storage policies. Whilst I’d changed the default vSAN policy to an FTT of 0, I completely forgot to set the others to 0 as well!

Note to self – always check ALL storage policies before assuming that something else is broken – OR, take the simple route and force the upgrade to run without the need to re-sync a shed load of data between disks due to a storage policy change.
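The arithmetic behind the “Out of resources” failure can be sketched in a few lines. This assumes the default mirrored (RAID-1) layout, where tolerating n failures needs 2n + 1 hosts, and a lab with one disk group per host, so that evacuating a disk group takes a whole host out of play. The function names are made up for illustration.

```python
def hosts_required(ftt):
    """Hosts needed to place a mirrored object: FTT + 1 replicas plus FTT witnesses."""
    return 2 * ftt + 1


def can_evacuate_one_host(total_hosts, ftt):
    """True if the remaining hosts can still satisfy the policy while one host is emptied."""
    return total_hosts - 1 >= hosts_required(ftt)


for ftt in (1, 0):
    ok = can_evacuate_one_host(total_hosts=3, ftt=ftt)
    print(f"3 hosts, FTT={ftt}: needs {hosts_required(ftt)} hosts, evacuation possible: {ok}")
```

With three hosts, any policy still at FTT=1 blocks a full evacuation, which is why leaving the Horizon-created policies untouched undid the change to the default policy.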


As an eager VSAN user in my homelab I was very keen to get upgraded to VSAN 6.2 so that I could start to benefit from the new feature set. Following a successful upgrade of the VCSA and the associated Hosts (which I had planned on documenting and may well get round to doing so shortly), I was all prepared and duly pressed the “Upgrade” button on VSAN only to hit an immediate blocker:-

“General Virtual SAN Error” – Failed to realign following Virtual SAN objects: 7ef7a856-333c-7f40-4dcd-0c4de99aaae2 due to being locked or lack of vmdk descriptor file, which requires manual fix

Google produced nothing because the product was less than 24 hours into GA so time to polish up my Ruby skills.

I ssh’d across to my VCSA and from within the Appliance Shell (not Bash shell), logged into RVC using:-

rvc administrator@vsphere.local@localhost

I entered my local sso password and was presented with the RVC shell.

Having used the RVC console as a consequence of some other troubleshooting efforts with VSAN previously, I knew my way around and immediately changed directory to the Datacenter level:- (you can browse your Virtual Center tree like a folder structure with ls inside Ruby)

cd localhost/ACM Computers/computers

Once inside the datacenter and computers folder, I ran the following command to include the Cluster name and one of the UUIDs provided in the error message “7ef7a856-333c-7f40-4dcd-0c4de99aaae2”

vsan.object_info BRAINS/ 7ef7a856-333c-7f40-4dcd-0c4de99aaae2

This returned the following helpful output:-

This showed me that the object that the upgrade was being blocked by was the ACM-ADC-V001 virtual swap file.
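When an upgrade error lists several UUIDs, typing each lookup by hand is error-prone. As a convenience sketch only, the vsan.object_info commands can be generated from the error text; the helper name is hypothetical and the UUID pattern is inferred from the IDs shown in these errors.

```python
import re

# 8-4-4-4-12 hexadecimal groups, matching the object IDs shown in the error messages.
UUID_PATTERN = re.compile(r"[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}", re.IGNORECASE)


def object_info_commands(error_text, cluster="BRAINS/"):
    """Return one RVC vsan.object_info command per unique UUID found in the error text."""
    uuids = list(dict.fromkeys(UUID_PATTERN.findall(error_text)))  # de-duplicate, keep order
    return [f"vsan.object_info {cluster} {uuid}" for uuid in uuids]


error = (
    "Failed to realign following Virtual SAN objects: "
    "7ef7a856-333c-7f40-4dcd-0c4de99aaae2 due to being locked or lack of "
    "vmdk descriptor file, which requires manual fix"
)
for command in object_info_commands(error):
    print(command)
```

The printed lines can be pasted into the RVC shell from the computers folder; the cluster argument defaults to the name used in this lab.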

I quickly ran a health check to ensure that the entire VSAN cluster hadn’t got some inaccessible objects as there had been issues with vswp files historically in earlier VSAN releases.

vsan.check_state BRAINS/

but this returned healthy:-

I powered down the associated VM which appeared to remove the VSWP and the LCK file and re-ran the upgrade attempt. It failed again!

So, now to attempt manual object removal! (Please note, I do NOT recommend doing this without GSS; this is my home lab so I did it off my own bat.) It seems that the vswp was stuck within the object-based file system, so I SSH’d across to an ESXi host (not the VCSA) and ran the following:-

Good news: the file was successfully deleted. But OH NO! On re-running the upgrade, the bad news was that whilst the UUID didn’t appear in the failure list, it did fail with 3 other UUIDs. So I repeated the earlier steps to determine what objects these were, and they all happened to point to a folder called “cloudvolumes”, within which there are a number of pre-created template files:-

template_uia_plus_profile.vmdk
template.vmdk
template_uia_only.vmdk

This folder and its files exist because I use AppVolumes, so for me I simply deleted these files directly from within the Datastore file browser and re-ran the VSAN Upgrade (I can recreate these later).

As soon as I completed this and re-ran the Upgrade process, it completed successfully!

I wonder if AppVolumes isn’t VSAN aware? The real issue here, I would imagine, arises if you have created multiple AppStacks that are placed within the same folder structure, as they aren’t so easy to just go ahead and remove! Time for a ticket with VMware? Anyone in production with similar issues?