Meta

Archive for the ‘Troubleshooting’ Category

A co-worker of mine was troubleshooting an issue where a user was unable to open images in the Windows Photo Viewer. Whenever she would try to view an image, the Photo Viewer would open but the image would never load; instead the image was replaced by the spinning blue circle. There was no issue opening the image with other image viewers. To troubleshoot, I fired up Process Monitor and captured a trace of the activity for a few seconds while trying to open the image.

Before starting, its worth noting that the Windows Photo Viewer is not an exclusive process that runs out of C:\Program Files (x86)\Windows Photo Viewer. It is actually executed by dllhost.exe, a host processes that is used by other processes (in this case explorer.exe) to launch code. If something were to go awry with hosted code, then it would only affect that instance of the host process and not explorer, which might crash the desktop.

After a few seconds of capture, I stopped collecting activity, went to Tools > Count Occurrences > Column: Result. My guess was that the Windows Photo Viewer was looking for something that it needed to properly load the images but could not find it. I selected the PATH NOT FOUND value and applied it. There I could see that dllhost.exe was unable to locate some files in C:\Windows\System32\spool\drivers\color.

I looked at a couple other systems and saw that this folder existed and was populated with the missing files the COM surrogate was looking for. I wasn’t sure how, why, who deleted this folder but I copied it to the problem computer, opened an image with Windows Photo Viewer and it was working.

Recently I needed to update my Office 2013 App-V 5 package. This is done via the Microsoft ODT tool by first downloading the content and then packaging using the /packager command. The download went fine, but upon running the packager phase the command would immediately go straight back to the prompt, which is an indication that something went wrong because the packager phase usually takes a good 10 minutes to run. I checked and double-checked the xml files used to build the package but it looked good. Troubleshooting the ODT setup is pretty straight forward, however; simply locate the setup logs in %temp%. They are named by computer name name + date.log. In my case, I kept encounter the following line in the logs:

This wasn’t completely unexpected as I have used this computer previously to create my Office 2013 App-V packages. But I had never encountered it so I was not sure where to look to clean up the previous install. Enter Process Monitor. I fired it up, reproduced the issue, stopped the trace. Although the trace was only a few seconds, this will likely result in thousands or ten of thousands of operations being captured. This “noise” can quickly be cleaned up, though, using filters. I figured the likely culprit would be in the registry, so I started with limiting results to RegQueryKey. From there, I filtered for the only processes involved. With ODT, this is setup.exe which then calls OfficeClickToRun.exe.

From here, I could see HKEY_CURRENT_USER\Software\Microsoft\Office\15.0 was being touched. Note, that whatever version of Office you are packaging should not be installed on the machine where you are doing the packaging so any key created here would have been done by the ODT setup.

I am a big fan of debugging tools, especially WinDbg. These tools can be quite complicated but most of the time I use them simply for a “one-dimensional” analysis of an application or system crash and use the information from the dump to find a cause and hopefully the most logical fix. In the example here, a user complains that PowerPoint-Excel is often crashing or becoming hung. I was able to locate a couple of errors in the Event Viewer > Application Logs that indicated a crash had taken place within the MS Office product, but it was too generic and failed to resolve to the failing component. I asked the user to manually create a crash dump for me the next time the issue happened by using the Task Manager > Applications > right-clicking the unresponsive PowerPoint or Excel application > Select Create Dump File. The dump file gets sent to the user’s temp directory, e.g. C:\Users\username\AppData\Local\Temp\AppName.DMP.

I copied the dmp and opened with with WinDbg. Note, if you are analyzing a 32bit application dump (often the case with MS Office – because who actually uses 64bit office?) you will want to use the 32bit version of WinDbg. Or, if you are incredibly lazy like me (can’t figure out what happened to my x86 version), you can use the 64bit WinDbg and change it use x86-based processor mode. Just run .load wow64exts followed by .effmach x86, followed by the always faithful !analyze –v –hang.

This resulted in following output (Including only the stack text since everything before this is not useful) :

Stacks are read from the bottom up so you can get a very basic idea of what is happening as you climb the ladder. We are really only interested in the component details … and maybe the method-routine of that component (assuming you have the symbols). For example, we start with the main PowerPoint component, PPCORE (everything after the ! is the method or routine). That calls into OART, which is the Office Art components of the MS Office suite, which goes into RICHED20, which I am guessing is something like rich text box component, then to another office component, MSPTLS, and again into another MS Office art component, GFX. After GFX you can see a couple exceptions (errors) are encountered before passing back into the main PowerPoint dll.

Hint: if you are not sure what these components are or belong to, open a new command browser in WinDbg and use the lmvm command to get the details of it, e.g. lmvm GFX:

So, from the stack above, my first guess is that we have an issue with some graphic module(s) – and maybe to a greater part the video subsystem (not necessarily the fault of MS Office). To confirm, I can look at all the thread stacks by dumping them with ~*kb. Reading from the bottom up, it looks like the problem starts with thread 15 (similar to what !analyze –v –hang shows):

At this point, I decide to check for a later version of the graphics drivers for the users computer and found a newer driver release. Install, reboot, and no further Office applications reported since. Altogether, the process of identifying the basic issue itself took no more than 10 minutes, all without any software engineer credentials.

I recently began encountering a problem where one of our application packages was getting hung during the install process. I could see the program directory correctly populated with application files but it seemed to get hung at end, and msiexec processes continued to run indefinitely. After manually killing each process, I checked to see if the application actually completed installing. When trying to run one of the applications that has a dependency on one of the failed MSI’s APIs, I encountered the following:

Further, Outlook was crashing when trying to load the application’s main component.

Afterwards, the application with the dependency launched without error and Outlook was no longer crashing. However, the root issue still needed to be identified as I had no idea what else in the MSI was failing to complete. After manually deleting the components from the file system (it was missing from Programs and Features), I ran the MSI again and this time used Process Explorer to look inside the hung msiexec processes. There were multiple processes running, but I could see one process, though, eating CPU resources, and this was likely where I might find the culprit:

Opening the process details, I select the Threads tab. The thread that is doing all the work (or in this case getting hung up) is indicated by the Cycles tab. From here, I opened the thread by selecting it and clicking Stack:

Stacks are read from the bottom up. You can see a 3rd party component here (HCApi – McAfee HIPS) at play. Note, this might not be entirely abnormal as you can always expect any anti-virus suite to be hooking itself into any number of processes. So, to confirm what I was seeing, I used the Task Manager to dump the msiexec process that was using the CPU time by right-clicking it and select Create dump file (which can also be done via Process Explorer):

Dumps can be analyzed with WinDbg or a tool like DebugDiag 2.0. I have grown increasingly lazy and forgetful over the years with WinDbg. It has a very high learning curve and if you don’t use it much, its easy to forget everything except the old faithful !analyze –v or !analyze –v –hang. I used DebugDiag instead. When it is installed, simply right-click the dump file and select Analyze Crash/hang Issue from the context menu and point it to the crash file.

It will do its best to figure out the issue in the most vague way and you almost always need to do some interpretation of your own. For the most part, the analysis summary can be ignored.

A little bit down in the report points me to what I was seeing in Process Explorer with the problem thread that was using all the CPU time:

You can click the Thread ID like a hyperlink to follow it down in the report, expanding the thread. It reveals a familiar site. However, there is a bit more insight as the entry point is revealed and I can see the problem has something to do with the CustomAction table of the MSI itself.

Through a quick process of elimination using Orca to remove some of the rows in the CustomAction table, I narrow down the cause specifically down to action ISSelfRegisterCosting.

Without that row, the MSI install completes normally; or so at least it seems. I have no idea what removing this action might have elsewhere so this merely a hack. To further confirm the problem is being caused by McAfee, I perform the install on a virtual machine where McAfee is not installed and it proceeds normally. I then reach out to our McAfee enterprise admin and ask him to disable HIPS on one of my workstation. After doing so, the MSI runs and the application installs as expected. He inherits the problem.

Update

This has since been corrected by MacAfee with a HIPS update. One of the DLLs trying to get registered was getting blocked.

The symptom: Internet Explorer 11 and Outlook were crashing*. This was happening in all cases after applying a new task sequence. The event viewer did not produce any events that would help so I configured the system to capture crash dumps for applications. After rebooting, I reproduced the crash by simply opening IE an grabbed one of the dumps and ran it through an x86 debugger (by default, the 32bit Content tabs in IE 11 run as 32bit processes). Using ! analyze –v command produced the following output:

A good guess is that the issue we are encountering is graphics driver related.

I wanted to see if that was the case with Outlook as well. However, that would require a different approach as Outlook was not producing any crash dumps to run through the debugger. Although there are several ways to work around this, the simplest and most GUI friendly is to use DebugDiag from Microsoft. I wrote about this sometime ago and it has since moved to version 2.1 but the approach I still the same for the most part. I simply ran DebugDiag Collection, created a crash rule, specified outlook.exe as the process. After starting Outlook, several dumps were generated. When complete I then launched DebugDiag Analysis, selected the last dump and then selected Start Analysis. The report generates as an htlm file and opens automatically in IE.

The analysis takes some of the guess work of how to proceed when you have little or no understanding of WinDbg. I could see my suspicion that the issue common to both IE and Outlook was the same, which does not surprise me as they both share common modules.

As a proof of concept, one way to test issues you suspect may be graphics related to IE if your system uses dedicated graphics is to disable GPU rendering in IE. This is done by going into the Control Panel > Internet Options > Advanced > and checking Use software rendering instead of GPU rendering (note, although the asterisk indicates you should restart your computer, only restarting IE may be required). In my case, this corrected the problem with IE so I was sure now that the issue was related to the graphics card. Before updating the driver, I checked the display information to see if perhaps the drivers were missed and maybe generic drivers were installed. This didn’t seem to be the case:

I would still update the drivers anyway. After doing so, both Internet Explorer and Outlook opened without crashing. Interestingly, after updating the drivers, the workstation reported a different video adapter model:

In the end, the resolution was to install the correct graphics drivers during the task sequence.

Note, in some cases the crashing application may report via WER (windows error report) popup. If you examine the details, you may note that the fault module name contains a stackhash code a7aa, which is often a indicator that the issue is likely related to a driver issue for the video subsystem:

*Oddly enough, I could not reproduce the issue when connected remotely using RDP. This should have been a clear indicator as to what the issue really was because RDP does not use the host system’s graphics drivers but those of the connecting system instead.

This is a case where the Application Event logs came in handy when my application started to crash upon loading on our Windows 8.1 images. Since my app was compiled as .NET, the stack in the error details pointed me to problem object:

When TextBlock3 loaded it was doing Environment.GetEnvironmentVariable(“SomeVariable”). The variable in question did not exist on our Windows 8.1 images. Although this did not seem to pose an issue for my Windows 7 dev box (it just would not display the TextBlock), Windows 8 will throw an exception if the variable does not exist and the app will crash. The likely cause my dev PC did not encounter this exception when running was because .NET framework 4.5 was not installed whereas .NET 4.5 is the default framework in Windows 8.

While in the process of composing an web-based email today via Internet Explorer, I noticed something odd when I went to attach a file to the message. In the IE file upload dialog there were a series of files listed on the desktop that I knew were not really there. Judging by the dates they had been hanging around for some time now but I had never noticed them (and, no, they were not hidden or marked as protected operating system files):

The files themselves were no mystery; they are nothing more than text files that dump process, thread, and stack information to the desktop when the JRE crashes. I tried to attach one of these to the web message but nothing would happen. However, from within in the dialog box I could right click and open them like any other files. Attempting to move the file also failed because the file itself apparently didn’t exist:

If that was the case, I should then get the same message trying to copy it back to the desktop. Instead, the dialog this time indicated it did exist:

I wondered if I could access these ghost files from outside IE by, for example, attaching them to a message in Outlook. Not surprisingly, the same set of files did not appear from the desktop location:

This was likely something specific to IE. To determine where the files were actually residing, I turned to Process Monitor, started a trace for file activity and proceeded to open one of the ghost files via the IE File upload dialog box. Afterwards, I stopped the trace and did a search for the file in question.

There was half the mystery, the files actually resided in C:\Users\username\AppData\Local\Microsoft\Windows\Temporary Internet Files\Virtualized\C\Users\username\Desktop. The other half of the mystery was then just a matter of some quick research into IE and Virtualized folders. That lead me to this MSDN article Understanding and Working in Protected Mode Internet Explorer. In short:

A Compatibility Layer handles the needs of many existing extensions. It intercepts attempts to write to medium integrity resources, such as the Documents folder in the user profile and the HKEY_CURRENT_USER registry hive. However it will not intercept writes to system locations like Program Files and HKEY_LOCAL_MACHINE. The compatibility layer uses a Windows Compatibility Shim to automatically redirect these operations to the following low integrity locations:

The article, however, doesn’t cover the other scenario. Local admin and domain techs accounts could logon but not standard users accounts. This is often due to file permissions. For whatever reason, a system-admin protected file was created deep into the profile of the Default user account. This is where all new profiles are created from, and Everyone\Users must have at least read permissions for a new profile to be successfully spawned from the Default. The details are alluded to in the Windows Application event logs, event ID 1509:

Looking at the problem file, we can see that the file has no permissions for Everyone\Users:

The solution is to delete the file (if non-essential) or add the correct user(s) and permissions. And depending on the location, you may need to uncheck Hide protected operation system files in Windows Explorer.

Recently, we were testing a new remote desktop application but I was experiencing problems connecting to the application, and it wasn’t isolated to a single workstation. A few of my other labs were unable to access the application as were other computers as the testing expanded. Admittedly, I should have figured this one out quickly as it was not the first time I encountered an issue trying to access remote desktop applications (i.e. terminal server apps) inside our network. The fact that we were hosting this RDP app internally, though, lead me to dismiss my initial hunch. A quick look with Network Monitor confirmed the problem:

The Windows local security authority processes (lsass.exe) is trying to get out to the cloud to do something, in this case likely some certificate revocation checking (this is usually the case for RDP apps). It keeps on trying by retransmitting until a response is received or it times out, but because we are behind a proxy, lsass.exe will never find a path outside the network. Certificate checking is handled by the Windows crypto API, which relies on WinHttp. Now, by default, Winhttp 5.1 can find a way out of the network if your network is configured to use Web Proxy Autodiscovery (WPAD). We do not, so the fix is to manually configure Winhttp to use some proxy. Since we have this information configured in Internet Explorer (along with proxy exceptions), we just need to import these settings into Winhttp via (elevated command prompt) netsh winhttp import proxy source=ie. Afterwards, the connection problem resolved*.

On the flipside, configuring Winhttp to use a static proxy can also cause connectivity problems for mobile users. I ran into this issue myself today when I was trying to stream a movie from Netflix and kept on encountering the following error: “Whoops. something went wrong… An Internet or home connection network connection problem is preventing playback… Error code: H7111-1101”

To verify my suspicion that the opposite was true—that Winhttp was trying to use a proxy it could not reach to do certificate verification checks—I looked in the Windows event CAPI2 logs and could see that the Netflix certificate check was failing:

The details pane further down revealed the cause as the certificate server was “offline”, which is just a generic term for can’t be found. You can verify your winhttp proxy via netsh winhttp show proxy. To correct, simply set the winhttp proxy to use direct connection via netsh winhttp reset proxy when outside the network.

*Certificate verification does not need to be performed for every session, i.e. the initial check is good enough until the crypto API determines that the information is expired and another check needs to be performed.

I am not sure why yet, but Netmon captures on a Windows 8.1 system did not reveal the lsass.exe process. Instead, the process was listed as unavailable.

So Java just released JRE 7 Update 45. This is apparent when someone goes to run a Java applet and encounters the following prompt:”Your Java version is out of date…”

For the average home user, this is not a big deal. But in a corporate environment headaches ensue. Why? Because some users will blindly click on the Update button and be redirected to the Java download page for the latest release. The first problem is that our users are not going to have the administrative privileges to update their Java client. But the real problem is that once a user has done this, they will be redirected to the Java download page each and every time they need to run a Java applet. So the question was, how do we get the prompt back so the user can select the appropriate Later option and Do not ask again until the next update is available?

For starters, the property that controls this setting is located in a file called deployment.properties in %userprofile\AppData\LocalLow\Sun\Java\Deployment named deployment.expiration.decision.10.25.2=update: