Blog Stats

Meta

A co-worker of mine was troubleshooting an issue where a user was unable to open images in the Windows Photo Viewer. Whenever she would try to view an image, the Photo Viewer would open but the image would never load; instead the image was replaced by the spinning blue circle. There was no issue opening the image with other image viewers. To troubleshoot, I fired up Process Monitor and captured a trace of the activity for a few seconds while trying to open the image.

Before starting, its worth noting that the Windows Photo Viewer is not an exclusive process that runs out of C:\Program Files (x86)\Windows Photo Viewer. It is actually executed by dllhost.exe, a host processes that is used by other processes (in this case explorer.exe) to launch code. If something were to go awry with hosted code, then it would only affect that instance of the host process and not explorer, which might crash the desktop.

After a few seconds of capture, I stopped collecting activity, went to Tools > Count Occurrences > Column: Result. My guess was that the Windows Photo Viewer was looking for something that it needed to properly load the images but could not find it. I selected the PATH NOT FOUND value and applied it. There I could see that dllhost.exe was unable to locate some files in C:\Windows\System32\spool\drivers\color.

I looked at a couple other systems and saw that this folder existed and was populated with the missing files the COM surrogate was looking for. I wasn’t sure how, why, who deleted this folder but I copied it to the problem computer, opened an image with Windows Photo Viewer and it was working.

Recently I needed to update my Office 2013 App-V 5 package. This is done via the Microsoft ODT tool by first downloading the content and then packaging using the /packager command. The download went fine, but upon running the packager phase the command would immediately go straight back to the prompt, which is an indication that something went wrong because the packager phase usually takes a good 10 minutes to run. I checked and double-checked the xml files used to build the package but it looked good. Troubleshooting the ODT setup is pretty straight forward, however; simply locate the setup logs in %temp%. They are named by computer name name + date.log. In my case, I kept encounter the following line in the logs:

This wasn’t completely unexpected as I have used this computer previously to create my Office 2013 App-V packages. But I had never encountered it so I was not sure where to look to clean up the previous install. Enter Process Monitor. I fired it up, reproduced the issue, stopped the trace. Although the trace was only a few seconds, this will likely result in thousands or ten of thousands of operations being captured. This “noise” can quickly be cleaned up, though, using filters. I figured the likely culprit would be in the registry, so I started with limiting results to RegQueryKey. From there, I filtered for the only processes involved. With ODT, this is setup.exe which then calls OfficeClickToRun.exe.

From here, I could see HKEY_CURRENT_USER\Software\Microsoft\Office\15.0 was being touched. Note, that whatever version of Office you are packaging should not be installed on the machine where you are doing the packaging so any key created here would have been done by the ODT setup.

I am a big fan of debugging tools, especially WinDbg. These tools can be quite complicated but most of the time I use them simply for a “one-dimensional” analysis of an application or system crash and use the information from the dump to find a cause and hopefully the most logical fix. In the example here, a user complains that PowerPoint-Excel is often crashing or becoming hung. I was able to locate a couple of errors in the Event Viewer > Application Logs that indicated a crash had taken place within the MS Office product, but it was too generic and failed to resolve to the failing component. I asked the user to manually create a crash dump for me the next time the issue happened by using the Task Manager > Applications > right-clicking the unresponsive PowerPoint or Excel application > Select Create Dump File. The dump file gets sent to the user’s temp directory, e.g. C:\Users\username\AppData\Local\Temp\AppName.DMP.

I copied the dmp and opened with with WinDbg. Note, if you are analyzing a 32bit application dump (often the case with MS Office – because who actually uses 64bit office?) you will want to use the 32bit version of WinDbg. Or, if you are incredibly lazy like me (can’t figure out what happened to my x86 version), you can use the 64bit WinDbg and change it use x86-based processor mode. Just run .load wow64exts followed by .effmach x86, followed by the always faithful !analyze –v –hang.

This resulted in following output (Including only the stack text since everything before this is not useful) :

Stacks are read from the bottom up so you can get a very basic idea of what is happening as you climb the ladder. We are really only interested in the component details … and maybe the method-routine of that component (assuming you have the symbols). For example, we start with the main PowerPoint component, PPCORE (everything after the ! is the method or routine). That calls into OART, which is the Office Art components of the MS Office suite, which goes into RICHED20, which I am guessing is something like rich text box component, then to another office component, MSPTLS, and again into another MS Office art component, GFX. After GFX you can see a couple exceptions (errors) are encountered before passing back into the main PowerPoint dll.

Hint: if you are not sure what these components are or belong to, open a new command browser in WinDbg and use the lmvm command to get the details of it, e.g. lmvm GFX:

So, from the stack above, my first guess is that we have an issue with some graphic module(s) – and maybe to a greater part the video subsystem (not necessarily the fault of MS Office). To confirm, I can look at all the thread stacks by dumping them with ~*kb. Reading from the bottom up, it looks like the problem starts with thread 15 (similar to what !analyze –v –hang shows):

At this point, I decide to check for a later version of the graphics drivers for the users computer and found a newer driver release. Install, reboot, and no further Office applications reported since. Altogether, the process of identifying the basic issue itself took no more than 10 minutes, all without any software engineer credentials.

Not too long ago we began to notice that users who were opening IE after logging onto some workstations or Citrix servers they did not have a previously existing profile on were getting proxy settings applied in IE that were not being defined in any group policy object. Our local office proxies are applied later in the GPO processing so if it was anything coming before that, it should have been overridden by local GPO. The quick remedy was simply wait for policy to apply in the background or manually run gpupdate. But we still needed to address the cause as in some cases the proxy server was no longer reachable, preventing users from getting out to the Internet.

To get an idea of what might be happening, we deleted a profile from a machine and logged onto it. We then fired up Process Monitor and left it running while launching Internet Explorer for the first time. After verifying that incorrect proxy setting was getting applied, we stopped the trace and the log was sent my way. Proxy settings are applied in the registry so I knew I would be looking only at registry events, searching for the string of proxy server name in the log, and specifically looking at the operation for RegSetValue. From here, it was a simple matter of opening the Properties for that Event and going to the Stack tab.

The component setting the phantom proxy server (iedkcs.dll) was coming from the Internet Explorer Admin Kit or IEAK. This is otherwise known as Active Setup and its what IE will use when it is initially launched by a new user logon if the IE install was configured using IEAK. Also known as Branding, Active Setup for IE looks for INSTALL.INS if it defined in the registry. You can find this file in Program Files (x86)\Internet Explorer\CUSTOM. Looking in the INSTALL.INS, I could see under [ConnectionSettings] the phantom proxy server. You can read about the internals of this here: https://support.microsoft.com/en-us/kb/2029043

At first this was a bit baffling because I had setup the rollout for IE11 using the standalone IE11 MSU and required dependencies and not the IEAK. However, after glancing at the INSTALL.INS file again, I noticed this was actually coming from an earlier IE 9 deployment, which was created using IEAK by my IE predecessor:

To correct, I referred back to https://support.microsoft.com/en-us/kb/2029043 and concluded that this could be avoided by going to [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\>{2A2F9EE5-7FE3-44E8-AD75-72B5704DBCB4}] and deleting "StubPath"="RunDLL32 IEDKCS32.DLL,BrandIE4 CUSTOM".

To correct on new workstations going forward, I simply created a REG DELETE command in the Task Sequence after the IE11 setup to remove the value. For existing machines, a GPP registry is the easiest way to go.

I recently began encountering a problem where one of our application packages was getting hung during the install process. I could see the program directory correctly populated with application files but it seemed to get hung at end, and msiexec processes continued to run indefinitely. After manually killing each process, I checked to see if the application actually completed installing. When trying to run one of the applications that has a dependency on one of the failed MSI’s APIs, I encountered the following:

Further, Outlook was crashing when trying to load the application’s main component.

Afterwards, the application with the dependency launched without error and Outlook was no longer crashing. However, the root issue still needed to be identified as I had no idea what else in the MSI was failing to complete. After manually deleting the components from the file system (it was missing from Programs and Features), I ran the MSI again and this time used Process Explorer to look inside the hung msiexec processes. There were multiple processes running, but I could see one process, though, eating CPU resources, and this was likely where I might find the culprit:

Opening the process details, I select the Threads tab. The thread that is doing all the work (or in this case getting hung up) is indicated by the Cycles tab. From here, I opened the thread by selecting it and clicking Stack:

Stacks are read from the bottom up. You can see a 3rd party component here (HCApi – McAfee HIPS) at play. Note, this might not be entirely abnormal as you can always expect any anti-virus suite to be hooking itself into any number of processes. So, to confirm what I was seeing, I used the Task Manager to dump the msiexec process that was using the CPU time by right-clicking it and select Create dump file (which can also be done via Process Explorer):

Dumps can be analyzed with WinDbg or a tool like DebugDiag 2.0. I have grown increasingly lazy and forgetful over the years with WinDbg. It has a very high learning curve and if you don’t use it much, its easy to forget everything except the old faithful !analyze –v or !analyze –v –hang. I used DebugDiag instead. When it is installed, simply right-click the dump file and select Analyze Crash/hang Issue from the context menu and point it to the crash file.

It will do its best to figure out the issue in the most vague way and you almost always need to do some interpretation of your own. For the most part, the analysis summary can be ignored.

A little bit down in the report points me to what I was seeing in Process Explorer with the problem thread that was using all the CPU time:

You can click the Thread ID like a hyperlink to follow it down in the report, expanding the thread. It reveals a familiar site. However, there is a bit more insight as the entry point is revealed and I can see the problem has something to do with the CustomAction table of the MSI itself.

Through a quick process of elimination using Orca to remove some of the rows in the CustomAction table, I narrow down the cause specifically down to action ISSelfRegisterCosting.

Without that row, the MSI install completes normally; or so at least it seems. I have no idea what removing this action might have elsewhere so this merely a hack. To further confirm the problem is being caused by McAfee, I perform the install on a virtual machine where McAfee is not installed and it proceeds normally. I then reach out to our McAfee enterprise admin and ask him to disable HIPS on one of my workstation. After doing so, the MSI runs and the application installs as expected. He inherits the problem.

Update

This has since been corrected by MacAfee with a HIPS update. One of the DLLs trying to get registered was getting blocked.

The symptom: Internet Explorer 11 and Outlook were crashing*. This was happening in all cases after applying a new task sequence. The event viewer did not produce any events that would help so I configured the system to capture crash dumps for applications. After rebooting, I reproduced the crash by simply opening IE an grabbed one of the dumps and ran it through an x86 debugger (by default, the 32bit Content tabs in IE 11 run as 32bit processes). Using ! analyze –v command produced the following output:

A good guess is that the issue we are encountering is graphics driver related.

I wanted to see if that was the case with Outlook as well. However, that would require a different approach as Outlook was not producing any crash dumps to run through the debugger. Although there are several ways to work around this, the simplest and most GUI friendly is to use DebugDiag from Microsoft. I wrote about this sometime ago and it has since moved to version 2.1 but the approach I still the same for the most part. I simply ran DebugDiag Collection, created a crash rule, specified outlook.exe as the process. After starting Outlook, several dumps were generated. When complete I then launched DebugDiag Analysis, selected the last dump and then selected Start Analysis. The report generates as an htlm file and opens automatically in IE.

The analysis takes some of the guess work of how to proceed when you have little or no understanding of WinDbg. I could see my suspicion that the issue common to both IE and Outlook was the same, which does not surprise me as they both share common modules.

As a proof of concept, one way to test issues you suspect may be graphics related to IE if your system uses dedicated graphics is to disable GPU rendering in IE. This is done by going into the Control Panel > Internet Options > Advanced > and checking Use software rendering instead of GPU rendering (note, although the asterisk indicates you should restart your computer, only restarting IE may be required). In my case, this corrected the problem with IE so I was sure now that the issue was related to the graphics card. Before updating the driver, I checked the display information to see if perhaps the drivers were missed and maybe generic drivers were installed. This didn’t seem to be the case:

I would still update the drivers anyway. After doing so, both Internet Explorer and Outlook opened without crashing. Interestingly, after updating the drivers, the workstation reported a different video adapter model:

In the end, the resolution was to install the correct graphics drivers during the task sequence.

Note, in some cases the crashing application may report via WER (windows error report) popup. If you examine the details, you may note that the fault module name contains a stackhash code a7aa, which is often a indicator that the issue is likely related to a driver issue for the video subsystem:

*Oddly enough, I could not reproduce the issue when connected remotely using RDP. This should have been a clear indicator as to what the issue really was because RDP does not use the host system’s graphics drivers but those of the connecting system instead.

There has been some chatter about the Internet Explorer 11 MSI that is generated from the IEAK failing while the EXE is able to run successfully. I ran into this myself last week while trying to manually execute the package. The IE11_main.log always showed the same error: “ERROR: Error downloading prerequisite file (KB2834140): 0x800c0005 (2148270085)”

After a day or so of troubleshooting I realized the problem. When the IE11 setup runs, it needs to go out to the Internet to check Microsoft for prerequisite updates; both the MSI and the EXE do this. But the MSI (msisexec.exe) executes in the context of the local system account. By default, the local system account tries to find a direct path to the Internet, but if you are behind a proxy this is going to fail as the MSI child processes (specifically ienrcore.exe) are bypassing the proxy.

The EXE on the other hand when run manually executes in the context of a user account with Internet Access. That does not mean you are entirely out of the woods with the EXE package. If you are automatically deploying and leveraging SCCM the EXE (like the MSI) is still going to executed by the local system account. To overcome this, you need to configure the local system account to use a proxy (or not use a proxy, or update your proxy). This can be done via BITSAdmin with the following command:

Another workaround is to simply install the prerequisites before installing IE 11. The required KBs are:

KB2670838

KB2786081

KB2834140

KB2882822

KB2888049

Last, you should also be able to extract the cab files from these KBs and add them via IEAK custom components step in the IEAK. I have yet to get this to work, however, as the package still wants to go out to the Windows Update site. If you have gotten this to work, let me know how you did it.

UPDATE

I have decided to use the IE11 redistributable and avoid IEAK. Injecting updates into the package via IEAK is too much work. Fortunately for us we control everything we need to via group policy so there is no need for customizations. A script to install the prerequisite updates and then IE11 is much simpler.

Note, if you install updates then install IE11 in the same script, the IE11 installer is still going to want to go out to the Internet to check updates if you just run the executable command. There are two ways to workaround this. One is to install the updates, restart, then install IE11. But this means you have to break up your deployment into two steps. Fortunately, you can work around this. The answer lies in the IE11_main.log using DISM. When a successful install (with Internet Connection) of IE11 is completed you can find the command in the log:

Just extract the redistributable and then install IE11 via the main CAB using DISM and then the two MSUs using WUSA. For example:

This is a case where the Application Event logs came in handy when my application started to crash upon loading on our Windows 8.1 images. Since my app was compiled as .NET, the stack in the error details pointed me to problem object:

When TextBlock3 loaded it was doing Environment.GetEnvironmentVariable(“SomeVariable”). The variable in question did not exist on our Windows 8.1 images. Although this did not seem to pose an issue for my Windows 7 dev box (it just would not display the TextBlock), Windows 8 will throw an exception if the variable does not exist and the app will crash. The likely cause my dev PC did not encounter this exception when running was because .NET framework 4.5 was not installed whereas .NET 4.5 is the default framework in Windows 8.

Not too long ago I was asked to start looking into then recent reports of Outlook hanging on first launch. The first launch hang symptom happened as mail items were writing into the OST (cached mode) as user’s came in each morning and opened Outlook. Forcefully closing and reopening Outlook was a workaround but the cause needed to be identified as the impact was potentially the entire firm.

After a couple days I began to see a common theme. I was using Resource Monitor (resmon.exe – also accessible from the Task ManagerPerformance tab) in an attempt to identify the culprit. For hang scenarios, simply recreate the issue, open Resource Monitor, go to the Overview tab and locate the hung process. Typically a hung process will be displayed in red:

Right-click the hung process and select Analyze Wait Chain to get a list of the thread IDs. One of these threads will contain the module(s) that is causing the process to become unresponsive.

To look inside the threads, you can use Process Explorer. Just right-click on the target process, select Properties and go to the Threads tab. From here, use the TID column to locate the threads from the wait chain provided by resmon. Double-click the thread or select the Stack button to view the loaded modules in the thread. In this case, I saw a consistent stack in each of the hung Outlook processes on the various workstations:

There is a single 3rd party module appearing here, everything else is Microsoft. This made sense as this particular module was part of a recent Outlook add-in upgrade. The other threads were also examined but the only 3rd party modules they were referencing were proven addins that had never caused issues. A POC was tested on with several users over the course of a couple weeks by disabling the add-in via load behavior in the registry and issue did not resurface. The issue was raised with the vendor and a newer client for the add-in was provided that resolved this problem.

Another way to identify the cause assuming you have Windows Debugging Tools installed is to generate a dump of the hung process from the Task Manager. A quick !analyze –v –hang pointed to the same culprit (this is a 32bit process from a 64bit OS so you will need to get the 32bit stacks. Use .load wow64exts and .effmach x86 commands before !analyze):

A user was complaining that they could not access a web page. The initial impression was that it was being blocked. I bypassed the proxy mechanism in place that normally restricts web traffic but saw that the page still could not be loaded. The message alluded to something locally installed that may be the culprit, but this was really just a generic “catch-all” error message: “Is feedly blocked? Feedly is not able to load. It is probably because one of your extensions is blocking it…”

I decided to load up the Developer Tools in IE to see if it was problem with the script in the web page. To do this, just press F12 or go to Tools > F12 Developer Tools. Afterwards, select the Script tab and select the option to Start debugging. I selected the hyperlink in the right portion of the debug window and it took the cursor to the problem part of the javascript code.

Don’t worry, you don’t necessarily need to be a program-code debugger to figure this out. That what the Internet and your favorite search engine is for. Searching the debug error Expected identifier, string or number IE9 turned up two top hits that pointed out that a trailing comma after an object or array is a bad idea and that IE9 specifically cannot handle this problem script.

The workaround is to use an alternative browser like FireFox or Chrome (they are not as restrictive with malformed javascript). Internet Explorer 10 or 11 also seems to be handle this problem more gracefully, too, and allowed the page to load normally.