Dependency walker is the tool of choice for static dependency analysis of native binaries (it has some dynamic analysis too, but that niche at least has some alternative solutions). It is in a rather sorry state, however – development seems to be abandoned since more or less 2005, and it is unanimously described as aging. As a prominent example of dependency walker analysis failures, try to run it on itself:

It seems the dependencies it is able to resolve are a negligible minority of the overall dependencies – and the interwebs are fullofsimilarreports. The DLLs falsely reported as missing are all strangely named and unfamiliar, and the explanations given as SO answers range from a vague ‘some internal OS stuff’ to hypotheses about delay loads and side-by-side assemblies.

To the best of my understanding, as of March 2016 DependencyWalker 2.2 resolves side-by-side manifests very well and has no trouble with delay loads. I’m aware of only two dependency scenarios where it falls short, but unfortunately they are ubiquitous.

1: Compatibility Shims

2: Api Sets

An API Set is a strong name for a list of Win32 APIs … you should think of an API Set’s name as just a unique character string, and not as a dll name … API Sets rely on operating system support in the library loader … the library loader performs a runtime redirection of the reference…

Don’t be alarmed if that still sounds opaque. Brief history, as I understand it:

Sometime in the Vista dev cycle an effort referred to as MinWin began: essentially, smart people started moving functionality around in hope of simplifying the OS architecture. To protect the myriad components from breaking during a change, the ultimate solution was called in: an extra layer of indirection. This level is exactly Api Sets.

For example, the API set “api-ms-win-core-fibers-l1-1-1.dll” is an ‘atom’ of functionality encompassing the 5 APIs FlsAlloc, FlsFree, FlsGetValue, FlsSetValue and IsThreadAFiber (it is an untypically small such ‘atom’). All applications that consume fiber functionality declare dependency on this API set, and thereby become insensitive to the exact location of implementation (that might change between OS releases). During load time, the OS searches somewhere and automagically routes the calls from api-ms-win-core-fibers-l1-1-1.dll to wherever they happen to be implemented in this OS version.

One could argue that API sets now serve the original intended role of DLLs and that the architecturally clean solution is to have each API set implemented in its own DLL, but I’m sure this tradeoff has performance implications that I cannot even begin to quantify.

Some Internals

API sets are very partially documented, and the load-time mechanism that properly routes the calls – even less so. One could start by inspecting the shipped apiset.h (documented to be authored by the venerable Arun Kishan in Sep-2008), and learn that the key call is to the undocumented ApiSetResolveToHost. It is called from LoadLibrary, typically through a call stack such as –

The actual per-OS-version redirection data lies in a special file called ApiSetSchema.dll. Its technically a DLL (conforms to the PE spec), but not an executable one – the redirection data lies in a specialized section called .apiset, mentioned at the apiset.h macros. Sebastien Renaud did some spectacular reversing work and described the layout of the redirection data it contains.

Full(er) Redirection Table

In principle one could – and hopefully someday would – use Renaud’s work to create a community-maintained version of dependency walker, but until that day we can get by with the aforementioned built-in loader logging: whenever ShowSnaps is raised the loader spits out many hundreds of messages like –

Running a few applications and filtering the results, I arrived at the table dumped below. I’ll update it as time permits – but if you have some dependency you don’t understand you can follow the same steps (well, for apps you can run, anyway): raise ShowSnaps for your app and inspect the output to see where the ApiSet I missed really routes to. If you do, please comment here so I can correct the table.

The Problem

~5Y ago I blogged about data breakpoints. A hefty bit of the discussion was devoted to persistence of hardware breakpoints across a thread switch: all four implementations mentioned assume that HW breakpoints persist across thread boundaries, and some rough testing showed that that was indeed the case back then. Alas, somewhere between Windows 7 and Windows 10 – this assumption broke. The naïve implementation via SetThreadContext now indeed sets the debug registers only in the context of a specific thread. I suspect a deep change in the OS scheduler broke it – possibly hardware tasks are used today where they previously weren’t, but I have no proof.

Failed Attempts

I’m aware of a single attempt to address this shortcoming and implement a truly cross-thread data breakpoint: a year ago my friend Meir Meshi published code that not only enumerates all existing threads and sets debug registers in their context, but also hooks thread creation (actually RtlCreateThread) via a coded assembly trampoline to make sure any thread created henceforth would respect the existing breakpoints. The code seemed to work marvelously for a while, and broke again in Windows 10 – where MS understandably recognizes patching of thread creation as an exploit, and bans it.

I set out to find a working alternative to hooking thread creation. Two immediate directions popped to mind: DLL_THREAD_ATTACH and the lesser known TLS callbacks. These are two documented hooks available to user mode upon thread creation, and seemed like a natural place to access a list of pre-set breakpoints and apply them to the context of new threads. Both these attempts fell short again, since it seems these hooks are called before the target thread is created (from the stack of a different process thread), and setting the debug registers in this context does not persist to the target thread.

Bottom line, it seems that as of 2016 you really have to be a debugger and handle CREATE_THREAD_DEBUG_EVENT to manage hardware breakpoints. I was recently told by John Robbins that the VS team are aware of this need but it just isn’t currently a priority (this might change if this UserVoice suggestion gets a bit more votes, though). Luckily, VS isn’t the only debugger in the MS universe – and in fact it integrates with a much stronger one.

A Real Solution

WinDBG (and siblings) had perfect hardware breakpoints (via ‘ba‘) since forever. It is a less known fact that VS integrates rather nicely with WinDBG, and for completeness I’ll rehash the integration steps here.

(2) Run the debugee without a debugger (Ctrl+F5) and attach to it via the newly added ‘Windows User Mode Debugger’ transport:

The debugging engine is now WinDBG. The debugging experience is noticeably different: the expression evaluator is different – e.g. the view at the watch window and what it agrees to process are changed, threads pane no longer has thread IDs (why??) etc. – but all in all the large majority of VS commands and keyboard shortcuts are nicely mapped to this new engine.

You should see a new ‘Debugger Immediate Window’ pane, that accepts command line inputs identical to that of WinDBG, and with a nice bonus of auto-completes and auto help:

While at a breakpoint, type a ba command at this window. For example, to break upon read (r) of any of the 4 bytes (r4) following the address 0x000000c1`3219f7f4 (the windbg engine likes a ` separator in the middle of x64 addresses), type:

ba r4 0x000000c1`3219f7f4

And enjoy your shiny new read breakpoints that works across existing threads and are inherited by newly created ones.

A lot hasalreadybeensaid on it, (no, seriously, alot) and yet some root causes are not yet covered. All that follows holds for C++ projects, C#, VB, and everything MSBuild in general.

The symptom

Normally if you build your solution, change nothing and immediately try to build again – nothing happens, as you’d expect. But occasionally some projects are rebuilt despite not being modified. If by bad luck these projects are high up the dependency tree – many other projects are re-built, and irks galore abound.

Project ‘Whateva’ is not up to date. Project item ‘Shiny.html’ has ‘Copy to Output Directory’ attribute set to ‘Copy always’.

Project ‘Whateva’ is not up to date. CopyLocal reference source ‘F:\folder1\blah.dll’ is more recent than ‘F:\folder2\blah.dll’.

Project ‘Whateva’ is not up to date. Input file ‘F:\sources\yadda.csproj’ is modified after output file ‘F:\bin\yadda.pdb’.

Project ‘Whateva’ is not up to date. Last build was with unsaved files.

In my personal experience the most common reason is the first one: a file that is included in a project but is missing on disk. Treating this state as out-of-date is a weird design decision (I’d say VS should refuse to build), but that’s just the way it is. Anyway, more can – and should – be said about file dates.

File Dates: More than meets the eye

The question whether file1 is newer than file2 is not as straightforward as it might seem. NTFS has two associated times: created time and modified time (never mind ‘last access time’ now). The full rules of how these dates react to a copy/move are somewhat involved but the key piece in this context is:

If you copy a file from D:\NTFS to D:\NTFS\SUB, it keeps the same modified date and time but changes the created date and time to the current date and time. [In particular, making the created date later than the modified date]

So the date that is preserved across copies is modified date – but MSBuild compares created dates to determine whether a file should be copied. I sincerely wish it wouldn’t – but wait, the creation date in the copy can only grow newer than the source, and this shouldn’t call for a re-copy anyway, should it?

Enter NTFS Tunneling.

This Windows feature is beyond esoteric, so don’t feel bad if it doesn’t ring any bell. In a nutshell:

When a name is removed from a directory (rename or delete), its short/long name pair and creation time are saved in a cache, keyed by the name that was removed. When a name is added to a directory (rename or create), the cache is searched to see if there is information to restore. The cache is effective per instance of a directory. If a directory is deleted, the cache for it is removed.

Simply put, when you copy a file to a location where it previously existed, its original created date is resurrected – regardless of the created date of the actual source file.

<Sigh/>.

The name ‘tunneling’ derives from a quantum mechanics phenomena (hence the clickbait post title) where particles emerge in seemingly impossible locations. The original motivation for this design is irrelevant for decades now (probably since MS-DOS), and it is most probably left around as a compatibility constraint. Even better, on my own computer it seems the lifetime of the cache (advertised to be 15 seconds) is really infinite, and modifying it via the MaximumTunnelEntryAgeInSeconds registry key isn’t working.

<Double sigh/>.

Solutions

You can shut down tunneling altogether by setting HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\MaximumTunnelEntries to 0, as shown in the KB article.

If for whatever reason you prefer not to mess with arcane NTFS registry keys, you can do the following: rename your bin folder (the one holding the created-date cache) to a temporary name, create a new bin folder and copy the previous bin contents to it. This seemingly no-op clears the tunneling cache and in my experience rids of the last bogus out-of-date checks.

These redundant re-builds are pretty much gone for me now. Please do tell me in the comments if it worked for you – and more importantly, if it didn’t.

Here’s a short cheat sheet summarizing an internal talk I gave a while ago about VC++ build diagnostics. These switches and tools are all documented, most are surveyed in previous posts here, and all are very useful when struck by baffling build/load errors.

To use this effectively you must first define the exact stage where the failure occurs (which isn’t always trivial), then use the listed switch/tool.

Specifically Process Monitor is useful in multiple contexts – but mostly when you suspect used file is being consumed from an erroneous location. I recommend not adding a process name filter, as multiple processes are typically involved – devenv.exe, MSBuild.exe, cl.exe, link.exe, mspdbsvr etc. For header files there’s /ShowIncludes and for dll’s there’s link /verbose or Gflags/ShowSnaps, but many other files are consumed and have potential for errors (property sheets, resources, etc.) and ProcessMonitor covers them all.

A lot of cheese was moved around in the process, and it would probably take me a while to know my way around again. As a prominent example, dbgint.h is now replaced by debug_heap.cpp – which is a borderline-breaking change: dbgint.h was kind-of-documented (although wrapped in disclaimers and admittedly an internal implementation detail), and real code came tumbling down. What’s worse, the type declarations that were available in dbgint.h are now hidden in debug_heap.cpp – which includes many un-published internal headers – and tool writers would probably have no choice but to cut and paste the type declarations and hope for the best.

I’m not entirely sure this breaking change (and others like it) are by design. One can still hope the final bits would see this fixed. What’s much worse –

L Published MS symbols are all stripped

Which means you can’t step through CRT/MFC sources. This is a major setback in productivity, and I hope only a temporary one.

J Context operator replaced

The context operator (‘{,,dll}symbol’) while being mighty useful at debug time, was broken beyond repair – and as I hoped in 2009 is replaced by the windbg-like ‘!’ operator:

However, as apparent in the screenshot:

L Context operator no longer deduces type

… and explicit casts are in order where they previously weren’t. That might seem like a quibble but this in fact prevents some very useful hacks previously available, notably checking memory integrity from the debugger:

The closest I currently have to a workaround is to capture the function to a variable in code, and invoke it from the watch window:

J Micro Profiling!

After you step past a code line in the debugger, a neat little tooltip appears:

A previous post discussed the Windows Debug Heap – with the main motivation being how to avoid it, as it is just empty, expensive overhead, and it isn’t clear why it is on by default in the first place. Remarkably, 3 weeks after posting here the VC team announced that from VC “14” (now in CTP) onwards the WDH will be opt-in and not opt-out, as it should be. So hopefully in ~2 years the recommendations in the last post will be largely obsolete. Anyway, I’m often facing unworkably slow run times in debug even after disabling the WDH, and further measures are in order.

In debug builds the VC++ CRT implementation runs a hefty load of iterator-validation tests on all STL containers – the simplest example being raising an assertion when an std::vector subscript is out of range. This leads to the unfortunate reality wherein if someone writes C++ code ‘by the book’, using standard containers for everything, s/he often ends up writing code that is utterly unusable in debug. In one of our projects image classes were coded with std::vector as the container for the image bits. The product code is very intensive in iteration on pixels of many images, and as a result debug builds completed a typical job in ~4 hours, whereas a release build completed in ~4 minutes. For a long while debugging that project was reduced to logging and stepping in disassembly, as debug builds were completely useless.

Now for some good news and some bad news. The good news is that this behavior is customizable via the _ITERATOR_DEBUGGING_LEVEL macro: #define it to 0 (or 1, if you have a particular need for it) early in the compilation – say in the project properties or the top of a precompiled header– and this disproportional computational overhead is gone.

Well that was a tad dramatic – it doesn’t always work, and in /clr builds in particular.

<rant>Now /clr projects will probably forever be second class citizens in the VC universe. Features will forever be coded for mainstream native code and will trickle down to /clr code as time and priorities permit (two notable examples are data breakpoints and debugger visualizers that are still unsupported in mixed debugging, but trust me – there are plenty more). </rant> Anyhow, as far as _ITERATOR_DEBUG_LEVEL goes – this is much more a bug than something resembling a decision. The venerable Stephan T. Lavavej elaborates (3rd reply from the bottom):

…The underlying problem is that _ITERATOR_DEBUG_LEVEL affects the representations of STL containers, and C++ (both native and especially managed) really hates it when code can’t agree on the representation of an object. When _SECURE_SCL/_HAS_ITERATOR_DEBUGGING were added in VC8, we should have created 5 variants of the CRT/STL binaries (including DLLs). Unfortunately we didn’t (this was before my time, otherwise I would have spoken up), and having only debug and release DLLs causes headaches. We suffered from longstanding problems in VC8/9 until we overhauled how this worked in VC10. During VC10 we untangled the worst of the problems by making std::string header-only. With invasive surgery we were able to get native code working correctly in every case except one very obscure one that nobody has noticed or complained about yet. (We now have 5 static libs, which solves the case of static linking absolutely 100% perfectly, but still only 2 DLLs.) But managed code is structured differently, and the tricks that work in native don’t work for it. As a result, customizing _ITERATOR_DEBUG_LEVEL basically doesn’t work under /clr[:pure]. Very few customers have encountered this (you’re one of the first) because we changed the release mode default to IDL=0 (which everyone wants), and few people want to modify debug mode’s default of IDL=2.

The thread is from Jan 2011 and this particular issue was resolved in VS2013. Similar issues remain, and I’m not sure /clr code would ever make it into routine test matrices in MS – so as the CRT code evolves, these issues would probably keep popping.

Bottom line: if – like me – you’re debugging C++ code that is both managed and makes extensive use of STL – your mileage may seriously vary when trying to customize iterator debug level. If you do develop purely native code and are trying to accelerate debug runs, I do recommend judiciously setting _ITERATOR_DEBUG_LEVEL to zero – and raising it back only when you’re tracking concrete iterator issues.

Have you considered making your “debug build” compile in release mode with no optimizations? Release mode versus debug mode affects what CRT/STL/etc. you link to (and whether you can effectively debug into them) and as a side effect affects your IDL default, but it’s not inherently tied to whether your own code is compiled with optimizations or not, and that’s what affects the debuggability of your own code. The IDE pairs release mode with optimizations and debug mode without optimizations, but there’s no fundamental reason linking the two.

This makes sense and I briefly experimented with this approach. Sorry to report, still no success. While I still can’t pinpoint the root cause, even when compiling release builds with /Od (optimizations disabled) the debugging experience is severely crippled. (and yes, of course I raised the proper PDB generation switches in both the compiler and the linker). Local variable watch and single-steps seemed highly erratic, ‘this’ pointer for class methods seemed to stick with $rcx throughout the method and thus give rubbish on member watch – etc. etc.

However, this is a step in a better direction. More on that in the next post.