Comment Feed for Channel 9 - Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock
You've learned about many of the new features of the latest version of the Windows kernel in the
Mark Russinovich Inside Windows 7 conversation here on Channel 9. One of Mark’s favorite kernel innovations is the way the Windows 7 kernel manages scheduling of threads and the underlying synchronization primitives that embody kernel thread management.
Prior to Windows 7 (and therefore Windows Server 2008 R2) the Windows kernel dispatcher employed a single lock, the dispatcher lock, which worked well for a relatively small number of processors (like 64). However, now that we find ourselves in the
midst of the ManyCore era, well, 64 processors aren’t that many... A new strategy was required to scale Windows to large numbers of processors since a single lock is limited in capability, by design: The masterful David Cutler, one of the world's greatest
software engineers, wrote the NT scheduler in a time when the notion of affordable 256-processor machines was more science fiction than probable. As we learned in the Mark Russinovich video, Windows 7 can now scale to 256 processors thanks to the great engineering of
Arun Kishan, a kernel architect you've met on C9 back in the Vista days. In order to promote further scalability of the NT kernel, Arun completely eliminated the dispatcher lock and replaced it with a much finer-grained set of synchronization primitives.
Gone are the days of contention for a single spinlock. How did Arun pull this off, exactly, you ask? Who is this genius? Well, tune in. Lots of answers await… Arun's work directly benefits the overall performance of Windows running on many processors and means, simply, Windows can now really scale. Thank you, Arun!
Spinlocks are synchronization primitives that cause a processor to busy-wait until the state of the lock’s memory location changes.
As the name implies, the dispatcher lock is the fundamental lock associated with the kernel dispatcher, or the scheduler.
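To make the busy-wait idea concrete, here is a minimal user-mode sketch in C++. It is not the Windows kernel's code and says nothing about how Arun actually restructured the dispatcher; the names `SpinLock`, `Slot`, `count_with_single_lock` and `count_with_per_worker_locks` are hypothetical and exist only for illustration. The first function makes every worker contend for one lock, and the second gives each worker its own lock, which is only a loose analogy for finer-grained locking.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-set spinlock: lock() busy-waits until the flag's state changes.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Spin: keep retrying until the current holder clears the flag.
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// One lock guards one shared counter, so every worker contends for it.
long count_with_single_lock(int workers, int iterations) {
    assert(workers > 0 && iterations >= 0);
    SpinLock lock;
    long total = 0;
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&lock, &total, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& t : threads) t.join();
    return total;
}

// Each worker has its own lock and counter, so no two workers
// ever wait on the same lock.
struct Slot {
    SpinLock lock;
    long count = 0;
};

long count_with_per_worker_locks(int workers, int iterations) {
    assert(workers > 0 && iterations >= 0);
    std::vector<Slot> slots(workers);
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&slots, w, iterations] {
            Slot& slot = slots[w];
            for (int i = 0; i < iterations; ++i) {
                slot.lock.lock();
                ++slot.count;
                slot.lock.unlock();
            }
        });
    }
    for (auto& t : threads) t.join();
    long total = 0;
    for (const Slot& slot : slots) total += slot.count;  // safe: all workers joined
    return total;
}
```

Both functions return the same total. The difference is that in the first, every waiter spins on the same memory location, so the time spent spinning tends to grow as workers are added.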
Tue, 26 Sep 2017 22:07:54 GMT
Re: Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock
The interview I have been dreaming of, and about, and over and...

Enjoy! What Arun accomplished really is amazing. I'm just blown away by the elegance of his solution and the engineering strategies he employed to pull it off (you'll learn about those towards the end of the conversation).

C

posted by Charles

]]>
https://channel9.msdn.com/Shows/Going+Deep/Arun-Kishan-Farewell-to-the-Windows-Kernel-Dispatcher-Lock#c633851885110000000
Thu, 06 Aug 2009 20:48:31 GMT
Charles
Re: Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock
Blimey, you've certainly got to have had your head in the kernel for a while to
really keep up with that, but by the end I understood the general concept, and it sounded impressive.

It is fascinating to see how, with such talented people, a kernel designed more than a decade ago can be enhanced to suit today's needs. This is why I love Operating System design and programming. There are such great foundations and languages to build upon,
but also so many improvements to make.

]]>
https://channel9.msdn.com/Shows/Going+Deep/Arun-Kishan-Farewell-to-the-Windows-Kernel-Dispatcher-Lock#c633852429930000000
Fri, 07 Aug 2009 11:56:33 GMT
aL_
Re: Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock
I am a C# Developer who has always worked with Locking, Threads, etc. through .NET and never through C++ libraries. Even so, would it not help optimization to have a mechanism for the developer to hint to the OS an acceptable timeframe
for a Thread in a long-running Wait to be loaded back from Paged Memory? In other words, a developer may know that it is never important for a certain Thread to run again for, let us say, 15 days. Consequently, she/he could add a TimeSpan argument to the Thread Function/Method
that gives the OS a hint: the Thread might want to run sooner than 15 days because its Wait was satisfied or whatever, but the developer is okay with it taking up to 15 days. Then the OS could decide, based on available processor resources, not to
pay any attention to bringing this Job/Thread back into Non-Paged Memory until 15 days have transpired. If OS Resources are at a very low rate of utilization and there is plenty to go around for all Threads/Processes, it could then check this Thread/Job to see
if it wants to run sooner than 15 days.

Does this make sense? If it does, is this already built into Windows 7 or even Vista/XP? The basic thing I am trying to say is to give the OS a shortcut to skip over Thread Wait checks if resources are very low. In other words, the Threads with hints
like those I mention above could be set aside completely in Paged Memory, or even on Disk, if there were many demands on resources, and would not even have to be checked to see if they need to run. Then, when resources were plentiful, they could be
checked to see if they might want to run, and started at a time when the OS is running yet the User(s) are in bed, not at the Server/Computer, etc.

It would be analogous to a Doctor's triage, where the Doctor could say: Yes; No; Revisit in 2 hours if you have time; or do not even think about this guy/gal unless you have nothing else to do.

The OS will already page out your stacks/process after some time of inactivity. I believe a thread becomes a candidate after about 4 seconds. Once the pages become candidates for theft, it's only a matter of time and memory pressure before the Memory
Manager rips them away and uses them for something else.

Basically, you don't need to give a 'hint' to the OS since it will do what you want on its own. On the other hand, if you want the thread to run quickly when it gets signaled, you may be in trouble because of this behavior. If the pages make it to the
disk, it could take about 10-15 milliseconds between when your thread is activated and when it can run (this is the typical seek time of a laptop disk). If the disk is already busy, it could take even longer.

]]>
https://channel9.msdn.com/Shows/Going+Deep/Arun-Kishan-Farewell-to-the-Windows-Kernel-Dispatcher-Lock#c633865752160000000
Sat, 22 Aug 2009 22:00:16 GMT
suneelreddy
Re: Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock
Fascinating. I would be interested to know what Dave Cutler's take on all this was. Was he involved in any of the early discussions? Did Arun formulate the solution first and take it to him (formally or informally)? Did he say "wow, great idea!" or perhaps
"nice try Rookie, but you forgot to assert the make-it-work bit on line 24"...or (perish the thought) did he get all defensive about his baby and start mumbling about Reagan-era priorities... This interview is the very essence of why Channel 9 rocks...

]]>
https://channel9.msdn.com/Shows/Going+Deep/Arun-Kishan-Farewell-to-the-Windows-Kernel-Dispatcher-Lock#c633925472140000000
Sat, 31 Oct 2009 00:53:34 GMT
Charles
Re: Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock
I did some performance tests to compare Vista and Seven on a 16-core machine that dual boots Vista/Seven. The result is a consternation:
- same 16 threads using TLS: 18% slower on Seven
- same 16 threads using std::map per-thread data: 2x slower on Seven
- same 16 threads using only local variables on the stack and doing only math operations: 38% slower on Seven.
Moreover, on Vista all threads finish at nearly the same time; that is no longer the case on Seven. On Seven the threads finish one after another (after a long time), and when only 1 thread remains, it switches from one core to another randomly (I do not set the affinities in this test).
I am continuing my tests with critical sections, interlock instructions...
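For anyone who wants to try a similar comparison, the "local variables and math only" case could be approximated with a harness along these lines. This is a hypothetical sketch, not the code used for the numbers above: `math_only_work` and `measure_finish_times` are made-up names, the square-root loop is an arbitrary stand-in for "math operations", and, as in the test described, no thread affinity is set.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <thread>
#include <vector>

// CPU-only work on stack locals: no locks, no TLS, no heap allocation.
double math_only_work(long iterations) {
    double acc = 0.0;
    for (long i = 1; i <= iterations; ++i) {
        acc += std::sqrt(static_cast<double>(i));
    }
    return acc;
}

// Starts `thread_count` identical workers and returns, for each one,
// the time in milliseconds from the common start until it finished.
std::vector<double> measure_finish_times(int thread_count, long iterations) {
    assert(thread_count > 0);
    using Clock = std::chrono::steady_clock;
    std::vector<double> finish_ms(thread_count, 0.0);
    std::vector<double> results(thread_count, 0.0);  // keeps the work observable
    std::vector<std::thread> threads;
    const auto start = Clock::now();
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&finish_ms, &results, start, t, iterations] {
            results[t] = math_only_work(iterations);
            finish_ms[t] = std::chrono::duration<double, std::milli>(
                               Clock::now() - start).count();
        });
    }
    for (auto& th : threads) th.join();
    return finish_ms;
}
```

Calling `measure_finish_times(16, n)` on each OS and comparing the spread between the smallest and largest values would show whether the threads finish together or one after another. The result still depends on compiler settings, power management, and whatever else is running, so treat any single run with caution.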