Whether kernel mode is in on the trick is immaterial since it is not
part of the trick.

Let's start with a version of the code that does not take advantage
of the OVERLAPPED structure address in the way
described in the article.
This is a technique I found in a book on advanced Windows programming.

This version of the code uses the address of the
OVERLAPPED structure to determine the
index into the MasterOverlapped table
and uses the corresponding entry in the parallel
OtherData array to hold the other data.
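
The book's listing is not reproduced here, so what follows is only a minimal sketch of the parallel-array approach as described. The OVERLAPPED definition is a simplified stand-in so that the sketch compiles without windows.h, and MAX_OVERLAPPED, the RequestId field, and FindOtherData are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for the Win32 OVERLAPPED structure (assumption);
   real code would use the definition from <windows.h>. */
typedef struct OVERLAPPED {
    unsigned long Offset;
    unsigned long OffsetHigh;
    void *hEvent;
} OVERLAPPED;

/* Hypothetical per-request data; the field is an assumption. */
typedef struct OTHERDATA {
    int RequestId;
} OTHERDATA;

#define MAX_OVERLAPPED 10 /* hypothetical table size */

/* Two parallel arrays: entry i of one belongs with entry i of the other. */
static OVERLAPPED MasterOverlapped[MAX_OVERLAPPED];
static OTHERDATA OtherData[MAX_OVERLAPPED];

/* Recover the matching OTHERDATA entry by pointer arithmetic.
   The pointer must point at an element of MasterOverlapped. */
OTHERDATA *FindOtherData(OVERLAPPED *lpOverlapped)
{
    ptrdiff_t index = lpOverlapped - MasterOverlapped;
    return &OtherData[index];
}
```

A completion handler that receives an OVERLAPPED pointer from this table would call FindOtherData to reach the data that goes with it.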

Instead of doing simple pointer arithmetic to recover
the index, we walk the array and compare each entry's
address against the pointer.
This is naturally worse than doing pointer arithmetic, but
watch what this step allows us to do.

First, we reorganize the data so that instead of two
parallel arrays, we have a single array of a compound
structure.
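
As a rough sketch of these two steps combined, the code below walks a single array of compound structures and tests each embedded OVERLAPPED address. The OVERLAPPEDEX member names follow the text; the simplified OVERLAPPED stand-in, the array name MasterOverlappedEx, MAX_OVERLAPPED, the RequestId field, and FindOtherData are assumptions made for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for the Win32 OVERLAPPED structure (assumption). */
typedef struct OVERLAPPED {
    unsigned long Offset;
    unsigned long OffsetHigh;
    void *hEvent;
} OVERLAPPED;

/* Hypothetical per-request data; the field is an assumption. */
typedef struct OTHERDATA {
    int RequestId;
} OTHERDATA;

/* The compound structure: both parts now live in one array element. */
typedef struct OVERLAPPEDEX {
    OVERLAPPED Overlapped;
    OTHERDATA OtherData;
} OVERLAPPEDEX;

#define MAX_OVERLAPPED 10 /* hypothetical table size */

static OVERLAPPEDEX MasterOverlappedEx[MAX_OVERLAPPED]; /* hypothetical name */

/* Walk the array, testing each embedded OVERLAPPED address.
   Returns NULL if the pointer does not belong to the table. */
OTHERDATA *FindOtherData(OVERLAPPED *lpOverlapped)
{
    for (size_t i = 0; i < MAX_OVERLAPPED; i++) {
        if (&MasterOverlappedEx[i].Overlapped == lpOverlapped) {
            return &MasterOverlappedEx[i].OtherData;
        }
    }
    return NULL;
}
```

The linear search costs more than a subtraction, but it no longer depends on the two pieces of data sitting at the same index in separate arrays.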

Now that it's an array of compound structures, we don't need
to carry two pointers around (one to the OVERLAPPED
and one to the OTHERDATA).
We can just use a single OVERLAPPEDEX pointer
and dereference either the Overlapped
or the OtherData part.
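
To illustrate, here is a small sketch in which one OVERLAPPEDEX pointer reaches both parts. The simplified OVERLAPPED stand-in, the RequestId field, and the helper names PrepareRequest and RequestIdOf are hypothetical and exist only for this example.

```c
/* Simplified stand-in for the Win32 OVERLAPPED structure (assumption). */
typedef struct OVERLAPPED {
    unsigned long Offset;
    unsigned long OffsetHigh;
    void *hEvent;
} OVERLAPPED;

/* Hypothetical per-request data; the field is an assumption. */
typedef struct OTHERDATA {
    int RequestId;
} OTHERDATA;

typedef struct OVERLAPPEDEX {
    OVERLAPPED Overlapped;
    OTHERDATA OtherData;
} OVERLAPPEDEX;

/* One pointer is enough to fill in both the I/O part and the extra data. */
void PrepareRequest(OVERLAPPEDEX *lpOverlappedEx, unsigned long offset, int requestId)
{
    lpOverlappedEx->Overlapped.Offset = offset;
    lpOverlappedEx->Overlapped.OffsetHigh = 0;
    lpOverlappedEx->Overlapped.hEvent = 0;
    lpOverlappedEx->OtherData.RequestId = requestId;
}

/* Later, the same pointer gives back the extra data. */
int RequestIdOf(const OVERLAPPEDEX *lpOverlappedEx)
{
    return lpOverlappedEx->OtherData.RequestId;
}
```

Code that needs a plain OVERLAPPED pointer can take the address of the Overlapped member from the same OVERLAPPEDEX pointer.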