Accessing the input buffer while a read operation is using the buffer may lead to corruption of the data read into that buffer. Applications must not read from, write to, reallocate, or free the input buffer that a read operation is using until the read operation completes.

This is the first time I've ever heard of reading data causing corruption.
So my question is, why does that happen? How can a read operation possibly cause data corruption?
What's going on underneath that causes this?

Pure speculation, but one could imagine that the implementation modifies the protection flags of the underlying pages and waits for a page fault to signal read completion. In that case, reading from the pages while the operation is in progress would return corrupted data. Similar to what is described here: blogs.msdn.com/b/oldnewthing/archive/2006/09/27/773741.aspx
–
user786653 Feb 27 '13 at 16:33

@user786653: That's a very interesting point! Though it makes me wonder, where would the page fault handler be? I wonder if it's inside WaitForXxx or something...
–
Mehrdad Feb 27 '13 at 16:49

There's nothing "unanswerable" about it; just because you and I don't know the answer doesn't mean it doesn't have one, especially when the documentation so explicitly mentions the problem. And the problem couldn't be more practical: knowing how your tools work makes you use them better. There's no reason to constrain yourself to a single abstraction level and have zero idea what the heck is going on underneath. But heck, go ahead and vote to close if it makes you feel you accomplished something great today.
–
Mehrdad Feb 27 '13 at 17:56

This isn't really a practical programming question. The rules say don't do it, so don't do it. The reason why doesn't change the rules. Suppose I said the reason is that "on certain hardware, reads can trigger latching." Does that change in any way how you write your code?
–
Raymond Chen Feb 27 '13 at 21:15

Gaining an understanding of why something isn't permitted can be useful. Sure, you still have to follow the rules even if you don't understand why the rules are the rules. But there's still value in understanding why a rule might exist. For example, John Regehr's articles on undefined behavior in C (such as blog.regehr.org/archives/759) are helpful for understanding why certain constructs that might appear just fine at first glance are in fact undefined behavior. And that understanding might help programmers spot those dodgy patterns.
–
Michael Burr Feb 28 '13 at 9:22

3 Answers

ReadFileEx is implemented on top of NtReadFile (it is more or less a thin wrapper around it). NtReadFile does a lot of work, but it uses IoBuildAsynchronousFsdRequest (or IoBuildSynchronousFsdRequest) to perform its task. From this article we know that:

If the target device object is set up to do direct I/O (DO_DIRECT_IO), then IoBuildAsynchronousFsdRequest creates an MDL to describe the buffer and locks the pages.

(emphasis is mine)

Then I guess they call MmProbeAndLockPages with IoWriteAccess. This is done by the driver in kernel mode, so the user-supplied buffer (in user mode) can't even be accessed for reading.

I don't know what will happen if you do it; probably an SEH exception will be thrown and your code will fail.

EDIT
As pointed out in the edited question, even the ReadFile function forbids the user from reading the buffer until the operation has completed, and it may return ERROR_NOT_ENOUGH_QUOTA:

The ReadFile function may fail with ERROR_NOT_ENOUGH_QUOTA, which means the calling process's buffer could not be page-locked.

At least this makes clear that ReadFile (where the buffer isn't provided by the user) will allocate a page and lock it (OK, the article I linked says so too...). It remains to be understood whether the corruption (if any; on this I strongly agree with @David) can also occur with a user-defined buffer (where locking the page, as pointed out by @Ben, is impossible most of the time).

I don't think it uses page faults to detect buffer overruns, simply because it knows the required amount of data before the call, so it can allocate it once.

So why can data be corrupted?
After all, everything described so far could lead to an error, but not to data corruption. This is a big guess, but there was a known issue with MmProbeAndLockPages:

This issue occurs because of a race condition in the Memory Manager. When a driver calls the MmProbeAndLockPages routine, this routine may read some data that is being modified by another thread. Therefore, data corruption occurs. Depending on how the corrupted data is used, the application or the system may crash.

It's hard to say whether this issue has been resolved at a very low level or whether it can still be triggered if the application does something weird...

Crashes make sense, but corruption? That's what confuses me. Anyway, +1 for the info. :)
–
Mehrdad Feb 27 '13 at 16:56

A closer look will show that the semantics of ReadFile don't permit the kernel to do anything as drastic as marking the entire page write-only. (Well, maybe if you've opened the file for direct unbuffered I/O, in which case you're required to use entire pages as the buffer)
–
Ben Voigt Feb 27 '13 at 17:04

@BenVoigt I'm not sure. After all, you supply a buffer and they ask you not to even read from it, so it may well mark the page write-only. Anyway, I guess (again) they ask you not to even read because you may make their call to MmProbeAndLockPages fail (what if you use the same buffer in a routine that, deep inside, calls MmProbeAndLockPages?)
–
Adriano Repetti Feb 27 '13 at 17:10

@Adriano: There's no prohibition on continuing to access other data on the same page as the buffer.
–
Ben Voigt Feb 27 '13 at 17:11

@Mehrdad ...then their call will fail (inside NtReadFile) and it will cause data corruption (where and how is the exception handled inside the ReadFileEx call stack?)
–
Adriano Repetti Feb 27 '13 at 17:11

Most likely, the corruption when you read from the I/O buffer results from the race condition -- the buffer may be partially filled in when you read from it, and the order in which it is filled in is unspecified. In addition, Windows could store anything in there during the time it owns the buffer -- you aren't guaranteed to see either the prior content or the data from the file.

What you can be sure of is that it isn't related to access violations when reading from the buffer, because it's perfectly legal to continue accessing other data in the same page. Only the buffer itself is off limits. Now, when the file is open for direct unbuffered I/O (FILE_FLAG_NO_BUFFERING) and the volume sector size is a multiple of the memory page size, the buffer is required to correspond to a sequence of complete pages, so the kernel has more freedom at that point. But that's a very particular set of conditions, and it's rare for the sector size to exceed the memory page size.
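The race described in this answer can be illustrated with a small, deterministic simulation. This is only a sketch in Python, not Win32 code: `simulated_async_read`, the chunk size, and the shuffled completion order are all assumptions made up for illustration, and nothing here claims to mirror how Windows actually fills the buffer. It only shows why a read issued before completion can observe a mix of stale bytes and file bytes.

```python
import random

def simulated_async_read(buffer, data, chunk_size, seed=0):
    """Hypothetical stand-in for an in-flight read.

    Copies `data` into `buffer` one chunk at a time, in a shuffled
    (i.e. unspecified) order, yielding after each chunk lands.
    """
    offsets = list(range(0, len(data), chunk_size))
    random.Random(seed).shuffle(offsets)  # assumption: fill order is arbitrary
    for off in offsets:
        buffer[off:off + chunk_size] = data[off:off + chunk_size]
        yield off

STALE = b"-" * 16               # whatever was in the buffer beforehand
FILE_DATA = b"0123456789ABCDEF"  # what the read will eventually deliver

buf = bytearray(STALE)
op = simulated_async_read(buf, FILE_DATA, chunk_size=4)

# Let two of the four chunks land, then peek: this is the forbidden access.
next(op)
next(op)
peek = bytes(buf)

# Let the operation run to completion; only now is the buffer ours again.
for _ in op:
    pass
final = bytes(buf)

print("mid-operation:", peek)
print("after completion:", final)
```

The mid-operation snapshot is neither the old content nor the file content, which is the sense of "corruption" argued for above; the buffer is only trustworthy once the operation has finished.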

I don't think that's what they meant. They clearly say that the mere action of reading causes corruption "of the data read into that buffer", not corruption "of the application state" or something vague like that.
–
Mehrdad Feb 27 '13 at 17:01

@Mehrdad: You read the buffer, and find unexpected data present there. Many people would describe that as corruption... even if the correct data is later present at the time the operation completes.
–
Ben Voigt Feb 27 '13 at 17:02

It's not a corruption of the data, it's a corruption of your state. There's a clear difference and what you're saying is not what they're describing. (Also, food for thought: if all I wanted the data for was help generating a random number, then a race condition would be the perfect thing!)
–
Mehrdad Feb 27 '13 at 17:10

Yeah, so in your own words, if you read the data (2), then you can be penalized by corruption of the data you read (1). Confirms exactly what I'm asking.
–
Mehrdad Feb 27 '13 at 17:34

No, that's not what I meant at all. The documentation states that accessing the buffer may corrupt it. It does not state what form of access may lead to a corruption.
–
David Heffernan Feb 27 '13 at 17:35

Yeah, because any of the forms of access they described can lead to the corruption. That's why they didn't say "write" instead of just "access"; read is a form of access. What's the point of mentioning reading otherwise?
–
Mehrdad Feb 27 '13 at 17:49

That's one way to read it. But it can be read differently.
–
David Heffernan Feb 27 '13 at 17:51

Note that the text does not say that accessing the buffer will corrupt the data, only that it may. It is saying that the operating system is allowed to optimize on the assumption that you will not access the buffer during the I/O, and if you violate the assumption, then the integrity of the data is not guaranteed.
–
Raymond Chen Feb 28 '13 at 5:18
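
Whatever the underlying mechanism turns out to be, the rule quoted in the question amounts to a hand-off of ownership: the buffer belongs to the I/O operation from the moment the read is issued until completion is signalled. A portable analogy, again in Python rather than Win32, might look like the sketch below. `start_read` is a hypothetical helper, and the worker thread plus `threading.Event` merely stand in for the completion notification an overlapped read would provide.

```python
import io
import threading

def start_read(stream, buffer):
    """Hypothetical helper: fill `buffer` from `stream` on a worker thread.

    Returns (done, result). The caller must not touch `buffer` until
    `done` is set, mirroring the rule quoted in the question.
    """
    done = threading.Event()
    result = {}

    def worker():
        try:
            result["nbytes"] = stream.readinto(buffer)
        finally:
            done.set()  # signal completion even if the read raised

    threading.Thread(target=worker, daemon=True).start()
    return done, result

source = io.BytesIO(b"hello, overlapped world")
buf = bytearray(64)

done, result = start_read(source, buf)

# Unrelated work can happen here, as long as it leaves `buf` alone.

done.wait()                                # completion: ownership returns to us
payload = bytes(buf[:result["nbytes"]])    # now it is safe to read the buffer
print(payload)
```

Nothing about the caller's code depends on why early access is unsafe; it only depends on waiting for completion before touching `buf`.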