File systems in Windows are expected to offer a host of features that applications in turn rely upon. One of those features is “byte range locks”, which are implemented to allow sharing at the file level, while controlling sharing of individual ranges of the file. While this seems simple, the actual implementation (and its use by applications) is a little trickier than expected.

In this article we will describe byte range locks and how they work, and then suggest how you can implement them in your file system using the file system runtime library (FsRtl) package.

What are Byte Range Locks?

First, in order for byte range locks to be generally useful, applications using them must be sharing file level access – otherwise another process cannot open the file. Of course, there may be some application that uses byte range locks entirely internal to itself but we did use the term “generally useful”!

Thus, a byte range lock serializes access to a region of a file, either protecting it from all other readers and writers (“exclusive” access) or allowing other readers while preventing other writers (“shared” access). Normally, the application specifies the region by providing a range of bytes. The file system then rounds that region to the nearest available boundary (and what that boundary is depends entirely upon the file system).

Subsequently, if one application “owns” a lock on a region, all I/O operations need to be checked against that lock to ensure compatibility. For example, a write operation will only succeed for the process owning an exclusive lock on that region (or if there are no locks on that region of the file). This ensures that an application can safely update that region of the file without impacting any other application, because any application that needed that section of the file not to change would use a byte range lock of its own to prevent the change.
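The compatibility rules just described can be sketched in portable C. This is a minimal model under stated assumptions, not the FsRtl implementation: the RangeLock type, the integer owner, and the function names are all hypothetical, and the sketch ignores keys and file objects. It also assumes that a shared lock blocks writes from every process, including the one that holds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one byte range lock; not the FsRtl layout. */
typedef struct {
    uint64_t start;     /* first byte covered */
    uint64_t length;    /* number of bytes covered, must be > 0 */
    int      owner;     /* stand-in for the owning process */
    bool     exclusive; /* true = exclusive, false = shared */
} RangeLock;

/* True when the two byte ranges intersect. Written with subtraction
   so that start + length can never overflow. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_len,
                           uint64_t b_start, uint64_t b_len)
{
    assert(a_len > 0 && b_len > 0);
    if (a_start <= b_start)
        return b_start - a_start < a_len;
    return a_start - b_start < b_len;
}

/* A read fails only if it touches an exclusive lock held by someone else. */
static bool read_allowed(const RangeLock *locks, size_t count,
                         int reader, uint64_t start, uint64_t length)
{
    for (size_t i = 0; i < count; i++) {
        if (!ranges_overlap(start, length, locks[i].start, locks[i].length))
            continue;
        if (locks[i].exclusive && locks[i].owner != reader)
            return false;
    }
    return true;
}

/* A write succeeds only if every lock it touches is an exclusive lock
   held by the writer. Any shared lock, or another owner's lock, blocks it. */
static bool write_allowed(const RangeLock *locks, size_t count,
                          int writer, uint64_t start, uint64_t length)
{
    for (size_t i = 0; i < count; i++) {
        if (!ranges_overlap(start, length, locks[i].start, locks[i].length))
            continue;
        if (!(locks[i].exclusive && locks[i].owner == writer))
            return false;
    }
    return true;
}
```

With an exclusive lock on bytes 0 to 99 held by owner 1, only owner 1 can read or write inside that range, while a write to bytes 100 to 199 is unaffected.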

Issues When Using Byte Range Locks

There are some caveats for a file system implementing (or application using) byte range locks. For example:

Server applications distinguish the locks used by different clients by specifying a key value. Subsequent read and write calls must then be checked against that key to ensure correct behavior.

Some oplocks (the cache coherency protocol used by CIFS/LanManager) are incompatible with byte range locks, so those oplocks must be broken whenever a byte range lock is taken against the file.

Byte range locks are not enforced or available for paging I/O operations. This means that byte range locks are not enforced for applications that memory map the files. For such applications byte range locks are advisory only: if every application takes the locks the behavior is correct, but an application that does not take them can still access the data.

Byte range locks are associated with the given file object, the process requesting the byte range lock, and the range of bytes, as well as the optional key value. Since a single file object can be associated with multiple file handles (e.g., by duplicating the handle) it is not directly “handle based”.

Byte range locks can be used for any region of the file address space, even those regions that are not part of the current file. Thus, a 4KB file still allows byte range locks in the 4MB region of the file – and some applications rely upon this to implement their own access control mechanism.

Typically, these issues are all handled within the file system by using the file system runtime library package because it “does the right thing” – at least in terms of being compatible with the existing file systems.
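Two of the caveats above, lock ownership and locks beyond the end of the file, can be illustrated with a small portable C model. It is a sketch only: LockOwner, FileModel, same_owner and take_lock are hypothetical names, the model records at most one lock, and a key of 0 is assumed to mean "no key supplied".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity of a lock holder: the file object, the
   requesting process, and the optional key. */
typedef struct {
    const void *file_object; /* stand-in for the file object pointer */
    uint32_t    process_id;  /* stand-in for the requesting process */
    uint32_t    key;         /* assumption: 0 when no key is supplied */
} LockOwner;

/* Two requests come from the same owner only if all three parts match. */
static bool same_owner(const LockOwner *a, const LockOwner *b)
{
    return a->file_object == b->file_object &&
           a->process_id  == b->process_id  &&
           a->key         == b->key;
}

/* Hypothetical file that records its size and a single lock. */
typedef struct {
    uint64_t  size;
    bool      locked;
    uint64_t  lock_start;
    uint64_t  lock_length;
    LockOwner lock_owner;
} FileModel;

/* Records a lock. file->size is never consulted, so a range that
   lies entirely past the end of the file is accepted. */
static bool take_lock(FileModel *file, uint64_t start, uint64_t length,
                      LockOwner who)
{
    assert(length > 0);
    if (file->locked)
        return false; /* the sketch holds only one lock at a time */
    file->locked = true;
    file->lock_start = start;
    file->lock_length = length;
    file->lock_owner = who;
    return true;
}
```

In this model two duplicated handles that share one file object compare as the same owner, while a second open of the same file (a different file object) or the same file object with a different key does not. A lock at the 4MB offset of a 4KB file is granted because the file size plays no part in the decision.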

Data Structures

The file system runtime library relies upon the FILE_LOCK_INFO data structure for tracking the individual byte range locks and the FILE_LOCK data structure for tracking the list of byte range locks for a given file. For example, the FAT file system example in the IFS Kit (shown in Figure 1a) allocates the FILE_LOCK structure as part of its file control block (FCB) and initializes it as part of its internal initialization. (See FatCreateFcb in strucsup.c for the FastFat source code from the Windows XP IFS Kit).

Pre-allocating the structure, as FAT does, and allocating it only when needed, as CDFS does, are both valid solutions to the problem. They indicate the expected behavior for each file system: in the FAT implementation there is an assumption that byte range locks are a regular occurrence and hence the storage is pre-allocated, while in the CDFS implementation there is an assumption that byte range locks are unusual (but supported) and so the storage is allocated as needed.
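The two allocation strategies can be contrasted with a short portable C sketch. It is not the FAT or CDFS source: LockState is a hypothetical stand-in for FILE_LOCK, and EagerFcb, LazyFcb and the helper functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lock tracking structure. */
typedef struct {
    int lock_count;
} LockState;

/* Strategy 1: the lock state is embedded in the FCB and set up
   when the FCB itself is created. */
typedef struct {
    LockState locks;
} EagerFcb;

static void eager_fcb_init(EagerFcb *fcb)
{
    assert(fcb != NULL);
    fcb->locks.lock_count = 0;
}

/* Strategy 2: the FCB holds only a pointer, and the lock state is
   allocated the first time a lock request arrives. */
typedef struct {
    LockState *locks;
} LazyFcb;

static void lazy_fcb_init(LazyFcb *fcb)
{
    assert(fcb != NULL);
    fcb->locks = NULL;
}

/* Returns the lock state, allocating it on first use.
   Returns NULL if the allocation fails. */
static LockState *lazy_fcb_get_locks(LazyFcb *fcb)
{
    assert(fcb != NULL);
    if (fcb->locks == NULL)
        fcb->locks = calloc(1, sizeof *fcb->locks);
    return fcb->locks;
}

static void lazy_fcb_teardown(LazyFcb *fcb)
{
    assert(fcb != NULL);
    free(fcb->locks);
    fcb->locks = NULL;
}
```

The eager form costs memory in every FCB but can never fail at lock time. The lazy form keeps the FCB small, at the price of an allocation (and a possible failure) on the first lock request.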

Note that the FILE_LOCK structure is typically associated with a given file, via the file control block. Thus, all locks on the file are associated back to the common repository. Again, there’s no requirement that this be the case for your file system. For example, if you were implementing an encryption file system where you had two different FCB structures – one for the encrypted form and one for the unencrypted form – you might choose to have a third “common” data structure that was used to track state common to both FCBs – such as the file lock structure.

Initialization

Regardless of how the FILE_LOCK structure is allocated, it must be initialized prior to use by the file system runtime library. For the physical media file systems, this is rather straightforward because they do not utilize the “callout” functions available in the package. Network file systems, however, provide these callout functions as shown in Figure 2. While no longer included in the Windows XP IFS Kit, RDBSS (the mini-redirector wrapper) was included in the Windows NT 4.0 IFS Kit. (See RxFinishFcbInitialization in rdr2\supplied\rxce\fcbstruc.c for the RDR2 mini-wrapper source code from the Windows NT 4.0 IFS Kit).

In this case they use two functions to provide “callback” functionality to the network redirector after the byte range lock has been granted by the local system. Thus, a network file system can forward the byte range lock request to a remote server responsible for coordinating locks between machines after the lock has been granted by the local machine.

The function RxLockOperationCompletion actually has an extensive comment, describing its operation and the complications involved in dealing with byte range locks. However, the salient point for our discussion is that it is called after the lock has been granted locally (by the file system runtime library) and it is now time to attempt to acquire the lock via the controlling server (presumably a file server or other lock server). Since this can be a potentially long-running operation it is imperative that the network file system not lock out other threads from proceeding – otherwise it will block other threads from running while trying to obtain a byte range lock, which could cause the system to appear hung.

The function RxUnlockOperation is called when a lock is being released by the application (or when the application terminates prematurely and the I/O Manager releases all locks held by the given process). This gives the file system an opportunity to coordinate the release with the remote (network) server. In one odd case for this unlock operation, a NULL context value can be passed to the file system to indicate that a lock request was made but not granted. Typically a file system would ignore such a case, although this call does allow the unlock routine to change the return status code for the original lock request.

For both RxLockOperationCompletion and RxUnlockOperation the locking operation blocks and waits for the routine to return. Thus, the lock or unlock operation is not completed until these routines have an opportunity to examine and process it. For both routines, the return value from the routine is used to return a specific status value to the caller. For example, if the RxLockOperationCompletion routine finds that a lock is not available from the server, it can reject the call by returning STATUS_LOCK_CONFLICT. Of course, if there is some other reason to fail the call (a memory allocation failure, for instance) then the file system should return the appropriate error code.
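The callout flow described in this section can be sketched as a portable C model. Every name here (LockPackage, request_lock, release_lock, FakeServer and the status codes) is hypothetical, and the model tracks a single lock. It also assumes that the local lock is dropped again when the completion callout rejects the request, and it does not model the NULL context case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes for the sketch. */
typedef enum { ST_SUCCESS, ST_LOCK_CONFLICT } Status;

/* Callouts supplied by the file system when the package is initialized. */
typedef Status (*LockCompleteFn)(void *context, Status local_status);
typedef void   (*UnlockFn)(void *context);

typedef struct {
    bool           held;        /* the sketch tracks a single lock */
    LockCompleteFn on_complete; /* NULL for a local file system */
    UnlockFn       on_unlock;   /* NULL for a local file system */
} LockPackage;

/* Grants the lock locally, then lets the callout consult the server.
   The callout's return value becomes the status seen by the caller. */
static Status request_lock(LockPackage *pkg, void *context)
{
    assert(pkg != NULL);
    if (pkg->held)
        return ST_LOCK_CONFLICT;
    pkg->held = true;
    if (pkg->on_complete != NULL) {
        Status status = pkg->on_complete(context, ST_SUCCESS);
        if (status != ST_SUCCESS) {
            pkg->held = false; /* assumption: undo the local grant */
            return status;
        }
    }
    return ST_SUCCESS;
}

/* Releases the lock locally and notifies the server through the callout. */
static void release_lock(LockPackage *pkg, void *context)
{
    assert(pkg != NULL);
    if (!pkg->held)
        return;
    pkg->held = false;
    if (pkg->on_unlock != NULL)
        pkg->on_unlock(context);
}

/* Hypothetical remote lock server used to exercise the callouts. */
typedef struct {
    bool server_grants;
    int  unlocks_sent;
} FakeServer;

static Status ask_server(void *context, Status local_status)
{
    FakeServer *server = context;
    (void)local_status;
    return server->server_grants ? ST_SUCCESS : ST_LOCK_CONFLICT;
}

static void tell_server(void *context)
{
    FakeServer *server = context;
    server->unlocks_sent++;
}
```

A local file system would leave both callouts NULL and rely on the local grant alone. A redirector would install both, so that a refusal from the server surfaces to the application as a lock conflict.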

Processing Lock Control Operations

So, when a file system receives a lock control IRP or fast I/O call, it can use this initialized data structure and the necessary information to process it. Since these calls are rather self-explanatory we list them here for reference:

Call                       Description
FsRtlProcessFileLock       This call processes the IRP variants of lock control.
FsRtlFastLock              This call processes the fast I/O routine for acquiring a lock. Note that this is nothing more than a macro that calls FsRtlPrivateLock.
FsRtlFastUnlockSingle      This call processes the fast I/O routine for dropping a single lock.
FsRtlFastUnlockAll         This call processes the fast I/O routine for dropping all locks held by a given process.
FsRtlFastUnlockAllByKey    This call processes the fast I/O routine for dropping all locks held under a given key by the specified process.

The remaining task for a file system is to check the status of the locks during I/O to ensure that the locks are consistent with the lock state on the file. There are four calls for a file system to use, depending upon whether the operation is a read or a write, and whether or not it is IRP-based or fast I/O based.

They are:

Call                            Description
FsRtlCheckLockForReadAccess     This is the call to make when checking a read IRP.
FsRtlCheckLockForWriteAccess    This is the variant for write IRPs.
FsRtlFastCheckLockForRead       This is the fast I/O variant for read.
FsRtlFastCheckLockForWrite      This is the fast I/O variant for write.

In this case, a physical media file system can rely entirely upon the file system runtime library to do this check as shown in Figure 3. (See read.c in the FastFat code from the Windows XP IFS Kit).

//
// We have to check for read access according to the current
// state of the file locks, and set FileSize from the Fcb.
//

This code demonstrates essentially everything required for a physical media file system. It does not do this check for paging I/O (remember our discussion earlier about memory mapped files?). A file system should do this check for all user access (cached or uncached) to ensure that the access is consistent with the locks on the file.
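The rule that the lock check applies to user I/O but is skipped for paging I/O can be shown with a small portable C model. This is a sketch, not the FastFat read path: ReadRequest, lock_check_for_read and begin_read are hypothetical, and the outcome of the lock check is supplied as a field rather than computed from a lock table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for the start of a read. */
typedef enum { IO_OK, IO_LOCK_CONFLICT } IoStatus;

typedef struct {
    bool paging_io;   /* request comes from the paging I/O path */
    bool conflicts;   /* result the lock check would produce */
    int  checks_made; /* counts how often the lock check ran */
} ReadRequest;

/* Stand-in for the lock check; returns true when the read may proceed. */
static bool lock_check_for_read(ReadRequest *request)
{
    request->checks_made++;
    return !request->conflicts;
}

/* User reads (cached or uncached) are checked against the locks.
   Paging reads bypass the check entirely. */
static IoStatus begin_read(ReadRequest *request)
{
    assert(request != NULL);
    if (!request->paging_io && !lock_check_for_read(request))
        return IO_LOCK_CONFLICT;
    return IO_OK;
}
```

Because the paging test comes first, a paging read never reaches the lock check at all, which is why memory mapped access is not constrained by byte range locks.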

The implementation for a network file system is potentially more complicated because lock state may be stored on a remote server. The current mini-redirector model does not offer support for the mini file system to check the remote lock state, which is acceptable in a model where data is being written only on the local machine, or written directly back to the server. In the former case, the local lock state is sufficient. In the latter case, the server will do the lock check.

The fast I/O logic in FAT uses the read and write check (shown in Figure 4) to determine if fast I/O is possible. (See FatFastIoCheckIfPossible in fatdata.c in the FastFat file system code from the Windows XP IFS Kit).

//
// Based on whether this is a read or write operation we call
// fsrtl check for read/write
//

In this case, it only uses it to determine if the fast I/O operation is allowed to proceed – and if this fails it will ask the caller to build and send an IRP.

Managing IsFastIoPossible

The existing file systems manage the IsFastIoPossible field within their file control block structure (part of the FSRTL_COMMON_FCB_HEADER) based upon their file lock state. (See example in Figure 5). (See fatprocs.h in the Windows XP IFS Kit).

//
// The following macro is used to set the is fast i/o possible field in
// the common part of the nonpaged fcb
//
//
// BOOLEAN
// FatIsFastIoPossible (
//     IN PFCB Fcb
//     );
//

FAT uses this macro internally to set the state of this field based upon the byte range lock state. Notice that if there are byte range locks, the file system will immediately indicate that fast I/O is not possible.

Of course, the exact determination here is up to the specific file system, but the added overhead of examining the lock table seems to trigger rejection of the use of the fast I/O path.

At this point, you might ask why the check in FatFastIoCheckIfPossible was necessary at all, since FAT will mark the file control block to indicate that fast I/O is not an option in this case. Other file systems might mark the file control block so that FastIoIsQuestionable is set, requiring the extra check in the calls to FsRtlCopyRead and FsRtlCopyWrite.
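The two ways of marking the file control block discussed in this section can be compared in a short portable C model. It is a sketch under stated assumptions: the FastIoState and LockPolicy enumerations and both functions are hypothetical, and they only mirror the three states named in the text rather than reproducing any file system's macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three fast I/O states. */
typedef enum {
    FIO_NOT_POSSIBLE,
    FIO_POSSIBLE,
    FIO_QUESTIONABLE
} FastIoState;

/* Hypothetical policy: what to record once locks exist on the file. */
typedef enum {
    POLICY_REJECT_WHEN_LOCKED, /* turn the fast path off outright */
    POLICY_CHECK_WHEN_LOCKED   /* ask for a per-request check */
} LockPolicy;

/* Computes the state to store in the FCB header. */
static FastIoState fast_io_state(bool fcb_usable, int lock_count,
                                 LockPolicy policy)
{
    assert(lock_count >= 0);
    if (!fcb_usable)
        return FIO_NOT_POSSIBLE;
    if (lock_count == 0)
        return FIO_POSSIBLE;
    return policy == POLICY_REJECT_WHEN_LOCKED ? FIO_NOT_POSSIBLE
                                               : FIO_QUESTIONABLE;
}

/* Decides whether a fast I/O request may proceed. The per-request
   lock check matters only in the questionable state. */
static bool fast_io_may_proceed(FastIoState state, bool lock_check_passes)
{
    switch (state) {
    case FIO_POSSIBLE:
        return true;
    case FIO_QUESTIONABLE:
        return lock_check_passes;
    default:
        return false;
    }
}
```

Under the first policy the per-request check is never consulted once a lock exists. Under the second policy it is the only thing standing between a locked range and the fast path, which is why a check routine remains necessary for file systems that choose it.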

Conclusions

Byte range locking is actually relatively simple to implement in a file system, and some applications (notably the Microsoft Office Suite) rely upon it to provide their functionality. Using the file system runtime library allows the file system to fully support such applications and provides a maximally useful file system.