Working with Large Memory-Mapped Files in VB 2010

Introduction

Truth be told, sometimes a technology comes along and I don't have a user story for it. The technology may not be that hard to describe, but a user story either doesn't come to me or doesn't seem compact enough to fit into an article-length column. Memory-mapped files are like that.

Generally, a file is read into memory all at once or sequentially and then manipulated. (Traditional streaming libraries support random access, too.) The problem is that very large files can easily exhaust memory, especially in a 32-bit process, which is typically limited to about two gigabytes of address space. Memory-mapped files logically treat all of the file's data as if it were loaded in memory, so you can access any part of the file as if it were.

In a nutshell, a memory-mapped file lets you treat a file as if it were entirely loaded in memory, and the size of the file is not bounded by the amount of physical memory you have. Without spilling the beans entirely, let's chunk up the differences between file I/O and memory-mapped files into some meaty bits.

File I/O vs. Memory Mapped Files

A traditional use of the System.IO file classes is to read a file into memory and manipulate it, or to seek to a point in a file and manipulate the file at that point. The challenges with traditional file I/O have to do with memory limitations (about two gigabytes of address space, and usually much less in practice, for a 32-bit process) and the cost of seeking through very large files. A stream-based file access system would probably not work very well for a database server, large text documents, or anything file-based that is particularly large.

The memory-mapped file capabilities in .NET Framework 4 work with physical files and logical files. They logically map a file (or part of a file) and let you treat the file as if it were all loaded in memory. The system's memory manager takes care of moving between the logical and physical mapping of a file. (There is also stream-based support for memory-mapped files.) What this means is that you can have an extremely large file and interact with it as if it were all loaded, while the memory manager handles moving the bits to and from their physical location.

There is still a two gigabyte limit for memory-mapped files on a 32-bit system, but it is two gigabytes per chunk, and you can create multiple chunks. The idea is that with memory-mapped files you can map all of the file, or a chunk of it identified by a starting offset and a size, and access that chunk through a view. Because you can split the file into chunks, you can work through more than the two gigabyte upper limit (on a 32-bit system), and you can access the chunks from different processes. Remember that because you are logically working with memory, you can think of mapped files as you would any other memory, and that includes dividing the work between processes.
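To make the chunking idea concrete, here is a minimal sketch, assuming a small temporary file and an arbitrary chunk size. The module name, the SumBytes function, and the chunkSize parameter are all made up for illustration. It maps one view at a time with CreateViewAccessor(offset, size), so only one chunk is mapped at any moment.

```vb
Imports System
Imports System.IO
Imports System.IO.MemoryMappedFiles

Module ChunkedViewSketch
    ' Sums every byte in a file, mapping only chunkSize bytes at a time.
    ' SumBytes and chunkSize are hypothetical names used for illustration.
    Function SumBytes(ByVal fileName As String, ByVal chunkSize As Long) As Long
        Dim fileLength As Long = New FileInfo(fileName).Length
        Dim total As Long = 0
        ' Note: CreateFromFile rejects an empty file when no capacity is given.
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)
            Dim offset As Long = 0
            While offset < fileLength
                ' The last chunk may be shorter than chunkSize.
                Dim size As Long = Math.Min(chunkSize, fileLength - offset)
                Using view As MemoryMappedViewAccessor = mmf.CreateViewAccessor(offset, size)
                    ' Positions passed to the view are relative to the view's offset.
                    For i As Long = 0 To size - 1
                        total += view.ReadByte(i)
                    Next
                End Using
                offset += size
            End While
        End Using
        Return total
    End Function

    Sub Main()
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllBytes(fileName, New Byte() {1, 2, 3, 4, 5})
        Console.WriteLine(SumBytes(fileName, 2)) ' prints 15
        File.Delete(fileName)
    End Sub
End Module
```

In a real application the chunk size would be large (many megabytes), and each chunk could be handed to a different thread or process rather than walked in a loop.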

Accessing a File Using File IO

I think a challenging scenario might be writing your own file server, database server, or maybe a document processing application, and exploring how memory-mapped files might help improve performance. (However, this is probably a pretty big task.)

The task I picked was to perform a word frequency count. The example simply reads a file and counts the frequency of words. Listing 1 demonstrates one way you might do this using standard File I/O.

Listing 1 is pretty straightforward. Read all of the lines of a text file. Split each line into an array using an array of non-word characters and stick each word in a Hashtable. The word is the key and the count is the value, so every time the same word is stuffed in the Hashtable, the value at that hash location is incremented. When you are all done you have a word frequency count. You could use the same approach for tasks like search and replace, keyword highlighting, and spell checking (think word processing here). You get the idea.
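As a rough sketch of the approach just described (and not necessarily identical to Listing 1), the code might look something like this. The delimiter set, the module name, and the sample text are assumptions chosen for illustration.

```vb
Imports System
Imports System.Collections
Imports System.IO

Module FileIoWordCount
    ' An illustrative set of non-word characters; adjust to taste.
    Private ReadOnly Delimiters As Char() = " ,.;:!?".ToCharArray()

    ' Reads every line into memory, splits it, and tallies each word.
    Function CountWords(ByVal fileName As String) As Hashtable
        Dim counts As New Hashtable()
        For Each line As String In File.ReadAllLines(fileName)
            For Each word As String In line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries)
                If counts.ContainsKey(word) Then
                    ' The word is the key and the running count is the value.
                    counts(word) = CInt(counts(word)) + 1
                Else
                    counts(word) = 1
                End If
            Next
        Next
        Return counts
    End Function

    Sub Main()
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllText(fileName, "the cat and the hat")
        Dim counts As Hashtable = CountWords(fileName)
        Console.WriteLine(counts("the")) ' prints 2
        File.Delete(fileName)
    End Sub
End Module
```

Note that File.ReadAllLines pulls the whole file into memory at once, which is exactly the behavior that becomes a problem for very large files.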

Accessing a File Using a MemoryMappedFile

Memory mapped files can map to a logical file or a physical file. You can access a memory mapped file using a MemoryMappedViewAccessor, a MemoryMappedFile, or a MemoryMappedViewStream.

There are two kinds of memory-mapped files: persisted and non-persisted. Persisted memory-mapped files are associated with a file on the file system. When the last process finishes working with the file, the data is saved to the file system. Non-persisted memory-mapped files are not associated with a file on the file system, and when the last process is finished, the data is lost and the memory is reclaimed by the garbage collector.
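The difference can be sketched with the two factory methods involved: CreateFromFile for a persisted map and CreateNew for a non-persisted one. This is a minimal illustration under assumed names and sizes, not production code. The map is left unnamed (Nothing) because it is not shared with another process here.

```vb
Imports System
Imports System.IO
Imports System.IO.MemoryMappedFiles

Module PersistedVersusNonPersisted
    ' Persisted: the value written through the view ends up in the file on disk.
    Function WritePersisted(ByVal fileName As String, ByVal value As Integer) As Integer
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)
            Using view As MemoryMappedViewAccessor = mmf.CreateViewAccessor()
                view.Write(0, value)
            End Using
        End Using
        ' Read the file back with ordinary file I/O to show the data survived.
        Return BitConverter.ToInt32(File.ReadAllBytes(fileName), 0)
    End Function

    ' Non-persisted: the map lives only in memory and vanishes on Dispose.
    Function WriteNonPersisted(ByVal value As Integer) As Integer
        ' 1024 bytes is an arbitrary capacity chosen for this sketch.
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateNew(Nothing, 1024)
            Using view As MemoryMappedViewAccessor = mmf.CreateViewAccessor()
                view.Write(0, value)
                Return view.ReadInt32(0)
            End Using
        End Using
    End Function

    Sub Main()
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllBytes(fileName, New Byte() {0, 0, 0, 0})
        Console.WriteLine(WritePersisted(fileName, 42)) ' prints 42
        Console.WriteLine(WriteNonPersisted(7))         ' prints 7
        File.Delete(fileName)
    End Sub
End Module
```

To share a non-persisted map between processes on Windows, you would pass a map name instead of Nothing and open it from the second process with MemoryMappedFile.OpenExisting.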

If you want to work with a memory-mapped file in the same way you work with file streams, then request a MemoryMappedViewStream by calling CreateViewStream. The methods in that class jibe with FileStream methods. If you want to work with persisted and non-persisted views in a non-stream-based way, then request a MemoryMappedViewAccessor by calling CreateViewAccessor. Stream-style usage is going to have methods like Read, Write, and Seek. Non-stream-style usage supports methods that allow you to read native types like characters and numbers as well as structures and arrays of structures.
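Here is a small side-by-side sketch of the two styles, assuming an eight-byte sample file. The function names are made up for illustration. Both functions read the same byte: one seeks a stream to it, and the other asks the accessor for it by position.

```vb
Imports System
Imports System.IO
Imports System.IO.MemoryMappedFiles

Module StreamVersusAccessor
    ' Stream style: Seek to a position and Read, much like a FileStream.
    Function ReadWithStream(ByVal fileName As String, ByVal position As Long) As Integer
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)
            Using stream As MemoryMappedViewStream = mmf.CreateViewStream()
                stream.Seek(position, SeekOrigin.Begin)
                Return stream.ReadByte()
            End Using
        End Using
    End Function

    ' Accessor style: random access by position with typed Read methods
    ' (ReadByte here; ReadInt32, ReadDouble, and friends work the same way).
    Function ReadWithAccessor(ByVal fileName As String, ByVal position As Long) As Integer
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)
            Using view As MemoryMappedViewAccessor = mmf.CreateViewAccessor()
                Return view.ReadByte(position)
            End Using
        End Using
    End Function

    Sub Main()
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllBytes(fileName, New Byte() {10, 20, 30, 40, 50, 60, 70, 80})
        Console.WriteLine(ReadWithStream(fileName, 2))   ' prints 30
        Console.WriteLine(ReadWithAccessor(fileName, 2)) ' prints 30
        File.Delete(fileName)
    End Sub
End Module
```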

Listing 2 demonstrates how to read all of the characters in a file using MemoryMappedFile and count the frequency of words. Notice that there is no single place where all of the file data is accessed at once.

Listing 2 performs the same task but works differently. The first statement clears the storage Hashtable (no big deal). CreateFromFile is one of the methods that creates a MemoryMappedFile instance from a persisted file, that is, a file on disk. There are other methods for persisted and non-persisted files. MemoryMappedFile.CreateViewAccessor without parameters maps the entire file to memory and returns a MemoryMappedViewAccessor. Both MemoryMappedFile and MemoryMappedViewAccessor implement IDisposable, so use either a Try...Finally block that explicitly calls Dispose or a Using statement, which implicitly calls Dispose at the end of the Using block. (The compiler basically expands a Using statement into a Try...Finally block.)

Because String is not a fixed-size value type, the view accessor has no method for reading strings, so my approach reads a byte at a time and breaks words on the characters I defined as word delimiters. Again, each unique word is inserted into the Hashtable and the number of inserts is counted. Because the text file, word-finder concept is so simple, there aren't significant performance differences in this demo as written. When basic file I/O and streams are too slow or won't work, use a MemoryMappedFile.
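The byte-at-a-time approach might be sketched as follows. This is a hedged illustration rather than a copy of Listing 2. It assumes single-byte (ASCII) text, an invented delimiter set, and a hypothetical AddWord helper. It also takes the length from FileInfo rather than from the view, because a view's Capacity can be rounded up to a page boundary.

```vb
Imports System
Imports System.Collections
Imports System.IO
Imports System.IO.MemoryMappedFiles
Imports System.Text

Module MappedWordCount
    ' Illustrative delimiters; whitespace is handled separately below.
    Private Const Delimiters As String = ",.;:!?"

    ' Hypothetical helper: tallies the pending word, then resets the buffer.
    Private Sub AddWord(ByVal counts As Hashtable, ByVal word As StringBuilder)
        If word.Length = 0 Then Return
        Dim key As String = word.ToString()
        If counts.ContainsKey(key) Then
            counts(key) = CInt(counts(key)) + 1
        Else
            counts(key) = 1
        End If
        word.Clear()
    End Sub

    Function CountWords(ByVal fileName As String) As Hashtable
        Dim counts As New Hashtable()
        Dim word As New StringBuilder()
        ' Use the real file length so padding bytes in the view are never read.
        Dim length As Long = New FileInfo(fileName).Length
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)
            Using view As MemoryMappedViewAccessor = mmf.CreateViewAccessor(0, length)
                For i As Long = 0 To length - 1
                    ' Assumes one byte per character (ASCII text).
                    Dim ch As Char = Convert.ToChar(view.ReadByte(i))
                    If Char.IsWhiteSpace(ch) OrElse Delimiters.IndexOf(ch) >= 0 Then
                        AddWord(counts, word)
                    Else
                        word.Append(ch)
                    End If
                Next
            End Using
        End Using
        AddWord(counts, word) ' flush a final word with no trailing delimiter
        Return counts
    End Function

    Sub Main()
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllBytes(fileName, Encoding.ASCII.GetBytes("the cat, the hat."))
        Console.WriteLine(CountWords(fileName)("the")) ' prints 2
        File.Delete(fileName)
    End Sub
End Module
```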

The example in Listing 3 splits the text file into a couple of chunks using a MemoryMappedFile and threads to illustrate that MemoryMappedFiles support multiple, simultaneous processes against the same file.

Listing 3: Using the BackgroundWorker to split reading the mapped file into multiple threads.

The revised sample program uses a ConcurrentDictionary, which is thread-safe. The file is split in half, and each half is processed on its own BackgroundWorker.
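A minimal sketch of the split-in-half idea follows. To keep it runnable as a console program it uses Task.Factory.StartNew in place of the BackgroundWorker, which is a deliberate swap. The names, the delimiter set, and the step that nudges the split point forward to a delimiter (so no word is cut in two) are all assumptions for illustration.

```vb
Imports System
Imports System.Collections.Concurrent
Imports System.IO
Imports System.IO.MemoryMappedFiles
Imports System.Text
Imports System.Threading.Tasks

Module ParallelMappedWordCount
    Private Function IsDelimiter(ByVal b As Byte) As Boolean
        Dim ch As Char = Convert.ToChar(b) ' assumes ASCII text
        Return Char.IsWhiteSpace(ch) OrElse ",.;:!?".IndexOf(ch) >= 0
    End Function

    Private Sub Flush(ByVal word As StringBuilder,
                      ByVal counts As ConcurrentDictionary(Of String, Integer))
        If word.Length = 0 Then Return
        ' AddOrUpdate inserts 1 or applies the increment to the existing count.
        counts.AddOrUpdate(word.ToString(), 1, Function(k As String, old As Integer) old + 1)
        word.Clear()
    End Sub

    ' Counts the words in one chunk through its own view of the shared map.
    Private Sub CountRange(ByVal mmf As MemoryMappedFile, ByVal offset As Long,
                           ByVal size As Long,
                           ByVal counts As ConcurrentDictionary(Of String, Integer))
        If size <= 0 Then Return
        Dim word As New StringBuilder()
        Using view As MemoryMappedViewAccessor = mmf.CreateViewAccessor(offset, size)
            For i As Long = 0 To size - 1
                Dim b As Byte = view.ReadByte(i)
                If IsDelimiter(b) Then Flush(word, counts) Else word.Append(Convert.ToChar(b))
            Next
        End Using
        Flush(word, counts)
    End Sub

    Function CountWords(ByVal fileName As String) As ConcurrentDictionary(Of String, Integer)
        Dim counts As New ConcurrentDictionary(Of String, Integer)()
        Dim length As Long = New FileInfo(fileName).Length
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)
            ' Move the midpoint forward to a delimiter so no word straddles chunks.
            ' (For a huge file you would probe through a small view instead.)
            Dim middle As Long = length \ 2
            Using probe As MemoryMappedViewAccessor = mmf.CreateViewAccessor(0, length)
                While middle < length AndAlso Not IsDelimiter(probe.ReadByte(middle))
                    middle += 1
                End While
            End Using
            Dim splitAt As Long = middle
            Dim first As Task = Task.Factory.StartNew(Sub() CountRange(mmf, 0, splitAt, counts))
            Dim second As Task = Task.Factory.StartNew(Sub() CountRange(mmf, splitAt, length - splitAt, counts))
            Task.WaitAll(first, second) ' finish both halves before disposing the map
        End Using
        Return counts
    End Function

    Sub Main()
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllBytes(fileName, Encoding.ASCII.GetBytes("the cat and the hat and the bat"))
        Console.WriteLine(CountWords(fileName)("the")) ' prints 3
        File.Delete(fileName)
    End Sub
End Module
```

The same shape extends to more than two chunks, or to separate processes that each open the map and take their own view.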

Summary

The MemoryMappedFile supports a stream mode that lets you perform seeks like the System.IO file stream classes, and it supports mapping a file to memory so you can access very large files as if they were an object in memory. For extremely large files, like those that exceed the 32-bit memory limit, you can split the file across multiple MemoryMappedViewAccessors and operate on chunks of the file up to the logical memory limit of each process.

About the Author

Paul Kimmel

Paul Kimmel is the VB Today columnist for CodeGuru and has written several books on object-oriented programming and .NET. Check out his upcoming book Professional DevExpress ASP.NET Controls (from Wiley) now available on Amazon.com and fine bookstores everywhere. Look for his upcoming book Teach Yourself the ADO.NET Entity Framework in 24 Hours (from Sams). You may contact him for technology questions at pkimmel@softconcepts.com. Paul Kimmel is a Technical Evangelist for Developer Express, Inc, and you can ask him about Developer Express at paulk@devexpress.com and read his DX blog at http://community.devexpress.com/blogs/paulk.
