File System Programming Guide

Performance Tips

If your app works with a lot of files, the performance of its file-related code is very important. Relative to other types of operations, accessing files on disk is one of the slowest operations a computer can perform. Depending on the size and number of files, it can take anywhere from a few milliseconds to several minutes to read files from a disk-based hard drive. Therefore, you should make sure your code performs as efficiently as possible even under light to moderate workloads.

If your app slows down or becomes less responsive when it starts working with files, use the Instruments app to gather some baseline metrics. Instruments can show you how much time your app spends operating on files and help you monitor various file-related activity. As you fix each problem, be sure to run your code in Instruments again and record the results so that you can verify whether your changes worked.

Things to Look For in Your Code

If you are not sure where to look for potential fixes to your file-related code, start with the following places.

Look for places where your code is reading lots of files (of any type) from disk. Remember to look for places where you are loading resource files too. Are you actually using the data from all of those files right away? If not, you might want to load some of the files more lazily.

Look for places where you are using older file-system calls. Most of your calls should be using Objective-C interfaces or block-based interfaces. You can use BSD-level calls too, but you should not use older Carbon-based functions that operate on FSRef or FSSpec data structures. Xcode generates warnings when it detects your code using deprecated methods and functions, so make sure you check those warnings.

Look for places where you are using callback functions or methods to process file data. If a newer API is available that takes a block object, you might want to update your code to use that API instead.

Look for places where you are performing many small read or write operations on the same file. Can you group those operations together and perform them all at once? For the same amount of data, one large read or write operation is usually more efficient than many small operations.
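To make the batching idea concrete, here is a minimal C sketch built on the BSD write call. The WriteBatch type and the batch_init, batch_append, and batch_flush functions are hypothetical names for this illustration, and the 128 KB capacity is an arbitrary assumption. The standard C FILE stream already buffers writes in a similar way, so in real code you might simply use that instead.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical batch size; measure to pick a value for your workload. */
#define BATCH_CAPACITY (128 * 1024)

/* Collects small writes in memory so they reach the file as a few large
   write() calls instead of many small ones. */
typedef struct {
    int fd;
    size_t used;
    char buf[BATCH_CAPACITY];
} WriteBatch;

/* Writes every byte, retrying after short writes and EINTR. */
static int write_all(int fd, const char *p, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

void batch_init(WriteBatch *b, int fd) {
    b->fd = fd;
    b->used = 0;
}

/* Sends everything buffered so far in one write_all() call.
   The buffer is reset even if the write fails. */
int batch_flush(WriteBatch *b) {
    if (b->used == 0) return 0;
    int rc = write_all(b->fd, b->buf, b->used);
    b->used = 0;
    return rc;
}

/* Queues len bytes; flushes first if they do not fit in the free space. */
int batch_append(WriteBatch *b, const void *data, size_t len) {
    if (len > BATCH_CAPACITY - b->used) {
        if (batch_flush(b) != 0) return -1;
        if (len > BATCH_CAPACITY)        /* larger than the whole buffer */
            return write_all(b->fd, data, len);
    }
    memcpy(b->buf + b->used, data, len);
    b->used += len;
    return 0;
}
```

Call batch_flush once more before closing the descriptor so that the last partial buffer reaches the file.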

Use Modern File-System Interfaces

When deciding which routines to call, choose ones that let you specify paths using NSURL objects over those that specify paths using strings. Most of the URL-based routines were introduced in OS X v10.6 or later and were designed from the beginning to take advantage of technologies like Grand Central Dispatch. This gives your code an immediate advantage on multicore computers without requiring much work on your part.

You should also prefer routines that accept block objects over those that accept callback functions or methods. Blocks are a convenient and more efficient way to implement callback-type behaviors. In practice, blocks often require much less code to implement because they do not require you to define and manage a context data structure for passing data. Some routines might also execute your block by scheduling it in a GCD queue, which can also improve performance.

General Tips

What follows are some basic recommendations for reducing the I/O activity of your program. These may help improve your file-system-related performance, but as with all tips, be sure to measure before and after so that you can verify any performance gains.

Minimize the number of file operations you perform. Moving data from a local file system into memory takes a significant amount of time. File-system access times are generally measured in milliseconds, which corresponds to several million clock cycles spent waiting for data to be fetched from disk. And if the target file system is located on a server halfway around the world, network latency adds further delay to retrieving the data.
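As a rough illustration, assume a single 5 ms disk access and a 3 GHz processor (both figures are assumptions chosen for round numbers, not measurements):

```latex
t_{\text{access}} \times f_{\text{clock}}
  = (5 \times 10^{-3}\,\text{s}) \times (3 \times 10^{9}\,\text{cycles/s})
  = 1.5 \times 10^{7}\ \text{cycles}
```

That is roughly 15 million cycles spent waiting on one access, before any network latency is added.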

Reuse path objects. If you take the time to create an NSURL for a file, reuse that object as much as you can rather than create it each time you need it. Locating files and building URLs or pathname information takes time and can be expensive. Reusing the objects created from those operations saves time and minimizes your app’s interactions with the file system.

Choose an appropriate read buffer size. When reading data from the disk into a local buffer, the buffer size you choose can have a dramatic effect on the speed of the operation. If you are working with relatively large files, it does not make sense to allocate a 1 KB buffer and read and process the data in small chunks. Instead, create a larger buffer (for example, 128 KB to 256 KB) and read much or all of the data into memory before processing it. The same rule applies to writing data to the disk: write data as sequentially as you can, ideally using a single file-system call.

Read data sequentially instead of jumping around in a file. The kernel transparently clusters I/O operations, which makes sequential reads much faster.
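The following C sketch reads a file descriptor sequentially from its current offset to the end through a single 128 KB heap buffer, so each read call moves a large block of data. The count_newlines function and the READ_CHUNK constant are illustrative assumptions; substitute your own processing for the newline count, and choose the buffer size by measuring.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* 128 KB: within the 128 KB to 256 KB range suggested above. */
#define READ_CHUNK (128 * 1024)

/* Reads fd sequentially to end of file in large chunks and returns
   the number of '\n' bytes seen, or -1 on error. */
long long count_newlines(int fd) {
    char *buf = malloc(READ_CHUNK);
    if (buf == NULL) return -1;
    long long lines = 0;
    for (;;) {
        ssize_t n = read(fd, buf, READ_CHUNK);
        if (n == 0) break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            free(buf);
            return -1;
        }
        for (ssize_t i = 0; i < n; i++)     /* process the whole chunk */
            if (buf[i] == '\n') lines++;
    }
    free(buf);
    return lines;
}
```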

Avoid skipping ahead in an empty file before writing data. The system might have to write zeroes into the intervening space to fill the gap. You should always have a good reason for including “holes” in your files at write time and should know that doing so might incur a performance penalty. For more information, see Zero-Fill Delays Provide Security at a Cost.

Defer I/O operations until your app actually needs the data. The golden rule of laziness applies to disk performance just as it does to many other kinds of performance.
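One way to defer I/O in C is to record where a resource lives and read it only on first access. This is a minimal sketch: LazyFile and its functions are hypothetical names, the code is not thread-safe, and once the data is needed it loads the whole file with standard C stdio calls.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical lazily loaded resource. Not thread-safe. */
typedef struct {
    const char *path;  /* must stay valid while the LazyFile is in use */
    char *data;        /* NULL until the first successful load */
    size_t length;
} LazyFile;

/* Records the path only; performs no I/O. */
void lazy_file_init(LazyFile *lf, const char *path) {
    lf->path = path;
    lf->data = NULL;
    lf->length = 0;
}

/* Returns the NUL-terminated contents, reading the file on first use only.
   Returns NULL if the file cannot be read. */
const char *lazy_file_data(LazyFile *lf, size_t *length) {
    if (lf->data == NULL) {
        FILE *f = fopen(lf->path, "rb");
        if (f == NULL) return NULL;
        long size = -1;
        if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
        if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
            fclose(f);
            return NULL;
        }
        char *buf = malloc((size_t)size + 1);
        if (buf == NULL) {
            fclose(f);
            return NULL;
        }
        size_t got = fread(buf, 1, (size_t)size, f);  /* one large read */
        fclose(f);
        if (got != (size_t)size) {
            free(buf);
            return NULL;
        }
        buf[size] = '\0';
        lf->data = buf;
        lf->length = (size_t)size;
    }
    if (length != NULL) *length = lf->length;
    return lf->data;
}

void lazy_file_free(LazyFile *lf) {
    free(lf->data);
    lf->data = NULL;
    lf->length = 0;
}
```

Because lazy_file_init only stores the path, a resource that is never requested is never read from disk.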

Do not abuse the preferences system. Use the preferences system to capture only user preferences (such as window positions, view settings, and user-provided preferences) and not data that can be inexpensively recomputed. Recomputing simple values is significantly faster than reading the same value from disk.

Do not assume that caching files in memory will speed up your app. Caching files in memory increases memory usage, which can decrease performance in other ways. Plus, the system may cache some file data for you automatically, so creating your own caches might make things even worse; see The System Has Its Own File Caching Mechanism.

The System Has Its Own File Caching Mechanism

Disk caching can be a good way to accelerate access to file data, but it is not appropriate in every situation. Caching increases the memory footprint of your app and, if used inappropriately, can be more expensive than simply reloading data from the disk.

Caching is most appropriate for files you plan to access multiple times. If you have files that you intend to use only once, either disable the caches or map the file into memory.

Disabling File-System Caching

When reading data that you are certain you won’t need again soon, such as when streaming a large multimedia file, tell the file system not to add that data to the file-system caches. By default, the system maintains a buffer cache with the data most recently read from disk. This disk cache is most effective when it contains frequently used data. If you leave file caching enabled while streaming a large multimedia file, you can quickly fill up the disk cache with data you won’t use again. Worse, this process is likely to push other data out of the cache that would have benefited from staying there.

Apps can call the BSD fcntl function with the F_NOCACHE command to disable or re-enable caching for a file: a nonzero argument turns caching off, and zero turns it back on. For more information about this function, see the fcntl man page.

Note: When reading uncached data, use 4K-aligned buffers. This gives the system more flexibility in how it loads the data into memory and can result in faster load times.
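The sketch below combines the two recommendations above: it opens a file, asks the system not to cache its data with fcntl and F_NOCACHE, and streams the contents through a buffer aligned to 4 KB with posix_memalign. The function names and the 256 KB chunk size are assumptions for illustration. F_NOCACHE is an Apple-specific fcntl command, so the call is guarded by #ifdef and skipped on systems that do not define it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGNMENT 4096          /* 4 KB-aligned buffer, per the note above */
#define CHUNK (256 * 1024)      /* a multiple of ALIGNMENT */

/* Opens path read-only and asks the file system not to cache its data. */
int open_uncached(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
#ifdef F_NOCACHE
    if (fcntl(fd, F_NOCACHE, 1) == -1) {   /* nonzero arg turns caching off */
        close(fd);
        return -1;
    }
#endif
    return fd;
}

/* Streams the whole file through an aligned buffer.
   Returns the total number of bytes read, or -1 on error. */
long long stream_uncached(const char *path) {
    int fd = open_uncached(path);
    if (fd < 0) return -1;
    void *buf = NULL;
    if (posix_memalign(&buf, ALIGNMENT, CHUNK) != 0) {
        close(fd);
        return -1;
    }
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, CHUNK);
        if (n == 0) break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR) continue;
            total = -1;
            break;
        }
        total += n;                         /* process the chunk here */
    }
    free(buf);
    close(fd);
    return total;
}
```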

Using Mapped I/O Instead of Caching

If you intend to read data randomly from a file, you can improve performance in some situations by mapping that file directly into your app’s virtual memory space. File mapping is a programming convenience for files you want to access with read-only permissions. It lets the kernel take advantage of the virtual memory paging mechanism to read the file data only when it is needed. You can also use file mapping to overwrite existing bytes in a file; however, you cannot extend the size of the file using this technique. Mapped files bypass the system disk caches, so only one copy of the file is stored in memory.

Important: If you map a file into memory and the file becomes inaccessible (for example, because the disk containing it was ejected or the network volume containing it was unmounted), your app will crash with a SIGBUS error.

For more information about mapping files into memory, see File System Advanced Programming Topics.
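A minimal read-only mapping in C uses the BSD mmap call with PROT_READ. The map_file_readonly and unmap_file helpers are hypothetical, and the choice of MAP_PRIVATE is an assumption that suits read-only access. The SIGBUS caveat above still applies to every access through the returned pointer.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the whole file at path read-only. Returns 0 and fills *out and
   *out_len on success, or -1 on failure (including empty files, which
   mmap cannot map). */
int map_file_readonly(const char *path, const unsigned char **out, size_t *out_len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;
    *out = p;
    *out_len = len;
    return 0;
}

/* Releases a mapping created by map_file_readonly. */
int unmap_file(const unsigned char *p, size_t len) {
    return munmap((void *)p, len);
}
```

Reading bytes[offset] touches only the pages that hold that offset; the kernel pages in the rest of the file on demand.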

Zero-Fill Delays Provide Security at a Cost

For security reasons, file systems are supposed to zero out areas on disk when they are allocated to a file. This behavior prevents data left over from a previously deleted file from being included with the new file. The HFS Plus file system used by OS X has always implemented this zero-fill behavior.

For both reading and writing operations, the system delays the writing of zeroes until the last possible moment. When you close a file after writing to it, the system writes zeroes to any portions of the file your code did not touch. When reading from a file, the system writes zeroes to new areas only when your code attempts to read from that area or when it closes the file. This delayed-write behavior avoids redundant I/O operations to the same area of a file.

If you notice a delay when closing your files, it is likely because of this zero-fill behavior. Make sure you do the following when working with files:

Write data to files sequentially. Gaps in writing must be filled with zeroes when the file is saved.

Do not move the file pointer past the end of the file and then close the file.

Truncate files to match the length of the data you wrote. For scratch files you plan to delete, truncate the file to zero-length.
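The last two items can be sketched in C with the BSD ftruncate call. The rewrite_and_truncate helper writes its data sequentially from the start of the file and then truncates the file to exactly that length, so no stale tail remains past the new end; discard_scratch truncates a scratch file to zero length before deleting it. Both helpers are hypothetical names for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Rewrites an open file from offset 0 with sequential writes, then
   truncates it to exactly len bytes. Returns 0 on success, -1 on error. */
int rewrite_and_truncate(int fd, const char *data, size_t len) {
    if (lseek(fd, 0, SEEK_SET) == -1) return -1;
    size_t done = 0;
    while (done < len) {                      /* no seeks, no gaps */
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        done += (size_t)n;
    }
    return ftruncate(fd, (off_t)len);         /* drop anything past len */
}

/* For a scratch file about to be deleted: truncate to zero length first,
   then close and unlink it. Returns 0 only if every step succeeds. */
int discard_scratch(int fd, const char *path) {
    int rc = ftruncate(fd, 0);
    if (close(fd) != 0) rc = -1;
    if (unlink(path) != 0) rc = -1;
    return rc;
}
```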