Bits and Bytes

Physical Programming 102: Random Access Files

If we wanted to read the contents near the end of a file, what are the considerations?

While it is true that computers can access the contents of a file on disk in any order, there are some physical limitations that we need to consider when dealing with random access files. If we are reading a file from beginning to end, things are pretty straightforward. However, if we want to read the contents near the end of a file, care needs to be taken.

Most files are stored on some kind of physical media, either on an optical disc or on magnetic storage such as a hard disk. In this case, the mechanical limitations of the drive factor into every file read/write operation. The storage media rotates in only one direction, and that is the primary physical limitation. Before it can perform the operation, the drive first needs to seek to the correct location. The time this takes is the seek latency, and it is usually significant compared to the time spent actually reading or writing the data.

Therefore, it is in the programmer's interest to avoid unnecessary seeks, particularly when working with a random access file.

Furthermore, the data is stored on the disk in sectors or blocks, and it is read from the disk in those same units. Therefore, whether the algorithm reads a single byte or the whole sector that contains it, the result is the same amount of disk activity. For efficiency, it is better to read the data a sector at a time and process it in blocks.
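To make the sector arithmetic concrete, here is a small sketch that works out which sectors a read touches. The 512-byte sector size and the helper name `sector_span` are assumptions made for illustration; real drives and file systems may use other sizes.

```python
SECTOR = 512  # assumed sector size in bytes


def sector_span(offset, length, sector=SECTOR):
    """Return (first_sector, last_sector, sector_count) for a read of
    `length` bytes starting at byte `offset`."""
    first = offset // sector
    last = (offset + length - 1) // sector
    return first, last, last - first + 1


# One byte and one full, aligned sector both touch exactly one sector.
print(sector_span(1000, 1))    # (1, 1, 1)
print(sector_span(512, 512))   # (1, 1, 1)
# A short read that straddles a sector boundary touches two sectors.
print(sector_span(1000, 100))  # (1, 2, 2)
```

The third call shows why alignment matters: a read of only 100 bytes can still cost two sectors if it crosses a boundary.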

Let’s take a hypothetical example: a large file stored on a disk with 512-byte sectors, in which we need to find a key somewhere in the last 12KB of data.

In this scenario, the best way to do this would be to:

Get the size of the file.

Subtract 12KB from the size of the file.

Seek to this location.

Process the file from this position.
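The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the only way to do it; the function name `read_tail` and the constant `TAIL_BYTES` are made up for this example, and the file is assumed to be small enough at the tail to read in one call.

```python
import os

TAIL_BYTES = 12 * 1024  # the last 12KB from the example


def read_tail(path, tail_bytes=TAIL_BYTES):
    """Return (start_offset, data) for the last `tail_bytes` of the file."""
    size = os.path.getsize(path)        # 1. get the size of the file
    start = max(0, size - tail_bytes)   # 2. subtract 12KB (clamped at 0)
    with open(path, "rb") as f:
        f.seek(start)                   # 3. seek to this location
        return start, f.read()          # 4. process from this position
```

The `max(0, ...)` guard covers files shorter than 12KB, where subtracting would otherwise give a negative offset.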

There are various algorithms that can be used to search for the key within this last 12KB. Regardless of which algorithm is chosen, we need to be aware that the disk will be read in block-sized units from the seek position towards the end of the file. It would not be wise to work in the opposite direction, from the end of the file back towards the seek position, because stepping backwards can force the drive to seek again for each block.
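As one possible illustration, the sketch below scans forward from the seek position one sector at a time using a plain substring search. The names `find_key_in_tail` and `SECTOR` are hypothetical, and two design choices here are assumptions of this sketch: the start offset is rounded down to a sector boundary (so the search window may begin up to one sector before the last 12KB), and a few bytes are carried over between blocks so that a key straddling a boundary is still found. Note also that the operating system and the language runtime usually add their own buffering on top of this.

```python
import os

SECTOR = 512            # assumed sector size in bytes
TAIL_BYTES = 12 * 1024  # the last 12KB from the example


def find_key_in_tail(path, key, tail_bytes=TAIL_BYTES, block=SECTOR):
    """Return the absolute file offset of `key` (bytes) near the end of
    the file, or -1 if it is not found. Reads forward only."""
    if not key:
        raise ValueError("key must not be empty")
    size = os.path.getsize(path)
    start = max(0, size - tail_bytes)
    start -= start % block   # round down to a sector boundary
    overlap = len(key) - 1   # bytes to carry across block boundaries
    carry = b""
    with open(path, "rb") as f:
        f.seek(start)        # the only explicit seek
        pos = start          # file offset of the next block
        while True:
            chunk = f.read(block)
            if not chunk:
                return -1
            window = carry + chunk
            idx = window.find(key)
            if idx != -1:
                return pos - len(carry) + idx
            carry = window[-overlap:] if overlap > 0 else b""
            pos += len(chunk)
```

Because the file position only ever moves forward after the initial seek, the drive is never asked to go back over data it has already passed.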