Choosing
the "Right" Data Storage Method

Today there exists a wide variety of programming tools. In any craft,
choosing the correct tool is an important part of the building process.
Unfortunately, human nature is such that we find a few tools that we are
comfortable using and then habitually use them without considering the other
possibilities.

Generally speaking, if a tool works as advertised, then I don't have any
predisposition to not using it. But like anyone else, I make my choices
based on what I need to accomplish and what I know about using the tool.

A database is simply a tool for storing information. They vary widely
in capability and functionality, but all commercial database applications also
carry a large amount of overhead both in distributed size and memory usage.
I have used these databases frequently as a back-end to my applications and will
continue to do so when they are the right tool.

Common Ways to Store Data

There may be other methods that I'm not aware of, but when it comes right
down to it, all files are stored by writing bytes to the storage media. The real difference is
the interface provided by the software.

Visual Basic provides capability to access all these file
types. Each has its own advantages and disadvantages.

Points to consider before selecting a file access method

Does the program need to be SQL compatible (queries)?

If yes then a commercial database is the way to go unless you want to write an SQL
parser as well as the code to do something with the results.

Does the program need to link tables? If it doesn't use SQL, then
probably not.

If yes then linking tables in your own programs requires indexing and will
certainly be very complex. If you need this capability then the
commercial database is probably the way to go.

Does the application already use a back-end database?

If you already have to distribute the database then you can create a new
table and store your data in it. There are times you may not want to do
this, however. For example, you may want to create your own security
file that contains user names, passwords and privileges. By creating a
separate file, you can get this information before opening the database.

Does the file need to be imported by other applications that you didn't
write (Excel, Access, etc.)?

This can be accomplished with either a sequential access file or a commercial
database.

Does the file need to be written to in a multi-user environment?

Requires a commercial database or carefully written code to ensure users
don't overwrite each other's data.

A commercial database back-end is usually the best choice for intensive
data-entry applications or multi-user environments. It provides querying
capability, security, multiple-related tables and many other advanced
capabilities.

Many applications store information but don't need the advanced capabilities
of a commercial database. A file written using the intrinsic VB methods is
much smaller than any commercial database as well as being easier to distribute.

Additionally, there are often installation problems when using a commercial database as a
back-end, not to mention that the size of the distributed database
components (DLLs) and libraries can be many times larger than the database
itself.

Let's look at the intrinsic Visual Basic file writing methods (which are
basically the same in every language I've written software in).

Sequential File Access

Sequential files are
generally the easiest for a new programmer to read and write.
Unfortunately, they leave a lot to be desired when it comes to manipulating the data.
The two preferred ways to read them are:

Comma delimited Line Input # statement to read lines individually.
The individual values can be used to populate a User-Defined Type (UDT) or fields of a Record
object defined by a class in the project.

Read the entire file into the program and then get the individual lines
using a Split statement. From there, each line has to be split up
even further to get the individual fields. This is a lot of work to do
what the other file access methods achieve automatically.

Advantages

Fast — no database engine to initialize.

Size — much smaller than a traditional database. Field length is not
pre-defined.

Compatibility — can be imported/exported by a variety of applications making them the
most flexible way to share data.

File can be viewed and edited using a simple text editor.

Existing records and new records can be placed directly into the existing
file.

The one big advantage a sequential file has over all other types,
including commercial databases, is that many other programs that are capable of
importing data can recognize the fields and import from or export to the
same format. In fact, even if your application uses another file type, having an
export function that saves data into a sequential file is an almost foolproof
way to allow other applications to access the same data.

Disadvantages

Not flexible in data storage. All records must be of the same type.
That is, if you want your program to loop through and read the fields from a
sequential file, then each line needs to have the same number of fields and be
of the type expected.

Difficult to navigate. File must be looped through to retrieve an
individual record.

When deleting records, a new file must be written containing the remaining
records and then renamed to the original file name.

Random File Access

Random access files store UDT records. Each UDT must be a fixed size and the file is limited to storing nothing but one type
of record.

Advantages

Can easily pull individual records from the file. Because each
record is the same size, there is no need to loop through the file to find an
individual record. Just tell VB which record number you want and it goes
and gets it.

Speed. These files read very quickly because of the fixed
record-length. Again, there is no database engine to initialize.

Easy to read and write. Existing records and new records can be
placed directly into the existing file.

Smaller than a traditional database.

Disadvantages

String fields must be a fixed length resulting in wasted space and the
need to trim strings before displaying them. This also means that if you
want to have a record that has a field containing the Street address, for
example, then the field must be defined to hold the longest conceivable entry.
Records that do not use the entire amount of space just waste it. The
field is stored in the record using the full amount defined resulting in a lot
of dead space in the file. If the data is longer than the defined size
then the string is truncated resulting in lost data.

Not flexible. All records must be of the same type.

Can't be imported by other applications unless they recognize the specific
file type. Usually this limits importing to other applications you have
written.

When deleting records, a new file must be written containing the remaining
records and then renamed to the original file name. An option to this is
to read all the records into memory, open the file for output (clear the file)
and then put the remaining records back.

Binary File Access

Binary access files are the most difficult of the intrinsic Visual Basic
file-writing methods. They require the program to know exactly how the file
was written so that it can read the data back from the file. In spite of
this, I prefer Binary files to store data because they are ultimately flexible.

Advantages

Extremely flexible. Data can be written to the file any way the
programmer wants it. Multiple record types, object data, and arrays of numbers or strings
can all be saved in the same file.

Fast — no database engine to initialize.

Size — much smaller than a traditional database and usually smaller than Random
access files. Size is generally comparable to a Sequential access file
containing the same data.

Disadvantages

Can't be imported by other applications unless they recognize the specific
file type. Usually this limits importing to other applications you have
written.

File must be looped through to retrieve an individual record.

When editing or deleting existing records, a new file must be written.
Because the record size can vary, an existing record can not be simply placed
into the file because it may overwrite subsequent records. There are two
ways that I know of to handle this:

If the entire file is in memory, delete the contents of the file and
then write the records in memory to the file on disk.

Make a copy of the file, delete the contents of the current file, copy
records before the changed record from the file copy to the empty file,
write the current record, copy records after the changed record from the
file copy to the current file.

The downside to Binary access is that the program must know how the file was
written before it can be read. My way of handling this is to create a header record
that is stored at the beginning of the file. The header contains specific
information that lets my program read the file without problems.

I personally think the advantages of Binary access far outweigh it's
disadvantages as well as the advantages of other file types for many purposes.
The flexibility and efficient storage are major selling points to me.

In all cases, developing a good set of file manipulation routines takes time.
Once you have them, they generally aren't easy to port from application to
application. Because of this, I wrote VB DB to create binary access files
that are extremely flexible.

I have made this code available
to all VB developers. I personally use it in many of my programs. It
has seen extensive use and has proven to be very solid and reliable. You
can read more about it and download the source code
here.