Introduction

This source code (as part of the demo project) creates and uses a BTree database. It implements the following features:

create database

insert/update record

delete record

search for keys (exact match or sort-of wildcard)

traversal

sequential forward and reverse record access

user-specified fixed size records and fixed size keys

custom comparison and traversal callbacks

Background

A while ago, I started developing a system that would allow people to chart (Australian) stock market movements nicely and simply, and in a user-friendly fashion. Obviously, a large part of the client side of the system would be the storage of stock movements on the local machine. People need to be able to look at their charts without having to download large amounts of data every time they open a window. Programmatically, it makes sense if these movements can be retrieved in date order, since plotting stock movements on a chart window requires the data to be presented in date order. I needed to be able to retrieve them rapidly, too. Bear in mind that with the data package available with the download, some securities date back to 1980.

After a lot of mucking around, I eventually decided to write database functionality that I could use (not GPL'd, not hideously expensive). See the Points of Interest section for a full discussion of this process.

Contents and structure of BTreeDB

Before getting into the tree itself, let's have a look at what makes a tree.

Smart pointers

If you don't know about smart pointers, now would be a really good time to check them out. Look here for the article I used as a base for my smart pointers. What they do is let you copy pointers around, manage their own reference counting, and ensure that objects are deleted when the last reference to the object is released or goes out of scope.
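The article's own smart pointer class is not reproduced here. As a stand-in, `std::shared_ptr` from the C++ standard library has the properties just described: it can be copied freely, it keeps a reference count, and it deletes the object when the last reference is released or goes out of scope. The `Tracked` type and `countWhileCopied` function below are purely illustrative.

```cpp
#include <cassert>
#include <memory>

// Illustrative type that counts its live instances, so deletion is observable.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Copies a shared_ptr and reports the reference count while both copies exist.
long countWhileCopied() {
    std::shared_ptr<Tracked> a = std::make_shared<Tracked>();
    std::shared_ptr<Tracked> b = a;  // copying the pointer bumps the count
    return a.use_count();            // 2: a and b share one object
}                                    // both go out of scope: object deleted
```

Calling `reset()` on the last remaining `shared_ptr` has the same effect as letting it go out of scope: the object is destroyed at that point.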

The objects internal to the database use smart pointers a lot because ..... well, it's just so darn convenient.

The DbObj class

A DbObj is a generic chunk of data. To create one, you invoke a constructor, providing a pointer to some data and the data size. There are a few other constructors that allow you to create DbObjs of various types without having to worry about sizes. These include constructors for std::strings, numbers, and other DbObjs. Note that a DbObj makes a copy of your data, so if you change your original data after creating the DbObj, the changes won't be reflected in the DbObj. Note also that the DbObj doesn't store any information about the type of data it holds.
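A minimal sketch of these copy semantics. `SimpleDbObj` is a hypothetical stand-in, not the real DbObj (whose source isn't shown here): it owns a private byte buffer, so later changes to the caller's data don't affect it, and it keeps no type information.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for DbObj: an untyped byte buffer that owns a private
// copy of whatever it was constructed from.
class SimpleDbObj {
public:
    SimpleDbObj(const void* data, std::size_t size)
        : bytes_(static_cast<const unsigned char*>(data),
                 static_cast<const unsigned char*>(data) + size) {}
    // Convenience constructors: the size is derived from the value itself.
    explicit SimpleDbObj(const std::string& s) : SimpleDbObj(s.data(), s.size()) {}
    explicit SimpleDbObj(long n) : SimpleDbObj(&n, sizeof n) {}

    const unsigned char* data() const { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;  // raw bytes only; no type is recorded
};
```

For example, constructing `SimpleDbObj obj(buf, 3)` from a buffer holding `"ANZ"` and then overwriting `buf[0]` leaves `obj` still holding `"ANZ"`.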

The code deals with smart pointers to DbObj objects. You create DbObjs, pass them to the BTreeDB, and forget about them. Or, get one back from the BTreeDB and don't worry about explicitly deallocating it. The reference counting infrastructure takes care of it.

DbObj objects are used to specify data that is to be inserted into the BTreeDB, and also keys that are used to look up records in the BTreeDB.

In all the sample code below, I'm working with a record structure that is made up of the following:

The TreeNode class

A BTree contains a multi-level hierarchical collection of tree nodes (i.e., a tree). These nodes contain a series of records, and references to child nodes. The records in the node are stored as DbObjs as mentioned above, while the children are stored as a list of smart pointers to TreeNode objects.

These are the chunks that get read from and written to the disk. They aren't loaded until they are actually required.
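The idea of a node holding records plus smart pointers to children, with children loaded only on demand, can be sketched like this. `NodeSketch`, `NodeReader` and `getChild` are hypothetical names, not the article's TreeNode API, and the disk read is simulated by a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical node layout: records plus smart pointers to children. A child
// starts as an unloaded stub that knows only its file offset.
struct NodeSketch {
    std::int64_t fileOffset = 0;
    bool loaded = false;
    std::vector<std::vector<unsigned char>> records;    // fixed-size records
    std::vector<std::shared_ptr<NodeSketch>> children;  // empty for a leaf
};

// Stand-in for the disk read; a real one would seek to fileOffset and parse.
using NodeReader = std::function<void(NodeSketch&)>;

// Returns child i of parent, loading it on first access only.
std::shared_ptr<NodeSketch> getChild(NodeSketch& parent, std::size_t i,
                                     const NodeReader& read) {
    std::shared_ptr<NodeSketch> child = parent.children.at(i);
    if (!child->loaded) {
        read(*child);
        child->loaded = true;
    }
    return child;
}
```

Asking for the same child twice triggers only one read; the second call returns the already-loaded node.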

Working with the BTreeDB

Creating a BTreeDB

The first parameter (fileName) is a const char* that points at a buffer containing the name of the file. If the file doesn't exist, it is created when the database is opened; if it does exist, it is expected to be a database file containing valid data. Note that the constructor doesn't open the file. It just initializes the data members. The second parameter is the size of the record (in bytes) to be stored in the database. The third parameter specifies how many of these bytes, counted from the start of the record, are to be interpreted as the key. The fourth parameter is the tree's minimum degree, usually referred to in the literature as t. A tree node can hold at most 2t - 1 keys, and every node other than the root must hold at least t - 1.
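The occupancy limits that follow from the minimum degree t, as defined by Cormen, Leiserson, and Rivest, can be written down directly. `NodeLimits` and `limitsFor` are illustrative helpers; how BTreeDB enforces these bounds internally is not shown here.

```cpp
#include <cassert>

// Node occupancy limits for a B-tree of minimum degree t >= 2.
// The root is exempt from the minimum.
struct NodeLimits {
    int minKeys;      // t - 1 for every node except the root
    int maxKeys;      // 2t - 1; a full node is split on the way down during insertion
    int maxChildren;  // 2t for an internal node
};

constexpr NodeLimits limitsFor(int t) {
    return NodeLimits{t - 1, 2 * t - 1, 2 * t};
}

// With t = 2 every internal node has 2, 3 or 4 children: a 2-3-4 tree.
static_assert(limitsFor(2).maxChildren == 4, "t = 2 gives a 2-3-4 tree");
```

For example, `limitsFor(3)` gives nodes holding between 2 and 5 keys, with up to 6 children.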

The constructor also takes a fifth parameter: you can provide your own comparison function if your records require something other than a binary comparison when it comes to sorting. The default comparison is a simple memcmp using the first keyLength bytes of the two data objects being compared.
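A sketch of why a custom comparison can matter. The `CompareFn` signature below is an assumption (the article doesn't show the real callback type); `binaryCompare` mirrors the described memcmp default, and `int32KeyCompare` is a hypothetical custom comparator for keys that are native-endian 32-bit integers, such as a date stored as YYYYMMDD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed callback shape: returns <0, 0 or >0, comparing the first
// keyLength bytes of two records.
using CompareFn = int (*)(const void* a, const void* b, std::size_t keyLength);

// The default described above: a plain binary comparison of the key bytes.
int binaryCompare(const void* a, const void* b, std::size_t keyLength) {
    return std::memcmp(a, b, keyLength);
}

// Custom example: the key is a native-endian int32_t. memcmp would order such
// keys by byte layout, which on little-endian machines is not numeric order,
// so compare the values instead.
int int32KeyCompare(const void* a, const void* b, std::size_t /*keyLength*/) {
    std::int32_t x, y;
    std::memcpy(&x, a, sizeof x);  // memcpy avoids unaligned reads
    std::memcpy(&y, b, sizeof y);
    return (x > y) - (x < y);
}
```

For text keys such as security codes the binary default is already correct; the value-based comparator only matters when the key bytes encode a number.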

Opening a BTreeDB

db.open()

If the file given in the constructor doesn't exist, it is created and initialized to handle the record and key sizes provided in the constructor. If the file already exists, the values provided are compared with what is already in the file, and the database doesn't open if they disagree.

This method will return false if it fails to open the database for any reason.

Reasons for failure include:

database file locked by another process.

the size and minDegree parameters were not provided in the constructor and the file doesn't yet exist.

inability to create the file if creation is required.

inability to read header information from the file if the file already exists.

Yes, I could have made the open function return an error code, but I didn't. Maybe later.
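The header check that open() performs can be sketched as follows. The layout here (a `DbHeader` struct written raw at offset 0) is an assumption, not BTreeDB's actual file format, and writing a raw struct ties the file to one compiler's padding and byte order. `openWithHeader` is hypothetical and works on any `std::iostream`, so a `std::stringstream` can stand in for the file.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <sstream>

// Hypothetical header: the creation parameters, stored so a later open can check them.
struct DbHeader {
    std::uint32_t recordSize;
    std::uint32_t keyLength;
    std::uint32_t minDegree;
};

// Follows the open() rules described above: an empty stream is initialized
// with a new header; an existing header must match the requested parameters.
bool openWithHeader(std::iostream& file, const DbHeader& wanted) {
    file.clear();
    file.seekg(0, std::ios::end);
    if (static_cast<std::streamoff>(file.tellg()) <= 0) {  // new file: write header
        file.clear();
        file.seekp(0);
        file.write(reinterpret_cast<const char*>(&wanted), sizeof wanted);
        return static_cast<bool>(file);
    }
    DbHeader stored;
    file.seekg(0);
    if (!file.read(reinterpret_cast<char*>(&stored), sizeof stored))
        return false;                                      // header unreadable
    return stored.recordSize == wanted.recordSize &&
           stored.keyLength == wanted.keyLength &&
           stored.minDegree == wanted.minDegree;
}
```

Opening the same stream a second time with a different key length fails, which matches the behavior described for disagreeing parameters.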

Inserting data into a BTreeDB

This is just a case of creating DbObjs that you want to insert, and then inserting them.

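for (int i = 0; i < iterLimit; i++)
{
    SampleRecord sr;
    makeRecord(&sr); // This just fills in the sample record
    // DbObj copies sizeof(SampleRecord) bytes starting at &sr
    DbObjPtr pObj = new DbObj(&sr, sizeof(SampleRecord));
    db.put(pObj); // returns false if the record is the wrong size
}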

You'll have to make sure that you provide a record that is the correct size. The put method will return false if this is not the case.

This method is also used for updating existing records. Note that the BTreeDB doesn't allow you to insert multiple records that have the same key. If you "put" any record with the same key as an existing record, the existing one will be overwritten.
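The put() contract (fixed record size, key taken from the first keyLength bytes, same key means overwrite) can be modeled in memory. `PutModel` is a hypothetical sketch that uses `std::map` in place of the B-tree; it is not the BTreeDB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// In-memory model of put(): records have a fixed size, the key is the first
// keyLength bytes, and a record whose key already exists replaces the old one.
class PutModel {
public:
    PutModel(std::size_t recordSize, std::size_t keyLength)
        : recordSize_(recordSize), keyLength_(keyLength) {}

    bool put(const std::string& record) {
        if (record.size() != recordSize_) return false;  // wrong size: rejected
        records_[record.substr(0, keyLength_)] = record;  // insert or overwrite
        return true;
    }
    std::size_t count() const { return records_.size(); }
    const std::string* get(const std::string& key) const {
        auto it = records_.find(key);
        return it == records_.end() ? nullptr : &it->second;
    }

private:
    std::size_t recordSize_, keyLength_;
    std::map<std::string, std::string> records_;  // key -> whole record
};
```

With 8-byte records and a 3-byte key, putting "ANZ00100" and then "ANZ00200" leaves a single record, the second one.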

Removing data from a BTreeDB

Other operations on a BTreeDB

There are examples in the code about how to traverse the BTreeDB, how to do "like" searches (where the search key is smaller than the actual record key size), moving forwards and backwards through the tree, and a few other things possibly unique to my situation.
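The "like" search can be pictured with any ordered container: seek to the first key that is not less than the short search key, then walk forward while keys still start with it. This sketch uses `std::map` (typically a red-black tree, not a B-tree) as a stand-in for the BTreeDB's ordered keys; `likeSearch` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns every key that starts with prefix, in sorted order.
std::vector<std::string> likeSearch(const std::map<std::string, int>& index,
                                    const std::string& prefix) {
    std::vector<std::string> hits;
    for (auto it = index.lower_bound(prefix);  // first key >= prefix
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        hits.push_back(it->first);
    }
    return hits;
}
```

Because keys are sorted, the walk can stop at the first key that no longer matches rather than scanning the whole tree.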

BTreeDB wrappers

I use wrappers to provide structure to a given database. (I don't know where I found the Singleton class ... [have to insert link here]).
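The Singleton class used here isn't shown. A common alternative is a function-local static (often called a Meyers singleton). `QuoteDatabase`, its file name and its size constants are hypothetical; the point is that one wrapper fixes a database's parameters in a single place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper for one kind of data. It fixes the file name, record
// size and key length in one place, so callers never pass them around.
class QuoteDatabase {
public:
    static QuoteDatabase& instance() {
        static QuoteDatabase db;  // created on first use; thread-safe since C++11
        return db;
    }
    QuoteDatabase(const QuoteDatabase&) = delete;
    QuoteDatabase& operator=(const QuoteDatabase&) = delete;

    const std::string& fileName() const { return fileName_; }
    static constexpr std::size_t kRecordSize = 32;  // illustrative layout
    static constexpr std::size_t kKeyLength = 12;   // e.g. code + date

private:
    QuoteDatabase() : fileName_("quotes.db") {}
    std::string fileName_;
    // A real wrapper would also own the BTreeDB built from these values.
};
```

Every call to `QuoteDatabase::instance()` returns the same object.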

Caveats

This is not thread-safe code ... I protect access to the database using my own mutex in my application.
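A minimal sketch of guarding a non-thread-safe object with a single mutex. `Locked` is a hypothetical helper, not the author's code; with the real database, `T` would be the BTreeDB instance, constructed from the arguments forwarded by the constructor.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper: owns an object that is not thread-safe and runs every
// access to it under one mutex, so only one thread touches it at a time.
template <typename T>
class Locked {
public:
    template <typename... Args>
    explicit Locked(Args&&... args) : value_(std::forward<Args>(args)...) {}

    // Calls f(object) while holding the lock and returns whatever f returns.
    template <typename F>
    auto with(F&& f) -> decltype(f(std::declval<T&>())) {
        std::lock_guard<std::mutex> guard(mutex_);
        return f(value_);
    }

private:
    std::mutex mutex_;
    T value_;
};
```

Because the lock is held for the whole callback, a put followed by a get inside one `with` call is also atomic with respect to other threads.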

You need to be aware of when you open and close databases, and when they are flushed to disk. There are aspects of this that I haven't fully explored, but I know that I should be releasing memory (TreeNodes) more frequently than I do. If this is not done, you'll end up with the entire database in memory. While it's really good for performance on small databases, it may not be appropriate for larger databases.

At present, it suits what I'm trying to do.
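One way to keep the whole database from ending up in memory is to bound the number of cached nodes. This is a hypothetical sketch, not part of BTreeDB: nodes are keyed by file offset, the least recently used node is evicted once the cache is over capacity, and an evicted node is written back (through a `Writer` callback standing in for the disk write) only if it was modified. A real tree would also have to avoid evicting nodes still in use by the current operation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

struct CachedNode {
    std::vector<unsigned char> bytes;
    bool dirty = false;  // set when the node is modified
};

class NodeCache {
public:
    using Writer = std::function<void(std::int64_t, const CachedNode&)>;
    NodeCache(std::size_t capacity, Writer write)  // capacity must be >= 1
        : capacity_(capacity), write_(std::move(write)) {}

    // Returns the cached node or nullptr; a hit makes it most recently used.
    CachedNode* find(std::int64_t offset) {
        auto it = index_.find(offset);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    // Adds a node just read from disk (call only after find() missed).
    CachedNode& insert(std::int64_t offset, CachedNode node) {
        order_.emplace_front(offset, std::move(node));
        index_[offset] = order_.begin();
        if (order_.size() > capacity_) {
            Entry& victim = order_.back();
            if (victim.second.dirty) write_(victim.first, victim.second);
            index_.erase(victim.first);
            order_.pop_back();
        }
        return order_.front().second;
    }

private:
    using Entry = std::pair<std::int64_t, CachedNode>;
    std::size_t capacity_;
    Writer write_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::int64_t, std::list<Entry>::iterator> index_;
};
```

Clean nodes are simply dropped on eviction, since their on-disk copy is already current.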

Points of interest

My first attempt at implementing a database simply had one file per security, with the data stored sequentially. For ANZ, there was a file named ANZ.fdb; for CBA, there was a file named CBA.fdb, and so on. That was fine, since there are fewer than a couple of thousand securities on the ASX (Australian Stock Exchange). As the project grew, and I started incorporating options, I had to start putting these in their own subdirectories named for the underlying security. That was starting to look really ugly. I imagined that I would run into even more trouble when it came to storing warrants and intraday movements for a stock.

I looked around for alternatives. MySQL has an embedded database engine that is really fast, but it would have cost a ridiculous amount of money for a royalty-free distribution license, or a commitment to release my entire project as open source. Hmm ... not yet. Sleepycat has Berkeley DB, which is a very fast and efficient embedded database engine, so I had a look at that. Again, a lot of money, or open source. They do have an old version (1.85) that is released under the BSD license and is completely free. I played around with that for a bit, but as various sites state, it has some well-known bugs, and I didn't know enough about the project to fix them. Just FYI: if you need an embedded database, and you meet the open source criteria (or personal use, blah blah blah...), go with one of these packages.

So I was stuck. I didn't want a bazillion files floating around in the user's file system, but I wanted the ability to add many records over time, and retrieve them quickly. I needed some sort of database package, couldn't find one, so eventually I decided to do it myself. Fine. So where to start? Back to my days of University, and the second year Data Structures and Algorithms class. The text for this class was Cormen, Leiserson, and Rivest's excellent Introduction to Algorithms (first edition), MIT Press, Cambridge, Mass., 1990. Although we didn't study B-Trees at the time, I remember trying to implement one a bit later in an abortive attempt to break into the entertainment industry. Long story ... don't ask. This time, though, I was determined to make it work.

Oh ... why use a B-Tree? Well, they're kind of nice to use, since they allow storage of large amounts of data on disk, indexed by key. A lookup reads only O(log_t n) nodes, where n is the number of keys in the tree and t is the minimum degree, because the height of the tree is at most log_t((n + 1)/2). There are many places you can find out more about B-Trees. I'm not inclined towards reproducing large slabs of the text here, but you could look here if you are interested. It includes the basic algorithms for creating, searching, and inserting records from Cormen, Leiserson, and Rivest.
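An in-memory sketch of the search step, following B-TREE-SEARCH from Cormen, Leiserson, and Rivest. `BNode` and `btreeSearch` are illustrative names, not BTreeDB's API. In an on-disk tree each loop iteration is one node read, which is where the logarithmic bound comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct BNode {
    std::vector<int> keys;                         // sorted
    std::vector<std::unique_ptr<BNode>> children;  // empty in a leaf
    bool leaf() const { return children.empty(); }
};

// Returns the node holding k and the key's index, or {nullptr, 0}.
std::pair<const BNode*, std::size_t> btreeSearch(const BNode* x, int k) {
    while (x) {
        std::size_t i = 0;
        while (i < x->keys.size() && k > x->keys[i]) ++i;  // first key >= k
        if (i < x->keys.size() && k == x->keys[i]) return {x, i};
        if (x->leaf()) break;
        x = x->children[i].get();  // descend into the only subtree that can hold k
    }
    return {nullptr, 0};
}
```

A real implementation would usually binary-search within a node, but with small nodes the linear scan is what the textbook pseudo-code shows.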

Back to the task at hand, I knew that deletion would be a problem, since no-one publishes pseudo-code for the deletion of keys from a B-Tree. It appears to be a bit more complex than the other operations, and is generally "left as an exercise for the reader". Grr. So I researched other textbooks, and I searched the net. Eventually, I found a paper entitled "Implementing Deletion in B+-Trees" by Jan Jannink of Stanford University. To quote from the abstract: "There are published algorithms and pseudo-code for searching and inserting keys, but deletion, due to its greater complexity and perceived lesser importance, is glossed over completely or left as an exercise to the reader. To remedy this situation, we provide a well documented flowchart, algorithm, and pseudo-code for deletion, their relation to search and insertion algorithms, and a reference to a freely available, complete B+-tree library written in the C programming language." Yay! So I downloaded the code referred to in the paper (you can get them starting from here). Except that the C code didn't compile in VS7, and the C++ code broke after a couple of insertions. Bugger.

So, I built the creation, insertion, and search code, etc., using the text's pseudo-code. Unable to find any deletion pseudo-code elsewhere, I decided that I would just have to bite the proverbial bullet and write the code based on the narrative provided. After a long time, everything worked. I decided that no-one should have to go through that again, so I'm making the code available for public distribution. This may frustrate many university lecturers since there now exists code that actually does this wretched delete, but somehow ... I struggle to care. Victory is mine!

Addendum: Not everything worked. Close, but no cylinder smoking thing. Deletion still has a problem. There are three different deletion orders in the test code: ordinary alphabetic, reverse alphabetic, and something else. The "something else" will sometimes result in not all items being deleted. I don't have the time to look at this now. Maybe, when I have a product that is keeping me and my family in the manner to which we want to become accustomed ...
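For reference, the textbook deletion cases (1, 2a to 2c, 3a and 3b in Cormen, Leiserson, and Rivest) fit in a short in-memory sketch. This is an independent illustration, not the BTreeDB code: it has no disk I/O, uses int keys, and the names `Node`, `splitChild`, `mergeChildren`, `eraseKey`, `contains`, `insertKey` and `removeKey` are hypothetical. The key idea is that deletion, like insertion, works in a single pass down the tree: before descending into a child, the child is topped up to at least t keys, so removing a key never leaves a node too small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>
// Sketch (not the BTreeDB code): CLRS-style B-tree insertion and deletion.
constexpr std::size_t T = 2;  // minimum degree: non-root nodes hold T-1..2T-1 keys

struct Node {
    std::vector<int> keys;                    // sorted, unique
    std::vector<std::unique_ptr<Node>> kids;  // empty for a leaf
    bool leaf() const { return kids.empty(); }
    std::size_t find(int k) const { return std::lower_bound(keys.begin(), keys.end(), k) - keys.begin(); }
};
// Splits the full child x->kids[i]; its median key moves up into x.
void splitChild(Node* x, std::size_t i) {
    Node* y = x->kids[i].get();
    std::unique_ptr<Node> z(new Node);
    z->keys.assign(y->keys.begin() + T, y->keys.end());
    for (std::size_t j = T; j < y->kids.size(); ++j) z->kids.push_back(std::move(y->kids[j]));
    if (!y->leaf()) y->kids.resize(T);
    x->keys.insert(x->keys.begin() + i, y->keys[T - 1]);
    y->keys.resize(T - 1);
    x->kids.insert(x->kids.begin() + i + 1, std::move(z));
}
// Folds x->keys[i] and x->kids[i + 1] into x->kids[i].
void mergeChildren(Node* x, std::size_t i) {
    Node* l = x->kids[i].get();
    Node* r = x->kids[i + 1].get();
    l->keys.push_back(x->keys[i]);
    l->keys.insert(l->keys.end(), r->keys.begin(), r->keys.end());
    for (auto& c : r->kids) l->kids.push_back(std::move(c));
    x->keys.erase(x->keys.begin() + i);
    x->kids.erase(x->kids.begin() + i + 1);
}
// Deletes k below x; every child is topped up to >= T keys before descending.
void eraseKey(Node* x, int k) {
    std::size_t i = x->find(k);
    bool here = i < x->keys.size() && x->keys[i] == k;
    if (x->leaf()) { if (here) x->keys.erase(x->keys.begin() + i); return; }  // case 1
    Node* c = x->kids[i].get();
    Node* r = i + 1 < x->kids.size() ? x->kids[i + 1].get() : nullptr;
    if (here) {                                            // case 2: k in internal node
        if (c->keys.size() >= T || r->keys.size() >= T) {  // 2a/2b: pred or succ
            bool pred = c->keys.size() >= T;
            Node* n = pred ? c : r;
            while (!n->leaf()) n = pred ? n->kids.back().get() : n->kids.front().get();
            x->keys[i] = pred ? n->keys.back() : n->keys.front();
            eraseKey(pred ? c : r, x->keys[i]);
        } else { mergeChildren(x, i); eraseKey(c, k); }    // 2c: merge, then recurse
        return;
    }
    if (c->keys.size() == T - 1) {                         // case 3: top up kids[i]
        Node* l = i > 0 ? x->kids[i - 1].get() : nullptr;
        if (l && l->keys.size() >= T) {                    // 3a: rotate from left sibling
            c->keys.insert(c->keys.begin(), x->keys[i - 1]);
            x->keys[i - 1] = l->keys.back();
            l->keys.pop_back();
            if (!l->leaf()) { c->kids.insert(c->kids.begin(), std::move(l->kids.back())); l->kids.pop_back(); }
        } else if (r && r->keys.size() >= T) {             // 3a: rotate from right sibling
            c->keys.push_back(x->keys[i]);
            x->keys[i] = r->keys.front();
            r->keys.erase(r->keys.begin());
            if (!r->leaf()) { c->kids.push_back(std::move(r->kids.front())); r->kids.erase(r->kids.begin()); }
        } else if (r) { mergeChildren(x, i); }             // 3b: merge with right sibling
        else { mergeChildren(x, i - 1); c = l; }           // 3b: merge into left sibling
    }
    eraseKey(c, k);
}
bool contains(const Node* x, int k) {
    for (; x; x = x->leaf() ? nullptr : x->kids[x->find(k)].get()) {
        std::size_t i = x->find(k);
        if (i < x->keys.size() && x->keys[i] == k) return true;
    }
    return false;
}
void insertKey(std::unique_ptr<Node>& root, int k) {
    if (contains(root.get(), k)) return;               // keys stay unique
    if (root->keys.size() == 2 * T - 1) {              // full root: tree grows upward
        std::unique_ptr<Node> s(new Node);
        s->kids.push_back(std::move(root));
        root = std::move(s);
        splitChild(root.get(), 0);
    }
    for (Node* x = root.get();;) {                     // descend, splitting full children
        std::size_t i = x->find(k);
        if (x->leaf()) { x->keys.insert(x->keys.begin() + i, k); return; }
        if (x->kids[i]->keys.size() == 2 * T - 1) { splitChild(x, i); if (k > x->keys[i]) ++i; }
        x = x->kids[i].get();
    }
}
void removeKey(std::unique_ptr<Node>& root, int k) {
    eraseKey(root.get(), k);
    if (root->keys.empty() && !root->leaf()) {         // root emptied by a merge: shrink
        std::unique_ptr<Node> only = std::move(root->kids[0]);
        root = std::move(only);
    }
}
```

Inserting the keys 0 to 99 in one scrambled order (for example `(i * 37) % 100`) and deleting them in another (`(i * 61) % 100`), checking `contains` after each step, is a cheap way to exercise the rotation and merge paths that strictly ascending or descending orders may not reach.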

History

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

First, thank you for sharing this code. When I try to use it to build a small database, it does not work. I insert some records into the database and check whether they have been written to the file, and that proves to be all right. However, when I open the program again and check the records I inserted last time, it does not work.
Can anybody tell me what is going on?

Hi _oti,
I have to do a project for school, and I want to use your B-tree source code. My project is "to find a file on the HDD using a 2-3-4 tree" (of course, a particular case of a B-tree).
I want to use your code, but it is too "bushy" and too hard to understand. Please, I need some help from you. I made a general program that builds a B-tree and adds items to it, but I can't write the search function. Can you help me, at least with some advice? Thanks.
You can contact me privately at this address: aurel_m2002@yahoo.com.

Thanks for your nice work.
I have a question about node splitting:
child->write(_dataFile);
newChild->write(_dataFile);
parent->write(_dataFile);
When a node is written to the file, how does the program make sure the write does not overwrite an existing node?

Good code and explanation! I learned quite a lot from your code together with "Introduction to Algorithms".
But
After looking into the source code of the project, I found that after a delete
the data file doesn't shrink at all. So the size of the file would depend on how many times we add a node to the BTree, is that right?
My question is: should we handle this situation?

No, I don't resize the file after writing it. The deleted node could lie anywhere within the file, so reducing the size of the file would mean doing something like shuffling the last record into the location of the deleted node, updating pointers and then resizing the file. If garbage collection is an issue, I think it would be better served by rewriting the entire file every now and then.

The example given initialises and populates a tree then traverses it. I looked at the source code and didn't find a function that would delete a node from the tree. The deletion code is the main reason this article was written.

Quite probably a bug. I've made updates to both methods, as per your suggestion, and updated both the article and the source code. There are other updates as well, some of which were suggested by other folks. I guess the updates will appear in the next day or so.

Having said that, there are still problems in the deletion code, but it doesn't crash nearly as often now. At least, I didn't get it to crash with the parameters I'm using. YMMV.

Anyway it's a fine piece of code.
I've made some adjustments to the code, using a limited cache for nodes so that memory doesn't grow without limit when we traverse the whole tree, and I really need that information with the parent when I unload the least recently used (LRU) node from the cache. My changes also include a dirty flag for nodes, so they are saved only if they have been modified and they reach the LRU position in the cache (I saw that you thought about it but never implemented it). When I finish, maybe I will post my changes if anybody needs them.

Yup, looking at the code surrounding it, that's what it should be. If only that was the only problem. There are still problems somewhere else, and it's going to have to wait until I get a few peaceful hours.

If you read an earlier thread, you'll see that I use that now instead of my BTreeDB.

In fact, I've received several messages from people wanting to use my BTreeDB in earnest, and in every case, I've told them that this has become an academic exercise so that there exists (at least) one readily accessible implementation of the B-Tree delete code, and that if they're serious about wanting to store data in this fashion, they should look at SQLite. I usually send them a project outlining how to do what they want using SQLite v3 and CppSQLite3.

I'd really continue to press on with SQLite if you can. If you have a large number of inserts to do, it might be worth dropping the index before you start inserting. Then you create the index again when you are done. I've found this to be pretty quick.

Sorry for the delay in answering you, but the season is filled with activities far from the keyboard.

Yes, you are right about dropping the index, inserting the data and recreating the index in typical situations where inserting is the only requirement.

But for large tables (several million rows) with unique records, the indexes are required to keep duplicate records out of the tables.

What I think I'll try is:

1. Small catalogs will be handled outside SQLite through in-memory tree indexing structures.
2. Large catalogs will remain in SQLite, but with blocks of inserts grouped into transactions on those tables.
3. The very large integration table is the only candidate for your suggested drop-insert-rebuild scheme.

Of course, this could be overcome with better hardware, but for the moment that is not an option.

Without knowing more about your application, I couldn't really say one way or the other. The only thing I can suggest is that you test each one of these schemes on small, medium and large tables and verify your intuition (if you haven't already). I do know that I have been surprised by database speeds...

A couple of thoughts in any case:

(1) On some of my tables, there are quite a few indices because people need really fast results from searches based on multiple fields. These can make insertion really slow. If I have any insertions that are not just a single record, they are in bulk, so dropping the indices is the best way for me. Fairly obviously, if your unique ID is the only index, this isn't going to help.

(2) Is it possible to keep a separate table whose only column is the ID, and index that? That way you could do a check up front to see if you have to insert or update. With the caching provided by both your hardware and the operating system, you might find this to be almost as quick as your in-memory tree structure, and it would work with larger tables.