Object…What?

Until the mid-1970s, most systems were built using functional (procedural) techniques. Object-oriented systems were introduced with a flurry of promises in the early ’80s, many of which actually proved to be true, for once.

More recently, people have been talking about object-based systems, object stores and object-based file systems. In this article, I’d like to clarify the characteristics of each type of technology. Truth in advertising—there’s a lot of overlap, so I’ll try to smooth out the bumps in the ride.

OBJECT-BASED

Object-based is the widest category, as it describes any programming language in which state and operations can be encapsulated within objects. There are very few examples of purely object-based languages; ECMAScript is the one most often cited. It supports modules, generators and iterators (and, since ECMAScript 2015, class syntax), but it has no static typing and has traditionally been classed as object-based rather than fully object-oriented. That’s because its objects inherit from other objects through prototype chains, rather than through the class-based inheritance and subtyping found in object-oriented languages.
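To make "state and operations encapsulated within objects" concrete, here is a minimal sketch. It is written in Python purely for illustration, and the Account class and its methods are hypothetical names, not taken from any product discussed here. Note that nothing in it relies on inheritance or subtyping.

```python
class Account:
    """Encapsulates state (a balance) together with the operations on it."""

    def __init__(self, opening_balance=0):
        self._balance = opening_balance  # state kept inside the object

    def deposit(self, amount):
        # The only way to change the balance is through this operation.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance


account = Account(100)
account.deposit(25)
print(account.balance())  # 125
```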

OBJECT-ORIENTED

Object-oriented languages extend the object-based paradigm to include inheritance and subtyping:

Inheritance extends an object class with extra attributes and/or methods. Any method that works for a class higher up the inheritance tree automatically works with its subclasses. Languages such as Simula, Smalltalk, C++, C#, Java and Python all fall under the umbrella of “object-based,” but they are more accurately categorized as “object-oriented.”

Subtyping is a little more complex but tends to be obvious in practice. Wikipedia says, “If S is a subtype of T, the subtyping relation is often written S <: T, to mean that any term of type S can be safely used in a context where a term of type T is expected. The precise semantics of subtyping crucially depends on the particulars of what ‘safely used in a context where’ means in a given programming language.” A trivial example would be a language construct that allows integers to be used anywhere that a floating point number is expected.
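Both ideas can be sketched in a few lines of Python. The class and function names below (LivingThing, Mammal, report, half) are illustrative assumptions, chosen only to show a subclass adding an attribute and a subtype being used where its supertype is expected.

```python
class LivingThing:
    def __init__(self, domain):
        self.domain = domain

    def describe(self):
        return f"{type(self).__name__} in domain {self.domain}"


class Mammal(LivingThing):          # inheritance: Mammal extends LivingThing
    def __init__(self, domain, species):
        super().__init__(domain)
        self.species = species      # extra attribute added by the subclass


def report(thing: LivingThing) -> str:
    # Written against LivingThing, yet any subtype can be passed safely.
    return thing.describe()


def half(x: float) -> float:
    return x / 2


print(report(Mammal("Eukarya", "Felis catus")))  # Mammal in domain Eukarya
print(half(3))  # 1.5 (an integer used where a float is expected)
```

The describe method is defined once, on LivingThing, and works unchanged for Mammal instances.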

OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEMS (ODBMS)

It is much easier to store objects created using an object-oriented language in an object-oriented database management system (ODBMS), such as Objectivity/DB, than in other kinds of databases, because the ODBMS with its hierarchical structure (see Figure 1) is built to cleanly handle inheritance and, in some cases, subtypes.

Figure 1: ODBMS hierarchical structure

As a simple example, consider a system where a class named Mammal inherits from a Living_Thing class. Storing an object instance of class Living_Thing in an RDBMS is simply a matter of inserting a row into the Living_Thing table. However, storing a Mammal could be managed in various ways:

1. Have separate Living_Thing and Mammal tables, each defining all of the columns needed for an instance of that class:

a. Living_Thing could have Domain (Archaea, Bacteria etc.), Gender and Date1, where Date1 might record the birth of an animal or the germination date of a plant.

b. Mammal would have the same columns plus an extra one for Species.

2. Keep the common columns in a single Living_Thing table and the unique column(s) in a separate Mammal table. There must also be a Join table, or extra columns in the two primary tables, to relate a particular Mammal to its corresponding Living_Thing row.

3. Put all of the columns into a single table.

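Clearly, Option 1 wastes a lot of storage space, because the common columns of every Mammal are stored twice: inserting a Mammal row also requires the insertion of a Living_Thing row. Applications also need a way to delete the matching row from the Living_Thing table when a row is deleted from the Mammal table. Option 2 adds overhead for maintaining the Join table and for joining the tables whenever a Mammal is read. Option 3 will waste a lot of space if most of the rows are for Living_Things that are not Mammals, because the Mammal-only column is empty in each of those rows.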
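As a rough illustration of Option 2, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table and column names follow the example above; the id and living_thing_id key columns are assumptions standing in for the "extra columns" that relate the two tables, and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Living_Thing (
        id     INTEGER PRIMARY KEY,
        Domain TEXT,
        Gender TEXT,
        Date1  TEXT
    );
    CREATE TABLE Mammal (
        living_thing_id INTEGER PRIMARY KEY REFERENCES Living_Thing(id),
        Species         TEXT
    );
""")

# Storing one Mammal takes two inserts, one per table.
cur = conn.execute(
    "INSERT INTO Living_Thing (Domain, Gender, Date1) VALUES (?, ?, ?)",
    ("Eukarya", "F", "2020-05-01"),
)
conn.execute(
    "INSERT INTO Mammal (living_thing_id, Species) VALUES (?, ?)",
    (cur.lastrowid, "Felis catus"),
)

# Reading the whole object back requires a join.
row = conn.execute("""
    SELECT lt.Domain, lt.Gender, lt.Date1, m.Species
    FROM Mammal AS m
    JOIN Living_Thing AS lt ON lt.id = m.living_thing_id
""").fetchone()
print(row)  # ('Eukarya', 'F', '2020-05-01', 'Felis catus')
```

Even in this tiny case, a single logical object is split across two rows that the application has to keep in step.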

An ODBMS overcomes this mapping mismatch by assigning a “Structured Type” or a “Type Structure” to each object instance and storing all of the data needed for that instance in a single location.

Objectivity/DB maintains the class inheritance tree (or network for C++) in a Type Structure. In the above example, the Living_Thing instance might be of Type #1 and the Mammal instance might be of Type #2. If the Storage Manager is asked to search for Living_Thing instances, it actually looks for objects of either Type #1 or Type #2.
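The lookup just described can be sketched as follows. This is a toy model in Python under stated assumptions, not Objectivity/DB's actual API or storage format: TYPE_STRUCTURE, matching_type_ids, stored_objects and search are all hypothetical names, and the sample data is invented.

```python
# Hypothetical type structure: type id -> (class name, parent type id)
TYPE_STRUCTURE = {
    1: ("Living_Thing", None),
    2: ("Mammal", 1),
}


def matching_type_ids(type_id):
    """Return type_id plus the ids of every type that inherits from it."""
    matches = {type_id}
    changed = True
    while changed:
        changed = False
        for tid, (_, parent) in TYPE_STRUCTURE.items():
            if parent in matches and tid not in matches:
                matches.add(tid)
                changed = True
    return matches


# Each stored object carries its type id alongside all of its data.
stored_objects = [
    {"type": 1, "Domain": "Bacteria", "Gender": None, "Date1": "2021-03-04"},
    {"type": 2, "Domain": "Eukarya", "Gender": "F", "Date1": "2020-05-01",
     "Species": "Felis catus"},
]


def search(type_id):
    wanted = matching_type_ids(type_id)
    return [obj for obj in stored_objects if obj["type"] in wanted]


print(len(search(1)))  # 2: a Living_Thing search also finds the Mammal
print(len(search(2)))  # 1: a Mammal search finds only the Mammal
```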

OBJECT STORES (or OBJECT-BASED STORAGE)

So what are object stores? One of the earliest ODBMSs was actually called ObjectStore (now Versata™), but we’re talking about a technology category, not a product. ODBMSs are no longer the only kind of persistent storage for objects.

Object storage, more correctly known as object-based storage, is a term introduced by the storage infrastructure industry to describe systems that manage data as objects, where each object has data, metadata, and a globally unique identifier.
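The three ingredients (data, metadata and a globally unique identifier) can be shown with a toy in-memory store. This Python sketch is an illustration only: the ObjectStore class here is a hypothetical name unrelated to the product of the same name, and real object storage systems expose their own interfaces.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store: a flat namespace keyed by object id."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        object_id = str(uuid.uuid4())  # globally unique identifier
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str):
        return self._objects[object_id]


store = ObjectStore()
oid = store.put(b"...image bytes...", {"content_type": "image/png"})
data, metadata = store.get(oid)
print(metadata["content_type"])  # image/png
```

There are no directories or tables here: the identifier returned by put is the only handle on the object.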

In this respect, they are similar to ODBMSs, if they can actually store individual object-oriented language object instances as discrete objects in their domain. The Seagate™ Kinetic Open Storage Platform can do this, making it useful for storing documents, images, and videos. However, the overheads for handling very small objects in object-based storage can be prohibitive.

OBJECT-BASED FILE SYSTEMS

Object-based file systems separate file metadata and the constituent parts of each file. Each fragment of a file is stored as an object. Both the metadata and the fragments are handled using an object-oriented language.
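The separation of metadata from file fragments can be sketched in Python as below. It is a simplified model with assumed names (FRAGMENT_SIZE, fragment_store, metadata_store, write_file, read_file) and does not reflect how any particular file system lays out its objects.

```python
import uuid

FRAGMENT_SIZE = 4  # deliberately tiny so the example is easy to follow

fragment_store = {}  # object id -> fragment bytes
metadata_store = {}  # file name -> metadata, kept apart from the data


def write_file(name, content: bytes):
    fragment_ids = []
    for offset in range(0, len(content), FRAGMENT_SIZE):
        fid = str(uuid.uuid4())
        fragment_store[fid] = content[offset:offset + FRAGMENT_SIZE]
        fragment_ids.append(fid)
    metadata_store[name] = {"size": len(content), "fragments": fragment_ids}


def read_file(name) -> bytes:
    meta = metadata_store[name]
    # The metadata records which fragment objects make up the file, in order.
    return b"".join(fragment_store[fid] for fid in meta["fragments"])


write_file("notes.txt", b"hello object world")
print(len(metadata_store["notes.txt"]["fragments"]))  # 5
print(read_file("notes.txt"))  # b'hello object world'
```

Because the metadata only lists fragment identifiers, the fragments themselves could live on different storage devices and be fetched in parallel.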

The Lustre and Panasas™ object-based file systems are very popular in the High Performance Computing arena. Panasas PanFS™ also provides standard parallel NFS access to the logical files.

There are, of course, hybrids, such as Ceph, that support both regular files and object-based files using specialized storage mechanisms. However, data stored using one of the two mechanisms can’t always be accessed via the other. It’s also possible to build an object-based file system on top of any other storage technology. There are several such wrappers for the Hadoop Distributed File System (HDFS), for instance.

SUMMARY

Confused yet? Let’s try to summarize it all with a chart. A cell with the value “Y” denotes that the feature is present; “N” means that the feature is not present; and “(Y)” means that the feature is present in only some implementations of the technology.

It’s clear that object-based technology is very useful when dealing with fairly large items, such as documents, images, and files. Most practical implementations are built using an object-oriented language. ODBMSs offer the greatest range of options and can leverage object-based infrastructure.