5.5. Programming with Objects

To cement your knowledge of pointers, objects and memory management, we'll use three cases to illustrate how they are used when programming EMBOSS code. First, we look at the AjPPdbtosp object, which holds sequence codes. It is not widely used but illustrates many of the key points nicely. Next comes the AJAX string object, AjPStr, and the special memory handling its functions use to make memory allocation failsafe. Finally, we look at the AJAX array objects to see how other dynamic objects are implemented.

5.5.1. An Example Object: AjPPdbtosp

5.5.1.1. Object Definition

Consider the following object which holds swissprot codes and accession numbers for a PDB code:

There is nothing new here other than Acc and Spr, which are both pointers to AjPStr objects. These would have been better declared as AjPPStr, but that is not done here. As an AjPStr is itself a pointer (to a string object proper) you can see that we're dealing with pointers to pointers. In this case Acc and Spr are used to create two arrays of strings, as you can see in the constructor function (below).

5.5.1.2. Object Construction

The first line declares that the function returns an object pointer of type AjPPdbtosp. The parameter ajint n is the size the Acc and Spr arrays should be, i.e. the number of pairs of Acc / Spr values that the object will hold.

The next line declares a variable called ret. This is the object pointer that is going to have memory allocated to it and will be returned to the calling function.

AJNEW0(ret); is the line that allocates an object proper to the pointer ret, which will now point to an instance of the AjPPdbtosp object in memory. By the time AJNEW0(ret); returns, memory space for the object is reserved. This means enough space for an AjPStr, an ajint and two pointers (AjPStr *). Note that neither the two arrays nor any string objects proper have been allocated yet!

AJNEW0 sets all the structure elements to 0, which means the element n is set to 0 and the three pointers are set to NULL. AJNEW0 is a macro: it will allocate a single object of the correct type to any pointer that is passed to it, so it can be used with any object.

Compare AJNEW0 to the two AJCNEW0 lines. AJCNEW0 will allocate an array of objects of any type and initialises the new variables to 0 or NULL as required. In this case, arrays of n objects each will be created. It is important to bear in mind here that ret->Acc and ret->Spr are passed to the macro. These are of the type AjPStr *, which means that the "object" which they point to is in fact another pointer variable. Therefore these macro calls will allocate arrays of n pointers, not arrays of instances of AjPStr objects as one might (incorrectly) first imagine. An array of n AjPStr object pointers is allocated to each of Acc and Spr. In other words, ret->Acc and ret->Spr will point to blocks of memory each holding n pointer variables which are as yet NULL (unallocated).
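
The difference between the two macros can be tried out in isolation. The following is a minimal sketch, not the AJAX definitions: DEMO_NEW0 and DEMO_CNEW0 are hypothetical stand-ins built on calloc, and DemoStr / DemoPStr are stand-ins for the string object and AjPStr. It shows that passing a pointer-to-pointer to the array macro yields n pointers that are all still NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for AJNEW0 / AJCNEW0 (assumption: calloc-based). */
/* The element type is taken from the pointer itself via sizeof *(p).      */
#define DEMO_NEW0(p)     ((p) = calloc(1, sizeof *(p)))
#define DEMO_CNEW0(p, c) ((p) = calloc((size_t) (c), sizeof *(p)))

typedef struct DemoStr { char *Ptr; } DemoStr;  /* stand-in string object */
typedef DemoStr *DemoPStr;                      /* stand-in for AjPStr    */

/* Allocate an array of n string *pointers* and count the NULL slots.     */
/* Comparing zeroed memory with NULL assumes an all-bits-zero null        */
/* pointer, which holds on mainstream platforms.                          */
int demo_count_null_slots(int n)
{
    DemoPStr *arr = NULL;
    int i;
    int nulls = 0;

    assert(n > 0);

    DEMO_CNEW0(arr, n);      /* sizeof *(arr) is the size of a pointer */
    if (!arr)
        return -1;

    for (i = 0; i < n; i++)
        if (arr[i] == NULL)  /* no string object has been allocated yet */
            nulls++;

    free(arr);

    return nulls;
}
```

Because the macros read the type from their argument, the same two macros work for any object pointer, which is the property the text relies on.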

The arrays are created but there are still no strings yet. ret->Pdb = ajStrNew(); allocates memory for a string object to the pointer Pdb in the new object. Notice that -> is used to dereference the pointer ret and get to the Pdb element. This is the standard way in C of accessing elements in a data structure when you have a pointer to that data structure.

The lines ret->Acc[i]=ajStrNew(); and ret->Spr[i]=ajStrNew(); allocate memory for the n string objects for each array. They also illustrate how pointer and array notation can be used together. In this case, the 'i'th element of each of the arrays that ret->Acc and ret->Spr point to is accessed. The elements in these arrays are AjPStr (object pointers) and a string object is allocated to each of them.

The rest is obvious. The integer in the object is set to the size of the arrays and the pointer to the new object, complete with an allocated string and two arrays of strings, is returned to the calling function by return ret;. Note that the constructor should be coded to deal with negative arguments in a safe way, but that is not done here.
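
The construction steps described above can be sketched as standalone C. This is not the EMBOSS source: DemoStr, DemoPdbtosp, demo_calloc, demo_str_new and demo_pdbtosp_new are hypothetical names, the string type is a bare stand-in, and clamping a negative n to zero is one possible choice rather than what the library does.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for an AJAX string object; the real one has more fields. */
typedef struct DemoStr { size_t Len; char *Ptr; } DemoStr, *DemoPStr;

typedef struct DemoPdbtosp
{
    DemoPStr  Pdb;   /* single string object        */
    int       n;     /* number of Acc / Spr pairs   */
    DemoPStr *Acc;   /* array of n string pointers  */
    DemoPStr *Spr;   /* array of n string pointers  */
} DemoPdbtosp, *DemoPPdbtosp;

/* Zeroed allocation that aborts on failure (keeps the sketch short). */
static void *demo_calloc(size_t count, size_t size)
{
    void *p = calloc(count ? count : 1, size);

    if (!p)
        abort();

    return p;
}

/* Allocate an empty string object with a one-byte buffer. */
DemoPStr demo_str_new(void)
{
    DemoPStr s = demo_calloc(1, sizeof *s);

    s->Ptr = demo_calloc(1, 1);

    return s;
}

DemoPPdbtosp demo_pdbtosp_new(int n)
{
    DemoPPdbtosp ret = NULL;
    int i;

    if (n < 0)           /* assumption: treat a negative size as zero */
        n = 0;

    /* 1. the object itself: n is 0 and the three pointers are NULL */
    ret = demo_calloc(1, sizeof *ret);

    /* 2. two arrays of n string pointers, no strings yet */
    ret->Acc = demo_calloc((size_t) n, sizeof *ret->Acc);
    ret->Spr = demo_calloc((size_t) n, sizeof *ret->Spr);

    /* 3. the string objects proper */
    ret->Pdb = demo_str_new();

    for (i = 0; i < n; i++)
    {
        ret->Acc[i] = demo_str_new();
        ret->Spr[i] = demo_str_new();
    }

    assert(n == 0 || (ret->Acc[n - 1] && ret->Spr[n - 1]));

    ret->n = n;

    return ret;
}
```

Each stage allocates only one level of the structure, which is why three separate kinds of call are needed before the object is usable.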

5.5.1.3. Object Destruction

It is the job of the destructor function to free the object itself and any memory that its member elements might point to. The destructor safely clears up all of the memory that was allocated by the constructor. This is achieved by calling other destructor functions as appropriate and by using AJFREE. The code is shown below:

The function, like all destructors, takes the address of the object pointer being deleted (AjPPdbtosp *thys).

For convenience a second pointer is declared and is used in the following lines to dereference thys. This is purely for reasons of clarity. The function returns if either an empty address was passed or if the pointer stored there is NULL.

The string object in AjPPdbtosp is deleted first by calling the default destructor function with the address of the string.

The string objects proper, referenced through the arrays, are deleted by calling ajStrDel in a loop for every array element in both arrays.

AJFREE is then called to delete the arrays themselves, referenced by pthis->Acc and pthis->Spr.

AJFREE is then called once more, this time freeing memory for the AjPPdbtosp object itself. Finally the pointer is set to NULL so that it's ready for re-use by the calling function.

It should be clear that although AJFREE will free the memory pointed to by its argument, as used here it frees the arrays but not the string objects proper that are pointed to; that is the job of the ajStrDel calls in the preceding code.
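
To see why both the element destructors and the array frees are needed, here is a minimal standalone sketch of the destruction order. It is not the EMBOSS code: all demo_ names are hypothetical, the string type is a stand-in, and the demo_live counter exists only so that leaks can be observed.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live = 0;    /* blocks currently allocated by this sketch */

static void *demo_calloc(size_t count, size_t size)
{
    void *p = calloc(count ? count : 1, size);

    if (!p)
        abort();
    demo_live++;

    return p;
}

static void demo_free(void *p)
{
    if (p)
    {
        free(p);
        demo_live--;
    }
}

typedef struct DemoStr { char *Ptr; } DemoStr, *DemoPStr;

typedef struct DemoPdbtosp
{
    DemoPStr Pdb; int n; DemoPStr *Acc; DemoPStr *Spr;
} DemoPdbtosp, *DemoPPdbtosp;

DemoPStr demo_str_new(void)
{
    DemoPStr s = demo_calloc(1, sizeof *s);

    s->Ptr = demo_calloc(1, 1);

    return s;
}

void demo_str_del(DemoPStr *pstr)
{
    if (!pstr || !*pstr)
        return;
    demo_free((*pstr)->Ptr);
    demo_free(*pstr);
    *pstr = NULL;
}

DemoPPdbtosp demo_pdbtosp_new(int n)
{
    DemoPPdbtosp ret = demo_calloc(1, sizeof *ret);
    int i;

    assert(n >= 0);
    ret->Acc = demo_calloc((size_t) n, sizeof *ret->Acc);
    ret->Spr = demo_calloc((size_t) n, sizeof *ret->Spr);
    ret->Pdb = demo_str_new();
    for (i = 0; i < n; i++)
    {
        ret->Acc[i] = demo_str_new();
        ret->Spr[i] = demo_str_new();
    }
    ret->n = n;

    return ret;
}

void demo_pdbtosp_del(DemoPPdbtosp *thys)
{
    DemoPPdbtosp pthis;
    int i;

    if (!thys || !*thys)            /* empty address or NULL pointer */
        return;
    pthis = *thys;

    demo_str_del(&pthis->Pdb);      /* the single string */
    for (i = 0; i < pthis->n; i++)  /* strings behind the arrays */
    {
        demo_str_del(&pthis->Acc[i]);
        demo_str_del(&pthis->Spr[i]);
    }
    demo_free(pthis->Acc);          /* the arrays themselves */
    demo_free(pthis->Spr);
    demo_free(pthis);               /* the object proper */
    *thys = NULL;                   /* ready for re-use by the caller */
}

int demo_live_blocks(void)
{
    return demo_live;
}
```

If the loop were removed, demo_live_blocks would stay above zero after destruction, because freeing an array of pointers does nothing to the objects those pointers reference.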

5.5.1.4. Usage Example

Here is a code snippet illustrating how the object constructor and destructor could be used. You'll notice they're used in exactly the same way as any other object:

5.5.2. AJAX Dynamic String Object

5.5.2.1. Introduction

The string object (AjPStr) is one of the simplest of all the AJAX objects. AJAX strings have more functions than any other datatype and are used by many other objects. Two features distinguish its use from standard C-type (char *) strings. First, AJAX strings are dynamic objects, meaning that memory is dynamically reallocated as needed so that you never run out of space when using the object functions; a string will grow automatically as required. Second, AJAX strings are reference counted. This means that the object itself keeps track of how many references (pointers) to the string there are in the code that have been requested by calling library functions. It is not until all references to a string are deleted that the string itself is freed. This ensures that broken references to a string do not occur and that you always have a handle on objects in memory.

5.5.2.2. String Definition

A structure called AjSStr is defined with five elements (Len, Res, Use, Ptr and Padding) and three new datatype names: AjOStr for the object itself, AjPStr for the object pointer and AjPPStr for a pointer to an AjPStr.

The Ptr pointer is just a standard C one which holds a character string and Len is its length. In contrast to C-type strings, the character string may or may not be NULL terminated; the library functions for printing AjPStr objects use the length field (Len) for how many characters to print and won't stop at the first NULL if there is one.
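
The practical difference between a length-driven string and a terminator-driven one can be shown with a small standalone sketch. DemoStr, demo_write_by_len and demo_c_length are hypothetical names, and the struct carries only the two fields needed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in with just the length and character pointer. */
typedef struct DemoStr { size_t Len; const char *Ptr; } DemoStr;

/* Length-driven output: writes exactly s->Len bytes, including any */
/* embedded '\0' characters. Returns the number of bytes written.   */
size_t demo_write_by_len(const DemoStr *s, FILE *out)
{
    assert(s && s->Ptr && out);

    return fwrite(s->Ptr, 1, s->Len, out);
}

/* Terminator-driven view: what strlen (or printf "%s") would see. */
size_t demo_c_length(const DemoStr *s)
{
    assert(s && s->Ptr);

    return strlen(s->Ptr);
}
```

For a five-character buffer whose third character is '\0', the first function handles all five characters while the second reports a length of two.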

The Res element indicates how much reserved dynamic memory is associated with the object. This is always at least equal to Len but is often more. Res is, and should be, outside your direct control. If you use a library call to add anything to the string and the result fits within the memory given by Res, the operation is performed immediately. If the memory required is larger than Res then more memory is allocated and Res is updated. More memory than required is usually allocated.

Use is the string usage counter. Sometimes you'll want two or more references to a single string rather than making a genuine copy. EMBOSS functions that do this increment the string's usage counter. The usage counter is decremented when a call to destroy either the string itself, or a reference to it, is made. When the usage counter reaches zero the object will be deleted. All of this is function internals, so you don't need to worry about it as long as you don't change the object elements directly. If you intend altering the contents of an object then safety is guaranteed if you use the available library functions.

Finally, the Padding element indicates the number of characters used to pad the string to its alignment boundary and is used only to keep pedantic compilers happy.

5.5.2.3. String Construction

strPNULL is a global variable for an empty object called the "AJAX NULL string". This has a single character of reserved memory, length of zero, a C-type string which is set to NULL, a reference count of 1 and zero padding.

All this function does is increase the reference count of the object that was passed and return the same pointer. It raises a fatal error if NULL was passed.

In other words, a call to ajStrNew doesn't immediately instantiate an AjSStr object; it just returns the address of the global "AJAX NULL string". It's only when the char * string (Ptr) is given a non-NULL value (by whatever means) that memory for the string object proper will be allocated. AJAX is programmed in this way for maximum speed and efficiency of string handling. You can see this for yourself if you print the reference count of a string which you have just allocated using ajStrNew but not yet used. The count is likely to be higher than you expect, and may well be in the hundreds, owing to the call to embInit in the application code, which itself makes, indirectly, many calls to ajStrNew. If true objects had been allocated for all these strings the code would be less efficient.
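
The idea of handing out references to one shared empty string can be sketched in a few lines. This is an illustration of the behaviour described above, not the ajstr.c code: DemoStr, demo_null_str, demo_str_new_ref and demo_str_new are hypothetical names, and the shared object here holds an empty character buffer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoStr
{
    size_t   Res;      /* reserved size              */
    size_t   Len;      /* current length             */
    char    *Ptr;      /* the character data         */
    unsigned Use;      /* reference (usage) count    */
    int      Padding;  /* alignment padding          */
} DemoStr, *DemoPStr;

/* One shared empty string: reserved size 1, length 0, count 1. */
static char     demo_null_chars[1] = "";
static DemoStr  demo_null_obj = { 1, 0, demo_null_chars, 1, 0 };
static DemoPStr demo_null_str = &demo_null_obj;

/* Return another reference to an existing string. */
DemoPStr demo_str_new_ref(DemoPStr ref)
{
    assert(ref != NULL);   /* the sketch simply asserts on NULL */

    ref->Use++;

    return ref;
}

/* "Construct" a string without allocating anything. */
DemoPStr demo_str_new(void)
{
    return demo_str_new_ref(demo_null_str);
}
```

Every call returns the same address and only the counter changes, which is why a freshly constructed string can report a surprisingly large usage count.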

Things are different if you call the alternative constructor function ajStrNewRes, which allocates memory for a string with an initial reserved size:

This constructor sets a minimum string length (minlen) to the requested reserved size (size), or to the current length (len) plus 1 (for a terminating NULL character) if the requested size is not greater than the current length. It then calls strNew, which is a static function in ajstr.c, to allocate a string object (this function is shown below). The string length (thys->Len) is set, and the specified text (txt, which is an empty string when called by ajStrNewRes as in this example) is copied (using memmove) to the C-type string pointer (thys->Ptr) in the AJAX string object. A terminating NULL character is added.

Let's look at the static constructor function which actually allocates the string object:

The function first checks that a reserved size has been specified, and sets this to the default length STRSIZE if not. STRSIZE is defined in ajstr.c:

#define STRSIZE 32

The macro AJNEW0 is called. You'll recall that this is the equivalent of a calloc and allocates memory to an object pointer (ret) for a single object of a given type, in this case, a string. The memory is initialised to zero.

The rest of that block of code assigns correct values to the other elements in the string object. You can see that the reserved size is set to size and the first character of the string is set to a NULL character, meaning you have a new, empty string with the specified reserved size, a pointer to which is returned to the calling function.
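
The two-level construction just described (a public constructor that works out the reserved size, and an internal allocator that applies a default) can be sketched as follows. The names demo_alloc, demo_new_res_len, demo_new_res and DEMO_STRSIZE are hypothetical stand-ins, the statistics variables are omitted, and allocation failure simply aborts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_STRSIZE 32    /* default reserved size, as in the text */

typedef struct DemoStr
{
    size_t Res; size_t Len; char *Ptr; unsigned Use;
} DemoStr, *DemoPStr;

/* Internal allocator: an empty string with `size` reserved bytes. */
DemoPStr demo_alloc(size_t size)
{
    DemoPStr ret;

    if (!size)                    /* no size given: use the default */
        size = DEMO_STRSIZE;

    ret = calloc(1, sizeof *ret); /* zeroed object */
    if (!ret)
        abort();

    ret->Ptr = malloc(size);
    if (!ret->Ptr)
        abort();

    ret->Res = size;
    ret->Len = 0;
    ret->Use = 1;
    ret->Ptr[0] = '\0';           /* new, empty string */

    return ret;
}

/* Constructor taking text, a requested reserved size and a length. */
DemoPStr demo_new_res_len(const char *txt, size_t size, size_t len)
{
    DemoPStr thys;
    size_t minlen = size;

    assert(txt != NULL);

    if (size <= len)              /* must hold the text plus terminator */
        minlen = len + 1;

    thys = demo_alloc(minlen);
    thys->Len = len;

    if (len)
        memmove(thys->Ptr, txt, len);
    thys->Ptr[len] = '\0';

    return thys;
}

/* Constructor for an empty string with a given reserved size. */
DemoPStr demo_new_res(size_t size)
{
    return demo_new_res_len("", size, 0);
}
```

Note that in this sketch the default size is only applied when the allocator is called with zero; the public constructor always passes at least len + 1.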

The code also sets some global variables (strAlloc, strCount and strTotal) used internally for statistics and for debugging strings. You needn't worry about those.

5.5.2.4. String Destruction

The internals of string destruction are simpler than construction. The default destructor ajStrDel is shown below:

It is clear from AjPStr* Pstr that the function takes the address of a string object pointer. The function first checks that NULL is not passed (if(!Pstr)) and that the pointer itself is not NULL (if(!*Pstr)). In other words, it ensures that the AjPStr passed in by reference does actually point to something. The function must assume that it points to a string and this will be the case if there are no bugs in the code. This is why pointers when declared should be set to NULL. If they are not and receive some junk value on start-up then this function (and many others like it) will mistakenly assume that it references valid memory and will, at best, head for a segmentation fault or bus error when it tries to address that memory.

The line --thys->Use; reduces the reference count of the string by 1. If this becomes zero then AJFREE is used to free the object. It is called twice, once to free the C-type string (AJFREE(thys->Ptr);) and again to free the object proper (AJFREE(*Pstr);). Some global variables (strFree, strFreeCount and strCount) used internally for debugging and statistics are also set.

Finally, the string object pointer that was passed is set to NULL (*Pstr = NULL;) so that it's ready for re-use by the program.
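
The destruction logic described above can be sketched with a stand-in type. This is not the ajstr.c source: DemoStr and the demo_ functions are hypothetical, and demo_freed is a counter added only so that the moment of the real free can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoStr
{
    size_t Res; size_t Len; char *Ptr; unsigned Use;
} DemoStr, *DemoPStr;

static int demo_freed = 0;   /* number of objects actually freed */

/* Allocate a string holding a copy of txt, with a usage count of 1. */
DemoPStr demo_str_new_c(const char *txt)
{
    DemoPStr s = calloc(1, sizeof *s);

    assert(txt != NULL);
    if (!s)
        abort();

    s->Len = strlen(txt);
    s->Res = s->Len + 1;
    s->Ptr = malloc(s->Res);
    if (!s->Ptr)
        abort();
    memcpy(s->Ptr, txt, s->Res);
    s->Use = 1;

    return s;
}

/* Take a second reference to the same object. */
DemoPStr demo_str_ref(DemoPStr s)
{
    assert(s != NULL);
    s->Use++;

    return s;
}

/* Drop one reference; free the object only when none are left. */
void demo_str_del(DemoPStr *Pstr)
{
    DemoPStr thys;

    if (!Pstr)        /* no address passed */
        return;
    if (!*Pstr)       /* pointer is already NULL */
        return;

    thys = *Pstr;

    --thys->Use;
    if (!thys->Use)
    {
        free(thys->Ptr);   /* the C-type string */
        free(thys);        /* the object proper */
        demo_freed++;
    }

    *Pstr = NULL;          /* ready for re-use by the caller */
}

int demo_freed_count(void)
{
    return demo_freed;
}
```

The caller's pointer is always reset, whether or not the object was freed, so a second call on the same variable is harmless. That guarantee only holds if the variable was initialised to NULL or to a valid string in the first place.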

5.5.2.5. String Functions

Now we'll look at two string functions to see how pointers and memory are handled. ajStrMatchS is a simple function for matching two AJAX strings:

It is passed two AJAX string objects and uses the C function strcmp to compare the C-type strings in the object, returning ajTrue if they are the same or ajFalse otherwise. The function merely reads the value of the strings passed so will never need to allocate memory.
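
A read-only comparison of this kind might look like the sketch below. DemoStr and demo_str_match are hypothetical names, and returning false when either argument is NULL is an assumption of the sketch rather than a statement about the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct DemoStr
{
    size_t Res; size_t Len; const char *Ptr; unsigned Use;
} DemoStr;

/* Compare the C-type strings held by two string objects.      */
/* Nothing is allocated or modified, so plain const pointers   */
/* (not addresses of pointers) are enough.                     */
bool demo_str_match(const DemoStr *str, const DemoStr *str2)
{
    if (!str || !str2)   /* assumption: a NULL argument never matches */
        return false;

    assert(str->Ptr && str2->Ptr);

    return strcmp(str->Ptr, str2->Ptr) == 0;
}
```

Because strcmp stops at the first terminating character, a comparison written this way ignores anything stored after an embedded terminator, whatever the Len fields say.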

ajStrAssignS is different. This function assigns the value of one string to another. The string is copied rather than just setting a reference (pointer) to the original:

The function takes the source string that is being copied (str) and a destination string (Pstr). The destination string (Pstr) will be modified, therefore the address must be passed (AjPStr* Pstr).

You can see that if a NULL pointer is passed for the source string then an empty string is written to the destination string by calling ajStrAssignC. This is the failsafe mechanism that was mentioned before. The function should also check, as a safety measure, whether NULL is passed for the address of the destination string, but it currently doesn't do this.

ajStrSetRes is called (see below) to ensure that the destination string is a new string, not referenced by other string objects, and is big enough for its intended purpose. The length of the destination string (thys->Len) is set and the C-type string in the source string (str->Ptr) is copied (using memmove) to the destination string (thys->Ptr). ajTrue is returned if the string was reallocated or ajFalse otherwise.

ajStrSetRes takes the address of a target string and a minimum size (size). If the target string is NULL then a string with a reserved size is allocated using ajStrNewRes. That function has already been explained. Otherwise, if the usage count is greater than 1 or if the current reserved size is less than that requested, the static function strCloneL is called (see below) to make a copy of the string but with a usage count of 1 and a minimum reserved size. ajTrue is returned if the string was reallocated or ajFalse otherwise.

strCloneL takes the address of a target string (Pstr) and a reserved size (size):

It calls ajStrNewResLenC to allocate a string with a reserved size, as has already been explained. The original target string that was passed is deleted by calling the destructor ajStrDel.
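
Taken together, these functions amount to a copy-on-write scheme: before writing, make sure the destination is unshared and large enough. The sketch below illustrates that flow with hypothetical names (demo_new_res_len, demo_del, demo_clone, demo_set_res, demo_assign) and a stand-in type; it is not the ajstr.c code, and unlike the function described above it refuses a NULL destination address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoStr
{
    size_t Res; size_t Len; char *Ptr; unsigned Use;
} DemoStr, *DemoPStr;

DemoPStr demo_new_res_len(const char *txt, size_t size, size_t len)
{
    DemoPStr s = malloc(sizeof *s);

    if (size <= len)                 /* room for text plus terminator */
        size = len + 1;
    if (!s || !(s->Ptr = malloc(size)))
        abort();
    if (len)
        memmove(s->Ptr, txt, len);
    s->Ptr[len] = '\0';
    s->Res = size;
    s->Len = len;
    s->Use = 1;

    return s;
}

void demo_del(DemoPStr *p)
{
    if (!p || !*p)
        return;
    if (--(*p)->Use == 0)
    {
        free((*p)->Ptr);
        free(*p);
    }
    *p = NULL;
}

/* Replace *p with a private copy that reserves at least `size`. */
static void demo_clone(DemoPStr *p, size_t size)
{
    DemoPStr copy = demo_new_res_len((*p)->Ptr, size, (*p)->Len);

    demo_del(p);     /* drop this reference to the original */
    *p = copy;
}

/* Ensure *p is unshared and big enough; true if it was reallocated. */
bool demo_set_res(DemoPStr *p, size_t size)
{
    assert(p != NULL);

    if (!*p)
    {
        *p = demo_new_res_len("", size, 0);
        return true;
    }

    if ((*p)->Use > 1 || (*p)->Res < size)
    {
        demo_clone(p, size);
        return true;
    }

    return false;
}

/* Copy src into *dst; a NULL src assigns an empty string. */
bool demo_assign(DemoPStr *dst, const DemoStr *src)
{
    const char *txt = src ? src->Ptr : "";
    size_t len = src ? src->Len : 0;
    bool ret = demo_set_res(dst, len + 1);

    memmove((*dst)->Ptr, txt, len);
    (*dst)->Ptr[len] = '\0';
    (*dst)->Len = len;

    return ret;
}
```

The key point is that a shared destination is never written in place: it is swapped for a private copy first, so other holders of the original string are unaffected.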

It should be said that strings are a special case and that the internals of memory management for most other objects are considerably simpler. It is only for strings, which are so widely used by the other libraries, that special handling is needed for reasons of safety and efficiency.

5.5.3. AJAX Dynamic Array Objects

The memory management macros are nicely illustrated by the array handling functions in ajarr.c. Here we'll consider the constructor and destructor functions for the AjPInt and AjPInt2d objects. These are dynamic one-dimensional (AjPInt) and two-dimensional (AjPInt2d) arrays of integers.

Both objects include variables for the current length of the array (Len) and the reserved size (Res). AjPInt includes a pointer (Ptr) to ajint which, when allocated, will point to an array of ajint values. In contrast, AjPInt2d includes a pointer (Ptr) to AjPInt which will eventually point to an array of AjPInt object pointers.

5.5.3.2. AjPInt Construction and Destruction

ajIntNewRes is a constructor for AjPInt objects, allocating an array with an initial reserved size. The code is shown below:

AJNEW0 is used to allocate memory for a single AjPInt object. AJALLOC0 is called to create an array of AJAX integers (ajint) of size size. arrTotal and arrAlloc are also set which are global variables used for debugging arrays.

In the destructor, AJFREE is called twice. The first call (AJFREE((*thys)->Ptr);) frees the array of integers. The second call (AJFREE(*thys);) frees the object itself. You can see that the caller's pointer, whose address (thys) is passed to the function, is set to NULL using the code *thys = NULL;.
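
The one-dimensional case reduces to two allocations on construction and two frees on destruction. The following standalone sketch uses hypothetical names (DemoInt, demo_int_new_res, demo_int_del) and plain calloc / free in place of the AJAX macros; the debugging counters are left out and a zero size is bumped to one as an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoInt
{
    size_t Res;   /* reserved number of elements */
    size_t Len;   /* number of elements in use   */
    int   *Ptr;   /* the array of integers       */
} DemoInt, *DemoPInt;

/* Construct a 1D integer array with an initial reserved size. */
DemoPInt demo_int_new_res(size_t size)
{
    DemoPInt thys = calloc(1, sizeof *thys);     /* the object    */

    if (!thys)
        abort();

    if (!size)          /* assumption: reserve at least one element */
        size = 1;

    thys->Ptr = calloc(size, sizeof *thys->Ptr); /* zeroed array  */
    if (!thys->Ptr)
        abort();

    thys->Len = 0;
    thys->Res = size;

    return thys;
}

/* Destroy the array: takes the address of the object pointer. */
void demo_int_del(DemoPInt *thys)
{
    if (!thys || !*thys)
        return;

    free((*thys)->Ptr);   /* first free: the integers        */
    free(*thys);          /* second free: the object itself  */

    *thys = NULL;         /* the caller's pointer is reset   */
}
```

The order of the two frees matters: once the object is freed, its Ptr member can no longer be read safely.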

5.5.3.3. AjPInt2d Construction and Destruction

ajInt2dNewRes is a constructor for AjPInt2d objects, allocating a 2D array with an initial reserved size for the first dimension. The code is shown below:

AJNEW0 is again called to create the basic object, an instance of an AjPInt2d in this case. AJALLOC0 is called to create an array of pointers to 1D integer array objects (AjPInt) of size size. You can deduce from the code that the second dimension of the array (the arrays of integers themselves) is not created until it is needed. This is for reasons of efficiency.

The destructor takes the address of the AjPInt2d object (thys) that is to be freed. To get to the object proper you must dereference thys, which is what happens everywhere in the function body where you see *thys. You will recall that Ptr references an array of AjPInt object pointers, each of which points to an array of integers.

The integer arrays are freed by calling the destructor function ajIntDel in a loop. This destructor takes the address of an AjPInt. Array notation is used to index the 'i'th element of the AjPInt array, having first dereferenced thys ((*thys)->Ptr[i]). This retrieves an individual AjPInt object, the address of which is needed by the destructor, which is why you have ajIntDel(&((*thys)->Ptr[i]));.

AJFREE is then called twice. The first call (AJFREE((*thys)->Ptr);) frees the array of AjPInt. The second call (AJFREE(*thys);) frees the AjPInt2d object itself. You can see that the caller's pointer, whose address (thys) is passed to the function, is set to NULL using the code *thys = NULL;.

5.5.3.4. AjPInt2d Putting and Getting Array Elements

The function ajInt2dGet is used to retrieve a value from a 2D integer array. The source code is below:

The element in column elem1 and row elem2 will be retrieved from the array thys. An error is raised if you try to inspect an element that has not been allocated. Otherwise the value of the element is returned.

The function ajInt2dPut is used to load a 2D integer array element with a value. If the array is of insufficient size then the memory is extended as required. The source code is below:
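
As a rough standalone illustration of the behaviour described for the 2D array (rows created only when first written, storage extended on demand, unallocated elements refused on reading), consider the sketch below. It is not the ajarr.c source: every demo_ name is hypothetical, growth is to exactly the size needed, and where the text says an error is raised this sketch returns 0 instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoInt { size_t Res; size_t Len; int *Ptr; } DemoInt, *DemoPInt;
typedef struct DemoInt2d { size_t Res; size_t Len; DemoPInt *Ptr; } DemoInt2d, *DemoPInt2d;

static void *demo_calloc(size_t n, size_t sz)
{
    void *p = calloc(n ? n : 1, sz);

    if (!p)
        abort();

    return p;
}

/* Return a zero-filled block of `want` elements holding the old data. */
static void *demo_grow(void *old, size_t have, size_t want, size_t sz)
{
    void *p = demo_calloc(want, sz);

    if (old)
    {
        memcpy(p, old, have * sz);
        free(old);
    }

    return p;
}

DemoPInt2d demo_int2d_new_res(size_t size)
{
    DemoPInt2d thys = demo_calloc(1, sizeof *thys);

    if (!size)
        size = 1;
    thys->Ptr = demo_calloc(size, sizeof *thys->Ptr);  /* rows still NULL */
    thys->Res = size;

    return thys;
}

void demo_int2d_put(DemoPInt2d thys, size_t elem1, size_t elem2, int v)
{
    DemoPInt row;

    assert(thys != NULL);
    if (elem1 >= thys->Res)                 /* extend first dimension */
    {
        thys->Ptr = demo_grow(thys->Ptr, thys->Res, elem1 + 1, sizeof *thys->Ptr);
        thys->Res = elem1 + 1;
    }
    if (elem1 >= thys->Len)
        thys->Len = elem1 + 1;
    if (!thys->Ptr[elem1])                  /* create the row lazily */
        thys->Ptr[elem1] = demo_calloc(1, sizeof(DemoInt));
    row = thys->Ptr[elem1];
    if (elem2 >= row->Res)                  /* extend second dimension */
    {
        row->Ptr = demo_grow(row->Ptr, row->Res, elem2 + 1, sizeof *row->Ptr);
        row->Res = elem2 + 1;
    }
    if (elem2 >= row->Len)
        row->Len = elem2 + 1;
    row->Ptr[elem2] = v;
}

/* Returns 1 and stores the value in *out, or 0 if unallocated. */
int demo_int2d_get(const DemoInt2d *thys, size_t elem1, size_t elem2, int *out)
{
    const DemoInt *row;

    if (!thys || elem1 >= thys->Len)
        return 0;
    row = thys->Ptr[elem1];
    if (!row || elem2 >= row->Len)
        return 0;
    *out = row->Ptr[elem2];

    return 1;
}

void demo_int2d_del(DemoPInt2d *thys)
{
    size_t i;

    if (!thys || !*thys)
        return;
    for (i = 0; i < (*thys)->Res; i++)
        if ((*thys)->Ptr[i])
        {
            free((*thys)->Ptr[i]->Ptr);
            free((*thys)->Ptr[i]);
        }
    free((*thys)->Ptr);
    free(*thys);
    *thys = NULL;
}
```

Writing to an element well outside the initial reserved size therefore succeeds, while reading from a row that has never been written is reported as a failure rather than returning junk.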