A User's View of Object-Oriented Modules

Sean M. Burke

The first time that most Perl programmers run into object-oriented programming is when they need to use a module whose interface is object-oriented. This is often a mystifying experience, since talk of "methods" and "constructors" is unintelligible to programmers who thought that functions and variables were all there was to worry about.

Articles and books that explain object-oriented programming (OOP) do so from the perspective of how to create programs. That's understandable, and if you learn to write object-oriented code of your own, you'll find it easy to use object-oriented code that others write. But this approach is the long way around for people whose immediate goal is just to use existing object-oriented modules, but who don't yet want to know all the gory details of having to write such modules for themselves.

This article is for those programmers - programmers who want to know about objects from the perspective of using object-oriented modules.

Modules And Their Functional Interfaces

Modules are the main way that Perl provides for bundling up code for later use by yourself or others. As I'm sure you can't help noticing from reading The Perl Journal, CPAN (the Comprehensive Perl Archive Network) is the repository for modules (or groups of modules) that others have written, to do anything from composing music to accessing Web pages. A good many of those modules even come with every installation of Perl.

One module that you may have used before, and which is fairly typical in its interface, is Text::Wrap. It comes with Perl, so you don't even need to install it from CPAN. You use it in a program of yours, by having your program say:

use Text::Wrap;

and after that, you can access a function called wrap, which inserts line-breaks in text that you feed it, so that the text will be wrapped to seventy-two (or however many) columns.

The way this use Text::Wrap business works is that the module Text::Wrap exists as a file Text/Wrap.pm somewhere in one of your library directories. That file contains Perl code which, among other things, defines a function called Text::Wrap::wrap(), and then exports that function. That means that when you say wrap() after having said use Text::Wrap, you'll actually be calling the Text::Wrap::wrap() function. Some modules don't export their functions, so you have to call them by their fully qualified name: Text::Wrap::wrap(...parameters...). (Mixed in with the Perl code, there's also documentation, which is what you read with perldoc Text::Wrap. The perldoc program simply ignores the code and formats the documentation text, whereas use Text::Wrap loads and runs the code while ignoring the documentation.)
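As a minimal sketch of a functional interface in use, here is wrap() called both by its exported name and by its fully qualified name. The sample text, the 40-column setting, and the indent strings are my own choices for illustration.

```perl
use strict;
use warnings;
use Text::Wrap;                 # loads Text/Wrap.pm and imports wrap()

# A package variable documented by Text::Wrap; lines come out shorter
# than this many characters.
$Text::Wrap::columns = 40;

my $text = "Modules are the main way that Perl provides for bundling up "
         . "code for later use by yourself or others.";

# The first two arguments are the indent for the first line and the
# indent for every later line; the rest is the text to wrap.
my $wrapped = wrap("", "  ", $text);

# The fully qualified name reaches the very same function.
my $same = Text::Wrap::wrap("", "  ", $text);

print $wrapped, "\n";
```

Either spelling calls the same code; exporting only saves you from typing the package name.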

Regardless of whether a module exports the functions it provides, it's basically just a container for chunks of code that do useful things. The way the module allows for you to interact with it is called its interface. And when, like with Text::Wrap, its interface consists of functions, the module is said to have a functional interface. (The term "function" (and therefore "functional") has various meanings. I'm using the term here to mean essentially a subroutine: a chunk of code called by some name which takes some parameters and returns a value.)

Using modules with functional interfaces is straightforward - instead of defining your own wrap() function with sub wrap { ... }, you entrust Text::Wrap to do that for you, along with whatever other functions it defines and exports according to the module's documentation. Without too much bother, you can even write your own modules to contain your frequently used functions; I suggest having a look at the perlmod documentation for more leads on this.

Modules With Object-Oriented Interfaces

Suppose that one day you want to write a program that automates the process of transferring a bunch of files from one server down to your local machine, and then off to another server, all with FTP.

A quick browse through search.cpan.org turns up the module Net::FTP, which you can download and install using the normal installation procedures (unless your system administrator has already installed it, as many have).

Like Text::Wrap or any other module with a familiar functional interface, you start off using Net::FTP in your program by saying:

use Net::FTP;

However, that's where the similarity ends. The first hint is that the documentation for Net::FTP refers to it as a class. A class is found in a module with an object-oriented interface.

Whereas modules like Text::Wrap provide bits of useful code as functions to be called like function(...parameters...) or like PackageName::function(...parameters...), Net::FTP and other modules with object-oriented interfaces provide methods. Methods are like functions in that they have a name and parameters; but methods look different, and are different, because you have to call them with a class name or an object as an argument. First I'll explain the syntax for method calls, and then I'll explain what they all mean.

Class methods. Some methods are meant to be called as class methods, with the class name (same as the module name) as a special argument. Class methods look like this:

ClassName->methodname(parameter1, parameter2, ...)
ClassName->methodname() # if no parameters
ClassName->methodname # same as above

Most class methods are for making new objects. The methods that make objects are called constructors, and the process of making them is called "constructing" or "instantiating" an object. Constructor methods typically have the name new, or something including "new" (for instance, new_from_file), but they can conceivably be named anything. DBI's constructor is named connect, for example.

The object returned by a constructor is typically captured in a scalar variable:

$object = ClassName->new(param1, param2...);

Object methods. Once you have an object (more later on exactly what that is), you can use the other kind of method call syntax, the syntax for object method calls. You call object methods just like class methods, except that instead of the ClassName as the special argument, you use an expression that yields an object. Usually this is just a scalar variable that captures the output of the constructor. Object method calls look like this:

$object->methodname(parameter1, parameter2, ...);
$object->methodname() # if no parameters
$object->methodname # same as above

They are sometimes written like this as well:

methodname $object (parameter1, parameter2, ...)
methodname $object # if no parameters

Incidentally, I suggest always using the parentheses and ->, and avoiding the forms that begin methodname $object or methodname ModuleName. (The character-pair -> is supposed to look like an arrow, not "negative greater-than"!) When everything's going right, they all mean the same thing as the -> variants, but the -> is more visually distinct from function calls, as well as being immune to some rare but puzzling ambiguities that can arise when you're trying to call methods that have the same name as subroutines you've defined.
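To see both kinds of call side by side, here is a short sketch using IO::File, a class that comes with Perl. Choosing IO::File is my assumption for the sake of a self-contained example; any class with a constructor would do.

```perl
use strict;
use warnings;
use IO::File;

# Class method call: the class name goes to the left of the arrow.
# new_tmpfile is a constructor; it returns an object, or undef on failure.
my $fh = IO::File->new_tmpfile()
    or die "Couldn't create a temporary file: $!";

# Object method calls: the object value goes to the left of the arrow.
$fh->print("first line\n", "second line\n");
$fh->seek(0, 0);                # rewind to the start of the file
my $line = $fh->getline();      # read one line back
$fh->close();

print "Read back: $line";
```

Note that the constructor here is named new_tmpfile rather than plain new, which is the naming pattern mentioned above.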

Syntax aside, all this talk of constructing objects and object methods raises the question - what is an object? There are several angles to this question that the rest of this article will answer in turn: What can you do with objects? What's in an object? What's an object value? And why do some modules use objects at all?

What Can You Do With Objects?

You've seen that you can make objects, and call object methods with them. But what are object methods for? The answer depends on the class. For Net::FTP, an object represents a session between your computer and an FTP server. So the methods you call on a Net::FTP object are for doing whatever you'd need to do across an FTP connection. You make the session with the constructor, and then log in and issue further commands by calling object methods on it.
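A minimal sketch of such a session follows. The helper name fetch_one_file and every value passed to it are hypothetical; the Net::FTP methods used (new, login, cwd, get, quit, message) are the ones its documentation describes. The call at the bottom is commented out because it needs a real server and account.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: host, login details, directory, and file name are
# all supplied by the caller.
sub fetch_one_file {
    my ($host, $user, $password, $dir, $file) = @_;

    # Class method call: the constructor returns an object, or undef if
    # the connection could not be made.
    my $session = Net::FTP->new($host)
        or die "Couldn't connect to $host: $@";

    # Object method calls: each one acts on the session that $session
    # stands for, and returns false on failure.
    $session->login($user, $password)
        or die "Couldn't log in: ", $session->message;
    $session->cwd($dir)
        or die "Couldn't change directory: ", $session->message;
    $session->get($file)
        or die "Couldn't fetch $file: ", $session->message;
    $session->quit();

    return 1;
}

# Example call, with placeholder values:
# fetch_one_file('ftp.example.com', 'anonymous', 'me@example.com', '/pub', 'README');
```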

In short, object methods are for doing things related to (or with) whatever the object represents. For FTP sessions, it's about sending commands to the server at the other end of the connection, and that's about it - there, methods are for doing something to the world outside the object, and the object is just something that specifies what bit of the world (well, what FTP session) to act upon.

With most other classes, however, the object itself stores some kind of information, and it typically makes no sense to do things with such an object without considering the data inside.

What's In An Object?

An object is (with rare exceptions) a data structure containing a bunch of attributes, each of which has a value. Some of the object's attributes are private, meaning you'll never see them documented because they're not for you to read or write; but most of the object's documented attributes are at least readable, and usually writeable, by you. Net::FTP objects are a bit thin on attributes, so we'll use objects from the class Business::US_Amort for this example. Business::US_Amort is a very simple class (available from CPAN) that I wrote for making loan calculations (specifically amortization, using U.S.-style algorithms).

An object of the class Business::US_Amort represents a loan with particular attributes. The most basic attributes of a "loan object" are its interest rate, its principal (how much money it's for), and its term (how long it'll take to repay). You need to set these attributes before anything else can be done with the object, through accessors. An accessor is simply a method that accesses an attribute, either retrieving (reading, getting, fetching) or storing (writing, setting, putting). Moreover, accessors are typically the only way you can change an object's attributes.

Usually, for simplicity's sake, an accessor is named after the attribute it reads or writes. With Business::US_Amort objects, the accessors you need to use first are principal, interest_rate, and term. Then, with at least those attributes set, you can call the run() method to figure out several things about the loan. Then you can call various accessors, like total_paid_toward_interest(), to read the results.

A convention common to accessors is that calling the accessor with no arguments (as with $loan->total_paid_interest) usually means to read the value of that attribute, but providing a value (as with $loan->term(20)) means you want the attribute to be set to that value. This stands to reason: why would you be providing a value, if not to store it in the attribute?
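The same read-or-write convention shows up in many classes. As a sketch that runs without installing anything, this uses HTTP::Tiny, which ships with recent versions of Perl; picking that class, and its timeout attribute, is my assumption for illustration and has nothing to do with loans.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new();

# With an argument, the accessor stores the value in the attribute.
$http->timeout(20);

# With no argument, the same accessor reads the attribute back.
my $seconds = $http->timeout;

print "Timeout is $seconds seconds\n";
```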

Although a loan's term, principal, and interest rate are all single numeric values, an object's attribute values can be any kind of scalar, array, or hash. Moreover, an attribute's value can itself be an object. For example, consider the MIDI files I wrote about in TPJ #13. A MIDI file usually consists of several musical tracks, and is complex enough to merit being an object of its own with attributes like the tempo, the file-format variant it's in, and the list of instrument tracks in the file. But tracks themselves are complex enough to be objects too, with attributes like their track type, and either a list of MIDI commands or raw data. So I ended up writing the MIDI modules so that the tracks attribute of a MIDI::Opus object is an array of objects from the class MIDI::Track. This may seem like a runaround - you ask what's in one object, and get another object, or several! But in this case, it reflects exactly what the module is for - MIDI files contain MIDI tracks, which then contain data.

What Is An Object Value?

When you call a constructor like Net::FTP->new(hostname), you store the result in a variable, like $session or $loan. We've been pretending that those variables are the objects you're dealing with. This idea is innocuous up to a point, but it's really a misconception. (It depends, very presidentially, on what our definitions of "are" are.) The reality is that the variables $session and $loan are not objects. It's a little more indirect - they hold values that symbolize objects. The kind of value that $session or $loan holds is what I'm calling an object value.

To understand what kind of value this is, first think about the other kinds of scalar values you know about: The first two scalar values you probably ever ran into in Perl are numbers and strings, which you learned (or just assumed) will usually turn into each other on demand; that is, the three-character string "2.5" can become the quantity two and a half, and vice versa. Then, especially if you started using perl -w early on, you learned about the undefined value, which can turn into 0 if you treat it as a number, or an empty string if you treat it as a string. (You may also have learned about references, in which case you're ready to hear that object values are just a kind of reference, except that they reflect the class that created them in addition to being an array reference, hash reference, or something else. If this makes sense to you, and you want to know more about how objects are implemented in Perl, have a look at the perltoot documentation or the sidebar later in this article.)

And now you're learning about object values. An object value is a value that points to a data structure somewhere in memory, which is where all the attributes for this object are stored. That data structure as a whole belongs to a class (probably the one you named in the constructor method, like ClassName->new), so that the object value can be used as part of object method calls. If you want to actually see what an object value is, you might try just saying print $object. That'll get you something like this:

Net::FTP=GLOB(0x20154240)

or

Business::US_Amort=HASH(0x15424020)

That's not very helpful if you wanted to really get at the object's insides, but that's because the object value is only a symbol for the object. This may all sound very abstruse and metaphysical, so a real-world allegory might be helpful:

You get an advertisement in the mail saying that you have been (im)personally selected to have the rare privilege of applying for a credit card. For whatever reason, this offer sounds good to you, so you fill out the form and mail it back to the credit card company. They gleefully approve the application, create your account, and send you a card with a number on it.

Now, you can do things with the number on that card - clerks at stores can ring up things you want to buy, and charge your account by keying in the number on the card. You can pay for things you order online by punching in the card number as part of your online order. You can pay off part of the account by sending the credit card people some of your money (well, a check) with some note (usually the pre-printed slip) that has the card number for the account you want to pay toward. And you should be able to call the credit card company's computer and ask it things about the card, like its balance, its credit limit, its APR, and maybe an itemization of recent purchases and payments.

Now, what you're really doing is manipulating a credit card account, a completely abstract entity with some data attached to it (balance, APR, etc). But for ease of access, you have a credit card number that is a symbol for that account. Now, that symbol is just a bunch of digits, and the number is effectively meaningless and useless in and of itself - but in the appropriate context, it's understood to mean the credit card account you're accessing.

This is exactly the relationship between objects and object values, and from this analogy, several facts about object values are a bit more explicable:

 An object value does nothing in and of itself, but it's useful when you use it in the context of an $object->method call, the same way that a card number is useful in the context of some operation dealing with a card account.

Moreover, several copies of the same object value all refer to the same object, the same way that making several copies of your card number won't change the fact that they all still refer to the same single account (this is true whether you're "copying" the number by just writing it down on different slips of paper, or whether you go to the trouble of forging exact replicas of your own plastic credit card). That's why, if you copy the object value in $x into $y, and the one in $y into $z, calling a method on $z does exactly what calling it on $x would do.

That is, $z and $y and $x are three different slots for values, but what's in those slots are all object values pointing to the same object. You don't have three different FTP connections, just three variables with values pointing to the same single FTP connection.

 You can't tell much of anything about the object just by looking at the object value, any more than you can see your credit account balance by holding the plastic card up to the light, or by adding up the digits in your credit card number.

 You can't just make up your own object values and have them work - they can come only from constructor methods of the appropriate class. Similarly, you get a credit card number only by having a bank approve your application for a credit card account - at which point they let you know what the number of your new card is.
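The point about copies can be sketched without opening any FTP connections. Here HTTP::Tiny (bundled with recent Perls) stands in for Net::FTP, purely as my assumption, so that the example needs no server; what matters is that one constructor call makes one object, however many variables hold its object value.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $x = HTTP::Tiny->new();      # exactly one object is constructed
my $y = $x;                     # copies the object value, not the object
my $z = $y;

$z->timeout(5);                 # change the object by way of $z...

# ...and the change is visible through $x, because there is only one object.
my $seen_through_x = $x->timeout;
print "Timeout seen through \$x: $seen_through_x\n";

# Comparing two object values numerically compares where they point.
print "All three variables point at the same object\n"
    if $x == $y and $y == $z;
```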

Now, there's even more to the fact that you can't just make up your own object value: even though you can print an object value and get a string like Net::FTP=GLOB(0x20154240), that string is just a representation of an object value.

Internally, an object value has a basically different type from a string, or a number, or the undefined value - if $x holds a real string, then that value's slot in memory says "this is a value of type string, and its characters are...", whereas if it's an object value, the slot in memory says, "this is a value of type reference, and the location in memory that it points to is...". By looking at what's at that location, Perl can tell the class of what's there.
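If you'd like to poke at this yourself, the sketch below prints two object values and then asks Perl what it knows about each. The two classes (IO::File and HTTP::Tiny, both bundled with recent Perls) are my own picks for illustration; blessed and reftype come from the standard Scalar::Util module.

```perl
use strict;
use warnings;
use IO::File;
use HTTP::Tiny;
use Scalar::Util qw(blessed reftype);

my $fh   = IO::File->new_tmpfile()
    or die "Couldn't create a temporary file: $!";
my $http = HTTP::Tiny->new();

for my $object ($fh, $http) {
    print "String representation: $object\n";
    print "  class:               ", blessed($object), "\n";
    print "  underlying structure: ", reftype($object), "\n";
}
```

The hexadecimal number in each string representation will differ from run to run, since it reflects where the object happens to live in memory.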

Perl programmers typically don't have to think about all these details. Many other languages force you to be more conscious of the differences between all of these (and also between types of numbers, which are stored differently depending on their size and whether they have fractional parts). But Perl does its best to hide the different types of scalars from you - it turns numbers into strings and back as needed, and takes the string or number representation of undef or of object values as needed. However, you can't go from a string representation of an object back to the object value. And that's why you can't, say, join several object values into one string and later pull usable object values back out of it.

That fails because a variable such as $aol ends up holding merely the string representation of the object value from $x, not the object value itself - when join tried to join the characters of the "strings" $x, $y, and $z, Perl saw that they weren't strings at all, so it gave join their string representations.
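Here is a small sketch of that failure, assuming HTTP::Tiny objects as stand-ins so that no network connection is needed. The variable names are mine. The method call at the end is wrapped in eval so that the program can report the error instead of dying.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $x = HTTP::Tiny->new();
my $y = HTTP::Tiny->new();

# join asks for strings, so it gets the string representations.
my $all = join(' ', $x, $y);

# ...later: these are plain strings, not object values.
my ($first, $second) = split(' ', $all);

print "After the round trip: $first\n";
print "Is it still a reference? ", (ref($first) ? "yes" : "no"), "\n";

# Trying to use the string as an object fails at run time.
my $ok    = eval { $first->timeout(5); 1 };
my $error = $@;
print "Method call failed: $error" unless $ok;
```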

Unfortunately, this distinction between object values and their string representations doesn't really fit into the analogy of credit card numbers, because credit card numbers really are numbers - even though they don't express any meaningful quantity, if you stored one in a database as a quantity (as opposed to an ASCII string), that wouldn't stop them from being valid as credit card numbers.

This may seem rather academic, but there are two common mistakes programmers new to objects often make, both having to do with the distinction between object values and their string representations.

The first common error involves forgetting (or never having known in the first place) that when you use a value as a hash key, Perl uses the string representation of that value. When you want to use the numeric value two and a half as a key, Perl turns it into the three-character string "2.5". But if you then want to use that string as a number, Perl will treat it as meaning two and a half, so you're usually none the wiser that Perl converted the number to a string and back. But recall that Perl can't turn strings back into objects - so if you tried to use a Net::FTP object value as a hash key, Perl actually used its string representation, like "Net::FTP=GLOB(0x20154240)", but that string is unusable as an object value. (Incidentally, there's a module called Tie::RefHash which implements hashes that do let you use real object-values as keys.)
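The difference between an ordinary hash and a Tie::RefHash hash can be sketched as follows. The object used as a key is an HTTP::Tiny object, which is just my choice of a class that needs no setup; the hash names and the stored note are likewise made up.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Tie::RefHash;

my $object = HTTP::Tiny->new();

# Ordinary hash: the key is quietly turned into a string.
my %plain;
$plain{$object} = "a note about this object";
my ($plain_key) = keys %plain;
print "Plain hash key:   ",
    (ref($plain_key) ? "an object value" : "just a string"), "\n";

# Tie::RefHash: the key remains a real object value.
tie my %by_object, 'Tie::RefHash';
$by_object{$object} = "a note about this object";
my ($real_key) = keys %by_object;
print "Tie::RefHash key: ",
    (ref($real_key) ? "an object value" : "just a string"), "\n";

# So method calls on the key still work.
$real_key->timeout(30);
```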

The second common error with object values is in trying to save an object value to disk (whether printing it to a file, or storing it in a conventional database file). All you'll get is the string, which will be useless.

When you want to save an object and restore it later, you may find that the object's class already provides a method specifically for this. For example, MIDI::Opus provides methods for writing an object to disk as a standard MIDI file. The file can later be read back into memory by a MIDI::Opus constructor method, which returns a new MIDI::Opus object representing whatever file you tell it to read into memory. Similar methods are available with, for example, classes that manipulate graphic images and can save them to files, which can be read back later.

But some classes, like Business::US_Amort, don't give you a way to store an object in a file. In that case, you can try using any of the Data::Dumper, Storable, or FreezeThaw modules. Using these will be unproblematic for objects of most classes, but it may run into limitations with others. For example, a Business::US_Amort object can be turned into a string with Data::Dumper, and that string written to a file. When it's restored later, its attributes will be accessible as normal. But in the unlikely case that the loan object was saved in mid-calculation, the calculation may not be resumable. This is because of the way that that particular class does its calculations, but similar limitations may occur with objects from other classes.
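As a rough sketch of the Storable route, the following saves an object to a temporary file and reads it back. The object is an HTTP::Tiny object, chosen only as an assumption because it is a simple class bundled with recent Perls; whether this approach works for any other class depends on what that class keeps inside its objects.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

my $object = HTTP::Tiny->new();
$object->timeout(20);

# A scratch file that is removed when the program exits.
my ($out, $filename) = tempfile(UNLINK => 1);
close $out;

# Write the object's data structure (and its class name) to disk...
store($object, $filename);

# ...and later build a new object with the same class and attributes.
my $restored = retrieve($filename);

print "Restored class:   ", ref($restored), "\n";
print "Restored timeout: ", $restored->timeout, "\n";
```

The class's own module still has to be loaded in the program that does the restoring, since only the data is written to the file, not the methods.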

But often, even wanting to save an object is basically wrong. What would saving an FTP session even mean? Saving the hostname, username, and password? The current directory on both machines? The local TCP/IP port number? In the case of "saving" a Net::FTP object, you're better off just saving whatever details you actually need for your own purposes, so that you can make a new object later and just set those values for it.

So Why Do Some Modules Use Objects?

All these details of using objects are definitely enough to make you wonder - is it worth the bother? If you're a module author, writing your module with an object-oriented interface restricts the audience of potential users to those who understand the basic concepts of objects and object values, as well as Perl's syntax for calling methods. Why complicate things by having an object-oriented interface?

Basically, a module should have an object-oriented interface because the module's insides are written in an object-oriented style. This article is about the basics of object-oriented interfaces, and it'd be going far afield to explain object-oriented design. But the short story is that object-oriented design is just one way of attacking messy problems. It's a way that many programmers find very helpful (and which others happen to find to be far more of a hassle than it's worth, incidentally), and it just happens to show up for you, the module user, as the style of interface.