Design

Component Programming in D

Composability or the ability to write highly reusable software using a data source → algorithm → data sink model is a feature that depends heavily on language support. New facilities in D now make this possible.

The Specifics of the Model

Going back to what we want from a component, let's get into specifics. What kinds of sources are there?

Streams: Streams provide data sequentially, generally with no history of what came before, and no knowledge of what comes after. Streams live in the now. Files typically present themselves as streams, as is the input from your keyboard, data coming from a network socket, data flowing from a sensor, etc.

Containers: Containers are fixed collections of data that have at last one method of iterating through all elements. Typical containers would include arrays, hash tables, trees, and linked lists.

Generators: Generators are algorithms that produce a stream of data. Examples of generators would be random number generators, iotas, prime number generators, producing successive digits ofpi, and so on. Generators have a known starting point (a seed), and sometimes can produce an infinite succession of values.

List comprehensions: A source passed through an algorithm (such as a filter for list comprehensions), is also a source.

What kinds of algorithms are there?

Filter: A filter takes its input, and produces some subset of that for its output. Examples include selecting only odd values or prime values, searching for regex matches, validating the data, or passing only employees with a security clearance.

Reduce: Reduce consumes its input and produces as output a single value. Examples include summing all the data, finding the maximum, and so forth.

These are just a few examples of many. There are myriad algorithms, each with its own personality. Sorting, searching, pattern matching, encryption, compression, and more come to mind, not to mention a great number of numeric algorithms. Notably, algorithms have specific demands on the layout of the data they operate on. Filter, map, and reduce all operate with simple streams, but many algorithms need more sophisticated data access patterns.

And sinks:

Streams: writing to files, the console, a network socket, etc.

Containers: the container gets built from the data.

Forming a table with three columns, we get:

Source

Algorithm

Sink

file

sort

file

tree

filter

tree

array

map

array

socket

reduce

socket

list

max

list

iota

search

random numbers

odd

word count

A perfect component design should be able to take any item from the Source column, apply any Algorithm, and output to any Sink, even if those Sources, Algorithms, and Sinks were developed independently by groups that have never heard of each other. Furthermore, they need to be able to be snapped together (that is, be composable) like:

file → sort → odd → list

for any combination that makes sense.

Some data format transformation may be in order. For example, quicksort needs random access, something a streaming file doesn't offer reasonably well. So we'd need to insert an adapter component that transforms an input stream into an array, leading to:

file → make_array → sort → odd → list

And, of course, we won't accept performance compromises. C++ STL showed that component design can be just as efficient as hand-built non-component code, and often turns out to be even faster. Any component design based on indirection (that is, virtual dispatch), boxing, or dynamic typing is going to have a harder time delivering good performance. (After all, giving users motivation to replace the abstraction with hand-coded equivalents when they 'really mean it' is a sign that something went wrong in the abstracting process. Components must work for industrial code.)

Summing up what is needed for a modern, excellent component design:

Ability to "snap together" independently developed sources, algorithms, and sinks; that is, composability.

Strong support for encapsulation.

Ability to generate code as efficient as hand-coded versions.

A natural syntax that looks like source → algorithm → sink

Ability to work with types that are not known in advance.

Being a more recent programming language, D enjoys the benefit of standing on the shoulders of giants. Fundamentally improving component programming has enormous potential to change and significantly improve the way we write software. Here is how the D standard library does it.

Sources

What are the minimum requirements for a source?

Determine if there is input data available

Read the current input datum

Advance to the next input datum

To satisfy these requirements, an InputRange is defined as having the following:

A property: bool empty; which returns true if there is no more input, and false if there is.

A property: E front; which returns the current input datum of type E. Input ranges are parameterized by their element type (denoted here as E).

A method: void popFront(); which advances to the next input datum.

Note that this implies a one-element buffer because client code may call front multiple times without moving to the next element. There exists an even simpler notion, the unbuffered range; it only enables a modicum of algorithms and currently it is not formally abstracted in D's standard library.

A very interesting thing about an InputRange is it isn't itself a type. A type is an InputRange if it has the primitives empty, front, and popFront. This makes an InputRange a concept, rather than a type.

Since we all know C file I/O so well, here's an InputRange that reads characters from stdin:

It's pretty clear that anything with those three primitives can be expressed in a for loop the same way. For example, the D foreach statement can automatically do the equivalent with:

foreach (c; StdinByChar())
fputc(c, stdout);

Look, Ma, no types! Already, we're on to something powerfully useful. The InputRange I just described is streaming only. One pass streams are hardly sufficient for complex processing. Let's extend it.

ForwardRange

ForwardRange. It adds a property

@property R save;

Where R is the ForwardRange type. save returns a copy of the ForwardRange in its current state. It does not make a copy of the data being iterated over, only the position. Hence, the original and the copy can now traverse the ForwardRange independently of each other. A singly linked list is a typical forward range.

A ForwardRange is useful when look-ahead is needed. Algorithms that require a ForwardRange are brute force subsequence searching, and generally any pattern matching that needs to do multiple passes over its inputs, are good examples. Merge sort and bring to front also come to mind.

Note that a file stream usually cannot efficiently be a ForwardRange; the way the buffering is done precludes it.

BidirectionalRange

The next extension is for a bidirectional range, and two additional primitives are required:

@property E back;
void popBack();

These work analogously to front and popFront(), except they work their way backwards from the end of the range rather than the beginning. An example of a bidirectional range would be a doubly linked list. But doubly linked lists are not used all that often. What makes bidirectional ranges worth studying is that UTF8- and UTF16-encoded strings are, in fact, bidirectional ranges. Bidirectionality enables a few more efficient algorithms, such as finding the last occurrence of a sequence in another.

RandomAccessRange

RandomAccessRangeoverloads the [] operator so it can be indexed to refer to the nth element:

E opIndex(size_t n);

In addition to indexing, there are two options for the RandomAccessRange's functionality:

A BidirectionalRange that offers the length property or is infinite: @property size_t length;

A ForwardRange that is infinite. (An infinite range always returns false for the empty property.)
More details about InputRanges can be found at dlang.org.

Sinks

Sinks in D are called OutputRanges. An OutputRange is nothing more than a type that supports a method called put:

void put(E e);

The idea is that element e of type E gets "put" into the range. (Some other variations on an OutputRange are detailed at dlang.org.)

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!