I am currently abstracting away the intricate delegate mapping and other operations; these could be handled analogously.
With the enumerator in place, we can obviously snapshot it to get the current state of the responder chain, and also log that.

Now we can express both current features and possible variations of the Responder Chain architecture compactly as common collection operations. The current dispatch mechanism simply sends the message to the first object that is capable of responding. This corresponds to using the first object of a -select, which is expressed in the -selectFirst convenience method.

Current dispatch
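To make the -selectFirst formulation concrete, here is a minimal sketch in plain C, which also compiles as Objective-C. It rests on assumptions of mine. A hypothetical Responder struct lists the selectors an object implements, standing in for -respondsToSelector:. The example chain of view, window and application is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object in the chain: a name plus the
   NULL-terminated list of selectors it implements. */
typedef struct {
    const char *name;
    const char *const *selectors;
} Responder;

/* Rough analogue of -respondsToSelector: */
static int responds_to(const Responder *r, const char *selector) {
    for (const char *const *s = r->selectors; *s != NULL; s++) {
        if (strcmp(*s, selector) == 0) {
            return 1;
        }
    }
    return 0;
}

/* -selectFirst: return the first element that passes the test, or NULL. */
static const Responder *select_first(const Responder *chain, size_t count,
                                     const char *selector) {
    for (size_t i = 0; i < count; i++) {
        if (responds_to(&chain[i], selector)) {
            return &chain[i];
        }
    }
    return NULL;
}

/* A made-up chain: view -> window -> application. */
static const char *const view_selectors[] = { "mouseDown:", NULL };
static const char *const window_selectors[] = { "copy:", "performClose:", NULL };
static const char *const app_selectors[] = { "copy:", "terminate:", NULL };
static const Responder example_chain[] = {
    { "view", view_selectors },
    { "window", window_selectors },
    { "application", app_selectors },
};
```

Both the window and the application respond to copy:, but select_first(example_chain, 3, "copy:") stops at the window. That is the "first object capable of responding" rule.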

If I understood him correctly, Tim wants the objects in the responder chain to return the object that they would like to have respond to the message. This turns the -select into a -collect (without a -collectFirst), but is otherwise very similar.

Tim's dispatch
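Tim's variant can be sketched the same way in plain C, again under assumptions of mine. A hypothetical Node type has a target_for function pointer. It answers with the object the node would like to have handle a selector, or NULL to decline. The collect step maps the whole chain to those answers. Since there is no -collectFirst, picking the first non-NULL answer is a separate step.

```c
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical chain element: instead of a yes/no answer, each node
   proposes the object it would like to have handle the message. */
struct Node {
    const char *name;
    const Node *(*target_for)(const Node *self, const char *selector);
    const Node *delegate;
};

/* Example policies (the selector is ignored here for brevity). */
static const Node *decline(const Node *self, const char *selector) {
    (void)self;
    (void)selector;
    return NULL;
}

static const Node *handle_itself(const Node *self, const char *selector) {
    (void)selector;
    return self;
}

static const Node *forward_to_delegate(const Node *self, const char *selector) {
    (void)selector;
    return self->delegate;
}

/* -collect: map every node in the chain to its proposed target. */
static void collect_targets(const Node *const *chain, size_t count,
                            const char *selector, const Node **out) {
    for (size_t i = 0; i < count; i++) {
        out[i] = chain[i]->target_for(chain[i], selector);
    }
}

/* No -collectFirst: pick the first non-NULL proposal afterwards. */
static const Node *first_proposal(const Node *const *targets, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (targets[i] != NULL) {
            return targets[i];
        }
    }
    return NULL;
}
```

With a chain of a declining view, a window that forwards to a controller, and an application that handles everything itself, the collected proposals are NULL, the controller and the application. The controller is chosen even though it is not itself in the chain.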

I hope this does Tim's ideas justice, but I think the succinct formulation should make it easy to tell whether it does or not.

In terms of combining validation with target/action, I'd be somewhat wary of accidentally triggering actions when validation was meant, though I do appreciate the advantages of combining the two operations.
I am not sure what value the block is adding over just having an additional BOOL parameter in the target/action method.

Wednesday, December 9, 2009

Many times now, I've been asked about more Objective-XML examples. Here's a very simple one. It is adapted from Marcus Zarra's very helpful libxml and xmlreader tutorial. That tutorial shows how to parse a very simple XML format using libxml2.

Code coverage tools

if ( rareCondition ) {
    /* is this code tested? */
}

If you actually followed test-first, then the code in the rare-condition if is definitely tested, because if there isn't a failing test case for the rare condition, then there is no reason for the code or the test to exist.
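A small illustration of that argument in C, using a hypothetical average function whose rare condition is empty input. In test-first order, test_empty_input is written first. It fails, because 0.0 / 0.0 yields NaN, until the n == 0 branch is added. So that branch never exists without a test that exercises it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function under test; the rare condition is n == 0. */
static double average(const double *values, size_t n) {
    if (n == 0) {       /* rare condition: added only to make test_empty_input pass */
        return 0.0;
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum / (double)n;
}

/* Written first; without the n == 0 branch it fails (NaN != 0.0). */
static void test_empty_input(void) {
    assert(average(NULL, 0) == 0.0);
}

static void test_typical_input(void) {
    const double v[] = { 1.0, 2.0, 6.0 };
    assert(average(v, 3) == 3.0);
}
```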

Another objection could be that people won't follow the techniques. I haven't found this to be a big or recurring practical problem so far, and agile techniques tend to be empirically driven. If you suspect that this is a problem in your environment, running a code-coverage tool to put some data behind your suspicion may be a good idea.

Test before or test after?

Note that the solution to the code-coverage question above does not work if tests are written after the fact: in this case, the rare case is likely not to be covered, because it was written without being forced by a failing unit test.

Many, if not most, of the benefits of TDD are related to the way it shapes the design of the code, and obviously none of these benefits accrue if you've already designed or even written the code. In fact, if you ask the XP folks about it, they will tell you that TDD is not for ensuring quality; it is exclusively for helping with coding and design.

For example, figuring out how to test something forces you to get clear about what the code is supposed to do, in a way that just writing the code usually does not.

Knowing that your tests cover your code (see above) allows you to do extremely radical refactorings at any point in the development process. The ability to refactor at any time in turn allows you to keep your initial designs simple without coding for anticipated changes. Not coding for anticipated changes that may not occur, or may occur differently than you expect, in turn allows you to move more quickly, which more than pays for the expense of the tests.

Furthermore, the tests force you to think about how you will call the functionality you are about to implement, which shapes the architecture towards simplicity, high cohesion and low coupling.

Generating tests

Auto-generating tests for existing methods is a means of subverting the test-driven approach: there will be the appearance of testing, but with
virtually none of the benefits. It is probably worse than not having
tests, because in the latter case you at least know that you're not
covered.

Is it a good way of starting with unit test coverage for legacy code? No. See the C2 wiki entry for a good explanation of how to approach this case. In short, start refactoring and adding unit tests when you actually need to touch the code,
be it for new features or to fix defects that are scheduled to be fixed.

Tuesday, November 10, 2009

… from my (Smalltalk) experience, the block passed to #collect: is often not a single message send, but rather a small adhoc expression, for which it does not really make sense to define a named method. Or you might need both the element and its key/index… how does HOM deal with that?

These are certainly valid observations, and were some of the reasons
that I didn't really think that much of HOM for the first couple of
years after coming up with it back in 1997 or so. Since then, I've
become less and less convinced that the problems raised are a big concern, for a number of reasons.

Inline vs. Named

One reason is that I actually looked at usage of blocks in the Squeak
image, and found that the majority of blocks with at least one argument
(so not ifTrue:, whileTrue: and other control structures) actually did
contain just a single message send, and so could be immediately expressed
as HOMs. Second, I noticed that there were a lot of fairly large (3+ LOC)
blocks that should have been separate methods but weren't.
That's when I discovered that the presence of blocks actually
encourages bad code, and the 'limitation' of HOMs actually was
encouraging better(-factored) code.

Of course, I wasn't particularly convinced by that line of reasoning, because it smelled too much like "that's not a bug, that's a feature". That is, until I saw others with less vested interest reporting the same observation:

But are these really limitations? After using higher order messages for a while I've come to think that they are not. The first limitation encourages you move logic that belongs to an object into that object's implementation instead of in the implementation of methods of other objects. The second limitation encourages you to represent application concepts as objects rather than procedural code. Both limitations have the surprising effect of guiding the code away from a procedural style towards better object-oriented design.

My experience has been that Nat is right: having a mechanism that pushes you towards factoring and naming is better for your code than one that pushes you towards inlining and anonymizing.

Objective-C I

In fact, the Cocoa example that Apple gives for blocks illustrates this idea
very well. They implement a "Finder like" sorting mechanism using blocks:

The block syntax is so verbose that there is no hope of actually defining the block inline, the supposed raison d'etre for blocks. So we actually need to take the
block out-of-line and name it. So it looks suspiciously like an
equivalent implementation using functions:

Of course, something as useful as a Finder-like comparison sort
really deserves to be exposed and made available for reuse, rather
than hidden inside one specific sort. Objective-C categories are
just the mechanism for this sort of thing:

Note that some of these criticisms are specific to Apple's implementation of blocks; they do not apply in the same way to Smalltalk blocks, which are a lot less noisy.

Objective-C II

Objective-C has at least one other pertinent difference from Smalltalk, which is that it already contains control structures in the basic language, without blocks. (Of course, those control structures can also take blocks as arguments, but these are a different kind of block, delimited by curly braces, that cannot be passed around as first-class objects.)

This means that in Objective-C, we already have the ability to do all the iterating we need; mechanisms such as blocks and HOM are mostly conveniences, not required building blocks. If we need indices, we use a for loop. If we require keys, we use a key enumerator and iterate over that.

In fact, I remember when my then colleagues started working with enum-filters, a HOM precursor that's strikingly similar to the Google Toolbox's GTMSEnumerator+Filter.m. They really took to the elegance, but then also wanted to use it for various special cases. They laughed when they realized that those special cases were actually already handled better by existing C control structures such as for loops.

FP, HANDs and Aggregate Operations

While my dislike of blocks is easy to discount by the usual
inventor's pride (your child must be ugly for mine to be pretty),
that interpretation actually reverses the causation: I came
up with HOM because I was never very fond of blocks. In fact,
when I first encountered Smalltalk during my university
years I was enthralled until I saw the iteration methods.

That's not to say that do:, collect: and friends were not light-years ahead of Algol-type control structures; they most definitely were, and still are. Having some sort of higher-order mechanism is vastly superior to not having one. I do wish that "higher-order mechanism" and "blocks" weren't used as synonyms quite as much, because they are not, in fact, synonymous.

When I first encountered Smalltalk blocks, I had just previously been
exposed to Backus's FP, and that was just so much prettier! In
FP functions are composed using functionals without ever talking
about actual data, and certainly without talking about individual
elements. I have always been on the lookout for higher levels
of expression, and this was such a higher level. Now taking
things down to "here's another element, what do you want to
do with that" was definitely a step back, and quite frankly
a bit of a let-down.

The fundamental difference I see is that in Smalltalk there
is still an iteration, even if it is encapsulated: we iterate
over some collection and then execute some code for each element.
In FP, and in HOM, there is instead an aggregate operation: we
take an existing operation and lift it up as applying to an entire collection.
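The distinction can be sketched, only approximately, even in plain C. A hypothetical lift_unary takes an existing operation and applies it to a whole array. The caller states the operation and the collection and never mentions an element. The loop still exists, but it sits in one reusable place instead of being written out at every call site.

```c
#include <stddef.h>

typedef double (*UnaryOp)(double);

/* Lift an element-level operation to an operation on whole arrays. */
static void lift_unary(UnaryOp op, const double *in, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = op(in[i]);
    }
}

/* An ordinary, named element-level operation. */
static double twice(double x) {
    return 2.0 * x;
}
```

A call such as lift_unary(twice, xs, ys, 3) reads as "twice, applied to xs", with no stand-in variable for the current element.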

This difference might seem contrived, but the research done with
the HANDS system demonstrates that it is very real:

After creating HANDS, I conducted another user study to examine the effectiveness of three features of HANDS: queries, aggregate operations, and data visibility. HANDS was compared with a limited version that lacked these features. In the limited version, programmers were able to achieve the desired results but had to use more traditional programming techniques. Children using the full-featured HANDS system performed significantly better than their peers who used the limited version.

I also find this difference to be very real.

The difference between iterating with blocks and lifting operations
to be aggregate operations also shows up in the fact that the lifting can be done on any
combination of the involved parameters, whereas you tend to only
iterate over one collection at a time, because the collection and
the iteration are in focus.
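Extending the same idea to two arguments illustrates the point about combinations. In this C sketch, a hypothetical Arg type marks each argument as either a single value or an array. lift_binary then works for any mix of the two. An element-at-a-time loop, by contrast, is naturally written around one chosen collection.

```c
#include <stddef.h>

typedef double (*BinaryOp)(double, double);

/* Hypothetical argument wrapper: either one value reused for every
   position, or an array indexed in step with the result. */
typedef struct {
    const double *values;
    int is_array;
} Arg;

static double arg_at(Arg a, size_t i) {
    return a.is_array ? a.values[i] : a.values[0];
}

/* Lift a binary operation over any combination of scalar and array
   arguments; n is the length of the result (and of any array argument). */
static void lift_binary(BinaryOp op, Arg left, Arg right, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = op(arg_at(left, i), arg_at(right, i));
    }
}

static double plus(double a, double b) {
    return a + b;
}
```

The same lift_binary(plus, ...) call covers array + scalar, scalar + array and array + array without a separate loop for each case.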

Symmetry

Finally, the comparison to functional languages shows a couple of interesting asymmetries. In a functional language, higher-order functions can be applied both to named functions and to anonymous functions. In essence, the higher-order mechanism just takes functions and doesn't care whether they are named or not. Also, the higher-order mechanism uses the same mechanism (functions) as the base system.

With block-based higher-order mechanisms, on the other hand, we must make the argument an anonymous function (that's what a block is), and we cannot use a named function. This brings us back to the conundrum mentioned at the start: the mechanism encourages bad code. Not only that, it also turns out that the base mechanism (messages and methods) is different from the higher-order mechanism, which requires anonymous functions rather than methods.

HOM currently solves only the latter part of this asymmetry, making the higher-order mechanism the same as the base mechanism, that mechanism being messaging in both cases. However, it currently cannot solve the other asymmetry: where blocks support unnamed, inline code but not named code, HOM supports named but not unnamed code. While I think that this is the better choice in the majority of cases, it would be nice to actually support both.

One solution to this problem might be to simply support both blocks
and Higher Order Messaging, but it seems to me that the more
elegant solution would be to support inline definition of more-or-less
anonymous methods that could then be integrated into the Higher Order
Messaging framework.

Saturday, November 7, 2009

Having taken up various forms of flying last year, I have developed a strong interest in the weather, particularly wind information. There are various websites with relevant information, for example Jeff Greenbaum's excellent Wind Conditions Page for Pacifica. However, they don't really present the information quite the way I need, and they also don't work well on small mobile devices...

Fixing that should hopefully just be a simple matter of programming (SMOP). The Weather Underground fortunately has some reasonably well-documented XML APIs, so let's see what they have to offer and whether we can get to the data we want.

That looks good: we can see the wind information near the bottom of the output, with keys "wind_degrees" and "wind_mph". So let's grab the values for those keys using the collect Higher Order Message and -objectForKey:.
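The lookup itself is simple. What collect adds is applying it to several keys in one expression. Below is a rough C analogue. It assumes the parsed response has been flattened into a hypothetical array of key/value string pairs, in place of an NSDictionary and -objectForKey:.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened form of the parsed response. */
typedef struct {
    const char *key;
    const char *value;
} Pair;

/* Rough analogue of -objectForKey:; returns NULL for missing keys. */
static const char *object_for_key(const Pair *dict, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dict[i].key, key) == 0) {
            return dict[i].value;
        }
    }
    return NULL;
}

/* "collect" the lookup over a list of keys, one result per key. */
static void collect_for_keys(const Pair *dict, size_t n,
                             const char *const *keys, size_t key_count,
                             const char **out) {
    for (size_t i = 0; i < key_count; i++) {
        out[i] = object_for_key(dict, n, keys[i]);
    }
}
```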

While the news that Apple is adding blocks to C and Objective-C in the Snow Leopard time frame has been around for some time, a recent article shed some light on the actual API.

While there probably are some places where Objective-C blocks can be useful, I am not really impressed. In the following samples, red is used to show noise, meaning code that is just there to make the compiler happy.

As you can see, the version using blocks is very, very noisy, both syntactically and semantically, especially compared with the HOM version:

[[items collect] stringByAppendingString:@"suffix"];

No prizes for guessing which I'd prefer. To put some numbers on my preference: 234 characters vs. 52, 19 tokens vs. 3, 5 lines vs. 1. In fact, even a plain old C for-loop is more compact and less noisy than our "modern" blocked version:

Thursday, November 5, 2009

For the 95% (or more) of code that isn't performance sensitive, it gives you expressiveness very close to Smalltalk, and for the 5% or less that need high performance, it gets you the performance and predictability of C.

Sunday, January 25, 2009

I've just pushed out a new release of Objective-XML, with some pretty significant new features.

Incremental parsing

This feature, which was already discussed a little in an earlier post, is now available in an official release. In short, Objective-XML will now stream data from network data sources (specified by URL) and produce results incrementally, rather than reading all of the data first and then parsing it. This can make a huge difference in responsiveness and perceived performance for slow networks. CPU and memory consumption will be slightly higher because of extra buffering and buffer stitching required, so this should only be used when necessary.

Static iPhone library

Although Objective-XML has always been compatible with the iPhone, previous releases required copying the prerequisite files into your project. This burden has now been eased by the inclusion of a static library target. You still need to copy the headers, either MPWMAXParser.h or MPWXmlParser.h (or both).

Unique keys

Previous releases of Objective-XML had an -objectForTag:(int)tag method for quickly retrieving attribute or element values.

In addition to providing faster access, the integer tags also served to disambiguate tag names that might occur in multiple namespaces. To handle these conflicts, there now is a -objectForUniqueKey:aKey namespace:aNamespace method. The namespace objects required for this disambiguation process are now returned by the -setHandler:... and -declareAttributes:... methods, which were previously void.

Default methods

One of the attractive features of DOM parsers is that they do something useful "out of the box": point a DOM parser at some XML and you get back a generic in-memory representation of that XML that you can then start taking apart. However, once you go down that road, you are stuck with the substantial CPU and memory overheads of that generic representation.

Streaming parsers like SAX or MAX can be a lot more efficient, but it takes a lot more time and effort to achieve a first useful result. Default methods overcome this hurdle by also delivering an immediately useful generic representation without any extra work. Unlike a DOM, however, this generic representation can be incrementally replaced by more specialized and efficient processing later on.
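The fallback idea can be sketched as a dispatch table with a generic default. This is a hypothetical C illustration, not Objective-XML's implementation. Tags without a registered handler get default_handler, which simply keeps the element's text. Registering a specialized handler later changes the behavior for that one tag and leaves the rest alone.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*Handler)(const char *tag, const char *text);

typedef struct {
    const char *tag;
    Handler handler;
} Entry;

/* Generic behavior that works out of the box: keep the raw text. */
static const char *default_handler(const char *tag, const char *text) {
    (void)tag;
    return text;
}

/* Example of a specialized handler: drop content the app doesn't need. */
static const char *ignore_handler(const char *tag, const char *text) {
    (void)tag;
    (void)text;
    return NULL;
}

/* Registered handlers win; everything else falls back to the default. */
static Handler handler_for_tag(const Entry *table, size_t n, const char *tag) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].tag, tag) == 0) {
            return table[i].handler;
        }
    }
    return default_handler;
}
```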

Tuesday, January 20, 2009

Although Objective-XML's MPWSAXParser mostly provides NSXMLParser compatibility, it also provides a number of useful additional features. Among these is the ability to parse HTML files by setting two flags: enforceTagNesting and ignoreCase. By default, these are on and off, respectively, which gives you strict XML behavior. However, by setting enforceTagNesting to NO and ignoreCase to YES, you get a SAX parser that will happily and speedily process HTML.

Saturday, January 17, 2009

By Syntactic Noise, what people mean is extraneous characters that aren't part of what we really need to say, but are there to satisfy the language definition. Noise characters are bad because they obscure the meaning of our program, forcing us to puzzle out what it's doing.

Couldn't have said it better myself, so I'll just quote Martin Fowler. Syntactic noise is one of the reasons I think neither the for(each) statement nor the blocks added to Objective-C are particularly good replacements for Higher Order Messaging.

To me, that extra syntax is quite noisy, though the noise isn't, in fact, just syntactic. We also have to introduce, name and even correctly type a completely redundant stand-in (obj) that we don't really care about. Introducing extra entities is semantic noise. Apart from having to puzzle out what that extra entity is (and that it is, in fact, redundant) every time we read the code, it also brings us back to "element at a time" programming and thinking.

The fact that NSInvocation deals with pointers to values rather than values makes this a bit longer than it needs to be, but the gist is simple enough: iterate over the array, invoke the invocation, return the result.

That leaves the actual trampoline, which is really just an implementation detail for conveniently creating NSInvocation objects.
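Here is the same collect logic transliterated into plain C as a sketch. A hypothetical Invocation struct pairs a function pointer with its captured argument, standing in for NSInvocation. collect invokes it with each array element as the receiver. Building that struct by hand is the step the trampoline automates in the Objective-C version. Error handling is kept minimal.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a method taking a receiver plus one argument. */
typedef char *(*Method)(const char *receiver, const char *argument);

/* Hypothetical stand-in for NSInvocation: the method and its argument. */
typedef struct {
    Method method;
    const char *argument;
} Invocation;

/* Rough analogue of -stringByAppendingString:; the caller frees the result. */
static char *append_string(const char *receiver, const char *argument) {
    size_t a = strlen(receiver), b = strlen(argument);
    char *result = malloc(a + b + 1);
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, receiver, a);
    memcpy(result + a, argument, b + 1);   /* copies the terminating NUL too */
    return result;
}

/* collect: iterate over the array, invoke the invocation with each element
   as receiver, and return the array of results (NULL on allocation failure). */
static char **collect(const char *const *items, size_t count, Invocation inv) {
    char **results = malloc(count * sizeof *results);
    if (results == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        results[i] = inv.method(items[i], inv.argument);
    }
    return results;
}
```

collect(items, 2, (Invocation){ append_string, "suffix" }) corresponds roughly to [[items collect] stringByAppendingString:@"suffix"].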

The example pits Cocoa's NSXMLParser against a custom parser based on libxml2; the benchmark downloads a top 300 list of songs from iTunes.

More responsiveness using libxml2 instead of NSXMLParser

Based on my previous experience, I was expecting libxml2 to be noticeably faster, but with the advantage in processing speed being less and less important with lower and lower I/O data rates (WiFi to 3G to Edge), as I/O would start to completely overwhelm processing. Was I ever wrong!

While my expectations were technically correct for overall performance, I had completely failed to take responsiveness into account. Depending on the network selected, the NSXMLParser sample would appear to hang for 3 to 50 seconds before starting to show results. Needless to say, that is an awful user experience. The libxml example, on the other hand, would start displaying some results almost immediately. While it also was a bit faster in the total time taken, this effect seemed pretty insignificant compared to the fact that results were arriving continually pretty much during the entire time.

The difference, of course, is incremental processing. Whereas NSXMLParser's -initWithContentsOfURL: method apparently downloads the entire document first and then begins processing, the libxml2-based code in the sample downloads the XML in small chunks and processes those chunks immediately.
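Here is a toy C sketch of that difference. It is neither the sample's libxml2 code nor Objective-XML's. A hypothetical ItemScanner is fed the document one network chunk at a time. It reports each completed </item> as soon as the chunk containing its last byte arrives, instead of waiting for the whole download. The matched field carries a partially seen closing tag from one chunk to the next, a small-scale version of the buffer stitching incremental parsing needs. It ignores comments, CDATA and everything else a real parser must handle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical incremental scanner state, kept between chunks. */
typedef struct {
    size_t matched;      /* bytes of "</item>" matched so far */
    size_t items_seen;   /* completed items reported so far */
    void (*on_item)(size_t index, void *context);   /* may be NULL */
    void *context;
} ItemScanner;

static const char kClose[] = "</item>";

/* Feed one chunk. Since '<' occurs only at the start of "</item>",
   restarting the match on a mismatch is sufficient. */
static void scanner_feed(ItemScanner *s, const char *chunk, size_t len) {
    const size_t pattern_len = sizeof kClose - 1;
    for (size_t i = 0; i < len; i++) {
        char c = chunk[i];
        if (c == kClose[s->matched]) {
            s->matched++;
        } else {
            s->matched = (c == kClose[0]) ? 1 : 0;
        }
        if (s->matched == pattern_len) {   /* an item just completed */
            s->matched = 0;
            if (s->on_item != NULL) {
                s->on_item(s->items_seen, s->context);
            }
            s->items_seen++;
        }
    }
}
```

If the first chunk ends in the middle of "</it", nothing is reported yet. The item completes as soon as the next chunk supplies "em>".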

Alas, going with libxml2 has clear and significant disadvantages, with the code that uses libxml2 being around twice the size of the NSXMLParser-based code, at around 150 lines (non-comment, non-whitespace). If you have worked with NSXMLParser before, you will know that that is already pretty painful, so just imagine that particular brand of joy doubled, with the 150 lines of code giving you the simplest of parsers, with just 5 tags processed. Fortunately, there is a simpler way.

A simpler way: Objective-XML's SAX

Assuming you have already written a Cocoa-(Touch-)based parser using NSXMLParser, all you need to do is include Objective-XML in your projects and replace the reference to NSXMLParser with a reference to MPWSAXParser, everything else will work just as before. Well, the same except for being significantly faster (even faster than libxml2) and now also more responsive on slow connections due to incremental processing.

I have to admit that not having incremental processing was a "feature" Objective-XML shared with NSXMLParser until very recently, due to my not taking into account the fact that latency lags bandwidth. This silly oversight has now been fixed, with both MPWMAXParser and MPWSAXParser sporting URL-based parsing methods that do incremental processing.

So that's all there is to it, Objective-XML provides a drop-in replacement for NSXMLParser that has all the performance and responsiveness-benefits of a libxml2-based solution without the coding horror.

Even simpler: Messaging API for XML (MAX)

However, even a Cocoa version of the SAX API represents a pretty low bar in terms of ease of coding. With MAX, Objective-XML provides an API that can do the same job much more simply. MAX naturally integrates XML processing with Objective-C messaging using the following two main features:

Clients get sent element-specific messages for processing

The parser handles nesting, controlled by the client

The following code for building Song objects out of iTunes <item> elements illustrates these two features:

MAX sends the -itemElement:attributes:parser: message to its client whenever it has encountered a complete <item> element, so there is no need for the client to perform string processing on tag names or manage partial state as in a SAX parser.
The method constructs a song object using data from the <item> element's child elements, which it then passes directly to the rest of the app via the parsedSong: message. It does not return a value, so MAX will not build a tree at this level.

Artist, album, title and category are the values of nested child elements of the <item> element. The (common) code shared by all these child-elements gets the character content of the respective elements and is shown below:

Unlike the <item> processing code, which did not return a value, this method does return a value. MAX uses this return value to build a DOM-like structure, which is then consumed by the next higher level, in this case the -itemElement:attributes:parser: method shown above. Unlike a traditional DOM, the MAX tree structure is built out of domain-specific objects returned incrementally by the client.

These two pieces of sample code demonstrate how MAX can act like either a DOM parser or a SAX parser, controlled simply by whether the processing methods return objects (DOM) or not (SAX). They also demonstrate both element-specific and generic processing.
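Here is a deliberately tiny C model of that dispatch, showing how return values alone switch between the two behaviors. Everything in it is hypothetical and far simpler than Objective-XML's actual MAX API. That includes MiniMax, mm_start, mm_text, mm_end and the handler signature. When an element ends, the parser calls the handler registered for its tag with the element's text and the results of its children. Only non-NULL results are attached to the parent.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8
#define MAX_CHILDREN 8

/* Called when an element is complete. A non-NULL result is kept as a
   child of the enclosing element (DOM-like); NULL keeps nothing (SAX-like). */
typedef void *(*ElementHandler)(void *client, const char *text,
                                void *const *children, size_t child_count);

typedef struct {
    const char *tag;
    ElementHandler handler;
} HandlerEntry;

typedef struct {
    const char *text;               /* character content, if any */
    void *children[MAX_CHILDREN];   /* results returned by child handlers */
    size_t child_count;
} Frame;

typedef struct {
    const HandlerEntry *handlers;
    size_t handler_count;
    void *client;
    Frame stack[MAX_DEPTH];
    size_t depth;
} MiniMax;

static ElementHandler handler_for(const MiniMax *p, const char *tag) {
    for (size_t i = 0; i < p->handler_count; i++) {
        if (strcmp(p->handlers[i].tag, tag) == 0) {
            return p->handlers[i].handler;
        }
    }
    return NULL;   /* elements without a handler are dropped in this sketch */
}

/* Parser events; input is assumed well formed and at most MAX_DEPTH deep. */
static void mm_start(MiniMax *p) {
    Frame *f = &p->stack[p->depth++];
    f->text = NULL;
    f->child_count = 0;
}

static void mm_text(MiniMax *p, const char *text) {
    p->stack[p->depth - 1].text = text;
}

static void mm_end(MiniMax *p, const char *tag) {
    Frame *f = &p->stack[--p->depth];
    ElementHandler h = handler_for(p, tag);
    void *result = h ? h(p->client, f->text, f->children, f->child_count) : NULL;
    if (result != NULL && p->depth > 0) {
        Frame *parent = &p->stack[p->depth - 1];
        if (parent->child_count < MAX_CHILDREN) {
            parent->children[parent->child_count++] = result;
        }
    }
}

/* Generic handler: return the text so the parent receives it (DOM-like). */
static void *text_element(void *client, const char *text,
                          void *const *children, size_t child_count) {
    (void)client;
    (void)children;
    (void)child_count;
    return (void *)text;
}

typedef struct { const char *title; const char *artist; } Song;
typedef struct { Song songs[4]; size_t count; } SongSink;

/* Element-specific handler: build a Song (assuming <title> precedes <artist>)
   and hand it to the client; returning NULL keeps no tree here (SAX-like). */
static void *item_element(void *client, const char *text,
                          void *const *children, size_t child_count) {
    SongSink *sink = client;
    (void)text;
    if (child_count >= 2 && sink->count < 4) {
        sink->songs[sink->count].title = children[0];
        sink->songs[sink->count].artist = children[1];
        sink->count++;
    }
    return NULL;
}
```

Here text_element plays the role of the shared character-content method. item_element plays the role of -itemElement:attributes:parser: and delivers each song directly to the client.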

In the iTunes Song parsing example, I was able to build a MAX parser using about half the code required for the NSXMLParser-based example, a ratio that I have also encountered in larger projects. What about performance? It is slightly better than MPWSAXParser, so also somewhat better than libxml2 and significantly better than NSXMLParser.

Summary and Conclusion

The slightly misnamed XML Performance sample code for the iPhone demonstrates how important managing latency is for perceived end user performance, while showing only very little in terms of actual XML processing performance.

While ably demonstrating the performance problems of NSXMLParser, the sample code's solution of using libxml2 is really not a solution, due to the significant increase in code complexity. Objective-XML provides both a drop-in replacement for NSXMLParser with all the performance and latency benefits of the libxml2 solution, as well as a new API that is not just faster, but also much more straightforward than either NSXMLParser or libxml2.

I recently became the Mac tech lead for Livescribe, responsible for delivering the Mac desktop software, and I am happy to report that not only did we meet all of our target dates, we also won Best of Show at MacWorld 2009.

Spending 3 days at the booth was both exhausting and rewarding; the enthusiasm exhibited by customers was absolutely mind-blowing.