The "Halloween Problem" for XML APIs

Don't feel bad if you don't know what the Halloween problem is. According to the Transact SQL Blog, it's the basis for an interview question that only guru level database programmers can be expected to answer:

Halloween protection is needed to prevent a situation where the physical location of a row within a table changes due to an UPDATE operation. As a result, the same row may be revisited multiple times within the context of a single logical operation, which should not occur.

So, what is this problem (and why is it called "Halloween")? The most cited answer to this question that I could find seems to be a writeup of a 1995 reunion of SQL pioneers:

Don Chamberlin: ... it happened to be discovered on Halloween. The query was … they had submitted a statement that was supposed to give a ten percent raise to every employee who earned less than $25,000. This query ran successfully, with no errors, and if you examined the results, all the employees in the database earned $25,000, because it kept giving them a raise until they reached that level. So that was how the Halloween problem got its name.

In short (and as best as I can understand this as a non-guru), Halloween happens when imperative data manipulation gets tangled up with declarative queries. As a simple example, consider the problem of deferred query execution that we discuss in the XLinq Overview document (section 2.7.4).

The query will fail with a NullReferenceException when it tries to iterate on the phone object that was just deleted. Often, however, the bad results are more subtle -- the wrong data is updated, or updated in the wrong way (as in the example that gave the problem its name). This is not just a problem in XLinq; the .NET framework design guidelines warn that once you modify a value that has been found by an iteration, you should stop iterating. Some .NET classes actually enforce this; see Brad Abrams' discussion of the tradeoffs here. The workaround is pretty simple -- cache the results of the query using something like ToList() or ToArray() -- although it does impose a performance cost somewhere in the system (either the application or framework). In our example:

In the SQL world the problem is understood by implementers, largely hidden from users, and (mostly?) handled by optimizers. In DOM and XLinq, however, it's in the user's face because there isn't a code generator detecting and working around Halloween problems. If you want the benefits of the declarative set processing in XLinq (and the lazy evaluation in DOM tree navigation and XPath processing) but wish to combine them with the convenience of imperative data manipulation in both APIs, you should understand that there is a storm in their convergence zone.

Some rude Q&A:

Q: So why don't you Just Fix It?A: It would be very expensive (in performance) to snapshot data, tedious to modify the API to put in transaction-like semantics to mitigate the problem, and complex (probably pushing the state of the art) to put in some query analyzer that would be smart enough to detect and mitigate the problem in the XLinq implementation. Better, we believe, to educate users about the problem and let them decide when and how to mitigate Halloween problems.

Q: OK, so how do I fix it myself? A: In a word, cache. Aggressively evaluate the query, do the data manipulation on the results, and put the results back in the tree.

Q: Yech. How do I avoid the problem entirely?A: Feel the power of the functional programming style, Luuuke. In DOM you can use XSL, and in XLinq you can use either XSLT or functional construction to transform from one tree to another without imperative manipulation. That will avoid the problem. Or you could use non-lazy tree navigation (no NodeLists in DOM or IEnumerables in XLinq) to locate data to modify.

The XLinq feature team has discussed this problem whenever it has come up in bug reports from our testers (who just love to explore those nasty corner cases where Halloween goblins lurk). We keep coming up with the same advice that we gave in the XLinq Overview whitepaper:

Consider using a “functional” transformation approach rather than an in-place updating approach when designing your data manipulation logic. XLinq’s functional constructors make it quite easy to dynamically produce a new document with structures and values defined as transformations of some input document. You don’t need to learn an event-oriented API or XSLT to build efficient XML transformation pipeline, you can do it all with XLinq.

You don't need to be scared about this functional consturction and transformation stuff just because "introductory" articles on the subject start talking about Haskell, monads, lambda calculus, etc. in about the second paragraph. With the help of our friendly local ex-professors and Haskell geeks Dr. Meijer and Dr. Lämmel, I have learned to stop worrying and love monad comprehensions; we believe thatVB9 / C# 3.0 and the LINQ technologies will help bring functional programming to the masses, whether or not they know that they are doing it. The first step is to understand the power of language integrated QUERY for working with data and the functional construction approach to reshaping it rather than modifying it in place, even though C# 3.0 and XLinq still support traditional imperative data manipulation for all sorts of pragmatic reasons. The critical next step is to understand that mixing the two styles in the same section of code is like begging to be haunted by the Halloween Problem.