reasonablekeith has asked for the
wisdom of the Perl Monks concerning the following question:

Monks,

Servers are down this morning, which gives me time to seek advice re best practice when using XML::Parser and strict. :-)

Basically, when writing event-driven parsing scripts, I always find that I need to share some variables between the handlers. My question is, what's the best way to do this? I offer a simple example, which prints the node elements neatly indented.

I'm quite happy with this, as I don't have any global variables (I know any previous $indent value will be temporarily trashed, but I can cope with that), I've got a shared variable (dynamically scoped) to use in my handlers, and my handlers explicitly pick up this shared variable. My main concern is that I've had to use 'no strict subs' to define $indent, which makes me think I've done a bad thing, and that perhaps there's a neater way of doing this, without having to turn off strict.

If you declare named subroutines within other subroutines, the value of a lexical variable declared in the outer subroutine will not necessarily stay in sync with the value the inner subroutine sees (if you ran this code with warnings enabled, you would get the message: Variable "$indent" will not stay shared). From perldoc perldiag:

When the inner subroutine is called, it will probably see the value of the outer subroutine's variable as it was before and during the *first* call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.

Furthermore, if the outer subroutine is anonymous and references a lexical variable outside itself, then the outer and inner subroutines will never share the given variable.

This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are called or referenced, they are automatically rebound to the current values of such variables.
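As a rough illustration of the anonymous-sub fix the perldiag text describes, here is a minimal sketch. The helper name make_handlers, the @lines array and the hand-written event stream are my own inventions for the example, not part of the original post; the handlers only assume the usual XML::Parser convention of receiving the expat object followed by the element name.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a fresh pair of handlers sharing one $indent.
sub make_handlers {
    my ($out) = @_;      # array ref that collects the indented lines
    my $indent = 0;      # private to the two closures created below

    my $start = sub {
        my ($expat, $element) = @_;
        push @$out, ('  ' x $indent) . $element;
        $indent++;
    };
    my $end = sub {
        $indent--;
    };
    return ($start, $end);
}

my @lines;
my ($start, $end) = make_handlers(\@lines);

# Simulated events for <a><b/><c/></a>. With XML::Parser installed, the same
# code refs could instead be passed as
#   XML::Parser->new(Handlers => { Start => $start, End => $end })
$start->(undef, 'a');
$start->(undef, 'b');
$end->(undef, 'b');
$start->(undef, 'c');
$end->(undef, 'c');
$end->(undef, 'a');

print "$_\n" for @lines;
```

Because both anonymous subs are created each time make_handlers runs, they bind to that call's $indent, so no strict or warnings setting has to be relaxed.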

This may not be a problem with the code you posted, since handle_start and handle_end will (in theory) always be called in pairs and never be called except by XML::Parser. However, doing this in other parts of your code could produce surprising behaviour. For example, this code:

Notice that when inner() is called from main, the value of $var is still 1, and the second time outter() is called the inner() sub starts with a value of 2 for $var instead of 0, as you might expect.
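The behaviour described above can be reproduced with a small script along these lines. This is my own reconstruction, not necessarily the code originally posted: outer() here plays the role of the outer subroutine in the description, and @log is only there so the sequence of values can be inspected.

```perl
use strict;
use warnings;    # compile time: Variable "$var" will not stay shared

my @log;

sub outer {
    my $var = 0;

    # A named sub is compiled once, so it binds to the $var that exists
    # for the first call of outer() and keeps that one from then on.
    sub inner {
        push @log, "inner sees $var";
        $var++;
    }

    inner();
    push @log, "outer sees $var";
}

outer();    # first call: inner and outer still share $var
inner();    # called from main: inner's $var is still 1
outer();    # second call: outer gets a fresh $var, inner keeps its old one

print "$_\n" for @log;
# inner sees 0
# outer sees 1
# inner sees 1
# inner sees 2
# outer sees 0
```

The last two lines are the telling ones: during the second call the outer sub sees its own new $var (0) while the inner sub carries on counting from 2.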

Arunbear's suggestion declared the $indent variable and both handler subs inside a bare block. This limits the scope of $indent to just those two subs, and it doesn't suffer from the sharing problem described above, because the subs are not nested inside another subroutine. If you really want to declare subs within subs, look into closures.
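For reference, the bare-block approach might look roughly like the sketch below. It is not Arunbear's actual code; the @lines array and the simulated calls are assumptions added so the snippet runs without XML::Parser installed.

```perl
use strict;
use warnings;

my @lines;    # collects the output so it can be inspected afterwards

{
    # Visible only to the two subs in this block. Note that the
    # assignment runs when execution reaches the block, so the block
    # must appear before the handlers are first called.
    my $indent = 0;

    sub handle_start {
        my ($expat, $element) = @_;
        push @lines, ('  ' x $indent) . $element;
        $indent++;
    }

    sub handle_end {
        $indent--;
    }
}

# Simulated events for <doc><item/></doc>. With XML::Parser the handlers
# would be registered as
#   XML::Parser->new(Handlers => { Start => \&handle_start,
#                                  End   => \&handle_end })
handle_start(undef, 'doc');
handle_start(undef, 'item');
handle_end(undef, 'item');
handle_end(undef, 'doc');

print "$_\n" for @lines;
```

The whole script compiles under strict, and nothing outside the block can touch $indent.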

The XML::SAX API is very similar to XML::Parser's Handler style, except that because it uses objects, your handlers are methods and can maintain state in the object itself, which solves exactly the problem you've encountered.
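A minimal sketch of what such a handler object could look like follows. The class name IndentHandler and its indent/lines fields are made up for illustration, and the events are fed in by hand so the snippet runs without any CPAN modules; the only SAX convention it relies on is that start_element and end_element receive a hash ref with a Name key.

```perl
package IndentHandler;    # hypothetical class name
use strict;
use warnings;

sub new {
    my ($class) = @_;
    # All shared state lives in the object instead of in outer variables.
    return bless { indent => 0, lines => [] }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    push @{ $self->{lines} }, ('  ' x $self->{indent}) . $element->{Name};
    $self->{indent}++;
}

sub end_element {
    my ($self, $element) = @_;
    $self->{indent}--;
}

package main;

my $handler = IndentHandler->new;

# Simulated events for <a><b/></a>. In real use the class would normally
# inherit from XML::SAX::Base and be handed to a parser, for example
#   XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml)
$handler->start_element({ Name => 'a' });
$handler->start_element({ Name => 'b' });
$handler->end_element({ Name => 'b' });
$handler->end_element({ Name => 'a' });

print "$_\n" for @{ $handler->{lines} };
```

Two handler objects created this way keep entirely separate indent counters, which is something the shared-variable approaches cannot offer without extra work.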

Another advantage of SAX is that it's modular. So when you run into the problem of text content being split across multiple events, you don't need to code around it, you just plug in XML::Filter::BufferText and move on.