<- Well, some parsers, such as the C ones (Jason Diamond's Repat and
<- my Rapier), can handle all the data up to the illegal character
<- sequences and are limited only by I/O speed, not memory. The Java
<- ones all tend to leak until they collapse, unless your machine has
<- oodles of memory.
I'm finding this a bit weird - why should a (Java) parser be particularly
vulnerable to leaks? Surely it's only going to be doing a pretty simple
operation a lot of times.
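
To show what I mean by "a pretty simple operation a lot of times", here is
a minimal sketch of an event-driven pass that keeps nothing but a counter.
It uses Python's standard xml.sax module rather than any of the parsers
mentioned above, and the element name "Topic" and the helper names are my
assumptions for illustration only.

```python
import io
import xml.sax


class TopicCounter(xml.sax.ContentHandler):
    """Counts <Topic> start tags; holds no other state between events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # "Topic" is an assumed element name, not taken from the thread.
        if name == "Topic":
            self.count += 1


def count_topics(stream):
    """Parse a file-like object as a stream of events and return the count."""
    handler = TopicCounter()
    xml.sax.parse(stream, handler)
    return handler.count
```

Nothing here accumulates per-element data, so in principle memory use should
not grow with the size of the input.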
<- > <- # 0 - before first Adult topic
<- > <- # 1 - during Adult topics
<- > <- # 2 - afterwards
<-
<- So it has three states in processing. Not really too
<- important... but best not to create an RDF pr0n database.
Nope, still don't get it. Is the end result the same? (I'm looking for neat
ways of pulling out chunks of the DMOZ tree, though I hadn't really got p|n
in mind).
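
If I read the three states right, they could be tracked along these lines.
This is only a sketch under my own assumptions: that each topic starts on a
line containing `<Topic r:id="...">`, that the Adult topics sit under
"Top/Adult", and that they form one contiguous run in the dump. The function
name strip_adult is hypothetical.

```python
import re

# Assumed shape of a topic header line; not taken from the original script.
TOPIC_RE = re.compile(r'<Topic r:id="([^"]*)"')


def strip_adult(lines):
    """Yield every line that lies outside the contiguous Adult block."""
    state = 0  # 0 - before first Adult topic
    for line in lines:
        match = TOPIC_RE.search(line)
        if match:
            topic = match.group(1)
            is_adult = topic == "Top/Adult" or topic.startswith("Top/Adult/")
            if state == 0 and is_adult:
                state = 1  # 1 - during Adult topics
            elif state == 1 and not is_adult:
                state = 2  # 2 - afterwards
        if state != 1:
            yield line
```

Lines that are not topic headers inherit the state of the topic before them,
so whatever follows an Adult topic is dropped along with it.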