Only a small part of the DjVu OCR/text content is currently used, and I think we can do more:
1. the XML and dsed (LISP-like) representations each have pros and cons, which should be carefully weighed;
2. the DjVu text layer can host an unlimited amount of metadata and free text, independent of the mapped OCR;
3. hOCR (output by Tesseract) can be translated into dsed, so a conversion script would be very useful for injecting Tesseract output into the DjVu OCR layer;
4. IA (Internet Archive) shares a terrible gzipped XML file, _abbyy.gz, where every possible detail about the OCR recognition can be found, and a tool converting it to dsed (perhaps recovering too many formatting details!) would be very useful.
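As a rough illustration of point 3, here is a minimal Python sketch of an hOCR-to-dsed converter for a single page. It assumes Tesseract-style hOCR (elements with the classes ocr_page, ocr_line and ocrx_word, each carrying a `bbox` in its title attribute) and the djvused `select` / `set-txt` syntax; the names `HocrPage`, `flip`, `quote` and `hocr_to_dsed` are hypothetical. The main subtlety is that hOCR measures y from the top of the page while DjVu measures it from the bottom, so every box is flipped using the page height.

```python
import re
from html.parser import HTMLParser

# hOCR keeps geometry in the title attribute, e.g. title="bbox 10 20 110 60; x_wconf 93"
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
# Tesseract marks text lines with several classes; all are treated as DjVu "line" zones
LINE_CLASSES = {"ocr_line", "ocr_caption", "ocr_header", "ocr_textfloat"}
VOID_TAGS = {"meta", "link", "br", "img", "hr", "input"}  # elements without end tags


class HocrPage(HTMLParser):
    """Collect the page box, lines and words of a single-page hOCR file."""

    def __init__(self):
        super().__init__()
        self.page = None   # (x0, y0, x1, y1), origin at top-left
        self.lines = []    # [(line_bbox, [(word_bbox, text), ...]), ...]
        self.stack = []    # role of each open element, so end tags can be matched
        self.word = None   # [bbox, text_chunks] while inside an ocrx_word

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        m = BBOX_RE.search(attrs.get("title") or "")
        bbox = tuple(int(v) for v in m.groups()) if m else None
        role = None
        if "ocr_page" in classes:
            self.page, role = bbox, "page"
        elif LINE_CLASSES.intersection(classes):
            self.lines.append((bbox, []))
            role = "line"
        elif "ocrx_word" in classes:
            self.word, role = [bbox, []], "word"
        self.stack.append(role)

    def handle_data(self, data):
        if self.word is not None:
            self.word[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if self.stack.pop() == "word":
            bbox, chunks = self.word
            text = "".join(chunks).strip()
            if bbox and text and self.lines:
                self.lines[-1][1].append((bbox, text))
            self.word = None


def flip(bbox, height):
    """hOCR (x0, y0, x1, y1), top-left origin -> DjVu (xmin, ymin, xmax, ymax), bottom-left origin."""
    x0, y0, x1, y1 = bbox
    return (x0, height - y1, x1, height - y0)


def quote(text):
    # C-style escaping of backslash and double quote; assumes djvused accepts raw UTF-8
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def hocr_to_dsed(hocr, page_no=1):
    """Return a djvused script that sets the hidden text of page `page_no` (1-based)."""
    parser = HocrPage()
    parser.feed(hocr)
    parser.close()
    if parser.page is None:
        raise ValueError("no ocr_page element with a bbox found")
    height = parser.page[3]  # assumes the page bbox starts at 0 0, as Tesseract writes it
    out = [f"select {page_no}", "set-txt",
           "(page {} {} {} {}".format(*flip(parser.page, height))]
    for line_bbox, words in parser.lines:
        if line_bbox is None or not words:
            continue
        out.append(" (line {} {} {} {}".format(*flip(line_bbox, height)))
        for word_bbox, text in words:
            out.append("  (word {} {} {} {} {})".format(*flip(word_bbox, height), quote(text)))
        out[-1] += ")"  # close the line
    out[-1] += ")"      # close the page
    out.append(".")     # a lone period ends the set-txt input
    return "\n".join(out) + "\n"
```

The resulting script could then be applied with something like `djvused book.djvu -f page.dsed -s`; a whole book would need one `select N` / `set-txt` block per page. Paragraphs, columns and character-level zones are simply ignored in this sketch.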

I'm experimenting with all of these issues, and I'd like to know whether any other Wikisource contributor is interested.