On 15/12/09 06:09, Bryan O'Sullivan wrote:
> I just added support to Data.Text for your new Unicode-based Handle
> implementation, and I'd like to write some tests. The natural way to do
> this would be to create Handles that will write to, and read from,
> ByteStrings. Does any such code exist at the moment? I don't see it in
> base or bytestring, though all the necessary abstractions appear to be
> present.
I haven't implemented a bytestring-backed Handle, but as you say all the
abstractions should be present. It would be a great thing to have on
Hackage.
A good starting point would be the mmap-backed Handle code that I wrote
for my talk at the Haskell Implementors Workshop last year. I'd
intended to polish this up and upload it to Hackage, but never got around
to it. I've put the code here for now:
http://www.haskell.org/~simonmar/mmap-handle.tar.gz
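Here is a minimal sketch of what a ByteString-backed device might look like. It assumes a GHC whose base exports the IODevice, RawIO and BufferedIO classes and mkFileHandle with the shapes used below. BSDevice, newReadHandle and newWriteHandle are hypothetical names, and this is not the mmap-handle code. The BufferedIO methods copy bytes between the Handle's byte buffer and IORefs holding ByteStrings, and the Handle's own UTF-8 codec does the text conversion.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCString)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import GHC.IO.Buffer (Buffer (..), bufferAvailable, bufferElems, newByteBuffer, withBuffer)
import GHC.IO.BufferedIO (BufferedIO (..))
import GHC.IO.Device (IODevice (..), IODeviceType (..), RawIO)
import GHC.IO.Handle (mkFileHandle)
import System.IO

-- Hypothetical in-memory device: unread input, plus output chunks
-- collected so far (most recent first).
data BSDevice = BSDevice (IORef B.ByteString) (IORef [B.ByteString])

instance IODevice BSDevice where
  ready _ _ _ = return True
  close _ = return ()
  devType _ = return Stream

-- Unbuffered raw I/O is deliberately left unimplemented in this sketch.
instance RawIO BSDevice

instance BufferedIO BSDevice where
  newBuffer _ st = newByteBuffer 8192 st   -- buffer size is an arbitrary choice
  fillReadBuffer dev buf = do
    (mn, buf') <- fillReadBuffer0 dev buf
    return (maybe 0 id mn, buf')           -- 0 bytes signals end of input
  fillReadBuffer0 (BSDevice inRef _) buf = do
    bs <- readIORef inRef
    if B.null bs
      then return (Nothing, buf)
      else do
        let n = min (bufferAvailable buf) (B.length bs)
            (chunk, rest) = B.splitAt n bs
        writeIORef inRef rest
        -- Copy the next chunk into the free space after bufR.
        withBuffer buf $ \p -> unsafeUseAsCString chunk $ \src ->
          copyBytes (p `plusPtr` bufR buf) (castPtr src :: Ptr Word8) n
        return (Just n, buf { bufR = bufR buf + n })
  flushWriteBuffer dev buf = snd <$> flushWriteBuffer0 dev buf
  flushWriteBuffer0 (BSDevice _ outRef) buf = do
    let n = bufferElems buf
    -- Copy the pending bytes out, then hand back an empty buffer.
    chunk <- withBuffer buf $ \p -> B.packCStringLen (p `plusPtr` bufL buf, n)
    modifyIORef' outRef (chunk :)
    return (n, buf { bufL = 0, bufR = 0 })

-- | Read-only Handle over a strict ByteString, decoded as UTF-8.
newReadHandle :: B.ByteString -> IO Handle
newReadHandle bs = do
  dev <- BSDevice <$> newIORef bs <*> newIORef []
  mkFileHandle dev "<bytestring>" ReadMode (Just utf8) noNewlineTranslation

-- | Write-only Handle; the action returns everything flushed so far.
newWriteHandle :: IO (Handle, IO B.ByteString)
newWriteHandle = do
  out <- newIORef []
  dev <- BSDevice <$> newIORef B.empty <*> pure out
  h <- mkFileHandle dev "<bytestring>" WriteMode (Just utf8) noNewlineTranslation
  return (h, B.concat . reverse <$> readIORef out)
```

Text-level operations such as hGetContents and hPutStr go through the BufferedIO methods. Large hGetBuf/hPutBuf calls may go straight to the empty RawIO instance, which a real version would need to fill in.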
> Also, the place I hooked into the new I/O machinery was at the next
> level up from CharBuffer. Because the implementation of CharBuffer isn't
> abstract, I had no opportunity to put a text array in there, so there's
> an extra amount of copying that happens when going from byte buffer to
> char buffer to Text. It's a bit of a shame, but I don't see a way around
> it at the moment. Would you be interested in trying to remove that extra
> copy, or is the current interface set in stone?
Yes, you may remember we talked about this in Edinburgh (the conversation
would probably make more sense to you now than it did then :-).
One thing I experimented with is making CharBuffers use UTF-16. You'll
see some instances of #ifdef CHARBUF_UTF16 in the code; it partially
works, and I believe the main missing piece is support in the built-in
codecs. I don't think it would be too hard to fix them: they just need
to be more abstract about offsets in the CharBuffer.
writeCharBuffer/readCharBuffer already handle the UTF-16 encoding/decoding.
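To make the point about offsets concrete: in a UTF-16 CharBuffer a Char occupies one or two 16-bit code units. A codec therefore cannot treat "advance one Char" as "advance the buffer index by one". The standalone sketch below works over plain lists rather than GHC's real Buffer type. The names encodeUtf16, readUtf16 and decodeAll are illustrative only.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- | Encode one Char as UTF-16: one code unit in the BMP, otherwise a
-- surrogate pair.
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
  where
    n = ord c
    m = n - 0x10000

-- | Read one Char from the front of a UTF-16 stream, returning it together
-- with the number of code units it consumed (1 or 2).
readUtf16 :: [Word16] -> Maybe (Char, Int)
readUtf16 (hi : lo : _)
  | hi >= 0xD800 && hi < 0xDC00 && lo >= 0xDC00 && lo < 0xE000 =
      Just (chr (0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10)
                         + (fromIntegral lo - 0xDC00)), 2)
readUtf16 (u : _) = Just (chr (fromIntegral u), 1)
readUtf16 [] = Nothing

-- | Decode a whole stream, advancing by however many units each Char used.
decodeAll :: [Word16] -> String
decodeAll ws = case readUtf16 ws of
  Nothing     -> []
  Just (c, k) -> c : decodeAll (drop k ws)
```

For example, "a\x1F600" is two Chars but three code units ([0x61, 0xD83D, 0xDE00]). That is exactly the kind of offset mismatch the codecs would have to account for.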
So one possibility is to get this working and then avoid the extra copy
by just taking out the ByteArray# inside a CharBuffer and turning it
into a text buffer. I'm not sure of the details here, but I imagine
something along those lines would work. We would then have to allocate
a new CharBuffer for the Handle.
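As a rough illustration of the hand-off described above, here is a toy model in which the Handle's buffer is an unboxed mutable array of UTF-16 code units held in an IORef. CharBuf, pushUnit, takeChars and frozenUnits are hypothetical, and an IOUArray stands in for the raw ByteArray#. The point is only that the filled array is frozen and handed out, while the Handle gets a freshly allocated buffer.

```haskell
import Data.Array.IO (IOUArray, newArray, writeArray)
import Data.Array.Unboxed (UArray, elems)
import Data.Array.Unsafe (unsafeFreeze)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word16)

-- | Hypothetical stand-in for a Handle's UTF-16 char buffer:
-- a mutable array plus the number of code units filled so far.
data CharBuf = CharBuf (IOUArray Int Word16) Int

bufCapacity :: Int
bufCapacity = 4096

newCharBuf :: IO CharBuf
newCharBuf = do
  arr <- newArray (0, bufCapacity - 1) 0
  return (CharBuf arr 0)

-- | Append one code unit. There is no growth in this sketch: writeArray
-- throws once the buffer is full.
pushUnit :: IORef CharBuf -> Word16 -> IO ()
pushUnit ref w = do
  CharBuf arr n <- readIORef ref
  writeArray arr n w
  writeIORef ref (CharBuf arr (n + 1))

-- | Hand the filled array to the caller as an immutable value and install a
-- fresh buffer. The old array is never written again, so unsafeFreeze is
-- permitted to reuse it instead of copying it.
takeChars :: IORef CharBuf -> IO (UArray Int Word16, Int)
takeChars ref = do
  CharBuf arr n <- readIORef ref
  frozen <- unsafeFreeze arr
  newCharBuf >>= writeIORef ref
  return (frozen, n)

-- | The code units actually filled in a frozen buffer.
frozenUnits :: (UArray Int Word16, Int) -> [Word16]
frozenUnits (a, n) = take n (elems a)
```

The cost of this design is the one mentioned above: every hand-off allocates a new buffer for the Handle.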
Another possibility is (as you suggested) to make Handles independent of
the representation of the CharBuffer, making it completely abstract. I
haven't put much thought into that; it might well be a better approach.
It would presumably involve a new existential class constraint in the
Handle for the CharBuffer operations, and we'd have to be careful about
performance: currently I think the CharBuffer operations get inlined nicely.
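Here is a minimal sketch of the existential approach. It uses a hypothetical CharBufferOps class, not anything in GHC's Handle code. The Handle-like record hides the buffer type behind a class constraint, so any representation can be plugged in.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | Hypothetical operations a Handle would need from a char buffer,
-- whatever its representation.
class CharBufferOps b where
  pushChar      :: b -> Char -> IO b
  bufferedChars :: b -> IO String

-- | One possible representation: characters kept in reverse order.
newtype ListBuf = ListBuf String

instance CharBufferOps ListBuf where
  pushChar (ListBuf s) c    = return (ListBuf (c : s))
  bufferedChars (ListBuf s) = return (reverse s)

-- | A Handle-like record whose buffer type is hidden by an existential;
-- the CharBufferOps dictionary travels with the IORef.
data SomeHandle = forall b. CharBufferOps b => SomeHandle (IORef b)

newSomeHandle :: IO SomeHandle
newSomeHandle = SomeHandle <$> newIORef (ListBuf "")

-- | pushChar is called through the packed dictionary, because the concrete
-- buffer type is unknown here.
hPutCharSome :: SomeHandle -> Char -> IO ()
hPutCharSome (SomeHandle ref) c =
  readIORef ref >>= (`pushChar` c) >>= writeIORef ref

hBufferedSome :: SomeHandle -> IO String
hBufferedSome (SomeHandle ref) = readIORef ref >>= bufferedChars
```

The performance concern shows up here: inside hPutCharSome the buffer type is unknown, so GHC generally cannot inline pushChar. The operations on a concrete CharBuffer type, by contrast, can be inlined.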
Cheers,
Simon