On 28 March 2012 19:53, Akshay Srinivasan <akshaysrinivasan@...> wrote:
> I wouldn't mind helping out with implementing this; I'm afraid I'll
> break something in the process though.
Things you need to do to get the simple case very fast:
1. Extend BASE-CHAR-LIMIT (not STANDARD-CHAR) to 255.
2. Make READ-SEQUENCE from :EXTERNAL-FORMAT :LATIN-1 streams read
directly into the BASE-STRINGS with no external format translation.
(The issue of stream element type is really secondary -- it's the
external format that matters: you can't do that with say :LATIN-9
streams even if the element-type is BASE-CHAR.)
The element-type default is fine there, though.
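To illustrate the point about external formats, here is a small sketch in Python (not SBCL code; the codec names are Python's, used only for demonstration). Latin-1 maps every byte value to the code point with the same number, which is why the bytes could in principle be copied straight into a string of 8-bit characters. Latin-9 remaps a handful of positions, so it always needs a translation step.

```python
# Illustration only: uses Python's built-in codecs, not SBCL's external formats.
all_bytes = bytes(range(256))

latin1 = all_bytes.decode("latin-1")
latin9 = all_bytes.decode("iso8859_15")  # Python's codec name for Latin-9

# Latin-1: byte value N always decodes to code point N (identity mapping).
latin1_is_identity = all(ord(ch) == b for b, ch in zip(all_bytes, latin1))

# Latin-9: collect the byte values whose decoded code point differs from the byte.
latin9_remapped = [b for b, ch in zip(all_bytes, latin9) if ord(ch) != b]

print(latin1_is_identity)
print([hex(b) for b in latin9_remapped])
```

Any byte listed in latin9_remapped (for example the euro sign position) would come out wrong if it were copied into the string untranslated.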
To make reading character data faster in general: re-engineer the
current external format system. It's slow, and doesn't do nearly all
the things it really should. (Byte-order marks, newline conversions,
etc.) I have some incomplete code I can pass on to someone willing to
shoulder this.
Cheers,
-- Nikodemus

On 28 March 2012 23:09, Cyrus Harmon <cyrus@...> wrote:
> BASE-CHAR-CODE-LIMIT. And it should probably be 256, since it's the exclusive upper
> bound, so that we can read #xFF.
*blush*
You are right, as usual.
Cheers,
-- Nikodemus

On 03/29/2012 12:58 AM, Nikodemus Siivola wrote:
> On 28 March 2012 19:53, Akshay Srinivasan
> <akshaysrinivasan@...> wrote:
>
I thought standard-char was the 1-byte version ? The C-ish version
works so with a simple-array of standard-char (and base-char).
>> I wouldn't mind helping out with implementing this; I'm afraid
>> I'll break something in the process though.
>
> Things you need to do to get the simple case very fast:
>
> 1. Extend BASE-CHAR-LIMIT (not STANDARD-CHAR) to 255.
>
> 2. Make READ-SEQUENCE from :EXTERNAL-FORMAT :LATIN-1 streams read
> directly into the BASE-STRINGS with no external format
> translation. (The issue of stream element type is really secondary
> -- it's the external format that matters: you can't do that with
> say :LATIN-9 streams even if the element-type is BASE-CHAR.)
>
> The element-type default is fine there, though.
>
> To make reading character data faster in general: re-engineer the
> current external format system. It's slow, and doesn't do nearly
> all the things it really should. (Byte-order marks, newline
> conversions, etc.) I have some incomplete code I can pass on to
> someone willing to shoulder this.
Yes, I can have a go at it; can I bug you if I stumble upon something
I don't understand ? :)
Akshay

On 29 March 2012 05:13, Akshay Srinivasan <akshaysrinivasan@...> wrote:
> I thought standard-char was the 1-byte version ? The C-ish version
> works so with a simple-array of standard-char (and base-char).
Nope. STANDARD-CHAR is "A fixed set of 96 characters required to be
present in all conforming implementations. Standard characters are
defined in Section 2.1.3 (Standard Characters)."
http://www.lispworks.com/documentation/HyperSpec/Body/t_std_ch.htm#standard-char
BASE-CHAR is this:
http://www.lispworks.com/documentation/HyperSpec/Body/t_base_c.htm
More specifically, currently in SBCL BASE-CHARs are restricted to 7
bits in Unicode builds, and 8 in non-Unicode builds.
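As a rough illustration of the sizes involved, here is a Python sketch; the names below are made up for the example and are not SBCL symbols. The 96 standard characters are the 95 printing ASCII characters plus newline, a 7-bit limit of 128 covers ASCII only, and an exclusive limit of 256 is what it takes to admit code #xFF.

```python
# Hypothetical names for illustration; these are not SBCL symbols.
STANDARD_CHARS = [chr(c) for c in range(0x20, 0x7F)] + ["\n"]  # 95 graphic + newline

LIMIT_7_BIT = 1 << 7  # 128: exclusive bound for 7-bit character codes
LIMIT_8_BIT = 1 << 8  # 256: exclusive bound for 8-bit character codes


def fits(code, limit):
    """Return True if a character code lies below an exclusive code limit."""
    return 0 <= code < limit


print(len(STANDARD_CHARS))
print(fits(0xFF, 255), fits(0xFF, LIMIT_8_BIT))
print(fits(ord("é"), LIMIT_7_BIT), fits(ord("é"), LIMIT_8_BIT))
```

With a limit of 255 the code #xFF is excluded, which is why the exclusive bound has to be 256 for all Latin-1 characters to qualify.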
> Yes, I can have a go at it; can I bug you if I stumble upon something
> I don't understand ? :)
This list and the IRC channel #sbcl@... are good places to ask things.
Cheers,
-- Nikodemus