Re: [Sbcl-devel] Boyer-Moore SEARCH transform

William Harold Newman <william.newman@...> writes:
> On Sat, Dec 06, 2003 at 07:27:36PM +0000, Christophe Rhodes wrote:
>> There are TODOs marked, to which I would add "deuglification". Other
>> comments welcome.
>
> It's neat, and it would be nice for our benchmark scores:-) but I'm
> not convinced that it's an appropriate thing to build into the compiler:
> * I'm hard-pressed to imagine cases (other than FILL-STRINGS) where
> an application would really care deeply about its performance for
> this case of SEARCH. I would guess that almost every application
> which spends a measurable amount of time on this case will spend a
> negligible proportion of time on it, being I/O bound.
As I say (I can't remember if I did, actually), for the benchmark's
purposes the Boyer-Mooreness of it doesn't win very much; what's
important is replacing the generic array access with specialized access.
The current implementation of SEARCH does O(MN) _unspecialized_
(i.e. HAIRY-DATA-VECTOR-REF) accesses to its arguments, and eliminating
those accounts for most of the improvement. So if this itself doesn't go in,
I'd still recommend doing _something_ about the suboptimal
string-in-string search, because I think searching for strings in
strings is more common than searching for generic vectors in other
generic vectors. Having a specialized transform that either opencodes
the simple search or calls a specialized routine is perfectly
plausible, of course; cmucl does the latter with its search transform.
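
To make the "specialized routine" option concrete, here is a minimal
sketch of a naive string-in-string search whose arguments are declared
SIMPLE-BASE-STRING, so that a compiler can open-code each access rather
than dispatching on the array type. The function name is hypothetical
and this is not the code from the patch or from cmucl.

```lisp
;; Hypothetical helper, for illustration only (not SBCL's or cmucl's code).
(defun search/simple-base-string (pattern text)
  "Return the index of the first occurrence of PATTERN in TEXT, or NIL."
  ;; The type declaration is the point: with both vector types known,
  ;; SCHAR needs no runtime dispatch on the array representation.
  (declare (type simple-base-string pattern text))
  (let ((m (length pattern))
        (n (length text)))
    ;; Trivial O(MN) algorithm: try every start position in turn.
    (loop for start from 0 to (- n m)
          when (loop for i from 0 below m
                     always (char= (schar pattern i)
                                   (schar text (+ start i))))
            return start)))
```

In an implementation where string literals are not base strings, the
arguments have to be coerced first, e.g.
(search/simple-base-string (coerce "d" 'simple-base-string)
                           (coerce "xxxd" 'simple-base-string)).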
> * It looks mostly portable (e.g. as a compiler macro unportably added
> to CL:SEARCH, or as an inline function SEARCH-FAST, or something;
> modulo safe-in-SBCL assumptions like 256 characters). Thus, even if
> someone does come up with an app which wants performance here, it's
> not clear that it should be provided by a clever transform embedded
> in a general-purpose compiler.
Yeah, though that 256 character assumption might not be safe for too
much longer... last year, at the UK SBCL AGM Dan and I agreed that
Unicode wasn't going to happen this year barring outside intervention;
this year, I'm more inclined to be aggressive about it.
(Aside: the UK SBCL AGM, otherwise known as "Dan and Christophe have
lunch", will probably happen sometime later this month. If anyone's
in London between Christmas and the New Year, or else in early
January, and would like to have a drink, say so!)
> If it does go into the compiler, it should probably be conditional on
> (> SPEED SPACE), [...]
Oh yes, definitely. A conversion to a call to a specialized search
function for simple-base-strings is less space-hungry, and might make
a good compromise, either for all compilation policies or for
(>= SPEED SPACE).
Cheers,
Christophe
--
http://www-jcsu.jesus.cam.ac.uk/~csr21/ +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%") (pprint #36rJesusCollegeCambridge)

On Sat, Dec 06, 2003 at 07:27:36PM +0000, Christophe Rhodes wrote:
> Since Eric threw down the gauntlet over on cmucl-imp ("we still beat
> sbcl on performance" :-), attached is a very first cut at
> implementation of Boyer-Moore search for constant pattern argument.
>
> For calls of the form (search "foo" <simple-base-string>), the
> transform computes the Boyer-Moore skip tables (in an ugly,
> inefficient way that is written like that because I was
> transliterating from a C implementation) and inserts the literal
> tables into the generated code, so there's no preprocessing overhead
> at runtime.
>
> This causes a 20x-30x or so speedup on the FILL-STRINGS (sic)
> benchmark from Eric Marsden's benchmark suite, which is currently
> dominated by a call to generic SEARCH of the form
> (search "xxxd" <big-string-of-xs>)
> so in fact the Boyer-Mooreness of the search isn't winning us much in
> this case, but the fact that all array references are open-coded is
> also helpful.
>
> There are TODOs marked, to which I would add "deuglification". Other
> comments welcome.
It's neat, and it would be nice for our benchmark scores:-) but I'm
not convinced that it's an appropriate thing to build into the compiler:
* I'm hard-pressed to imagine cases (other than FILL-STRINGS) where
an application would really care deeply about its performance for
this case of SEARCH. I would guess that almost every application
which spends a measurable amount of time on this case will spend a
negligible proportion of time on it, being I/O bound.
* It looks mostly portable (e.g. as a compiler macro unportably added
to CL:SEARCH, or as an inline function SEARCH-FAST, or something;
modulo safe-in-SBCL assumptions like 256 characters). Thus, even if
someone does come up with an app which wants performance here, it's
not clear that it should be provided by a clever transform embedded
in a general-purpose compiler.
Therefore, I'd suggest putting it in a non-SBCL library instead.
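
As a rough idea of what such a library version might look like, the
sketch below uses an ordinary macro rather than a compiler macro on
CL:SEARCH. It substitutes the simpler Horspool variant (bad-character
table only) for full Boyer-Moore, builds the table at macroexpansion
time when the pattern is a literal string, and assumes the text is a
simple string. All names here (HORSPOOL-TABLE, HORSPOOL-SEARCH and the
macro SEARCH-FAST) are hypothetical.

```lisp
;; The table builder must exist at macroexpansion time.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun horspool-table (pattern)
    "Bad-character shift table for PATTERN, covering char codes below 256."
    (let* ((m (length pattern))
           (table (make-array 256 :initial-element m)))
      ;; For every position but the last, record the distance to the end.
      (loop for i from 0 below (1- m)
            for code = (char-code (char pattern i))
            when (< code 256)
              do (setf (svref table code) (- m 1 i)))
      table)))

(defun horspool-search (pattern table text)
  "Return the index of the first occurrence of PATTERN in TEXT, or NIL."
  (declare (type simple-string pattern text)
           (type simple-vector table))
  (let ((m (length pattern))
        (n (length text)))
    (if (zerop m)
        0
        (loop with pos = 0
              while (<= pos (- n m))
              do (when (loop for i from (1- m) downto 0
                             always (char= (schar pattern i)
                                           (schar text (+ pos i))))
                   (return pos))
                 ;; Shift by the table entry for the last character of the
                 ;; window; characters outside the table shift by 1 (safe).
                 (let ((code (char-code (schar text (+ pos m -1)))))
                   (incf pos (if (< code 256) (svref table code) 1)))))))

(defmacro search-fast (pattern text)
  "Use a precomputed table for a literal string PATTERN, else plain SEARCH."
  (if (stringp pattern)
      `(horspool-search ,pattern ',(horspool-table pattern) ,text)
      `(search ,pattern ,text)))
```

With this, (search-fast "xxxd" s) expands into a call that carries the
literal table, so each call site costs one table rather than a full
inline copy of the search loop.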
If it does go into the compiler, it should probably be conditional on
(> SPEED SPACE), since I for one would otherwise be surprised to have
  (cond ((and (search "<html>" s)
              (search "<body>" s)
              (search "</body>" s))
         :html)
        ((or (search " The " s)
             (search " the " s)
             (search " to " s)
             (search " and " s))
         :english)
        ...)
(or whatever) turn into as much machine code as it looks as though
this would.
--
William Harold Newman <william.newman@...>
In examining the tasks of software development versus software maintenance,
most of the tasks are the same -- except for the additional maintenance
task of "understanding the existing product". -- Robert L. Glass, _Facts
and Fallacies of Software Engineering_
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C

On Sat, Dec 06, 2003 at 10:37:57PM +0000, Christophe Rhodes wrote:
> William Harold Newman <william.newman@...> writes:
>
> > On Sat, Dec 06, 2003 at 07:27:36PM +0000, Christophe Rhodes wrote:
> >> There are TODOs marked, to which I would add "deuglification". Other
> >> comments welcome.
> >
> > It's neat, and it would be nice for our benchmark scores:-) but I'm
> > not convinced that it's an appropriate thing to build into the compiler:
> > * I'm hard-pressed to imagine cases (other than FILL-STRINGS) where
> > an application would really care deeply about its performance for
> > this case of SEARCH. I would guess that almost every application
> > which spends a measurable amount of time on this case will spend a
> > negligible proportion of time on it, being I/O bound.
>
> As I say (I can't remember if I did, actually), for the benchmark's
> purposes the Boyer-Mooreness of it doesn't win very much; what's
> important is replacing the generic array access with specialized access.
> The current implementation of SEARCH does O(MN) _unspecialized_
> (i.e. HAIRY-DATA-VECTOR-REF) accesses to its arguments, and eliminating
> those accounts for most of the improvement. So if this itself doesn't go in,
> I'd still recommend doing _something_ about the suboptimal
> string-in-string search, because I think searching for strings in
> strings is more common than searching for generic vectors in other
> generic vectors. Having a specialized transform that either opencodes
> the simple search or calls a specialized routine is perfectly
> plausible, of course; cmucl does the latter with its search transform.
I fully agree that doing O(MN) H-D-V-R operations per SEARCH is silly.
But I'd suggest a transform (for SEARCH on all sequence types, or at
least all VECTORs, not just on strings) which just uses the trivial
algorithm and takes care to avoid HAIRY-DATA-VECTOR-REF. It seems as
though it would be a good idea to have such transforms for essentially
every CL sequence function (as we already do for many of them). It was
only the Boyer-Moore-ness (and attendant limitation to a
probably-uncommon special case) that I thought was probably overkill.
I had somehow thought that DEFINE-SEQUENCE-TRAVERSER SEARCH and
DEFMACRO VECTOR-SEARCH were intended to propagate enough type
information to every AREF that it ought to avoid this kind of problem,
but maybe not. Possibly a good fix would be to make them do so,
perhaps in a way that would make many related definitions do so, e.g.
tweaking DEFINE-SEQUENCE-TRAVERSER so that it expands not only into a
DEFUN but also into a DEFTRANSFORM (>= SPEED SPACE) for the
arg-types-known case.
I don't think it's all that reasonable for users to expect
(SEARCH "foo" ...)
to expand into Boyer-Moore (though some wouldn't mind). But it does
seem reasonable (modulo the fact that a lot of work is involved, and who
knows when it might be done?) to expect
(SEARCH (THE SIMPLE-STRING X) (THE STRING Y))
or
(SEARCH (THE (VECTOR T) X) (THE LIST Y))
or whatever to expand into something which doesn't do runtime vector
type dispatch on every array access.
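
One way to picture the goal is a search that decides on the vector
representation once, up front, and then runs a loop in which every AREF
has a known array type. The sketch below is only an illustration of that
shape, written as an ordinary function with a hypothetical name; it
covers three same-type pairs and falls back to CL:SEARCH for everything
else. It is not what a DEFTRANSFORM would literally produce.

```lisp
;; Hypothetical illustration: dispatch once, then run a specialized loop.
(defun vector-search-dispatching (pattern text)
  "Naive search that picks a specialized loop once, before searching."
  (macrolet ((specialized-loop (type)
               ;; Stamp out one copy of the trivial loop per vector type.
               `(let ((p pattern)
                      (s text))
                  (declare (type ,type p s))
                  (let ((m (length p))
                        (n (length s)))
                    (loop for start from 0 to (- n m)
                          when (loop for i from 0 below m
                                     always (eql (aref p i)
                                                 (aref s (+ start i))))
                            return start)))))
    (cond ((and (typep pattern 'simple-base-string)
                (typep text 'simple-base-string))
           (specialized-loop simple-base-string))
          ((and (typep pattern '(simple-array character (*)))
                (typep text '(simple-array character (*))))
           (specialized-loop (simple-array character (*))))
          ((and (typep pattern 'simple-vector)
                (typep text 'simple-vector))
           (specialized-loop simple-vector))
          ;; Lists, mixed pairs and hairy arrays take the generic path.
          (t (search pattern text)))))
```

Each extra type pair adds another copy of the loop body, which is where
the code-size cost of covering every combination comes from.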
Of course (what you already know, but other readers may not realize)
making sequence functions all work in the "obvious" reasonably
microefficient way on all sequence types (especially trying to support
all combinations of LIST and VECTOR, trying to do WITH-ARRAY-DATA-ish
operations only once per sequence operation for any given hairy array,
and possibly trying to do obvious word-at-a-time optimizations on
BIT-VECTORs) is a substantial amount of tedious work, so actually
implementing this would be a big project. But at least it could
officially be "it would be nice", so that when you are disappointed
that your favorite sequence function does a lot of unnecessary work on
every lookup, you needn't be shy about submitting a clean patch.
> (Aside: the UK SBCL AGM, otherwise known as "Dan and Christophe have
> lunch", will probably happen sometime later this month. If anyone's
> in London between Christmas and the New Year, or else in early
> January, and would like to have a drink, say so!)
I enjoyed visiting London about a decade ago (when my brother was in
grad school there) but I'm not doing so well at meeting SBCL
committers.:-(
--
William Harold Newman <william.newman@...>
Every program eventually becomes rococo, and then rubble. -- Alan Perlis
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C

William Harold Newman <william.newman@...> writes:
> I fully agree that doing O(MN) H-D-V-R operations per SEARCH is silly.
> But I'd suggest a transform (for SEARCH on all sequence types, or at
> least all VECTORs, not just on strings) which just uses the trivial
> algorithm and takes care to avoid HAIRY-DATA-VECTOR-REF. It seems as
> though it would be a good idea to have such transforms for essentially
> every CL sequence function (as we already do for many of them). It was
> only the Boyer-Moore-ness (and attendant limitation to a
> probably-uncommon special case) that I thought was probably overkill.
OK. More below.
> I had somehow thought that DEFINE-SEQUENCE-TRAVERSER SEARCH and
> DEFMACRO VECTOR-SEARCH were intended to propagate enough type
> information to every AREF that it ought to avoid this kind of problem,
> but maybe not. Possibly a good fix would be to make them do so,
> perhaps in a way that would make many related definitions do so, e.g.
> tweaking DEFINE-SEQUENCE-TRAVERSER so that it expands not only into a
> DEFUN but also into a DEFTRANSFORM (>= SPEED SPACE) for the
> arg-types-known case.
DEFINE-SEQUENCE-TRAVERSER and VECTOR-SEARCH propagate enough type
information to distinguish between ELT and NTH/AREF, because whether
something is a list or a vector can change the algorithm used for
sequence traversal. However, they don't propagate enough to the body
of the search for type discrimination between vectors.
Probably it wouldn't be good to, either; we have about, what, 20
specialized vectors? So VECTOR-SEARCH would expand to 400 copies of
the same body with different specializations (yay for cross-product
effects). Which would probably raise its compile time above even
UNIX-SELECT :-)
So to avoid this combinatorial explosion of code (not that we're
particularly careful about our space requirements at the moment, but
anyway) we probably need to be a little careful. Two things come to
mind.
Firstly, we could define functions for the common cases
(e.g. SEARCH/SIMPLE-BASE-STRING&SIMPLE-BASE-STRING) so that we don't
have to expand inline; instead we can simply compile to a call to that
specialized function (or inline if speed > space, say). This at least
means that your
((or (search "html" s) (search "body" s) ...) ...)
doesn't explode in size under ordinary compilation and is yet
relatively efficient. As discussed above, though, we can't reasonably
do this for all 400 combinations. So in addition we should provide a
transform that knows how to expand SEARCH.
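
A sketch of the first idea, assuming a hypothetical macro
DEFINE-SPECIALIZED-SEARCH that stamps out one out-of-line function per
chosen type pair. Only the pairs listed explicitly get a function, so
the cross product is never generated; the trivial algorithm is used
throughout.

```lisp
;; Hypothetical macro: one out-of-line naive search per listed type pair.
(defmacro define-specialized-search (name pattern-type text-type)
  "Define NAME as a naive search specialized to the two vector types."
  `(defun ,name (pattern text)
     (declare (type ,pattern-type pattern)
              (type ,text-type text))
     (let ((m (length pattern))
           (n (length text)))
       (loop for start from 0 to (- n m)
             when (loop for i from 0 below m
                        always (eql (aref pattern i)
                                    (aref text (+ start i))))
               return start))))

;; Only the common cases are defined; everything else stays generic.
(define-specialized-search search/simple-base-string&simple-base-string
  simple-base-string simple-base-string)
(define-specialized-search search/simple-vector&simple-vector
  simple-vector simple-vector)
```

A call site whose argument types are known could then compile to a call
to the matching function, so each (search "html" s) costs one call
rather than an inline copy of the loop.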
> I don't think it's all that reasonable for users to expect
> (SEARCH "foo" ...)
> to expand into Boyer-Moore (though some wouldn't mind). But it does
> seem reasonable (modulo the fact that a lot of work is involved, and who
> knows when it might be done?) to expect
> (SEARCH (THE SIMPLE-STRING X) (THE STRING Y))
> or
> (SEARCH (THE (VECTOR T) X) (THE LIST Y))
> or whatever to expand into something which doesn't do runtime vector
> type dispatch on every array access.
Right. Of course, given the above, we could expand to Boyer-Moore at
no extra cost for certain cases; if space is a requirement, we can
call a function, passing our precomputed tables as arguments. I'm
still willing to believe that it's overkill, but it need not be as
expensive as it first seemed.
Cheers,
Christophe

On Mon, Dec 08, 2003 at 11:22:50AM +0000, Christophe Rhodes wrote:
[unreasonable expense of speculatively compiling a different
out-of-line SEARCH for each case]
> Firstly, we could define functions for the common cases
> (e.g. SEARCH/SIMPLE-BASE-STRING&SIMPLE-BASE-STRING) so that we don't
> have to expand inline; instead we can simply compile to a call to that
> specialized function (or inline if speed > space, say). This at least
> means that your
> ((or (search "html" s) (search "body" s) ...) ...)
> doesn't explode in size under ordinary compilation and is yet
> relatively efficient. As discussed above, though, we can't reasonably
> do this for all 400 combinations. So in addition we should provide a
> transform that knows how to expand SEARCH.
Yes, a transform that knows how to expand SEARCH is what I had in mind.
--
William Harold Newman <william.newman@...>

Christophe Rhodes <csr21@...> writes:
> (Aside: the UK SBCL AGM, otherwise known as "Dan and Christophe have
> lunch", will probably happen sometime later this month. If anyone's
> in London between Christmas and the New Year, or else in early
> January, and would like to have a drink, say so!)
Ooh, um, maybe -- I'm probably going to be at least near London for
New Year's Eve, and could probably hang around for a day or so either
side...
Cheers,
mwh
--
Not only does the English Language borrow words from other
languages, it sometimes chases them down dark alleys, hits
them over the head, and goes through their pockets. -- Eddy Peters