Re: [Sbcl-devel] Automating widetag dispatch

On Fri, May 25, 2012 at 1:18 AM, Nikodemus Siivola
<nikodemus@...> wrote:
> +;;; Store some saetp fields for DEFINE-ARRAY-DISPATCH since
> +;;; sb!vm:*specialized-array-element-type-properties* is not always
> +;;; available.
>
> This sounds odd. Do you mean it's not there during some stage of the build?
Yes, it's not there during cross-compilation. In DEFINE-ARRAY-DISPATCH
if you replace

  for (typecode specifier primitive-type-name) in %%saetp-info%%

with

  for saetp across sb!vm:*specialized-array-element-type-properties*
  for typecode = (sb!vm:saetp-typecode saetp)
  for specifier = (sb!vm:saetp-specifier saetp)
  for primitive-type-name = (sb!vm:saetp-primitive-type-name saetp)

then make.sh will fail at

  ; x-compiling (DEFMACRO DEFINE-ARRAY-DISPATCH ...)

with

  ; Undefined variable:
  ; SB!VM:*SPECIALIZED-ARRAY-ELEMENT-TYPE-PROPERTIES*

> Almost certainly, since it means less open coding in user-code: the
> MAP-INTO transform can then be made conditional on eg. SPEED=3, or
> INLINE declaration.
Well remember the dispatch is only being done on the INTO array. This
is like a partial open-coding, but it's still very far behind the real
open-coded MAP-INTO. When all arrays are declared (the open-coder
appears to require that), the open-coded MAP-INTO gives a 10x speed
increase. (Code attached.)
There's no obstacle to writing another macro on top of
DEFINE-ARRAY-DISPATCH which dispatches on two arrays, generating
(expt (length sb-vm:*specialized-array-element-type-properties*) 2)
number of functions. The only question is where to draw the line on
the speed/space trade-off.
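
To make the quadratic growth concrete, here is a rough portable sketch of dispatching on two vectors with nested TYPECASE. It is not built on DEFINE-ARRAY-DISPATCH and does not use widetags; the macro name, the variable *PAIR-SKETCH-ELEMENT-TYPES* and its four element types are assumptions made up for illustration. With N element types the body is copied (N+1)^2 times, counting the fallback branches.

```lisp
;; Hypothetical element-type list, needed at macroexpansion time.
;; This is NOT SBCL's real saetp table.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *pair-sketch-element-types*
    '(single-float double-float fixnum t)))

(defmacro with-two-vector-types ((a b) &body body)
  "Run BODY with the simple-vector types of A and B declared when known."
  (check-type a symbol)
  (check-type b symbol)
  (flet ((dispatch (var inner)
           ;; One TYPECASE clause per element type, plus an undeclared fallback.
           `(typecase ,var
              ,@(loop for et in *pair-sketch-element-types*
                      collect `((simple-array ,et (*))
                                (locally
                                    (declare (type (simple-array ,et (*)) ,var))
                                  ,inner)))
              (otherwise ,inner))))
    ;; The inner dispatch is duplicated into every clause of the outer one.
    (dispatch a (dispatch b `(locally ,@body)))))

(defun add-into (dest src)
  "Add SRC elementwise into DEST; DEST must be at least as long as SRC."
  (with-two-vector-types (dest src)
    (dotimes (i (length src) dest)
      (incf (aref dest i) (aref src i)))))
```

Whether the extra copies pay off depends on how much the compiler gains from knowing both element types, which is exactly the trade-off in question.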

I would like to optimize some (apply #'map-into ..) calls where the
sequence(s) passed are not created by me. That is, optimize the
non-open-coded MAP-INTO.
I gathered the pieces of the widetag dispatching code for
VECTOR-SUBSEQ* and wrote a general macro for creating dispatch tables.
Using the first parameter as the specialized array,
DEFINE-ARRAY-DISPATCH defines each specializing function with a
corresponding type declaration inside. VECTOR-SUBSEQ* may now be
written as:

  (define-array-dispatch vector-subseq-dispatch (array start end)
    (declare (optimize speed (safety 0)))
    (declare (type index start end))
    (subseq array start end))

  (defun vector-subseq* (sequence start end)
    (declare (type vector sequence))
    (declare (type index start)
             (type (or null index) end)
             (optimize speed))
    (with-array-data ((data sequence)
                      (start start)
                      (end end)
                      :check-fill-pointer t
                      :force-inline t)
      (vector-subseq-dispatch data start end)))

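
For readers without the patch at hand, the general shape of "one specialized function per element type, selected by table lookup" can be sketched in portable Common Lisp. This is only an analogy: the real macro indexes a table by widetag, whereas this sketch uses a hash table keyed on ARRAY-ELEMENT-TYPE. DEFINE-VECTOR-DISPATCH, *SKETCH-ELEMENT-TYPES* and SKETCH-SUBSEQ are hypothetical names, and the sketch assumes ARRAY-ELEMENT-TYPE and UPGRADED-ARRAY-ELEMENT-TYPE return EQUAL specifiers, which is not guaranteed everywhere.

```lisp
;; Hypothetical element-type list, needed at macroexpansion time.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *sketch-element-types*
    '(single-float double-float fixnum t)))

(defmacro define-vector-dispatch (name (vector &rest args) &body body)
  "Define NAME to call a copy of BODY specialized on VECTOR's element type.
ARGS are assumed to be plain required parameters."
  (let ((table (gensym "TABLE"))
        (fallback (gensym "FALLBACK")))
    `(let ((,table (make-hash-table :test #'equal))
           ;; Undeclared copy for non-simple or unlisted vector types.
           (,fallback (lambda (,vector ,@args) ,@body)))
       ,@(loop for et in *sketch-element-types*
               collect `(setf (gethash (upgraded-array-element-type ',et)
                                       ,table)
                              (lambda (,vector ,@args)
                                (declare (type (simple-array ,et (*))
                                               ,vector))
                                ,@body)))
       (defun ,name (,vector ,@args)
         (funcall (or (and (typep ,vector '(simple-array * (*)))
                           (gethash (array-element-type ,vector) ,table))
                      ,fallback)
                  ,vector ,@args)))))

;; Usage, mirroring the VECTOR-SUBSEQ* example without the SBCL internals.
(define-vector-dispatch sketch-subseq (vector start end)
  (subseq vector start end))
```

Unlike the patch, this version pays for a hash lookup and a full call on every dispatch, so it shows the structure rather than the performance.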
To slightly complicate matters, the current MAP-INTO is kludgy. With
that kludginess fixed[1], a basic MAP-INTO benchmark drops from 385ms
to 227ms. When we add widetag dispatching[2][3] on top of that fix, it
goes from 227ms to 110ms.
However it's not clear whether it is appropriate to do this
optimization inside SBCL since a time/space trade-off is involved. Is
an 86K core size increase (uncompressed) worth it? This is only for
MAP-INTO.
Of course, if a user wants performance then he should be using
declarations with the open-coded MAP-INTO. But there are still cases
where declarations can't easily be made, my own situation being one of
them.
It would be nice to have something like

  (defmacro with-declared-array-type (array &body body)
    (check-type array symbol)
    `(typecase ,array
       ,@(loop
           :for saetp
           :across sb-vm:*specialized-array-element-type-properties*
           :collect `((simple-array ,(sb-vm:saetp-specifier saetp))
                      ,@body))
       (otherwise
        ,@body)))

available in userland. This would make it easier to do such
optimizations without depending upon internals. Ideally it would have
constant lookup, although the linear search with TYPEP is quick enough
for most purposes.
(Out of curiosity I implemented WITH-DECLARED-ARRAY-TYPE using a
stack-allocated vector of FLET functions, but this was slower than
TYPECASE even for the worst-case TYPEP search. A fast
WITH-DECLARED-ARRAY-TYPE seems possible in principle, though it would
presumably require new special operator(s) and/or magic.)
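
As a rough userland approximation that avoids sb-vm entirely, the element types can be passed in by the caller instead of being read from the saetp table. The names WITH-DECLARED-VECTOR-TYPE and SUM-VECTOR are made up for this sketch, it handles only one-dimensional simple arrays, and it still does the linear TYPECASE search rather than a constant-time lookup.

```lisp
(defmacro with-declared-vector-type ((vector &rest element-types) &body body)
  "Run BODY with VECTOR declared as a simple vector of a listed element type.
Falls back to an undeclared copy of BODY when no clause matches."
  (check-type vector symbol)
  `(typecase ,vector
     ,@(loop for et in element-types
             collect `((simple-array ,et (*))
                       (locally
                           (declare (type (simple-array ,et (*)) ,vector))
                         ,@body)))
     (otherwise (locally ,@body))))

;; Usage: the loop body is compiled once per listed element type.
(defun sum-vector (vector)
  (with-declared-vector-type (vector single-float double-float fixnum)
    (let ((sum 0))
      (dotimes (i (length vector) sum)
        (incf sum (aref vector i))))))
```

The caller chooses which element types are worth a specialized copy, so the code-size cost stays under the user's control.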
Getting back to SBCL innards, considering that SUBSEQ and FILL already
use widetag dispatch there may be another place which would benefit
from it. If so then [2] will help. If not then it may be needless
abstraction.
[1] 0001-fix-MAP-INTO-performance.patch -- same as bug #1001043
[2] 0002-automate-widetag-dispatching.patch
[3] 0003-widetag-dispatch-for-MAP-INTO.patch -- needs [1]

On 24 May 2012 12:10, James M. Lawrence <llmjjmll@...> wrote:
> (define-array-dispatch vector-subseq-dispatch (array start end)
> +;;; Store some saetp fields for DEFINE-ARRAY-DISPATCH since
> +;;; sb!vm:*specialized-array-element-type-properties* is not always
> +;;; available.
This sounds odd. Do you mean it's not there during some stage of the build?
Otherwise the code is lovely.
> However it's not clear whether it is appropriate to do this
> optimization inside SBCL since a time/space trade-off is involved. Is
> an 86K core size increase (uncompressed) worth it? This is only for
> MAP-INTO.
Almost certainly, since it means less open coding in user-code:
the MAP-INTO transform can then be made conditional on eg. SPEED=3, or
INLINE declaration.
> It would be nice to have something like
>
> (defmacro with-declared-array-type (array &body body)
> available in userland. This would make it easier to do such
> optimizations without depending upon internals. Ideally it would have
> constant lookup, although the linear search with TYPEP is quick enough
> for most purposes.
> (Out of curiosity I implemented WITH-DECLARED-ARRAY-TYPE using a
> stack-allocated vector of FLET functions, but this was slower than
> TYPECASE even for the worst-case TYPEP search.
Putting functions in the vector means you get full call overhead, and
disables local call analysis -- so it's not entirely surprising, even
if the dispatch worked fast.
> WITH-DECLARED-ARRAY-TYPE seems possible in principle, though it would
> presumably require new special operator(s) and/or magic.)
We /could/ make big typecases with array-types use binary search
on the widetag...
> Getting back to SBCL innards, considering that SUBSEQ and FILL already
> use widetag dispatch there may be another place which would benefit
> from it. If so then [2] will help. If not then it may be needless
> abstraction.
I think most sequence functions would benefit from it. Aside from SUBSEQ
and FILL we're currently relying excessively on inlining to make them
fast -- fast out-of-line implementations would be better, IMO.
Unless someone gets there first, I plan on merging these over the weekend.
Cheers,
-- Nikodemus

On 25 May 2012 12:39, James M. Lawrence <llmjjmll@...> wrote:
I've merged these patches, with one cleanup and one bugfix on top of
them. Many thanks!
It would be good if you could double-check that my bugfix didn't cause
performance regressions for you: in my tests it actually showed up as
a trivial speedup, but you know what they say about lies and
benchmarks...
Cheers,
-- nikodemus

On Sun, May 27, 2012 at 9:00 AM, Nikodemus Siivola
<nikodemus@...> wrote:
> On 25 May 2012 12:39, James M. Lawrence <llmjjmll@...> wrote:
>
> I've merged these patches, with one cleanup and one bugfix on top of
> them. Many thanks!
>
> It would be good if you could double-check that my bugfix didn't cause
> performance regressions for you: in my tests it actually showed up as
> a trivial speedup, but you know what they say about lies and
> benchmarks...
I'm amazed that I tested MAP-INTO with 2 input sequences and with 0
input sequences -- each with vectors and lists -- but not with 1 input
sequence. Thanks for catching that.
Regarding the special case of 1 vector, my benchmark results are

  195 ms -- no widetag dispatch / no special case
  165 ms -- with widetag dispatch / no special case (current build a5f57fb)
  120 ms -- no widetag dispatch / with special case
   70 ms -- with widetag dispatch / with special case

This is for fixnum arrays on 32-bit SBCL. The four variants are
attached.
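
For anyone who wants to repeat this kind of measurement without the attachments, a minimal timing harness might look like the following. It is not the attached benchmark: the function name, array size and repetition count are assumptions, and the NOTINLINE declaration is there on the assumption that it keeps the compiler from open-coding the call, so that the out-of-line MAP-INTO is what gets timed.

```lisp
(defun time-map-into-1 (&key (size 1000) (repetitions 10000))
  "Time REPETITIONS calls of MAP-INTO with one fixnum input vector.
Returns elapsed milliseconds and the destination vector."
  ;; Assumption: NOTINLINE suppresses the open-coded transform.
  (declare (notinline map-into))
  (let ((dest (make-array size :element-type 'fixnum :initial-element 0))
        (src (make-array size :element-type 'fixnum :initial-element 1))
        (start (get-internal-real-time)))
    (dotimes (i repetitions)
      (map-into dest #'1+ src))
    (values (round (* 1000 (- (get-internal-real-time) start))
                   internal-time-units-per-second)
            dest)))
```

Numbers from a harness like this are only comparable between builds on the same machine with the same SIZE and REPETITIONS.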
For floating-point types, the only one that really benefits from
widetag dispatching is single-float on 64-bit SBCL, since that type is
immediate, right? Everything else is boxed because the out-of-line
MAP-INTO can only call the out-of-line version of the function passed.
I feel a bit guilty for increasing the image size, but maybe it
doesn't matter.
