> The right place for this seems to be USE-STANDARD-RETURNS. The following
> patch seems to do the trick:
>
> Index: src/compiler/gtn.lisp
> ===================================================================
> RCS file: /cvsroot/sbcl/sbcl/src/compiler/gtn.lisp,v
> retrieving revision 1.17
> diff -u -r1.17 gtn.lisp
> --- src/compiler/gtn.lisp 14 Jul 2005 18:56:59 -0000 1.17
> +++ src/compiler/gtn.lisp 29 Oct 2005 08:30:20 -0000
> @@ -111,8 +111,7 @@
>  (defun use-standard-returns (tails)
>    (declare (type tail-set tails))
>    (let ((funs (tail-set-funs tails)))
> -    (or (and (find-if #'xep-p funs)
> -             (find-if #'has-full-call-use funs))
> +    (or (find-if #'has-full-call-use funs)
>          (block punt
>            (dolist (fun funs t)
>              (dolist (ref (leaf-refs fun))
>
> I *think* XEP-P is there as an optimization, as it is much cheaper than
> HAS-FULL-CALL-USE.
No, it's not for compiler speed. That's the case that's semantically
required to avoid stack blowup. We must use the unknown convention so that
the TR full call can be TR, since the function that we are calling may
re-call this function tail recursively. If an XEP is in the tail set,
this means that a function in the tail set can be called via a full call.
It sounds like you want to refine the test for when using the unknown
convention is more efficient. I forget whether you want it to use the
unknown convention or not. I thought at first you wanted the known
convention, but now it seems the unknown. Of course, depending on the
context either may be more efficient. The current code uses known
return more or less whenever possible, which may be too aggressive.
> In the following case XEP-P isn't true for (FLET
> TRICK) however:
>
> (defun foo ())
>
> (defun test ()
>   (flet ((trick ()
>            (foo)))
>     (if *print-level*
>         (trick)
>         (trick)))
>   nil)
>
> I suppose the next question is whether the tail-set is complete or not:
> FOO does not appear in there, just (FLET TRICK).
Unless you block compile, which I believe has been removed in SBCL,
DEFUN FOO is compiled almost entirely independently from TEST (in a
separate component). So FOO is not in any tail set related to the
compilation of TEST because it is not treated as a local function. I
believe you can force compilation in a single block by wrapping all the
forms in a LET.
>
> In any case SBCL with the above patch builds fine and passes all tests.
>
Yes, it should, unless there is a test for stack blowup in particular cases.
Rob

Greetings,
Test setup: SBCL 0.9.6.8 from CVS as of 20051029T1239Z, built with
:sb-futex, :sb-thread but without :sb-unicode, running on Debian
GNU/Linux x86, kernel 2.6.8 SMP, the machine is 2.8GHz P4 with
hyperthreading enabled. It may be possible that this cannot be
(as) easily reproduced on a slow machine.
To reproduce, save the attached source code and modify *filename*
to point to a text file whose size is 3185311 bytes (any other
similarly large file should do; that's just what I used). The file
should contain "random" data, just in case you get to witness
corruption, which in this case means that data from one part of
the file is "spliced" into another part from the receiver's point of
view. I.e. you can't notice corruption if you test with a file
generated from /dev/zero or something.
Now compile and load the file in SBCL. Then launch three netcats
on the same machine as follows:
nc -l -p 40000 > data-40000.txt&
nc -l -p 40001 > data-40001.txt&
nc -l -p 40002 > data-40002.txt&
Now evaluate (test-client::test) in SBCL.
What you should see is that one or two of those netcats soon
successfully exit, after having written as much data as was in your
input text file. But one or two of them are suspended with a
message such as
"[3] +suspended (tty input) nc -l -p 40002 >| data-40002.txt"
I don't know why that happens, but if you simply bring it to the
foreground and examine the living threads in SBCL, you'll see that
the thread with port 40002 (in this example) hasn't died. If you
(sb-thread:release-foreground <the-thread>) it and break it, you
should see a backtrace similar to the following:
0: ((LAMBDA ()))
1: (SB-SYS:INVOKE-INTERRUPTION #<FUNCTION (LAMBDA #) {901202D}>)
2: ("foreign function: call_into_lisp")
3: ("foreign function: post_signal_tramp")
4: ("foreign function: __select")
5: (SB-SYS:SERVE-ALL-EVENTS NIL)
6: (SB-IMPL::FD-STREAM-MISC-ROUTINE
#<SB-SYS:FD-STREAM for "a constant string" {A936729}>
:FINISH-OUTPUT
NIL
#<unused argument>)
7: (FINISH-OUTPUT #<SB-SYS:FD-STREAM for "a constant string" {A936729}>)
8: ((FLET #:CLEANUP-FUN-96))
9: (TEST-CLIENT::CLIENT 40002)
10: ((LAMBDA ()))
11: ((LAMBDA ()))
12: ("foreign function: call_into_lisp")
13: ("foreign function: funcall0")
14: ("foreign function: new_thread_trampoline")
15: ("foreign function: #x40037B63")
That thread is simply stuck and probably remains so forever. This
is one of the problems, though maybe this is simply due to my
example code and/or incorrect use of netcat, because I haven't
witnessed similar in the actual application I have problems with.
I'm also seeing data corruption when sending data to a network but
I haven't yet been able to reproduce that with the attached example
although that's why I initially wrote it. The way my example
writes data mimics the bivalent streams code in acl-compat. And
the reason why I did that is that I see the corruption when using
portableaserve client code to do requests from multiple threads at
the same time. I suspect that the problem lies in SBCL, because if
I log the data just before the bivalent streams code
WRITE-SEQUENCEs it to a socket FD-STREAM, the log gets correct data
but with Ethereal I see that corrupted data goes to the network. I
tried to fiddle with FROB-OUTPUT in fd-stream.lisp and got
different results but couldn't manage to get rid of the problem.
Afterwards I noticed one possibly problematic case in
fd-stream.lisp. Apparently FD-STREAM buffers are allocated with
ALLOCATE-SYSTEM-MEMORY and according to Gábor Melis
WITH-PINNED-OBJECTS is not necessary for such objects. But it
seems to me that those buffers are bypassed when FROB-OUTPUT is
called from the last branch of COND in OUTPUT-RAW-BYTES.
On one test run I also got an "invalid number of arguments" error
in one of the threads for no apparent reason. Unfortunately I
haven't been able to reproduce that since and I didn't record the
backtrace etc.
I hope to come up with examples that can be used to reproduce all
of these problems reliably, but since that may take a lot of time,
I thought to send this partial info in the meantime.
--
Hannu

Hi,
attached is a patch to make the x86-64 disassembler understand the
SSE instructions.
The SSE instruction definitions were, in my opinion, written using
too much code duplication. Instead of making this worse when adding
the printer clauses I rewrote the definitions completely using
macrolets like the other ports do. SSE is encoded more regularly
than I would have believed before this exercise ;-), so despite the
added functionality the source code size remains nearly the same.
What's left to do: Write printers for LDMXCSR and STMXCSR.
Besides the obvious, namely that SSE instructions are disassembled,
and some added comments, the following things change, too:
- MOVSS and MOVSD used to always emit a REX prefix. They do this no
longer when it is unnecessary, that is, when no extended register
is involved. For example:
Currently:
F2480F1058F9 MOVSD XMM3, [RAX-7]
With the patch:
F20F1058F9 MOVSD XMM3, [RAX-7]
- The assembler does more error checking, for example the
float-to-integer conversion instructions check that the destination
is indeed a general-purpose register and not an XMM register and that
it has a suitable size.
- In MOVSS and MOVSD the assembler no longer uses TN-P to decide which
argument to encode as reg and which as reg/mem. Instead it checks
which of the arguments are XMM-register TNs. This has two effects:
* It is now additionally possible to use these instructions with
a stack TN as the source operand. (The current version throws
an error in this case, while already supporting the move in the
opposite direction.) That is:
      (define-vop ...
        (:temporary (:sc signed-stack) stack-temp)
        (:temporary (:sc single-reg) register-temp)
        (:generator ...
          ;; Already supported, used for example in SINGLE-FLOAT-BITS:
          (inst movss stack-temp register-temp)
          ;; Additionally supported with the patch:
          (inst movss register-temp stack-temp)))
* (Visible but of no import, mentioned only for completeness):
Register-register moves are encoded using the other one of the two
possible encodings:
Currently:
F30F11C2 MOVSS XMM2, XMM0
With the patch:
F30F10D0 MOVSS XMM2, XMM0
- I have changed the assembly of the four instructions CVTSD2SI,
CVTSS2SI, CVTTSD2SI and CVTTSS2SI (conversion of floats to integers)
to use the size of the integer argument as the operand size. (Their
current implementation ignores the size of the integer argument and
always treats it as qword, but the conversion instructions that work
in the opposite direction, CVTSI2SD and CVTSI2SS, already honor the
integer argument's size.)
I have checked that currently in all uses of the float-to-integer
conversion instructions the integer operand is qword-sized, so this
change does not affect existing code.
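As a reading aid for the byte sequences quoted in the list above, here is a small sketch in portable Common Lisp that splits a ModRM byte into its fields. DECODE-MODRM and SIGNED-DISP8 are hypothetical helpers written for this note; they are not part of the patch or of SBCL's disassembler. The sketch assumes the standard x86-64 encodings: 48 is a REX prefix with only the W bit set, 0F 10 is the load form of MOVSS/MOVSD (reg is the destination) and 0F 11 is the store form (reg is the source).

```lisp
;; Hypothetical helper (not part of SBCL): split an x86-64 ModRM byte
;; into its mod, reg and r/m fields.
(defun decode-modrm (octet)
  (list :mod (ldb (byte 2 6) octet)
        :reg (ldb (byte 3 3) octet)
        :rm  (ldb (byte 3 0) octet)))

;; Hypothetical helper: interpret an 8-bit displacement as a signed number.
(defun signed-disp8 (octet)
  (if (logbitp 7 octet) (- octet 256) octet))

;; MOVSD XMM3, [RAX-7]: ModRM 58 has mod=1 (disp8 follows), reg=3 (XMM3),
;; rm=0 (RAX); the displacement byte F9 is -7.
(decode-modrm #x58)
(signed-disp8 #xF9)

;; MOVSS XMM2, XMM0 via opcode 0F 11: reg=0 (XMM0, source), rm=2 (XMM2).
(decode-modrm #xC2)

;; MOVSS XMM2, XMM0 via opcode 0F 10: reg=2 (XMM2, destination), rm=0 (XMM0).
(decode-modrm #xD0)
```

Read this way, the two register-register encodings differ only in which operand sits in the reg field, and the dropped 48 byte in the MOVSD example is the REX prefix that is redundant when no extended register is involved.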
Some internal changes with no visible effect:
- Because the SSE instructions now more strictly determine which
operand size to use, MAYBE-EMIT-REX-PREFIX and MAYBE-EMIT-REX-FOR-EA
no longer need to support :FLOAT and :DOUBLE as the OPERAND-SIZE
argument. WIDTH-BITS does not, either.
- I optimised the tests for float registers in REG-TN-ENCODING and
MAYBE-EMIT-REX-PREFIX, so that TN-P is no longer doubly checked and
(in REG-TN-ENCODING) the name of the SB of the SC of the TN is not
calculated twice.
With kind regards
Lutz Euler

On Fri, 28 Oct 2005, Rob MacLachlan wrote:
>> compiler has access to more information. For some reason
>> FLUSH-FULL-CALL-TAIL-TRANSFER explicitly forbids tail-calls to
>> functions when the return convention isn't :UNKNOWN.
> recursive. If a function that is returning with the unknown convention calls
> a function with known returns then it must suppress tail recursion so that
> it can receive the values and massage them into the unknown convention.
Thank you for the explanation.
So the actual question is how to make sure to use the unknown convention
in the first place, in order to preserve the tail call and avoid massaging
values.
The right place for this seems to be USE-STANDARD-RETURNS. The following patch
seems to do the trick:
Index: src/compiler/gtn.lisp
===================================================================
RCS file: /cvsroot/sbcl/sbcl/src/compiler/gtn.lisp,v
retrieving revision 1.17
diff -u -r1.17 gtn.lisp
--- src/compiler/gtn.lisp 14 Jul 2005 18:56:59 -0000 1.17
+++ src/compiler/gtn.lisp 29 Oct 2005 08:30:20 -0000
@@ -111,8 +111,7 @@
 (defun use-standard-returns (tails)
   (declare (type tail-set tails))
   (let ((funs (tail-set-funs tails)))
-    (or (and (find-if #'xep-p funs)
-             (find-if #'has-full-call-use funs))
+    (or (find-if #'has-full-call-use funs)
         (block punt
           (dolist (fun funs t)
             (dolist (ref (leaf-refs fun))
I *think* XEP-P is there as an optimization, as it is much cheaper than
HAS-FULL-CALL-USE. In the following case XEP-P isn't true for (FLET
TRICK) however:
(defun foo ())

(defun test ()
  (flet ((trick ()
           (foo)))
    (if *print-level*
        (trick)
        (trick)))
  nil)
I suppose the next question is whether the tail-set is complete or not:
FOO does not appear in there, just (FLET TRICK).
In any case SBCL with the above patch builds fine and passes all tests.
Cheers,
-- Nikodemus Schemer: "Buddha is small, clean, and serious."
Lispnik: "Buddha is big, has hairy armpits, and laughs."

Hi. I'm trying to use Kevin Rosenberg's kmrcl (which is a basis for CL-SQL,
among other things).
I have compiled SBCL 0.9.5 with threads on a Fedora Core 3 system.
I am trying to move the Elephant system to 0.9.5, along with other things;
this is a major hangup. I am not terribly experienced with SBCL; I could
be missing something really simple.
The following simple statement works on SBCL 0.8.18 on the same system,
but fails on 0.9.5:
SBCL 0.8.18
[read@... kmrcl-1.77]$ /usr/bin/sbcl
This is SBCL 0.8.18, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (defstruct (spud (:include #+sbcl sb-impl::file-stream)))
SPUD
*
SBCL 0.9.5
[read@... kmrcl-1.77]$ lisp
This is SBCL 0.9.5, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (defstruct (spud (:include #+sbcl sb-impl::file-stream)))
debugger invoked on a SIMPLE-ERROR in thread #<THREAD "initial thread" {9003449}>:
Class is not a structure class: FILE-STREAM
Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ABORT] Exit debugger, returning to top level.
(SB-KERNEL::COMPILER-LAYOUT-OR-LOSE FILE-STREAM)
0]

>
> Unless I'm grossly misunderstanding something, there's no inlining
> going on in your example regardless of which definition of FOO is
> used. The difference between the two cases is that one gets tail-call
> optimized, the other doesn't.
>
> Surprisingly the one that doesn't get optimized is the one where the
> compiler has access to more information. For some reason
> FLUSH-FULL-CALL-TAIL-TRANSFER explicitly forbids tail-calls to
> functions when the return convention isn't :UNKNOWN.
>
This is because in designing the tail-call support in Python I was
drawing on the Scheme implementation tradition where tail recursion is
not considered an optional optimization. All calls that can be tail
recursive must be tail recursive. If a function that is returning with
the unknown convention calls a function with known returns then it must
suppress tail recursion so that it can receive the values and massage
them into the unknown convention.
As I recall, SBCL has taken the position that since ANSI says nothing
about tail recursion then it is unimportant.
Rob

<mega@...> wrote:
> The five minute work of optimizing WITH-RECURSIVE-LOCK for the case when
> the current thread already has the lock quickly turned into confusion
> (reading annotation 3 is enough):
>
> http://paste.lisp.org/display/12880#3
>
> The executive summary is this: in the example, wrapping the same code in
> a function makes the program faster if the function is not inlined.
Unless I'm grossly misunderstanding something, there's no inlining
going on in your example regardless of which definition of FOO is
used. The difference between the two cases is that one gets tail-call
optimized, the other doesn't.
Surprisingly the one that doesn't get optimized is the one where the
compiler has access to more information. For some reason
FLUSH-FULL-CALL-TAIL-TRANSFER explicitly forbids tail-calls to
functions when the return convention isn't :UNKNOWN.
--
Juho Snellman

* Gábor Melis:
> The executive summary is this: in the example, wrapping the same code in
> a function makes the program faster if the function is not inlined.
Maybe this is a branch prediction issue?

Dear sbcl-devel,
A while back I suggested making it merely a warning, rather than an
error, for an sb-alien:enum to use a value more than once.
Thankfully, this patch was accepted. However, I don't think I went
quite far enough. I've finally gotten around to properly automating
the generation of the file containing the offending enum code and
working this into my asdf file so that this gets compiled
automatically. In this setting the warning is enough to drop me into
the debugger. Yes, I could catch the warning, but it would seem
better to just issue a style-warning, or perhaps no warning at all
for this instance. I know this is a rather trivial change, and my
apologies for essentially fixing the same thing twice, but I think
it's the right thing to do. Thanks for considering this patch.
Cyrus
Index: sbcl/src/code/host-alieneval.lisp
===================================================================
RCS file: /cvsroot/sbcl/sbcl/src/code/host-alieneval.lisp,v
retrieving revision 1.38
diff -u -r1.38 host-alieneval.lisp
--- sbcl/src/code/host-alieneval.lisp 15 Oct 2005 13:32:32 -0000 1.38
+++ sbcl/src/code/host-alieneval.lisp 26 Oct 2005 22:14:06 -0000
@@ -656,7 +656,7 @@
(unless (and max (> max val)) (setq max val))
(unless (and min (< min val)) (setq min val))
(when (rassoc val from-alist)
- (warn "The element value ~S is used more than once." val))
+ (style-warn "The element value ~S is used more than once." val))
(when (assoc sym from-alist :test #'eq)
(error "The enumeration element ~S is used more than once." sym))
(push (cons sym val) from-alist)))
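
For context on why the condition class matters: in ANSI Common Lisp STYLE-WARNING is a subtype of WARNING, and COMPILE-FILE reports failure for warnings other than style warnings, which is what build tools tend to react to. The following is a minimal sketch in portable Common Lisp of signalling a style warning for a repeated value and of muffling only that class. DUPLICATE-VALUE-NOTE, CHECK-VALUES and CHECK-VALUES-QUIETLY are made-up names for illustration; this is not the SB-ALIEN code.

```lisp
;; Hypothetical condition type, for illustration only.
(define-condition duplicate-value-note (style-warning simple-condition) ())

;; Hypothetical checker: signals a style warning for each repeated value
;; and returns the distinct values in order of first appearance.
(defun check-values (vals)
  (let ((seen '()))
    (dolist (val vals (nreverse seen))
      (if (member val seen)
          (warn 'duplicate-value-note
                :format-control "The element value ~S is used more than once."
                :format-arguments (list val))
          (push val seen)))))

;; A caller that wants silence can muffle just the style warnings,
;; leaving full warnings and errors alone.
(defun check-values-quietly (vals)
  (handler-bind ((style-warning #'muffle-warning))
    (check-values vals)))
```

Because the handler is bound to STYLE-WARNING rather than WARNING, anything more serious still propagates to the caller.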

The five minute work of optimizing WITH-RECURSIVE-LOCK for the case when
the current thread already has the lock quickly turned into confusion
(reading annotation 3 is enough):
http://paste.lisp.org/display/12880#3
The executive summary is this: in the example, wrapping the same code in
a function makes the program faster if the function is not inlined.
Gábor

I've pretty much come to the conclusion that the way I'd like to handle
the current problem of resignaling PACKAGE-LOCK-VIOLATIONs from the
compiler is...
1) Don't Do It. Document the fact that the compiler handles package lock
violations and the code at fault signals a runtime PROGRAM-ERROR.
2) Provide new operators
macro WITH-DISABLED-PACKAGE-LOCKS (&key symbols packages all) &body body
macro WITH-ENABLED-PACKAGE-LOCKS (&key symbols packages all) &body body
that are analogous to the DISABLE/ENABLE-PACKAGE-LOCKS declarations,
except that they operate in the dynamic scope, and can control package
locks on symbol, package, and global granularities. They also interact
with the declarations in a documented manner.
3) New declarations
DISABLED-PACKAGE-LOCKS &key symbols packages all
ENABLED-PACKAGE-LOCKS &key symbols packages all
that are just like the old DISABLE/ENABLE-PACKAGE-LOCKS declarations,
except that they can also manage granularities other than symbol.
4) Deprecate WITHOUT-PACKAGE-LOCKS, WITH-UNLOCKED-PACKAGES,
ENABLE-PACKAGE-LOCKS, and DISABLE-PACKAGE-LOCKS, retiring them
completely after a couple of releases.
How does this sound?
Cheers,
-- Nikodemus Schemer: "Buddha is small, clean, and serious."
Lispnik: "Buddha is big, has hairy armpits, and laughs."

On Tuesday 25 October 2005 19:25, Nathan Froyd wrote:
> > You mean all destructive operations? That's a fine approach to
> > take, but the consequences of thread unsafety should not be too
> > severe and gc lossage/deadlock is unacceptable.
>
> My approach has been similar to Svein's: if you are dealing with
> shared state, then you need a lock to protect writing *and* reading.
> Just assuming that locking is necessary only for writing is not
> sufficient.
Yes, you're right. It was sloppy wording on my part. Anyway, I'm seeking
agreement on the gc lossage/deadlock issue. It's unacceptable because
it takes ages to find the random memory corruption an unsafe hash table
can cause. It certainly took me almost two weeks to find it.
Later today (hopefully) I'll post a patch that deals with these issues
while trying to keep the performance impact to a minimum.
Gábor

On Tue, Oct 25, 2005 at 03:02:29PM +0200, Gábor Melis wrote:
> > Users should indeed
> > be capable of grabbing their own locks where required; just make sure
> > to specify which operations, if any, are atomic or otherwise
> > thread-safe.
> >
> > For the record: My current assumption is that all operations are
> > thread-unsafe. It has served me well.
>
> You mean all destructive operations? That's a fine approach to take, but
> the consequences of thread unsafety should not be too severe and gc
> lossage/deadlock is unacceptable.
My approach has been similar to Svein's: if you are dealing with shared
state, then you need a lock to protect writing *and* reading. Just
assuming that locking is necessary only for writing is not sufficient.
Caveat hash table user.
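
A minimal sketch of that discipline, assuming SBCL's SB-THREAD package: one mutex guards the table, and both the reader and the writer take it. The names *TABLE*, *TABLE-LOCK*, TABLE-GET and TABLE-PUT are illustrative only and are not an SBCL API.

```lisp
;; Assumes SBCL. All names defined here are hypothetical.
(defvar *table* (make-hash-table :test 'equal))
(defvar *table-lock* (sb-thread:make-mutex :name "table lock"))

(defun table-get (key)
  ;; Readers take the lock too: a concurrent writer may be in the
  ;; middle of modifying the table.
  (sb-thread:with-mutex (*table-lock*)
    (gethash key *table*)))

(defun table-put (key value)
  (sb-thread:with-mutex (*table-lock*)
    (setf (gethash key *table*) value)))
```

Funnelling every access through these two functions keeps the locking policy in one place, at the cost of taking the mutex on every operation.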
-- 
Nathan | From Man's effeminate slackness it begins. --Paradise Lost
The last good thing written in C was Franz Schubert's Symphony Number 9.
--Erwin Dieterich

asdf-install unconditionally sends a GET request that begins like
this:
GET http://some.site.example.com/foo/bar.tar.gz HTTP/1.0
The "http://..." bit is the "absoluteURI form".
However, RFC 2616 in section 5.1.2 says:
The absoluteURI form is REQUIRED when the request is being made to
a proxy. The proxy is requested to forward the request or service
it from a valid cache, and return the response. Note that the proxy
MAY forward the request on to another proxy or directly to the
server
[...]
The most common form of Request-URI is that used to identify a
resource on an origin server or gateway. In this case the absolute
path of the URI MUST be transmitted (see section 3.2.1, abs_path)
as the Request-URI [...]
While the RFC also says that HTTP/1.1 servers must accept the
absoluteURI form even if they're not acting as a proxy, in practice
there is at least one server that rejects the absoluteURI request from
asdf-install. (Try doing "(asdf-install:install :acl-compat)").
Attached is a patch that changes asdf-install to use the absoluteURI
form when talking to a proxy, but the absolute path of the URI
otherwise.
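
The rule can be sketched in a few lines of portable Common Lisp. This is not the attached patch; URL-ABS-PATH and REQUEST-LINE are hypothetical helpers, and the URL handling is deliberately naive (no query or fragment handling).

```lisp
;; Hypothetical helper, not part of asdf-install.
(defun url-abs-path (url)
  "Return the path part of URL, or \"/\" if it has none."
  (let* ((scheme-end (search "://" url))
         (host-start (if scheme-end (+ scheme-end 3) 0))
         (path-start (position #\/ url :start host-start)))
    (if path-start
        (subseq url path-start)
        "/")))

;; Hypothetical helper: use the absoluteURI form only when talking to a
;; proxy, and the absolute path otherwise.
(defun request-line (url &key proxy)
  (format nil "GET ~A HTTP/1.0" (if proxy url (url-abs-path url))))
```

For example, (request-line "http://some.site.example.com/foo/bar.tar.gz") produces a request line with /foo/bar.tar.gz, while passing :proxy t keeps the full URL. A client that sends only the path would normally also send a Host header so that name-based virtual hosts can be told apart.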
Zach

On Tuesday 25 October 2005 12:54, Svein Ove Aas wrote:
> > Thread safe by default hash tables should be expensive. The
> > attached test shows that simply adding locking makes (setf gethash)
> > ~1.5 times slower.
>
> How was this measured? Mutex locks are likely to be considerably more
> expensive on SMP systems, which are becoming more common these days.
The attached test program was run on a single cpu system.
> > However, using a recursive lock that's acquired for the entire
> > test and for each operation the speed penalty is negligible. Of
> > course it puts the burden of clawing back performance on the user
> > like this:
> >
> > (sb-ext:with-locked-hash-table (hash-table)
> > (frob-madly hash-table))
> >
> > or maybe
> >
> > (setf (sb-ext:hash-table-owner hash-table) *current-thread*)
>
> How about C?
>
> (sb-ext:with-locked-objects (hash-table other-thingy)
> (frob-madly hash-table)
> (touch-weirdly other-thingy))
To me this looks equivalent to:
(sb-ext:with-locked-hash-table (hash-table)
  (with-mutex (other-lock)
    (frob-madly hash-table)
    (touch-weirdly other-thingy)))
Associating a lock with each object à la Java, as your example suggests,
is an idea I don't like: holding a reference to an object and having
the ability to lock it should be different things.
>
> > I prefer A) in the long run, while B) has the endearing quality of
> > being easily implementable.
>
> I prefer C, since it has the appeal of generality, but I'd go for B.
> A is right out - increasing hash-table costs by 50% would have people
> writing their own tables just to avoid locking.
I think you have A) and B) mixed up.
> Users should indeed
> be capable of grabbing their own locks where required; just make sure
> to specify which operations, if any, are atomic or otherwise
> thread-safe.
>
> For the record: My current assumption is that all operations are
> thread-unsafe. It has served me well.
You mean all destructive operations? That's a fine approach to take, but
the consequences of thread unsafety should not be too severe and gc
lossage/deadlock is unacceptable.
Gábor

I don't have a nice reproducible test to demonstrate that the patch
is required, but I can say that empirical testing involving dropping
into gdb at opportune times suggests that without the patch the stack
is indeed misaligned in callbacks and C code called from callbacks.
Clearly, some solid evidence would be better than this hearsay, but
that's all I've got at the moment.
Cyrus
On Oct 25, 2005, at 1:34 AM, Christophe Rhodes wrote:
> Nikodemus Siivola <tsiivola@...> writes:
>
>
>> Thanks. I'm still slightly confused as to why this matters: after the
>> callback (modulo funcall3) we're in Lisp, which doesn't require
>> the 16 byte alignment.
>>
>> Does the misaligned callback cause further calls out to C to be also
>> misaligned, or where's the rub?
>>
>
> I think it's possible -- I'd certainly like to see a test case for
> lisp -> C -> lisp -> C, preferably in a way which would excite the
> 16-byte alignment (so maybe the second C call could be the equivalent
> of "return *reg_NFP;", if we can't think of another way).
>
> In any case, given that it's currently relatively straightforward, if
> pathological, to call out to C at more-or-less any time[*], I think it
> would be worth keeping the C stack 16-byte aligned at all times; I
> don't know if Cyrus' patch is required for that or not, but it
> plausibly is.
>
> Cheers,
>
> Christophe
>

Nikodemus Siivola <tsiivola@...> writes:
> Thanks. I'm still slightly confused as to why this matters: after the
> callback (modulo funcall3) we're in Lisp, which doesn't require the 16
> byte alignment.
>
> Does the misaligned callback cause further calls out to C to be also
> misaligned, or where's the rub?
I think it's possible -- I'd certainly like to see a test case for
lisp -> C -> lisp -> C, preferably in a way which would excite the
16-byte alignment (so maybe the second C call could be the equivalent
of "return *reg_NFP;", if we can't think of another way).
In any case, given that it's currently relatively straightforward, if
pathological, to call out to C at more-or-less any time[*], I think it
would be worth keeping the C stack 16-byte aligned at all times; I
don't know if Cyrus' patch is required for that or not, but it
plausibly is.
Cheers,
Christophe

Fine by me. I was just aping what rtoy had done for cmucl, minus the
new stack frame and all.
Cyrus
On Oct 25, 2005, at 1:15 AM, Christophe Rhodes wrote:
> Cyrus Harmon <ch-sbcl@...> writes:
>
>
>> Great! So making sure that the callback stack is aligned on a 16-byte
>> boundary should be sufficient. Here's a patch to do just that:
>>
>
>
>> + (round-up-16 (n) (* 16 (ceiling n 16))))
>>
>
> I think the 'standard' sbcl idiom for doing this is
> (logandc2 (+ n 15) 15)
> which may or may not be microefficient, but in any case is vaguely
> consistently used over the codebase.
>
> Cheers,
>
> Christophe
>

Well, the last time I tried to fix the alignment when calling out to
C, I was told to find the underlying problem and fix it. In that same
vein, I'm trying to keep the number stack 16-byte aligned for
callbacks as well. Yes, the only (known) danger is on further calls
out to C, but my patch to fix the alignment down there was never
incorporated. I'm not sure if it matters while we're strictly in
lisp, but Christophe seemed to indicate a preference for making sure
the stack stays aligned in the first place.
Thanks again,
Cyrus
On Oct 25, 2005, at 1:13 AM, Nikodemus Siivola wrote:
> On Tue, 25 Oct 2005, Cyrus Harmon wrote:
>
>
>> Great! So making sure that the callback stack is aligned on a
>> 16-byte boundary should be sufficient. Here's a patch to do just that:
>>
>
> Thanks. I'm still slightly confused as to why this matters: after
> the callback (modulo funcall3) we're in Lisp, which doesn't require
> the 16 byte alignment.
>
> Does the misaligned callback cause further calls out to C to be
> also misaligned, or where's the rub?
>
> Cheers,
>
> -- Nikodemus Schemer: "Buddha is small, clean, and serious."
> Lispnik: "Buddha is big, has hairy armpits, and laughs."
>

Cyrus Harmon <ch-sbcl@...> writes:
> Great! So making sure that the callback stack is aligned on a 16-byte
> boundary should be sufficient. Here's a patch to do just that:
> + (round-up-16 (n) (* 16 (ceiling n 16))))
I think the 'standard' sbcl idiom for doing this is
(logandc2 (+ n 15) 15)
which may or may not be microefficient, but in any case is vaguely
consistently used over the codebase.
Cheers,
Christophe
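
For readers comparing the two forms, a small sketch in portable Common Lisp that checks they agree. The function names are made up for this note; only the two bodies come from the messages above.

```lisp
;; The form from the patch under discussion.
(defun round-up-16/ceiling (n)
  (* 16 (ceiling n 16)))

;; The idiom suggested in the reply: add 15, then clear the low four bits.
(defun round-up-16/mask (n)
  (logandc2 (+ n 15) 15))

;; True when both forms agree on every size in the checked range.
(loop for n from 0 to 4096
      always (= (round-up-16/ceiling n) (round-up-16/mask n)))
```

The masking form only works because 16 is a power of two; the CEILING form would also work for other multiples.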