SB-UNIX is an internal implementation package. If you use
functionality provided by it, sooner or later your code will break
when SBCL changes its internals: the package is subject to change
without notice. When we stop using a function that used to live there,
that function gets deleted. Things may also change names and
interfaces.

CL-USER> (documentation (find-package :sb-unix) t)
"private: a wrapper layer for SBCL itself to use when talking
with an underlying Unix-y operating system.
This was a public package in CMU CL, but that was different.
CMU CL's UNIX package tried to provide a comprehensive,
stable Unix interface suitable for the end user.
This package only tries to implement what happens to be
needed by the current implementation of SBCL, and makes
no guarantees of interface stability."

Instead, use either SB-POSIX (which is the supported
external API), or call the foreign functions directly. Alternatively,
if you're using something from SB-UNIX that doesn't have a
counterpart in SB-POSIX or elsewhere, put a feature request / bug
report on Launchpad
explaining what you need. Just saying "wanted: a supported equivalent
of SB-UNIX:UNIX-FOO" is enough.
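As a concrete illustration (my example; getpid happens to exist in
both packages), prefer the supported SB-POSIX spelling:

```lisp
;; Supported: SB-POSIX wraps the POSIX API under stable names.
(require :sb-posix)

(defun current-pid ()
  ;; sb-posix:getpid is part of the supported external API...
  (sb-posix:getpid))

;; ...whereas SB-UNIX:UNIX-GETPID is an internal that may be
;; renamed or deleted in any release.
```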

(The same holds more or less for all internal packages, of course,
but SB-UNIX is the most common offender.)

I realize this is an imperfect world, and sometimes using an
unsupported API is the best thing you can do, but please try to
avoid this especially in libraries used by other people as well.

Logically speaking, POSITION with trivial :KEY
and :TEST arguments should be much faster on bit-vectors than
on simple vectors: the system should be able to pull one word's worth
of bits out of the vector in a single go, check if any are set (or
unset), and if so locate the one we're interested in -- otherwise
moving on to grab the next word.
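To sketch the idea in portable code (this is not SBCL's
implementation, which reads raw words straight out of the vector;
here the "word read" is simulated by packing 64 bits into an
integer):

```lisp
;; Word-at-a-time POSITION for bit 1 in a bit-vector: grab a word's
;; worth of bits, skip it wholesale if none are set, otherwise locate
;; the lowest set bit with integer arithmetic.
(defun position-1-wordwise (bits)
  (let ((len (length bits)))
    (do ((start 0 (+ start 64)))
        ((>= start len) nil)
      (let ((word 0))
        ;; Simulated "memory read" of up to 64 bits.
        (loop for i from start below (min (+ start 64) len)
              do (setf word (logior word (ash (bit bits i) (- i start)))))
        (unless (zerop word)
          ;; (logand word (- word)) isolates the lowest set bit;
          ;; INTEGER-LENGTH then gives its position + 1.
          (return (+ start (1- (integer-length (logand word (- word)))))))))))
```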

Practically speaking, no-one who needed fast POSITION on
bit-vectors seems to have cared enough to implement it, and so until
yesterday (1.0.54.101) SBCL painstakingly pulled things one bit at a
time from the vector, creating a lot of unnecessary memory traffic and
branches.

How much of a difference does this make? I think the technical term
is "quite a bit of a difference." See here
for the benchmark results. First chart is from the new implementation,
second from the old one. Other calls to POSITION are included
for comparison: ones prefixed with generic- all go through
the full generic POSITION, while the others know the type of
the sequence at the call-site, and are able to sidestep a few
things.

So, if you at some point considered using bit-vectors, but decided
against them because POSITION wasn't up to snuff, now might
be a good time to revisit that decision.

Gory details at the end of src/code/bit-bash.lisp, full
story (including how the system dispatches to the specialized version)
best read from git.

Also, if you're looking for an SBCL project for next year, consider
the following:

Using a similar strategy for POSITION on base-strings:
on a 64-bit system one memory read will net you 8 base-chars.

Using similar strategy for POSITION on all vectors
with element-type width of half-word or less.

Improving the performance of the generic POSITION for
other cases, using eg. specialized out-of-line versions.

SBCL 1.0.54 is
barely out of the door, but I'm actually going to mention something
that went in the repository today, and will be in the next
release:

(TL;DR: Threads on Darwin are looking pretty solid right now. Go
give them a shake and let me know what falls out.)

commit 8340bf74c31b29e9552ef8f705b6e1298547c6ab
Author: Nikodemus Siivola
Date: Fri Nov 18 22:37:22 2011 +0200
semaphores in the runtime
Trivial refactorings:
* Rename STATE_SUSPENDED STATE_STOPPED for elegance. (Spells with
the same number of letters as STATE_RUNNING, things line up
nicer.)
* Re-express make_fixnum in terms of MAKE_FIXNUM so that we can
use the latter to define STATE_* names in a manner acceptable to
use in switch-statements.
* Move Mach exception handling initialization to darwin_init from
create_initial_thread so that current_mach_task gets initialized
before the first thread struct is initialized.
The Beef:
Replace condition variables in the runtime with semaphores.
On most platforms use sem_t, but on Darwin use semaphore_t. Hide
the difference behind os_sem_t, os_sem_init, os_sem_destroy,
os_sem_post, and os_sem_wait.
POSIX realtime semaphores are supposedly safe to use in signal
handlers, unlike condition variables -- and experimentally at
least Mach semaphores on Darwin are a lot less prone to
problems.
(Our pthread mutex usage isn't quite kosher either, but it's the
pthread_cond_wait and pthread_cond_broadcast pair that seemed to
be causing most of the trouble.)

(There are some other neat things lurking in HEAD in addition to this, but
I'll let you discover them for yourself.)

Features of Common Lisp. Abhishek Reddy used to have a page
up on the topic, based on Robert Strandh's list. It's been down for a
while now, so I rescued a copy from the Wayback Machine and put it
up. So: Features
of Common Lisp.

Reporting Bugs, Howto. I think it is actually a good thing
that I need to say this, because it tends to be a sign of new people
in the community, but if you've never read it, go now and
read Simon Tatham's How to
Report Bugs Effectively.

TL;DR. *sigh* Short version: provide directions to
reproduce such that your idiot cousin could follow them while drunk.
Don't be afraid of giving too much detail. Don't speculate on causes.

Specific hints.

Use (lisp-implementation-version) to check the
version of the Lisp you're actually running.

Use "uname -a" to get information about the OS and architecture
you're running on.

When providing information, copy-paste as much as possible
directly from the terminal or Emacs.
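For the Lisp side, a one-liner like this (my suggestion, nothing
official) collects the basics worth pasting:

```lisp
;; Print implementation and platform info for a bug report.
;; All of these functions are standard CL, so this works on any
;; conforming implementation, not just SBCL.
(format t "~A ~A on ~A, ~A ~A~%"
        (lisp-implementation-type)
        (lisp-implementation-version)
        (machine-type)
        (software-type)
        (software-version))
```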

SBCL 1.0.54 due in a few days. This means we're in our
monthly freeze, and testing is much appreciated. This month's release
contains a lot of changes -- including plenty of threading
work.

Microbench. A while ago I mentioned a microbenchmarking suite
I'd been working on on-again, off-again. It's still not much to look
at, and comes with zero documentation -- but curious souls can now get it from Github.
It should work on SBCL, CMUCL, CCL, and LispWorks. CLISP and
ACL are not tested yet, but a port should be fairly trivial.

What Microbench Is Good For, and Why You Should Not Trust
Benchmarks at All. Look
here. Pay special attention to double-sans-result+ and
double-unsafe-sans-result+. When I published some of the
results earlier, I was a bit surprised that they didn't perform much
better than double+. Then I ran the same benchmarks on CCL
and saw it running those two benchmarks 5 times faster!

With a bit of help from Paul Khuong the difference turned out to be
SBCL's loading of floating-point constants, which is heavily optimized
for inline use. I have a pending patch that makes this smarter, whose
effect you can see at the link above.

The moral of "be sure what you're /really/ benchmarking" is an old
one, but bears repeating. What makes microbenchmarks attractive to me,
however -- despite their many shortcomings -- is that when something
turns out slow (in comparison to another implementation, a previous
version of SBCL, or another comparable benchmark operation) it tends
to be easier to figure out the cause than with a macrobenchmark.

You probably also noticed that CCL seems to do really badly
at inline floating point arithmetic if my benchmarks are to be trusted.
They're not. I'm 99% sure this is a case of the something specific in
the way those benchmarks are implemented heavily pessimizing them for
CCL.

I've been working on and off on a new microbenchmark tool --
primarily for SBCL, but usable for other implementations as well.
Last night I finally got around to teaching it how to generate
pretty pictures, using Google Visualization API, and wrote a number
of microbenchmarks that show the variety of numeric performance
in SBCL.

Each benchmark does a roughly comparable task: adds two
numbers. What varies is what the types of the numbers are, and how
much the compiler knows about the situation. (In some benchmarks there
may be an extra comparison or so per iteration to keep the compiler
from recognizing the code as effect-free and flushing it.) There are
basically four classes of performance:

Superb: Modular inline integer arithmetic. This is
performance essentially identical with what you'd expect from C
or ASM.

Good: Compiler knows the types, the argument types are
inline-arithmetic-friendly, the result type is not in doubt
(unlike fixnum addition, where the result can be a bignum), and
the function doing the addition is inlined at the site where the
result is unboxed and used.

Decent: Compiler knows the types, the types are
inline-arithmetic-friendly and have an immediate representation,
but the function doing the addition is out of line.

Bad: Generic arithmetic on anything but fixnums small
enough for the result to also be a fixnum is just not that great.

What should be of interest to anyone optimizing floating point
performance is that type-checking doesn't really cost anything
measurable most of the time. All of those benchmarks do full type
checks except for double-unsafe-sans-result+, and the
gain over the safe variant is minuscule.

What matters is that you generate inline
arithmetic so that your floating points don't get boxed. On x86-64
SBCL has immediate single-floats, so occasional boxing isn't quite as
disastrous (compare single+ and double+), but
getting rid of the boxed representations completely is a huge win --
just compare single+ to complex-double-inline+.
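As a sketch of what "generating inline arithmetic" looks like in
practice (my example, not one of the benchmark kernels): with the
element type declared and optimization requested, the additions
compile to inline float instructions and the accumulator stays
unboxed inside the loop.

```lisp
(defun sum-doubles (v)
  (declare (type (simple-array double-float (*)) v)
           (optimize (speed 3)))
  ;; ACC lives in a float register inside the loop; no boxing
  ;; happens until the final value is returned.
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (length v) acc)
      (incf acc (aref v i)))))
```

Comparing DISASSEMBLE output of this against a version without the
declarations shows the difference between inline and generic
arithmetic directly.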

Postscript: I know not everyone reading this will be clear on
unboxed, boxed, immediate, non-immediate, etc. My apologies. I will
try to remedy the situation and write about the different
representations and how and why they matter at a later date.

Post-Postscript: I will be publishing the benchmark tool once it
settles down, and once I have a chance to test-drive it with
something besides SBCL. Could be a while, though. If you urgently
need it, get in touch and we'll arrange something.

Yesterday I said SBCL now had extensible CAS. What are
paranormal tape voices, what is CAS, and why should you care
if it's extensible? Turn off the music and I'll tell you.

CAS is short for compare-and-swap. Compare-and-swap is a fairly
common atomic operation. It compares the current value of a memory
location with another value, and if they are the same it replaces the
value of that memory location with a specified new value.

Depending on the language and the exact design of the interface, it
might just return true for success, or it might return the old
value. In SBCL it does the latter, which is sometimes very convenient,
but also means you need to compare the return value to the
expected-old-value you specified to know if CAS succeeded.

Because it is atomic, if you have two threads doing CAS on the
same memory location in parallel, only one can succeed.
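For instance (a sketch using SB-EXT:COMPARE-AND-SWAP, which in SBCL
returns the old value):

```lisp
(defvar *cell* (cons 0 nil))

;; Succeeds: the car really is 0, so it becomes 1. Returns 0.
(sb-ext:compare-and-swap (car *cell*) 0 1)

;; Fails: the car is now 1, not 0, so nothing changes. Returns 1,
;; which differs from the expected old value 0 -- that's how you
;; detect the failure.
(sb-ext:compare-and-swap (car *cell*) 0 2)

(car *cell*) ; still 1
```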

If you have threads on your mind at all, you
can imagine how this can be quite useful.

Out of the box current bleeding edge SBCL supports CAS on a number
of places: car, cdr, first, rest,
svref, slot-value,
standard-instance-access,
funcallable-standard-instance-access, symbol-value,
symbol-plist, and defstruct-defined slot accessors
with slot types fixnum and t. (Note:
slot-value is not currently supported by CAS if
slot-value-using-class or friends are involved -- that's
still in the works.)

With the exception of slot-value all of those
pretty much come down to a single LOCK CMPXCHG instruction
on Intel architectures.

...but what if you have a data structure -- say a queue of some
sort -- and want to implement cas-queue-head which does CAS
on the first element of the queue. Fine. You can do that without
any CAS support from the implementation by using eg. a lock.

...but what if you want to write a macro that operates on a CAS-able
place?

If such a macro naively evaluates its place form more than once,
then with a place like (aref v (incf i)) the index I, instead of
increasing by 1 each time through the loop and iterating across the
whole vector, could increase by who-knows-how-many on a single
attempt, skipping entries and even running out of bounds. Ouch.
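To make the hazard concrete, here is a hypothetical naive
NAIVE-ATOMIC-INCF (names and code mine, not SBCL's) that splices its
place form in twice:

```lisp
;; BROKEN in general: PLACE is spliced in twice, so its subforms
;; are evaluated on every read and again inside the CAS, on every
;; retry.
(defmacro naive-atomic-incf (place)
  `(loop for old = ,place
         until (eq old (sb-ext:compare-and-swap ,place old (1+ old)))))

;; Harmless for a side-effect-free place like (car cell), but with
;;   (naive-atomic-incf (svref v (incf i)))
;; the (incf i) runs several times per attempt.
```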

Turns out that to write a macro that operates on a CASable place
you need something analogous to get-setf-expansion, except
for CAS instead of SETF. As of yesterday, SBCL has
sb-ext:get-cas-expansion that you can use to write a macro
like my-atomic-incf correctly and safely.
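A sketch of a correct MY-ATOMIC-INCF on top of it (my code, not
SBCL's; it assumes the incremented value is a fixnum so EQ is a
valid success test):

```lisp
(defmacro my-atomic-incf (place &optional (delta 1) &environment env)
  (multiple-value-bind (vars vals old new cas-form read-form)
      (sb-ext:get-cas-expansion place env)
    ;; VARS/VALS evaluate the place's subforms exactly once, up front.
    `(let* (,@(mapcar #'list vars vals))
       ;; Retry loop: CAS-FORM returns the value it saw, which is EQ
       ;; to OLD exactly when our swap went through.
       (loop for ,old = ,read-form
             for ,new = (+ ,old ,delta)
             until (eq ,old ,cas-form)
             finally (return ,new)))))
```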

SBCL also lets you define CAS expansions for your own places,
making them first-class citizens on an equal footing
with the baked-in ones -- so that

(my-atomic-incf (queue-head queue))

will Just Work. (Assuming your cas-queue-head works, of
course.)

I think that's pretty nifty. I'm still looking at adding support
for (cas slot-value-using-class), which will be even niftier.
Who says there's no innovation in open source? (Maybe I'm feeling a
bit hubristic right now. I'll come down soon enough when the first
bug-reports hit the fan.)

Killing lutexes, replacing them with userspace synchronization.
This mainly affects non-Linux platforms. Performance implications
are a mixed bag, depending on what you're doing: some things
perform an order of magnitude better, some about the same,
some somewhat worse. The stability improvements on Darwin
are well worth the costs, though, and I will try to address the
cases where performance suffers in due course.