Revision as of 21:44, 20 June 2009

After CAR the APs should be stopped until CPU init time, which is relatively short.

v2 has, or used to have, working locking code since it was first ported to
Opteron. It may have broken while 5 more printks were added, but it is
there somewhere.

Making the BSP poll for the APs (which is what we would do if we needed to
check the APs' shared memory) basically renders the BSP unable to do
anything else while waiting for the APs.

With simple locking, everything can run in parallel, and only the serial
output needs to be synchronized, which is what we actually want.
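That locking scheme can be sketched as below; this is a minimal illustration using GCC atomic builtins, not coreboot's actual implementation (the real spinlock lives in arch-specific headers, and the names here are illustrative):

```c
#include <stdio.h>
#include <stdarg.h>

/* Minimal spinlock sketch using GCC atomic builtins; coreboot's real
 * spinlock is arch-specific and these names are illustrative. */
typedef struct { volatile int locked; } spinlock_t;

static void spin_lock(spinlock_t *s)
{
	while (__sync_lock_test_and_set(&s->locked, 1))
		;  /* spin until the current holder releases the lock */
}

static void spin_unlock(spinlock_t *s)
{
	__sync_lock_release(&s->locked);
}

static spinlock_t console_lock;

/* Only the serial output is serialized; the CPUs keep running in
 * parallel between printk calls. */
static int printk(const char *fmt, ...)
{
	va_list args;
	int n;

	spin_lock(&console_lock);
	va_start(args, fmt);
	n = vprintf(fmt, args);
	va_end(args);
	spin_unlock(&console_lock);
	return n;
}
```

The point is that the lock is held only for the duration of one message, so AP output interleaves per line rather than per character.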

There's no real problem, we've just
been doing too much cut and paste in the past without testing the new
code. This made us end up with different versions of printk, some with
locking, some without.

And, no, porting the code from v3 over is not an option at this point.
It does too much different stuff. Let's rather start dropping unneeded
implementations in v2 until things look sane again and then we can
decide what implementation we want.

So each AP has some part of RAM to copy the buffer to?

The way SMP works, the BSP sets up its ram. At that point, the APs can
use the BSP ram. That's why APs have a stack in the first place.

APs have a working stack when they are setting up their own RAM.

The way this works on amd64 is that the AP comes up, enters cache-as-ram,
finds it is an AP, and goes to sleep again.
Then it wakes up again in stage2 when the BSP sends an IPI. At this
point (at least remote) RAM is available. They never set up their own
ram (in terms of Jedec init, or setting up a ram controller), but only
have to clear it, in case of ECC memory.
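That startup order can be modeled as a small state machine; the sketch below is a toy illustration of the sequence described above, and none of the names are actual coreboot symbols:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the amd64 startup order: the AP enters cache-as-ram,
 * parks itself, and only touches ram after the BSP's stage2 IPI.
 * All names are illustrative, not coreboot symbols. */
enum ap_state { AP_IN_CAR, AP_PARKED, AP_CLEARING_RAM, AP_DONE };

struct cpu {
	bool is_bsp;
	enum ap_state state;
};

/* AP side: nothing here sets up a ram controller; the AP just parks. */
static void ap_early(struct cpu *c)
{
	c->state = AP_IN_CAR;      /* cache-as-ram entry, shared with the BSP path */
	if (!c->is_bsp)
		c->state = AP_PARKED;  /* found out it is an AP: back to sleep */
}

/* BSP side, stage2: wake the AP once BSP ram is up. The AP only has to
 * clear its share of memory (needed for ECC), never initialize it. */
static void bsp_wake_ap(struct cpu *c)
{
	assert(c->state == AP_PARKED);
	c->state = AP_CLEARING_RAM;
	c->state = AP_DONE;
}
```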

The pre-ram locking can't be done with a stack, because the cache between CPUs is not necessarily in the same state.

Well, that may be true on Intel stuff. The AMD startup (at least as I
understand it) depends on the BSP memory
being functional enough to provide the APs with a stack.

The post-ram code does not need it; it works quite nicely already.

Actually, this is only partially true. It is still possible for a
malfunctioning AP to lock the BSP out. It's just not something we've
seen much of.

The point of the v3 early init code and stack structure was to stay as
close to the old behavior as possible, while also allowing the BSP to
better monitor (and control) what was going on. I would still claim the
structure of the code is a big improvement. The v2 SMP startup is not an
easy read.

I was wondering how you make the APs not conflict in the part of RAM they
copy their buffers to. I was also wondering how it would affect
interleaving, etc. That kind of thing seems difficult to debug, and is the
reason I'd want to see the APs' messages.

You give each AP a separate stack. All that code is in there already.

The struct-based stack is a direct copy of the v2 startup. Rather than
using lots of fiddly offsets onto a memory area, it provides a struct
which contains variables and a stack. The variables are shared between
the AP and the BSP. It is a more C-like way to do it.
Once I did it, I realized that the on-stack variables could be used as
a communication path from the AP to the BSP, the most important one
being a POST variable that could be set by the AP
and monitored by the BSP. This is a bit better than what we have in
v2, where we get one bit back from the AP which tells us "done" or
"not done". No real progress indication
is available. At some point, the BSP times out the AP, but there is no
error code. Plus, the way in which the shared variable is set up in v2
is not very straightforward.
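A sketch of that struct-based per-AP area is below; the field names and the stack size are illustrative, not the actual v3 layout. Because the struct lives in BSP ram, the BSP can watch the AP's POST progress instead of getting back a single done/not-done bit:

```c
#include <stdint.h>

#define AP_STACK_SIZE 4096  /* illustrative size */

/* Sketch of a struct-based per-AP area: variables shared between AP and
 * BSP, plus the AP's private stack, all in one C struct in BSP ram. */
struct ap_area {
	volatile uint8_t post_code;    /* written by the AP, read by the BSP */
	volatile uint8_t error;        /* nonzero if the AP hit a failure */
	uint8_t stack[AP_STACK_SIZE];  /* the AP's private stack */
};

/* AP side: report progress as init proceeds. */
static void ap_report(struct ap_area *a, uint8_t post)
{
	a->post_code = post;
}

/* BSP side: bounded poll; returns the last POST value seen, so even a
 * timeout tells you how far the AP got before it stopped. */
static uint8_t bsp_wait_for(struct ap_area *a, uint8_t target, long spins)
{
	while (spins-- > 0 && a->post_code != target)
		;  /* real code would pause briefly here */
	return a->post_code;
}
```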

Many systems only have
one memory controller. But on all coreboot systems, even those with
multiple memory controllers, the controllers are all set up by the BSP.
Parallelizing here makes very little sense.
The reason we're parallelizing is that memory has to be cleared if you
have ECC on some (older?) systems where the memory controller is
incapable of doing that automatically. So when we have to clear 32G at a
rate of 3 or 6 GB/s, we want to spread that load over several CPUs, so it
takes 1-2 s instead of 5-10. But the ram is all there at this point, and
can be transparently accessed by all CPUs.
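Splitting the clear is straightforward once all of ram is visible to every CPU; a sketch, with illustrative names:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of splitting the ECC clear across CPUs: each CPU zeroes an
 * equal slice of the already-initialized ram, so e.g. 32G at 3-6 GB/s
 * per CPU drops from 5-10 s to 1-2 s with a handful of CPUs.
 * Names are illustrative. */
static void clear_ecc_slice(uint8_t *base, uint64_t total,
			    int cpu, int ncpus)
{
	uint64_t slice = total / ncpus;
	uint64_t start = (uint64_t)cpu * slice;
	/* the last CPU also takes any remainder */
	uint64_t len = (cpu == ncpus - 1) ? total - start : slice;

	memset(base + start, 0, len);
}
```

Each CPU calls this with its own index, so the slices never overlap and no locking is needed for the clear itself.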

Other than that, we could unify those versions of printk by just defining
an empty (for now) version of the spinlock functions in the raminit
stage. Then think about where we can place our locking for those
platforms that need it this early...?
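That unification idea could look roughly like this; the sketch pretends the translation unit is being built for the raminit stage, and uses `__PRE_RAM__` purely as an illustrative stage marker:

```c
/* Sketch of the unification idea: give the spinlock API empty bodies in
 * the pre-ram (raminit) stage so a single printk source compiles in
 * every stage. __PRE_RAM__ is defined here only to simulate building
 * the raminit stage. */
#define __PRE_RAM__ 1

#ifdef __PRE_RAM__

typedef struct { int dummy; } spinlock_t;

/* empty (for now): no locking before ram is up */
static inline void spin_lock(spinlock_t *s)   { (void)s; }
static inline void spin_unlock(spinlock_t *s) { (void)s; }

#else

/* ram stage: the real, arch-specific spinlock implementation goes here */

#endif
```

With this in place, one printk implementation can call spin_lock()/spin_unlock() unconditionally, and the raminit build simply compiles the calls away.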

It's more than the spinlock:
you must also fix the use of the console_drivers struct. There are
several things you need to get right to make this work.