On 04.01.2010, at 11:45, Isaku Yamahata wrote:
> On Mon, Jan 04, 2010 at 04:26:46AM +0100, Alexander Graf wrote:
>> On 03.01.2010, at 21:50, Benjamin Herrenschmidt wrote:
>>> On Sun, 2010-01-03 at 21:27 +0100, Alexander Graf wrote:
>>>> I think if unin_pci is the only user, it'd be better to do it hacky
>>>> inside unin_pci.c. But if there's a chance there's another user, it'd
>>>> be better to make it generic.
>>>>
>>>> Since this is the first time I ever stumbled across type 0 and type 1
>>>> PCI config space addresses, I simply don't know if there are any. Blue
>>>> Swirls comment indicated that there are. Ben also sounded as if it's
>>>> rather common to not use the x86 format. On the other hand, it looks
>>>> like nobody in qemu needed it so far - and we're implementing ARM,
>>>> MIPS and all other sorts of platforms.
>>>>
>>>> So if anyone with knowledge in PCI could shed some light here, please
>>>> do so.
>>>
>>> My feeling is that what you're better off doing is to have the qemu core
>>> take an abstract struct to identify a device config space location, that
>>> consists of separate fields for:
>>>
>>> - The PCI domain (which is what host bridge it hangs off since bus
>>> numbers are not unique between domains)
>>>
>>> - The bus number
>>
>> Hm, I think it'd make more sense to just store a PCIBus pointer in there.
>> We could then fetch the bus and domain id from there.
>>
>> I'll write something up :-).
>>
>> Alex
>
> Does the following patch help?
> I did only compile test though.
I sent out v2 already, which contains a complete resolution framework. It also allows for incremental cleanup of the users, as I'd rather like to see everyone use pci_host instead of hacky homegrown functions. But I don't think changing all at once is going to fly wrt reviewability.
I'd be really glad if you could take a glimpse at my version. You're definitely more knowledgeable in the PCI areas than me :-). I verified that it fixes Uninorth and x86 still works.
Alex
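As a rough illustration of the abstract struct Ben describes (all names here are hypothetical, not QEMU's actual API), the config-space location could look something like this in C, with each host bridge supplying its own decoder for its register format:

```c
#include <stdint.h>

/* Hypothetical sketch of an abstract config-space address: instead of
 * every caller parsing the x86 cf8 encoding, the host bridge decodes
 * its own register format into this struct. */
typedef struct PCIConfigAddress {
    int domain;   /* which host bridge the device hangs off */
    int bus;      /* bus number, only unique within a domain */
    int devfn;    /* device/function number */
    int reg;      /* register offset within config space */
} PCIConfigAddress;

/* Example decoder for the classic x86 type-1 CONFIG_ADDRESS (cf8)
 * layout: bits 23..16 bus, 15..8 devfn, 7..2 register. Other bridges
 * (e.g. Uninorth's type 0 / type 1 scheme) would supply their own. */
static PCIConfigAddress decode_cf8(uint32_t cf8, int domain)
{
    PCIConfigAddress a;
    a.domain = domain;
    a.bus    = (cf8 >> 16) & 0xff;
    a.devfn  = (cf8 >> 8)  & 0xff;
    a.reg    = cf8 & 0xfc;
    return a;
}
```

Alex's variant would replace the domain/bus pair with a PCIBus pointer and derive both from it.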

On Mon, Jan 04, 2010 at 11:55:10AM +0100, Alexander Graf wrote:
> > On 04.01.2010, at 11:45, Isaku Yamahata wrote:
> [...]
> > Does the following patch help?
> > I did only compile test though.
>
> I sent out v2 already, which contains a complete resolution framework. It
> also allows for incremental cleanup of the users, as I'd rather like to
> see everyone use pci_host instead of hacky homegrown functions. But I
> don't think changing all at once is going to fly wrt reviewability.
Agreed. Anyway that patch is just for discussion, not for commit.
If wanted, I'm willing to split it up into a series of patches or
rebase it on top of your patches. (Or throw it away)
> I'd be really glad if you could take a glimpse at my version. You're
> definitely more knowledgeable in the PCI areas than me :-). I verified
> that it fixes Uninorth and x86 still works.
I'll have a look at it tomorrow.

On Mon, 2010-01-04 at 13:07 +0200, Michael S. Tsirkin wrote:
> BTW, I think we really should think about the right way to address the
> swap/noswap issue without using a preprocessor. Maybe make pci host
> bridge explicitly specify whether to swap bytes? How about adding a
> field in PCIHostState to make it do this?
No, this is a non issue if you get your design right. Just abstract out
the reference to a device in a struct like Alex is proposing and have
the host bridge specific code fill that up appropriately. I don't see
why there would be any need for swapping and in any case, this should go
away once the host bridge code knows how to interpret the write to
whatever config access registers it exposes.
Ben.

On Tue, Jan 05, 2010 at 07:10:58AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2010-01-04 at 13:07 +0200, Michael S. Tsirkin wrote:
> > BTW, I think we really should think about the right way to address the
> > swap/noswap issue without using a preprocessor.
> [...]
> No, this is a non issue if you get your design right. Just abstract out
> the reference to a device in a struct like Alex is proposing and have
> the host bridge specific code fill that up appropriately.
> [...]
> Ben.
Well, the main issue, if I understand correctly, is that basically the
same hardware bridge can be connected to the host in different ways. Yes, we
can say "if it's connected differently it's a different device" but this
is slightly ugly; a device should not have to know how it's connected. It
would be cleaner to have a "connector" device in the middle that swaps
bytes. Even though, yes, what you describe would be less ugly than using
the preprocessor as we do now.

On Mon, 2010-01-04 at 23:12 +0200, Michael S. Tsirkin wrote:
> Well, the main issue if I understand correctly is that basically the
> same hardware bridge can be connected to host in different ways. Yes, we
> can say "if it's connected differently it's a different device" but this
> is slightly ugly, device should not have to know how it's connected. It
> would be cleaner to have a "connector" device in the middle that swaps
> bytes. Even though yes, what you describe would be less ugly than using
> preprocessor as we do now.
Well, the thing is... PCI is always little endian. I'm not 100% familiar
with how emulation of devices works in qemu, but it's possible that the
problem here is not that the way a standard PCI host bridge is connected
to the processor changes, but rather that being connected to an x86
host versus a ppc host makes the byte order appear different. IE. a
PPC operating system will byteswap accesses. If qemu just passes
accesses through -as-is- instead of doing the natural byteswapping, then
indeed you will need that added swap there too.
I still think, though, that this should be buried in the accessors for the
host bridge themselves, possibly controlled by a flag set when
instantiating the host bridge in case it can indeed be wired in
different ways.
Cheers,
Ben.

On Tue, Jan 05, 2010 at 08:25:30AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2010-01-04 at 23:12 +0200, Michael S. Tsirkin wrote:
> [...]
> Well, the thing is... PCI is always little endian. I'm not 100% familiar
> on how emulation of devices works in qemu, but it's possible that the
> problem here is simply not how a standard PCI host bridge is connected
> to the processor changing but rather whether it's connected to an x86
> host or a ppc host should make the byte order appear different.
> [...]
Yes, but I think how you program your host-to-pci bridge is platform specific;
the standard (mostly) applies to what happens below the bridge. There's
no real standard for how a PCI host bridge is connected to the processor,
AFAIK; it's by luck we can share code there at all.
> I still think though that this should be buried in the accessors for the
> host bridge themselves, possibly controlled by a flag set when
> instantiating the host bridge in case it can indeed be wired in
> different ways.
>
> Cheers,
> Ben.
buried sounds scary, but generally yes, a flag in host bridge code
is the idea.
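A minimal sketch of what such a flag might look like, assuming a hypothetical host bridge state (the field and helper names are made up for illustration; this is not QEMU's real PCIHostState):

```c
#include <stdint.h>

/* Sketch of Michael's suggestion: let the host bridge state carry an
 * explicit "swap" flag set at instantiation, instead of deciding at
 * build time with #ifdef TARGET_WORDS_BIGENDIAN. */
typedef struct PCIHostStateSketch {
    uint32_t config_reg;   /* latched config address register */
    int swap_config;       /* nonzero if the bridge is wired swapped */
} PCIHostStateSketch;

static uint32_t bswap32_sketch(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Write handler for the config address register: the swap happens in
 * exactly one place, controlled by the per-bridge flag. */
static void config_addr_write(PCIHostStateSketch *s, uint32_t val)
{
    s->config_reg = s->swap_config ? bswap32_sketch(val) : val;
}
```

The same bridge model can then be instantiated swapped on one machine and unswapped on another, with no preprocessor involvement.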

> Yes, but I think how you program your host to pci bridge is platform specific,
> the standard (mostly) applies to what happens below the bridge. There's
> no real standard for how PCI host bridge is connected to processor
> AFAIK, it's by luck we can share code there at all.
Well, yes and no ... there's a standard on how a PCI host bridge is
connected, in the sense that how normal MMIO accesses go through in terms
of endianness is well defined.
How you actually issue config space cycles is a property of a given
bridge. How you issue IO cycles is as well, in fact.
Cheers,
Ben.

On Tue, Jan 05, 2010 at 08:53:52AM +1100, Benjamin Herrenschmidt wrote:
> > Yes, but I think how you program your host to pci bridge is platform specific,
> > the standard (mostly) applies to what happens below the bridge. There's
> > no real standard for how PCI host bridge is connected to processor
> > AFAIK, it's by luck we can share code there at all.
>
> Well, yes and no ... there's a standard on how a PCI host bridge is
> connected in the sense that how normal MMIO accesses go through in term
> of endianness is well defined.
Go through where? From processor to PCI?
Which spec do you refer to?
> How you actually issue config space cycles is a property of a given
> bridge. How you issue IO cycles as well in fact.
>
> Cheers,
> Ben.

On Tue, 2010-01-05 at 00:25 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 05, 2010 at 08:53:52AM +1100, Benjamin Herrenschmidt wrote:
> > Well, yes and no ... there's a standard on how a PCI host bridge is
> > connected in the sense that how normal MMIO accesses go through in term
> > of endianness is well defined.
>
> Go through where? From processor to PCI?
> Which spec do you refer to?
The PCI spec, from memory :-) Byte swizzling for MMIO between a processor
and PCI bus is well defined.
Now of course, since issuing config space cycles tends to be host-bridge
chip specific, everything is possible there :-) In -most- cases though,
they use a mechanism similar to x86, using the cf8/cfc ports or an
equivalent mapped wherever the PIO region is mapped in MMIO space for
non-x86 processors, and thus end up with the exact same byte swizzling.
In fact, this is true of accesses to PCI devices as well. Take, for
example, a device that has a 32-bit MMIO register. This register is
meant to appear as little endian (well, unless the device itself plays
tricks, but most don't) whatever the host processor is. Thus an x86 host
will need no byteswap but a powerpc host (assuming the ppc is running in
BE mode) will.
This is why, for example, the base readl and writel functions in Linux
byteswap on powerpc.
It is important to understand that this is a property of how the PCI
bridge is connected to the host processor, such that the PCI "native"
byte order is preserved along with address invariance for sub-32-bit
quantities.
The endianness of the host bridge config space access register is thus
most of the time just a natural side effect of said register being part
of the bridge PIO space, and thus getting the natural byteswap explained
above for a 32-bit LE register on PCI.
Cheers,
Ben.
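To illustrate the swizzling Ben describes, here is a hedged sketch of an LE-register accessor (names are illustrative, not QEMU or Linux code): by always assembling the 32-bit value from individual bytes in little-endian order, the result is the same on an LE or BE host, which is the property that readl() preserves by byteswapping on powerpc.

```c
#include <stdint.h>

/* The register as it sits on the PCI bus: four bytes in LE order.
 * Assembling the value byte-by-byte makes the load independent of the
 * host CPU's endianness; this is also why sub-32-bit (byte) accesses
 * need no swap at all - address invariance. */
static uint32_t le32_load(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* readl()-style accessor sketch: on an LE host this matches a plain
 * 32-bit load; on a BE host the byte assembly supplies the swap the
 * bridge wiring would perform, so C code sees the same value either way. */
static uint32_t pci_readl_sketch(const uint8_t *bar)
{
    return le32_load(bar);
}
```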

On Tue, Jan 05, 2010 at 09:51:48AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-01-05 at 00:25 +0200, Michael S. Tsirkin wrote:
> > Go through where? From processor to PCI?
> > Which spec do you refer to?
>
> The PCI spec from memory :-) Byte swizzling for MMIO between a processor
> and PCI bus is well defined.
Couldn't find it in spec.
> [...]
> The endianness of the host bridge config space access register is thus
> most of the time just a natural side effect of said register being part
> of the bridge PIO space and thus getting the natural byteswap explained
> above for a 32-bit LE register on PCI.
So, it appears that this is not the case for many platforms: the bridge
itself does a byteswap to make devices behind it work according to spec,
but this does not apply to programming the bridge itself.
This seems common on BE platforms; this is why qemu has
#ifdef TARGET_WORDS_BIGENDIAN there, IIUC.

On 04.01.2010, at 23:59, Michael S. Tsirkin wrote:
> On Tue, Jan 05, 2010 at 09:51:48AM +1100, Benjamin Herrenschmidt wrote:
> [...]
> So, it appears that this is not the case for many platforms: bridge
> itself does a byteswap to make devices behind it work according to spec,
> but this does not apply to programming bridge itself.
>
> This seems common on BE platforms, this is why qemu has
> ifdef TARGET_WORDS_BIGENDIAN there IIUC.
IIRC qemu's mmio functions just pass the register value the guest had at
that moment to the mmio function.
While that is pretty simple, it means that we behave differently from real
hardware. Real hardware has 2 components:
1) CPU
2) Device
The CPU sends an MMIO request in local byte order to the device. The device
also has a native endianness - either BE or LE.
So what the byte swap here does is simply convert from the wrong LE value
(what Linux thought the device needs) into a proper variable.
I'm not sure what the correct solution to this is. I'm inclined to suggest
that mmio callbacks should get called with tswap'ed values. But then again
the code as is works in quite a lot of cases, and I don't want to spend
months getting it back to where it is just because of a cosmetic change.
Alex

On Tue, Jan 05, 2010 at 12:08:19AM +0100, Alexander Graf wrote:
> [...]
> I'm not sure what the correct solution to this is. I'm inclined to
> suggest that mmio callbacks should get called with tswap'ed values. But
> then again the code as is works in quite a lot of cases and I don't want
> to spend months getting it back to where it is just because of a
> cosmetic change.
>
> Alex
If I understand correctly, this is just a byteswap to emulate how the host
bridge is connected, unrelated to local byte order. But anyway, I agree:
let's not spend months on this.

> So, it appears that this is not the case for many platforms: bridge
> itself does a byteswap to make devices behind it work according to spec,
> but this does not apply to programming bridge itself.
>
> This seems common on BE platforms, this is why qemu has
> ifdef TARGET_WORDS_BIGENDIAN there IIUC.
Sadly, it is quite common that misled HW designers think they are
doing SW a service by making something BE instead of LE on a BE
platform, but in the end they just force everybody to special case :-)
It's hard to say what is the most common. All powerpc chrp have it LE
afaik, FSL tried to be smart (FAIL) and got their SoC config space
access reg BE instead. Apple stuff is just "special", etc...
In any case, it doesn't matter much as long as qemu gets normal MMIO
emulation right. Host bridge config space is chipset specific and as
such platform specific.
Cheers,
Ben.

On Tue, 2010-01-05 at 00:08 +0100, Alexander Graf wrote:
> IIRC qemu's mmio functions just pass the register value the guest had
> at that moment to the mmio function.
That means that qemu HW emulation needs, for each device, to add a layer
of byteswap depending on whether the CPU is LE or BE which sucks :-)
> While that is pretty simple, it means that we behave different from
> real hardware. Real hardware has 2 components:
>
> 1) CPU
> 2) Device
>
> The CPU sends an MMIO request in local byte order to the device. The
> device also has a native endianness - either BE or LE.
> So what the byte swap here does is simply converting from wrong LE
> (what Linux thought the device needs) to a proper variable.
Not quite. The CPU sends an MMIO request to a PCI host bridge. For
PCI devices behind that bridge (and I'm not talking about the host bridge
registers themselves; those are on the CPU bus and could be wired in any
fancy way), you expect things to be wired in such a way that the BE CPU
MSB is wired to the PCI LSB.
IE. If a piece of C code writes 0xaabbccdd to a PCI device 32-bit
register using a native aligned access from the CPU of the form
*(unsigned int *)addr, the device should see 0xaabbccdd with a LE CPU
and 0xddccbbaa with a BE CPU :-) Really! This is why writel() in Linux
will byteswap on BE CPUs.
Ideally, QEMU should do the same swapping for all accesses over PCI in
order to avoid CPU-conditional byteswap tricks in the device
emulation proper.
Now, regarding the host bridge own registers, as I said above, if they
are legacy IO cf8/cfc (which is the case of most CHRPs for example) they
will be LE just the same way as above. Some SoCs like FSL do try to be
smart and get them BE though. But yeah, that is all specific to a given
host bridge.
> I'm not sure what the correct solution to this is. I'm inclined to
> suggest that mmio callbacks should get called with tswap'ed values.
> But then again the code as is works in quite a lot of cases and I
> don't want to spend months getting it back to where it is just because
> of a cosmetic change.
Well, I'd say that MMIO callbacks should continue being native for
anything on the CPU bus, -but- when crossing a bridge, they may need to
get some swapping done depending on what that bridge does. For example,
I would expect a powerpc/amba bridge to do a similar swapping as above
for a powerpc/pci bridge.
Cheers,
Ben.
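Ben's 0xaabbccdd example can be modeled in a few lines of C. This is a sketch under the assumption of the usual MSB-to-LSB wiring he describes (the function name is illustrative, not QEMU code): lay the value out in the CPU's native byte lanes, then read those lanes back in the little-endian order the PCI device uses.

```c
#include <stdint.h>

/* Model of a native aligned 32-bit CPU store seen by a PCI device.
 * The store places the value's bytes into lanes in the CPU's own byte
 * order; the device interprets those lanes as little-endian. */
static uint32_t device_sees(uint32_t val, int cpu_is_be)
{
    uint8_t lanes[4];
    for (int i = 0; i < 4; i++) {
        lanes[i] = cpu_is_be ? (val >> (24 - 8 * i)) & 0xff   /* BE store */
                             : (val >> (8 * i)) & 0xff;       /* LE store */
    }
    /* PCI is little endian: lane 0 is the least significant byte. */
    return (uint32_t)lanes[0] | ((uint32_t)lanes[1] << 8) |
           ((uint32_t)lanes[2] << 16) | ((uint32_t)lanes[3] << 24);
}
```

An LE CPU's store round-trips unchanged, while a BE CPU's store arrives byte-reversed, which is exactly the swap that writel() compensates for on BE hosts.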