The relation between processor operations and PCIe bus packets is implementation dependent, so there's no straight answer to "what happens if" questions of that sort. The general rule, however, is that direct accesses to a PCIe-mapped address space are intended for control, not for data transfer. Any properly designed piece of hardware is DMA-capable for handling the latter, so there's no motivation to optimize the processor's capability in this field.

As for enumeration, this is done with Configuration packets, which are another type of TLP.

Regards, Eli

Since it’s a PC, it’s likely that the CPU itself performs a simple write operation on its own bus, and that the memory controller chipset, which is connected to the CPU’s bus, has the direct connection to the PCIe bus. So what happens is that the chipset (which, in PCIe terms, functions as a Root Complex) generates a Memory Write packet for transmission over the bus. This packet consists of a header, which is either 3 or 4 32-bit words long (depending on whether 32-bit or 64-bit addressing is used), and one 32-bit word containing the word to be written. This packet simply says “write this data to this address”.

Can you expound on the above for memcpy to PCIe memory? For a "simple" memcpy that does a bunch of 64-bit stores to PCIe memory, I would assume that a Memory Write packet is generated for each store, so that the number of stores needed to complete the memcpy equals the number of Memory Write packets. Is this correct? What difference would it make if the PCIe memory were mapped as write combining? Does the number of Memory Write packets decrease? How about if memcpy were optimized using SSE instructions? Would there be any difference?

Finally, regarding reading the PCIe configuration space using inb/outb during bus enumeration: do these inb/outb requests also get converted into TLPs?
