a good compact vertex format?

It is cache friendly, since it fits neatly in 16 bytes. However, I recently read in the Gallium specs that the driver probably realigns each attribute (position, texture coordinates, ...) to a multiple of 4 bytes anyway. That would mean 2 bytes of padding silently added after my position and another 2 bytes after my texture coordinates, giving 20 bytes per vertex, which is not so nice.
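To make the arithmetic concrete, here is a minimal sketch of the two layouts. It assumes the format is three 16-bit position components, three 16-bit texture coordinates, and one further 4-byte attribute (called `color` here purely for illustration; that name, the attribute table, and the helper functions are hypothetical). It only models the suspected 4-byte alignment rule and does not query any real driver.

```python
import struct

# Assumed layout: (name, struct type code, component count).
# "h" is a 16-bit signed integer, "B" an unsigned byte.
# The 4-byte "color" attribute is an assumption, not a known part of the format.
ATTRIBUTES = [
    ("position", "h", 3),
    ("texcoord", "h", 3),
    ("color", "B", 4),
]


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def layout(attributes, alignment=1):
    """Return ({name: byte offset}, stride) with every attribute start aligned."""
    offsets = {}
    offset = 0
    for name, code, count in attributes:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += struct.calcsize("<" + code) * count
    return offsets, align_up(offset, alignment)


packed_offsets, packed_stride = layout(ATTRIBUTES)               # tightly packed
padded_offsets, padded_stride = layout(ATTRIBUTES, alignment=4)  # 4-byte aligned

print(packed_offsets, packed_stride)
print(padded_offsets, padded_stride)
```

Under these assumptions the packed stride is 16 bytes, while the 4-byte-aligned variant pushes the texture coordinates to offset 8 and the last attribute to offset 16, for a 20-byte stride.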

Is this padding behaviour a general hardware limitation, or is it driver specific? Is there a way to truly get 16 bytes per vertex in GPU memory for the proposed format, without tricks such as moving the third texture coordinate into a fourth position component to satisfy alignment, and then moving it back in the vertex shader at a cost?