On 21/11/2012 10:15, Kevin Wolf wrote:
>> > + if ((bs->open_flags & BDRV_O_NOCACHE)) {
>> > + bs->file->buffer_alignment = align;
>> > + }
> Any reason to restrict this to BDRV_O_NOCACHE?
>
> There have been patches to change the BDRV_O_NOCACHE flag from the
> monitor, in which case bdrv_set_buffer_alignment() wouldn't be called
> anew and O_DIRECT requests start to fail again.
bdrv_set_buffer_alignment() is completely broken. It should set host
alignment, but in fact it is passed the guest alignment.
In practice, we only support logical_block_size matching the host's or
bigger (which is unsafe due to torn writes, but works).
So I suggest that we just look at writes outside the device models, and
"fix" them to always read a multiple of 4k.
Paolo

On 21/11/12 17:03, Paolo Bonzini wrote:
> On 21/11/2012 10:15, Kevin Wolf wrote:
>>>> + if ((bs->open_flags & BDRV_O_NOCACHE)) {
>>>> + bs->file->buffer_alignment = align;
>>>> + }
>> Any reason to restrict this to BDRV_O_NOCACHE?
>>
>> There have been patches to change the BDRV_O_NOCACHE flag from the
>> monitor, in which case bdrv_set_buffer_alignment() wouldn't be called
>> anew and O_DIRECT requests start to fail again.
>
> bdrv_set_buffer_alignment() is completely broken. It should set host
> alignment, but in fact it is passed the guest alignment.
>
> In practice, we only support logical_block_size matching the host's or
> bigger (which is unsafe due to torn writes, but works).
For other reasons (partition table format) we want to have host block
size == guest block size on s390 anyway - so it would not really matter for
us.
But I certainly agree that it makes more sense to use the host block size
for the alignment checks.
> So I suggest that we just look at writes outside the device models, and
> "fix" them to always read a multiple of 4k.
Wouldn't that cause performance regressions for block devices with 512-byte
block size, because we would read more than necessary? Wouldn't that also
require read/update/write combinations for valid 512-byte writes?
Christian

On 07.12.2012 21:26, Heinz Graalfs wrote:
> Hello Kevin,
>
> I'm resending my answer as of Nov 23rd.
>
> Is this still on your queue?
No, it wasn't. I guess I was waiting for a new version of the patch.
>>> }
>>>
>>> void *qemu_blockalign(BlockDriverState *bs, size_t size)
>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> index f2f0404..baebf1d 100644
>>> --- a/block/raw-posix.c
>>> +++ b/block/raw-posix.c
>>> @@ -700,6 +700,12 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
>>> acb->aio_nbytes = nb_sectors * 512;
>>> acb->aio_offset = sector_num * 512;
>>>
>>> + /* O_DIRECT also requires an aligned length */
>>> + if (bs->open_flags & BDRV_O_NOCACHE) {
>>> + acb->aio_nbytes += acb->bs->buffer_alignment - 1;
>>> + acb->aio_nbytes &= ~(acb->bs->buffer_alignment - 1);
>>> + }
>>
>> Modifying aio_nbytes, but not the iov looks wrong to me. This may work
>> in the handle_aiocb_rw_linear() code path, but not with actual vectored I/O.
>
> Current coding ensures that read IO buffers always seem to be aligned
> correctly. Whereas read length values are not always appropriate for an
> O_DIRECT scenario.
>
> For a 2048 formatted disk I verified that
>
> 1. non vectored IO - the length needs to be adapted several times,
> which is accomplished now by the patch.
>
> 2. vectored IO - the qiov's total length is always a multiple of the
> logical block size
> (which is also verified in virtio_blk_handle_read())
> The particular iov length fields are already correctly setup as a
> multiple of the logical block size when processed in
> virtio_blk_handle_request().
I must admit that I don't quite understand this. As far as I know,
virtio-blk doesn't make any difference between requests with niov = 1
and real vectored requests. So how can the length of the latter always
be right, whereas the length of the former may be wrong?
The other point is that requests may not even be coming from virtio-blk.
They could be made by other device emulations or they could come from a
block job. (They also could be the result of a merge in the block layer,
though if the original requests were aligned, the result will stay aligned)
Kevin

Hi Kevin,
I'm using the bdrv_pread() function during boot partition detection ...
In detail:
bdrv_pread() is called to read 32 bytes from a 2048-byte formatted
disk. This results in setting up a read of 512 bytes (1 sector
multiplied by 512, as in the current code in paio_submit()), which is
wrong for an O_DIRECT-opened file, and produces the error.
Heinz
On Mon, 2012-12-10 at 09:55 +0100, Kevin Wolf wrote:
> On 07.12.2012 21:26, Heinz Graalfs wrote:
> > Hello Kevin,
> >
> > I'm resending my answer as of Nov 23rd.
> >
> > Is this still on your queue?
>
> No, it wasn't. I guess I was waiting for a new version of the patch.
>
> >>> }
> >>>
> >>> void *qemu_blockalign(BlockDriverState *bs, size_t size)
> >>> diff --git a/block/raw-posix.c b/block/raw-posix.c
> >>> index f2f0404..baebf1d 100644
> >>> --- a/block/raw-posix.c
> >>> +++ b/block/raw-posix.c
> >>> @@ -700,6 +700,12 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
> >>> acb->aio_nbytes = nb_sectors * 512;
> >>> acb->aio_offset = sector_num * 512;
> >>>
> >>> + /* O_DIRECT also requires an aligned length */
> >>> + if (bs->open_flags & BDRV_O_NOCACHE) {
> >>> + acb->aio_nbytes += acb->bs->buffer_alignment - 1;
> >>> + acb->aio_nbytes &= ~(acb->bs->buffer_alignment - 1);
> >>> + }
> >>
> >> Modifying aio_nbytes, but not the iov looks wrong to me. This may work
> >> in the handle_aiocb_rw_linear() code path, but not with actual vectored I/O.
> >
> > Current coding ensures that read IO buffers always seem to be aligned
> > correctly. Whereas read length values are not always appropriate for an
> > O_DIRECT scenario.
> >
> > For a 2048 formatted disk I verified that
> >
> > 1. non vectored IO - the length needs to be adapted several times,
> > which is accomplished now by the patch.
> >
> > 2. vectored IO - the qiov's total length is always a multiple of the
> > logical block size
> > (which is also verified in virtio_blk_handle_read())
> > The particular iov length fields are already correctly setup as a
> > multiple of the logical block size when processed in
> > virtio_blk_handle_request().
>
> I must admit that I don't quite understand this. As far as I know,
> virtio-blk doesn't make any difference between requests with niov = 1
> and real vectored requests. So how can the length of the latter always
> be right, whereas the length of the former may be wrong?
>
> The other point is that requests may not even be coming from virtio-blk.
> They could be made by other device emulations or they could come from a
> block job. (They also could be the result of a merge in the block layer,
> though if the original requests were aligned, the result will stay aligned)
>
> Kevin