On 02.12.2010 13:07, Stefan Hajnoczi wrote:
> On Tue, Nov 30, 2010 at 12:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> This implements an asynchronous version of bdrv_pwrite.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>> block.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> block.h | 2 +
>> 2 files changed, 169 insertions(+), 0 deletions(-)
>
> Is this function necessary?
>
> Current synchronous code uses pwrite() so this function makes it easy
> to convert existing code. But if that code took the block-based
> nature of storage into account then this read-modify-write helper
> isn't needed.
For qcow2, most writes (refcount tables, L2 tables, etc.) are aligned to
512 byte sectors, but there are still some left that use pwrite with an
unaligned count. I'm not completely sure which data, but qemu-iotests
crashed with tmp_buf == NULL, so there are some ;-) Probably things like
header and snapshot table writes.
I'm not sure what other image formats do (we might want to use
block-queue for them, too, eventually), but usually that means that they
do strange things.
> I guess what I'm saying is that this function should only be used when
> you really need rmw (in many cases with image metadata it can be
> avoided because you have enough metadata cached in memory to do full
> sector writes). If it turns out we don't need rmw then we can
> eliminate this function.
Maybe what we really should do is completely change the block layer
functions to use bytes as their unit and do any RMW in posix-aio-compat
and linux-aio. Other backends don't need it and without O_DIRECT we
don't even need to do it with files.
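A minimal synchronous sketch of what byte-granular writes with the RMW pushed down into the backend could look like, assuming a device that can only transfer whole 512-byte sectors (as with O_DIRECT). The in-memory disk and the functions read_sector(), write_sector() and pwrite_bytes() are hypothetical; a real posix-aio-compat or linux-aio version would issue the same steps asynchronously.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512
#define DISK_SECTORS 8

/* Hypothetical in-memory stand-in for a backend that only transfers
 * whole sectors. sector_reads counts reads so RMW traffic is visible. */
static uint8_t disk[DISK_SECTORS * SECTOR_SIZE];
static int sector_reads;

static void read_sector(int64_t sector, uint8_t *buf)
{
    sector_reads++;
    memcpy(buf, disk + sector * SECTOR_SIZE, SECTOR_SIZE);
}

static void write_sector(int64_t sector, const uint8_t *buf)
{
    memcpy(disk + sector * SECTOR_SIZE, buf, SECTOR_SIZE);
}

/* Byte-granular write on top of the sector interface: partially covered
 * sectors are read, patched in memory and written back (read-modify-write);
 * fully covered sectors are written straight from the caller's buffer. */
static void pwrite_bytes(int64_t offset, const uint8_t *buf, size_t bytes)
{
    uint8_t tmp[SECTOR_SIZE];

    assert(offset >= 0 && (uint64_t)offset + bytes <= sizeof(disk));

    while (bytes > 0) {
        int64_t sector = offset / SECTOR_SIZE;
        size_t in_sector = (size_t)(offset % SECTOR_SIZE);
        size_t chunk = SECTOR_SIZE - in_sector;

        if (chunk > bytes) {
            chunk = bytes;
        }
        if (chunk == SECTOR_SIZE) {
            write_sector(sector, buf);            /* aligned: no RMW */
        } else {
            read_sector(sector, tmp);             /* read */
            memcpy(tmp + in_sector, buf, chunk);  /* modify */
            write_sector(sector, tmp);            /* write */
        }
        offset += chunk;
        buf += chunk;
        bytes -= chunk;
    }
}
```

For example, writing 3 bytes at offset 510 touches the last two bytes of sector 0 and the first byte of sector 1, so both sectors are read before being written back, while a full, aligned sector write needs no read at all.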
Also, using units of 512 bytes is completely arbitrary and may still
involve RMW if the host uses a different sector size.
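To make the sector-size point concrete, here is a small sketch of the alignment arithmetic. The helper names (align_down(), align_up(), needs_rmw(), rmw_span()) are hypothetical; the point is that a request that is perfectly aligned in 512-byte units can still start or end inside a 4096-byte host sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round down/up to a multiple of sector_size (any positive size). */
static uint64_t align_down(uint64_t x, uint64_t sector_size)
{
    assert(sector_size > 0);
    return x - x % sector_size;
}

static uint64_t align_up(uint64_t x, uint64_t sector_size)
{
    return align_down(x + sector_size - 1, sector_size);
}

/* True if [offset, offset + bytes) starts or ends inside a sector, i.e. the
 * backend would have to read-modify-write at least one sector. */
static bool needs_rmw(uint64_t offset, uint64_t bytes, uint64_t sector_size)
{
    return offset % sector_size != 0 || (offset + bytes) % sector_size != 0;
}

/* Bytes actually transferred once the request is widened to whole sectors. */
static uint64_t rmw_span(uint64_t offset, uint64_t bytes, uint64_t sector_size)
{
    return align_up(offset + bytes, sector_size) - align_down(offset, sector_size);
}
```

With these, a one-sector write at byte offset 4608 needs no RMW with 512-byte sectors, but on a host with 4096-byte sectors it turns into a 4096-byte read plus a 4096-byte write.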
>> + switch (acb->state) {
>> + case 0: {
>> + /* Read first sector if needed */
>
> Please use an enum instead of int literals with comments. Or you
> could try separate functions and see if the switch statement really
> saves that many lines of code.
Okay, will use an enum.
I think the switch may not save that many lines of code, but it improves
readability because with chained functions (and no forward declarations)
you have to read backwards.
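A sketch of the state machine with named states instead of int literals, keeping the single switch. Only state 0 ("read first sector") and state 3 ("read last sector") appear in the quoted patch; the other state names and the PwritePlan struct are hypothetical, and a synchronous loop in pwrite_trace() stands in for the AIO completions that would normally drive each step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; values 0 and 3 line up with the patch's comments. */
typedef enum PwriteState {
    PWRITE_READ_HEAD,   /* 0: read the partially covered first sector */
    PWRITE_WRITE_HEAD,  /* write the merged first sector back */
    PWRITE_WRITE_BODY,  /* write the fully covered middle sectors */
    PWRITE_READ_TAIL,   /* 3: read the partially covered last sector */
    PWRITE_WRITE_TAIL,  /* write the merged last sector back */
    PWRITE_DONE,
} PwriteState;

typedef struct PwritePlan {
    bool head_partial;  /* first sector only partly overwritten */
    bool has_body;      /* at least one fully overwritten sector */
    bool tail_partial;  /* last sector only partly overwritten */
} PwritePlan;

/* Does this request need state s? Unneeded states are skipped. */
static bool pwrite_state_needed(PwriteState s, const PwritePlan *p)
{
    switch (s) {
    case PWRITE_READ_HEAD:
    case PWRITE_WRITE_HEAD:
        return p->head_partial;
    case PWRITE_WRITE_BODY:
        return p->has_body;
    case PWRITE_READ_TAIL:
    case PWRITE_WRITE_TAIL:
        return p->tail_partial;
    case PWRITE_DONE:
        return true;
    }
    return true;
}

/* Return the first needed state strictly after s. */
static PwriteState pwrite_next(PwriteState s, const PwritePlan *p)
{
    assert(s < PWRITE_DONE);
    do {
        s = (PwriteState)(s + 1);
    } while (!pwrite_state_needed(s, p));
    return s;
}

/* Record visited states as one letter each (trace needs room for 6 chars). */
static void pwrite_trace(const PwritePlan *p, char *trace)
{
    static const char letters[] = "hHBtT";
    PwriteState s = pwrite_state_needed(PWRITE_READ_HEAD, p)
                        ? PWRITE_READ_HEAD
                        : pwrite_next(PWRITE_READ_HEAD, p);
    size_t n = 0;

    while (s != PWRITE_DONE) {
        trace[n++] = letters[s];
        s = pwrite_next(s, p);
    }
    trace[n] = '\0';
}
```

For a request whose first sector is fully overwritten, the head states are skipped and the trace comes out as "BtT".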
>> + case 3: {
>> + /* Read last sector if needed */
>> + if (acb->bytes == 0) {
>> + goto done;
>> + }
>> +
>> + acb->state = 4;
>> + acb->iov.iov_base = acb->tmp_buf;
>
> acb->tmp_buf may be NULL here if we took the state transition to 2
> instead of doing 1.
Yup, that's already fixed.
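The fix itself isn't shown in the thread; one way it could look, assuming the bounce buffer is allocated lazily by whichever step first needs it, is sketched below. PwriteReq, pwrite_tmp_buf() and pwrite_release() are hypothetical names, and plain malloc()/free() stand in for qemu_malloc()/qemu_free().

```c
#include <assert.h>
#include <stdlib.h>

#define SECTOR_SIZE 512

/* Hypothetical request struct; only the bounce-buffer field matters here. */
typedef struct PwriteReq {
    unsigned char *tmp_buf;  /* NULL until a partial sector must be read */
} PwriteReq;

/* Return the bounce buffer, allocating it on first use. Calling this from
 * both the "read first sector" and "read last sector" steps means the tail
 * step has a buffer even when the head step was skipped. */
static unsigned char *pwrite_tmp_buf(PwriteReq *req)
{
    assert(req != NULL);
    if (req->tmp_buf == NULL) {
        req->tmp_buf = malloc(SECTOR_SIZE);
    }
    return req->tmp_buf;
}

/* Release at completion; free(NULL) is a no-op, so this is safe whether
 * or not a buffer was ever allocated. */
static void pwrite_release(PwriteReq *req)
{
    free(req->tmp_buf);
    req->tmp_buf = NULL;
}
```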
>> +done:
>> + qemu_free(acb->tmp_buf);
>> + acb->common.cb(acb->common.opaque, ret);
>
> Callback not invoked from a BH. In an error case we might have made
> no blocking calls, i.e. never returned and this callback can cause
> reentrancy.
Good point.
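A self-contained sketch of the usual remedy: instead of calling cb directly, the completion is queued and run later from the main loop, so the caller always gets control back before its callback fires. In QEMU this role is played by a bottom half (qemu_bh_new()/qemu_bh_schedule()); the tiny queue here (defer_completion(), run_deferred()) and submit_failing_write() are hypothetical stand-ins so the example compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void CompletionFunc(void *opaque, int ret);

typedef struct DeferredCall {
    CompletionFunc *cb;
    void *opaque;
    int ret;
} DeferredCall;

#define MAX_DEFERRED 16
static DeferredCall deferred[MAX_DEFERRED];
static size_t n_deferred;

/* Queue a completion instead of running it now (the "schedule a BH" step). */
static void defer_completion(CompletionFunc *cb, void *opaque, int ret)
{
    assert(n_deferred < MAX_DEFERRED);
    deferred[n_deferred++] = (DeferredCall){ cb, opaque, ret };
}

/* Run all queued completions, playing the role of the main loop. */
static void run_deferred(void)
{
    for (size_t i = 0; i < n_deferred; i++) {
        deferred[i].cb(deferred[i].opaque, deferred[i].ret);
    }
    n_deferred = 0;
}

/* Hypothetical submission that fails before issuing any I/O. Because the
 * error goes through defer_completion(), cb never runs before we return. */
static void submit_failing_write(CompletionFunc *cb, void *opaque)
{
    defer_completion(cb, opaque, -EIO);
}

/* Example callback that records how often and with what it was called. */
typedef struct CompletionRecord {
    int calls;
    int ret;
} CompletionRecord;

static void record_completion(void *opaque, int ret)
{
    CompletionRecord *rec = opaque;
    rec->calls++;
    rec->ret = ret;
}
```

After submit_failing_write() returns, the callback has not run yet; it runs exactly once, with -EIO, when the queue is processed.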
>> +BlockDriverAIOCB *bdrv_aio_pwrite(BlockDriverState *bs, int64_t offset,
>> + void* buf, size_t bytes, BlockDriverCompletionFunc *cb, void *opaque)
>> +{
>> + PwriteAIOCB *acb;
>> +
>> + acb = qemu_aio_get(&blkqueue_aio_pool, bs, cb, opaque);
>> + acb->state = 0;
>> + acb->offset = offset;
>> + acb->buf = buf;
>> + acb->bytes = bytes;
>> + acb->tmp_buf = NULL;
>> +
>> + bdrv_aio_pwrite_cb(acb, 0);
>
> We're missing the usual !bs->drv, bs->read_only, bdrv_check_request()
> checks here. Are we okay to wait until calling
> bdrv_aio_readv/bdrv_aio_writev for these checks?
I think we are, but if you prefer, I can copy them here.
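For reference, copying the checks to the entry point would amount to something like the sketch below. MockBlockDriverState and pwrite_check() are hypothetical, the range test only approximates what bdrv_check_request() does, and the specific error codes are assumptions chosen for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM ENODEV  /* ENOMEDIUM is Linux-specific */
#endif

/* Hypothetical, trimmed-down stand-in for BlockDriverState. */
typedef struct MockBlockDriverState {
    const void *drv;      /* NULL when no driver/medium is attached */
    bool read_only;
    int64_t total_bytes;  /* image size in bytes */
} MockBlockDriverState;

/* Early checks for a byte-granular write: driver present, writable, and
 * [offset, offset + bytes) inside the image (written to avoid overflow). */
static int pwrite_check(const MockBlockDriverState *bs, int64_t offset,
                        size_t bytes)
{
    assert(bs->total_bytes >= 0);
    if (bs->drv == NULL) {
        return -ENOMEDIUM;
    }
    if (bs->read_only) {
        return -EACCES;
    }
    if (offset < 0 || offset > bs->total_bytes ||
        bytes > (uint64_t)(bs->total_bytes - offset)) {
        return -EIO;
    }
    return 0;
}
```

If such a check fails, the error would still need to reach the caller through a deferred completion rather than a direct callback, for the reentrancy reason raised above.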
Kevin

On Thu, Dec 2, 2010 at 12:30 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 02.12.2010 13:07, Stefan Hajnoczi wrote:
>> On Tue, Nov 30, 2010 at 12:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> I guess what I'm saying is that this function should only be used when
>> you really need rmw (in many cases with image metadata it can be
>> avoided because you have enough metadata cached in memory to do full
>> sector writes). If it turns out we don't need rmw then we can
>> eliminate this function.
>
> Maybe what we really should do is completely change the block layer
> functions to use bytes as their unit and do any RMW in posix-aio-compat
> and linux-aio. Other backends don't need it and without O_DIRECT we
> don't even need to do it with files.
Yeah that sounds like something worth exploring more. Perhaps
together with some input from Christoph on moving QEMU to the native
block size (e.g. 4k on some devices).
>>> +BlockDriverAIOCB *bdrv_aio_pwrite(BlockDriverState *bs, int64_t offset,
>>> + void* buf, size_t bytes, BlockDriverCompletionFunc *cb, void *opaque)
>>> +{
>>> + PwriteAIOCB *acb;
>>> +
>>> + acb = qemu_aio_get(&blkqueue_aio_pool, bs, cb, opaque);
>>> + acb->state = 0;
>>> + acb->offset = offset;
>>> + acb->buf = buf;
>>> + acb->bytes = bytes;
>>> + acb->tmp_buf = NULL;
>>> +
>>> + bdrv_aio_pwrite_cb(acb, 0);
>>
>> We're missing the usual !bs->drv, bs->read_only, bdrv_check_request()
>> checks here. Are we okay to wait until calling
>> bdrv_aio_readv/bdrv_aio_writev for these checks?
>
> I think we are, but if you prefer, I can copy them here.
No, I just wanted to make sure you took them into account. In theory
those error cases won't affect your code and it's fine to wait for
bdrv_aio_readv/bdrv_aio_writev to catch them. I haven't thought
through the cases in detail though.
Stefan