
This adds some new cache functions to qcow2 which can be used for caching
refcount blocks and L2 tables. When used with cache=writethrough, they behave
like the old caching code that is spread all over qcow2, so in this case the
change is merely a cleanup.
The interesting case is writeback caching (this includes cache=none), where
data isn't written to disk immediately but is initially only kept in cache. This
leads to a form of metadata write batching which avoids the current "write
to refcount block, flush, write to L2 table" pattern for each single request
when many cluster allocations happen. Instead, cache entries are only
written out when it's required to maintain the right order. In the pure cluster
allocation case this means that all metadata updates for requests are initially
done in memory; on sync, the refcount blocks are written to disk first, then an
fsync is issued, and finally the L2 tables are written.
This noticeably improves performance in scenarios with many cluster
allocations (e.g. during installation or after taking a snapshot).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
Makefile.objs | 2 +-
block/qcow2-cache.c | 270 +++++++++++++++++++++++++++++++++++++++++++++++++++
block/qcow2.h | 17 +++
3 files changed, 288 insertions(+), 1 deletions(-)
create mode 100644 block/qcow2-cache.c

On Fri, Jan 14, 2011 at 3:22 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 14.01.2011 15:36, schrieb Stefan Hajnoczi:
>> On Mon, Jan 10, 2011 at 4:53 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> +    for (i = 0; i < c->size; i++) {
>>> +        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
>>> +    }
>>
>> These could be allocated lazily. For a single cache it doesn't
>> matter, but we will have n QcowCaches where n is the number of
>> dependencies?
>
> There is one L2 cache and one refcount block cache, both initialized
> only once during bdrv_open.
>
> Also, the only dependency we have is that L2 depends on refcounts being
> flushed first or vice versa, i.e. the two caches (not tables!) that
> exist may depend on each other but new caches are never created.
I understand now. Without having looked at patch 2/2 I thought this
would impose ordering between allocating write requests, but it's not
needed to meet block device semantics.
Still, I'm reluctant to introduce request reordering into the stack.
The cache makes no attempt to keep things in order whatsoever. On the
other hand, if we're using cache=writeback then the host page cache
can decide in what order to write out pages too.
>>> +        c->entries[i].offset = 0;
>>> +        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
>>> +        if (ret < 0) {
>>> +            return ret;
>>> +        }
>>> +
>>> +        c->entries[i].cache_hits = 32;
>>
>> 32?
>
> It should be 42, "an arbitrary but carefully chosen number" ;-)
>
> The point is that we don't want a new entry to be freed immediately
> again, so it gets some hits for the start. I'll add a comment for that.
I see.
Stefan