*[PATCH 00/20] Separate `ref_cache` into a separate module@ 2017-03-20 16:33 Michael Haggerty
2017-03-20 16:33 ` [PATCH 01/20] get_ref_dir(): don't call read_loose_refs() for "refs/bisect" Michael Haggerty
` (23 more replies)0 siblings, 24 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King, git, Michael Haggerty
I had a window of opportunity last week to hack intensely on Git, with
the following goals:
* Separate `ref_cache` out of `files_ref_cache`.
* Separate a new `packed_ref_cache` class out of `files_ref_cache`.
Change the latter to use an instance of the former for all of its
interactions with the `packed-refs` file.
* Mmap `packed-refs` files rather than reading-and-parsing.
* Use the mmapped version of the `packed-refs` file as the "cache"
rather than using a separate `ref_cache`.
* (And the main goal): Avoid reading and parsing the *whole
`packed-refs` file* (as we do now) every time any part of it is
needed. Instead, use binary search to find the reference and/or
range of references that we want, and parse the info out of the
mmapped image on the fly.
I've completed a draft of an epic 48-patch series implementing all of
the above points on my GitHub fork [1] as branch
`wip/mmap-packed-refs`. It dramatically speeds up performance and
reduces memory usage for some tasks in repositories with very many
packed references.
But the later parts of that series aren't completely polished yet, and
such a large patch series would be indigestible anyway, so here I
submit the first part...
This patch series extracts a `ref_cache` module out of
`files_ref_cache`, and goes some way to disentangling those two
modules, which until now were overly intimate with each other:
* Remove `verify_refname_available()` from the refs VTABLE, instead
implementing it in a generic way that uses only the usual refs API
to talk to the `ref_store`.
* Split `ref_cache`-related code into a new module,
`refs/ref-cache.{c,h}`. Encapsulate the data structure in a new
class, `struct ref_cache`.
* Change the lazy-filling mechanism of `ref_cache` to call back to its
backing `ref_store` via a callback function rather than calling
`read_loose_refs()` directly.
* Move the special handling of `refs/bisect/` from `ref_cache` to
`files_ref_store`.
* Make `cache_ref_iterator_begin()` smarter, and change external users
to iterate via this interface instead of using
`do_for_each_entry_in_dir()`.
Even after this patch series, the modules are still too intimate for
my taste, but I think this is a big step forward, and it is enough to
allow the other changes that I've been working on.
These patches depend on Duy's nd/files-backend-git-dir branch, v6 [2].
They are also available from my GitHub fork [1] as branch
`separate-ref-cache`.
Happily, this patch series actually removes a few more lines than it
adds, mostly thanks to the simpler `verify_refname_available()`
implementation.
Michael
[1] https://github.com/mhagger/git
[2] http://public-inbox.org/git/20170318020337.22767-1-pclouds@gmail.com/
Michael Haggerty (20):
get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
refs_read_raw_ref(): new function
refs_ref_iterator_begin(): new function
refs_verify_refname_available(): implement once for all backends
refs_verify_refname_available(): use function in more places
Rename `add_ref()` to `add_ref_entry()`
Rename `find_ref()` to `find_ref_entry()`
Rename `remove_entry()` to `remove_entry_from_dir()`
refs: split `ref_cache` code into separate files
ref-cache: introduce a new type, ref_cache
refs: record the ref_store in ref_cache, not ref_dir
ref-cache: use a callback function to fill the cache
refs: handle "refs/bisect/" in `loose_fill_ref_dir()`
do_for_each_entry_in_dir(): eliminate `offset` argument
get_loose_ref_dir(): function renamed from get_loose_refs()
get_loose_ref_cache(): new function
cache_ref_iterator_begin(): make function smarter
commit_packed_refs(): use reference iteration
files_pack_refs(): use reference iteration
do_for_each_entry_in_dir(): delete function
Makefile | 1 +
refs.c | 111 ++++-
refs.h | 2 +-
refs/files-backend.c | 1229 +++++++-------------------------------------------
refs/ref-cache.c | 523 +++++++++++++++++++++
refs/ref-cache.h | 267 +++++++++++
refs/refs-internal.h | 22 +-
7 files changed, 1066 insertions(+), 1089 deletions(-)
create mode 100644 refs/ref-cache.c
create mode 100644 refs/ref-cache.h
--
2.11.0
^permalinkrawreply [flat|threaded] 38+ messages in thread

*[PATCH 01/20] get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
2017-03-20 16:33 ` [PATCH 02/20] refs_read_raw_ref(): new function Michael Haggerty
` (22 subsequent siblings)23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King, git, Michael Haggerty
Since references under "refs/bisect/" are per-worktree, they have to
be sought in the worktree rather than in the main repository. But
since loose references are found by traversing directories, the
reference iterator won't even get the idea to look for a
"refs/bisect/" directory in the worktree if there is not a directory
with that name in the main repository. Thus `get_ref_dir()` manually
inserts a dir_entry for "refs/bisect/" whenever it reads the entry for
"refs/".
The current code then immediately calls `read_loose_refs()` on that
directory. But since the dir_entry is created with its `incomplete`
flag set, any traversal that gets to this point will read the
directory automatically. So there is no need to call
`read_loose_refs()` explicitly; the lazy mechanism suffices.
And in fact, the attempt to `read_loose_refs()` was broken anyway.
That function needs its `dirname` argument to have a trailing `/`
character, but the invocation here was passing it "refs/bisect"
without a trailing slash. So `read_loose_refs()` would read
`$GIT_DIR/refs/bisect" correctly, but if it found an entry "foo" in
that directory, it would try to read "$GIT_DIR/refs/bisectfoo".
Normally it wouldn't find anything at that path, but the failure was
canceled out because `get_ref_dir()` *also* forgot to reset the
`REF_INCOMPLETE` bit on the dir_entry. So the read was attempted again
when it was accessed, via the lazy mechanism, and this time the read
was done correctly.
This code has been broken since it was first introduced.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
refs/files-backend.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4242486118..e7a075e864 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -191,8 +191,6 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry)
"refs/bisect/",
12, 1);
add_entry_to_dir(dir, child_entry);
- read_loose_refs("refs/bisect",
- &child_entry->u.subdir);
}
}
entry->flag &= ~REF_INCOMPLETE;
--
2.11.0
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 00/20] Separate `ref_cache` into a separate module
2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
` (19 preceding siblings ...)
2017-03-20 16:33 ` [PATCH 20/20] do_for_each_entry_in_dir(): delete function Michael Haggerty
@ 2017-03-20 17:25 ` Junio C Hamano
2017-03-20 18:12 ` Jeff King
` (2 subsequent siblings)23 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2017-03-20 17:25 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King, git
Michael Haggerty <mhagger@alum.mit.edu> writes:
> * (And the main goal): Avoid reading and parsing the *whole
> `packed-refs` file* (as we do now) every time any part of it is
> needed. Instead, use binary search to find the reference and/or
> range of references that we want, and parse the info out of the
> mmapped image on the fly.
Oooohh.... Juicy.
> This patch series extracts a `ref_cache` module out of
> `files_ref_cache`, and goes some way to disentangling those two
> modules, which until now were overly intimate with each other:
>
> * Remove `verify_refname_available()` from the refs VTABLE, instead
> implementing it in a generic way that uses only the usual refs API
> to talk to the `ref_store`.
>
> * Split `ref_cache`-related code into a new module,
> `refs/ref-cache.{c,h}`. Encapsulate the data structure in a new
> class, `struct ref_cache`.
>
> * Change the lazy-filling mechanism of `ref_cache` to call back to its
> backing `ref_store` via a callback function rather than calling
> `read_loose_refs()` directly.
Nice.
> * Move the special handling of `refs/bisect/` from `ref_cache` to
> `files_ref_store`.
>
> * Make `cache_ref_iterator_begin()` smarter, and change external users
> to iterate via this interface instead of using
> `do_for_each_entry_in_dir()`.
>
> Even after this patch series, the modules are still too intimate for
> my taste, but I think this is a big step forward, and it is enough to
> allow the other changes that I've been working on.
>
> These patches depend on Duy's nd/files-backend-git-dir branch, v6 [2].
> They are also available from my GitHub fork [1] as branch
> `separate-ref-cache`.
>
> Happily, this patch series actually removes a few more lines than it
> adds, mostly thanks to the simpler `verify_refname_available()`
> implementation.
Thanks.
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 04/20] refs_verify_refname_available(): implement once for all backends
2017-03-20 16:33 ` [PATCH 04/20] refs_verify_refname_available(): implement once for all backends Michael Haggerty
@ 2017-03-20 17:42 ` Jeff King
2017-03-20 22:20 ` Michael Haggerty0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2017-03-20 17:42 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, David Turner, git
On Mon, Mar 20, 2017 at 05:33:09PM +0100, Michael Haggerty wrote:
> It turns out that we can now implement
> `refs_verify_refname_available()` based on the other virtual
> functions, so there is no need for it to be defined at the backend
> level. Instead, define it once in `refs.c` and remove the
> `files_backend` definition.
Does this mean that backends can no longer provide storage for
D/F-conflicted refnames (i.e., "refs/foo" and "refs/foo/bar")? It looks
like the global verify_refname_available() has that logic baked in.
I know that was a complex subject because of the potential compatibility
issues (e.g., fetching from a server with a more capable backend), but
I'd just worry we are shutting a door on some options. OTOH, it probably
wouldn't be that hard for the global function to check a flag specific
to the ref_store, and allow such refs when it is set.
> int refs_verify_refname_available(struct ref_store *refs,
> const char *refname,
> - const struct string_list *extra,
> + const struct string_list *extras,
> const struct string_list *skip,
> struct strbuf *err)
> {
> [...]
> + /*
> + * We are still at a leading dir of the refname (e.g.,
> + * "refs/foo"; if there is a reference with that name,
> + * it is a conflict, *unless* it is in skip.
> + */
> + if (skip && string_list_has_string(skip, dirname.buf))
> + continue;
> +
> + if (!refs_read_raw_ref(refs, dirname.buf, oid.hash, &referent, &type)) {
> + strbuf_addf(err, "'%s' exists; cannot create '%s'",
> + dirname.buf, refname);
> + goto cleanup;
> + }
We don't really care about reading the ref value here; we just care if
it exists. Does that matter for efficiency (e.g., for the files backend
it's a stat() versus an open/read/close)? I guess the difference only
matters when it _does_ exist, which is the uncommon error case.
(Also, I suspect the loose ref cache always just reads everything in the
current code, though with the iterator approach in theory we could stop
doing that).
-Peff
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir
2017-03-20 16:33 ` [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir Michael Haggerty
@ 2017-03-20 17:51 ` Jeff King
2017-03-20 22:39 ` Michael Haggerty0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2017-03-20 17:51 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, David Turner, git
On Mon, Mar 20, 2017 at 05:33:16PM +0100, Michael Haggerty wrote:
> Instead of keeping a pointer to the ref_store in every ref_dir entry,
> store it once in `struct ref_cache`, and change `struct ref_dir` to
> include a pointer to its containing `ref_cache` instead. This makes it
> easier to add to the information that is stored in `struct ref_cache`
> without inflating the size of the individual entries.
This last sentence confused me. It's a pointer either way, no?
Do you just mean that we are free to add whatever we like to the
"ref_cache" without polluting the "ref_store" that is a more public data
structure?
-Peff
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 00/20] Separate `ref_cache` into a separate module
2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
` (20 preceding siblings ...)
2017-03-20 17:25 ` [PATCH 00/20] Separate `ref_cache` into a separate module Junio C Hamano
@ 2017-03-20 18:12 ` Jeff King
2017-03-20 18:24 ` Ævar Arnfjörð Bjarmason
2017-03-20 22:32 ` Junio C Hamano23 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2017-03-20 18:12 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, David Turner, git
On Mon, Mar 20, 2017 at 05:33:05PM +0100, Michael Haggerty wrote:
> I've completed a draft of an epic 48-patch series implementing all of
> the above points on my GitHub fork [1] as branch
> `wip/mmap-packed-refs`. It dramatically speeds up performance and
> reduces memory usage for some tasks in repositories with very many
> packed references.
Having played a bit with the work-in-progress, I'm very excited about
the performance improvements.
> But the later parts of that series aren't completely polished yet, and
> such a large patch series would be indigestible anyway, so here I
> submit the first part...
>
> This patch series extracts a `ref_cache` module out of
> `files_ref_cache`, and goes some way to disentangling those two
> modules, which until now were overly intimate with each other:
> [...]
I just read through it and didn't see anything objectionable. I had a
few questions, but I expect that most can be answered with an
explanation rather than a re-roll.
> Even after this patch series, the modules are still too intimate for
> my taste, but I think this is a big step forward, and it is enough to
> allow the other changes that I've been working on.
The resulting code looks like a big improvement in modularity to me. My
one complaint is that the virtual functions make it hard to dig through
the code. E.g., when looking at one patch I saw that we called
ref_iterator_peel(), and I wanted to know what that entailed. My editor
helpfully jumps straight to the definition, but of course it has nothing
useful; it's just a vtable wrapper. And I had to go dig up the name
"files_peel_ref()" manually.
I guess that's the price we pay for modularity.
-Peff
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 04/20] refs_verify_refname_available(): implement once for all backends
2017-03-20 17:42 ` Jeff King@ 2017-03-20 22:20 ` Michael Haggerty0 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 22:20 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, David Turner, git
On 03/20/2017 06:42 PM, Jeff King wrote:
> On Mon, Mar 20, 2017 at 05:33:09PM +0100, Michael Haggerty wrote:
>
>> It turns out that we can now implement
>> `refs_verify_refname_available()` based on the other virtual
>> functions, so there is no need for it to be defined at the backend
>> level. Instead, define it once in `refs.c` and remove the
>> `files_backend` definition.
>
> Does this mean that backends can no longer provide storage for
> D/F-conflicted refnames (i.e., "refs/foo" and "refs/foo/bar")? It looks
> like the global verify_refname_available() has that logic baked in.
The `verify_refname_available()` function implements the "no D/F
conflict" policy. But it's called from the backends, not from the common
code, and nobody says that a backend needs to call the function.
> [...]
>> int refs_verify_refname_available(struct ref_store *refs,
>> const char *refname,
>> - const struct string_list *extra,
>> + const struct string_list *extras,
>> const struct string_list *skip,
>> struct strbuf *err)
>> {
>> [...]
>> + /*
>> + * We are still at a leading dir of the refname (e.g.,
>> + * "refs/foo"; if there is a reference with that name,
>> + * it is a conflict, *unless* it is in skip.
>> + */
>> + if (skip && string_list_has_string(skip, dirname.buf))
>> + continue;
>> +
>> + if (!refs_read_raw_ref(refs, dirname.buf, oid.hash, &referent, &type)) {
>> + strbuf_addf(err, "'%s' exists; cannot create '%s'",
>> + dirname.buf, refname);
>> + goto cleanup;
>> + }
>
> We don't really care about reading the ref value here; we just care if
> it exists. Does that matter for efficiency (e.g., for the files backend
> it's a stat() versus an open/read/close)? I guess the difference only
> matters when it _does_ exist, which is the uncommon error case.
Yes, I assume that the conflict cases are unusual so I wasn't worrying
too much about their performance.
Not quite so obvious is that the new code sometimes checks for conflicts
against both loose and packed references in situations where the old
code only checked against packed references. Specifically, this happens
when the code has just read, or locked, or failed to find a loose
reference, so it could infer that there are no loose references that
could conflict. I don't think that will be noticeable, because it is the
reading of the whole `packed-refs` file that is a big expensive
operation that it is important to avoid. Anyway, later patches (i.e.,
after there is a `packed_ref_store`) can switch back to checking only
packed references and get back the old optimization.
> (Also, I suspect the loose ref cache always just reads everything in the
> current code, though with the iterator approach in theory we could stop
> doing that).
The loose ref cache reads directories into the cache lazily,
breadth-first. So it reads all of the entries in the directories leading
to the reference being looked up, but no others. When iterating, it
reads the parent directories of the prefix that is being iterated over
plus the whole subtree under the prefix, and that's it (though this
optimization is not yet wired through to `git for-each-ref`).
Michael
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir
2017-03-20 17:51 ` Jeff King@ 2017-03-20 22:39 ` Michael Haggerty0 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 22:39 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, David Turner, git
On 03/20/2017 06:51 PM, Jeff King wrote:
> On Mon, Mar 20, 2017 at 05:33:16PM +0100, Michael Haggerty wrote:
>
>> Instead of keeping a pointer to the ref_store in every ref_dir entry,
>> store it once in `struct ref_cache`, and change `struct ref_dir` to
>> include a pointer to its containing `ref_cache` instead. This makes it
>> easier to add to the information that is stored in `struct ref_cache`
>> without inflating the size of the individual entries.
>
> This last sentence confused me. It's a pointer either way, no?
>
> Do you just mean that we are free to add whatever we like to the
> "ref_cache" without polluting the "ref_store" that is a more public data
> structure?
Yeah, that's explained poorly. It might be clearer as
> This makes it easier to add to the information that is accessible
> from a `ref_dir` without increasing the size of every `ref_dir`
> instance.
It used to be that `ref_dir` contained a pointer to the `ref_store` that
contains it. (BTW, such a pointer *can't* be turned into a pointer to
the `ref_cache` because (1) the `ref_dir` shouldn't have to know what
kind of `ref_store` it is being used with, and (2) a `packed_ref_cache`
can be detached from its old `ref_cache` if the stat info indicates that
the `packed-refs` file has been modified since it was last read.)
But we want to add a `fill_ref_dir` callback pointer in a place that is
reachable from the `ref_dir` so that it can fill itself when necessary.
We could add it to the `ref_dir` struct, but that would inflate its
size. Instead, it makes more sense for the `ref_dir` to have a pointer
to the enclosing `ref_cache`, and store the `fill_ref_dir` pointer in
`ref_cache`. The `ref_cache` also gets a pointer to the enclosing
`ref_store`; that way a `ref_dir` also has a way to access the
`ref_store` that contains it.
An alternative would be to pass a pointer to the `ref_cache` down the
call stack when it is being accessed, but that seemed like a lot of work
that wouldn't lead to a very clean result.
Michael
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 18/20] commit_packed_refs(): use reference iteration
2017-03-20 18:05 ` Jeff King@ 2017-03-22 8:42 ` Michael Haggerty
2017-03-22 13:06 ` Jeff King0 siblings, 1 reply; 38+ messages in thread
From: Michael Haggerty @ 2017-03-22 8:42 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, David Turner, git
On 03/20/2017 07:05 PM, Jeff King wrote:
> On Mon, Mar 20, 2017 at 05:33:23PM +0100, Michael Haggerty wrote:
>
>> -/*
>> - * An each_ref_entry_fn that writes the entry to a packed-refs file.
>> - */
>> -static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
>> -{
>> - enum peel_status peel_status = peel_entry(entry, 0);
>> -
>> - if (peel_status != PEEL_PEELED && peel_status != PEEL_NON_TAG)
>> - error("internal error: %s is not a valid packed reference!",
>> - entry->name);
>> - write_packed_entry(cb_data, entry->name, entry->u.value.oid.hash,
>> - peel_status == PEEL_PEELED ?
>> - entry->u.value.peeled.hash : NULL);
>> - return 0;
>> -}
>
> This assertion goes away. It can't be moved into write_packed_entry()
> because the peel status is only known in the caller.
>
> But here:
>
>> @@ -1376,8 +1362,18 @@ static int commit_packed_refs(struct files_ref_store *refs)
>> die_errno("unable to fdopen packed-refs descriptor");
>>
>> fprintf_or_die(out, "%s", PACKED_REFS_HEADER);
>> - do_for_each_entry_in_dir(get_packed_ref_dir(packed_ref_cache),
>> - write_packed_entry_fn, out);
>> +
>> + iter = cache_ref_iterator_begin(packed_ref_cache->cache, NULL, 0);
>> + while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
>> + struct object_id peeled;
>> + int peel_error = ref_iterator_peel(iter, &peeled);
>> +
>> + write_packed_entry(out, iter->refname, iter->oid->hash,
>> + peel_error ? NULL : peeled.hash);
>> + }
>
> Should we be checking that peel_error is only PEELED or NON_TAG?
This is a good question, and it took me a while to figure out the whole
answer. At first I deleted this check without much thought because it
was just an internal consistency check that should never trigger. But
actually the story is more complicated than that.
Tl;dr: the old code is slightly wrong and I think the new code is correct.
First the superficial answer: we can't `peel_error` in
`commit_packed_refs()` as you suggested, because `ref_iterator_peel()`
doesn't return an `enum peel_status`. It only returns 0 on OK / nonzero
for problems. A legitimate reference should never have a status
`PEEL_INVALID`. That status is meant for stale packed refs that are
hidden by more recent loose refs, which *can* legitimately point at
objects that have since been GCed. Since `ref_iterator_peel()` is
someday meant to be an exposed part of the API, I didn't want it to give
out more information than pass/fail [1]. Also, the reason for a peel
failure is not necessarily known accurately (information is lost when a
reference is packed then the packed-refs file is read).
So, when could the old error message have been emitted? It is when there
is an entry in the packed-ref cache that is not `REF_KNOWS_PEELED`, and
when the peel is attempted, the result is `PEEL_INVALID`,
`PEEL_IS_SYMREF`, or `PEEL_BROKEN`. Since a packed ref cannot be a
symref, `PEEL_IS_SYMREF` and `PEEL_BROKEN` can be ruled out. So we're
left with `PEEL_INVALID`.
How could an entry get into the packed-refs cache without
`REF_KNOWS_PEELED`? One of the following:
* It was read from a `packed-refs` file that didn't have the
`fully-peeled` attribute. In that case, we *don't want* to emit an
error, because the broken value is presumably masked by a loose version
of the same reference (which we just don't happen to be packing this
time). The old code incorrectly emits the error message in this case. It
was probably never reported as a bug because this scenario is rare.
* It was a loose reference that was just added to the packed ref cache
by `files_packed_refs()` via `pack_if_possible_fn()` in preparation for
being packed. The latter function refuses to pack a reference for which
`entry_resolves_to_object()` returns false, and otherwise calls
`peel_entry()` itself and checks the return value. So a reference added
this way should always be `REF_KNOWS_PEELED`.
Therefore, I think it is a good thing to remove this check. I'll improve
the commit message to make this story clearer.
Michael
[1] We could change this policy, for example by only documenting the
pass/fail return value externally, while distinguishing between the
types of failure when iterating internal to the module.
^permalinkrawreply [flat|threaded] 38+ messages in thread

*Re: [PATCH 18/20] commit_packed_refs(): use reference iteration
2017-03-22 8:42 ` Michael Haggerty@ 2017-03-22 13:06 ` Jeff King0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2017-03-22 13:06 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, David Turner, git
On Wed, Mar 22, 2017 at 09:42:58AM +0100, Michael Haggerty wrote:
> > Should we be checking that peel_error is only PEELED or NON_TAG?
>
> This is a good question, and it took me a while to figure out the whole
> answer. At first I deleted this check without much thought because it
> was just an internal consistency check that should never trigger. But
> actually the story is more complicated than that.
>
> Tl;dr: the old code is slightly wrong and I think the new code is correct.
OK, that all makes sense. Thanks for digging.
-Peff
^permalinkrawreply [flat|threaded] 38+ messages in thread