*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 11:49 ` Duy Nguyen@ 2018-02-13 12:13 ` Duy Nguyen
2018-02-13 16:52 ` Brandon Williams
2018-02-13 17:47 ` Stefan Beller0 siblings, 2 replies; 239+ messages in thread
From: Duy Nguyen @ 2018-02-13 12:13 UTC (permalink / raw)
To: Stefan Beller
Cc: Junio C Hamano, Git Mailing List, Eric Sunshine, Jonathan Tan
On Tue, Feb 13, 2018 at 6:49 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Feb 12, 2018 at 05:22:15PM -0800, Stefan Beller wrote:
>> This is a real take on the first part of the recent RFC[1].
>>
>> ...
>>
>> Duy suggested that we shall not use the repository blindly, but
>> should carefully examine whether to pass on an object store or the
>> refstore or such[4], which I agree with if it makes sense. This
>> series unfortunately has an issue with that as I would not want to
>> pass down the `ignore_env` flag separately from the object store, so
>> I made all functions that only take the object store to have the raw
>> object store as the first parameter, and others using the full
>> repository.
>
> Second proposal :) How about you store ignore_env in raw_object_store?
> This would not be the first time an object has some configuration
> passed in at construction time. And it has a "constructor" now,
> raw_object_store_init() (I probably should merge _setup in it too)
A bit more on this configuration parameters. Down the road I think we
need something like this anyway to delete global config vars like
packed_git_window_size, delta_base_cache_limit... Either all these
end up in raw_object_store, or raw_object_store holds a link to
"struct config_set".
The ignore_env specifically though looks to me like a stop gap
solution until everything goes through repo_init() first. At that
point we don't have to delay getenv() anymore. We can getenv() all at
repo_init() then pass them in raw_object_store and ignore_env should
be gone. So sticking it inside raw_object_store _temporarily_ does not
sound so bad.
--
Duy
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 12:13 ` Duy Nguyen@ 2018-02-13 16:52 ` Brandon Williams
2018-02-13 17:47 ` Stefan Beller1 sibling, 0 replies; 239+ messages in thread
From: Brandon Williams @ 2018-02-13 16:52 UTC (permalink / raw)
To: Duy Nguyen
Cc: Stefan Beller, Junio C Hamano, Git Mailing List, Eric Sunshine,
Jonathan Tan
On 02/13, Duy Nguyen wrote:
> On Tue, Feb 13, 2018 at 6:49 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> > On Mon, Feb 12, 2018 at 05:22:15PM -0800, Stefan Beller wrote:
> >> This is a real take on the first part of the recent RFC[1].
> >>
> >> ...
> >>
> >> Duy suggested that we shall not use the repository blindly, but
> >> should carefully examine whether to pass on an object store or the
> >> refstore or such[4], which I agree with if it makes sense. This
> >> series unfortunately has an issue with that as I would not want to
> >> pass down the `ignore_env` flag separately from the object store, so
> >> I made all functions that only take the object store to have the raw
> >> object store as the first parameter, and others using the full
> >> repository.
> >
> > Second proposal :) How about you store ignore_env in raw_object_store?
> > This would not be the first time an object has some configuration
> > passed in at construction time. And it has a "constructor" now,
> > raw_object_store_init() (I probably should merge _setup in it too)
>
> A bit more on this configuration parameters. Down the road I think we
> need something like this anyway to delete global config vars like
> packed_git_window_size, delta_base_cache_limit... Either all these
> end up in raw_object_store, or raw_object_store holds a link to
> "struct config_set".
>
> The ignore_env specifically though looks to me like a stop gap
> solution until everything goes through repo_init() first. At that
> point we don't have to delay getenv() anymore. We can getenv() all at
> repo_init() then pass them in raw_object_store and ignore_env should
> be gone. So sticking it inside raw_object_store _temporarily_ does not
> sound so bad.
I like this approach, I mean at the moment we are replicating a single
bit of data but that allows us to be able to limit the scope of where a
repository struct is passed, giving us a better abstraction layer.
> --
> Duy
--
Brandon Williams
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 12:13 ` Duy Nguyen
2018-02-13 16:52 ` Brandon Williams@ 2018-02-13 17:47 ` Stefan Beller
2018-02-13 18:57 ` Junio C Hamano1 sibling, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-13 17:47 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Eric Sunshine, Jonathan Tan
On Tue, Feb 13, 2018 at 4:13 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, Feb 13, 2018 at 6:49 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Mon, Feb 12, 2018 at 05:22:15PM -0800, Stefan Beller wrote:
>>> This is a real take on the first part of the recent RFC[1].
>>>
>>> ...
>>>
>>> Duy suggested that we shall not use the repository blindly, but
>>> should carefully examine whether to pass on an object store or the
>>> refstore or such[4], which I agree with if it makes sense. This
>>> series unfortunately has an issue with that as I would not want to
>>> pass down the `ignore_env` flag separately from the object store, so
>>> I made all functions that only take the object store to have the raw
>>> object store as the first parameter, and others using the full
>>> repository.
>>
>> Second proposal :) How about you store ignore_env in raw_object_store?
>> This would not be the first time an object has some configuration
>> passed in at construction time. And it has a "constructor" now,
>> raw_object_store_init() (I probably should merge _setup in it too)
>
> A bit more on this configuration parameters. Down the road I think we
> need something like this anyway to delete global config vars like
> packed_git_window_size, delta_base_cache_limit... Either all these
> end up in raw_object_store, or raw_object_store holds a link to
> "struct config_set".
That makes sense long term.
> The ignore_env specifically though looks to me like a stop gap
> solution until everything goes through repo_init() first. At that
> point we don't have to delay getenv() anymore. We can getenv() all at
> repo_init() then pass them in raw_object_store and ignore_env should
> be gone. So sticking it inside raw_object_store _temporarily_ does not
> sound so bad.
Oh, that is an interesting perspective. Here is how I arrived at the opposite
conclusion initially: Searching for 'ignore_env' shows that we care about it
as well for the index and graft paths, which are not the object store, hence
it would be better kept in the repository. (The alternative would be to
duplicate it into the raw object store, but I do not like data duplication)
But maybe it is better to duplicate this one bit instead of passing through
a larger scoped object.
I'll rework the patches.
Thanks!
Stefan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 17:47 ` Stefan Beller@ 2018-02-13 18:57 ` Junio C Hamano
2018-02-13 19:23 ` Stefan Beller0 siblings, 1 reply; 239+ messages in thread
From: Junio C Hamano @ 2018-02-13 18:57 UTC (permalink / raw)
To: Stefan Beller; +Cc: Duy Nguyen, Git Mailing List, Eric Sunshine, Jonathan Tan
Stefan Beller <sbeller@google.com> writes:
> Oh, that is an interesting perspective. Here is how I arrived at the opposite
> conclusion initially: Searching for 'ignore_env' shows that we care about it
> as well for the index and graft paths, which are not the object store, hence
> it would be better kept in the repository. (The alternative would be to
> duplicate it into the raw object store, but I do not like data duplication)
>
> But maybe it is better to duplicate this one bit instead of passing through
> a larger scoped object.
If a larger scoped repository refers to a smaller scoped
object-store, is there still a need to duplicate that bit, instead
of referring to the bit the smaller scoped one has when asking about
the bit in the larger scoped one?
I am not sure if these "do not look at environment variables" is an
attribute of these entities---it sounds more like an attribute for
each invocation of an operation, i.e. "I want to learn where the
index is but would ignore GIT_INDEX environment for this particular
query." and "What's the value of this ref? Please honor the
common-dir environment during this query".
So from that point of view, it may not matter where the "bit" lives,
among repository, object-store, or ref-store.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 18:57 ` Junio C Hamano@ 2018-02-13 19:23 ` Stefan Beller
2018-02-13 19:35 ` Junio C Hamano0 siblings, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-13 19:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List, Eric Sunshine, Jonathan Tan
On Tue, Feb 13, 2018 at 10:57 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Oh, that is an interesting perspective. Here is how I arrived at the opposite
>> conclusion initially: Searching for 'ignore_env' shows that we care about it
>> as well for the index and graft paths, which are not the object store, hence
>> it would be better kept in the repository. (The alternative would be to
>> duplicate it into the raw object store, but I do not like data duplication)
>>
>> But maybe it is better to duplicate this one bit instead of passing through
>> a larger scoped object.
>
> If a larger scoped repository refers to a smaller scoped
> object-store, is there still a need to duplicate that bit, instead
> of referring to the bit the smaller scoped one has when asking about
> the bit in the larger scoped one?
No (in theory). But in practice it may be worthwhile:
"What's the value of this ref?"
"Oh let me check the ignore_env bit that happens
to live in the object store first."
would be super confusing to me.
> I am not sure if these "do not look at environment variables" is an
> attribute of these entities---it sounds more like an attribute for
> each invocation of an operation, i.e. "I want to learn where the
> index is but would ignore GIT_INDEX environment for this particular
> query." and "What's the value of this ref? Please honor the
> common-dir environment during this query".
That sounds like we want to have a configset struct eventually.
For now the ignore_env bit lives in the repository, as that helps
when working with submodules, when reading its comments.
Unfortunately 359efeffc1 (repository: introduce the repository
object, 2017-06-22) did not reason about the existence of the ignore_env
flag in its commit message.
> So from that point of view, it may not matter where the "bit" lives,
> among repository, object-store, or ref-store.
It matters on the scale of confusing the developer?
Stefan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 1:22 [PATCH 00/26] Moving global state into the repository object (part 1) Stefan Beller
` (27 preceding siblings ...)
2018-02-13 11:49 ` Duy Nguyen@ 2018-02-13 19:26 ` Jonathan Nieder
2018-02-14 0:57 ` Duy Nguyen
2018-02-13 19:33 ` Brandon Williams
2018-02-15 20:42 ` Junio C Hamano30 siblings, 1 reply; 239+ messages in thread
From: Jonathan Nieder @ 2018-02-13 19:26 UTC (permalink / raw)
To: Stefan Beller; +Cc: gitster, git, pclouds, sunshine, jonathantanmy
Hi,
Stefan Beller wrote:
> This is a real take on the first part of the recent RFC[1].
>
> Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary repositories"
> might be a good breaking point for a first part at that RFC at patch 38.
> This series is smaller and contains only 26 patches as the patches in the big
> RFC were slightly out of order.
Thanks. This looks like a nice reviewable series, so I'm happy to see
it broken out.
[...]
> Comments in the early range of that RFC were on 003 where Junio pointed out
> that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
> in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.
Can you say a little more about this? Was the problem that the
semantic patch wasn't idempotent, that it was too slow to run, or
something else?
If we're including the semantic patch for reference but never running
it, I think I'd prefer it to go in the commit message. But if it's
useful to run then we should make it idempotent so it can go in
contrib/coccinelle.
[...]
> Duy suggested that we shall not use the repository blindly, but should carefully
> examine whether to pass on an object store or the refstore or such[4], which
> I agree with if it makes sense. This series unfortunately has an issue with that
> as I would not want to pass down the `ignore_env` flag separately from the object
> store, so I made all functions that only take the object store to have the raw
> object store as the first parameter, and others using the full repository.
I think I want to push back on this a little.
The advantage of a function taking e.g. an object_store as an argument
instead of a repository is that it increases its flexibility, since it
allows callers that do not have access to a repository to call it. The
disadvantage is also that it increases the flexibility without any
callers benefitting from that:
1. It ties us to assumptions from today. If e.g. an object access in
the future starts relying on some other information from the
repository (e.g. its config) then we'd have to either add a
back-pointer from the object store to its repository or add
additional arguments for that additional data at that point.
If all callers already have a repository, it is simpler to pass
that repository as context so that we have the flexibility to make
more use of it later.
2. It complicates the caller. Instead of consistently passing the
same repository argument as context to functions that access that
repository, the caller would have to pull out relevant fields like
the object store from it.
3. It prevents us from making opportunistic use of other information
from the repository, such as its name for use in error messages.
In lower-level funcitons that need to be usable by callers without a
repository (e.g. to find packfiles in an alternate) it makes sense to
not pass a repository, but without such a use case in mind I don't
think it needs to be a general goal.
To put it another way, most callers do not *care* whether they are
working with a repository's object store, ref database, or some other
aspect of the repository. They just know they want to e.g. read an
object from this repository.
It's similar to how FILE * works: some operations rely on the buffer
the FILE * manages and some other operations only rely on the
underlying file descriptor, but using the FILE * consistently provides
a clean abstraction that generally makes life easier.
> Eric Sunshine brought up memory leaks with the RFC, and I would think to
> have plugged all holes.
Yay, thank you!
I'll try to find time to look at the patches in detail soon, but no
promises (i.e. if someone else reviews them first, then even better
;-)).
Sincerely,
Jonathan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 19:23 ` Stefan Beller@ 2018-02-13 19:35 ` Junio C Hamano
2018-02-13 19:43 ` Stefan Beller0 siblings, 1 reply; 239+ messages in thread
From: Junio C Hamano @ 2018-02-13 19:35 UTC (permalink / raw)
To: Stefan Beller; +Cc: Duy Nguyen, Git Mailing List, Eric Sunshine, Jonathan Tan
Stefan Beller <sbeller@google.com> writes:
> For now the ignore_env bit lives in the repository, as that helps
> when working with submodules, when reading its comments.
> Unfortunately 359efeffc1 (repository: introduce the repository
> object, 2017-06-22) did not reason about the existence of the ignore_env
> flag in its commit message.
>
>> So from that point of view, it may not matter where the "bit" lives,
>> among repository, object-store, or ref-store.
>
> It matters on the scale of confusing the developer?
What I meant is that from the point of view, having the bit in the
data (not on the invocation) is confusing no matter which data
structure holds it--they are equally bad.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 19:35 ` Junio C Hamano@ 2018-02-13 19:43 ` Stefan Beller
2018-02-14 0:30 ` Junio C Hamano0 siblings, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-13 19:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List, Eric Sunshine, Jonathan Tan
On Tue, Feb 13, 2018 at 11:35 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> For now the ignore_env bit lives in the repository, as that helps
>> when working with submodules, when reading its comments.
>> Unfortunately 359efeffc1 (repository: introduce the repository
>> object, 2017-06-22) did not reason about the existence of the ignore_env
>> flag in its commit message.
>>
>>> So from that point of view, it may not matter where the "bit" lives,
>>> among repository, object-store, or ref-store.
>>
>> It matters on the scale of confusing the developer?
>
> What I meant is that from the point of view, having the bit in the
> data (not on the invocation) is confusing no matter which data
> structure holds it--they are equally bad.
Right.
Which is why I'd strongly consider having it only in the repository
object as that is the largest-scoped thing we'd want. e.g. submodules
should care about environment variables differently:
GIT_WORK_TREE=~/mysuperproject git checkout \
--recurse-submodules master
So with such a command in mind, the environment variable would
only apply to the superproject and the nested submodules should
ignore the env, but compute their paths off the superproject, i.e.
the superproject repository, not its object store?
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 19:43 ` Stefan Beller@ 2018-02-14 0:30 ` Junio C Hamano0 siblings, 0 replies; 239+ messages in thread
From: Junio C Hamano @ 2018-02-14 0:30 UTC (permalink / raw)
To: Stefan Beller; +Cc: Duy Nguyen, Git Mailing List, Eric Sunshine, Jonathan Tan
Stefan Beller <sbeller@google.com> writes:
> Which is why I'd strongly consider having it only in the repository
> object as that is the largest-scoped thing we'd want. e.g. submodules
> should care about environment variables differently:
>
> GIT_WORK_TREE=~/mysuperproject git checkout \
> --recurse-submodules master
>
> So with such a command in mind, the environment variable would
> only apply to the superproject and the nested submodules should
> ignore the env, but compute their paths off the superproject, i.e.
> the superproject repository, not its object store?
In the longer term endgame state, what is in the repository object
cannot just be a simple "do we honor environment variable" bit
anymore.
It will be more like "we may (or may not) look at environment when
we create a repository object", i.e. a bit passed to repo_init()
constructor, and from then on, the repository object knows where the
object store and its alternates are, where the top of the working
tree is, where the repository (i.e. the directory that has "refs/"
in it) is, what "worktree" of the repository we are talking about by
itself. There is no need for a bit "do we or do we not look at
environment?" that needs to be consulted at runtime, which is quite
a round-about thing.
In your example, the repository object that represents the
superproject will pay attention to GIT_WORK_TREE environment when it
is consturcted, and repository objects dynamically constructed while
the command "recurse-submodules" through them will be constructed
with their "where is the top of my working tree?" set appropriately.
They won't be storing "when figuring out where my working tree is,
do not pay attention to GIT_WORK_TREE environment variable" bit.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 19:26 ` Jonathan Nieder@ 2018-02-14 0:57 ` Duy Nguyen0 siblings, 0 replies; 239+ messages in thread
From: Duy Nguyen @ 2018-02-14 0:57 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Stefan Beller, Junio C Hamano, Git Mailing List, Eric Sunshine,
Jonathan Tan
On Wed, Feb 14, 2018 at 2:26 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> Duy suggested that we shall not use the repository blindly, but should carefully
>> examine whether to pass on an object store or the refstore or such[4], which
>> I agree with if it makes sense. This series unfortunately has an issue with that
>> as I would not want to pass down the `ignore_env` flag separately from the object
>> store, so I made all functions that only take the object store to have the raw
>> object store as the first parameter, and others using the full repository.
>
> I think I want to push back on this a little.
>
> The advantage of a function taking e.g. an object_store as an argument
> instead of a repository is that it increases its flexibility, since it
> allows callers that do not have access to a repository to call it. The
> disadvantage is also that it increases the flexibility without any
> callers benefitting from that:
>
> 1. It ties us to assumptions from today. If e.g. an object access in
> the future starts relying on some other information from the
> repository (e.g. its config) then we'd have to either add a
> back-pointer from the object store to its repository or add
> additional arguments for that additional data at that point.
>
> If all callers already have a repository, it is simpler to pass
> that repository as context so that we have the flexibility to make
> more use of it later.
It's essentially putting all global variables in the same place again.
Only this time it's not global namespace, but "struct repository".
It's no worse than the current state though.
>
> 2. It complicates the caller. Instead of consistently passing the
> same repository argument as context to functions that access that
> repository, the caller would have to pull out relevant fields like
> the object store from it.
Well, I see that as a good point :)
>
> 3. It prevents us from making opportunistic use of other information
> from the repository, such as its name for use in error messages.
It does not exactly prevent us. It's just more effort to pass this
around. The repository name for example, there's no reason we can't
have object store name (which could be initialized the same as repo
name).
> In lower-level funcitons that need to be usable by callers without a
> repository (e.g. to find packfiles in an alternate) it makes sense to
> not pass a repository, but without such a use case in mind
I do agree with your benefit argument. But I'd like to point out
though that having all input to object store visible from something
like "struct raw_object_store" makes it easier to reason about the
code. I know how object store works, but occasionally I'm still
surprised when it getenv (or read $GIT_DIR/index, but not in object
store code) behind the scene. Imagine how hard it is for newcomers.
I would count that as benefit, even though it's not a use case per se.
Another potential benefit is writing unit tests will be much easier
(you can configure object store through struct repository too, but
setting one piece here, one piece there to control object store
behavior is not a nice experience). It's a nice thing to have, but not
a deciding factor.
> I don't think it needs to be a general goal.
My stand is a bit more aggressive. We should try to achieve better
abstraction if possible. But if it makes Stefan's life hell, it's not
worth doing. Converting to 'struct repository' is already a step
forward. Actually if it discourages him from finishing this work, it's
already not worth doing.
--
Duy
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-13 1:22 [PATCH 00/26] Moving global state into the repository object (part 1) Stefan Beller
` (29 preceding siblings ...)
2018-02-13 19:33 ` Brandon Williams@ 2018-02-15 20:42 ` Junio C Hamano
2018-02-15 21:09 ` Stefan Beller
2018-02-16 17:46 ` [PATCHv2 00/16] " Stefan Beller
30 siblings, 2 replies; 239+ messages in thread
From: Junio C Hamano @ 2018-02-15 20:42 UTC (permalink / raw)
To: Stefan Beller; +Cc: git, pclouds, sunshine, jonathantanmy
Stefan Beller <sbeller@google.com> writes:
> This is a real take on the first part of the recent RFC[1].
For the patches remaining in this series, The scope is about right
and the size is more manageable. With topics already on 'master',
they have some interactions:
- ot/mru-on-list & gs/retire-mru
The packed_git MRU has been greatly simplified by using list API
directly.
- cc/sha1-file-name
The function signature of sha1_file_name() has been updated.
I could certainly carry evil merge resolutions going forward (these
changes need to be made to object-store.h that has no mechanical
conflicts), but it may be less distracting for later readers of the
series if we rebase it on top of a more recent master.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 00/26] Moving global state into the repository object (part 1)
2018-02-15 20:42 ` Junio C Hamano@ 2018-02-15 21:09 ` Stefan Beller
2018-02-16 17:46 ` [PATCHv2 00/16] " Stefan Beller
1 sibling, 0 replies; 239+ messages in thread
From: Stefan Beller @ 2018-02-15 21:09 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Duy Nguyen, Eric Sunshine, Jonathan Tan
On Thu, Feb 15, 2018 at 12:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This is a real take on the first part of the recent RFC[1].
>
> For the patches remaining in this series, The scope is about right
> and the size is more manageable.
ok, thanks.
> With topics already on 'master',
> they have some interactions:
>
> - ot/mru-on-list & gs/retire-mru
>
> The packed_git MRU has been greatly simplified by using list API
> directly.
>
> - cc/sha1-file-name
>
> The function signature of sha1_file_name() has been updated.
>
> I could certainly carry evil merge resolutions going forward (these
> changes need to be made to object-store.h that has no mechanical
> conflicts), but it may be less distracting for later readers of the
> series if we rebase it on top of a more recent master.
I was debating when to reroll this series as I want to make the
change that Duy proposed, moving the 'ignore_env' into the
object store as well.
I'll rebase on top of the latest master or these topicc while at it.
Thanks for the heads up.
Stefan
^permalinkrawreply [flat|nested] 239+ messages in thread

*[PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-15 20:42 ` Junio C Hamano
2018-02-15 21:09 ` Stefan Beller@ 2018-02-16 17:46 ` " Stefan Beller
2018-02-16 17:46 ` [PATCH 01/16] repository: introduce raw object store field Stefan Beller
` (17 more replies)1 sibling, 18 replies; 239+ messages in thread
From: Stefan Beller @ 2018-02-16 17:46 UTC (permalink / raw)
To: gitster; +Cc: git, jonathantanmy, pclouds, sbeller, sunshine
v2:
* duplicated the 'ignore_env' bit into the object store as well
* the #define trick no longer works as we do not have a "the_objectstore" global,
which means there is just one patch per function that is converted.
As this follows the same structure of the previous series, I am still confident
there is no hidden dependencies to globals outside the object store in these
converted functions.
* rebased on top of current master, resolving the merge conflicts.
I think I used the list.h APIs right, but please double check.
Thanks,
Stefan
v1:
This is a real take on the first part of the recent RFC[1].
Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary repositories"
might be a good breaking point for a first part at that RFC at patch 38.
This series is smaller and contains only 26 patches as the patches in the big
RFC were slightly out of order.
I developed this series partly by writing patches, but mostly by cherrypicking
from that RFC on top of current master. I noticed no external conflicts apart
from one addition to the repositories _INIT macro, which was easy to resolve.
Comments in the early range of that RFC were on 003 where Junio pointed out
that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.
brian had a questoin on patch 25 in the RFC, but that seemed to resolve itself
without any suggestion to include into this series[3].
Duy suggested that we shall not use the repository blindly, but should carefully
examine whether to pass on an object store or the refstore or such[4], which
I agree with if it makes sense. This series unfortunately has an issue with that
as I would not want to pass down the `ignore_env` flag separately from the object
store, so I made all functions that only take the object store to have the raw
object store as the first parameter, and others using the full repository.
Eric Sunshine brought up memory leaks with the RFC, and I would think to
have plugged all holes.
[1] https://public-inbox.org/git/20180205235508.216277-1-sbeller@google.com/
[2] https://public-inbox.org/git/20180207143300.ce1c39ca07f6a0d64fe0e7ca@google.com/
[3] https://public-inbox.org/git/20180206011940.GD7904@genre.crustytoothpaste.net/
[4] https://public-inbox.org/git/CACsJy8CGgekpX4cZkyyTSPrj87uQVKZSOL7fyT__P2dh_1LmVQ@mail.gmail.com/
Thanks,
Stefan
Stefan Beller (16):
repository: introduce raw object store field
object-store: move alt_odb_list and alt_odb_tail to object store
object-store: free alt_odb_list
object-store: move packed_git and packed_git_mru to object store
object-store: close all packs upon clearing the object store
pack: move prepare_packed_git_run_once to object store
pack: move approximate object count to object store
sha1_file: add raw_object_store argument to alt_odb_usable
sha1_file: allow link_alt_odb_entries to handle arbitrary object
stores
sha1_file: allow prepare_alt_odb to handle arbitrary object stores
sha1_file: allow sha1_file_name to handle arbitrary object stores
sha1_file: allow stat_sha1_file to handle arbitrary object stores
sha1_file: allow open_sha1_file to handle arbitrary object stores
sha1_file: allow map_sha1_file_1 to handle arbitrary object stores
sha1_file: allow map_sha1_file to handle arbitrary object stores
sha1_file: allow sha1_loose_object_info to handle arbitrary object
stores
builtin/am.c | 2 +-
builtin/clone.c | 2 +-
builtin/count-objects.c | 6 +-
builtin/fetch.c | 2 +-
builtin/fsck.c | 13 +++--
builtin/gc.c | 4 +-
builtin/grep.c | 2 +-
builtin/index-pack.c | 1 +
builtin/merge.c | 2 +-
builtin/pack-objects.c | 19 +++---
builtin/pack-redundant.c | 6 +-
builtin/receive-pack.c | 3 +-
cache.h | 45 ++------------
environment.c | 5 +-
fast-import.c | 6 +-
http-backend.c | 6 +-
http-push.c | 1 +
http-walker.c | 4 +-
http.c | 6 +-
object-store.h | 80 +++++++++++++++++++++++++
object.c | 27 +++++++++
pack-bitmap.c | 4 +-
pack-check.c | 1 +
pack-revindex.c | 1 +
packfile.c | 64 ++++++++++----------
packfile.h | 2 +-
path.c | 2 +-
reachable.c | 1 +
repository.c | 22 +++++--
repository.h | 7 ++-
server-info.c | 6 +-
sha1_file.c | 123 +++++++++++++++++++++------------------
sha1_name.c | 11 ++--
streaming.c | 5 +-
34 files changed, 314 insertions(+), 177 deletions(-)
create mode 100644 object-store.h
--
2.16.1.291.g4437f3f132-goog
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-16 17:46 ` [PATCHv2 00/16] " Stefan Beller
` (15 preceding siblings ...)
2018-02-16 17:46 ` [PATCH 16/16] sha1_file: allow sha1_loose_object_info " Stefan Beller
@ 2018-02-16 22:34 ` Jonathan Nieder
2018-02-20 18:55 ` Stefan Beller
2018-02-20 19:03 ` Junio C Hamano
2018-02-21 1:54 ` [PATCHv3 00/27] " Stefan Beller
17 siblings, 2 replies; 239+ messages in thread
From: Jonathan Nieder @ 2018-02-16 22:34 UTC (permalink / raw)
To: Stefan Beller; +Cc: gitster, git, jonathantanmy, pclouds, sunshine
Hi,
Stefan Beller wrote:
> v2:
> * duplicated the 'ignore_env' bit into the object store as well
> * the #define trick no longer works as we do not have a "the_objectstore" global,
> which means there is just one patch per function that is converted.
> As this follows the same structure of the previous series, I am still confident
> there is no hidden dependencies to globals outside the object store in these
> converted functions.
> * rebased on top of current master, resolving the merge conflicts.
> I think I used the list.h APIs right, but please double check.
For what it's worth, I think I prefer v1. I put some comments on why
on patch 0 of v1 and would be interested in your thoughts on them
(e.g. as a reply to that). I also think that even if we want to
switch to a style that passes around object_store separately from
repository, it is easier to do the migration in two steps: first get
rid of hidden dependencies on the_repository, then do the (simpler)
automatic migration from
f(the_repository)
to
f(the_repository->object_store)
*afterwards*.
Thoughts?
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-16 22:34 ` [PATCHv2 00/16] Moving global state into the repository object (part 1) Jonathan Nieder
@ 2018-02-20 18:55 ` Stefan Beller
2018-02-20 19:00 ` Brandon Williams
2018-02-20 19:03 ` Junio C Hamano1 sibling, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-20 18:55 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Junio C Hamano, git, Jonathan Tan, Duy Nguyen, Eric Sunshine
On Fri, Feb 16, 2018 at 2:34 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi,
>
> Stefan Beller wrote:
>
>> v2:
>> * duplicated the 'ignore_env' bit into the object store as well
>> * the #define trick no longer works as we do not have a "the_objectstore" global,
>> which means there is just one patch per function that is converted.
>> As this follows the same structure of the previous series, I am still confident
>> there is no hidden dependencies to globals outside the object store in these
>> converted functions.
>> * rebased on top of current master, resolving the merge conflicts.
>> I think I used the list.h APIs right, but please double check.
>
> For what it's worth, I think I prefer v1. I put some comments on why
> on patch 0 of v1 and would be interested in your thoughts on them
> (e.g. as a reply to that). I also think that even if we want to
> switch to a style that passes around object_store separately from
> repository, it is easier to do the migration in two steps: first get
> rid of hidden dependencies on the_repository, then do the (simpler)
> automatic migration from
>
> f(the_repository)
>
> to
>
> f(the_repository->object_store)
>
> *afterwards*.
>
> Thoughts?
I would prefer to not spend more time on these conversions.
If Duy and you would come to a conclusion to either pick this
or the previous version I would be happy.
I do not see the benefit in splitting up the series even further and
do this multistep f(repo) -> f(object store), as the cost in potential
merge conflicts is too high. Note that brian just sent another object
id conversion series, also touching sha1_file.c, which I am sure will
produce merge conflicts for Junio.
For the next part 2 and onwards I'd be happy to take either this
strategy or Duys strategy as requested.
Stefan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-20 18:55 ` Stefan Beller@ 2018-02-20 19:00 ` Brandon Williams
2018-02-20 19:03 ` Stefan Beller0 siblings, 1 reply; 239+ messages in thread
From: Brandon Williams @ 2018-02-20 19:00 UTC (permalink / raw)
To: Stefan Beller
Cc: Jonathan Nieder, Junio C Hamano, git, Jonathan Tan, Duy Nguyen,
Eric Sunshine
On 02/20, Stefan Beller wrote:
> On Fri, Feb 16, 2018 at 2:34 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> > Hi,
> >
> > Stefan Beller wrote:
> >
> >> v2:
> >> * duplicated the 'ignore_env' bit into the object store as well
> >> * the #define trick no longer works as we do not have a "the_objectstore" global,
> >> which means there is just one patch per function that is converted.
> >> As this follows the same structure of the previous series, I am still confident
> >> there is no hidden dependencies to globals outside the object store in these
> >> converted functions.
> >> * rebased on top of current master, resolving the merge conflicts.
> >> I think I used the list.h APIs right, but please double check.
> >
> > For what it's worth, I think I prefer v1. I put some comments on why
> > on patch 0 of v1 and would be interested in your thoughts on them
> > (e.g. as a reply to that). I also think that even if we want to
> > switch to a style that passes around object_store separately from
> > repository, it is easier to do the migration in two steps: first get
> > rid of hidden dependencies on the_repository, then do the (simpler)
> > automatic migration from
> >
> > f(the_repository)
> >
> > to
> >
> > f(the_repository->object_store)
> >
> > *afterwards*.
> >
> > Thoughts?
>
> I would prefer to not spend more time on these conversions.
> If Duy and you would come to a conclusion to either pick this
> or the previous version I would be happy.
>
> I do not see the benefit in splitting up the series even further and
> do this multistep f(repo) -> f(object store), as the cost in potential
> merge conflicts is too high. Note that brian just sent another object
> id conversion series, also touching sha1_file.c, which I am sure will
> produce merge conflicts for Junio.
>
> For the next part 2 and onwards I'd be happy to take either this
> strategy or Duys strategy as requested.
I think Jonathan is trying to point out that converting to f(repo) maybe
easier than converting to f(repo->object_store) upfront which would make
it easier to write the patches (which most of them are already written)
and to review because you can use the #define trick to make some sort of
guarantees.
After we have successfully completed the migration to f(repo), then we
can revisit the subsystems which want to have a clearer abstraction
layer and make the jump to f(repo->object_store).
--
Brandon Williams
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-16 22:34 ` [PATCHv2 00/16] Moving global state into the repository object (part 1) Jonathan Nieder
2018-02-20 18:55 ` Stefan Beller@ 2018-02-20 19:03 ` Junio C Hamano
2018-02-20 19:06 ` Stefan Beller1 sibling, 1 reply; 239+ messages in thread
From: Junio C Hamano @ 2018-02-20 19:03 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Stefan Beller, git, jonathantanmy, pclouds, sunshine
Jonathan Nieder <jrnieder@gmail.com> writes:
> For what it's worth, I think I prefer v1. I put some comments on why
> on patch 0 of v1 and would be interested in your thoughts on them
> (e.g. as a reply to that). I also think that even if we want to
> switch to a style that passes around object_store separately from
> repository, it is easier to do the migration in two steps: first get
> rid of hidden dependencies on the_repository, then do the (simpler)
> automatic migration from
>
> f(the_repository)
>
> to
>
> f(the_repository->object_store)
>
> *afterwards*.
>
> Thoughts?
Are we envisioning the future in which one repository has more than
one object-store (I am counting an object store and its alternates
that are pointed by its $GIT_OBJECT_DIRECTORY/info/alternates as a
single logical "object store")? If not, doing f(the_repository)
migration, stopping there without f(the_repository->object_store)
may perfectly be adequate, I would think.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-20 19:00 ` Brandon Williams@ 2018-02-20 19:03 ` Stefan Beller0 siblings, 0 replies; 239+ messages in thread
From: Stefan Beller @ 2018-02-20 19:03 UTC (permalink / raw)
To: Brandon Williams
Cc: Jonathan Nieder, Junio C Hamano, git, Jonathan Tan, Duy Nguyen,
Eric Sunshine
On Tue, Feb 20, 2018 at 11:00 AM, Brandon Williams <bmwill@google.com> wrote:
> On 02/20, Stefan Beller wrote:
>> On Fri, Feb 16, 2018 at 2:34 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> > Hi,
>> >
>> > Stefan Beller wrote:
>> >
>> >> v2:
>> >> * duplicated the 'ignore_env' bit into the object store as well
>> >> * the #define trick no longer works as we do not have a "the_objectstore" global,
>> >> which means there is just one patch per function that is converted.
>> >> As this follows the same structure of the previous series, I am still confident
>> >> there is no hidden dependencies to globals outside the object store in these
>> >> converted functions.
>> >> * rebased on top of current master, resolving the merge conflicts.
>> >> I think I used the list.h APIs right, but please double check.
>> >
>> > For what it's worth, I think I prefer v1. I put some comments on why
>> > on patch 0 of v1 and would be interested in your thoughts on them
>> > (e.g. as a reply to that). I also think that even if we want to
>> > switch to a style that passes around object_store separately from
>> > repository, it is easier to do the migration in two steps: first get
>> > rid of hidden dependencies on the_repository, then do the (simpler)
>> > automatic migration from
>> >
>> > f(the_repository)
>> >
>> > to
>> >
>> > f(the_repository->object_store)
>> >
>> > *afterwards*.
>> >
>> > Thoughts?
>>
>> I would prefer to not spend more time on these conversions.
>> If Duy and you would come to a conclusion to either pick this
>> or the previous version I would be happy.
>>
>> I do not see the benefit in splitting up the series even further and
>> do this multistep f(repo) -> f(object store), as the cost in potential
>> merge conflicts is too high. Note that brian just sent another object
>> id conversion series, also touching sha1_file.c, which I am sure will
>> produce merge conflicts for Junio.
>>
>> For the next part 2 and onwards I'd be happy to take either this
>> strategy or Duys strategy as requested.
>
> I think Jonathan is trying to point out that converting to f(repo) maybe
> easier than converting to f(repo->object_store) upfront
I agree.
> which would make
> it easier to write the patches (which most of them are already written)
true, but for this series we also have the conversion to f(object_store)
written already.
> and to review because you can use the #define trick to make some sort of
> guarantees.
That is true, so it would be a tradeoff between reviewers and maintainers time?
> After we have successfully completed the migration to f(repo), then we
> can revisit the subsystems which want to have a clearer abstraction
> layer and make the jump to f(repo->object_store).
I would think we can take this series as-is and then move on with making
f(repo) abstractions, eventually moving to f(specialized-subsystem) as those
patches are not written yet (neither direct conversions, nor the repo trick;
the patches I already have need adaption which takes enough time on its own.)
>
> --
> Brandon Williams
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-20 19:03 ` Junio C Hamano@ 2018-02-20 19:06 ` Stefan Beller
2018-02-20 19:55 ` Junio C Hamano0 siblings, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-20 19:06 UTC (permalink / raw)
To: Junio C Hamano, Christian Couder
Cc: Jonathan Nieder, git, Jonathan Tan, Duy Nguyen, Eric Sunshine
On Tue, Feb 20, 2018 at 11:03 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>> For what it's worth, I think I prefer v1. I put some comments on why
>> on patch 0 of v1 and would be interested in your thoughts on them
>> (e.g. as a reply to that). I also think that even if we want to
>> switch to a style that passes around object_store separately from
>> repository, it is easier to do the migration in two steps: first get
>> rid of hidden dependencies on the_repository, then do the (simpler)
>> automatic migration from
>>
>> f(the_repository)
>>
>> to
>>
>> f(the_repository->object_store)
>>
>> *afterwards*.
>>
>> Thoughts?
>
> Are we envisioning the future in which one repository has more than
> one object-store (I am counting an object store and its alternates
> that are pointed by its $GIT_OBJECT_DIRECTORY/info/alternates as a
> single logical "object store")? If not, doing f(the_repository)
> migration, stopping there without f(the_repository->object_store)
> may perfectly be adequate, I would think.
>
I do not envision that, maybe Christian Couder does?
The partial clone world would be happy with just one object store
per repo, I would think.
The step to take an object store would just add expressiveness
to the code, which may help in understanding what part of the code is
related to what other part of the code, so it may be a readability gain
on its own?
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)
2018-02-20 19:06 ` Stefan Beller@ 2018-02-20 19:55 ` Junio C Hamano
2018-02-20 20:16 ` Stefan Beller0 siblings, 1 reply; 239+ messages in thread
From: Junio C Hamano @ 2018-02-20 19:55 UTC (permalink / raw)
To: Stefan Beller
Cc: Christian Couder, Jonathan Nieder, git, Jonathan Tan, Duy Nguyen,
Eric Sunshine
Stefan Beller <sbeller@google.com> writes:
> The step to take an object store would just add expressiveness
> to the code, which may help in understanding what part of the code is
> related to what other part of the code, so it may be a readability gain
> on its own?
It certainly would allow you to have a standalone object store that
is not associated with *any* repository, but if we are not using
such a layout, I doubt it would be a readability gain to add excess
and unexercised expressiveness to the code.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 16/27] sha1_file: add repository argument to sha1_file_name
2018-02-21 1:54 ` [PATCH 16/27] sha1_file: add repository argument to sha1_file_name Stefan Beller
@ 2018-02-22 0:51 ` Brandon Williams
2018-02-23 22:36 ` Stefan Beller0 siblings, 1 reply; 239+ messages in thread
From: Brandon Williams @ 2018-02-22 0:51 UTC (permalink / raw)
To: Stefan Beller
Cc: git, gitster, jonathantanmy, pclouds, sunshine, Jonathan Nieder
On 02/20, Stefan Beller wrote:
> Add a repository argument to allow sha1_file_name callers to be more
> specific about which repository to handle. This is a small mechanical
> change; it doesn't change the implementation to handle repositories
> other than the_repository yet.
>
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.
>
> While at it, move the declaration to object-store.h, where it should
> be easier to find.
Seems like we may want to make a sha1-file.h or an oid-file.h or
something like that at some point as that seems like a better place for
the function than in the object-store.h file?
--
Brandon Williams
^permalinkrawreply [flat|nested] 239+ messages in thread

*[PATCH 0/2] Fix initializing the_hash_algo
2018-02-14 18:08 ` Brandon Williams@ 2018-02-23 9:56 ` Nguyễn Thái Ngọc Duy
2018-02-23 9:56 ` [PATCH 1/2] setup.c: initialize the_repository correctly in all cases Nguyễn Thái Ngọc Duy
` (3 more replies)
2018-02-26 10:30 ` [PATCH 0/4] Delete ignore_env member in struct repository Nguyễn Thái Ngọc Duy
1 sibling, 4 replies; 239+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-02-23 9:56 UTC (permalink / raw)
To: git
Cc: Brandon Williams, Stefan Beller, Junio C Hamano, Eric Sunshine,
Jonathan Tan, brian m . carlson,
Nguyễn Thái Ngọc Duy
On Thu, Feb 15, 2018 at 1:08 AM, Brandon Williams <bmwill@google.com> wrote:
> At some point yes we would definitely want the setup code to fully
> initialize a repository object (in this case the_repository). And I
> would even like to change the function signatures of all the builtin
> commands to take a repository object so that they don't implicitly rely
> on the_repository at all.
>
> When I introduced struct repository I seem to remember there being a
> couple things which were different about setup that made it difficult to
> simply call repo_init() on the_repository. If you can fix whatever
> those issues with setup were (I can't remember all of them) then that
> would be great :)
I can certainly try! I start to remember all the hairy details in that
setup code.
The first step may be something like this, which identifies all the
"repo init" entry points. This is basically a revert of e26f7f19b6
(repository: pre-initialize hash algo pointer - 2018-01-19) and doing
things the proper way, hopefully.
This is on 'master', independent from Stefan's series. I have another
patch on top of that series to remove the use of ignore_env in
sha1_file.c (and things seem to work). Basically whenever you have to
initialize the hash algorithm, there's a good chance you need to
initialize object store as well. But I'll hold that off until
Stefan's and this one are both merged.
But yeah, it looks like we need some surgery in setup.c if we want
something as pretty as repo_submodule_init() but for the main repo.
Nguyễn Thái Ngọc Duy (2):
setup.c: initialize the_repository correctly in all cases
Revert "repository: pre-initialize hash algo pointer"
builtin/index-pack.c | 5 +++++
builtin/init-db.c | 3 ++-
cache.h | 3 ++-
common-main.c | 4 ++++
diff-no-index.c | 5 +++++
path.c | 2 +-
repository.c | 2 +-
setup.c | 5 ++++-
t/helper/test-dump-split-index.c | 2 ++
9 files changed, 26 insertions(+), 5 deletions(-)
--
2.16.1.435.g8f24da2e1a
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 05/27] object-store: move packed_git and packed_git_mru to object store
2018-02-21 21:51 ` Junio C Hamano@ 2018-02-23 20:01 ` Stefan Beller0 siblings, 0 replies; 239+ messages in thread
From: Stefan Beller @ 2018-02-23 20:01 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jonathan Tan, Duy Nguyen, Eric Sunshine, Jonathan Nieder
On Wed, Feb 21, 2018 at 1:51 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +
>> +/*
>> + * The mru list_head is supposed to be initialized using
>> + * the LIST_HEAD macro, assigning prev/next to itself.
>> + * However this doesn't work in this case as some compilers dislike
>> + * that macro on member variables. Use NULL instead as that is defined
>> + * and accepted, deferring the real init to prepare_packed_git_mru(). */
>> +#define __MRU_INIT { NULL, NULL }
>> +#define RAW_OBJECT_STORE_INIT { NULL, NULL, __MRU_INIT, NULL, NULL }
>
> I do not think it has to be this way to abuse two NULLs, if you
> faithfully mimicked how LIST_HEAD() macro is constructed. The
> reason why it does not try to introduce
>
> struct list_head x = LIST_HEAD_INIT;
>
> and instead, uses
>
> LIST_HEAD(x);
>
> is because of the need for self referral. If we learn from it, we
> can do the same, i.e. instead of doing
>
> struct raw_object_store x = RAW_OBJECT_STORE_INIT;
>
> we can do
>
> RAW_OBJECT_STORE(x);
>
> that expands to
>
> struct raw_object_store x = {
> ...
> { &x.packed_git_mru, &x.packed_git_mru },
> ...
> };
>
> no? Then we do not need such a lengthy comment that reads only like
> an excuse ;-)
We cannot do this, because the object store is directly
embedded into the the_repository (not as a pointer),
which is a global on its own. So we'd have to do this
at the repository level, which I would not want.
I'd want to have the #defines where the structs are declared.
Any guidance on how to do that correctly with self referential
definitions without making the object store a pointer then?
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 05/27] object-store: move packed_git and packed_git_mru to object store
2018-02-22 6:44 ` Jonathan Nieder@ 2018-02-23 21:42 ` Stefan Beller0 siblings, 0 replies; 239+ messages in thread
From: Stefan Beller @ 2018-02-23 21:42 UTC (permalink / raw)
To: Jonathan Nieder
Cc: git, Junio C Hamano, Jonathan Tan, Duy Nguyen, Eric Sunshine
>> 2. Applying the semantic patch
>> contrib/coccinelle/refactoring/packed_git.cocci to adjust callers.
>> This semantic patch is placed in a sub directory of the coccinelle
>> contrib dir, as this semantic patch is not expected to be of general
>> usefulness; it is only useful during developing this series and
>> merging it with other topics in flight. At a later date, just
>> delete that semantic patch.
>
> Can the semantic patch go in the commit message instead? It is very
> brief.
done
>
> Actually, I don't see this semantic patch in the diffstat. Is the
> commit message stale?
>
>> 3. Applying line wrapping fixes from "make style" to break the
>> resulting long lines.
>>
>> 4. Adding missing #includes of repository.h and object-store.h
>> where needed.
>
> Is there a way to automate this step? (I'm asking for my own
> reference when writing future patches, not because of any concern
> about the correctness of this one.)
no, for several reasons.
(a) I don't know how to write it in coccinelle, because
(b) I have not figured out the order in which we include headers, apart from
"cache.h goes first", the rest of the includes sometimes looks like a random
order, because different patches add new includes at different places.
I have the impression, that (1) some add the include after all other
existing includes or (2) try to figure out where to add the include to make
sense in the existing include order or (3) sort alphabetically or (4) put it
randomly, so the chance for a merge conflict with other series in flight
decreases.
(c) I did it in a semi automated fashion:
while make fails:
add another include
The mess with including repository or object-store comes from the fact that
I had v2 based on object-store, not the repository and cherry-picked this patch
over to v3. Fixed all of the includes now.
>> @@ -59,10 +83,25 @@ struct raw_object_store {
>> */
>> char *objectdir;
>>
>> + struct packed_git *packed_git;
>> + /*
>> + * A most-recently-used ordered version of the packed_git list, which can
>> + * be iterated instead of packed_git (and marked via mru_mark).
>> + */
>> + struct list_head packed_git_mru;
>
> I don't understand the new part of the comment. Can you explain here,
> for me?
cherrypicking error, fixed.
>> -#define RAW_OBJECT_STORE_INIT { NULL, NULL, NULL }
>> +
>> +/*
>> + * The mru list_head is supposed to be initialized using
>> + * the LIST_HEAD macro, assigning prev/next to itself.
>> + * However this doesn't work in this case as some compilers dislike
>> + * that macro on member variables. Use NULL instead as that is defined
>> + * and accepted, deferring the real init to prepare_packed_git_mru(). */
>
> style nit: '*/' should be on its own line.
>
> More importantly, we can avoid such an issue as described by Junio. :)
See reply to Junio, I am not quite sure I like that.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
2018-02-23 20:04 ` Stefan Beller@ 2018-02-23 22:26 ` Junio C Hamano0 siblings, 0 replies; 239+ messages in thread
From: Junio C Hamano @ 2018-02-23 22:26 UTC (permalink / raw)
To: Stefan Beller
Cc: Nguyễn Thái Ngọc Duy, git, Brandon Williams,
Eric Sunshine, Jonathan Tan, brian m . carlson
Stefan Beller <sbeller@google.com> writes:
> On Fri, Feb 23, 2018 at 11:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> I wonder if there is yet another missing case in the enumeration of
>>> the previous patch:
>>> Some commands are able to operate on GIT_OBJECT_DIR instead
>>> of GIT_DIR (git repack?), which may not even explore the full git directory,
>>> and so doesn't know about the hash value.
>>
>> ... because GIT_DIR/config is not known? "repack" is not one of
>> them, though---it needs to at least use refs as the starting point
>> so a standalone OBJECT_DIR is insufficient.
>
> Yes, I could have worded this as a question:
> Is there any command that operates on GIT_OBJECT_DIR
> without trying to discover GIT_DIR ?
If somebody finds one that would be a good argument not to pursue
the approach. Lack of response to the question would not amount to
that much---it is possible all people who bothered to think of
overlooked something obvious, though. "When I asked around nobody
thought of a possibly way for this to cause breakages, so let's
declare it is safe to do so and do it" is not how we want to do
things X-<.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 16/27] sha1_file: add repository argument to sha1_file_name
2018-02-22 0:51 ` Brandon Williams@ 2018-02-23 22:36 ` Stefan Beller
2018-02-23 23:11 ` Brandon Williams0 siblings, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-23 22:36 UTC (permalink / raw)
To: Brandon Williams
Cc: git, Junio C Hamano, Jonathan Tan, Duy Nguyen, Eric Sunshine,
Jonathan Nieder
On Wed, Feb 21, 2018 at 4:51 PM, Brandon Williams <bmwill@google.com> wrote:
> On 02/20, Stefan Beller wrote:
>> Add a repository argument to allow sha1_file_name callers to be more
>> specific about which repository to handle. This is a small mechanical
>> change; it doesn't change the implementation to handle repositories
>> other than the_repository yet.
>>
>> As with the previous commits, use a macro to catch callers passing a
>> repository other than the_repository at compile time.
>>
>> While at it, move the declaration to object-store.h, where it should
>> be easier to find.
>
> Seems like we may want to make a sha1-file.h or an oid-file.h or
> something like that at some point as that seems like a better place for
> the function than in the object-store.h file?
It depends what our long term goal is.
Do we want header and source file name to match for each function?
Or do we want a coarser set of headers, such that we have a broad
object-store.h, but that is implemented in sha1_file.c, packfile.c
for the parts of the raw_objectstore and other .c files for the higher
levels of the object store?
For now I'd just keep it in object-store.h as moving out just a couple
functions seems less consistent?
Stefan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 16/27] sha1_file: add repository argument to sha1_file_name
2018-02-23 22:36 ` Stefan Beller@ 2018-02-23 23:11 ` Brandon Williams0 siblings, 0 replies; 239+ messages in thread
From: Brandon Williams @ 2018-02-23 23:11 UTC (permalink / raw)
To: Stefan Beller
Cc: git, Junio C Hamano, Jonathan Tan, Duy Nguyen, Eric Sunshine,
Jonathan Nieder
On 02/23, Stefan Beller wrote:
> On Wed, Feb 21, 2018 at 4:51 PM, Brandon Williams <bmwill@google.com> wrote:
> > On 02/20, Stefan Beller wrote:
> >> Add a repository argument to allow sha1_file_name callers to be more
> >> specific about which repository to handle. This is a small mechanical
> >> change; it doesn't change the implementation to handle repositories
> >> other than the_repository yet.
> >>
> >> As with the previous commits, use a macro to catch callers passing a
> >> repository other than the_repository at compile time.
> >>
> >> While at it, move the declaration to object-store.h, where it should
> >> be easier to find.
> >
> > Seems like we may want to make a sha1-file.h or an oid-file.h or
> > something like that at some point as that seems like a better place for
> > the function than in the object-store.h file?
>
> It depends what our long term goal is.
> Do we want header and source file name to match for each function?
> Or do we want a coarser set of headers, such that we have a broad
> object-store.h, but that is implemented in sha1_file.c, packfile.c
> for the parts of the raw_objectstore and other .c files for the higher
> levels of the object store?
>
> For now I'd just keep it in object-store.h as moving out just a couple
> functions seems less consistent?
Fair enough :)
--
Brandon Williams
^permalinkrawreply [flat|nested] 239+ messages in thread

*[PATCHv4 00/27] Moving global state into the repository object (part 1)
2018-02-21 1:54 ` [PATCHv3 00/27] " Stefan Beller
` (27 preceding siblings ...)
2018-02-22 0:26 ` [PATCHv3 00/27] Moving global state into the repository object (part 1) Stefan Beller
@ 2018-02-24 0:47 ` " Stefan Beller
2018-02-24 0:47 ` [PATCHv4 01/27] repository: introduce raw object store field Stefan Beller
` (28 more replies)28 siblings, 29 replies; 239+ messages in thread
From: Stefan Beller @ 2018-02-24 0:47 UTC (permalink / raw)
To: sbeller; +Cc: git, gitster, jonathantanmy, pclouds, sunshine, bmwill
v4:
* addressed feedback from the single patches (mostly nits)
* rebased on latest master
v3:
* reverted back to use the repository for most of the functions
(including unduplicating 'ignore_env')
* rebased on master again (I lost that state when doing v2, as
I did both rebase as well as conversion to object store in one go)
* one additional patch, that moves the alternates related things to
object-store.h, such that inclusion of cache.h (in object-store.h)
is not required any more.
v2:
* duplicated the 'ignore_env' bit into the object store as well
* the #define trick no longer works as we do not have a "the_objectstore" global,
which means there is just one patch per function that is converted.
As this follows the same structure of the previous series, I am still confident
there is no hidden dependencies to globals outside the object store in these
converted functions.
* rebased on top of current master, resolving the merge conflicts.
I think I used the list.h APIs right, but please double check.
Thanks,
Stefan
v1:
This is a real take on the first part of the recent RFC[1].
Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary repositories"
might be a good breaking point for a first part at that RFC at patch 38.
This series is smaller and contains only 26 patches as the patches in the big
RFC were slightly out of order.
I developed this series partly by writing patches, but mostly by cherrypicking
from that RFC on top of current master. I noticed no external conflicts apart
from one addition to the repositories _INIT macro, which was easy to resolve.
Comments in the early range of that RFC were on 003 where Junio pointed out
that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.
brian had a questoin on patch 25 in the RFC, but that seemed to resolve itself
without any suggestion to include into this series[3].
Duy suggested that we shall not use the repository blindly, but should carefully
examine whether to pass on an object store or the refstore or such[4], which
I agree with if it makes sense. This series unfortunately has an issue with that
as I would not want to pass down the `ignore_env` flag separately from the object
store, so I made all functions that only take the object store to have the raw
object store as the first parameter, and others using the full repository.
Eric Sunshine brought up memory leaks with the RFC, and I would think to
have plugged all holes.
[1] https://public-inbox.org/git/20180205235508.216277-1-sbeller@google.com/
[2] https://public-inbox.org/git/20180207143300.ce1c39ca07f6a0d64fe0e7ca@google.com/
[3] https://public-inbox.org/git/20180206011940.GD7904@genre.crustytoothpaste.net/
[4] https://public-inbox.org/git/CACsJy8CGgekpX4cZkyyTSPrj87uQVKZSOL7fyT__P2dh_1LmVQ@mail.gmail.com/
Thanks,
Stefan
Jonathan Nieder (2):
sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
sha1_file: allow sha1_loose_object_info to handle arbitrary
repositories
Stefan Beller (25):
repository: introduce raw object store field
object-store: migrate alternates struct and functions from cache.h
object-store: move alt_odb_list and alt_odb_tail to object store
object-store: free alt_odb_list
object-store: move packed_git and packed_git_mru to object store
object-store: close all packs upon clearing the object store
pack: move prepare_packed_git_run_once to object store
pack: move approximate object count to object store
sha1_file: add raw_object_store argument to alt_odb_usable
sha1_file: add repository argument to link_alt_odb_entry
sha1_file: add repository argument to read_info_alternates
sha1_file: add repository argument to link_alt_odb_entries
sha1_file: add repository argument to prepare_alt_odb
sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
sha1_file: allow prepare_alt_odb to handle arbitrary repositories
sha1_file: add repository argument to sha1_file_name
sha1_file: add repository argument to stat_sha1_file
sha1_file: add repository argument to open_sha1_file
sha1_file: add repository argument to map_sha1_file_1
sha1_file: add repository argument to map_sha1_file
sha1_file: add repository argument to sha1_loose_object_info
sha1_file: allow sha1_file_name to handle arbitrary repositories
sha1_file: allow stat_sha1_file to handle arbitrary repositories
sha1_file: allow open_sha1_file to handle arbitrary repositories
sha1_file: allow map_sha1_file to handle arbitrary repositories
builtin/am.c | 2 +-
builtin/clone.c | 2 +-
builtin/count-objects.c | 5 +-
builtin/fetch.c | 2 +-
builtin/fsck.c | 12 ++--
builtin/gc.c | 3 +-
builtin/grep.c | 2 +-
builtin/merge.c | 2 +-
builtin/pack-objects.c | 20 ++++---
builtin/pack-redundant.c | 5 +-
builtin/receive-pack.c | 2 +-
cache.h | 87 ---------------------------
environment.c | 5 +-
fast-import.c | 7 ++-
http-backend.c | 5 +-
http-walker.c | 3 +-
http.c | 5 +-
object-store.h | 124 ++++++++++++++++++++++++++++++++++++++
object.c | 27 +++++++++
pack-bitmap.c | 3 +-
packfile.c | 64 ++++++++++----------
packfile.h | 2 +-
path.c | 2 +-
repository.c | 21 +++++--
repository.h | 7 ++-
server-info.c | 5 +-
sha1_file.c | 125 +++++++++++++++++++++------------------
sha1_name.c | 10 ++--
streaming.c | 5 +-
29 files changed, 335 insertions(+), 229 deletions(-)
create mode 100644 object-store.h
--
2.16.1.291.g4437f3f132-goog
^permalinkrawreply [flat|nested] 239+ messages in thread

*[PATCH v2 0/5] Fix initializing the_hash_algo
2018-02-23 9:56 ` [PATCH 0/2] Fix initializing the_hash_algo Nguyễn Thái Ngọc Duy
` (2 preceding siblings ...)
2018-02-23 22:47 ` [PATCH 0/2] Fix initializing the_hash_algo brian m. carlson
@ 2018-02-24 3:34 ` " Nguyễn Thái Ngọc Duy
2018-02-24 3:34 ` [PATCH v2 1/5] setup.c: initialize the_repository correctly in all cases Nguyễn Thái Ngọc Duy
` (5 more replies)3 siblings, 6 replies; 239+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-02-24 3:34 UTC (permalink / raw)
To: git
Cc: Brandon Williams, Stefan Beller, Junio C Hamano, Eric Sunshine,
Jonathan Tan, brian m . carlson,
Nguyễn Thái Ngọc Duy
Brian questioned the unnecessary alarms in v1 when "git diff" or "git
index-pack" run in no-repository mode. I had the same feeling after
sending v1. But instead of suppresing alarms, we could do better.
v2 breaks those "fall back to SHA-1" code into separate patches and
handles it properly (I hope) instead of blind fall back like v1:
- for index-pack, we can determine the needed algorithm from the pack
file. I'm making an assumption here that pack files with new hash
algo must step up file format version. But I think it's a reasonable
assumption.
- for diff --no-index, I still fall back to SHA-1 to generate the
hashes like before. We could probably introduce a new command line
option to use a different hash. But that work could be done later
when an actual new hash has come
Note, I didn't test but this series could potentially break 'pu' (in a
good way). I initially was puzzled why the test suite didn't fail when
the_repository->hash_algo was NULL (i.e. before Brian's fix). Then I
found out that more work to abstract away SHA-1 hashing functions have
been done since then. But because the_hash_algo is pre-initialized
with SHA-1, these special cases did not show up until now. If there
are more the_hash_algo conversion on 'pu', more "no-repo" cases
could be spotted by the test suite.
I think this makes pre-initializing the_hash_algo to NULL a very good
point: during the abstraction work, we at least must identify the use
cases like this (code running without repo). We don't have to fix it
right away, but we can start thinking about how to deal with it. If we
ignore it and switch to a new hash, these code will keep using SHA-1
and cause more headache in future.
Nguyễn Thái Ngọc Duy (5):
setup.c: initialize the_repository correctly in all cases
sha1_file.c: keep a good name for "unknown" hash_algos[UNKNOWN]
index-pack: check (and optionally set) hash algo based on input file
diff.c: initialize hash algo when running in --no-index mode
Revert "repository: pre-initialize hash algo pointer"
builtin/index-pack.c | 26 +++++++++++++++++++++++++-
builtin/init-db.c | 3 ++-
cache.h | 3 ++-
common-main.c | 10 ++++++++++
diff.c | 12 ++++++++++++
path.c | 2 +-
repository.c | 2 +-
setup.c | 5 ++++-
sha1_file.c | 2 +-
t/helper/test-dump-split-index.c | 2 ++
10 files changed, 60 insertions(+), 7 deletions(-)
--
2.16.1.435.g8f24da2e1a
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 08/27] pack: move approximate object count to object store
2018-02-23 22:22 ` Stefan Beller@ 2018-02-26 8:55 ` Jeff King
2018-02-26 20:57 ` Stefan Beller0 siblings, 1 reply; 239+ messages in thread
From: Jeff King @ 2018-02-26 8:55 UTC (permalink / raw)
To: Stefan Beller
Cc: Brandon Williams, git, Junio C Hamano, Jonathan Tan, Duy Nguyen,
Eric Sunshine, Jonathan Nieder
On Fri, Feb 23, 2018 at 02:22:14PM -0800, Stefan Beller wrote:
> >> + /*
> >> + * A fast, rough count of the number of objects in the repository.
> >> + * These two fields are not meant for direct access. Use
> >> + * approximate_object_count() instead.
> >> + */
> >> + unsigned long approximate_object_count;
> >> + unsigned approximate_object_count_valid : 1;
> >
> > Patch looks fine and is effectively a no-op, though what is the need for
> > both of these variables? Maybe it can be simplified down to just use
> > one? Just musing as its out of the scope of this patch and we probably
> > shouldn't try to change that in this series.
>
> I agree we should. It was introduced in e323de4ad7f (sha1_file:
> allow sha1_loose_object_info to handle arbitrary repositories, 2017-08-30)
> and I think it was seen as a clever optimization trick back then?
I think you meant 8e3f52d778 (find_unique_abbrev: move logic out of
get_short_sha1(), 2016-10-03)?
Yes, it was just to avoid the dual-meaning of "0" for "not set" and a
repository with no packfiles. It would probably be fine to get rid of
it. If you have no packfiles then you probably don't have enough objects
to worry about micro-optimizing. And anyway, the "wasted" case wouldn't
even make any syscalls (it would do a noop prepare_packed_git() and then
realize the packed_git list is empty).
-Peff
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 0/4] Delete ignore_env member in struct repository
2018-02-26 10:30 ` [PATCH 0/4] Delete ignore_env member in struct repository Nguyễn Thái Ngọc Duy
` (4 preceding siblings ...)
2018-02-26 18:07 ` [PATCH 0/4] Delete ignore_env member in struct repository Junio C Hamano
@ 2018-02-26 20:46 ` Stefan Beller
2018-02-27 0:18 ` Duy Nguyen
2018-02-27 9:58 ` [PATCH v2 " Nguyễn Thái Ngọc Duy
6 siblings, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-26 20:46 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy
Cc: Brandon Williams, git, Junio C Hamano, Jonathan Tan, Eric Sunshine
On Mon, Feb 26, 2018 at 2:30 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> It turns out I don't need my other series [1] in order to delete this
> field. This series moves getenv() calls from
> repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back in
> environment.c where they belong in my opinion.
>
> The repo_set_gitdir() now takes $GIT_DIR and optionally all other
> configurable paths. If those paths are NULL, default repo layout will
> be used. With getenv() no longer called inside repo_set_gitdir(),
> ignore_env has no reason to stay. This is in 1/4.
>
> The getenv() in prepare_alt_odb() is also moved back to
> setup_git_env() in 3/4. It demonstrates how we could move other
> getenv() back to if we want.
>
> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.
>
Thanks for working on this,
I found this series a pleasant read, the only issue I saw was Erics upbringing
of multiple getenv calls without strdup()ing the content.
What is the plan from here on?
Should I build further series on top of yours? The next series will
focus on the pack side of things (pack.h, packfile.{c,h})
So maybe we'll have Junio merge down my series (and yours as it
needs one reroll?) and then build on top of that?
Thanks,
Stefan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv4 00/27] Moving global state into the repository object (part 1)
2018-02-24 15:00 ` [PATCHv4 00/27] Moving global state into the repository object (part 1) Duy Nguyen
@ 2018-02-26 20:50 ` Stefan Beller
2018-02-27 0:02 ` Duy Nguyen0 siblings, 1 reply; 239+ messages in thread
From: Stefan Beller @ 2018-02-26 20:50 UTC (permalink / raw)
To: Duy Nguyen
Cc: Git Mailing List, Junio C Hamano, Jonathan Tan, Eric Sunshine,
Brandon Williams
On Sat, Feb 24, 2018 at 7:00 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> I notice that there are a few global state (or configuration rather)
> left after this: packed_git_window_size, packed_git_limit and
> delta_base_cache_limit. These can be changed by $GIT_DIR/config, but
> since it's still global, a submodule repository will use the same
> settings of its super repository. If $SUBMODULE/config changes any of
> these, they are ignored.
That sounds all packing related, which I plan on working on next.
> The natural thing to do is move these to raw_object_store too (and
> repo_submodule_init needs to check submodule's config file). But one
> may argue these should be per-process instead of per-repo though. I
> don't know. But I thought I should mention this.
For now a process and a repository is the same as git-gc or git-repack
doesn't know about the --recurse-submodules flag, yet.
I wonder if we ever want to teach those commands the submodule
recursion, because of the issue you bring up, which settings do we apply
for a submodule? Traditionally we'd just have the command line override
the configuration, which I don't know if it is a good idea for these settings.
Thanks,
Stefan
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 08/27] pack: move approximate object count to object store
2018-02-26 8:55 ` Jeff King@ 2018-02-26 20:57 ` Stefan Beller0 siblings, 0 replies; 239+ messages in thread
From: Stefan Beller @ 2018-02-26 20:57 UTC (permalink / raw)
To: Jeff King
Cc: Brandon Williams, git, Junio C Hamano, Jonathan Tan, Duy Nguyen,
Eric Sunshine, Jonathan Nieder
On Mon, Feb 26, 2018 at 12:55 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Feb 23, 2018 at 02:22:14PM -0800, Stefan Beller wrote:
>
>> >> + /*
>> >> + * A fast, rough count of the number of objects in the repository.
>> >> + * These two fields are not meant for direct access. Use
>> >> + * approximate_object_count() instead.
>> >> + */
>> >> + unsigned long approximate_object_count;
>> >> + unsigned approximate_object_count_valid : 1;
>> >
>> > Patch looks fine and is effectively a no-op, though what is the need for
>> > both of these variables? Maybe it can be simplified down to just use
>> > one? Just musing as its out of the scope of this patch and we probably
>> > shouldn't try to change that in this series.
>>
>> I agree we should. It was introduced in e323de4ad7f (sha1_file:
>> allow sha1_loose_object_info to handle arbitrary repositories, 2017-08-30)
>> and I think it was seen as a clever optimization trick back then?
>
> I think you meant 8e3f52d778 (find_unique_abbrev: move logic out of
> get_short_sha1(), 2016-10-03)?
Yes, copy paste error.
>
> Yes, it was just to avoid the dual-meaning of "0" for "not set" and a
> repository with no packfiles. It would probably be fine to get rid of
> it. If you have no packfiles then you probably don't have enough objects
> to worry about micro-optimizing. And anyway, the "wasted" case wouldn't
> even make any syscalls (it would do a noop prepare_packed_git() and then
> realize the packed_git list is empty).
Good point!
>
> -Peff
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
2018-02-24 3:44 ` Duy Nguyen@ 2018-02-26 23:09 ` Junio C Hamano
2018-03-03 1:51 ` Duy Nguyen0 siblings, 1 reply; 239+ messages in thread
From: Junio C Hamano @ 2018-02-26 23:09 UTC (permalink / raw)
To: Duy Nguyen
Cc: brian m. carlson, Git Mailing List, Brandon Williams,
Stefan Beller, Eric Sunshine, Jonathan Tan
Duy Nguyen <pclouds@gmail.com> writes:
> Yes that's the intention. But after writing cover letter for v2 and
> sending it out, it looks to me that this thing must stay until all our
> code is converted to using the_hash_algo (I don't know if there are
> more to convert or it's finished already). So an alternative is we do
> the opposite: default to GIT_HASH_SHA1, but when an env variable is
> set, reset it back to NULL. This env variable will _always_ be set by
> the test suite to help us catch problems.
If we were to do the "do not blindly initialize and always
initialize explicitly for debugging" (which I doubt is a good
direction to go in), then what you outlined above certainly is a
less evil way to do so than what the patch under discussion did, I
would think.
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCHv4 00/27] Moving global state into the repository object (part 1)
2018-02-26 20:50 ` Stefan Beller@ 2018-02-27 0:02 ` Duy Nguyen0 siblings, 0 replies; 239+ messages in thread
From: Duy Nguyen @ 2018-02-27 0:02 UTC (permalink / raw)
To: Stefan Beller
Cc: Git Mailing List, Junio C Hamano, Jonathan Tan, Eric Sunshine,
Brandon Williams
On Tue, Feb 27, 2018 at 3:50 AM, Stefan Beller <sbeller@google.com> wrote:
>> The natural thing to do is move these to raw_object_store too (and
>> repo_submodule_init needs to check submodule's config file). But one
>> may argue these should be per-process instead of per-repo though. I
>> don't know. But I thought I should mention this.
>
> For now a process and a repository is the same as git-gc or git-repack
I think you're thinking about the pack writing part, but these are
configuration for pack reading (aka "object store"). If you read a
blob from a submodule (e.g. git-grep), you'll use these configurations
at some point.
There are of course more configuration for pack writing (like
zlib_compression_level) which I deliberately avoided to mention
because I don't even know where they belong.
> doesn't know about the --recurse-submodules flag, yet.
> I wonder if we ever want to teach those commands the submodule
> recursion, because of the issue you bring up, which settings do we apply
> for a submodule? Traditionally we'd just have the command line override
> the configuration, which I don't know if it is a good idea for these settings.
--
Duy
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 0/4] Delete ignore_env member in struct repository
2018-02-26 20:46 ` Stefan Beller@ 2018-02-27 0:18 ` Duy Nguyen0 siblings, 0 replies; 239+ messages in thread
From: Duy Nguyen @ 2018-02-27 0:18 UTC (permalink / raw)
To: Stefan Beller
Cc: Brandon Williams, git, Junio C Hamano, Jonathan Tan, Eric Sunshine
On Tue, Feb 27, 2018 at 3:46 AM, Stefan Beller <sbeller@google.com> wrote:
> What is the plan from here on?
> Should I build further series on top of yours? The next series will
> focus on the pack side of things (pack.h, packfile.{c,h})
>
> So maybe we'll have Junio merge down my series (and yours as it
> needs one reroll?) and then build on top of that?
I think that's the less painful way for everybody.
--
Duy
^permalinkrawreply [flat|nested] 239+ messages in thread

*Re: [PATCH 1/4] repository.c: move env-related setup code back to environment.c
2018-02-26 20:30 ` Stefan Beller@ 2018-02-27 0:28 ` Duy Nguyen0 siblings, 0 replies; 239+ messages in thread
From: Duy Nguyen @ 2018-02-27 0:28 UTC (permalink / raw)
To: Stefan Beller
Cc: Brandon Williams, git, Junio C Hamano, Jonathan Tan, Eric Sunshine
On Tue, Feb 27, 2018 at 3:30 AM, Stefan Beller <sbeller@google.com> wrote:
>> diff --git a/setup.c b/setup.c
>> index c5d55dcee4..6fac1bb58a 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -1116,8 +1116,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>> const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
>> if (!gitdir)
>> gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
>> - repo_set_gitdir(the_repository, gitdir);
>> - setup_git_env();
>> + setup_git_env(gitdir);
>> }
>
> What makes git_dir special, such that we have to pass it in here?
> Could the check for getenv(GIT_DIR_ENVIRONMENT) and fallback to
> DEFAULT_... be part of setup_git_env() ?
> Oh I guess that cannot be done easily as the submodule code
> spcifically doesn't want to discover the git dir itself.
No, submodule code already does not care about setup_git_env().
Back to your first question, this is related to the "NEEDSWORK"
comment in this code. We _should_ always set_git_dir() before leaving
setup_git_directory_gently() then everybody behaves the same way and
we don't need this special setup_git_env() here at all. BUT we need to
be careful about setting environment variable $GIT_DIR, done inside
set_git_dir(), because sub-processes will inherit it and if we're not
careful, we'll break .git discovery in those. There's another thing I
don't like about this is setup_work_tree() using set_git_dir() the
second time to adjust relative $GIT_DIR and friends when it moves
$CWD. I'll need to fix this, soon.
So in short, it's special because the current situation is messy and
(hopefully) temporary. But it should eventually be gone.
--
Duy
^permalinkrawreply [flat|nested] 239+ messages in thread