*Re: [PATCH v1] read-cache: speed up index load through parallelization
From: Duy Nguyen @ 2018-08-24 18:40 UTC (permalink / raw)
To: Stefan Beller; +Cc: Ben Peart, Git Mailing List, Junio C Hamano
On Thu, Aug 23, 2018 at 7:33 PM Stefan Beller <sbeller@google.com> wrote:
> > +core.fastIndex::
> > + Enable parallel index loading
> > ++
> > +This can speed up operations like 'git diff' and 'git status' especially
> > +when the index is very large. When enabled, Git will do the index
> > +loading from the on disk format to the in-memory format in parallel.
> > +Defaults to true.
>
> "fast" is a non-descriptive word as we try to be fast in any operation?
> Maybe core.parallelIndexReading as that just describes what it
> turns on/off, without second guessing its effects?
Another option is index.threads (the "index" section currently only
has one item, index.version). The value could be the same as
grep.threads or pack.threads.
(and if you're thinking about parallelizing write as well but it
should be tuned differently, then perhaps index.readThreads, but I
don't think we need to go that far)
--
Duy

*Re: [PATCH v1] read-cache: speed up index load through parallelization
From: Ben Peart @ 2018-08-24 18:40 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Junio C Hamano, Ben Peart, Git Mailing List
On 8/24/2018 2:20 PM, Duy Nguyen wrote:
> On Fri, Aug 24, 2018 at 5:37 PM Duy Nguyen <pclouds@gmail.com> wrote:
>> On Thu, Aug 23, 2018 at 10:36 PM Ben Peart <peartben@gmail.com> wrote:
>>>> Nice to see this done without a new index extension that records
>>>> offsets, so that we can load existing index files in parallel.
>>>>
>>>
>>> Yes, I prefer this simpler model as well. I wasn't sure it would
>>> produce a significant improvement given the primary thread still has to
>>> run through the variable length cache entries but was pleasantly surprised.
>>
>> Out of curiosity, how much time saving could we gain by recording
>> offsets as an extension (I assume we need, like 4 offsets if the
>> system has 4 cores)? Much much more than this simpler model (which may
>> justify the complexity) or just "meh" compared to this?
>
> To answer my own question, I ran a patched git to precalculate
> individual thread parameters, removed the scheduler code and hard
> coded these parameters (I ran just 4 threads, one per core). I got
> 0m2.949s (webkit.git, 275k files, 100 read-cache runs). Compared to
> 0m4.996s from Ben's patch (same test settings of course) I think it's
> definitely worth adding some extra complexity.
>
I took a run at doing that last year [1] but that was before the
mem_pool work that allowed us to avoid the thread contention on the heap
so the numbers aren't an apples to apples comparison (they would be
better today).
The trade-off is the additional complexity to be able to load the index
extension without having to parse through all the variable length cache
entries. My patch worked but there was feedback requested to make it
more generic and robust that I haven't gotten around to yet.
This patch series went for simplicity over absolutely the best possible
performance.
[1]
https://public-inbox.org/git/20171109141737.47976-1-benpeart@microsoft.com/

*Re: [PATCH v1] read-cache: speed up index load through parallelization
From: Duy Nguyen @ 2018-08-24 19:00 UTC (permalink / raw)
To: Ben Peart; +Cc: Junio C Hamano, Ben Peart, Git Mailing List
On Fri, Aug 24, 2018 at 8:40 PM Ben Peart <peartben@gmail.com> wrote:
>
>
>
> On 8/24/2018 2:20 PM, Duy Nguyen wrote:
> > On Fri, Aug 24, 2018 at 5:37 PM Duy Nguyen <pclouds@gmail.com> wrote:
> >> On Thu, Aug 23, 2018 at 10:36 PM Ben Peart <peartben@gmail.com> wrote:
> >>>> Nice to see this done without a new index extension that records
> >>>> offsets, so that we can load existing index files in parallel.
> >>>>
> >>>
> >>> Yes, I prefer this simpler model as well. I wasn't sure it would
> >>> produce a significant improvement given the primary thread still has to
> >>> run through the variable length cache entries but was pleasantly surprised.
> >>
> >> Out of curiosity, how much time saving could we gain by recording
> >> offsets as an extension (I assume we need, like 4 offsets if the
> >> system has 4 cores)? Much much more than this simpler model (which may
> >> justify the complexity) or just "meh" compared to this?
> >
> > To answer my own question, I ran a patched git to precalculate
> > individual thread parameters, removed the scheduler code and hard
> > coded these parameters (I ran just 4 threads, one per core). I got
> > 0m2.949s (webkit.git, 275k files, 100 read-cache runs). Compared to
> > 0m4.996s from Ben's patch (same test settings of course) I think it's
> > definitely worth adding some extra complexity.
> >
>
> I took a run at doing that last year [1] but that was before the
> mem_pool work that allowed us to avoid the thread contention on the heap
> so the numbers aren't an apples to apples comparison (they would be
> better today).
Ah.. sorry I was not aware. A big chunk of 2017 is blank to me when it
comes to git.
> The trade-off is the additional complexity to be able to load the index
> extension without having to parse through all the variable length cache
> entries. My patch worked but there was feedback requested to make it
> more generic and robust that I haven't gotten around to yet.
One more comment. Instead of forcing this special index at the bottom,
add a generic one that gives positions of all extensions and put that
one at the bottom. Then you can still quickly locate your offset table
extension, and you could load UNTR and TREE extensions in parallel too
(those scale up to worktree size)
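(For concreteness, such a table could look roughly like the sketch below; the
layout is purely illustrative and is not an existing index extension:)

    #include <stdint.h>

    /*
     * Hypothetical last-extension "table of contents": written after all
     * other extensions so a reader can find it by scanning back from the
     * trailing checksum, then jump straight to TREE, UNTR, etc. without
     * first parsing the variable-length cache entries.
     */
    struct ext_toc_entry {
            char sig[4];      /* extension signature, e.g. "TREE" */
            uint32_t offset;  /* offset of that extension from start of file */
    };

    struct ext_toc {
            uint32_t nr;                  /* number of entries that follow */
            struct ext_toc_entry ent[];   /* one per extension, in file order */
    };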
> This patch series went for simplicity over absolutely the best possible
> performance.
Well, you know my stance on this now :) Not that it really matters.
> [1]
> https://public-inbox.org/git/20171109141737.47976-1-benpeart@microsoft.com/
PS. I still think it's worth bringing v4's performance back to v2. It's
low hanging fruit because I'm pretty sure Junio did not add v4 code
with cpu performance in mind. It was about file size at that time and
cpu consumption was still dwarfed by hashing.
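(For readers who have not looked at v4: entry path names are prefix-compressed
against the previous entry, roughly as in the sketch below, which is also why a
worker thread cannot start decoding at an arbitrary entry without some kind of
reset point. The helper here is illustrative, not the actual read-cache.c code.)

    #include <string.h>

    /*
     * Sketch of v4 name expansion: each on-disk entry stores how many
     * bytes to strip from the end of the previous entry's path (as a
     * variable-width integer on disk) followed by the new suffix,
     * NUL-terminated.
     */
    static void expand_v4_name(char *prev_path, size_t strip, const char *suffix)
    {
            size_t len = strlen(prev_path);
            /* real code must verify strip <= len before doing this */
            strcpy(prev_path + (len - strip), suffix);
    }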
--
Duy

*Re: [PATCH v1] read-cache: speed up index load through parallelization
From: Ben Peart @ 2018-08-24 19:57 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Junio C Hamano, Ben Peart, Git Mailing List
On 8/24/2018 3:00 PM, Duy Nguyen wrote:
> On Fri, Aug 24, 2018 at 8:40 PM Ben Peart <peartben@gmail.com> wrote:
>>
>>
>>
>> On 8/24/2018 2:20 PM, Duy Nguyen wrote:
>>> On Fri, Aug 24, 2018 at 5:37 PM Duy Nguyen <pclouds@gmail.com> wrote:
>>>> On Thu, Aug 23, 2018 at 10:36 PM Ben Peart <peartben@gmail.com> wrote:
>>>>>> Nice to see this done without a new index extension that records
>>>>>> offsets, so that we can load existing index files in parallel.
>>>>>>
>>>>>
>>>>> Yes, I prefer this simpler model as well. I wasn't sure it would
>>>>> produce a significant improvement given the primary thread still has to
>>>>> run through the variable length cache entries but was pleasantly surprised.
>>>>
>>>> Out of curiosity, how much time saving could we gain by recording
>>>> offsets as an extension (I assume we need, like 4 offsets if the
>>>> system has 4 cores)? Much much more than this simpler model (which may
>>>> justify the complexity) or just "meh" compared to this?
>>>
>>> To answer my own question, I ran a patched git to precalculate
>>> individual thread parameters, removed the scheduler code and hard
>>> coded these parameters (I ran just 4 threads, one per core). I got
>>> 0m2.949s (webkit.git, 275k files, 100 read-cache runs). Compared to
>>> 0m4.996s from Ben's patch (same test settings of course) I think it's
>>> definitely worth adding some extra complexity.
>>>
>>
>> I took a run at doing that last year [1] but that was before the
>> mem_pool work that allowed us to avoid the thread contention on the heap
>> so the numbers aren't an apples to apples comparison (they would be
>> better today).
>
> Ah.. sorry I was not aware. A big chunk of 2017 is blank to me when it
> comes to git.
>
>> The trade-off is the additional complexity to be able to load the index
>> extension without having to parse through all the variable length cache
>> entries. My patch worked but there was feedback requested to make it
>> more generic and robust that I haven't gotten around to yet.
>
> One more comment. Instead of forcing this special index at the bottom,
> add a generic one that gives positions of all extensions and put that
> one at the bottom. Then you can still quickly locate your offset table
> extension, and you could load UNTR and TREE extensions in parallel too
> (those scale up to worktree size)
>
That is pretty much what Junio's feedback was and what I was referring
to as making it "more generic." The "more robust" was the request to
add a SHA to the extension to ensure it wasn't corrupt and was a valid
extension.
>> This patch series went for simplicity over absolutely the best possible
>> performance.
>
> Well, you know my stance on this now :) Not that it really matters.
>
>> [1]
>> https://public-inbox.org/git/20171109141737.47976-1-benpeart@microsoft.com/
>
> PS. I still think it's worth bringing v4's performance back to v2. It's
> low hanging fruit because I'm pretty sure Junio did not add v4 code
> with cpu performance in mind. It was about file size at that time and
> cpu consumption was still dwarfed by hashing.
>
I see that as a nice follow up patch. If the extension exists, use it
and jump directly to the blocks and spin up threads. If it doesn't
exist, fall back to the code in this patch that has to find/compute the
blocks on the fly.

*Re: [PATCH v1] read-cache: speed up index load through parallelization
From: Ben Peart @ 2018-08-28 14:53 UTC (permalink / raw)
To: Duy Nguyen, Stefan Beller; +Cc: Ben Peart, Git Mailing List, Junio C Hamano
On 8/24/2018 2:40 PM, Duy Nguyen wrote:
> On Thu, Aug 23, 2018 at 7:33 PM Stefan Beller <sbeller@google.com> wrote:
>>> +core.fastIndex::
>>> + Enable parallel index loading
>>> ++
>>> +This can speed up operations like 'git diff' and 'git status' especially
>>> +when the index is very large. When enabled, Git will do the index
>>> +loading from the on disk format to the in-memory format in parallel.
>>> +Defaults to true.
>> "fast" is a non-descriptive word as we try to be fast in any operation?
>> Maybe core.parallelIndexReading as that just describes what it
>> turns on/off, without second guessing its effects?
> Another option is index.threads (the "index" section currently only
> has one item, index.version). The value could be the same as
> grep.threads or pack.threads.
>
> (and if you're thinking about parallelizing write as well but it
> should be tuned differently, then perhaps index.readThreads, but I
> don't think we need to go that far)
I like that. I'll switch to index.threads and make 'true' or '0' mean
"automatically determine the number of threads to use" similar to
pack.threads.
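(A rough sketch of how that value might be interpreted; online_cpus() is the
existing helper from thread-utils.h, the rest of the names are illustrative:)

    /*
     * Sketch: map the index.threads value to a thread count.  'true' and
     * 'false' would already have been translated to 0 and 1 respectively
     * by the config reader before this point.
     */
    static int index_nr_threads(int configured)
    {
            if (configured == 0)
                    return online_cpus();  /* auto-detect: one thread per CPU */
            return configured;             /* 1 disables threading; N uses N threads */
    }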

*Re: [PATCH v2 2/3] read-cache: load cache extensions on worker thread
From: Ben Peart @ 2018-08-29 21:42 UTC (permalink / raw)
To: Junio C Hamano, Ben Peart; +Cc: git, pclouds
On 8/29/2018 1:12 PM, Junio C Hamano wrote:
> Ben Peart <Ben.Peart@microsoft.com> writes:
>
>> This is possible because the current extensions don't access the cache
>> entries in the index_state structure so are OK that they don't all exist
>> yet.
>>
>> The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
>> extensions don't even get a pointer to the index so don't have access to the
>> cache entries.
>>
>> CACHE_EXT_LINK only uses the index_state to initialize the split index.
>> CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
>> update and dirty flags.
>
> Good to see such an analysis here. Once we define an extension
> section, which requires us to have the cache entries before
populating it, this scheme would fall down, of course, but the
> extension mechanism is all about protecting ourselves from the
> future changes, so we'd at least need a good feel for how we read an
> unknown extension from the future with the current code. Perhaps
> just like the main cache entries were pre-scanned to apportion them
> to worker threads, we can pre-scan the sections and compare them
> with a white-list built into our binary before deciding that it is
> safe to read them in parallel (and otherwise, we ask the last thread
> for reading extensions to wait until the workers that read the main
> index all return)?
>
Yes, when we add a new extension that requires the cache entries to
exist and be parsed, we will need to add a mechanism to ensure that
happens for that extension. I agree a white list is probably the right
way to deal with it. Until we have that need, it would just add
unnecessary complexity so I think we should wait till it is actually needed.
There isn't any change in behavior with unknown extensions and this
patch. If an unknown extension exists it will just get ignored and
reported as an "unknown extension" or "die" if it is marked as "required."
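(The convention at work here: an extension whose four-byte signature starts
with an uppercase letter is optional and may be skipped; anything else is
mandatory. A simplified sketch of the dispatch, not the exact read-cache.c
code:)

    /* simplified sketch of the optional-vs-mandatory extension rule */
    static int handle_extension(struct index_state *istate, const char *ext,
                                const char *data, unsigned long size)
    {
            switch (CACHE_EXT(ext)) {
            case CACHE_EXT_TREE:
            case CACHE_EXT_RESOLVE_UNDO:
            case CACHE_EXT_UNTRACKED:
                    /* ... known extensions are parsed here ... */
                    return 0;
            default:
                    if (*ext < 'A' || 'Z' < *ext)
                            return error("index uses %.4s extension, which we do not understand", ext);
                    fprintf(stderr, "ignoring %.4s extension\n", ext);
                    return 0;
            }
    }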
I'll fix the rest of your suggestions - thanks for the close review.

*Re: [PATCH v3 0/4] read-cache: speed up index load through parallelization
From: Junio C Hamano @ 2018-09-07 17:21 UTC (permalink / raw)
To: Ben Peart; +Cc: git, pclouds, Ben Peart
Ben Peart <benpeart@microsoft.com> writes:
> On further investigation with the previous patch, I noticed that my test
> repos didn't contain the cache tree extension in their index. After doing a
> commit to ensure they existed, I realized that in some instances, the time
> to load the cache tree exceeded the time to load all the cache entries in
> parallel. Because the thread to read the cache tree was started last (due
> to having to parse through all the cache entries first) we weren't always
> getting optimal performance.
>
> To better optimize for this case, I decided to write the EOIE extension
> as suggested by Junio [1] in response to my earlier multithreading patch
> series [2]. This enables me to spin up the thread to load the extensions
> earlier as it no longer has to parse through all the cache entries first.
Hmph. I kinda liked the simplicity of the previous one, but if we
need to start reading the extension sections sooner by eliminating
the overhead to scan the cache entries, perhaps we should bite the
bullet now.
> The big changes in this iteration are:
>
> - add the EOIE extension
> - update the index extension worker thread to start first
I guess I'd need to see the actual patch to find this out, but once
we rely on a new extension, then we could omit scanning the main
index even to partition the work among workers (i.e. like the topic
long ago, you can have list of pointers into the main index to help
partitioning, plus reset the prefix compression used in v4). I
think you didn't get that far in this round, which is good. If the
gain with EOIE alone (and starting the worker for the extension
section early) is large enough without such a pre-computed work
partition table, the simplicity of this round may give us a good
stopping point.
> This patch conflicts with Duy's patch to remove the double memory copy and
> pass in the previous ce instead. The two will need to be merged/reconciled
> once they settle down a bit.
Thanks. I have a feeling that 67922abb ("read-cache.c: optimize
reading index format v4", 2018-09-02) is already 'next'-worthy
and ready to be built on, but I'd prefer to hear from Duy to double
check.

*Re: [PATCH v3 0/4] read-cache: speed up index load through parallelization
From: Ben Peart @ 2018-09-07 18:31 UTC (permalink / raw)
To: Junio C Hamano, Ben Peart; +Cc: git, pclouds, Ben Peart
On 9/7/2018 1:21 PM, Junio C Hamano wrote:
> Ben Peart <benpeart@microsoft.com> writes:
>
>> On further investigation with the previous patch, I noticed that my test
>> repos didn't contain the cache tree extension in their index. After doing a
>> commit to ensure they existed, I realized that in some instances, the time
>> to load the cache tree exceeded the time to load all the cache entries in
>> parallel. Because the thread to read the cache tree was started last (due
>> to having to parse through all the cache entries first) we weren't always
>> getting optimal performance.
>>
>> To better optimize for this case, I decided to write the EOIE extension
>> as suggested by Junio [1] in response to my earlier multithreading patch
>> series [2]. This enables me to spin up the thread to load the extensions
>> earlier as it no longer has to parse through all the cache entries first.
>
> Hmph. I kinda liked the simplicity of the previous one, but if we
> need to start reading the extension sections sooner by eliminating
> the overhead to scan the cache entries, perhaps we should bite the
> bullet now.
>
I preferred the simplicity as well but when I was profiling the code and
found out that loading the extensions was most often the last thread to
complete, I took this intermediate step to speed things up.
>> The big changes in this iteration are:
>>
>> - add the EOIE extension
>> - update the index extension worker thread to start first
>
> I guess I'd need to see the actual patch to find this out, but once
> we rely on a new extension, then we could omit scanning the main
> index even to partition the work among workers (i.e. like the topic
> long ago, you can have list of pointers into the main index to help
> partitioning, plus reset the prefix compression used in v4). I
> think you didn't get that far in this round, which is good. If the
> gain with EOIE alone (and starting the worker for the extension
> section early) is large enough without such a pre-computed work
> partition table, the simplicity of this round may give us a good
> stopping point.
>
Agreed. I didn't go that far in this series as it doesn't appear to be
necessary. We could always add that later if it turned out to be worth
the additional complexity.
>> This patch conflicts with Duy's patch to remove the double memory copy and
>> pass in the previous ce instead. The two will need to be merged/reconciled
>> once they settle down a bit.
>
> Thanks. I have a feeling that 67922abb ("read-cache.c: optimize
> reading index format v4", 2018-09-02) is already 'next'-worthy
> and ready to be built on, but I'd prefer to hear from Duy to double
> check.
>
I'll take a closer look at what this will entail. It gets more
complicated as we don't actually have a previous expanded cache entry
when starting each thread. I'll see how complex it makes the code and
how much additional performance it gives.

*Re: [PATCH v3 2/4] eoie: add End of Index Entry (EOIE) extension
From: Ben Peart @ 2018-09-07 20:23 UTC (permalink / raw)
To: Junio C Hamano, Ben Peart; +Cc: git, pclouds, Ben Peart
On 9/7/2018 1:55 PM, Junio C Hamano wrote:
> Ben Peart <benpeart@microsoft.com> writes:
>
>> The extension consists of:
>>
>> - 32-bit offset to the end of the index entries
>>
>> - 160-bit SHA-1 over the extension types and their sizes (but not
>> their contents). E.g. if we have "TREE" extension that is N-bytes
>> long, "REUC" extension that is M-bytes long, followed by "EOIE",
>> then the hash would be:
>>
>> SHA-1("TREE" + <binary representation of N> +
>> "REUC" + <binary representation of M>)
>
> I didn't look at the documentation patch in the larger context, but
> please make sure that it is clear to the readers that these fixed
> width integers "binary representations" use network byte order.
>
At the top of the documentation it says "All binary numbers are in
network byte order" and that is not repeated for any of the other
sections that are documenting the file format.
> I briefly wondered if the above should include
>
> + "EOIE" + <binary representation of (32+160)/8 = 24>
>
> as it is pretty much common file format design to include the header
> part of the checksum record (with checksum values padded out with NUL
> bytes) when you define a record to hold the checksum of the entire
> file. Since this does not protect the contents of each section from
> bit-flipping, adding the data for EOIE itself in the sum does not
> give us much (iow, what I am adding above is a constant that does
> not contribute any entropy).
>
> How big is a typical TREE extension in _your_ work repository
> housing Windows sources? I am guessing that replacing SHA-1 with
> something faster (as this is not about security but is about disk
> corruption) and instead hash also the contents of these sections
> would NOT help all that much in the performance department, as
> having to page them in to read through would already consume
> non-trivial amount of time, and that is why you are not hashing the
> contents.
>
The purpose of the SHA isn't to detect disk corruption (we already have
a SHA for the entire index that can serve that purpose) but to help
ensure that this was actually a valid EOIE extension and not a lucky
random set of bytes. I had used leading and trailing signature bytes
along with the length and version bytes to validate it was an actual
EOIE extension but you suggested [1] that I use a SHA of the 4 byte
extension type + 4 byte extension length instead so I rewrote it that way.
[1]
https://public-inbox.org/git/xmqq1sl017dw.fsf@gitster.mtv.corp.google.com/
>> + /*
>> + * CACHE_EXT_ENDOFINDEXENTRIES must be written as the last entry before the SHA1
>
> s/SHA1/trailing checksum/ or something so that we can withstand
> NewHash world order?
>
I thought about this but in the document elsewhere it refers to it as
"160-bit SHA-1 over the content of the index file before this checksum."
and there are at least a dozen other references to "SHA-1" so I figured
we can fix them all at the same time when we have a new/better name. :-)
>> + * so that it can be found and processed before all the index entries are
>> + * read.
>> + */
>> + if (!strip_extensions && offset && !git_env_bool("GIT_TEST_DISABLE_EOIE", 0)) {
>> + struct strbuf sb = STRBUF_INIT;
>> +
>> + write_eoie_extension(&sb, &eoie_c, offset);
>> + err = write_index_ext_header(&c, NULL, newfd, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0
>> || ce_write(&c, newfd, sb.buf, sb.len) < 0;
>> strbuf_release(&sb);
>> if (err)
>
> OK.
>
>> +#define EOIE_SIZE 24 /* <4-byte offset> + <20-byte hash> */
>> +#define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <4-byte signature> + <4-byte length> + EOIE_SIZE */
>> +
>> +#ifndef NO_PTHREADS
>> +static unsigned long read_eoie_extension(void *mmap, size_t mmap_size)
>> +{
>> + /*
>> + * The end of index entries (EOIE) extension is guaranteed to be last
>> + * so that it can be found by scanning backwards from the EOF.
>> + *
>> + * "EOIE"
>> + * <4-byte length>
>> + * <4-byte offset>
>> + * <20-byte hash>
>> + */
>> + const char *index, *eoie = (const char *)mmap + mmap_size - GIT_SHA1_RAWSZ - EOIE_SIZE_WITH_HEADER;
>> + uint32_t extsize;
>> + unsigned long offset, src_offset;
>> + unsigned char hash[GIT_MAX_RAWSZ];
>> + git_hash_ctx c;
>> +
>> + /* validate the extension signature */
>> + index = eoie;
>> + if (CACHE_EXT(index) != CACHE_EXT_ENDOFINDEXENTRIES)
>> + return 0;
>> + index += sizeof(uint32_t);
>> +
>> + /* validate the extension size */
>> + extsize = get_be32(index);
>> + if (extsize != EOIE_SIZE)
>> + return 0;
>> + index += sizeof(uint32_t);
>
> Do we know we have at least 8-byte to consume to perform the above
> two checks, or is that something we need to verify at the beginning
> of this function? Better yet, as we know that a correct EOIE with
> its own header is 28-byte long, we probably should abort if
> mmap_size is smaller than that.
>
I'll add that additional test.
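(Something along these lines, reusing the constants already defined in the
patch, should be enough:)

    /* refuse to look for EOIE in an index too small to contain the
     * header, the EOIE extension itself and the trailing hash */
    if (mmap_size < sizeof(struct cache_header) + EOIE_SIZE_WITH_HEADER + the_hash_algo->rawsz)
            return 0;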
>> + /*
>> + * Validate the offset we're going to look for the first extension
>> + * signature is after the index header and before the eoie extension.
>> + */
>> + offset = get_be32(index);
>> + if ((const char *)mmap + offset < (const char *)mmap + sizeof(struct cache_header))
>> + return 0;
>
> Claims that the end is before the beginning, which we reject as bogus. Good.
>
>> + if ((const char *)mmap + offset >= eoie)
>> + return 0;
>
> Claims that the end is beyond the EOIE marker we should have placed
> after it, which we reject as bogus. Good.
>
>> + index += sizeof(uint32_t);
>> +
>> + /*
>> + * The hash is computed over extension types and their sizes (but not
>> + * their contents). E.g. if we have "TREE" extension that is N-bytes
>> + * long, "REUC" extension that is M-bytes long, followed by "EOIE",
>> + * then the hash would be:
>> + *
>> + * SHA-1("TREE" + <binary representation of N> +
>> + * "REUC" + <binary representation of M>)
>> + */
>> + src_offset = offset;
>> + the_hash_algo->init_fn(&c);
>> + while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {
>> + /* After an array of active_nr index entries,
> (Style nit).
>> + * there can be arbitrary number of extended
>> + * sections, each of which is prefixed with
>> + * extension name (4-byte) and section length
>> + * in 4-byte network byte order.
>> + */
>> + uint32_t extsize;
>> + memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
>> + extsize = ntohl(extsize);
>
> Earlier we were using get_be32() but now we use memcpy with ntohl()?
> How are we choosing which one to use?
>
I literally copy/pasted this logic from the code that actually loads the
extensions then removed the call to load the extension and replaced it
with the call to update the hash. I kept it the same to facilitate
consistency for any future fixes or changes.
> I think you meant to cast mmap to (const char *) here. It may make it
> easier to write and read if we started this function like so:
>
> static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size)
> {
> const char *mmap = mmap_;
>
> then we do not have to keep casting mmap and cast to a wrong type by
> mistake.
>
Good suggestion.
>> +
>> + /* verify the extension size isn't so large it will wrap around */
>> + if (src_offset + 8 + extsize < src_offset)
>> + return 0;
>
> Good.
>
>> + the_hash_algo->update_fn(&c, (const char *)mmap + src_offset, 8);
>> +
>> + src_offset += 8;
>> + src_offset += extsize;
>> + }
>> + the_hash_algo->final_fn(hash, &c);
>> + if (hashcmp(hash, (unsigned char *)index))
>> + return 0;
>> +
>> + /* Validate that the extension offsets returned us back to the eoie extension. */
>> + if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
>> + return 0;
>
> Very good.
>
>> + return offset;
>> +}
>> +#endif
>
> Overall it looks like it is carefully done.
Thanks for the careful review!
> Thanks.
>

*Re: [PATCH v3 0/4] read-cache: speed up index load through parallelization
From: Duy Nguyen @ 2018-09-08 13:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ben Peart, Git Mailing List, Ben Peart
On Fri, Sep 7, 2018 at 7:21 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Ben Peart <benpeart@microsoft.com> writes:
>
> > On further investigation with the previous patch, I noticed that my test
> > repos didn't contain the cache tree extension in their index. After doing a
> > commit to ensure they existed, I realized that in some instances, the time
> > to load the cache tree exceeded the time to load all the cache entries in
> > parallel. Because the thread to read the cache tree was started last (due
> > to having to parse through all the cache entries first) we weren't always
> > getting optimal performance.
> >
> > To better optimize for this case, I decided to write the EOIE extension
> > as suggested by Junio [1] in response to my earlier multithreading patch
> > series [2]. This enables me to spin up the thread to load the extensions
> > earlier as it no longer has to parse through all the cache entries first.
>
> Hmph. I kinda liked the simplicity of the previous one, but if we
> need to start reading the extension sections sooner by eliminating
> the overhead to scan the cache entries, perhaps we should bite the
> bullet now.
My view is slightly different. If we have to optimize, we might as well
try to squeeze the best out of it. Simplicity is already out of the
window at this point (but maintainability remains).
> > The big changes in this iteration are:
> >
> > - add the EOIE extension
> > - update the index extension worker thread to start first
>
> I guess I'd need to see the actual patch to find this out, but once
> we rely on a new extension, then we could omit scanning the main
> index even to partition the work among workers (i.e. like the topic
> long ago, you can have list of pointers into the main index to help
> partitioning, plus reset the prefix compression used in v4). I
> think you didn't get that far in this round, which is good. If the
> gain with EOIE alone (and starting the worker for the extension
> section early) is large enough without such a pre-computed work
> partition table, the simplicity of this round may give us a good
> stopping point.
I suspect the reduced gain in 1M files case compared to 100k files in
4/4 is because of scanning the index to split work to worker threads.
Since the index is now larger, the scanning takes more time before we
can start worker threads and we gain less from parallelization. I have
not experimented to see if this is true or there is something else.
> > This patch conflicts with Duy's patch to remove the double memory copy and
> > pass in the previous ce instead. The two will need to be merged/reconciled
> > once they settle down a bit.
>
> Thanks. I have a feeling that 67922abb ("read-cache.c: optimize
> reading index format v4", 2018-09-02) is already 'next'-worthy
> and ready to be built on, but I'd prefer to hear from Duy to double
> check.
Yes I think it's good. I ran the entire test suite with v4 just to
double check (and thinking of testing v4 version in travis too).
--
Duy

*Re: [PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension
From: Duy Nguyen @ 2018-09-17 16:05 UTC (permalink / raw)
To: Ben Peart; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart
On Mon, Sep 17, 2018 at 4:55 PM Ben Peart <peartben@gmail.com> wrote:
> On 9/15/2018 6:02 AM, Duy Nguyen wrote:
>
> >> default:
> >> if (*ext < 'A' || 'Z' < *ext)
> >> return error("index uses %.4s extension, which we do not understand",
> >> @@ -1889,6 +1893,11 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
> >> return ondisk_size + entries * per_entry;
> >> }
> >>
> >> +#ifndef NO_PTHREADS
> >> +static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size);
> >> +#endif
> >
> > Keep functions unconditionally built as much as possible. I don't see
> > why this read_eoie_extension() must be built only on multithread
> > platforms.
> >
>
> This is conditional to avoid generating a warning on single threaded
> platforms where the function is currently unused. That seemed like a
> better choice than calling it and ignoring it on single threaded
> platforms just to avoid a compiler warning.
The third option is to ignore the compiler. I consider that warning a
helpful suggestion, not a strict rule.
Most devs don't run single thread builds (I think) so if this function
is updated in a way that breaks single thread mode, it can only be
found out when this function is used in single thread mode. At that
point the function may have changed a lot. If it's built
unconditionally, at least single thread users will yell much sooner
and we could fix it much earlier.
> >> @@ -2520,11 +2534,13 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> >> return err;
> >>
> >> /* Write extension data here */
> >> + offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
> >> + the_hash_algo->init_fn(&eoie_c);
> >
> > Don't write (or even calculate to write it) unless it's needed. Which
> > means only do this when parallel reading is enabled and the index size is
> > large enough, or when a test variable is set so you can force writing
> > this extension.
>
> I made the logic always write the extension based on the earlier
> discussion [1] where it was suggested this should have been part of the
> original index format for extensions from the beginning. This helps
> ensure it is available for current and future uses we haven't even
> discovered yet.
But it _is_ available now. If you need it, you write the extension
out. If we make this part of index version 5 (and make it not an
extension anymore) then I buy that argument. As it is, it's an
optional extension.
> [1] https://public-inbox.org/git/xmqqwp2s1h1x.fsf@gitster.mtv.corp.google.com/
>
>
> >> +
> >> +static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset)
> >
> > We normally just put function implementations before it's used to
> > avoid static forward declaration. Any special reason why it's not done
> > here?
> >
>
> This was done to promote readability of the (already large) read-cache.c
> file. I first considered moving the EOIE read/write functions into a
> separate file entirely but they need access to information only
> available within read-cache.c so I compromised and moved them to the end
> of the file instead.
I consider that grouping extension-related functions closer to
read_index_extension gives better readability, or at least better than
just putting new functions at the end in no particular order. But I
guess this is a personal view.
--
Duy

*Re: [PATCH v5 2/5] read-cache: load cache extensions on a worker thread
From: Ben Peart @ 2018-09-17 16:26 UTC (permalink / raw)
To: Duy Nguyen, Ben Peart; +Cc: Git Mailing List, Junio C Hamano, Ben Peart
On 9/15/2018 6:22 AM, Duy Nguyen wrote:
>> +index.threads::
>> + Specifies the number of threads to spawn when loading the index.
>> + This is meant to reduce index load time on multiprocessor machines.
>> + Specifying 0 or 'true' will cause Git to auto-detect the number of
>> + CPU's and set the number of threads accordingly. Defaults to 'true'.
>
> I'd rather this variable default to 0. Spawning threads has an
> associated cost and most projects out there are small enough that this
> multi threading could just add more cost than gain. It only makes
> sense to enable this on huge repos.
>
> Wait there's no way to disable this parallel reading? Does not sound
> right. And if ordinary numbers mean the number of threads then 0
> should mean no threading. Auto detection could have a new keyword,
> like 'auto'.
>
The index.threads setting is patterned after the pack.threads setting
for consistency. Specifying 1 (or 'false') will disable multithreading
but I will call that out explicitly in the documentation to make it more
obvious.
The THREAD_COST logic is designed to ensure small repos don't incur more
cost than gain. If you have data on that logic that shows it isn't
working properly, I'm happy to change the logic as necessary.
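(For reference, the gate in question is along these lines; the constant and
the exact comparison below are a sketch rather than the precise patch code:)

    #define THREAD_COST (10000)  /* illustrative: entries each extra thread must "pay for" */

    if (nr_threads > 1 && istate->cache_nr < THREAD_COST * nr_threads)
            nr_threads = 1;  /* small index: parse everything on the primary thread */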
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -23,6 +23,10 @@
>> #include "split-index.h"
>> #include "utf8.h"
>> #include "fsmonitor.h"
>> +#ifndef NO_PTHREADS
>> +#include <pthread.h>
>> +#include <thread-utils.h>
>> +#endif
>
> I don't think you're supposed to include system header files after
> "cache.h". Including thread-utils.h should be enough (and it keeps the
> exception of including pthread.h in just one place). Please use
> "thread-utils.h" instead of <thread-utils.h>, which is usually for
> system header files. And include thread-utils.h unconditionally.
>
Thanks, I'll fix that.
>>
>> /* Mask for the name length in ce_flags in the on-disk index */
>>
>> @@ -1898,6 +1902,46 @@ static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size);
>> #endif
>> static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset);
>>
>> +struct load_index_extensions
>> +{
>> +#ifndef NO_PTHREADS
>> + pthread_t pthread;
>> +#endif
>> + struct index_state *istate;
>> + void *mmap;
>> + size_t mmap_size;
>> + unsigned long src_offset;
>> +};
>> +
>> +static void *load_index_extensions(void *_data)
>> +{
>> + struct load_index_extensions *p = _data;
>> + unsigned long src_offset = p->src_offset;
>> +
>> + while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
>> + /* After an array of active_nr index entries,
>> + * there can be arbitrary number of extended
>> + * sections, each of which is prefixed with
>> + * extension name (4-byte) and section length
>> + * in 4-byte network byte order.
>> + */
>> + uint32_t extsize;
>> + memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
>> + extsize = ntohl(extsize);
>> + if (read_index_extension(p->istate,
>> + (const char *)p->mmap + src_offset,
>> + (char *)p->mmap + src_offset + 8,
>> + extsize) < 0) {
>> + munmap(p->mmap, p->mmap_size);
>> + die("index file corrupt");
>
> _()
>
Your feedback style can be a bit abrupt and terse. I _think_ what you
are trying to say here is that the "die" call should use the _() macro
around the string.
This is an edit of the previous code that loaded index extensions and
doesn't change the use of _(). I don't know the rules for when _()
should be used and didn't have any luck finding where it was documented
so left it unchanged.
FWIW, in this file alone there are 20 existing instances of die() or
die_errno() and only two that use the _() macro. A quick grep through
the source code shows thousands of die() calls, the vast majority of
which do not use the _() macro. This appears to be an area that is
unclear and inconsistent and could use some attention in a separate patch.
>> + /* if we created a thread, join it otherwise load the extensions on the primary thread */
>> +#ifndef NO_PTHREADS
>> + if (extension_offset && pthread_join(p.pthread, NULL))
>> + die(_("unable to join load_index_extensions_thread"));
>
> I guess the last _ is a typo and you wanted "unable to join
> load_index_extensions thread". Please use die_errno() instead.
>
Why should this be die_errno() here? All other instances of
pthread_join() failing in a fatal way use die(), not die_errno().
>> +#endif
>> + if (!extension_offset) {
>> + p.src_offset = src_offset;
>> + load_index_extensions(&p);
>> }
>> munmap(mmap, mmap_size);
>> return istate->cache_nr;
>> --
>> 2.18.0.windows.1
>>
>
>

*Re: [PATCH v5 2/5] read-cache: load cache extensions on a worker thread
From: Duy Nguyen @ 2018-09-17 16:45 UTC (permalink / raw)
To: Ben Peart; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart
On Mon, Sep 17, 2018 at 6:26 PM Ben Peart <peartben@gmail.com> wrote:
>
>
>
> On 9/15/2018 6:22 AM, Duy Nguyen wrote:
> >> +index.threads::
> >> + Specifies the number of threads to spawn when loading the index.
> >> + This is meant to reduce index load time on multiprocessor machines.
> >> + Specifying 0 or 'true' will cause Git to auto-detect the number of
> >> + CPU's and set the number of threads accordingly. Defaults to 'true'.
> >
> > I'd rather this variable default to 0. Spawning threads has an
> > associated cost and most projects out there are small enough that this
> > multi threading could just add more cost than gain. It only makes
> > sense to enable this on huge repos.
> >
> > Wait there's no way to disable this parallel reading? Does not sound
> > right. And if ordinary numbers mean the number of threads then 0
> > should mean no threading. Auto detection could have a new keyword,
> > like 'auto'.
> >
>
> The index.threads setting is patterned after the pack.threads setting
> for consistency. Specifying 1 (or 'false') will disable multithreading
> but I will call that out explicitly in the documentation to make it more
> obvious.
>
> The THREAD_COST logic is designed to ensure small repos don't incur more
> cost than gain. If you have data on that logic that shows it isn't
> working properly, I'm happy to change the logic as necessary.
THREAD_COST does not apply to this extension thread if I remember correctly.
> >> +static void *load_index_extensions(void *_data)
> >> +{
> >> + struct load_index_extensions *p = _data;
> >> + unsigned long src_offset = p->src_offset;
> >> +
> >> + while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
> >> + /* After an array of active_nr index entries,
> >> + * there can be arbitrary number of extended
> >> + * sections, each of which is prefixed with
> >> + * extension name (4-byte) and section length
> >> + * in 4-byte network byte order.
> >> + */
> >> + uint32_t extsize;
> >> + memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
> >> + extsize = ntohl(extsize);
> >> + if (read_index_extension(p->istate,
> >> + (const char *)p->mmap + src_offset,
> >> + (char *)p->mmap + src_offset + 8,
> >> + extsize) < 0) {
> >> + munmap(p->mmap, p->mmap_size);
> >> + die("index file corrupt");
> >
> > _()
> >
>
> You're feedback style can be a bit abrupt and terse. I _think_ what you
> are trying to say here is that the "die" call should use the _() macro
> around the string.
Yes. Sorry I should have explained a bit better.
> This is an edit of the previous code that loaded index extensions and
> doesn't change the use of _(). I don't know the rules for when _()
> should be used and didn't have any luck finding where it was documented
> so left it unchanged.
>
> FWIW, in this file alone there are 20 existing instances of die() or
> die_errorno() and only two that use the _() macro. A quick grep through
> the source code shows thousands of die() calls the vast majority of
> which do not use the _() macro. This appears to be an area that is
> unclear and inconsistent and could use some attention in a separate patch.
This is one of the gray areas where we have to determine if the
message should be translated or not. And it should be translated
unless it's part of the plumbing output, to be consumed by scripts.
I know there's lots of messages still untranslated. I'm trying to do
something about that. But I cannot just go fix up all strings when you
all keep adding more strings for me to go fix. When you add a new
string, please consider if it should be translated or not. In this
case since it already receives reviewer attention we should be able to
determine it now, instead of delaying it for later.
> >> + /* if we created a thread, join it otherwise load the extensions on the primary thread */
> >> +#ifndef NO_PTHREADS
> >> + if (extension_offset && pthread_join(p.pthread, NULL))
> >> + die(_("unable to join load_index_extensions_thread"));
> >
> > I guess the last _ is a typo and you wanted "unable to join
> > load_index_extensions thread". Please use die_errno() instead.
> >
>
> Why should this be die_errno() here? All other instances of
> pthread_join() failing in a fatal way use die(), not die_errno().
That argument does not fly well in my opinion. I read the man page and
it listed the error codes, which made me think that we need to use
die_errno() to show the error. My mistake, though, is that the error is
returned as the return value, not in errno, so die_errno() would not
catch it. But we could still do something like
int ret = pthread_join();
die(_("blah blah: %s"), strerror(ret));
Other code can also be improved, but that's a separate issue.
--
Duy

*Re: [PATCH v5 2/5] read-cache: load cache extensions on a worker thread
From: Junio C Hamano @ 2018-09-17 17:19 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Ben Peart
Duy Nguyen <pclouds@gmail.com> writes:
> On Sat, Sep 15, 2018 at 12:22 PM Duy Nguyen <pclouds@gmail.com> wrote:
>> Wait there's no way to disable this parallel reading? Does not sound
>> right. And if ordinary numbers mean the number of threads then 0
>> should mean no threading. Auto detection could have a new keyword,
>> like 'auto'.
>
> My bad. Disabling threading means _1_ thread. What was I thinking...
I did the same during my earlier review. It seems that it somehow
is unintuitive to us that we do not specify how many _extra_ threads
of control we dedicate to it ;-).

*Re: [PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension
From: Junio C Hamano @ 2018-09-17 17:31 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Ben Peart, Ben Peart, Git Mailing List, Ben Peart
Duy Nguyen <pclouds@gmail.com> writes:
> But it _is_ available now. If you need it, you write the extension
> out.
Are you arguing for making it omitted when it is not needed (e.g.
small enough index file)? IOW, did you mean "If you do not need it,
you do not write it out" by the above?
I do not think overhead of writing (or preparing to write) the
extension for a small index file is by definition small enough ;-).
I do not think the configuration that decides if the reader side
uses parallel reading should have any say in the decision to write
(or omit) the extension, by the way.

*Re: [PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension
From: Duy Nguyen @ 2018-09-17 17:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ben Peart, Ben Peart, Git Mailing List, Ben Peart
On Mon, Sep 17, 2018 at 7:31 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> > But it _is_ available now. If you need it, you write the extension
> > out.
>
> Are you arguing for making it omitted when it is not needed (e.g.
> small enough index file)? IOW, did you mean "If you do not need it,
> you do not write it out" by the above?
Yes I did.
> I do not think overhead of writing (or preparing to write) the
> extension for a small index file is by definition small enough ;-).
Good point.
I get annoyed by the "ignoring unknown extension xxx" messages while
testing though (not just this extension) and I think it will be the
same for other git implementations. But perhaps other implementations
just silently drop the extension. Most of the extensions we have added
so far (except the ancient 'TREE') are optional and are probably not
present 99% of time when a different git impl reads an index created
by C Git. This 'EOIE' may be a good test then to see if they follow
the "ignore optional extensions" rule since it will always appear in
new C Git releases.
--
Duy

*Re: [PATCH v5 3/5] read-cache: load cache entries on worker threads
From: Ben Peart @ 2018-09-17 18:52 UTC (permalink / raw)
To: Duy Nguyen, Ben Peart; +Cc: Git Mailing List, Junio C Hamano, Ben Peart
On 9/15/2018 7:09 AM, Duy Nguyen wrote:
> On Sat, Sep 15, 2018 at 01:07:46PM +0200, Duy Nguyen wrote:
>> 12:50:00.084237 read-cache.c:1721 start loading index
>> 12:50:00.119941 read-cache.c:1943 performance: 0.034778758 s: loaded all extensions (1667075 bytes)
>> 12:50:00.185352 read-cache.c:2029 performance: 0.100152079 s: loaded 367110 entries
>> 12:50:00.189683 read-cache.c:2126 performance: 0.104566615 s: finished scanning all entries
>> 12:50:00.217900 read-cache.c:2029 performance: 0.082309193 s: loaded 367110 entries
>> 12:50:00.259969 read-cache.c:2029 performance: 0.070257130 s: loaded 367108 entries
>> 12:50:00.263662 read-cache.c:2278 performance: 0.179344458 s: read cache .git/index
>
> The previous mail wraps these lines and makes it a bit hard to read. Corrected now.
>
> --
> Duy
>
Interesting! Clearly the data shape makes a big difference here as I
had run a similar test but in my case, the extensions thread actually
finished last (and its cost is what drove me to move that onto a
separate thread that starts first).
Purpose                        First    Last     Duration
load_index_extensions_thread   719.40   968.50   249.10
load_cache_entries_thread      718.89   738.65    19.76
load_cache_entries_thread      730.39   753.83    23.43
load_cache_entries_thread      741.23   751.23    10.00
load_cache_entries_thread      751.93   780.88    28.95
load_cache_entries_thread      763.60   791.31    27.72
load_cache_entries_thread      773.46   783.46    10.00
load_cache_entries_thread      783.96   794.28    10.32
load_cache_entries_thread      795.61   805.52     9.91
load_cache_entries_thread      805.99   827.21    21.22
load_cache_entries_thread      816.85   826.85    10.00
load_cache_entries_thread      827.03   837.96    10.93
In my tests, the scanning thread clearly delayed the later ce threads
but given the extension was so slow, it didn't impact the overall time
nearly as much as in your case.
I completely agree that the optimal solution would be to go back to my
original patch/design. It eliminates the overhead of the scanning
thread entirely and allows all threads to start at the same time. This
would ensure the best performance whether the extensions were the
longest thread or the cache entry threads took the longest.
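(That design boils down to recording, at write time, a small table of
(offset, entry count) pairs so each worker can seek straight to its block. A
sketch of the reader side follows; the struct and field names are illustrative
rather than the final IEOT code.)

    #include <stdint.h>

    struct index_entry_offset {
            uint32_t offset;  /* byte offset of the block's first on-disk entry */
            uint32_t nr;      /* number of cache entries in the block */
    };

    /*
     * Sketch: hand one precomputed block to each worker so no thread has
     * to scan the variable-length entries that precede its block.
     */
    static void start_entry_workers(struct load_cache_entries_data *data,
                                    const struct index_entry_offset *blocks,
                                    int nr_blocks)
    {
            int i;
            for (i = 0; i < nr_blocks; i++) {
                    data[i].src_offset = blocks[i].offset;
                    data[i].nr_entries = blocks[i].nr;
                    if (pthread_create(&data[i].pthread, NULL,
                                       load_cache_entries_thread, &data[i]))
                            die(_("unable to create load_cache_entries thread"));
            }
    }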
I ran out of time and energy last year so dropped it to work on other
tasks. I appreciate your offer of help. Perhaps between the two of us
we could successfully get it through the mailing list this time. :-)
Let me go back and see what it would take to combine the current EOIE
patch with the older IEOT patch.
I'm also intrigued with your observation that over committing the cpu
actually results in time savings. I hadn't tested that. It looks like
that could have a positive impact on the overall time and warrant a
change to the default nr_threads logic.

*Re: [PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension
From: Junio C Hamano @ 2018-09-17 19:08 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Ben Peart, Ben Peart, Git Mailing List, Ben Peart
Duy Nguyen <pclouds@gmail.com> writes:
> I get annoyed by the "ignoring unknown extension xxx" messages while
> testing though (not just this extension) and I think it will be the
> same for other git implementations. But perhaps other implementations
> just silently drop the extension. Most of the extensions we have added
> so far (except the ancient 'TREE') are optional and are probably not
Most of the index extensions are optional, including TREE. I think
"link" is the only one that the readers that do not understand it
are told to abort without causing damage.
> present 99% of time when a different git impl reads an index created
> by C Git. This 'EIOE' may be a good test then to see if they follow
> the "ignore optional extensions" rule since it will always appear in
> new C Git releases.
I think we probably should squelch "ignoring unknown" unless some
sort of GIT_TRACE/DEBUG switch is set.
Patches welcome ;-)
Thanks.

*Re: [PATCH v6 0/7] speed up index load through parallelization
From: Junio C Hamano @ 2018-09-26 22:06 UTC (permalink / raw)
To: Ben Peart; +Cc: git, pclouds, Ben Peart
Ben Peart <peartben@gmail.com> writes:
> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/a0300882d4
> Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v6 && git checkout a0300882d4
>
>
> This iteration brings back the Index Entry Offset Table (IEOT) extension
> which enables us to multi-thread the cache entry parsing without having
> the primary thread have to scan all the entries first. In cases where the
> cache entry parsing is the most expensive part, this yields some additional
> savings.
Nice.
> Test w/100,000 files Baseline Optimize V4 Extensions Entries
> ----------------------------------------------------------------------------
> 0002.1: read_cache 22.36 18.74 -16.2% 18.64 -16.6% 12.63 -43.5%
>
> Test w/1,000,000 files Baseline Optimize V4 Extensions Entries
> -----------------------------------------------------------------------------
> 0002.1: read_cache 304.40 270.70 -11.1% 195.50 -35.8% 204.82 -32.7%
>
> Note that on the 1,000,000 files case, multi-threading the cache entry parsing
> does not yield a performance win. This is because the cost to parse the
> index extensions in this repo, far outweigh the cost of loading the cache
> entries.
> ...
> The high cost of parsing the index extensions is driven by the cache tree
> and the untracked cache extensions. As this is currently the longest pole,
> any reduction in this time will reduce the overall index load times so is
> worth further investigation in another patch series.
Interesting.
> One option would be to load each extension on a separate thread but I
> believe that is overkill for the vast majority of repos. Instead, some
> optimization of the loading code for these two extensions is probably worth
> looking into as a quick examination shows that the bulk of the time for both
> of them is spent in xcalloc().
Thanks. Looking forward to block some quality time off to read this
through, but from the cursory look (read: diff between the previous
round), this looks quite promising.

*Re: [PATCH v6 0/7] speed up index load through parallelization
From: Duy Nguyen @ 2018-09-27 17:13 UTC (permalink / raw)
To: Ben Peart; +Cc: Git Mailing List, Junio C Hamano, Ben Peart
On Wed, Sep 26, 2018 at 9:54 PM Ben Peart <peartben@gmail.com> wrote:
> The high cost of parsing the index extensions is driven by the cache tree
> and the untracked cache extensions. As this is currently the longest pole,
> any reduction in this time will reduce the overall index load times, so it is
> worth further investigation in another patch series.
>
> Name                                 First     Last      Elapsed
> | + git!read_index_extension         684.052   870.244   186.192
> | + git!cache_tree_read              684.052   797.801   113.749
> | + git!read_untracked_extension     797.801   870.244    72.443
>
> One option would be to load each extension on a separate thread but I
> believe that is overkill for the vast majority of repos.
They both grow proportionally to the number of trees in the worktree, which
probably also scales with the worktree size. Frankly, I think
parallel index loading is already overkill for the majority of repos,
so further speeding up the 1% of giant repos does not sound that bad.
And I think you have already laid the foundation for loading index data in
parallel with this series.
> Instead, some
> optimization of the loading code for these two extensions is probably worth
> looking into as a quick examination shows that the bulk of the time for both
> of them is spent in xcalloc().
Another easy "optimization" is delaying loading these until we need
them (or loading them in the background so that read_index() returns even
before these extensions are finished, but that is of course trickier).
The UNTR extension, for example, is only useful for "git status" (and maybe
one or two other use cases), so not having to load it all the time is
likely a win. The role of the TREE extension has grown bigger these days,
so it is probably still worth putting more effort into making it load
faster rather than just hiding the cost.
--
Duy
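A minimal sketch of the deferred-loading idea above, assuming a hypothetical
parse helper and field names rather than the actual untracked-cache code:

#include <stddef.h>

struct untracked_cache;	/* opaque here; parsed form of the UNTR extension */

/* hypothetical: parse a raw UNTR payload into its in-memory form */
struct untracked_cache *parse_untr_payload(const void *data, size_t len);

struct index_state_sketch {
	const void *untr_data;	/* raw UNTR payload kept from read_index() */
	size_t untr_len;
	struct untracked_cache *untracked;	/* NULL until first use */
};

/* callers such as "git status" ask for the cache; others never pay for it */
static struct untracked_cache *get_untracked(struct index_state_sketch *istate)
{
	if (!istate->untracked && istate->untr_data)
		istate->untracked = parse_untr_payload(istate->untr_data,
						       istate->untr_len);
	return istate->untracked;
}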
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v6 4/7] config: add new index.threads config setting
2018-09-28 13:39 ` Ben Peart
@ 2018-09-28 17:07 ` Junio C Hamano
2018-09-28 19:41 ` Ben Peart
0 siblings, 1 reply; 199+ messages in thread
From: Junio C Hamano @ 2018-09-28 17:07 UTC (permalink / raw)
To: Ben Peart; +Cc: SZEDER Gábor, git, pclouds, Ben Peart, Ben Peart
Ben Peart <peartben@gmail.com> writes:
>> Why does multithreading have to be disabled in this test?
>
> If multi-threading is enabled, it will write out the IEOT extension
> which changes the SHA and causes the test to fail.
I think it is a design mistake to let the writing process's
capability decide what is written in the file to be read later by a
different process, which may well have a different capability. If
you are not writing with multiple threads, it should not matter whether
that writer process is capable of and configured to spawn 8 threads
if it were reading the file---it is not reading the file
it is writing right now.
I can understand if the design is to write IEOT only if the
resulting index is expected to become large enough (above an
arbitrary threshold like 100k entries) to matter. I also can
understand if IEOT is omitted when the repository configuration says
that no process is allowed to read the index with multi-threaded
codepath in that repository.
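A sketch of the policy described here, with illustrative names and an
assumed threshold (neither is taken from the actual patch): base the
decision on the index size and the repository's index.threads setting
rather than on the writer's own CPU count.

/* illustrative only; threshold and names are assumptions, not the patch */
#define IEOT_MIN_ENTRIES 100000

static int want_ieot(unsigned int cache_nr, int index_threads_config)
{
	if (!index_threads_config)		/* repo forbids threaded reads */
		return 0;
	return cache_nr >= IEOT_MIN_ENTRIES;	/* only bother for large indexes */
}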
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v6 3/7] eoie: add End of Index Entry (EOIE) extension
2018-09-28 0:19 ` SZEDER Gábor
@ 2018-09-28 18:38 ` Ben Peart
0 siblings, 0 replies; 199+ messages in thread
From: Ben Peart @ 2018-09-28 18:38 UTC (permalink / raw)
To: SZEDER Gábor; +Cc: git, gitster, pclouds, Ben Peart, Ben Peart
On 9/27/2018 8:19 PM, SZEDER Gábor wrote:
> On Wed, Sep 26, 2018 at 03:54:38PM -0400, Ben Peart wrote:
>> The End of Index Entry (EOIE) is used to locate the end of the variable
>
> Nit: perhaps start with:
>
> The End of Index Entry (EOIE) optional extension can be used to ...
>
> to make it clearer for those who don't immediately realize the
> significance of the upper case 'E' in the extension's signature.
>
>> length index entries and the beginning of the extensions. Code can take
>> advantage of this to quickly locate the index extensions without having
>> to parse through all of the index entries.
>>
>> Because it must be able to be loaded before the variable length cache
>> entries and other index extensions, this extension must be written last.
>> The signature for this extension is { 'E', 'O', 'I', 'E' }.
>>
>> The extension consists of:
>>
>> - 32-bit offset to the end of the index entries
>>
>> - 160-bit SHA-1 over the extension types and their sizes (but not
>> their contents). E.g. if we have "TREE" extension that is N-bytes
>> long, "REUC" extension that is M-bytes long, followed by "EOIE",
>> then the hash would be:
>>
>> SHA-1("TREE" + <binary representation of N> +
>> "REUC" + <binary representation of M>)
>>
>> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
>> ---
>> Documentation/technical/index-format.txt | 23 ++++
>> read-cache.c | 151 +++++++++++++++++++++--
>> t/README | 5 +
>> t/t1700-split-index.sh | 1 +
>> 4 files changed, 172 insertions(+), 8 deletions(-)
>>
>
>> diff --git a/t/README b/t/README
>> index 3ea6c85460..aa33ac4f26 100644
>> --- a/t/README
>> +++ b/t/README
>> @@ -327,6 +327,11 @@ GIT_TEST_COMMIT_GRAPH=<boolean>, when true, forces the commit-graph to
>> be written after every 'git commit' command, and overrides the
>> 'core.commitGraph' setting to true.
>>
>> +GIT_TEST_DISABLE_EOIE=<boolean> disables writing the EOIE extension.
>> +This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
>> +as they currently hard code SHA values for the index which are no longer
>> +valid due to the addition of the EOIE extension.
>
> Is this extension enabled by default? The commit message doesn't
> explicitly say so, but I don't see any way to turn it on or off, while
> there is this new GIT_TEST environment variable to disable it for one
> particular test, so it seems so. If that's indeed the case, then
> wouldn't it be better to update those hard-coded SHA1 values in t1700
> instead?
>
Yes, it is enabled by default, and the only way to disable it is via the
GIT_TEST_DISABLE_EOIE environment variable.
The tests in t1700-split-index.sh assume that there are no extensions in
the index file, so anything that adds an extension will break one or
more of the tests.
First, in 'enable split index', they hard code SHA values assuming there
are no extensions. If some option adds an extension, these hard coded
values no longer match and the test fails.
Later, in 'disable split index', they save off the SHA of the index with
split-index turned off and then, in later tests, compare it to the SHA of
the shared index. Because extensions are stripped when the shared index
is written out, this only works if there were no extensions in the
original index.
I'll document this behavior and reasoning in the test directly.
This did cause me to reexamine how EOIE and IEOT behave when split index
is turned on. These two extensions help most with a large index. When
split index is turned on, the large index is actually the shared index,
as the index itself is now the smaller set of deltas.
Currently, the extensions are stripped out of the shared index, which
means they are not available when they are needed to quickly load the
shared index. I'll see if I can update the patch so that these
extensions are still written out and available in the shared index to
speed up loading it.
Thanks!
>> Naming Tests
>> ------------
>>
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index be22398a85..1f168378c8 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -7,6 +7,7 @@ test_description='split index mode tests'
>> # We need total control of index splitting here
>> sane_unset GIT_TEST_SPLIT_INDEX
>> sane_unset GIT_FSMONITOR_TEST
>> +GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
>>
>> test_expect_success 'enable split index' '
>> git config splitIndex.maxPercentChange 100 &&
>> --
>> 2.18.0.windows.1
>>
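To make the EOIE layout quoted earlier in this message concrete, here is a
rough reader-side sketch of using the extension to jump straight to the
extension area of a mmap'd index. The constants and simplifications are
assumptions for illustration, not the actual read-cache.c code (for
instance, the size field and trailing hash are not validated here):

#include <arpa/inet.h>	/* ntohl */
#include <stdint.h>
#include <string.h>

/*
 * Assumed layout, per the quoted description:
 *   ... cache entries ... extensions ... "EOIE" <32-bit size>
 *   <32-bit offset> <160-bit hash> <trailing 160-bit index checksum>
 */
#define EOIE_PAYLOAD	(4 + 20)		/* offset + hash over ext headers */
#define EOIE_WITH_HDR	(4 + 4 + EOIE_PAYLOAD)	/* signature + size + payload */
#define TRAILER		20			/* trailing index checksum */

static size_t extensions_start(const unsigned char *map, size_t len)
{
	const unsigned char *eoie;
	uint32_t offset;

	if (len < EOIE_WITH_HDR + TRAILER)
		return 0;			/* too small to carry an EOIE */
	eoie = map + len - TRAILER - EOIE_WITH_HDR;
	if (memcmp(eoie, "EOIE", 4))
		return 0;			/* absent: fall back to scanning entries */
	memcpy(&offset, eoie + 8, sizeof(offset));	/* skip signature + size */
	return ntohl(offset);	/* offset of the end of the index entries */
}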
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v6 4/7] config: add new index.threads config setting
2018-09-28 17:07 ` Junio C Hamano
@ 2018-09-28 19:41 ` Ben Peart
2018-09-28 20:30 ` Ramsay Jones
0 siblings, 1 reply; 199+ messages in thread
From: Ben Peart @ 2018-09-28 19:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: SZEDER Gábor, git, pclouds, Ben Peart, Ben Peart
On 9/28/2018 1:07 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
>
>>> Why does multithreading have to be disabled in this test?
>>
>> If multi-threading is enabled, it will write out the IEOT extension
>> which changes the SHA and causes the test to fail.
>
> I think it is a design mistake to let the writing processes's
> capability decide what is written in the file to be read later by a
> different process, which possibly may have different capability. If
> you are not writing with multiple threads, it should not matter if
> that writer process is capable of and configured to spawn 8 threads
> if the process were reading the file---as it is not reading the file
> it is writing right now.
>
> I can understand if the design is to write IEOT only if the
> resulting index is expected to become large enough (above an
> arbitrary threshold like 100k entries) to matter. I also can
> understand if IEOT is omitted when the repository configuration says
> that no process is allowed to read the index with multi-threaded
> codepath in that repository.
>
There are two different paths which determine how many blocks are
written to the IEOT. The first is the default path. On this path, the
number of blocks is determined by the number of cache entries divided by
the THREAD_COST. If there are sufficient entries to make it faster to
use threading, then it will automatically use enough blocks to optimize
the performance of reading the entries across multiple threads.
I currently cap the maximum number of blocks at the number of cores
that would be available to process them on that same machine, purely as
an optimization. The majority of the time, the index will be read from
the same machine that it was written on, so this works well. Before I
added that logic, you would usually end up with more blocks than
available threads, which meant some threads had more to do than the other
threads and resulted in worse performance. For example, 4 blocks across
3 threads results in the 1st thread having twice as much work to do as
the other threads.
If the index is copied to a machine with a different number of cores, it
will still all work - it just may not be optimal for that machine. This
is self-correcting because as soon as the index is written out, it will
be optimized for that machine.
If the "automatically try to make it perform optimally" logic doesn't
work for some reason, we have path #2.
The second path is when the user specifies a specific number of blocks
via the GIT_TEST_INDEX_THREADS=<n> environment variable or the
index.threads=<n> config setting. If they ask for n blocks, they will
get n blocks. This is the "I know what I'm doing and want to control
the behavior" path.
I just added one additional check (see patch below) to avoid a
divide-by-zero bug and simplify things a bit. With this change, if there
are fewer than two blocks, the IEOT extension is not written out as it
isn't needed. The load would be single-threaded anyway, so there is no
reason to write out an IEOT extension that won't be used.
diff --git a/read-cache.c b/read-cache.c
index f5d766088d..a1006fa824 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2751,18 +2751,23 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		 */
 		if (!nr) {
 			ieot_blocks = istate->cache_nr / THREAD_COST;
-			if (ieot_blocks < 1)
-				ieot_blocks = 1;
 			cpus = online_cpus();
 			if (ieot_blocks > cpus - 1)
 				ieot_blocks = cpus - 1;
 		} else {
 			ieot_blocks = nr;
 		}
-		ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
-			+ (ieot_blocks * sizeof(struct index_entry_offset)));
-		ieot->nr = 0;
-		ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
+
+		/*
+		 * no reason to write out the IEOT extension if we don't
+		 * have enough blocks to utilize multi-threading
+		 */
+		if (ieot_blocks > 1) {
+			ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
+				+ (ieot_blocks * sizeof(struct index_entry_offset)));
+			ieot->nr = 0;
+			ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
+		}
 	}
 #endif
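Restating the policy in the hunk above as a small stand-alone illustration
(the THREAD_COST value of 10000 is assumed here purely for the example;
see read-cache.c for the real definition):

/* stand-alone restatement of the logic above; THREAD_COST value assumed */
#define THREAD_COST_ASSUMED 10000

static int effective_ieot_blocks(unsigned int cache_nr, int cpus, int nr)
{
	int blocks;

	if (nr) {			/* explicit index.threads / GIT_TEST_INDEX_THREADS */
		blocks = nr;
	} else {
		blocks = cache_nr / THREAD_COST_ASSUMED;
		if (blocks > cpus - 1)	/* cap at the writer's core count */
			blocks = cpus - 1;
	}
	return blocks > 1 ? blocks : 0;	/* under two blocks: no IEOT written */
}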
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v6 4/7] config: add new index.threads config setting
2018-09-28 19:41 ` Ben Peart
@ 2018-09-28 20:30 ` Ramsay Jones
2018-09-28 22:15 ` Junio C Hamano
0 siblings, 1 reply; 199+ messages in thread
From: Ramsay Jones @ 2018-09-28 20:30 UTC (permalink / raw)
To: Ben Peart, Junio C Hamano
Cc: SZEDER Gábor, git, pclouds, Ben Peart, Ben Peart
On 28/09/18 20:41, Ben Peart wrote:
>
>
> On 9/28/2018 1:07 PM, Junio C Hamano wrote:
>> Ben Peart <peartben@gmail.com> writes:
>>
>>>> Why does multithreading have to be disabled in this test?
>>>
>>> If multi-threading is enabled, it will write out the IEOT extension
>>> which changes the SHA and causes the test to fail.
>>
>> I think it is a design mistake to let the writing processes's
>> capability decide what is written in the file to be read later by a
>> different process, which possibly may have different capability. If
>> you are not writing with multiple threads, it should not matter if
>> that writer process is capable of and configured to spawn 8 threads
>> if the process were reading the file---as it is not reading the file
>> it is writing right now.
>>
>> I can understand if the design is to write IEOT only if the
>> resulting index is expected to become large enough (above an
>> arbitrary threshold like 100k entries) to matter. I also can
>> understand if IEOT is omitted when the repository configuration says
>> that no process is allowed to read the index with multi-threaded
>> codepath in that repository.
>>
>
> There are two different paths which determine how many blocks are written to the IEOT. The first is the default path. On this path, the number of blocks is determined by the number of cache entries divided by the THREAD_COST. If there are sufficient entries to make it faster to use threading, then it will automatically use enough blocks to optimize the performance of reading the entries across multiple threads.
>
> I currently cap the maximum number of blocks to be the number of cores that would be available to process them on that same machine purely as an optimization. The majority of the time, the index will be read from the same machine that it was written on so this works well. Before I added that logic, you would usually end up with more blocks than available threads which meant some threads had more to do than the other threads and resulted in worse performance. For example, 4 blocks across 3 threads results in the 1st thread having twice as much work to do as the other threads.
>
> If the index is copied to a machine with a different number of cores, it will still all work - it just may not be optimal for that machine. This is self correcting because as soon as the index is written out, it will be optimized for that machine.
>
> If the "automatically try to make it perform optimally" logic doesn't work for some reason, we have path #2.
>
> The second path is when the user specifies a specific number of blocks via the GIT_TEST_INDEX_THREADS=<n> environment variable or the index.threads=<n> config setting. If they ask for n blocks, they will get n blocks. This is the "I know what I'm doing and want to control the behavior" path.
>
> I just added one additional test (see patch below) to avoid a divide by zero bug and simplify things a bit. With this change, if there are fewer than two blocks, the IEOT extension is not written out as it isn't needed. The load would be single threaded anyway so there is no reason to write out a IEOT extensions that won't be used.
>
>
>
> diff --git a/read-cache.c b/read-cache.c
> index f5d766088d..a1006fa824 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2751,18 +2751,23 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> */
> if (!nr) {
> ieot_blocks = istate->cache_nr / THREAD_COST;
> - if (ieot_blocks < 1)
> - ieot_blocks = 1;
> cpus = online_cpus();
> if (ieot_blocks > cpus - 1)
> ieot_blocks = cpus - 1;
So, am I reading this correctly - you need cpus > 2 before an
IEOT extension block is written out?
OK.
ATB,
Ramsay Jones
> } else {
> ieot_blocks = nr;
> }
> - ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
> - + (ieot_blocks * sizeof(struct index_entry_offset)));
> - ieot->nr = 0;
> - ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
> +
> + /*
> + * no reason to write out the IEOT extension if we don't
> + * have enough blocks to utilize multi-threading
> + */
> + if (ieot_blocks > 1) {
> + ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
> + + (ieot_blocks * sizeof(struct index_entry_offset)));
> + ieot->nr = 0;
> + ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
> + }
> }
> #endif
>
>
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v6 4/7] config: add new index.threads config setting
2018-09-28 22:15 ` Junio C Hamano
@ 2018-10-01 13:17 ` Ben Peart
2018-10-01 15:06 ` SZEDER Gábor
0 siblings, 1 reply; 199+ messages in thread
From: Ben Peart @ 2018-10-01 13:17 UTC (permalink / raw)
To: Junio C Hamano, Ramsay Jones
Cc: SZEDER Gábor, git, pclouds, Ben Peart, Ben Peart
On 9/28/2018 6:15 PM, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
>
>>> if (!nr) {
>>> ieot_blocks = istate->cache_nr / THREAD_COST;
>>> - if (ieot_blocks < 1)
>>> - ieot_blocks = 1;
>>> cpus = online_cpus();
>>> if (ieot_blocks > cpus - 1)
>>> ieot_blocks = cpus - 1;
>>
>> So, am I reading this correctly - you need cpus > 2 before an
>> IEOT extension block is written out?
>>
>> OK.
>
> Why should we be even calling online_cpus() in this codepath to
> write the index in a single thread to begin with?
>
> The number of cpus that readers would use to read this index file
> has nothing to do with the number of cpus available to this
> particular writer process.
>
As I mentioned in my other reply, this is optimizing for the most common
case, where the index is read from the same machine that wrote it and the
user is taking the default settings (i.e. index.threads=true).
Aligning the number of blocks to the number of threads that will be
processing them avoids situations where one thread may have up to double
the work of the other threads (for example, if there were 3 blocks
to be processed by 2 threads).
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v6 4/7] config: add new index.threads config setting
2018-10-01 13:17 ` Ben Peart
@ 2018-10-01 15:06 ` SZEDER Gábor
0 siblings, 0 replies; 199+ messages in thread
From: SZEDER Gábor @ 2018-10-01 15:06 UTC (permalink / raw)
To: Ben Peart
Cc: Junio C Hamano, Ramsay Jones, git, pclouds, Ben Peart, Ben Peart
On Mon, Oct 01, 2018 at 09:17:53AM -0400, Ben Peart wrote:
>
>
> On 9/28/2018 6:15 PM, Junio C Hamano wrote:
> >Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> >
> >>> if (!nr) {
> >>> ieot_blocks = istate->cache_nr / THREAD_COST;
> >>>- if (ieot_blocks < 1)
> >>>- ieot_blocks = 1;
> >>> cpus = online_cpus();
> >>> if (ieot_blocks > cpus - 1)
> >>> ieot_blocks = cpus - 1;
> >>
> >>So, am I reading this correctly - you need cpus > 2 before an
> >>IEOT extension block is written out?
> >>
> >>OK.
> >
> >Why should we be even calling online_cpus() in this codepath to
> >write the index in a single thread to begin with?
> >
> >The number of cpus that readers would use to read this index file
> >has nothing to do with the number of cpus available to this
> >particular writer process.
> >
>
> As I mentioned in my other reply, this is optimizing for the most common
> case where the index is read from the same machine that wrote it and the
> user is taking the default settings (ie index.threads=true).
I think this is a reasonable assumption to make, but it should be
mentioned in the relevant commit message. Alas, as far as I can tell,
not a single commit message has been updated in v7.
> Aligning the number of blocks to the number of threads that will be
> processing them avoids situations where one thread may have up to double the
> work to do as the other threads (for example, if there were 3 blocks to be
> processed by 2 threads).
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension
2018-10-01 13:45 ` [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension Ben Peart
@ 2018-10-01 15:17 ` SZEDER Gábor
2018-10-02 14:34 ` Ben Peart
2018-10-01 15:30 ` Duy Nguyen
1 sibling, 1 reply; 199+ messages in thread
From: SZEDER Gábor @ 2018-10-01 15:17 UTC (permalink / raw)
To: Ben Peart; +Cc: git, gitster, pclouds, Ben Peart
On Mon, Oct 01, 2018 at 09:45:52AM -0400, Ben Peart wrote:
> From: Ben Peart <benpeart@microsoft.com>
>
> The End of Index Entry (EOIE) is used to locate the end of the variable
> length index entries and the beginning of the extensions. Code can take
> advantage of this to quickly locate the index extensions without having
> to parse through all of the index entries.
>
> Because it must be able to be loaded before the variable length cache
> entries and other index extensions, this extension must be written last.
> The signature for this extension is { 'E', 'O', 'I', 'E' }.
>
> The extension consists of:
>
> - 32-bit offset to the end of the index entries
>
> - 160-bit SHA-1 over the extension types and their sizes (but not
> their contents). E.g. if we have "TREE" extension that is N-bytes
> long, "REUC" extension that is M-bytes long, followed by "EOIE",
> then the hash would be:
>
> SHA-1("TREE" + <binary representation of N> +
> "REUC" + <binary representation of M>)
>
> Signed-off-by: Ben Peart <peartben@gmail.com>
I think the commit message should explicitly mention that this
extension
- will always be written and why,
- but is optional, so other Git implementations not supporting it will
have no troubles reading the index,
- and that it is written even to the shared index and why, and that
because of this the index checksums in t1700 had to be updated.
^permalinkrawreply [flat|nested] 199+ messages in thread

*Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension
2018-10-01 15:17 ` SZEDER Gábor
@ 2018-10-02 14:34 ` Ben Peart
0 siblings, 0 replies; 199+ messages in thread
From: Ben Peart @ 2018-10-02 14:34 UTC (permalink / raw)
To: SZEDER Gábor; +Cc: git, gitster, pclouds, Ben Peart
On 10/1/2018 11:17 AM, SZEDER Gábor wrote:
> On Mon, Oct 01, 2018 at 09:45:52AM -0400, Ben Peart wrote:
>> From: Ben Peart <benpeart@microsoft.com>
>>
>> The End of Index Entry (EOIE) is used to locate the end of the variable
>> length index entries and the beginning of the extensions. Code can take
>> advantage of this to quickly locate the index extensions without having
>> to parse through all of the index entries.
>>
>> Because it must be able to be loaded before the variable length cache
>> entries and other index extensions, this extension must be written last.
>> The signature for this extension is { 'E', 'O', 'I', 'E' }.
>>
>> The extension consists of:
>>
>> - 32-bit offset to the end of the index entries
>>
>> - 160-bit SHA-1 over the extension types and their sizes (but not
>> their contents). E.g. if we have "TREE" extension that is N-bytes
>> long, "REUC" extension that is M-bytes long, followed by "EOIE",
>> then the hash would be:
>>
>> SHA-1("TREE" + <binary representation of N> +
>> "REUC" + <binary representation of M>)
>>
>> Signed-off-by: Ben Peart <peartben@gmail.com>
>
> I think the commit message should explicitly mention that this
> extension
>
> - will always be written and why,
> - but is optional, so other Git implementations not supporting it will
> have no troubles reading the index,
> - and that it is written even to the shared index and why, and that
> because of this the index checksums in t1700 had to be updated.
>
Sure, I'll add that additional information to the commit message on the
next spin.
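As a rough illustration of the hashing rule quoted above (the hash covers
each preceding extension's signature and size, not its contents), here is a
sketch that uses OpenSSL's SHA-1 purely for the example rather than Git's
own hash machinery; the struct and function names are made up:

#include <openssl/sha.h>
#include <stdint.h>

struct ext_hdr {
	char sig[4];		/* e.g. "TREE", "REUC" */
	uint32_t size_be;	/* extension payload length, network byte order */
};

/* hash over each preceding extension's signature and size, not its contents */
static void eoie_hash(const struct ext_hdr *exts, int n, unsigned char out[20])
{
	SHA_CTX c;
	int i;

	SHA1_Init(&c);
	for (i = 0; i < n; i++) {
		SHA1_Update(&c, exts[i].sig, 4);
		SHA1_Update(&c, &exts[i].size_be, 4);
	}
	SHA1_Final(out, &c);
}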
^permalinkrawreply [flat|nested] 199+ messages in thread