From: Yaroslav Halchenko <yoh@onerussian.com>
To: Stefan Beller <sbeller@google.com>
Cc: Prathamesh Chavan <pc44800@gmail.com>, "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: in case you want a use-case with lots of submodules
Date: Mon, 19 Jun 2017 16:20:21 -0400
Message-ID: <20170619202021.dmomy5ztwoeat3eg@hopa.kiewit.dartmouth.edu>
In-Reply-To: <CAGZ79kZhj31eBYnboyxDLuFp1ceeqk8kj0nrnQaCmpRJCVFU4w@mail.gmail.com>
On Mon, 19 Jun 2017, Stefan Beller wrote:
> On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
> > Hi All,
> > On a recent trip I listened to the Git Minutes podcast episode and
> > was excited to hear Stefan Beller (CCed just in case) describe the
> > ongoing work on the submodule mechanism, since performance
> > improvements, for example, would be of great benefit to us too.
> If you're mostly interested in performance improvements of the status
> quo (i.e. "make git-submodule fast"), then the work of Prathamesh
> Chavan (cc'd) might be more interesting to you than what I do.
> He is porting git-submodule (which is mostly a shell script nowadays)
> to C, so that we can avoid a lot of process invocations and do the
> processing within one process.
Ah, cool. I would be eager to test it out, thanks! It would be
interesting to see whether it improves our overall performance.
Pointers to that development would be welcome!
> > http://datasets.datalad.org currently provides quite a sizeable
> > hierarchy (370 repositories at the moment, up to 4 levels deep) of
> > git/git-annex repositories, all tied together via the git submodule
> > mechanism. As the collection grows, interactions with it become
> > slower, so additional options (such as --ignore-submodules=dirty to
> > status) become our friends.
> I am not as concerned about the number 370 as about the
> 4 levels of nesting. In my experience the nested submodule case
> is a little error prone, and bug reports are less frequent only
> because there are not as many users of nesting, yet(?)
Well, part of the story here is that we are forced to have full-blown
.git/ directories within submodules (so that git-annex symlinks to
content files work), instead of a .git file referencing a directory
under the parent's .git/modules. So we can 'slice' the hierarchy at any
level, and I guess that is why we may be avoiding some possible issues
due to nesting combined with the "parent has all .git/modules" approach.
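To make the two layouts concrete, here is a minimal Python sketch (an illustration, not DataLad or Git code) of how a tool might locate the repository data behind a submodule checkout. It assumes only the gitfile convention: a `.git` that is a plain file containing `gitdir: <path>`, with a relative path resolved against the directory that holds the file. The function name `resolve_git_dir` is hypothetical.

```python
import os
from pathlib import Path


def resolve_git_dir(worktree: Path) -> Path:
    """Hypothetical helper: return the directory holding the repository
    data for the checkout at `worktree`.

    Two layouts are handled:
      * `.git` is a directory: a self-contained repository (the layout
        used for the git-annex submodules described above);
      * `.git` is a file containing "gitdir: <path>": the data lives
        elsewhere, typically under the parent's .git/modules/.
    """
    dotgit = worktree / ".git"
    if dotgit.is_dir():
        return dotgit
    if dotgit.is_file():
        content = dotgit.read_text().strip()
        prefix = "gitdir:"
        if not content.startswith(prefix):
            raise ValueError(f"{dotgit} is not a gitfile")
        target = Path(content[len(prefix):].strip())
        if not target.is_absolute():
            # Relative paths are relative to the directory containing the file.
            target = worktree / target
        # Collapse "../" segments lexically (symlinks are not resolved).
        return Path(os.path.normpath(target))
    raise FileNotFoundError(f"no .git entry in {worktree}")
```

With full-blown .git/ directories only the first branch ever applies, so each submodule directory is self-contained, which is what allows slicing the hierarchy at any level.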
> In a neighboring thread on the mailing list we have a discussion
> on the usefulness of being on branches rather than in a detached HEAD
> in the submodules.
> https://public-inbox.org/git/0092CDD27C5F9D418B0F3E9B5D05BE08010287DF@SBS2011.opfingen.plc2.de/
> This would not break non-ambiguously, rather it would add
> ease of use.
That is indeed a common caveat... I am not sure any heuristic
approach would provide a bullet-proof solution. I might even prefer a
hardcoded branch name to be listed for (associated with) each submodule
in .gitmodules. In the datalad case, a detached HEAD is common
whenever someone installs an "outdated" submodule (one whose branch has
since progressed). In that case we just check whether the branch checked
out after "git clone" (but before "git submodule update") contains the
commit recorded as the Subproject commit, and if so, we announce that it
must be the branch (so far it has always been the "master" branch anyway ;) )
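The branch check just described can be modelled in a few lines. This is a hypothetical Python sketch, not datalad's actual implementation: the commit graph is an in-memory dict mapping each commit to its parents, and `is_ancestor` / `guess_submodule_branch` are made-up names. Against real repositories the same question is what `git merge-base --is-ancestor <commit> <branch>` answers, with the recorded commit read from `git ls-tree HEAD <path>` in the superproject.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory commit graph: commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def is_ancestor(commit: str, tip: str, parents: Graph) -> bool:
    """True if `commit` is reachable from `tip` by following parents
    (a commit counts as its own ancestor here)."""
    seen = set()
    stack = [tip]
    while stack:
        current = stack.pop()
        if current == commit:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def guess_submodule_branch(recorded: str, branch: str, tip: str,
                           parents: Graph) -> Optional[str]:
    """Hypothetical helper: after a fresh clone has checked out `branch`
    (whose head is `tip`), decide whether the superproject's recorded
    commit lives on it. Return the branch name if so, else None
    (i.e. leave the submodule on a detached HEAD)."""
    if is_ancestor(recorded, tip, parents):
        return branch
    return None


# Example: master has advanced to c3 since the superproject recorded c2.
history: Graph = {"c3": ["c2"], "c2": ["c1"], "c1": []}
print(guess_submodule_branch("c2", "master", "c3", history))  # master
print(guess_submodule_branch("zz", "master", "c3", history))  # None
```

Returning None rather than guessing keeps the detached HEAD whenever the check fails. For an explicitly configured branch, Git does accept a per-submodule `branch` key in .gitmodules (`submodule.<name>.branch`), which `git submodule update --remote` consults.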
> > So I thought I would share this as a use case, should you need more
> > motivation or just a real-world test bed for your work. And thank
> > you again for making Git even Greater.
> Thanks for the motivation. :)
The least I could do ;)
--
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik