[Gnu-arch-users] Re: [PATCH] arch speedups on big trees

From: Miles Bader
Subject: [Gnu-arch-users] Re: [PATCH] arch speedups on big trees
Date: Sat, 10 Jan 2004 04:17:11 -0500
User-agent: Mutt/1.3.28i

On Fri, Jan 09, 2004 at 05:16:13PM -0500, Chris Mason wrote:
> > You can't rely on your DB to catch any conflicts if you're running at a
> > point where the user could have changed the tree, so it seems that you've
> > _got_ to at least do a single full inventory at the start of a given
> > user-level tla command.
>
> See above, all tla commands used to change the tree also update the
> mapping.
Um, that doesn't work for taglines (I know, you said you were only dealing
with explicit tags right now, but I sort of thought it was just a few missing
implementation details, not a fundamental incompatibility).
It also fails in the case of direct manipulation by users, whether
intentional -- e.g. a user `mv's or `rm's a directory, something which is
advertised to work even for explicit tags -- or accidental, e.g. modifying a
.arch-ids directory by mistake. Certainly any id-tag mapping resulting from
the DB needs to be verified before use (just as you have to verify the
mapping you see in a changeset before using it).
However I do note that the direct manipulations of explicit tags which tla
explicitly supports only result in renames or deletions of tags, so a DB that
is being used as just a `big-bag-of-ids' will still remain valid.
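
To make the `big-bag-of-ids' point concrete, here is a minimal sketch. The
id -> path dictionaries and the function names are made up for illustration,
not anything in tla. After a user `mv's or `rm's things, the DB's paths may be
stale, but every id still on disk is one the DB already knows about:

```python
def bag_still_valid(db, on_disk):
    """True if every id found on disk is already known to the DB.

    Both arguments are hypothetical id -> path dictionaries.
    """
    return set(on_disk) <= set(db)


def stale_ids(db, on_disk):
    """Ids whose on-disk path no longer matches the path recorded in the DB."""
    return {file_id for file_id, path in on_disk.items() if db.get(file_id) != path}


# DB as last written by tla.
db = {"id-a": "lib/a.c", "id-b": "lib/b.c", "id-c": "notes.txt"}

# The user then ran `mv lib src' and `rm notes.txt' by hand.
on_disk = {"id-a": "src/a.c", "id-b": "src/b.c"}

print(bag_still_valid(db, on_disk))     # the set of ids is still usable
print(sorted(stale_ids(db, on_disk)))   # but these paths need re-checking
```

So the set of ids can be trusted under those manipulations, while any path
taken from the DB still has to be checked against the tree.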
> > Then use the `changeset based' method I described earlier for everything
> > else; if you've already implemented a version of this, it shouldn't be too
> > hard...
>
> Because the problem can be solved without reading the whole tree ;-)
> There's just no good reason for arch to take significantly longer than
> patch for applying the changeset.
>
> The project tree really is a database, just one stored in the
> filesystem. You've got objects (files, directories) and metadata (inode
> information and ids).
Yeah, but it's a database where the user is allowed -- and indeed
_encouraged_ in many cases -- to go in and directly edit the bytes from which
higher-level structures are determined. That restricts what you can do.
However, you obviously are using explicit tags and want to trust your DB if
it gets you your speedup.
OK (well, it's Tom that gets to decide ... :-).
But I think any solution should work well for both taglines and explicit tags
to the extent it can.
Just trying to organize my thoughts here:
(1) For a tagline tree, it will have to do a full-tree inventory to get all
the taglines, so it doesn't seem to make sense to use the on-disk DB in
this case. However there's no problem with taglines being represented in
an in-core version of the DB, so the explicit and tagline cases share
all the code after the initial fill-in-the-DB step; if the tagging
method is `explicit', just read the DB from disk into core, and if the
tagging method is `tagline', do a full-tree inventory to fill in the
(in-core only) DB.
(2) I note that one problem with merging inode-sigs and your `DB' is that
inode-sigs are tied to (past) revisions, whereas your DB is tied to the
project tree. Together with the fact that you _do_ want to keep inode
sig info even for tagline trees (whereas you don't want the `project
tree' DB on disk in this case), maybe merging the two concepts isn't
such a hot idea after all.
_However_, the inode-sigs could be useful when actually building the
in-core DB for taglines from a full-tree inventory: if the inode
information of a file is up-to-date (with respect to whatever
inode-signature info you happen to have lying around -- note that it
could be _several_ old inode-sigs files), then you can get the file's
tag without reading the actual file data, something which could be an
important optimization.
(3) Every entry in the in-core DB has a `verified' flag, which is set to
false for explicit trees, and to true for tagline trees (since a
full-tree inventory was done to fill in the in-core DB).
(4) When you need an id-tag <-> pathname mapping, you can look in the in-core
DB; if the verified flag is false, you gotta go actually look at the
disk to check things out, and if OK, you can use it (and set the flag
to true). Otherwise I guess you have to toss the DB and do a full-tree
inventory to make a new one (fully verified this time).
(5) All operations update the DB, though for a tagline tree, the changes
are kept strictly within the in-core version.
(6) Might the resulting up-to-date DB be useful for producing a new
inode-sig file at the end of the tla run? [thus making
replay/update/etc update inode-sigs, which would be very useful]
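
Points (1), (3), (4) and (5) could be sketched roughly as follows. This is
only an illustration under assumptions: the class, the method names, and the
two callbacks (`scan_tree' for a full-tree inventory, `read_id_at' for
checking a single path on disk) are hypothetical, not tla's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    path: str
    verified: bool


class InCoreDB:
    """Hypothetical in-core id -> path DB with a per-entry `verified' flag."""

    def __init__(self, scan_tree: Callable[[], Dict[str, str]],
                 read_id_at: Callable[[str], Optional[str]]):
        self._scan_tree = scan_tree      # full-tree inventory: returns id -> path
        self._read_id_at = read_id_at    # looks at one path on disk: returns its id
        self.entries: Dict[str, Entry] = {}

    @classmethod
    def from_disk_db(cls, disk_db, scan_tree, read_id_at):
        """Explicit tags: load the on-disk DB; nothing is verified yet."""
        db = cls(scan_tree, read_id_at)
        db.entries = {i: Entry(p, False) for i, p in disk_db.items()}
        return db

    @classmethod
    def from_inventory(cls, scan_tree, read_id_at):
        """Taglines: a full inventory fills the DB, so every entry is verified."""
        db = cls(scan_tree, read_id_at)
        db._rebuild()
        return db

    def _rebuild(self):
        self.entries = {i: Entry(p, True) for i, p in self._scan_tree().items()}

    def path_for(self, file_id: str) -> Optional[str]:
        entry = self.entries.get(file_id)
        if entry is not None and not entry.verified:
            if self._read_id_at(entry.path) == file_id:
                entry.verified = True            # disk agrees with the DB
            else:
                self._rebuild()                  # toss the DB, full inventory
                entry = self.entries.get(file_id)
        return entry.path if entry is not None else None

    def record_rename(self, file_id: str, new_path: str) -> None:
        """Operations update the in-core DB; we just did the rename ourselves."""
        self.entries[file_id] = Entry(new_path, True)


# A toy "tree" (path -> id) standing in for the real filesystem.
tree = {"src/main.c": "id-main", "doc/README": "id-readme"}
scan = lambda: {i: p for p, i in tree.items()}

# The on-disk DB is stale: the user moved main.c into src/ by hand.
db = InCoreDB.from_disk_db({"id-main": "main.c", "id-readme": "doc/README"},
                           scan, tree.get)
print(db.path_for("id-readme"))   # verified with a single lookup
print(db.path_for("id-main"))     # mismatch, so the DB is rebuilt
```

Whether the in-core DB gets written back at the end of the run would depend
on the tagging method, as in (5); the sketch leaves that out.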
Hmmm...
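
The inode-sig shortcut from (2) might look something like the sketch below.
Everything here is assumed for illustration: the signature fields (device,
inode, size, mtime), the saved tables mapping signature -> id, and the
`read_tagline' callback that does the expensive read of the file contents.

```python
import os
from typing import Callable, Dict, Iterable, Tuple

Sig = Tuple[int, int, int, int]  # (device, inode, size, mtime in ns): assumed fields


def inode_sig(path: str) -> Sig:
    """Cheap signature from stat() alone; the file data is never read."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def id_for_path(path: str,
                saved_tables: Iterable[Dict[Sig, str]],
                read_tagline: Callable[[str], str]) -> str:
    """Return the file's id, consulting any old signature tables first.

    saved_tables stands in for the several old inode-sigs files that may be
    lying around, each loaded as a hypothetical signature -> id dictionary.
    """
    current = inode_sig(path)
    for table in saved_tables:
        if current in table:
            return table[current]   # unchanged since that table was written
    return read_tagline(path)       # fall back to reading the file itself
```

A file whose stat() information matches any saved signature gets its id
without being opened; anything else falls back to the normal tagline read.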
The main differences between what you've done and the above seem to be the
lack of tagline support and the lack of verification (I've so far not looked
at your code though).
[Of course there are my complaints about the on-disk DB format, but yeah, I
suppose those are orthogonal.]
-Miles
--
Come now, if we were really planning to harm you, would we be waiting here,
beside the path, in the very darkest part of the forest?