*Re: Warnings from git fsck after lkml import
2018-07-05 5:40 Warnings from git fsck after lkml import ebiederm
@ 2018-07-05 23:13 ` Eric Wong
2018-07-06 0:36 ` ebiederm
2018-07-12 18:31 ` Warnings from git fsck after lkml import Konstantin Ryabitsev
0 siblings, 2 replies; 13+ messages in thread
From: Eric Wong @ 2018-07-05 23:13 UTC
To: Eric W. Biederman; +Cc: meta
"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> It looks like public-inbox has some challenges when importing some
> questionable emails. The import of lkml has resulted in several commits
> with bad dates that git fsck complains about. I have previously
> reported this to Konstantin Ryabitsev who maintains kernel.org but since
> I have not seen any discussion of this I thought I should report it
> directly here as well.
Thanks for bringing this up publicly.
Yes, early during v2 development I noticed old mails had some
-1400 timezone values (but the furthest is -1200). I opted to
attempt to preserve the wonky timezones since fast-import
happily accepts -1400 and I didn't anticipate problems...
> At a practical level these errors initially prevented me from cloning
> the repos, as in .gitconfig I had:
>
>     [transfer]
>         fsckobjects = true
>     [fetch]
>         fsckobjects = true
>     [receive]
>         fsckobjects = true
...But I didn't know people cared to set those :x
Now I wonder if git should only warn for bad-but-still-usable
objects on clone, as I wouldn't consider a malformed date to be
on the same level as actual FS corruption. Or at least complete
the clone and fail with a special exit code.
> Beyond the cloning issue, while I don't expect public-inbox to fix the
> emails themselves, it should be able to detect and prevent creating
> buggy commits.
Right, the emails themselves have wonky dates. I got public-inbox
to massage the dates into the bare minimum of what fast-import
finds acceptable(*). fast-import is rather liberal.
> Importing a large repo like linux-kernel seems like a good test case for
> finding these kinds of issues.
Fwiw, linux.git and git.git both warn about missingTaggerEntry
on fsck, yet clone fine with fsckObjects=true. Maybe clone
should not abort on badTimeZone, either. *shrug*
(*) In retrospect, especially with v2 which requires SQLite/Xapian,
I'm thinking it's not even worth the trouble to parse out
authorship information for git commit headers. Not sure if
people would still use things like "git log --author=" for
v2...

*Re: Warnings from git fsck after lkml import
2018-07-06 0:36 ` ebiederm
@ 2018-07-06 3:47 ` ebiederm
2018-07-06 21:32 ` [PATCH] MsgTime.pm: Use strptime to compute the time zone ebiederm
0 siblings, 1 reply; 13+ messages in thread
From: ebiederm @ 2018-07-06 3:47 UTC
To: Eric Wong; +Cc: meta
ebiederm@xmission.com (Eric W. Biederman) writes:
> Eric Wong <e@80x24.org> writes:
>
>> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>>> It looks like public-inbox has some challenges when importing some
>>> questionable emails. The import of lkml has resulted in several commits
>>> with bad dates that git fsck complains about. I have previously
>>> reported this to Konstantin Ryabitsev who maintains kernel.org but since
>>> I have not seen any discussion of this I thought I should report it
>>> directly here as well.
>>
>> Thanks for bringing this up publicly.
>>
>> Yes, early during v2 development I noticed old mails had some
>> -1400 timezone values (but the furthest is -1200). I opted to
>> attempt to preserve the wonky timezones since fast-import
>> happily accepts -1400 and I didn't anticipate problems...
>
> I think 0.git was generated after your earlier fix.
>
> Looking at the commits in question this is a different issue.
> On some of the later ones I am really not certain what it is
> but here is a representative sample you can look at.
Except below is looking at the pretty output of git show.
To actually see the problem git show --format=raw is needed.
Which for commit 59173dc1fe67b113ace4ce83e7f522414b3e0404
shows me:
author Dieter Ferdinand <dieter.ferdinand@gmx.de> 1166001998 +1
Which makes it clear the ``timezone'' was passed straight through
without modification. The date in the email was:
"Date: Wed, 13 Dec 2006 10:26:38 +1"
And the problem is that the timezone is not a 4 digit number. I see the same
pattern with the rest of the bad time zone warnings.
So it should be straightforward: if the timezone is not 4 digits, just do not pass
the time zone through.
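A minimal sketch of that check, written in Python rather than public-inbox's Perl and not the actual MsgTime.pm change: keep the offset only when it is a sign followed by exactly four digits, and otherwise fall back to +0000. The function names and the +0000 fallback are assumptions made for illustration.

```python
import re

# A sign followed by exactly four digits (HHMM), per the 4 digit rule above.
_TZ_RE = re.compile(r"[+-]\d{4}")


def sanitize_tz(tz, fallback="+0000"):
    """Return tz if it looks like +HHMM or -HHMM, otherwise the fallback.

    The fallback value is an assumption; the epoch seconds are not touched.
    """
    tz = tz.strip()
    if _TZ_RE.fullmatch(tz):
        return tz
    return fallback


def author_line(name, email, epoch, tz):
    """Build a commit author line with a sanitized timezone field."""
    return f"author {name} <{email}> {epoch} {sanitize_tz(tz)}"


if __name__ == "__main__":
    print(author_line("Dieter Ferdinand", "dieter.ferdinand@gmx.de",
                      1166001998, "+1"))
```

Under this sketch the "+1" from the mail above is replaced by the fallback, while a value such as -1400 is passed through unchanged because it has four digits.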
Eric

*Re: Warnings from git fsck after lkml import
2018-07-05 23:13 ` Eric Wong
2018-07-06 0:36 ` ebiederm
@ 2018-07-12 18:31 ` Konstantin Ryabitsev
2018-07-12 22:19 ` ebiederm
2018-07-12 22:29 ` Eric Wong
1 sibling, 2 replies; 13+ messages in thread
From: Konstantin Ryabitsev @ 2018-07-12 18:31 UTC
To: Eric Wong; +Cc: Eric W. Biederman, meta
On Thu, Jul 05, 2018 at 11:13:46PM +0000, Eric Wong wrote:
>"Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> It looks like public-inbox has some challenges when importing some
>> questionable emails. The import of lkml has resulted in several commits
>> with bad dates that git fsck complains about. I have previously
>> reported this to Konstantin Ryabitsev who maintains kernel.org but since
>> I have not seen any discussion of this I thought I should report it
>> directly here as well.
>
>Thanks for bringing this up publicly.
>
>Yes, early during v2 development I noticed old mails had some
>-1400 timezone values (but the furthest is -1200). I opted to
>attempt to preserve the wonky timezones since fast-import
>happily accepts -1400 and I didn't anticipate problems...
So, I can fix those in the archives, but this obviously requires
rebasing the whole repo, and I'm not sure what kind of impact that would
have. I'm assuming it's not sufficient to just fix the git repo, as all
commit IDs after the modified commit are going to be different -- so
additional changes to sqlite and xapian dbs would be required?
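The reasoning behind that assumption can be sketched in Python. Git names an object by the SHA-1 of a short header plus its content, and a commit's content includes its parent's id and the author line, so changing one timezone changes that commit's id and the id of every descendant, while the blob holding the email is untouched. The identity, timestamp, and commit message below are made up for illustration.

```python
import hashlib

# Git's well-known id for a tree with no entries.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(kind, body):
    """SHA-1 of '<kind> <size>' + NUL + body, which is how git names objects."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parent, tz, message="import one mail\n"):
    """Serialize a commit; the ident is a hypothetical placeholder."""
    ident = f"A U Thor <author@example.com> 1166001998 {tz}"
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return "\n".join(lines).encode()


# The email blob does not depend on any commit header.
blob = git_object_id("blob", b"Date: Wed, 13 Dec 2006 10:26:38 +1\n")

# Same tree and message, only the timezone field differs.
bad = git_object_id("commit", commit_body(EMPTY_TREE, None, "+1"))
good = git_object_id("commit", commit_body(EMPTY_TREE, None, "+0100"))

# Identical children apart from the parent they point at.
child_of_bad = git_object_id("commit", commit_body(EMPTY_TREE, bad, "+0000"))
child_of_good = git_object_id("commit", commit_body(EMPTY_TREE, good, "+0000"))

if __name__ == "__main__":
    print("fixed commit id differs:", bad != good)
    print("descendant id differs:  ", child_of_bad != child_of_good)
```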
-K

*Re: Warnings from git fsck after lkml import
2018-07-12 18:31 ` Warnings from git fsck after lkml import Konstantin Ryabitsev
@ 2018-07-12 22:19 ` ebiederm
2018-07-12 22:29 ` Eric Wong
1 sibling, 0 replies; 13+ messages in thread
From: ebiederm @ 2018-07-12 22:19 UTC
To: Konstantin Ryabitsev; +Cc: Eric Wong, meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> On Thu, Jul 05, 2018 at 11:13:46PM +0000, Eric Wong wrote:
>>"Eric W. Biederman" <ebiederm@xmission.com> wrote:
>>> It looks like public-inbox has some challenges when importing some
>>> questionable emails. The import of lkml has resulted in several commits
>>> with bad dates that git fsck complains about. I have previously
>>> reported this to Konstantin Ryabitsev who maintains kernel.org but since
>>> I have not seen any discussion of this I thought I should report it
>>> directly here as well.
>>
>>Thanks for bringing this up publicly.
>>
>>Yes, early during v2 development I noticed old mails had some
>>-1400 timezone values (but the furthest is -1200). I opted to
>>attempt to preserve the wonky timezones since fast-import
>>happily accepts -1400 and I didn't anticipate problems...
>
> So, I can fix those in the archives, but this obviously requires
> rebasing the whole repo, and I'm not sure what kind of impact that
> would have. I'm assuming it's not sufficient to just fix the git repo,
> as all commit IDs after the modified commit are going to be different
> -- so additional changes to sqlite and xapian dbs would be required?
Unless I am mistaken, the cheap/clever version is to:
- Rebuild the 3 .git trees.
- Notice that the object ids (aka sha1 hashes) of the emails remain
the same.
- Use sqlite3 to update the meta table of msgmap.sqlite3.
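The last step could look roughly like the following Python sketch. It assumes the meta table has the key/val layout shown in this message and that the rewritten HEAD commit ids are already known; the function name is hypothetical, and whether public-inbox needs anything beyond these rows is not verified here.

```python
import sqlite3


def update_last_commits(con, new_heads):
    """Point existing meta rows at new commit ids.

    new_heads maps a meta key such as 'last_xap15-0' to a commit id.
    Runs as one transaction: either every key is updated or none is.
    """
    with con:  # commits on success, rolls back if an exception escapes
        for key, commit in new_heads.items():
            cur = con.execute(
                "UPDATE meta SET val = ? WHERE key = ?", (commit, key))
            if cur.rowcount != 1:
                # Refuse to continue when the key is not already present.
                raise KeyError(key)


if __name__ == "__main__":
    # Throwaway in-memory database standing in for msgmap.sqlite3.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE meta (key VARCHAR(32) PRIMARY KEY, "
                "val VARCHAR(255) NOT NULL)")
    con.execute("INSERT INTO meta VALUES ('last_xap15-0', 'old')")
    update_last_commits(con, {"last_xap15-0": "new"})
    print(con.execute("SELECT * FROM meta").fetchall())
```

Working on a copy of msgmap.sqlite3 first would be the cautious way to try this.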
Currently my msgmap.sqlite3 contains:
CREATE TABLE meta (key VARCHAR(32) PRIMARY KEY, val VARCHAR(255) NOT NULL);
/* No STAT tables available */
sqlite> select * from meta;
created_at|1530525399
last_xap15-6|c8f95c6728579303c200adbfb5469215da7e7836
last_xap15-5|31ed379430c456f90bdd172b223020c0e6d7cb8d
last_xap15-4|88294f6d487193f5984791ee81213a25130d0559
last_xap15-3|93d9eace2721494d8457c7f5f6de803c0d648172
last_xap15-2|d48078ceeec1f51313253a56ed3ba0eae7fde909
last_xap15-1|6b67b9f5e0cd82d3c734e6cdc44c1f722ab6fb6a
last_xap15-0|b67bf7f62c8125d67461cc6e7d1736ddc8844a18
Which matches the HEAD commits of the lkml git trees.
eric@x220:~/public-inbox/vger.kernel.org/linux-kernel-good$ git --git-dir=git/0.git show --pretty=oneline | head -1
b67bf7f62c8125d67461cc6e7d1736ddc8844a18 [-mm patch] drivers/firewire/: cleanups
eric@x220:~/public-inbox/vger.kernel.org/linux-kernel-good$ git --git-dir=git/1.git show --pretty=oneline | head -1
6b67b9f5e0cd82d3c734e6cdc44c1f722ab6fb6a Re: [git patches] libata updates for 2.6.34
eric@x220:~/public-inbox/vger.kernel.org/linux-kernel-good$ git --git-dir=git/2.git show --pretty=oneline | head -1
d48078ceeec1f51313253a56ed3ba0eae7fde909 Re: linux-next: Tree for Jan 10 (staging/sb105x)
eric@x220:~/public-inbox/vger.kernel.org/linux-kernel-good$ git --git-dir=git/3.git show --pretty=oneline | head -1
93d9eace2721494d8457c7f5f6de803c0d648172 Re: randconfig bug: ARM/KVM link error in hyp_idmap section
eric@x220:~/public-inbox/vger.kernel.org/linux-kernel-good$ git --git-dir=git/4.git show --pretty=oneline | head -1
88294f6d487193f5984791ee81213a25130d0559 Re: [PATCH 2/2] sdhci-of-arasan: Set controller to test mode when fails-without-test-cd is present
eric@x220:~/public-inbox/vger.kernel.org/linux-kernel-good$ git --git-dir=git/5.git show --pretty=oneline | head -1
31ed379430c456f90bdd172b223020c0e6d7cb8d Re: [PATCH 0/2] of: change overlay apply input data from EDT to FDT
eric@x220:~/public-inbox/vger.kernel.org/linux-kernel-good$ git --git-dir=git/6.git show --pretty=oneline | head -1
c8f95c6728579303c200adbfb5469215da7e7836 [PATCH] slimbus: stream: Fix htmldocs warnings
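The comparison done by hand above can be sketched in Python. This assumes the key pattern last_xap<N>-<epoch> seen in the meta table and the git/<epoch>.git layout used in the commands above; the helper names are hypothetical, and git must be on PATH for the default lookup.

```python
import re
import sqlite3
import subprocess


def recorded_heads(con):
    """Map epoch number to commit id for keys shaped like 'last_xap15-3'."""
    heads = {}
    for key, val in con.execute("SELECT key, val FROM meta"):
        m = re.fullmatch(r"last_xap\d+-(\d+)", key)
        if m:
            heads[int(m.group(1))] = val
    return heads


def git_head(inbox_dir, epoch):
    """HEAD commit id of <inbox_dir>/git/<epoch>.git."""
    out = subprocess.run(
        ["git", f"--git-dir={inbox_dir}/git/{epoch}.git", "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True)
    return out.stdout.strip()


def stale_epochs(con, inbox_dir, head_of=git_head):
    """Epochs whose recorded commit no longer matches the repository HEAD."""
    return sorted(epoch for epoch, commit in recorded_heads(con).items()
                  if head_of(inbox_dir, epoch) != commit)
```

An empty list from stale_epochs(sqlite3.connect("msgmap.sqlite3"), ".") would correspond to the match shown above; after rebuilding a .git tree, its epoch number would be expected to appear in the list until the meta table is updated.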
However, all you have to do is ensure you preserve msgmap.sqlite3;
public-inbox-index is capable of rebuilding everything else.
Eric

*Re: Warnings from git fsck after lkml import
2018-07-12 18:31 ` Warnings from git fsck after lkml import Konstantin Ryabitsev
2018-07-12 22:19 ` ebiederm
@ 2018-07-12 22:29 ` Eric Wong
1 sibling, 0 replies; 13+ messages in thread
From: Eric Wong @ 2018-07-12 22:29 UTC
To: Konstantin Ryabitsev; +Cc: Eric W. Biederman, meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jul 05, 2018 at 11:13:46PM +0000, Eric Wong wrote:
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > > It looks like public-inbox has some challenges when importing some
> > > questionable emails. The import of lkml has resulted in several commits
> > > with bad dates that git fsck complains about. I have previously
> > > reported this to Konstantin Ryabitsev who maintains kernel.org but since
> > > I have not seen any discussion of this I thought I should report it
> > > directly here as well.
> >
> > Thanks for bringing this up publicly.
> >
> > Yes, early during v2 development I noticed old mails had some
> > -1400 timezone values (but the furthest is -1200). I opted to
> > attempt to preserve the wonky timezones since fast-import
> > happily accepts -1400 and I didn't anticipate problems...
>
> So, I can fix those in the archives, but this obviously requires rebasing
> the whole repo, and I'm not sure what kind of impact that would have. I'm
> assuming it's not sufficient to just fix the git repo, as all commit IDs
> after the modified commit are going to be different -- so additional changes
> to sqlite and xapian dbs would be required?
Yes, I think the internal "purge" and normal add operation
should take care of Xapian/SQLite changes. NNTP serial numbers
will change and readers will redownload a few messages, though.
Personally, I wouldn't bother since it'd be disruptive to
existing clones, and I don't consider these to be big enough
problems to be worth breaking changes if git itself doesn't complain
by default.