Q: Did you do something to message number recently?urn:uuid:05dd3cb9-fda5-3c7a-2836-66b3d407feef2020-06-06T22:08:16ZEric W. Biedermanebiederm@xmission.comQ: Did you do something to message number recently?2019-06-24T15:59:19Zurn:uuid:8097d378-9cd1-cdf1-0021-82e4c5938a6a

Eric,
I am just starting to dig into this, I just noticed that I have several
inboxes that are seeing huge skips in message numbers assigned in
msgmap. Do you have any idea why this would be?
If not I will dig in and figure this out. I just figured I would ask
in case you have any handy canidates.
I thought I had a regression test in public-inbox to catch this.
Eric

Eric Wonge@80x24.orgRe: Q: Did you do something to message number recently?2019-06-24T16:34:43Zurn:uuid:eb0640f4-faba-e1f8-17d0-15762ef3810d

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
>
> Eric,
>
> I am just starting to dig into this, I just noticed that I have several
> inboxes that are seeing huge skips in message numbers assigned in
> msgmap. Do you have any idea why this would be?
I've hit this in the past using the altid stuff to keep NNTP
article numbers in check.
There's a lot of changes w.r.t. progress reporting, no-op
speedups,
> If not I will dig in and figure this out. I just figured I would ask
> in case you have any handy canidates.
Are you hitting this with "git fetch && public-inbox-index" or
-watch/-mda?
> I thought I had a regression test in public-inbox to catch this.
Me too.

Eric Wonge@80x24.orgRe: Q: Did you do something to message number recently?2019-06-24T17:33:20Zurn:uuid:9212b7d2-699f-b7d5-22e2-2587fc89bdf0

Eric Wong <e@80x24.org> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >
> > Eric,
> >
> > I am just starting to dig into this, I just noticed that I have several
> > inboxes that are seeing huge skips in message numbers assigned in
> > msgmap. Do you have any idea why this would be?
How big are the gaps you're hitting?
> I've hit this in the past using the altid stuff to keep NNTP
> article numbers in check.
>
> There's a lot of changes w.r.t. progress reporting, no-op
> speedups,
>
> > If not I will dig in and figure this out. I just figured I would ask
> > in case you have any handy canidates.
>
> Are you hitting this with "git fetch && public-inbox-index" or
> -watch/-mda?
OK, I see this in my libc-alpha v2 mirror via -watch,
but not my v1 -watch mirrors. Likely to affect -mda, too.
But the number gaps are small, however (1-2 messages), and.

Eric W. Biedermanebiederm@xmission.comRe: Q: Did you do something to message number recently?2019-06-24T22:56:53Zurn:uuid:32ad23dd-6def-802f-5caf-0da1b6c144ac

Eric Wong <e@80x24.org> writes:
> Eric Wong <e@80x24.org> wrote:
>> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> >
>> > Eric,
>> >
>> > I am just starting to dig into this, I just noticed that I have several
>> > inboxes that are seeing huge skips in message numbers assigned in
>> > msgmap. Do you have any idea why this would be?
>
> How big are the gaps you're hitting?
This is from my mailbox where I confirmed there was an issue.
> sqlite> select max(num) from msgmap
> ...> ;
> 110106
> sqlite> select * from msgmap where num = 110106 ;
> 110106|84814e0e219b33ade641718dc76b3225@swift.generated
> sqlite> select * from msgmap where num = 110105 ;
> sqlite> select * from msgmap where num = 110104 ;
> sqlite> select * from msgmap where num = 110103 ;
> sqlite> select * from msgmap where num = 110102 ;
> sqlite> select * from msgmap where num = 110101 ;
> sqlite> select * from msgmap where num = 110100 ;
> sqlite> select * from msgmap where num = 110100 ;
> sqlite> select * from msgmap where num = 110099 ;
> sqlite> select * from msgmap where num = 110098 ;
> sqlite> select * from msgmap where num = 110097 ;
> sqlite> select * from msgmap where num = 110096 ;
> sqlite> select * from msgmap where num = 110095 ;
> sqlite> select * from msgmap where num = 110094 ;
> sqlite> select * from msgmap where num = 110093 ;
> sqlite> select * from msgmap where num = 110092 ;
> sqlite> select * from msgmap where num = 110091 ;
> sqlite> select * from msgmap where num = 110090 ;
> sqlite> select * from msgmap where num = 110089 ;
> 110089|be2d2a0d036c01dba1fcbe3c001d3f0e@rhempresarial.com
Hmm. With a smarter querey I am seeing:
> sqlite> select num, mid from msgmap order by num ;
> 110361|CAC8N3VSSLZ+ANaTXb0b3-K2NNrSRxYkuozAvJrPa4TZBL70ESg@mail.gmail.com
> 110365|947beed847840a7b278d820d00158ff9@rhempresarial.com
> 110412|388652ac24bbd42b565255e60032ac38@training-programs.net
> 110534|0.0.ED.ACB.1D52978A96689C0.0@uspmta120030.emarsys.net
> 110547|20190623052647.GA9838@gmail.com
> 110555|OF1F4E7E7A.F48DBC61-ON47258422.0008F563@LocalDomain
> 110560|832660e183e06b2c917d925c438b74b1@gruyeresfvxneypf.onion.pet
> 110571|1561260206528_102468_20455_1833.sc-10_9_12_206-inbound8$ebiederm@xmission.com
> 110577|166408121.377898.1561282483109@mail.yahoo.com
> 110578|20190623024642.3CADECD60542E@mail.casanare.gov.co
> 110580|308185e7c16c72785bfa7ca1a11bebb8@contato.ufsc.br
> 110592|207A6EDACFB04E7299FCD018E0A4BFA4.MAI@srv.newbilisim.com
> 110603|20190623084720.0180DCDAF68EC@mail.casanare.gov.co
> 110612|20190623103511.B05FF259256F2@mail.07d02.mspz7.gob.ec
> 110662|20190610090733.0CF77DB98C18400A@al-reefy.com
> 110689|9d692655ffc312f5899ee60d045055df@swift.generated
> 110701|25mseb$-t-8$-$$03-v$33@1nabo
> 110744|E1hfHhs-0003HR-Qo@mx02.mta.xmission.com
> 110753|20190624052459.6136AA4A29@poczta.zarz.agh.edu.pl
> 110755|SG2PR04MB260278655E8FC19804B95C76AEE00@SG2PR04MB2602.apcprd04.prod.outlook.com
> 110756|1459657803.17189221561359906087.JavaMail.app@rapp21.atlis1
> 110759|0e2c946c07c8941ce9b5b4d8df655c22@swift.generated
> 110765|SG2PR01MB2983FD90065F6FDAEB52F3B2CCE00@SG2PR01MB2983.apcprd01.prod.exchangelabs.com
> 110772|613cadb920e6c2bd6cecc7a5e2d7af17@swift.generated
> 110774|156136730324625@kroah.com
> 110776|f67aacd0534c471aaf9f531fd7d76514@SRV-MBX13-02.chu-toulouse.fr
> 110782|CAC=vM_Z8Pr4NVsatk3ROn1W8fJjkO3B_CDs8s+2UXm56cg3scA@mail.gmail.com
> 110783|20190624092326.475497997@linuxfoundation.org
> 110784|cb83bd8a41985599698df9225cbcd322@swift.generated
> 110787|6c4e4b4ecf374e0f92f6553e96368c38@SRV-MBX13-02.chu-toulouse.fr
> 110810|20190624132129.4c772nkjbrilxtcy@willie-the-truck
> 110826|1ada2c2ef303f2e6742747bb2218a9a4@fasobf.com
> 110827|20190624105927.84B07703461ACB3F@srv155.main-hosting.eu
> 110843|20190624163442.xhk6drl7ptnq7i5o@dcvr
> 110848|CF9A396C287FC34DABB6CB7E8D92D8FD0C9C6850@V1-EXCH-01.hccsc.k12.in.us
> 110852|20190624173319.3bb4t3zrieb4k5w2@dcvr
> 110856|822002_597274_254187_778544@cvanguardia.net
> 110865|20190624184534.209896-1-joel@joelfernandes.org
> 110866|20190624185214.GA211230@google.com
> 110869|CAG48ez3maGsRbN3qr8YVb6ZCw0FDq-7GqqiTiA4yEa1mebkubw@mail.gmail.com
> 110879|20190624201023.Vlx4D%akpm@linux-foundation.org
> 110884|CAEXW_YS0YR8Au+1f-sW_BT3xVONXKo9zrcSJMBwGJizyMw0xag@mail.gmail.com
So it is pretty bad. I am using my import_imap_mailbox script that
I am trying to clean up and so I can merge it into at leat contrib.
This is a v2 mailbox and I am using
"PublicInbox::InboxWritable($ibx)->importer(1)->add(...)"
There may have been some crashes in the past, and I am delivering to
multiple mailboxes at once.
Let's see. InboxWritable->importer->add resolves to V2Writeable->add
The add method computes the number using num_for, which uses
Msgmpa::mid_insert.
Short of the sequence number for msgmap getting scrambled I don't
see how that can go wrong. Sigh.
I will have to instrument my client, and see if that will give
me a clue what is going on.
I definitely don't have altid's in play here.
>> I've hit this in the past using the altid stuff to keep NNTP
>> article numbers in check.
>>
>> There's a lot of changes w.r.t. progress reporting, no-op
>> speedups,
>>
>> > If not I will dig in and figure this out. I just figured I would ask
>> > in case you have any handy canidates.
>>
>> Are you hitting this with "git fetch && public-inbox-index" or
>> -watch/-mda?
>
> OK, I see this in my libc-alpha v2 mirror via -watch,
> but not my v1 -watch mirrors. Likely to affect -mda, too.
> But the number gaps are small, however (1-2 messages), and.
Good. I am glad you can see it as well. One of use might be able
to spot what strange thing is going on here.
I think most of my previous work was all in the index rebuild path.
Which may be why we don't have a regression test catching this.
Grr.
I don't see a good candiate for this failure!
Eric

Eric Wonge@80x24.orgRe: Q: Did you do something to message number recently?2019-06-24T23:42:12Zurn:uuid:7416441d-8c8a-4b9f-c3f7-1d6d147a2799

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> The add method computes the number using num_for, which uses
> Msgmpa::mid_insert.
>
> Short of the sequence number for msgmap getting scrambled I don't
> see how that can go wrong. Sigh.
That seems to be what went wrong with "INSERT OR IGNORE"
cf. https://public-inbox.org/meta/20190624233809.1721-1-e@80x24.org/> I will have to instrument my client, and see if that will give
> me a clue what is going on.
I guess your client ends up seeing the same message
over-and-over again like -watch does?
> I think most of my previous work was all in the index rebuild path.
> Which may be why we don't have a regression test catching this.
Yeah, this isn't a new bug at all. I didn't notice since I
mainly use w3m or (neo)mutt for NNTP and didn't look too closely
at NNTP article numbers. lynx makes it apparent messages are
missing: lynx news.public-inbox.org/inbox.comp.lib.glibc.alpha

Eric W. Biedermanebiederm@xmission.comRe: Q: Did you do something to message number recently?2019-06-25T12:02:04Zurn:uuid:9f9ffccd-9745-b1ab-e270-e6214382c886

Eric Wong <e@80x24.org> writes:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> The add method computes the number using num_for, which uses
>> Msgmpa::mid_insert.
>>
>> Short of the sequence number for msgmap getting scrambled I don't
>> see how that can go wrong. Sigh.
>
> That seems to be what went wrong with "INSERT OR IGNORE"
>
> cf. https://public-inbox.org/meta/20190624233809.1721-1-e@80x24.org/
Is it INSERT OR IGNORE or is it the autoincrement? As I read
the sqlite docs autoincrement is allowed to skip message numbers.
Especially in cases where the insert fails.
This fix should come with a test case shouldn't it? Such a test
case should be pretty straight forward to write. Insert a message
multiple times and then insert a new one.
I don't recall the transaction constraints on sqlite but I am wondering
if instead of INSERT OR IGNORE we should read and the increment
num_highwater in the public-inbox code and then we would not need to
depend upon autoincrement only incrementing once. Even when the sqlite
docs don't promise it will.
>> I will have to instrument my client, and see if that will give
>> me a clue what is going on.
>
> I guess your client ends up seeing the same message
> over-and-over again like -watch does?
It usually doesn't but for dealing with a different imap server I
tweaked the code and one of the servers feeding me the last message in
the mail box every time I checked the mailbox for new messages.
I have also narrowed it down to duplication insertions. I was just
about to start looking at what the fix would be.
>> I think most of my previous work was all in the index rebuild path.
>> Which may be why we don't have a regression test catching this.
>
> Yeah, this isn't a new bug at all. I didn't notice since I
> mainly use w3m or (neo)mutt for NNTP and didn't look too closely
> at NNTP article numbers. lynx makes it apparent messages are
> missing: lynx news.public-inbox.org/inbox.comp.lib.glibc.alpha
It is a little obscure in gnus but I start getting suspcious when
I tell it to give me the last 10 messages and I get only one.
Eric

Eric Wonge@80x24.orgRe: Q: Did you do something to message number recently?2019-06-25T17:51:33Zurn:uuid:6bccafee-229a-d798-bc26-80cec3dd6b06

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@80x24.org> writes:
>
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> The add method computes the number using num_for, which uses
> >> Msgmpa::mid_insert.
> >>
> >> Short of the sequence number for msgmap getting scrambled I don't
> >> see how that can go wrong. Sigh.
> >
> > That seems to be what went wrong with "INSERT OR IGNORE"
> >
> > cf. https://public-inbox.org/meta/20190624233809.1721-1-e@80x24.org/
>
> Is it INSERT OR IGNORE or is it the autoincrement? As I read
> the sqlite docs autoincrement is allowed to skip message numbers.
> Especially in cases where the insert fails.
Changing it to plain INSERT made the autoincrement stop skipping.
That's been my experience with other DBs in the past, too;
but I haven't used "OR IGNORE" much (and it's been a
long time since I've looked at SQL outside of SQLite).
> This fix should come with a test case shouldn't it? Such a test
> case should be pretty straight forward to write. Insert a message
> multiple times and then insert a new one.
Did you see my changes to t/msgmap.t and t/v2writable.t?
I appended the test to existing dupe test in t/msgmap.t and
added the num_highwater check to t/v2writable.t which hit
duplicates. I don't think there needs to be a separate test,
and the test suite is already slow enough to impact productivity
on my laptop.
> I don't recall the transaction constraints on sqlite but I am wondering
> if instead of INSERT OR IGNORE we should read and the increment
> num_highwater in the public-inbox code and then we would not need to
> depend upon autoincrement only incrementing once. Even when the sqlite
> docs don't promise it will.
Yeah, that's probably the right step going forward. I don't
think there's forward/backwards-compatibility things to worry
about, since the indexing code for mirrors already ignores the
autoincrement.