While reviewing failures of the Jenkins test suite runs in June (#12289), I noticed that the TorBootstrapFailure exception still happens sometimes.

There are 3 possible reasons for it, I think: a) chutney/tor sometimes feels stupid; b) our test suite sometimes fails to manage chutney properly; or c) some unrelated event on the system prevents them from doing their job.

Next steps to investigate this would be to:

1. look at the test suite debug log and
* double-check there's no explanation for the tor bootstrap failure (that's about (a) and (b))
* record the exact times when things went wrong (see the sketch right after this list)
2. ensure we save the tor log from the Tails system when this happens, so we can see if that tor is stupid (a)
3. make the Journal persistent on isotesters so we can try to correlate such failures with system events (c).
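
FWIW, here's a rough Python sketch (not our actual test suite code) of what the scanning in step 1 could look like; the "Bootstrapped N%" pattern and the leading ISO timestamp are assumptions about the debug log format:

```python
#!/usr/bin/env python3
# Hypothetical sketch, not our actual test suite code: scan debug logs
# for Tor bootstrap progress and record when things went wrong.
import re
import sys

BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")

def scan_debug_log(path):
    last_progress = None
    with open(path, errors="replace") as log:
        for line in log:
            match = BOOTSTRAP_RE.search(line)
            if match:
                # Assumes lines start with an ISO 8601 timestamp.
                last_progress = (line[:19], int(match.group(1)))
            if "TorBootstrapFailure" in line:
                print(f"{path}: failure around {line[:19]}, "
                      f"last bootstrap progress: {last_progress}")

if __name__ == "__main__":
    for path in sys.argv[1:]:
        scan_debug_log(path)
```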

2. ensure we save the tor log from the Tails system when this happens, so we can see if that tor is stupid (a)

While doing #13472, I've pushed the feature/13541-save-more-data-on-htpdate-or-tor-failures branch, which contains a rough implementation of that. It also saves htpdate.log on 'time has synced' failures, which is why I wanted it pushed and live in Jenkins.
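
To give an idea of the kind of thing that branch does, here's a hypothetical sketch (not the branch's actual code); the `vm.execute_successfully` helper, the artifact directory and the log paths are all assumptions:

```python
# Hypothetical sketch: on a Tor bootstrap or 'time has synced' failure,
# fetch the relevant logs from the Tails system under test and save them
# as Jenkins artifacts. `vm.execute_successfully` stands in for whatever
# remote shell helper the test suite really uses.
import os
import time

ARTIFACT_DIR = os.environ.get("ARTIFACT_DIR", "/tmp/artifacts")  # assumed location

def save_logs_on_failure(vm, scenario_name):
    os.makedirs(ARTIFACT_DIR, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Assumed log locations inside the Tails system:
    for remote_path, suffix in [("/var/log/tor/log", "tor"),
                                ("/var/log/htpdate.log", "htpdate")]:
        content = vm.execute_successfully("cat " + remote_path).stdout
        artifact = os.path.join(ARTIFACT_DIR,
                                f"{stamp}-{scenario_name}.{suffix}")
        with open(artifact, "w") as f:
            f.write(content)
```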

3. make the Journal persistent on isotesters so we can try to correlate such failures with system events (c).

I've pushed another commit in the branch mentioned in my previous note that does that. I've made it save the Journal regardless of the kind of failure: I think that's interesting information we may want in a lot of cases. For example, I've done it because I wanted to have the systemd Journal for #13461.
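
Here's a hypothetical sketch of both Journal-related pieces, i.e. making journald persistent on an isotester and dumping the Journal as an artifact; again the `vm` helper is an assumption, not the actual commit:

```python
import subprocess

def make_journal_persistent():
    # Run once, as root, on an isotester: with journald's default
    # Storage=auto, the mere existence of /var/log/journal makes the
    # Journal survive reboots.
    subprocess.run(["mkdir", "-p", "/var/log/journal"], check=True)
    subprocess.run(["systemctl", "restart", "systemd-journald"], check=True)

def save_journal(vm, artifact_path):
    # Dump the Journal of the current boot from the system under test,
    # whatever the failure was. `vm.execute_successfully` is assumed, as above.
    journal = vm.execute_successfully("journalctl --no-pager -b").stdout
    with open(artifact_path, "w") as f:
        f.write(journal)
```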

I've also pushed another commit in this branch that cleans up the retrieval of the Tor and htpdate logs.

I think that's something that'd be useful to get into stable and devel so that we get more useful information from the Jenkins runs, so I'm putting this branch RfQA. Merging it does not mean this ticket is done though, so please set it back to Dev Needed with no assignee if/when it's merged.

While analyzing test failures on #15019, I've just seen 2 problems that I was going to naively classify as "Tor bootstrap failure", thinking "oh crap, the new Tor fails to bootstrap more often!". But then I looked at the Journal and noticed that nm-dispatcher did not react to the NM connectivity change and did not start its scripts, so the Tor service was not even started and thus had no chance to ever bootstrap. I'm attaching two example Journals that display this bug.

I don't know if this is caused by the hacks we do around time syncing etc. in our test suite (e.g. I see a 2-hour time jump after starting NM and its dispatcher, but before it got a DHCP lease), or if it's a bug that can happen during regular Tails usage.

I suggest we check that the Tor service was actually started next time we're tempted to classify issues as "Tor fails to bootstrap": see if there are entries in the *.tor artifact, and possibly look for "Starting Anonymizing overlay network for TCP" in the Journal.
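
Something like this hypothetical helper could do that triage, using exactly the two checks suggested above; the artifact file names are up to the caller and the classification strings are mine:

```python
# Hypothetical triage helper: decide whether a "Tor bootstrap failure"
# is really "tor never started".
import os
import sys

TOR_STARTING_MSG = "Starting Anonymizing overlay network for TCP"

def classify(tor_artifact, journal_artifact):
    tor_log_has_entries = os.path.getsize(tor_artifact) > 0
    with open(journal_artifact, errors="replace") as f:
        tor_was_started = any(TOR_STARTING_MSG in line for line in f)
    if not tor_was_started:
        return "tor was never started (e.g. the nm-dispatcher bug above)"
    if not tor_log_has_entries:
        return "tor was started but logged nothing: needs a closer look"
    return "genuine bootstrap problem: read the tor log"

if __name__ == "__main__":
    print(classify(sys.argv[1], sys.argv[2]))
```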