Thunderbird and the future of ...

I've been meaning to write this post since I wrote the build system
prospects post; I just didn't know how to word it.

This post is half me laying out prospective work projects (unrelated to
the build system) for myself and half making people aware of near-term
and medium-term threats and potential implications for tasks that others
tackle. So, without further ado:

SpiderMonkey (JS)

Mozilla is currently making a push to eliminate all legacy JS features.
I have filed bugs on all the syntactic issues (save for the legacy
array/iterator comprehension syntax, since I'm not sure what the
thoughts are there); there's also some random library stuff I haven't
touched (e.g., __noSuchMethod__), in part because I don't detect as much
urgency in eliminating those features. Fortunately, we are making very
good progress in converting code (thanks, aceman!) and we could even be
legacy-free before mozilla-central is. :-)
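For illustration, a legacy array comprehension converts mechanically to
standard filter/map calls (the function and variable names here are just
examples, not code from the tree):

```javascript
// Legacy SpiderMonkey-only syntax (being removed):
//   var doubled = [i * 2 for (i of nums) if (i > 1)];
// Standard equivalent that works everywhere:
function doubleLarge(nums) {
  return nums.filter(i => i > 1).map(i => i * 2);
}
```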

Also important to be aware of is the addition of new ES6 features. The
only major features that have yet to land are ES6 classes and ES6
modules. Classes will probably land in Q1 or Q2 2015, but the timetable
for modules is extremely uncertain (there's a decent chance they won't
land in 2015). There are also a host of minor semantic issues with
things like let and destructuring. Both classes and modules are features
I would be excited to start using.
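As a taste of what class syntax buys us over manual prototype wiring,
here is a tiny hypothetical example (not from the codebase):

```javascript
// ES6 class syntax replacing the old Foo.prototype.bar = ... pattern:
class MessageHeader {
  constructor(name, value) {
    this.name = name;
    this.value = value;
  }
  // Methods live in the class body instead of on the prototype object.
  toString() {
    return `${this.name}: ${this.value}`;
  }
}
```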

JSMime and message composition

As I'm sure you're all aware, I'm slowly making progress on JSMime--my
current focus is MIME message building. I have a rough implementation of
nsMsgSend.cpp (the bigger half of the compose/ directory) in JS already.
Once it is made dogfoodable, I'll continue preview development in an
addon and start preparing work to land it on comm-central. The landing
process will likely be protracted: I'm replacing or abandoning ~11,000
lines of C++ code, and I'm also planning on taking the opportunity to do
a cleanup of several compose interfaces (notably nsIMsgSend,
nsIMsgAttachmentData, nsIMsgAttachedFile, nsIMsgComposeSecure will see
MAJOR changes and nsIMsgAttachment and nsIMsgCompFields are likely to
see minor ones). The end result should hopefully be a more extensible
interface, fixing a slew of bugs in the process (e.g., some of our
questionable decisions with attachments).

Other JSMime

Post-message composition, the next phases of JSMime work will consist of
EAI/IDN support and basic body/attachment support, in some order. The
latter will probably land as just a parser interface with only basic
attachments and multipart/alternative support enabled;
multipart/related, uuencode, yEnc, S/MIME, PGP, and TNEF support will
come later (and probably in that order [S/MIME and PGP come later
because crypto is harder to integrate]). My preview extension for
message composition (see last paragraph) will also start containing
dogfood support for displaying messages via JSMime instead of libmime.
Landing estimates are unavailable at this time.

SMTP?

The recent discussion of IMAP CONDSTORE has made me come to the
realization that, without dedicated developers, our backend is
increasingly lagging in taking advantage of effective new features. One
obvious solution is to start exploring the use of existing JS libraries
for email, such as those on emailjs.org. The first step is looking at
the SMTP client partly because it's small and self-contained within
Thunderbird (~4000 lines of code with an interface that largely consists
of "send this file to these people using this server and tell me when
it's done") and partly because I have a few feature additions I wanted
to explore anyways (e.g., EAI support, or moving SMTP to a worker
thread). I have already preliminarily looked at the smtpclient.js code
and can assert that it both fails some of our design principles (e.g.,
it uses callbacks instead of promises) and would constitute a regression
to use as-is (barely any support for SASL auth, no non-EAI IDN support,
and no DSN support; I'm not sure how it handles non-UTF-8 bodies). I'm
planning on reaching out to the community of users there to see if some
kind of accommodation can be had.
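To sketch what bridging that design gap might look like, a thin adapter
could wrap the callback interface in a promise. The client shape assumed
here (send/ondone/onerror) is an illustration, not the actual
smtpclient.js API:

```javascript
// Wrap a callback-style SMTP client in a promise, so callers can use
// .then()/await instead of wiring up event handlers themselves.
function sendWithPromise(client, envelope, body) {
  return new Promise((resolve, reject) => {
    client.onerror = err => reject(err);
    client.ondone = success =>
      success ? resolve(envelope) : reject(new Error("send failed"));
    client.send(envelope, body);
  });
}
```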

Doing things in workers??

Two question marks this time. Off-main-thread I/O has been a stated
desire of the Firefox performance team for a long while. Considering our
nominal commitment to writing more things in JS, this implies that we
should be looking at using workers in Thunderbird code. The difficulty,
though, is that workers have a severely limited API; in particular, there's
no support for XPCOM components and limited support for other features,
even in ChromeWorker threads. We also have an internal design model
which heavily discourages doing things on workers: the message database,
and the message store to a lesser extent, are inherently main-thread only.

I've talked in the past about what it would take to move to an
asynchronous message database. More recently, I've started to sketch out
what an API of an asynchronous message database ought to look like. With
a few tweaks to the current codebase, it is possible to envision
starting to use this API in worker threads. (When you have an
asynchronous API, the fact that everything would get proxied to the main
thread for the current implementation is easy to hide). Unfortunately,
there is one key difficulty: the message database relies heavily on
listeners, and implementing listeners across workers is difficult as
Workers are currently implemented.
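One conceivable shape for that bridging, as a sketch only (the listener
method names here are hypothetical, not the real database API), is to
serialize notifications into messages on a port and rebuild them on the
worker side:

```javascript
// Main-thread side: turn database listener callbacks into messages.
function bridgeListeners(db, port) {
  db.addListener({
    onHdrAdded(key) { port.postMessage({ type: "hdrAdded", key }); },
    onHdrDeleted(key) { port.postMessage({ type: "hdrDeleted", key }); },
  });
}

// Worker side: turn messages back into listener callbacks.
function attachWorkerListener(port, listener) {
  port.onmessage = ({ data }) => {
    if (data.type === "hdrAdded") listener.onHdrAdded(data.key);
    else if (data.type === "hdrDeleted") listener.onHdrDeleted(data.key);
  };
}
```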

Using TCPSocket in worker threads is a goal, and once it lands we might
want to start investigating using it where appropriate (e.g., NNTP and
POP). The interface I want to see land before attempting to use the
database off-main-thread is MessagePort (effectively, this allows
generic cross-thread communication), which is currently undergoing
review and will hopefully land in a few months. I'll point out that
IndexedDB-in-workers landed last December. Another complication with
worker threads is that importScripts acts like the subscript loader
instead of Components.utils.import, which suggests that some sort of
require.js-like loader (or ES6 modules, but see above) would be
advisable before using workers too heavily. I have not yet fully worked
out how to deal with cross-thread accesses to the message store, which
makes prospectively porting things like import code to workers difficult.
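A minimal registry along those lines (a sketch of the pattern, not a
proposal for the actual loader) would let importScripts-loaded files
share code by name rather than through the shared global scope:

```javascript
// Tiny CommonJS-flavored module registry for worker code. Each file
// loaded via importScripts calls define() once; consumers call
// requireModule() and get a cached exports object.
const moduleRegistry = new Map();

function define(name, factory) {
  moduleRegistry.set(name, { factory, exports: null, loaded: false });
}

function requireModule(name) {
  const mod = moduleRegistry.get(name);
  if (!mod) throw new Error("No such module: " + name);
  if (!mod.loaded) {
    mod.exports = mod.factory(requireModule);
    mod.loaded = true; // cache so each module is evaluated only once
  }
  return mod.exports;
}
```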

Re: Thunderbird and the future of ...

On 1/20/2015 5:20 PM, Joshua Cranmer 🐧 wrote:

> ...
>
> SMTP?
>
> The recent discussion of IMAP CONDSTORE has made me come to the
> realization that, without dedicated developers, our backend is
> increasingly lagging in taking advantage of effective new features. One
> obvious solution is to start exploring the use of existing JS libraries
> for email, such as those on emailjs.org.

I've long thought that the world would be a better place if projects like Thunderbird figured out a way to work with reusable libraries rather than always building their own. That has never been a priority at Mozilla as far as I can tell, but I think it would be a great direction for Thunderbird.

My biggest concern would be performance regression. Because performance testing is weak, performance tends to get neglected during these C++ to JavaScript conversions. But just because we don't have decent tests doesn't mean that it is not important. It is, and we have to devise some method of testing for performance regressions in these conversions.

Re: Thunderbird and the future of ...

On 1/20/2015 7:56 PM, R Kent James wrote:

> On 1/20/2015 5:20 PM, Joshua Cranmer 🐧 wrote:
>
>> ...
>>
>> SMTP?
>>
>> The recent discussion of IMAP CONDSTORE has made me come to the
>> realization that, without dedicated developers, our backend is
>> increasingly lagging in taking advantage of effective new features. One
>> obvious solution is to start exploring the use of existing JS libraries
>> for email, such as those on emailjs.org.
>
> I've long thought that the world would be a better place if projects like
> Thunderbird figured out a way to work with reusable libraries rather
> than always build their own. That has never been a priority at Mozilla
> as far as I can tell, but I think it would be a great direction for
> Thunderbird.

For what it's worth (I forgot to mention this), I had an idea of a
"core" socket shim that neatly handles the dual UTF-8 control
channel/binary data channel aspect of IMAP, POP, NNTP, and SMTP, and
then using that as a basis for developing a suite of NNTP, POP, and SMTP
libraries (IMAP is its own, separate, complex beast). Even if we
ultimately don't use the existing emailjs.org libraries, I plan on making
the core protocol libraries publicly available like ical.js or jsmime are.
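The core of such a shim is a parser that runs line-oriented until the
protocol announces a block of raw data. Here is a sketch of that idea
using strings for simplicity; a real implementation would work on typed
arrays coming off the socket:

```javascript
// Parser for protocols that interleave a line-based UTF-8 control
// channel with runs of raw binary data (as IMAP, POP, NNTP, and SMTP
// all do). The protocol logic calls expectBytes(n) when a response
// announces n bytes of literal data.
class ChannelParser {
  constructor(onLine, onData) {
    this.onLine = onLine;
    this.onData = onData;
    this.buffer = "";
    this.pending = 0; // bytes of raw data still expected
  }
  expectBytes(n) { this.pending = n; }
  feed(chunk) {
    this.buffer += chunk;
    while (this.buffer.length > 0) {
      if (this.pending > 0) {
        // Raw-data mode: hand over up to `pending` bytes as-is.
        const take = Math.min(this.pending, this.buffer.length);
        this.onData(this.buffer.slice(0, take));
        this.buffer = this.buffer.slice(take);
        this.pending -= take;
      } else {
        // Line mode: emit complete CRLF-terminated lines.
        const eol = this.buffer.indexOf("\r\n");
        if (eol < 0) return; // wait for more input
        this.onLine(this.buffer.slice(0, eol));
        this.buffer = this.buffer.slice(eol + 2);
      }
    }
  }
}
```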

>
>>
>> Doing things in workers?? ...
>>
>> Thoughts/comments/flames/questions/concerns?
>>
>
> My biggest concern would be performance regression. Because testing is
> weak in performance, it tends to get neglected during these C++ to
> javascript conversions. But just because we don't have decent tests
> doesn't mean that it is not important. It is, and we have to
> devise some method of testing for performance regressions in these
> conversions.

In some cases, what we do presently is so brain-dead stupid that a sane
JS implementation would be faster. An example is compose--if you want to
send as text+html, what we do is save the HTML to a file, read that
file, convert to plain text, save back into another file, read that
file, stream it with the rest of the message to another file, and then
read that file to send to SMTP, and then read it again for NNTP if
you're doing both, and then read it again so you can save it to your
local drafts folder. And all of these are naturally synchronous,
main-thread I/O. With my rewrite, I hope to get that down to one file
access.
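The shape of the fix is simple enough to sketch: compose once into
memory, then fan the same bytes out to every consumer. The names below
are purely illustrative, not the planned API:

```javascript
// Build the message exactly once, then hand the same buffer to each
// consumer (SMTP, NNTP, the drafts folder) instead of round-tripping
// through temporary files between every step.
async function sendEverywhere(buildMessage, consumers) {
  const message = await buildMessage(); // one composition pass
  for (const consume of consumers) {
    await consume(message); // reuse the same bytes for each target
  }
  return message;
}
```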

But yes, in a more general sense, it's important to watch out for
performance. I've noticed in the past that some of the absolute worst
performance you can get tends to be caused in part by xpconnect
marshalling--throwing a C++ exception to JS in particular is *painfully*
expensive. Something else that probably bites us is the distinction between
binary data and strings. This is particularly a problem in mailnews code
since mail messages are inherently "8 bit strings that may be UTF-8 but
may be some other local charset" (I'm seeing some EUC-KR in my spam
folder) and our codebase is oriented towards forcing every client of the
API to deal with it--which is more expensive in JS than in C++.
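A sketch of that decoding problem: try strict UTF-8 first and fall back
to a declared or guessed legacy charset. Where the fallback charset
comes from (message headers, folder settings) is left out here:

```javascript
// Decode mail body bytes that "may be UTF-8 but may be some other
// local charset": attempt strict UTF-8, and on failure re-decode
// with the fallback charset (e.g., a charset declared in the headers).
function decodeBody(bytes, fallbackCharset) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    return new TextDecoder(fallbackCharset).decode(bytes);
  }
}
```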

In short: performance is an issue, and we definitely need to watch out
for it, but a not-insignificant cause of perf regressions is inanity in
our API design.

Re: Thunderbird and the future of ...

On 2015年01月21日 13:02, Joshua Cranmer 🐧 wrote:

> On 1/20/2015 7:56 PM, R Kent James wrote:
>> On 1/20/2015 5:20 PM, Joshua Cranmer 🐧 wrote:
>>
>>> ...
>>>
>>> SMTP?
>>>
>>> The recent discussion of IMAP CONDSTORE has made me come to the
>>> realization that, without dedicated developers, our backend is
>>> increasingly lagging in taking advantage of effective new features. One
>>> obvious solution is to start exploring the use of existing JS libraries
>>> for email, such as those on emailjs.org.
>>
>> I've long thought that world would be a better place if projects like
>> Thunderbird figured out a way to work with reusable libraries rather than
>> always build their own. That has never been a priority at Mozilla as far
>> as I can tell, but I think it would be a great direction for Thunderbird.
>
> For what it's worth (I forgot to mention this), I had an idea of a "core"
> socket shim that neatly handles the dual UTF-8 control channel/binary data
> channel aspect of IMAP, POP, NNTP, and SMTP, and then using that as a basis
> for developing a suite of NNTP, POP, and SMTP libraries (IMAP is its own,
> separate, complex beast). Even if we ultimately don't use existing
> emailjs.org, I plan on making the core protocol libraries publicly available
> like ical.js or jsmime are.
>>
>>>
>>> Doing things in workers?? ...
>>>
>>> Thoughts/comments/flames/questions/concerns?
>>>
>>
>> My biggest concern would be performance regression. Because testing is
>> weak in performance, it tends to get neglected during these C++ to
>> javascript conversions. But just because we don't have decent tests
> doesn't mean that it is not important. It is, and we have to devise
>> some method of testing for performance regressions in these conversions.
>
> In some cases, what we do presently is so brain-dead stupid that a sane JS
> implementation would be faster. An example is compose--if you want to send
> as text+html, what we do is save the HTML to a file, read that file, convert
> to plain text, save back into another file, read that file, stream it with
> the rest of the message to another file, and then read that file to send to
> SMTP, and then read it again for NNTP if you're doing both, and then read it
> again so you can save it to your local drafts folder. And all of these are
> naturally synchronous, main-thread I/O. With my rewrite, I hope to get that
> down to one file access.

This is insane. I didn't realize there was so much extra copying just to
send a message out.

>
> But yes, in a more general sense, it's important to watch out for
> performance. I've noticed in the past that some of the absolute worst
> performance you can get tends to be caused in part by xpconnect
> marshalling--throwing a C++ exception to JS in particular is *painfully*
> expensive. Something else that probably bites us is the distinction between
> binary data and strings. This is particularly a problem in mailnews code
> since mail messages are inherently "8 bit strings that may be UTF-8 but may
> be some other local charset" (I'm seeing some EUC-KR in my spam folder) and
> our codebase is oriented towards forcing every client of the API to deal
> with it--which is more expensive in JS than in C++.
>
> In short: performance is an issue, and we definitely need to watch out for
> it, but a not-insignificant cause of perf regressions is inanity in our API
> design.

You have raised the issue of a missing infrastructure for performance
monitoring. Was there ever an infrastructure that covered XPCOM RPC
(maybe it was removed)?

I am a little disturbed to see that some development projects are
undertaken to "improve" speed, because without a proper measurement
infrastructure we can't achieve that goal very well. I probably don't
have to quote Knuth here.

For a smallish project, I would not mind trying "-pg" for profiling, but
the sheer size of TB's infrastructure, and the use of JavaScript inside
TB, make it imperative to monitor JS efficiency one way or another. (The
closest analogy I can think of is making some actions in Emacs faster:
complex actions in Emacs are usually done in Emacs Lisp, so improving
the Emacs Lisp code is the direct approach, and Emacs has a profiling
feature built in:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Profiling.html)

But as far as I understand, there is nothing (?) like that for the
JavaScript inside TB. That hurts. And as you mentioned elsewhere, some
performance measurement primitives can be used for test coverage too
(like basic block entry counts).

Re-thinking the priorities may be in order here for the Mozilla
development community at large, especially considering the lack of
development resources on the TB side.

As a user, I want a "rock-solid cross-platform mailer"(tm), and TB
currently falls a little short on the "rock-solid" side. So test
coverage that takes advantage of an instrumentation infrastructure for
performance analysis would be a good investment IMHO.

Re: Thunderbird and the future of ...

On 1/21/2015 11:40 PM, ishikawa wrote:

> You have raised an issue of missing an infrastructure for performance
> monitoring.
> Has there been an infrastructure that covered XPCOM RPC before (maybe it was
> removed?).
>
> I am a little disturbed to see that some development projects are done for
> "improving" the speed, but
> without a proper measurement infrastructure, we can't achieve that goal very
> well.
> I probably don't have to quote Knuth here.
>
> For a smallish project, I would not mind trying "-pg" for profiling, but
> sheer size of the built infrastructure of TB, and the use of JavaScript
> inside TB
> makes it imperative to monitor JS efficiency one way or the other.

Building something that can reliably reproduce performance on modern
hardware and operating systems is extremely difficult. Invariably, you
need a lot of data points to be able to distinguish data from noise, and
you need stronger statistics backgrounds than many CS people (including
myself) have to really understand what's going on. This is, of course, a
distinct matter from the challenge of building a good benchmark in and
of itself: it has, for example, been found that v8 optimized for the
standard JS benchmarks to the detriment of large swathes of real-world code.
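The statistical minimum is straightforward to sketch: many timed runs,
then a mean and a spread so a regression can be told apart from noise. A
real harness would also discard warmup runs and apply proper
significance tests:

```javascript
// Time `fn` over `runs` iterations and report mean and standard
// deviation of the wall-clock times. A single data point tells you
// almost nothing; the spread is what lets you distinguish signal
// from noise.
function benchmark(fn, runs) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    times.push(Date.now() - start);
  }
  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  const variance =
    times.reduce((a, t) => a + (t - mean) ** 2, 0) / times.length;
  return { mean, stddev: Math.sqrt(variance) };
}
```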

> But as far as I understand there is none (?) for JScript inside TB.

You appear to be confusing performance with profiling. Performance is
effectively measuring wall-clock time (or other key indicators) of a
high-level operation (assuming we're not microbenchmarking) to find
overall regressions. Profiling is taking a snapshot of performance in
one iteration to find out where the hot code is that could or should be
improved. For profiling (both C++ and JS), we do have a good tool: the
gecko profiler. It's not as good as, say, vTune for native code, but
I've found it generally adequate for the level we look at profiling,
especially because it shows a unified C++ and JS stack.

> Re-thinking the priority may be in order here for a mozilla development
> community in the large.

Actually, Mozilla has a lot of tools for performance testing. If
anything, it has too many. The problem is that these are not
well-communicated outside of the groups that care about performance.
Combined with the general lack of concern for Thunderbird, it means
there's no one who can really answer what can be done.

> As a user, I want a "rock-solid cross-platform mailer"(tm) and TB falls a
> little short of it now on the "rock-solid" side. So test-coverage that takes
> advantage of instrumentation infrastructure for performance analysis would
> be a good investment IMHO.

There are a few separate but related issues:
1. A standard profile with which to track performance. One of the
rationales I had for setting up
<https://bitbucket.org/jcranmer/maildocker> was to be able to install
real-world workloads: I dumped a production LDAP database and sanitized
it to create the container. Getting large samples of real-world data is
actually not hard (e.g., save off m.support.firefox--that would be ~160K
messages in a single folder), although representative samples are a more
difficult question (all data I have easy access to is en-US biased, so
pain in i18n code won't be easily seen).
1b. Standard benchmarks on a standard profile. Knowing what to measure
(and how to do so reliably) is critical in a benchmark: more runs tend
to increase validity, but smaller tested kernels (i.e., microbenchmarks)
too easily fall into the trap of "compiler outsmarted you" (there are,
e.g., some JSperf tests that inadvertently cause the optimizer to
compile the code into a NOP).
2. Infrastructure to periodically run performance tests. A single data
point isn't very useful; what you need is a comparison, and ideally one
with many points to help filter noise. The key issue here is that you
need fairly dedicated hardware to reduce noise, and the hardware needs to
be identical across runs. You also need real hardware, not VMs.
3. A way to track the performance numbers. Data is of course useless
without visualization.

Actually, this does suggest a GSoC project, if we can find a mentor:
getting someone to start developing a test suite.

Re: Thunderbird and the future of ...

> JSMime and message composition
>
> As I'm sure you're all aware, I'm slowly making progress on JSMime--my
> current focus is MIME message building. I have a rough implementation of
> nsMsgSend.cpp (the bigger half of the compose/ directory) in JS already.
> Once it is made dogfoodable, I'll continue preview development in an
> addon and start preparing work to land it on comm-central. The landing
> process will likely be protracted: I'm replacing or abandoning ~11,000
> lines of C++ code, and I'm also planning on taking the opportunity to do
> a cleanup of several compose interfaces (notably nsIMsgSend,
> nsIMsgAttachmentData, nsIMsgAttachedFile, nsIMsgComposeSecure will see
> MAJOR changes and nsIMsgAttachment and nsIMsgCompFields are likely to
> see minor ones). The end result should hopefully be a better-extensible
> interface and fix a slew of bugs in the process (e.g., some of our
> questionable decisions with attachments).

Please make sure there is a way for extensions to hook into the send
process at the location where S/MIME would be called (e.g.,
RequiresCryptoEncapsulation).

I currently use a hack to replace the S/MIME registration with an
Enigmail registration. If Enigmail detects that the user is sending an
S/MIME message, Enigmail passes the message on to S/MIME.

Re: Thunderbird and the future of ...

On 1/25/2015 5:30 AM, Patrick Brunschwig wrote:
> Please foresee a way for extensions to register to the send process at
> the location where S/MIME would be called (e.g.
> RequiresCryptoEncapsulation).
>
> I currently do a hack to replace the S/MIME registration with an
> Enigmail registration. If Enigmail detects that the user sends an S/MIME
> message then Enigmail would pass on the message to S/MIME.

Don't fear. I am very well aware of how Enigmail hooks into this
process, and I'm not going to make the PGP side of things undoable. I
will probably trash the current interface for secure composition and
make it a new interface to make it easier to support multiple versions
in Enigmail. I may move out the S/MIME-or-PGP logic that Enigmail
presently does, but I'm not far enough along in the process to say for
certain what it will look like or give definitive answers for what will
happen.
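One conceivable shape for such a pluggable registration, purely a
sketch with hypothetical names (not the actual nsIMsgComposeSecure
interface), is a registry consulted at send time:

```javascript
// Registry of secure-composition handlers. An extension (Enigmail,
// say) registers a handler that is consulted before the built-in
// S/MIME one; the first handler that claims the message wins.
const cryptoHandlers = [];

function registerCryptoHandler(handler) {
  cryptoHandlers.unshift(handler); // later registrations take priority
}

function pickCryptoHandler(fields) {
  return cryptoHandlers.find(h => h.canHandle(fields)) || null;
}
```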

Rest assured, though, that when I do figure it out, I will make sure to
bring you in the loop.

Re: Thunderbird and the future of ...

What's going on with JSMime?

Is JSMime only used for mail sending at the current time? I am looking
forward to it. I am also interested in moving the whole protocol
infrastructure of Thunderbird into pure JS running in a worker.

JSMime may be a start toward replacing the whole old C/C++ MIME library.

> On 1/25/2015 5:30 AM, Patrick Brunschwig wrote:
> > Please foresee a way for extensions to register to the send process at
> > the location where S/MIME would be called (e.g.
> > RequiresCryptoEncapsulation).
> >
> > I currently do a hack to replace the S/MIME registration with an
> > Enigmail registration. If Enigmail detects that the user sends an S/MIME
> > message then Enigmail would pass on the message to S/MIME.
>
> Don't fear. I am very well aware of how Enigmail hooks into this
> process, and I'm not going to make the PGP side of things undoable. I
> will probably trash the current interface for secure composition and
> make it a new interface to make it easier to support multiple versions
> in Enigmail. I may move out the S/MIME-or-PGP logic that Enigmail
> presently does, but I'm not far enough along in the process to say for
> certain what it will look like or give definitive answers for what will
> happen.
>
> Rest assured, though, that when I do figure it out, I will make sure to
> bring you in the loop.
>
> --
> Joshua Cranmer
> Thunderbird and DXR developer
> Source code archæologist

Re: Thunderbird and the future of ...

On 8/24/2015 9:38 AM, Yonggang Luo wrote:
> What's going on with JSMime?
>
> Is JSMime only used for mail sending at the current time?

JSMime is currently used primarily in header parsing and formatting
(most notably RFC 2047). That's actually somewhat of a simplification;
the precise details of where it's used are rather complicated.
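For a flavor of what that header work involves, here is a toy decoder
for a single RFC 2047 encoded-word. The real handling in JSMime covers
far more edge cases (charset aliases, adjacent words, malformed input),
so treat this only as an illustration of the format:

```javascript
// Decode one RFC 2047 encoded-word like "=?UTF-8?Q?hello_world?=".
// Supports the B (base64) and Q (quoted-printable-like) encodings.
function decodeEncodedWord(word) {
  const m = /^=\?([^?]+)\?([QBqb])\?([^?]*)\?=$/.exec(word);
  if (!m) return word; // not an encoded-word; pass through unchanged
  const [, charset, enc, text] = m;
  let bytes;
  if (enc.toUpperCase() === "B") {
    bytes = Uint8Array.from(atob(text), c => c.charCodeAt(0));
  } else {
    // Q encoding: "_" means space, "=XX" is a hex-encoded byte.
    const raw = text.replace(/_/g, " ")
      .replace(/=([0-9A-Fa-f]{2})/g, (_, h) =>
        String.fromCharCode(parseInt(h, 16)));
    bytes = Uint8Array.from(raw, c => c.charCodeAt(0));
  }
  return new TextDecoder(charset).decode(bytes);
}
```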

I haven't been updating it recently because I've been really bogged down
with other work.
> I am looking forward to it,
> I am also interested in moving the whole protocol
> infrastructure of Thunderbird into pure JS running in a worker.
>
> JSMime may be a start toward replacing the whole old
> C/C++ MIME library.

Both of those tasks are explicit goals of JSMime: replacing libmime, and
working on worker threads.

Actually, asuth just told me today that gaia email is moving to jsmime
from email.js due to problems in email.js. Which isn't unexpected by
anybody (even email.js folks): email.js is basically a second-generation
hack, while JSMime is developed with attention to detail and
specification minutiae. Unfortunately, this does mean that progress gets
bogged down by seemingly random tangents: correct handling of email
addresses is blocked on the fact that the UTR46 test suite contains
rules that aren't listed in the UTR46 algorithm and that I couldn't
reverse-engineer.