<p><em>Pierre-Yves, 2016-06-25</em></p>
<p>New FMN architecture and tests</p>
<h3>Introduction</h3>
<p>FMN is the <a href="https://apps.fedoraproject.org/notifications/">FedMsg Notification</a>
service. It allows any contributor (or actually, anyone with a
<a href="https://admin.fedoraproject.org/accounts">FAS</a> account) to tune which notifications
they want to receive and how.</p>
<p>For example, it allows saying things like:</p>
<ul>
<li>Send me a notification on IRC for every package I maintain that has successfully built on koji</li>
<li>Send me a notification by email for every request made in pkgdb to a package I maintain</li>
<li>Send me a notification by IRC when a new version of a package I maintain is found</li>
</ul>
<h3>How it works</h3>
<p>The principle is that anyone can log in on the <a href="https://apps.fedoraproject.org/notifications/">web UI of FMN</a>.
There, they can create filters on a specific backend (mainly email or IRC) and
add rules to each filter. These rules must either be validated or invalidated for
the notification to be sent.</p>
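<p>Conceptually, a filter is a set of rules that must all validate for a notification to fire. Here is a minimal sketch of that idea; the function names and message shapes below are invented, and the real fmn.lib API differs:</p>

```python
# Hypothetical sketch of FMN's filter/rule matching; names and message
# shapes are invented, the real fmn.lib API differs.

def is_koji_build(msg):
    """Rule: the message is a finished Koji build (state 1 == COMPLETE)."""
    return msg.get("topic", "").endswith("buildsys.build.state.change") \
        and msg.get("new") == 1

def is_my_package(msg, owned=frozenset({"guake"})):
    """Rule: the message concerns a package the user maintains."""
    return msg.get("name") in owned

def filter_matches(msg, rules):
    """A filter fires only when every one of its rules validates."""
    return all(rule(msg) for rule in rules)

msg = {"topic": "org.fedoraproject.prod.buildsys.build.state.change",
       "new": 1, "name": "guake"}
print(filter_matches(msg, [is_koji_build, is_my_package]))  # True
```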
<p>Then the FMN backend listens to all the messages sent on <a href="https://fedora-fedmsg.readthedocs.io/">Fedora's fedmsg</a>
and for each message received, goes through all the rules in all the filters to
figure out who wants to be notified about this action and how.</p>
<h3>The challenge</h3>
<p>Today, computing who wants to be notified and how takes about 6 to 12 seconds
per message and is really CPU-intensive. This means that when an
operation sends a few thousand messages on the bus (for example,
mass-branching, or a packager orphaning a large number of packages they maintain),
the queue of messages grows and it can take hours to days for a notification
to be delivered, which can be problematic in some cases.</p>
<h3>The architecture</h3>
<p>This is the current architecture of FMN:</p>
<pre>
| +--------\
| read | prefs | write
| +----&gt;| DB |&lt;--------+
| | \--------+ |
| +-----+---+---+ +---+---+---+---+ +----+
| | |fmn.lib| | |fmn.lib| | |user|
v | +-------+ | +-------+ | +--+-+
fedmsg+-&gt;|consumer | |central webapp |&lt;-----+
+ +-----+ +---+| +---------------+
| |email| |irc||
| +-+---+--+-+-++
| | |
| | |
v v v
</pre>
<p>As you can see, it is not obvious where the CPU-intensive part is, and that is because
it is in fact integrated into the fedmsg consumer.
This design, while making things easier, brings the downside of making it
practically impossible to scale when an event produces lots
of messages.
We multi-threaded the application as much as we could, but we quickly
reached the limit of the <a href="https://wiki.python.org/moin/GlobalInterpreterLock">GIL</a>.</p>
<p>To try to improve this situation, we reworked the architecture of the backend
as follows:</p>
<pre>
+-------------+
Read | | Write
+------+ prefs DB +&lt;------+
| | | |
+ | +-------------+ |
| | | +------------------+ +--------+
| | | | |fmn.lib| | | |
| v | | +-------+ |&lt;--+ User |
| +----------+ +---+ | | |
| | fmn.lib| | Central WebApp | +--------+
| | | +------------------+
| +-----&gt;| Worker +--------+
| | | | |
fedmsg | +----------+ |
| | |
| | +----------+ |
| +------------------+ | | fmn.lib| | +--------------------+
| | fedmsg consumer | | | | | | Backend |
+--&gt;| +------------&gt;| Worker +---------------&gt;| |
| | | | | | | +-----+ +---+ +---+
| +------------------+ | +----------+ | |email| |IRC| |SSE|
| | | +--+--+---+-+-+--+-+-+
| | +----------+ | | | |
| | | fmn.lib| | | | |
| | | | | | | |
| +-----&gt;| Worker +--------+ | | |
| RabbitMQ | | RabbitMQ | | |
| +----------+ | | |
| v v v
|
|
|
v
</pre>
<p>The idea is that the fedmsg consumer listens to Fedora's fedmsg and puts the messages
in a queue. These messages are then picked up from that queue by multiple workers,
which do the CPU-intensive work and put their results in another queue.
The results are then picked up from this second queue by a backend process that
performs the actual notification (sending the email or the IRC message).</p>
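<p>The hand-off between the consumer, the workers and the backend can be sketched with the standard library; here <code>queue.Queue</code> and threads stand in for RabbitMQ and the separate processes, and the message contents are invented:</p>

```python
# Sketch of the consumer -> workers -> backend pipeline. queue.Queue
# stands in for RabbitMQ; message contents are invented.
import queue
import threading

incoming = queue.Queue()   # consumer -> workers (RabbitMQ in production)
results = queue.Queue()    # workers -> backend (RabbitMQ in production)

def worker():
    # CPU-intensive part: match each message against every filter/rule.
    while True:
        msg = incoming.get()
        if msg is None:        # poison pill: stop this worker
            break
        # ... this is where the fmn.lib rule matching would run ...
        results.put(("irc", "pingou", msg["msg_id"]))

def backend(delivered):
    # Cheap part: actually deliver the notification.
    while True:
        item = results.get()
        if item is None:       # poison pill: stop the backend
            break
        delivered.append(item)

delivered = []
workers = [threading.Thread(target=worker) for _ in range(4)]
sender = threading.Thread(target=backend, args=(delivered,))
for t in workers:
    t.start()
sender.start()

for i in range(10):            # the fedmsg consumer feeding the first queue
    incoming.put({"msg_id": i})
for _ in workers:              # one poison pill per worker
    incoming.put(None)
for t in workers:
    t.join()
results.put(None)              # stop the backend once all workers are done
sender.join()
print(len(delivered))          # 10 notifications delivered
```

Because both queues are shared, adding throughput is just a matter of starting more worker threads (processes, in the real deployment).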
<p>We also included an <a href="https://en.wikipedia.org/wiki/Server-sent_events">SSE</a>
component in the backend, which is something we want for
<a href="https://pagure.io/fedora-hubs">fedora-hubs</a>, but this still needs to be written.</p>
<h3>Testing the new architecture</h3>
<p>The new architecture looks fine on paper, but one may wonder how it performs
in real life and with real data.</p>
<p>In order to test it, we wrote two scripts (one for the current architecture, one
for the new one): they either send messages via fedmsg or put messages directly into the
queue the workers listen to, thereby mimicking the behavior of the fedmsg consumer.
Then we ran different tests.</p>
<h2>The machine</h2>
<p>The machine on which the tests were run is:</p>
<ul>
<li>CPU: Intel i5 760 @ 2.8GHz (quad-core)</li>
<li>RAM: 16G DDR2 (1333 MHz)</li>
<li>Disk: SanDisk SDSSDA12 (120G)</li>
<li>OS: RHEL 7.2, up to date</li>
<li>Dataset: 15,000 (15K) messages</li>
</ul>
<h2>The results</h2>
<p><ins><strong>The current architecture</strong></ins></p>
<p>The current architecture only allows running one test: send 15K fedmsg messages,
let the fedmsg consumer process them, and monitor how long it takes to
digest them all.</p>
<pre>
Test #0 - fedmsg based
Lasted for 9:05:23.313368
Maxed at: 14995
Avg processing: 0.458672376874 msg/s
</pre>
<p><ins><strong>The new architecture</strong></ins></p>
<p>Since the new architecture is able to scale, we performed several tests with it,
using 2 workers, then 4, then 6 and finally 8.
This gives us an idea of whether the scaling is linear and how much improvement
we gain by adding more workers.</p>
<pre>
Test #1 - 2 workers - 1 backend
Lasted for 4:32:48.870010
Maxed at: 13470
Avg processing: 0.824487297215 msg/s
Test #2 - 4 workers - 1 backend
Lasted for 3:18:10.030542
Maxed at: 13447
Avg processing: 1.1342276217 msg/s
Test #3 - 6 workers - 1 backend
Lasted for 3:06:02.881912
Maxed at: 13392
Avg processing: 1.20500359971 msg/s
Test #4 - 8 workers - 1 backend
Lasted for 3:14:11.669631
Maxed at: 13351
Avg processing: 1.15160928467 msg/s
</pre>
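<p>A quick bit of arithmetic on the averages above makes the scaling behavior visible: throughput peaks at 6 workers (about 2.6x the old consumer) and drops slightly at 8.</p>

```python
# Speedup of each worker count over the current architecture's average
# rate (numbers taken from the test runs above).
baseline = 0.458672376874  # msg/s, test #0
rates = {2: 0.824487297215, 4: 1.1342276217,
         6: 1.20500359971, 8: 1.15160928467}
for workers, rate in rates.items():
    print(f"{workers} workers: {rate / baseline:.2f}x")
```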
<h3>Conclusions</h3>
<p>Looking at the results of the tests, the new architecture clearly handles the
load better and faster. However, the gains are not as linear as we would like. My feeling
is that retrieving information from the cache (here Redis) gets slower at some point,
possibly also because of the central lock we tell Redis to use.</p>
<p>As time permits, I will try to investigate this further to see if we can still gain some speed.</p>