
Event Driven Architecture

Loïc Faugeron speaking at London Node User Group in September, 2016

About this talk

When Node.js says "event-driven, non-blocking I/O", have you ever felt it was some sort of black magic? In this talk we'll make the magic go away by implementing our own simple HTTP server, Event Loop, Scheduler, Deferrer/Promise and Thread Pool.

Transcript

Hi, everyone. My name is Loïc. I come from
a startup called Constant Commerce.
We do NodeJS, Angular, Vanilla JavaScript
even. So if you'd like to know how to use
those techs, just give me a shout after
the talk.
Tonight's topic, event-driven
architecture. So if you've been to the
NodeJS website, the first thing you're
going to see is event-driven,
non-blocking I/O. But what does it mean?
To me, it sounds like black magic,
and I have a trick, given by Uncle Bob,
for those who know him.
The trick is, whenever you see black
magic, just re-implement it yourself.
And then once you've implemented it, it's
going to let the magic go, right?
So, in his blog posts... Sorry, yeah, in
his blog posts he takes the HTTP server as
an example, and we are going to take him
quite literally here. So tonight we're
going to implement an HTTP server, a simple
one, don't worry, and then the next step
is to create an event loop to integrate
them both. And that's going to be 80% of
the work. Then the 20% left, we are going
to split up a bit, correct.
But it's basically scheduling, promises,
and threading. Right.
So let's start. Okay? This is the story of
input/output, also known as "I/O." So what
is I/O? I/O is communication between a
client and a server. So for example,
an HTTP client, like your browser, and
your HTTP server, like nginx or Node.
So let's implement it. So can you see the
code here? Can you read it well?
Okay. So we are just going to go through
the code, line by line.
It's not JavaScript because it wouldn't
fit in the slides. So it's pseudocode,
right?
So the first thing you do when you create
a web server is to create a socket.
It involves three system calls, which are
creating your socket,
binding it to a host and port, and then
starting to listen on it,
right?
When you start to listen, you can accept
clients. So clients can connect to you.
But the issue is that there's a limit.
It's a limited queue, and when you reach
the max number of clients in this queue,
you get errors. So the game here is to
just treat clients as fast as possible to
avoid the queue from being full,
right?
So to do that, you just start an infinite
loop, and in it, you accept new clients.
So when you call this system call, accept,
it's going to take the first client in the
queue and treat it. So you get a specific
HTTP connection, a specific socket
connection for this client, and that
dequeues it, so it allows you to avoid
errors, right?
So with this client, you can wait for data
to be received. That's the request.
By the way, a side note, I/O is very slow.
So you have many types of I/O. Here's
a table; basically, it says that the
processor is very fast.
Access to memory is a bit slower, and then
access to the hard drive is much slower,
and the slowest one of all is network. And
that's what we are treating here with the
HTTP server. And the issue with that is
that if you're very slow to treat your
client, your queue is getting full, and
then you get errors. So the game here,
again, is to dequeue as quickly
as possible.
So you get data from the client. You parse
it as a request. You give it to your
application, HTTP application, which
creates a response, and then, again,
you write this response to the client's
socket so the client receives something.
And then you can close the connection.
The issue is that I/O is not only really
slow, but it's unreliable,
well, the network is anyway, because the
client could have disconnected in the
meantime. I mean, WiFi, it's always... It
never works. So again,
we run a big risk of getting errors while
we treat our clients,
and our queue is getting fuller again, and
we get errors if it gets full.
So once we've treated the client, we can
close the connection and treat the next
one. So that's an HTTP server, right? So
basically it's just a queuing system,
and what we've done here is a basic
HTTP server, and it's not very efficient.
It can handle 100 connections at the same
time, but more than that,
it's going to get this queue full of
clients, and it's going to get errors
everywhere.
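
The slides' pseudocode is not part of this transcript, so here is a minimal sketch of the blocking server described above, written in Python because its socket API maps almost one-to-one onto the system calls named in the talk. The names `create_server`, `hello_app` and `serve`, the `max_clients` parameter and the backlog of 100 are illustrative assumptions, not the speaker's code.

```python
import socket


def create_server(host="127.0.0.1", port=0, backlog=100):
    """The three system calls from the talk: socket, bind, listen."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))  # port 0 lets the OS pick a free port
    server.listen(backlog)     # backlog is the size of the waiting queue
    return server


def hello_app(request: bytes) -> bytes:
    """Hypothetical application: ignores the request, returns a fixed response."""
    body = b"Hello, world!\n"
    head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\nConnection: close\r\n\r\n"
    return head.encode() + body


def serve(server, app, max_clients=None):
    """Treat clients one at a time; max_clients=None loops forever."""
    handled = 0
    while max_clients is None or handled < max_clients:
        client, _address = server.accept()  # blocks, then dequeues one client
        try:
            request = client.recv(4096)     # blocks until the request arrives
            client.sendall(app(request))
        except OSError:
            pass                            # the client may have disconnected
        finally:
            client.close()
        handled += 1
```

While `serve` is busy in `recv` or `sendall` for one client, every other client waits in the listen queue, which is the inefficiency the talk goes on to address.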
So how do we handle more clients? It's
known as the C10K problem.
It's been around for a while, and
thankfully we have a solution for that.
Hang on, the solution is poll. So poll is
basically, it's a system call that allows
you to give it a list of sockets, and it's
going to wait until one of the sockets is
ready to talk to you. So when a client is
ready to talk to you,
it's going to return that client,
and you can treat it.
So have you seen what I did here?
Hang on. Poll is hanging on.
No. Okay, never mind. So there are actually
many implementations of poll.
There are two POSIX implementations,
select and poll. Then epoll from Linux,
kqueue from BSD, IOCP from Windows, but
we're not going to dive into those
details, we're just going to call them
poll simply, right.
So what does an HTTP server look like with
poll? So let's have a look.
Again, we create a new socket. Just like
before, except this time we put it in a
collection of sockets. And once we have
our collection of sockets...
So the first socket is our HTTP server. So
we put it in the collection.
Then we start our infinite loop and we
give it to poll. So it's going to wait
on this HTTP server socket, and it's
going to block until there's a client
available for you.
Once a client is available, we get to the
first "if." So there's an HTTP connection
for you. We're going to accept it. It's
going to give us a specific socket for the
client. And this socket, we're going to
add it to the collection and then start to
loop again, right?
So we get back to the poll blocking, and
it's going to block until either we have a
new client ready to talk to us, or the
first client is ready to talk to us.
And in this case, for example, if the
client is ready to talk to us,
then we get the data. We parse it as a
request. Give it to the application.
Get a response. Write the response to the
client. Then close the connection,
and remove it from the
collection of sockets.
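
A hedged sketch of the poll-based server just described. It uses Python's `selectors.DefaultSelector`, which picks epoll, kqueue, poll or select depending on the platform, so it plays the role the talk simply calls "poll". The names `serve_with_poll` and `hello_app` and the `max_requests` parameter are assumptions made for illustration.

```python
import selectors
import socket


def hello_app(request: bytes) -> bytes:
    """Hypothetical application: ignores the request, returns a fixed response."""
    body = b"Hello, world!\n"
    head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\nConnection: close\r\n\r\n"
    return head.encode() + body


def serve_with_poll(server, app, max_requests=None):
    """Wait on the server socket and every client socket at once."""
    sockets = selectors.DefaultSelector()           # the collection of sockets
    sockets.register(server, selectors.EVENT_READ)  # first socket: the server
    handled = 0
    while max_requests is None or handled < max_requests:
        # Blocks until at least one registered socket is ready to be read.
        for key, _events in sockets.select():
            ready = key.fileobj
            if ready is server:
                # A new client is waiting: accept it and add it to the collection.
                client, _address = server.accept()
                sockets.register(client, selectors.EVENT_READ)
            else:
                # An accepted client has sent its request: answer and clean up.
                try:
                    request = ready.recv(4096)
                    if request:
                        ready.sendall(app(request))
                except OSError:
                    pass  # the client may have disconnected
                sockets.unregister(ready)
                ready.close()
                handled += 1
    sockets.close()
```

Accepting is now cheap: a client is taken out of the listen queue as soon as it connects, and its request is only read once the selector reports that data is there.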
So that's it. That's a loop, and those
connections are events.
So what we've just done now is an event
loop. So we're just going to refactor it a
bit. So basically we are going to extract
the event loop into its own class.
So the first method of this class is going
to be run. So you run the event loop.
It's where you have the
infinite loop. You have the call to poll.
It gets connections. And then for each
connection ready, we call its callback.
We have the callback. So it allows us to
decouple the code between the server and
the clients. We have a server callback. So
whenever we have a new client ready to
connect to the server, we have a callback
for this. And then we have also a callback
when a client is ready to
send the request.
So [inaudible 00:09:26] example. And so I
think this code might be familiar to you.
It's basically Node, right? You create a
socket, create an event loop.
You add a callback for the HTTP server
socket, and then you start to loop.
And so the first callback from the server
is there to accept new clients.
When it gets a new client, it registers the
socket in the event loop with a callback
to handle clients. And the callback to
handle clients is just calling your HTTP
application. So: get the request from the
client, pass it to the application,
get the response, etc.
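
The refactoring described above could look roughly like this in Python. It is a sketch under assumptions: the class name `EventLoop`, its `add`, `remove`, `stop` and `run` methods, and the helper `make_accept_callback` are invented for illustration and are neither the speaker's slides nor Node's API.

```python
import selectors
import socket


class EventLoop:
    """Maps each socket to a callback and calls it when the socket is ready."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._running = False

    def add(self, sock, callback):
        # The callback is stored as the selector key's data.
        self._selector.register(sock, selectors.EVENT_READ, callback)

    def remove(self, sock):
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def run(self):
        self._running = True
        while self._running:
            # Block until a connection is ready, then call its callback.
            for key, _events in self._selector.select():
                key.data(key.fileobj)


def make_accept_callback(loop, app):
    """Build the server callback, which registers a callback per client."""

    def handle_client(client):
        try:
            request = client.recv(4096)
            if request:
                client.sendall(app(request))
        except OSError:
            pass  # the client may have disconnected
        loop.remove(client)
        client.close()

    def accept_client(server):
        client, _address = server.accept()
        loop.add(client, handle_client)

    return accept_client
```

Usage is then close to what the talk describes: create the server socket, create the loop, call `loop.add(server, make_accept_callback(loop, app))`, and finally `loop.run()`.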
So that's it. That's an event loop for you
in an HTTP server. With this,
we have an efficient queue system, so
usually, the queue is quite empty thanks
to this. So we don't have any errors. And
that's been done using poll,
which is blocking, basically.
So if you want to memorize something from
this, it's, "Blocking,
if it's not solving all your issues, then
you're not using enough of it." Because
before, our issue was, okay, when I
accept a client, I'm blocking,
and I get my queue full. Now with the
event loop, you're blocking until
someone's ready. So you block
more, but it's more efficient.
So that's the 80%. That's how Node works.
You have an event loop.
You get more clients, and it keeps the
queue empty. There are a few things that
event loops cannot cover. So that's what
we're going to see here.
So when you don't have any clients, you
have that poll call that is blocking and
doing nothing. And so it would be great if
we could do something while there's
nothing to do, like, for example,
debugging, monitoring, reports,
anything. Wait a second. I've got
something for you. Poll actually has a
second parameter, it's called timeout. So
you can tell poll, "Okay,
can you block for up to 10 seconds?"
And if you don't have anything to return,
I can do something, and then I call you
back for, again, 10 seconds.
That way I can do things while we don't
have any clients.
Did you see what I did here this time?
Yeah. Okay. Good. Nice. Okay.
So this is called a scheduler. We have
many types of scheduler in Node.
We have one-off schedulers, so you can
program a task to do something if there's
nothing to do. Just one time. Or you have
periodic schedulers.
So every now and then just call this
callback when there's nothing to do,
etc., etc.
I'm actually not going to talk too much
about this one. So it's a one-off
scheduler. So basically what it does is,
you have a class where you can register
callbacks, right? So you say, "Okay, you
have an interval, say, 10 seconds.
You have a callback, just call this one."
And then the arguments that you can
forward. The other method is tick. So
that's basically going to wait for the
interval. So 10 seconds, for example. And
it's going to then check if there's
something to execute here.
And if we want to integrate the scheduler
in the event loop, it's quite easy.
So in our run method when we do the
infinite loop, before calling poll,
we are going to check what's the lowest
timeout in our scheduler,
and we're going to give it to poll, and
then we're going to loop.
And at the end of the loop, we're going to
tick the scheduler.
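
A minimal sketch of a one-off scheduler and its integration with the loop, as described above. The method names `add`, `lowest_timeout` and `tick`, the injectable `clock` parameter and the `run` function with its `should_stop` argument are assumptions for illustration. Unlike the talk's description, this `tick` does not wait itself; the waiting happens in the selector call, whose timeout comes from `lowest_timeout`.

```python
import selectors
import time


class Scheduler:
    """One-off scheduler: each registered callback runs once, when it is due."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the class can be tested without sleeping
        self._tasks = []     # list of (due_time, callback, args)

    def add(self, interval, callback, *args):
        self._tasks.append((self._clock() + interval, callback, args))

    def lowest_timeout(self):
        """Seconds until the next task is due, or None if nothing is scheduled."""
        if not self._tasks:
            return None      # None tells the selector to block indefinitely
        soonest = min(due for due, _callback, _args in self._tasks)
        return max(0.0, soonest - self._clock())

    def tick(self):
        """Run every task whose time has come and forget it."""
        now = self._clock()
        due = [task for task in self._tasks if task[0] <= now]
        self._tasks = [task for task in self._tasks if task[0] > now]
        for _due, callback, args in due:
            callback(*args)


def run(selector, scheduler, should_stop):
    """Event loop whose poll timeout is driven by the scheduler."""
    while not should_stop():
        timeout = scheduler.lowest_timeout()  # checked before calling poll
        for key, _events in selector.select(timeout):
            key.data(key.fileobj)             # socket callbacks, as before
        scheduler.tick()                      # at the end of each iteration
```

A periodic scheduler could be built on the same idea by having `tick` re-register a task after running it.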
Again, it's to make your event loop more
efficient, so you can do stuff while it's
waiting. So it's using waiting time. So
again, we had a lot of waiting,
and now we are even...we are waiting even
more, and we can do more stuff
thanks to that.
So, waiting, if it's not solving your
issues, then you're not using enough of
it, right?
There's an issue with all those callbacks
with the scheduler, the event loop,
the server. It's that we can get a
callback hell, basically. Because all
those callbacks can be nested. It's not
easy to write. It's not easy to read.
If only it could feel more like sync
code. So I guess you have heard about it
before? It's called promise.
Okay, I think this time you saw what I did
here. But I promise that's the last time I
make a pun. So, yeah, pitch out promises.
There are many implementations,
but the basic idea behind this is that you
have an asynchronous call that's
going to call a deferrer that creates a
promise. We return the promise,
and you, the user, can use this promise to
register callbacks when the
asynchronous call is ready, right.
It's actually quite hard to understand,
but the code is quite short,
so let's have a quick look at it. So
again, we have a deferrer class that we
use in our asynchronous calls.
What we do is we call
promise on it. And it's going to create a
new promise. But what it does when it does
that is it injects one of its own methods
in the promise. This method is a
setter. So let's see what it's used for.
So then the asynchronous call is going to
return the promise. So you can interact
with it. You call then on it. And
when you call then, you give it a callback,
right? So the callback is, when you're
finished, just call this, please.
And so what it does, what the promise does
is, it's going to call the method from the
deferrer and forward the callback, and
that's it. And optionally return a new
promise. And so the deferrer, in your
asynchronous call, when it decides that...
Oh, yeah, so that's the setter, by the way,
it just gets the callback and puts it in a
collection of callbacks. And so, once it's
ready, you just have to call resolve with
a value, for example. It's going to call
all the callbacks and forward the
value to them.
So that's how the value from the
asynchronous call gets to your promise
in your parent code. So that's it.
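
Since the talk admits this part is hard to follow without the slides, here is a small sketch of the deferrer and promise pair as described. It is deliberately incomplete: there is no rejection or error handling, and a callback registered after `resolve` has been called is never run. The names `Deferrer`, `Promise`, `promise`, `then` and `resolve` follow the talk's vocabulary, but the code itself is an assumption.

```python
class Promise:
    """What the user gets back: a place to register callbacks."""

    def __init__(self, add_callback):
        # The deferrer injects its own setter here.
        self._add_callback = add_callback

    def then(self, callback):
        # Forward the callback to the deferrer and return a new promise
        # that resolves with the callback's result, which allows chaining.
        next_deferrer = Deferrer()

        def chained(value):
            next_deferrer.resolve(callback(value))

        self._add_callback(chained)
        return next_deferrer.promise()


class Deferrer:
    """What the asynchronous call keeps: a way to deliver the value."""

    def __init__(self):
        self._callbacks = []

    def _add_callback(self, callback):
        # The setter: store the callback in the collection of callbacks.
        self._callbacks.append(callback)

    def promise(self):
        return Promise(self._add_callback)

    def resolve(self, value):
        # Call every registered callback and forward the value to it.
        for callback in self._callbacks:
            callback(value)
```

For example, after `deferrer.promise().then(lambda v: v + 1).then(print)`, a later `deferrer.resolve(1)` runs the chain and prints `2`.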
Basically, to get rid of this callback
hell, we're just chaining calls,
right? So you can do promise, then, then,
then, then, then. So again,
we had many callbacks and now we have even
more callbacks, but that solved our
issues. So callbacks, if it's not solving
your issues, then you're just not using
enough of it.
Actually, promise and deferrer are quite
hard to understand, so I think the best
thing to do is to just use an example,
right? Here's Filesystem.
I present to you Filesystem U/O. It's a
subtype of I/O, right?
It's U/O because it's the slowest
of all I/Os, basically.
So the main issue with the Filesystem is
that you can use it as I/O, just like
requests and responses. But the thing is, when you
pass it to poll, poll is not going to
block until it's ready. It's going to
return it immediately. So you get the
impression that the Filesystem is ready to
be written to or read from.
But it's not. So basically it becomes
useless. So the solution for it,
and that's what Node is using, is
a thread pool. So don't worry,
your NodeJS applications are still single
threaded, right? It's just that it's using
a library, [inaudible 00:18:11] library,
where there is a thread pool in it.
So it's quite separate from your
application, and you don't get all the
threaded complex stuff.
So how does it work? Well, it's quite
simple. It's wrapping all the Filesystem
calls into its own calls. So, for
example, to open a file, what it does is
it provides you with an open call, right?
So you give it the filename and you give
it a callback to call once the file is
opened, and so what it does is
it puts the open Filesystem call, the
actual one, in a thread pool and forwards
the arguments. This thread pool is going
to give you a promise.
So on this promise you call then: once
this file is ready,
just call my callback. And so, that's it.
So NodeJS, in order to use this in the
event loop and integrate everything, uses
a scheduler. So that's why we saw a
scheduler before.
So if you have a look here at the last
line, so what we do is
we don't directly give promise.then the given
callback. We give our own callback, which
is on five of them. And so what it does
is it's going to say to Node,
"Hey, in your event loop, the next
time you have nothing to do,
please call this callback." Which is once
the file is open, then do something.
And that's it.
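
A sketch of the idea in Python, under stated assumptions: the standard library's `ThreadPoolExecutor` stands in for the thread pool, its `Future` stands in for the promise, and a thread-safe queue stands in for the scheduler that hands the callback back to the loop. The names `read_file` and `run_pending`, and the Node-style `callback(error, data)` signature, are illustrative choices, not Node's or libuv's actual code.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

POOL = ThreadPoolExecutor(max_workers=4)  # four threads, the figure quoted in the talk
PENDING = queue.SimpleQueue()             # callbacks waiting for the event loop


def read_file(path, callback):
    """Non-blocking wrapper: returns at once, calls callback(error, data) later."""

    def blocking_read():
        # The actual, blocking filesystem calls run on a pool thread.
        with open(path, "rb") as handle:
            return handle.read()

    future = POOL.submit(blocking_read)   # the Future plays the promise's role
    # Do not run the user's callback on the worker thread. Queue it instead,
    # so the event loop's own thread calls it when it next has nothing to do.
    future.add_done_callback(lambda done: PENDING.put((callback, done)))


def run_pending(timeout=None):
    """What the loop would do when idle: run one completed filesystem callback."""
    callback, done = PENDING.get(timeout=timeout)
    error = done.exception()
    callback(error, None if error else done.result())
```

Because the user's callback only ever runs inside `run_pending`, application code stays on a single thread even though the file access itself happens on the pool.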
So basically, your event loop in NodeJS is
going to first check,
"Okay, I have... Do I have any
HTTP clients connecting to me? No.
Do I have any current clients talking to
me? No." Okay. So I just go out and check
if there's any Filesystem things to do.
That's it. So, yep. That's it.
So Filesystem [inaudible 00:20:14] just by
wrapping them in threads to make them act
just like [inaudible 00:20:19] I/O. So
that's it. Threading, if it's not solving
your problems. Right? Right?
Actually, no. So this... It was a trap. So
there's a thing about threading,
is that it's very complex, and it has
actually limits. So if you take a look at
the Filesystem example, and you say,
"Okay, I want as many threads as possible,
like infinite pool of threads," you're
going to get errors. Why is that?
Because your kernel sets a limit of
concurrent calls to your Filesystem.
Usually, it's four. But, the good news is,
you can actually change this limit.
You can just remove it and get an infinite
number of concurrent
Filesystem calls. That's great. But when
you do that, it turns out your hard drive
burns because it cannot physically handle
that many concurrent connections on the
same disk space.
So that's actually a true story. There's
this guy, there's a link who explains,
"Oh, I tried to be clever with my
Filesystem and my thread pools,
and it didn't turn out quite well." So
don't try this at home. But the thing to
remember here is that, so usually, you
have a pool of four threads,
and that's good enough for your
Filesystem.
All right? Conclusion. So I just have to
say something about Node.
So, yeah, nginx uses that. Graphical
clients use event loops.
Everybody's using event loops, but the
special thing about Node is that it's
your language, right? It's your
programming language. And if you take C,
and you try to do like Filesystem stuff,
you can't use the direct functions of
Filesystem because they're blocking. You
need to write them yourself or call a
library for this. But with Node, because
the language didn't exist, for Filesystem
at least... I mean, JavaScript exists, but
you don't have Filesystem calls natively
in the V8 machine, right? So what Node
did is they just created this wrapped
Filesystem code. So you don't have to
worry about it. Whenever you're going to
do a Filesystem call in Node, it's
automatically going to be non-blocking.
And you don't have the choice, so you
can't make the mistake. So that's the
quite clever thing with Node. That's it.
By the way, if you're not aware of it, so
it's not actually written in.
For those who know, Node has been written
in C++, right? But the event
loop is actually not in Node, it's
in an external library.
So at first they were using libev,
which is a replacement for libevent.
But they found out that it wasn't good
enough, so they created their own C
library. It's written in C. It's called
libuv. It's actually quite good.
If you have a look at it... If you want to
have a look at it, there's a link here
that explains how it works. But it's
basically what I just said.
So there's an event loop, and for the
educated, it's just like Filesystem,
just creates a thread pool and everything.
So my belief is that when you understand
something, you get to use it in a more
efficient way. So I hope this talk has
helped you a bit about how Node works and
how to use it more efficiently. I
understand that it's quite a complex
topic, and doing it in a talk is quite
hard to follow. But I can give you the
slides afterwards. And I also have the
transcript as an article, so if you want
to just read it and go through the code
and everything, just give me a shout.

Loïc Faugeron

London Node User Group

The London Node.js User Group (LNUG) is a friendly monthly meetup for people using Node.js for fun or profit.
LNUG is the longest standing Node.js meetup in London and has been hosting first-class events for over 4 years. LNUG has welcomed speakers from
