This is interesting, but you can bring that sort of approach to
other languages as well. Here is a simple way to get it done for
Java:
http://java.dzone.com/articles/go-style-goroutines-java-and. What
is hard to implement without support from the language is the
CSP-style channel select, which delivers a lot of the power of
channels in Go.
Also, without green threads as in Go, there is no way to spawn
several hundred thousand threads. The JVM starts to buckle after
a few thousand threads already, depending on how many resources
are available to the machine. You can work around this by having
a pool of threads process tasks added to queues, where the number
of queues can become very large. But when your thread pool has n
threads and, for an overlapping time window, n long-runners are
being executed, all other tasks are stuck until the first
long-runner has finished execution.
Green threads and CSP-style channels are what keep a lot of
people with Go, as the rest of the language is almost simplistic.
What I would really like to have is D with Go's green threads
along with channels and channel select. Some people will now
smile mildly, but for today's load on servers there is a real
necessity for this. For example, read the article "How We Went
from 30 Servers to 2: Go". Link:
http://blog.iron.io/2013/03/how-we-went-from-30-servers-to-2-go.html
Regards, Bienlein


To follow up on this, here is what Ian Taylor (a member of the Go
dev team) says about it (in this thread:
https://groups.google.com/forum/?hl=de#!topic/golang-nuts/kF_caFpPNgA):
I have not looked at your code. I just want to say that the
select statement is the core of Go's channels. Without select,
channels are just a simple communication mechanism. If you want
to design a different implementation of Go's channels, I
recommend designing select first.

Support for green threads in std.concurrency is almost complete.
I should really just do the last bit of work. I imagine you could
try out the idea now though by using the messaging in vibe.d,
since every connection is a fiber.


Did you express that work as one or more bugzilla issues?
Andrei

As in: "This would make a good change log entry for added
features" ?
--
Marco

Support for green threads in std.concurrency is almost
complete. I should really just do the last bit of work. I
imagine you could try out the idea now though by using the
messaging in vibe.d, since every connection is a fiber.


My motivation was to make std.concurrency work with vibe.d. And
more generally, to start testing fiber-based concurrency in
general. The basic idea is to make certain low-level parts of
std.concurrency be pluggable, so the same API can be used on top
of different threading schemes. You basically just need to
implement this interface:
interface Scheduler {
    void start(void delegate() op);   // start the scheduler
    void spawn(void delegate() op);   // spawn a new thread
    void yield();                     // for send and receive to yield, to allow green threading
    Condition newCondition(Mutex m);  // the condition will notify/wait for new messages
}
I should have a sample implementation working for green threads
shortly. Then I need to submit a bugzilla ticket and sort out a
pull request.

Okay, just for fun, here are some results with the new scheduler.
I injected periodic yields into the code to simulate the yielding
that would happen automatically if the code was using send and
receive. First the code:

Hi Sean,
with "send and receive" you mean adding to a channel and doing a
blocking take on it? Just for me to build up an understanding.
Regards, Bienlein


Sort of. std.concurrency uses the actor model. So it's messaging,
but not the CSP model used by Go. We should probably offer both,
but for now it's just actors. And because you basically have one
channel per thread, the limiting factor to date is how many
threads you can sanely run simultaneously. Actor-oriented
languages typically use green threads instead of kernel threads
so the number of threads can scale. In Erlang, a "process" (i.e. a
thread) is equivalent to a class in D, so there tends to be a lot
of them.


On a very well equipped machine 10.000 threads is about the
maximum for the JVM. Now for D 1.000.000 kernel threads are not a
problem!? Well, I'm a D newbie and a bit confused now... I have
to ask some questions, trying not to bug people. Apparently, a
kernel thread in D is not an OS thread. Does D have its own
threading model then? I couldn't see that from what I found on
dlang.org. Is the measurement result for fibers that much better
than for threads because fibers have less overhead for context
switching? Will actors in D benefit from your FiberScheduler when
it has been released? Do you know in which version of D your
FiberScheduler is planned to be included?
In Go you can easily spawn 100.000 goroutines (aka green
threads), probably several 100.000. Being able to spawn way more
than 100.000 threads in D with little context-switching overhead
thanks to fibers, you are basically in the same league as Go. And
D is a really rich language in contrast to Go. This looks cool :-)


Well, I spawned 1 million threads, but there's no guarantee that
1 million were running concurrently. So I decided to run a test.
I forced the code to block until all threads were started, and
when using kernel threads this hung with 2047 threads running
(this is on OSX). So I think OSX has a hard internal limit of
2047 threads. It's possible this can be extended somehow, but I
didn't investigate. And since I don't currently have a great way
to block fibers, what I was doing there was a busy wait, which
was just slow going waiting for all the threads to spin up.
Next I just figured I'd keep a high water mark for concurrent
thread count for the code I posted yesterday. Both fibers and
kernel threads topped out at about 10. For fibers, this makes
perfect sense given the yield strategy (each client thread yields
10 times while running). And I guess the scheduling for kernel
threads made that come out about the same. So the fact that I
was able to spawn 1 million kernel threads doesn't actually mean
a whole lot. I should have thought about that more yesterday.
Because of the added synchronization counting threads, everything
slowed down a bit, so I reduced the number of threads to 100.000.
Here are some timings:
$ time concurrency threads
numThreadsToSpawn = 100000, maxConcurrent = 12
real 1m8.573s
user 1m22.516s
sys 0m27.985s
$ time concurrency fibers
numThreadsToSpawn = 100000, maxConcurrent = 10
real 0m5.860s
user 0m3.493s
sys 0m2.361s
So in short, a "kernel thread" in D (which is equivalent to
instantiating a core.thread.Thread) is an OS thread. The fibers
are user-space threads that context switch when explicitly
yielded and use core.thread.Fiber.
One thing to note about the FiberScheduler is that I haven't
sorted out a solution for thread-local storage. So if you're
using the FiberScheduler and each "thread" is accessing some
global static data it expects to be exclusive to itself, you'll
end up with an undefined result. Making D's "thread-local by
default" actually be fiber-local when using fibers is a pretty
hard problem to solve, and can be dealt with later if the need
arises. My hope is, however, that by making the choice of
scheduler user-defined, it's up to the user to choose the
appropriate threading model for their application, and we can
hopefully sidestep the need to sort this out. It was the main
issue blocking my doing this ages ago, and I didn't think of this
pluggable approach until recently.
The obvious gain here is that std.concurrency is no longer
strictly limited by the overhead of kernel threads, and so can be
used more according to the actor model as was originally
intended. I can imagine more complex schedulers multiplexing
fibers across a pool of kernel threads, for example. The
FiberScheduler is more a proof of concept than anything.
As for when this will be available... I will have a pull request
sorted out shortly, so you could start playing with it soon. It
being included in an actual release means a review and such, but
as this is really just a fairly succinct change to an existing
module, I hope it won't be terribly contentious.

In Go you can easily spawn 100.000 goroutines (aka green
threads), probably several 100.000. Being able to spawn way
more than 100.000 threads in D with little context switching
overhead as with using fibers you are basically in the same
league as with Go. And D is a really rich language contrary to
Go. This looks cool :-)

Yeah, I think it's exciting. I had originally modeled
std.concurrency after Erlang and like the way the syntax worked
out, but using kernel threads is limiting. I'm interested to see
how this scales once people start playing with it. It's possible
that some tuning of when yields occur may be needed as time goes
on, but that really needs more eyes than my own and probably
multiple real world tests as well.
As some general background on actors vs. CSP in std.concurrency,
I chose actors for two reasons. First, the communication model
for actors is unstructured, so it's adaptable to a lot of
different application designs. If you want structure you can
impose it at the protocol level, but it isn't necessary to do
so--simply using std.concurency requires practically no code at
all for the simple case. And second, I wasn't terribly fond of
the "sequential" part of CSP. I really want a messaging model
that scales horizontally across processes and across hosts, and
the CSP algebra doesn't work that way. At the time, I found a
few algebras that were attempting to basically merge the two
approaches, but nothing really stood out.

As for when this will be available... I will have a pull request
sorted out shortly, so you could start playing with it soon. It
being included in an actual release means a review and such, but
as this is really just a fairly succinct change to an existing
module, I hope it won't be terribly contentious.

Sounds good. So, I only need to watch the GitHub repo for Phobos
and I will get notified? Or do I need to watch some other repo
for D on GitHub? Just to be on the safe side, since I'm new to D
and not familiar with the way things are split up.

... And second, I wasn't terribly fond of
the "sequential" part of CSP. I really want a messaging model
that scales horizontally across processes and across hosts, and
the CSP algebra doesn't work that way.

What is nice about CSP is that you can prove that your code is
free of deadlocks. The Go guys have developed a tool that parses
the code and then tells you what it has found.

As some general background on actors vs. CSP in std.concurrency,
I chose actors for two reasons. First, the communication model
for actors is unstructured, so it's adaptable to a lot of
different application designs.

Yeah, I understand the reasoning. In its level of granularity,
CSP sits somewhere between low-level locks/semaphores/etc. and
high-level actors. I guess you can easily build actors on top of
CSP. In D's case, actors are not as blown up as, for example, in
Scala or Akka. Creating an actor is mostly like spawning a
thread. So actors in D are much less heavy than in Scala/Akka.
Actors in D must also have a message queue, like channels in CSP,
into which the message is inserted when some tid.send(...) is
done. It is just not accessible from the outside.

... It's possible this can be extended somehow, but I
didn't investigate. And since I don't currently have a great way
to block fibers, what I was doing there was a busy wait, which
was just slow going waiting for all the threads to spin up.

Goroutines in Go are also co-operative (i.e. not pre-emptive),
but I'm not sure. They probably yield when a channel has run
empty. Well, then they have to in order to detach the thread that
serves the channel, to prevent the system from running out of
threads. I guess they may have a strategy for when to yield based
on how long other channels had to wait to get a thread attached
to them. For that purpose maybe there is a way to measure the
traffic in the message queues of actors in D to get some
effective yielding done. Just a thought. I'm not really an expert
here.

Thanks for the link. It seems like the whole success story of
using Go in this article is based on goroutines and channels. So
getting something similar accomplished in D would be important
for D to be used for scalable/elastic server-side software. Rust
basically uses the same approach as Go with regard to threading.
There seems to be something to it.
Cheers, Bienlein

What is nice about CSP is that you can prove that your code is
free of deadlocks. The Go guys have developed a tool that
parses the code and then tells you what it has found.

Note that the Go race detector isn't a static analysis tool that
identifies deadlocks at compile time; it instruments the code and
then detects race conditions at runtime. It's based on the C/C++
ThreadSanitizer runtime library, so a similar thing could
probably be implemented for D.

Goroutines in Go are also co-operative (i.e. not pre-emptive),
but I'm not sure.

The Go scheduler can perform a limited form of pre-emptive
scheduling; from the version 1.2 release notes:
"In prior releases, a goroutine that was looping forever could
starve out other goroutines on the same thread, a serious problem
when GOMAXPROCS provided only one user thread. In Go 1.2, this is
partially addressed: The scheduler is invoked occasionally upon
entry to a function. This means that any loop that includes a
(non-inlined) function call can be pre-empted, allowing other
goroutines to run on the same thread."

Rust is basically using the same approach as Go with regard to
threading.

Note that the Go race detector isn't a static analysis tool
that identifies deadlocks at compile time; it instruments the
code and then detects race conditions at runtime. It's based on
the C/C++ ThreadSanitizer runtime library, so a similar thing
could probably be implemented for D.

Thanks for pointing that out. I seem to have interpreted the
information I had too optimistically.

Yes, I read an interview on infoq.com saying the same thing,
which confused me a bit. M:N threading is still there, but is
there still as much focus on it as with the Go people? Anyway, as
long as D continues its own way with fibers ... ;-).

As for when this will be available... I will have a pull request
sorted out shortly, so you could start playing with it soon. It
being included in an actual release means a review and such, but
as this is really just a fairly succinct change to an existing
module, I hope it won't be terribly contentious.

Hello,
I have a little question about how pre-emption works with the
FiberScheduler. Let's say I create 100.000 fibers that all run
long-runners (such as calculating fibonacci(100)). Now I start
another fiber that just prints "hello world" to the console, so
it's a short runner. When can I expect "hello world" to appear on
the console?
a. It will take a long time in case fibonacci(100) never does any
yield.
b. The FiberScheduler does a yield periodically, so "hello world"
will be displayed before long.
c. I need to do a yield from within the fibonacci function here
and there for "hello world" to be displayed before long.
Just for my understanding ...
Thanks, Bienlein


The API is able to context switch inside send and receive. So if
you aren't sending messages with some frequency then the level of
parallel execution will be fairly low. For apps like this, it's
possible that a more complex scheduler that is backed by a thread
pool would be more appropriate. Since D isn't built from the
ground up around fibers, choosing the right scheduler for your
application is an important decision.


Hi Sean,
thanks for the quick reply. Let's say I have most of my actors
running with the FiberScheduler. Then I have my emergency actor
that is supposed to shut down my nuclear power plant here and now
in case it receives a message to do so. Now I let the emergency
actor run in a kernel thread. This way it should be immediately
responsive. Is that right? Because that would be really good
enough for me.
Thanks, Bienlein

Yes. The schedulers are required to maintain some data (one being
a message queue) for each "thread" they spawn. If the data is
requested from a thread the scheduler doesn't own, it's required
to return a thread-local copy instead. In short, any manually
created kernel thread will get its own message queue regardless
of the scheduler in place.

Are there some plans when we will have the FiberScheduler be included in
D? Wait for D 2.066? Not wanting to be "pushy", it's only because of
pure impatience ;-)

Just out of curiosity, what did you miss in vibe.d regarding fiber-based
scheduling?
Of course it would be great to have something like this in Phobos, but
to really have a consistent system, changes need to be made all over the
place to make things work asynchronously under the hood; otherwise you
have a system that makes it really easy to shoot yourself in the foot,
which may be much worse than having no support at all. Rather than
starting with bits of this here and there, the complete picture should
IMO be well thought out beforehand, starting with the integration of low
level asynchronous operations (and a pluggable event loop
implementation). Adding fiber-based concurrency would then be the last step.

Just out of curiosity, what did you miss in vibe.d regarding
fiber based scheduling?

Hi Sönke,
I'm thinking of developing a little actor library on top of D's
spawn/receive model for creating threads, which is already
actor-like but at the level of global functions. I want to mold
some thin class layer on top of it to have actors at the class
level. Vibe.d would be a good solution for distributed actors.
But as a first step I want to have local actors. Actors that are
in the same memory space don't need to communicate through
sockets as in the case of vibe.d.
Regards, Bienlein


The vibe.core.concurrency module provides the same interface as
std.concurrency (with some different details). Once Sean's fiber
additions to std.concurrency are ready, vibe.core.concurrency will
be layered on top of (and finally replaced by) it.
There is also vibe.stream.taskpipe, which offers a stream interface for
passing data between tasks. This works for tasks in the same thread or
in different threads.

Just out of curiosity, what did you miss in vibe.d regarding
fiber based scheduling?

There is something else I forgot to mention. One scenario I'm
thinking of is having a large number of connections, like more
than 100.000, that I want to listen on. This results in a
situation with blocking I/O for all those connections. Fibers in
D are more like continuations that are distributed over several
kernel threads. The way Sean Kelly has implemented the
FiberScheduler, a fiber is invoked in case it receives an item,
like data through the connection it serves in my scenario. At
least this is the way I understood the implementation. So I can
have like 100.000 connections simultaneously, as in Go, without
having to use Go (the Go language is too simple for my taste).


In vibe.d, there are basically two modes of fiber scheduling. The usual
mode is purely driven by the event loop: once a task/fiber triggers a
blocking operation, let's say a socket receive operation, it registers
its handle for the corresponding event and calls an internal rawYield()
function. Once the event fires, the fiber is resumed.
The other mode happens when yield() (in vibe.core.core) is explicitly
called. In this case, tasks are inserted into a singly-linked list,
which is processed in chunks, alternated with a call to processEvents()
and in FIFO order, to ensure fair scheduling and to avoid blocking
event processing when tasks perform continuous computations with
intermittent yield() calls.
So the first mode AFAICS works just like Sean's fiber scheduler. And at
least on 64-bit systems, there is nothing that speaks against handling
huge numbers of connections simultaneously. 32-bit can also handle a
lot of connections with small fiber stack sizes (setTaskStackSize), but
using decently sized stacks will quickly eat up the available address
space.