If you want to share your experience with us, learn more about the technology or just have the opportunity to meet the Yakaz team along with the DataStax French team, feel free to RSVP on the dedicated Meetup page. And having a couple of drinks with nice people may be one more reason to join us...

In case you can't make it but still want to meet Yakaz software engineers and chat with them about this hot topic, you can find Fabien & Olivier at the Cassandra Summit in London on October 17th, 2012.

yamerl is able to parse YAML 1.1 and YAML 1.2, as well as JSON. Documents can come from in-memory strings, files or streams. With streams, the developer is responsible for feeding the parser with chunks of data.
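As a rough illustration of the three input sources, here is a minimal sketch. It assumes the yamerl_constr module and the function names shown in the project's README (string/1, file/1, new/1, next_chunk/2, last_chunk/2); it has not been checked against a specific release, so treat the exact calls as assumptions. The module name yamerl_sketch and the file name config.yaml are hypothetical.

```erlang
-module(yamerl_sketch).
-export([demo/0]).

%% Sketch only: requires the yamerl application to be installed.
demo() ->
    %% yamerl is an OTP application and must be started before parsing.
    {ok, _} = application:ensure_all_started(yamerl),

    %% 1. Parse an in-memory string.
    FromString = yamerl_constr:string("- one\n- two\n"),

    %% 2. Parse a file (hypothetical file name).
    FromFile = yamerl_constr:file("config.yaml"),

    %% 3. Parse a stream: the caller feeds the parser chunk by chunk.
    %%    The source name passed to new/1 is assumed to be informational.
    S0 = yamerl_constr:new({file, "<stdin>"}),
    {continue, S1} = yamerl_constr:next_chunk(S0, <<"key: ">>),
    FromStream = yamerl_constr:last_chunk(S1, <<"value\n">>),

    {FromString, FromFile, FromStream}.
```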

Besides the standard node types, the following Erlang-specific types are supported:

Erlang atoms, with the ability to autodetect them,

Erlang fun().

yamerl is distributed under the terms of the 2-clause BSD license. Code, documentation and test suite are available from GitHub: https://github.com/yakaz/yamerl.

The files needed to build a Debian package are also available in the same repository. A FreeBSD port will follow soon.

Avatar selection

If the user has set an avatar from the "Profile settings" page, we obviously display that one.

If the user hasn't set an avatar, we query Gravatar, using their e-mail address, to get one.

If Gravatar returns nothing, we default to a random "house" avatar.
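The fallback order above can be sketched as follows. This is only an illustration, not Yakaz's backend code: the module name, the two lookup funs and the list of "house" avatars are hypothetical. The Gravatar URL scheme (MD5 of the trimmed, lowercased address, with d=404 so that a missing avatar yields an HTTP 404) follows Gravatar's public documentation.

```erlang
-module(avatar_sketch).
-export([select_avatar/4, gravatar_url/1]).

%% Local :: fun((Email) -> {ok, Image} | not_found)   (hypothetical local storage)
%% Fetch :: fun((Url)   -> {ok, Image} | not_found)   (hypothetical HTTP fetch)
%% Houses :: non-empty list of default "house" avatars
select_avatar(Email, Local, Fetch, Houses) ->
    case Local(Email) of
        {ok, LocalImage} ->
            {local, LocalImage};
        not_found ->
            case Fetch(gravatar_url(Email)) of
                {ok, GravatarImage} ->
                    {gravatar, GravatarImage};
                not_found ->
                    %% rand:uniform(N) returns an integer in 1..N
                    Index = rand:uniform(length(Houses)),
                    {default, lists:nth(Index, Houses)}
            end
    end.

%% Build the Gravatar URL for an e-mail address given as a string.
gravatar_url(Email) ->
    Normalized = string:lowercase(string:trim(Email)),
    Digest = erlang:md5(Normalized),
    Hex = lists:flatten([io_lib:format("~2.16.0b", [Byte])
                         || <<Byte>> <= Digest]),
    "https://www.gravatar.com/avatar/" ++ Hex ++ "?d=404".
```

Passing the lookups as funs keeps the sketch self-contained; in a real backend they would be calls to the avatar storage and to an HTTP client.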

Security considerations

One known issue with Gravatar is that the MD5 hash of the user's e-mail address is made public. Some techniques allow one to find the original e-mail address without brute forcing anything. Therefore, on a website like Stackoverflow.com where the URL to the avatar points directly to Gravatar, an attacker could collect thousands of MD5 hashes, recover the corresponding e-mail addresses and possibly associate them with users' real names.

For Yakaz.com, we use an approach close to the one described in the "Line of defense" paragraph at the end of the article mentioned above. We use a single URL scheme to get the avatar, regardless of the source of the image. The backend tries our local avatar storage, then Gravatar and finally the default avatar. Therefore, the request to Gravatar is not visible to the user.

As explained in the article, this is not bullet-proof, but to collect e-mail addresses, the attacker would have to compare images, taking resizing and conversion into account. This would be a much slower process.

Avatar freshness

We do not query Gravatar every time an avatar is requested. We have an HTTP cache in front of the backend responsible for the avatars. Avatars from Gravatar are cached for 24 hours. Therefore, if a user changes their avatar on Gravatar, the change may take up to 24 hours to become visible on Yakaz.com.

09/15/2011

At Yakaz, Erlang is of fundamental importance. We use it, among other things, for our cluster of web servers and our chat service. Many OTP applications share the same Erlang VM, with (sometimes complex) dependencies between them.

As much as possible, we try to do hot upgrades of these applications. But, sometimes, we need to shut down a node. This is done using the OTP design principles and it must be safe: no data lost, no error thrown.

For example, when a node is stopped, our HTTP service must close its listening sockets so that no new requests are accepted, close all waiting TCP connections and reply to all HTTP requests in progress.

During our work, we found some corner cases with OTP supervisors. In this article, we explain these cases and the solutions we found to work around them.

The OTP design principles in general, and the supervisor behaviour in particular, are explained in the OTP Design Principles User's Guide. Readers should be familiar with these notions in order to fully understand this article.

Infinite timeout to shut down worker processes

The first problem we encountered is about a limitation of the supervisor behaviour. Children attached to a supervisor are themselves supervisors or workers. For each of them, a Shutdown strategy is defined. This is a part of their child specification and it defines how they should be terminated.

An integer timeout value means that the supervisor tells the child process to terminate by calling exit(Child, shutdown) and then waits for an exit signal back. If no exit signal is received within the specified time, the child process is unconditionally terminated using exit(Child, kill).

If the child process is another supervisor, it should be set to infinity to give the subtree enough time to shutdown.

This is not very explicit here, but the Shutdown value can (and should) be set to infinity for supervisor children only. This is forbidden for worker children. This is a crude limitation.
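To make the Shutdown field concrete, here is a minimal supervisor sketch using the tuple form of the child specification, reflecting the restriction as described above. The child modules my_worker and my_sub_sup are hypothetical and not part of the original article.

```erlang
-module(shutdown_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Child specification: {Id, StartFunc, Restart, Shutdown, Type, Modules}

    %% Worker: integer timeout in milliseconds. The supervisor calls
    %% exit(Child, shutdown), waits up to 5000 ms for the exit signal,
    %% then falls back to exit(Child, kill).
    Worker = {my_worker, {my_worker, start_link, []},
              permanent, 5000, worker, [my_worker]},

    %% Supervisor child: infinity gives the whole subtree the time it
    %% needs to shut down.
    SubSup = {my_sub_sup, {my_sub_sup, start_link, []},
              permanent, infinity, supervisor, [my_sub_sup]},

    {ok, {{one_for_one, 5, 10}, [Worker, SubSup]}}.
```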

To let a worker process do some cleanup when it is stopped, we must therefore set an upper bound on the time this cleanup may take. When such a bound can be determined, this is the best solution and all is fine. When it cannot, since we are not able to use an infinite timeout to shut down a worker process, we must find a workaround.

Given the circumstances, there are three ways to address this problem:

The good: Stop the concerned workers by hand. With this method, it's easy to properly shut down these processes. For an OTP application, this can be done in the prep_stop function of the application callback module. But there are two drawbacks. Firstly, the Restart strategy for the processes in question must not be set to permanent[1]. Secondly, the responsibility for stopping these processes falls to the developer (which is messy) and no longer to the supervisor.

The bad: Set a very high timeout for want of anything better. This is not elegant and it has the taste of defeat. But it works almost every time.

The ugly: Declare the concerned processes as supervisors instead of workers. There is no side effect (as far as we know) and we can set an infinite timeout. But there is no guarantee that this will always be so. This is a crafty way (and honestly, an ugly way) to solve our problem, but it serves the purpose.

As we have just seen, there is no proper and general solution to this problem, and there is no evident reason for this limitation. The cleanest solution we found at Yakaz was to patch the supervisor module to remove this restriction. You can find our patch on GitHub (Diff view). It was submitted to the Erlang/OTP team, in the hope that it will be accepted.

Shutting down dynamic children of simple_one_for_one supervisors

Another, trickier problem with supervisors concerns simple_one_for_one supervisors.

simple_one_for_one supervisors are used to manage dynamically instantiated child processes. All these children share the same child specification. This is handy to implement connection handlers: every time a new connection is accepted, we can start a new child to manage it.

But there is a subtle corner case with this type of supervisor: dynamic child processes are not explicitly killed by the supervisor when it is shut down.

The official documentation says:

Important note on simple-one-for-one supervisors: The dynamically created child processes of a simple-one-for-one supervisor are not explicitly killed, regardless of shutdown strategy, but are expected to terminate when the supervisor does (that is, when an exit signal from the parent process is received).

Because child processes are linked (in the Erlang sense of the word) to their supervisor, when the latter dies, the dynamic child processes receive an exit signal from it and terminate. All is fine as long as we stop the simple_one_for_one supervisor manually. But if this happens while we stop an application, then once the top supervisor has stopped, the application master kills all remaining processes associated with this application[2], including leftover dynamic children. So dynamic children that trap exit signals can be killed during their cleanup. This is unpredictable and highly timing-dependent.

Let's explain this behaviour in detail with an example. Here is our supervision tree:

App ---> TopSup ---> Sup ---> [SimpleOneForOneWorkers]
           |
       SomeWorker

Suppose that:

[SimpleOneForOneWorkers] are implemented using the gen_server behaviour and they trap exit signals

No brutal_kill shutdown strategy is used

If Sup is shut down by hand by calling supervisor:terminate_child(TopSup, Sup), TopSup will tell Sup to terminate by calling exit(Sup, shutdown). Once Sup is dead, all [SimpleOneForOneWorkers] receive an 'EXIT' message from it. Because they trap exit signals, their Worker:terminate/2 function is called with Reason=shutdown and, in turn, they die. This is the good case.

However, if instead of shutting down Sup by hand we stop the application with application:stop(App), this will stop TopSup. During its termination, TopSup will stop SomeWorker and Sup, then die. At this point, TopSup, SomeWorker and Sup are fully stopped and [SimpleOneForOneWorkers] are stopping (some may already be stopped, some not). As a last step, the application master kills all workers that are not dead yet, in the middle of their cleanup.

This behaviour is very troublesome and hard to debug. An easy way to fix this problem can be found in Agner. It uses the ApplicationCallback:prep_stop(State) function to fetch a list of all the simple_one_for_one workers and monitor them, and then waits for all of them to die in the ApplicationCallback:stop(State) function. This forces the application master to stay alive until all of the dynamic children have died.
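The prep_stop/stop technique described above can be sketched as follows. This is not Agner's actual code, only a minimal illustration of the idea. The names my_app, my_top_sup and my_simple_sup (the registered name of the simple_one_for_one supervisor) are hypothetical. The sketch relies on prep_stop/1 and stop/1 being called from the same process, so that the 'DOWN' messages for the monitors set up in prep_stop/1 arrive in the mailbox read by stop/1.

```erlang
-module(my_app).
-behaviour(application).
-export([start/2, prep_stop/1, stop/1]).

start(_Type, _Args) ->
    %% Hypothetical top supervisor of the application.
    my_top_sup:start_link().

%% Called before the supervision tree is shut down: monitor every
%% dynamic child while the simple_one_for_one supervisor is still alive.
prep_stop(State) ->
    Children = supervisor:which_children(my_simple_sup),
    Pids = [Pid || {_Id, Pid, _Type, _Mods} <- Children, is_pid(Pid)],
    Refs = [erlang:monitor(process, Pid) || Pid <- Pids],
    %% The return value of prep_stop/1 is what stop/1 receives.
    {State, Refs}.

%% Called after the supervision tree is shut down: block until every
%% monitored child has actually terminated.
stop({_State, Refs}) ->
    wait_for_down(Refs).

wait_for_down([]) ->
    ok;
wait_for_down([Ref | Rest]) ->
    receive
        {'DOWN', Ref, process, _Pid, _Reason} ->
            wait_for_down(Rest)
    end.
```

Note that this sketch waits without a timeout. A child that never terminates would block the application shutdown, so a real implementation may want to add an upper bound.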

This solution is elegant and has no technical drawback. But it must be done for every simple_one_for_one supervisor, which is painful and error-prone. Nevertheless, we can live with that.

The main problem here is that this breaks the promises made by the supervisor behaviour. Its purpose is to start, stop and monitor its child processes, restarting them when necessary. There is no reason to deal with dynamic child processes differently from other child processes. It can be seen as a bug or a lack of consistency. So, again, we decided to patch the supervisor module. You can find our patch on GitHub (Diff view). It was submitted to the Erlang/OTP team, in the hope that it will be accepted.

[1] Because the processes are stopped manually, outside the supervisor's control, the supervisor will try to restart them if their Restart strategy is set to permanent.
[2] i.e. all processes with the application master as group leader

05/12/2011

Yaws is a high-performance HTTP 1.1 web server particularly well suited for dynamic-content web applications. Two separate modes of operation are supported:

Standalone mode where Yaws runs as a regular webserver daemon. This is the default mode.

Embedded mode where Yaws runs as an embedded webserver in another Erlang application.

Yaws is entirely written in Erlang, and furthermore it is a multithreaded webserver where one Erlang lightweight process is used to handle each client.

The main advantages of Yaws compared to other Web technologies are performance and elegance. The performance comes from the underlying Erlang system and its ability to handle concurrent processes in an efficient way. Its elegance comes from Erlang as well. Web applications don't have to be written in ugly ad hoc languages.

A few benchmarks later, we decided to use it for the Yakaz website, as a replacement for Apache, precisely for these reasons (and because we have some skills in Erlang).

After 7 months in production, it has proven that it scales very well and that it is a very stable server. But to meet all our needs, we had to patch it to add some missing features and to fix some bugs or unexpected behaviours.

You can find our fork of Yaws, with all our updates, on GitHub. Feel free to get it; feedback is welcome.
For more information on our patches, see the branches overview.

UPDATE (05-May-2011): All our modifications were integrated into Yaws-1.90. Thanks to Steve Vinoski and Claes Wikstrom.

05/02/2011

When your goal is to integrate a high-performance text-based search engine and a large number of location-dependent classified ads, one of the questions that arises is: how do you efficiently query and manipulate any number of these ads within an area of any shape and size?

There are a number of online solutions, but we wanted an in-house system, mainly for performance reasons, that was as close as possible to the Exalead search layer. With the rest of the engineering team at Yakaz.com, we chose to actually put it inside the search layer. Out of the box, Exalead supports numerical queries with the regular comparison operators (<, >...), so using a classical coordinate system (latitude/longitude, Mercator...) is feasible. Fairly obviously though, this cannot be used efficiently in production and, as speed is paramount, we had to come up with a different way of doing things.

The most common spatial indexing technique is the R-tree (see R-tree@wikipedia) and its derivatives. Although efficient, the "shape" of the tree depends on the data and evolves when data is added or removed. But remember, this has to be fed to a full-text search engine: we cannot rebalance the tree whenever a new point is added. Therefore we need to store and retrieve objects with an a priori knowledge of the tree structure. The next obvious step is to look at the quadtree structure, and we took a very good look...
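To illustrate what "a priori knowledge of the tree structure" can mean, here is a generic sketch of a fixed-grid quadtree key. It is not Yakaz's actual scheme, which this excerpt does not describe; the module name, the digit numbering and the use of raw latitude/longitude bounds are all assumptions made for the example. The point is that the key of a location depends only on its coordinates, never on the other indexed points, so nothing has to be rebalanced, and a cell's key is a prefix of the keys of all its sub-cells.

```erlang
-module(quadkey_sketch).
-export([key/3]).

%% key(Lat, Lon, Depth) returns a string of Depth digits in "0".."3".
%% Each digit selects one quadrant of the cell chosen by the previous
%% digits, starting from the whole latitude/longitude range.
key(Lat, Lon, Depth) when is_integer(Depth), Depth >= 0 ->
    key(Lat, Lon, Depth, {-90.0, 90.0}, {-180.0, 180.0}, []).

key(_Lat, _Lon, 0, _LatRange, _LonRange, Acc) ->
    lists:reverse(Acc);
key(Lat, Lon, Depth, LatRange, LonRange, Acc) ->
    {LatBit, NewLatRange} = split(Lat, LatRange),
    {LonBit, NewLonRange} = split(Lon, LonRange),
    %% Digit 0..3: latitude half in the high bit, longitude half in the low bit.
    Digit = $0 + LatBit * 2 + LonBit,
    key(Lat, Lon, Depth - 1, NewLatRange, NewLonRange, [Digit | Acc]).

%% Halve a range and report which half the value falls into.
split(Value, {Min, Max}) ->
    Mid = (Min + Max) / 2,
    case Value >= Mid of
        true  -> {1, {Mid, Max}};
        false -> {0, {Min, Mid}}
    end.
```

With such keys, "all ads inside this cell" becomes a prefix match on a text token, which is the kind of query a full-text engine handles well.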

03/10/2011

It has been almost six years since Yakaz was created. Many people have joined the development team to improve the site and we hope you like it! To hold to the ideal of an open web with open standards, we favor the use and enhancement of open source projects. In this context, our precious geeks want to publish some articles commenting on various technical subjects. Long gone are the incomprehensible pages of code that look like monitors from the Matrix movies. From now on, every month, Yakaz engineers will speak freely about parts of their work and will try to help the whole community on the web. We hope you will enjoy reading these articles.

Feel free to respond and ask questions; our engineers will be there to assist!