05/12/2011

Yaws is a high-performance HTTP 1.1 webserver particularly well suited for dynamic-content web applications. Two separate modes of operation are supported:

Standalone mode where Yaws runs as a regular webserver daemon. This is the default mode.

Embedded mode where Yaws runs as an embedded webserver in another Erlang application.

Yaws is entirely written in Erlang, and furthermore it is a multithreaded webserver where one Erlang lightweight process is used to handle each client.

The main advantages of Yaws compared to other Web technologies are performance and elegance. The performance comes from the underlying Erlang system and its ability to handle concurrent processes in an efficient way. Its elegance comes from Erlang as well. Web applications don't have to be written in ugly ad hoc languages.

After some benchmarks, we decided to use it for the Yakaz website as a replacement for Apache, precisely for these reasons (and because we have some skills in Erlang).

After 7 months in production, it has proven that it scales very well and that it is a very stable server. But to meet all our needs, we had to patch it to add some missing features and to fix some bugs or unexpected behaviours.

You can find our fork of Yaws, with all our updates, on GitHub. Feel free to get it; feedback is welcome.
For more information on our patches, see the branches overview.

UPDATE (05-May-2011): All our modifications were integrated into Yaws-1.90. Thanks to Steve Vinoski and Claes Wikstrom.

05/02/2011

When your goal is to integrate a high-performance text-based search engine with a large number of location-dependent classified ads, one of the questions that arises is: how do you efficiently query and manipulate any number of these ads within an area of any shape and size?

There are a number of online solutions, but we wanted an in-house system, mainly for performance reasons, that was as close as possible to the Exalead search layer. With the rest of the engineering team at Yakaz.com, we chose to actually put it inside the search layer. Out of the box, Exalead supports numerical queries with the regular comparison operators (<, >...), so using a classical coordinate system (latitude/longitude, Mercator...) is feasible. Fairly obviously, though, this cannot be used efficiently in production and, as speed is paramount, we had to come up with a different way of doing things.
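To make the coordinate-based approach concrete, here is a minimal Python sketch of a rectangle filter built only from comparison operators. The `ads` list, its field names and the `in_bounding_box` function are made up for illustration; this is not Exalead query syntax.

```python
def in_bounding_box(ad, south, west, north, east):
    """True when the ad lies inside the latitude/longitude rectangle.

    Assumes the rectangle does not cross the 180th meridian.
    """
    return south <= ad["lat"] <= north and west <= ad["lon"] <= east


# Hypothetical ads; coordinates are approximate city centres.
ads = [
    {"id": 1, "lat": 48.8566, "lon": 2.3522},   # Paris
    {"id": 2, "lat": 45.7640, "lon": 4.8357},   # Lyon
    {"id": 3, "lat": 51.5074, "lon": -0.1278},  # London
]

# Rough rectangle around the Paris region.
matches = [ad["id"] for ad in ads if in_bounding_box(ad, 48.0, 1.5, 49.5, 3.5)]
```

Each query of this kind needs two numerical range conditions per ad and only describes a rectangle; an area of any other shape needs extra work on top.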

The most common spatial indexing technique is the R-tree (see R-tree@wikipedia) and its derivatives. Although efficient, the "shape" of the tree depends on the data and evolves when data is added or removed. But remember, this has to be fed to a full-text search engine: we cannot rebalance the tree whenever a new point is added. Therefore we need to store and retrieve objects with an a priori knowledge of the tree structure. The next obvious step is to look at the Quadtree structure, and we took a very good look...
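A fixed-depth quadtree over the whole coordinate range has that a priori property: the cell containing a point can be computed from the coordinates alone, without looking at any other data. The Python sketch below illustrates the general idea only; the function name, the digit numbering and the choice of depth are assumptions, not a description of the system actually built at Yakaz.

```python
def quadtree_key(lat, lon, depth):
    """Return a string of `depth` digits (0-3) naming the cell that holds the point.

    Each digit picks one quadrant of the current cell:
    +1 for the eastern half, +2 for the northern half.
    """
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2.0
        mid_lon = (west + east) / 2.0
        quadrant = 0
        if lon >= mid_lon:
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


# Approximate coordinates of Paris.
paris_coarse = quadtree_key(48.8566, 2.3522, 3)
paris_fine = quadtree_key(48.8566, 2.3522, 8)
```

Because the subdivision is fixed in advance, a deeper key always starts with the shallower key of the same point, so a key (or its prefixes) can be stored as plain text and matched like any other term, and adding a new point never changes the keys of existing ones.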