Category Archives: Back-end


This post is an introduction to a useful tool here at Intersec, a tool that we call IOP: the Intersec Object Packer.

IOP is our take on the IDL approach. It is a method to serialize structured data to use in our communication protocols or data storage technologies. It is used to transmit data over the network in a safe manner, to exchange data between different programming languages or to provide a generic interface to store (and load) C data on disk. IOP provides data integrity checking and backward-compatibility.

The concept of IDL is not new. There are a lot of different available languages, such as Google Protocol Buffers or Thrift. IOP itself isn’t new either: its initial version was written in 2008 and it has evolved a lot during its almost decade-long life. However, IOP has proven solid and well enough designed that it has not needed any backward-incompatible change during that period.

IOP package description

The first thing to do with IOP is to declare the data structures in the IOP description language. With those definitions, our IOP compiler will automatically create all the helpers needed to use these IOP data structures in different languages and to allow serialization and deserialization.

Data structure declaration is done in a C-like syntax (actually, it is almost the D language syntax) and lives inside a .iop file. As a convention, we use CamelCase in our .iop files (which is different from the coding rules of our .c files).

Let’s look at a quick example:

struct User {
    int    id;
    string name;
};

Here we are. An IOP object with two fields: an id (as an integer) and a name (as a string). Obviously, it is possible to create much more complex structures. To do so, here is the list of available types for our structure fields.

Basic types

IOP allows several low-level types to be used to define object members. One can use the classics:

int/uint (32-bit signed/unsigned integer)

long/ulong (64-bit signed/unsigned integer)

byte/ubyte (8-bit signed/unsigned integer)

short/ushort (16-bit signed/unsigned integer)

bool

double

string

and also the types:

bytes (a binary blob)

xml (for an XML payload)

void (to specify a lack of data).

Complex types

Four complex data types are also available for our fields.

Structures

The structure describes a record containing one or more fields. Each field has a name and a type. To see what it looks like, let’s add an address to our user data structure:

struct Address {
    int    number;
    string street;
    int    zipCode;
    string country;
};

struct User {
    int     id;
    string  name;
    Address address;
};

Of course, there is no theoretical limitation on the number of struct “levels”. A struct can have a struct field which also contains a struct field etc.

Classes

A class is an extendable structure type. A class can inherit from another class, creating a new type that adds new fields to the ones present in its parent class.

We will see classes in more detail in a separate article.

Unions

A union is a list of possibilities. Its description is very similar to a structure: it has typed fields, but only one of the fields is defined at a time. The name union is inherited from C since the concept is very similar to C unions. However, IOP unions are tagged, which means we do know which of the fields is defined.

Example:

union MyUnion {
    int    wantInt;
    string wantString;
    User   wantUser;
};
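To make the "tagged" part concrete, here is a minimal C sketch of what a tagged union looks like. It is only an illustration under our own assumptions: the type and function names (my_union_sketch_t, my_union_get_int) are hypothetical and this is not the code generated by the IOP compiler.

```c
#include <stdint.h>

/* Hypothetical names: NOT the output of the IOP compiler. */
typedef enum my_union_tag_t {
    MY_UNION_WANT_INT,
    MY_UNION_WANT_STRING,
} my_union_tag_t;

typedef struct my_union_sketch_t {
    my_union_tag_t tag;          /* tells which member of `u` is defined */
    union {
        int32_t     want_int;
        const char *want_string;
    } u;
} my_union_sketch_t;

/* Return the integer payload, or `dflt` when another member is selected. */
int32_t my_union_get_int(const my_union_sketch_t *val, int32_t dflt)
{
    return val->tag == MY_UNION_WANT_INT ? val->u.want_int : dflt;
}
```

Unlike a bare C union, the reader never has to guess which member is valid: the tag is checked before the payload is touched.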

Enumeration

The last type that can be used is the enumeration. Here again, an IOP enum is similar to the C enum. It defines several literal keys associated with integer values. Just like the C enum, the IOP enum supports the whole integer range for its values.

Example:

enum MyEnum {
    VALUE_1 = 1,
    VALUE_2 = 2,
    VALUE_3 = 3,
};

Member constraints

Now that we have all the types we need for our custom data structure fields, it’s time to add some new features to them, in order to gain flexibility. Those features are called constraints. These constraints are qualifiers for IOP fields. For now, we have 4 different constraints: optional, repeated, with a default value and the implicit mandatory constraint.

Mandatory

By default, a member of an IOP structure is mandatory. This means it must be set to a valid value in order for the structure instance to be valid. In particular, you must guarantee the field is set before serializing/deserializing the object. By default, mandatory fields are value fields in the generated C structure: this means the value is inlined in the structure type and is copied. There are some exceptions to this rule, but we will see them later.

The example is pretty simple:

struct Foo {
    int mandatoryInteger;
};

Optional members

An optional member is indicated by a ? following the data type. The packers/unpackers allow these members to be absent without generating an error.

struct Foo {
    int? optionalMember;
    Bar? optionalMember2;
    int  mandatoryInteger;
};

Repeated members

A repeated member is a field that can appear zero or more times in the structure (often represented by an array in the programming languages). As such a repeated field is optional (can be present 0 times). A repeated member is indicated by a “[]” following the data type.

In the next example, you can consider the repeatedInteger field as a list of integers.

struct Foo {
    int[] repeatedInteger;
    int?  optionalMember;
    Bar?  optionalMember2;
    int   mandatoryInteger;
};

With default value

A field with a default value is a kind of mandatory member but allowed to be absent. When the member is absent, the packer/unpacker always sets the member to its default value.

A member with a default value is indicated by setting the default value after the field declaration.

struct Foo {
    int   defaultInteger = 42;
    int[] repeatedInteger;
    int?  optionalMember;
    Bar?  optionalMember2;
    int   mandatoryInteger;
};

Moreover, it is allowed to use arithmetic expressions on integer (and enum) member types like this:

struct Foo {
    int   defaultInteger = 2 * (256 << 20) + 42;
    int[] repeatedInteger;
    int?  optionalMember;
    Bar?  optionalMember2;
    int   mandatoryInteger;
};

IOP packages

The last thing to know to be able to write our first IOP file is about packages.

An IOP file corresponds to an IOP package. Basically, the package is a kind of namespace for the data structures you are declaring. The filename must match the package name. Every IOP file must define its package name like this:

package foo; /*< package name of the file foo.iop */

struct Foo {
    [...]
};

[...]

A package can also be a sub-package, like this:

package foo.bar; /*< package name of the file foo/bar.iop */

struct Bar {
    [...]
};

[...]

Finally, you can import objects from another package by specifying the package name before the type:

package plop; /*< package name of the file plop.iop */

struct Plop {
    foo.bar.Bar bar;
};

[...]

How to use IOP

Before going to more complicated features on IOP, let’s see a simple example of how to use our new custom data structures that we just declared.

When compiling our code, a first pass is done on our IOP files using our own compiler. This compiler parses the .iop files and generates the corresponding C source files, which provide helpers to serialize/deserialize our data structures. Here again, we will see it in more detail soon.

Let’s see an example of code which is using IOP. First, let’s assume we have declared a new IOP package:

package user;

struct UserAddress {
    string street;
    int?   zipCode;
    string city;
};

struct User {
    ulong       id = 1;
    string      login;
    UserAddress addr;
};

This will create several C files containing the type descriptors used for data serialization/deserialization as well as the C types declarations:

struct user__user_address__t {
    char *street;       /*< Actually a slightly more complicated type is used for
                         * strings, but no need to be too specific here :)
                         */
    opt_i32_t zip_code;
    char *city;
};

struct user__user__t {
    uint64_t id;
    char *login;
    struct user__user_address__t addr;
};

Not very different from the IOP file, right? Still, we can notice a few unusual things:

The opt_i32_t type for zip_code. This is how we handle optional fields. It is a structure containing a 32-bit integer plus a boolean indicating whether the field is set or not.

The structure names are now in snake_case instead of CamelCase. The name of the package is added as a prefix to each structure, and there is a __t suffix too. This helps to recognize IOP structures when we meet one in our C code.
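To illustrate the "value plus presence flag" idea behind opt_i32_t, here is a minimal sketch. The type opt_i32_sketch_t, its field names and the three helpers are assumptions made for this example; the real opt_i32_t definition is not shown in this post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for opt_i32_t; the field names are assumptions. */
typedef struct opt_i32_sketch_t {
    int32_t v;          /* the value, meaningful only when has_field is true */
    bool    has_field;  /* true when the optional field is set */
} opt_i32_sketch_t;

/* Set the optional field to `v`. */
void opt_i32_set(opt_i32_sketch_t *opt, int32_t v)
{
    opt->v = v;
    opt->has_field = true;
}

/* Mark the optional field as absent. */
void opt_i32_clear(opt_i32_sketch_t *opt)
{
    opt->has_field = false;
}

/* Read the field, falling back to `dflt` when it is absent. */
int32_t opt_i32_get_or(const opt_i32_sketch_t *opt, int32_t dflt)
{
    return opt->has_field ? opt->v : dflt;
}
```

Keeping the flag next to the value means an absent field costs no allocation, and "not set" can never be confused with a legitimate 0.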

All the code generated by our compiler will be available through a user.iop.h file.

Now let’s play with it in our code:

#include "user.iop.h"

[...]

int my_func(void) {
    user__user__t user;

    /* This function will initialize all the fields (and sub-fields) of the
     * structure, according to the IOP declarations. Here, everything will be set
     * to 0/NULL except the field "id", which will contain the value "1". The first
     * argument indicates the package + structure name of our IOP object.
     */
    iop_init(user__user, &user);

    /* This function will pack our IOP structure into an IOP binary format and
     * return a pointer to the created buffer containing the packed structure.
     * The structures will be packed in order to use as little memory as possible.
     * Let's put aside the memory management questions for this post.
     */
    void *binary_data = iop_bpack(user__user, &user);

    /* This call must have failed. Our constraints are not respected, as several
     * mandatory fields were not correctly filled.
     */
    assert(binary_data == NULL);

    user.addr.street = "221B Baker Street";
    user.addr.city = "London";
    user.login = "SH";

    binary_data = iop_bpack(user__user, &user);

    /* This one should be the good one. Even if "id" field and "addr.zip_code" are
     * not filled, it is not a problem as the first one got a default value and
     * the second one is an optional field.
     */
    assert(binary_data != NULL);

    /* Now we can do whatever we want with these data (writing it on disk for
     * example). But for now, let's just try to unpack it. Here again, put a
     * blindfold about memory management.
     */
    user__user__t user2;
    int res = iop_bunpack(binary_data, user__user, &user2);

    /* Unpacking should have been successful, and we now have a "user2" struct
     * identical to "user" struct.
     */
    assert(res >= 0);

    return 0;
}

Here we are. IOP gave us the superpower of packing/unpacking data structures in a binary format in two simple function calls. These binary packed structures can be used for disk storage. But as we will see in a future article, we also use it for our network communications.

About multi-process programming

In modern software engineering, you quickly reach the point where one process cannot handle all the tasks by itself. For performance, maintainability or reliability reasons, you have to write multi-process programs. You may also reach the point where you want your programs to speak to each other. This situation raises the question: how will my processes “talk” to each other?

If you have already written such programs in C, you are probably familiar with the concept of network sockets. Those are handy (at least compared to dealing with the TCP/IP layer yourself): they offer an abstraction layer and give you endpoints for sending and receiving data from one process to another. But some issues quickly arise:

How to handle many-to-many communications?

How to scale the solution?

How to have a clean code that doesn’t have to handle many direct connections and painful scenarios like disconnection/re-connection?

How can I handle safely all the corner cases with blocking/non-blocking reads/writes?

Almost every developer or every company has its own way to answer those questions, with the development of libraries responsible for communications between processes.

Of course, we do have our own solution too.

So let’s take a look at what we call MiddleWare, our abstraction layer to handle communication between our processes and software instances.

What is MiddleWare?

At Intersec, the sockets were quickly replaced by a first abstraction layer called ichannels. These channels basically simplify the creation of sockets, but we still deal with a point-to-point communication. So we started the development of MiddleWare, inspired by the works of iMatix on ØMQ.

First, let’s see how things were done before MiddleWare:

As you can see, every daemon or process had to open a direct connection to the other daemon it wanted to talk to, which leads to the issues described above.

Now, after the introduction of our MiddleWare layer:

So what is MiddleWare about? MiddleWare offers an abstraction layer for developers. With it, there is no need to manage connections and handle scenarios such as disconnection/re-connection anymore. We now communicate with services or roles, not with processes or daemons.

MiddleWare is in charge of finding where the receiver is located and routing the message accordingly.

This solves many of the problems we were talking about earlier: the code of a daemon focuses on the applicative part, not on the infrastructure / network management part. It is now possible to have many-to-many communications (sending a message to N daemons implementing the same role) and the solution is scalable (no need to create multiple direct connections when adding a new service).

Services vs roles

MiddleWare is able to do service routing and/or role routing. A service is basically a process: the user can specify a host identifier and an instance identifier to get a channel to a specific instance of a service.

Processes can also expose roles: a role is a contract that associates a name with a duty and an interface. Example: "DB:master" can be the role of the master of the database, the one which can write in it, whereas "DB:slave" can be the role of a slave of the database, which has a read-only replica of it. One can also imagine a "User-list:listener" role, for example, which allows registering a callback for any user-list update.

Roles dissociate the processes from the purpose and allow extensibility of the software by allowing run-time additions of new roles in existing processes. Roles can be associated with a constraint (for example “unique” in cluster/site).

Those roles can also be attached to a module, as described in one of our previous posts. As modules can be easily rearranged, this adds another layer of abstraction between the code and the actual topology of the software.

Some examples from the API

What does an API for such a feature look like?

As described above, one of the main ideas of MiddleWare is to ease the handling of inter-process communication, and to let developers focus on the applicative part of what they are doing. So it’s important to have very few steps to use the “basic” features: create a role if needed, create a channel, use it and handle replies.

And here you are, no need to do more: no connection management, no need to look for the location of the service and the right network address in the product configuration. A simple function call gives you a mw_channel_t pointer you can use to send messages. The first argument is what we call a service at Intersec (as said above, it is basically a process). Here we just want to have a channel to our DB service. The second and third arguments indicate a host identifier and an instance identifier, if we want to target a specific instance of this service. Here, we just want a channel that targets all the available instances of the DB service, by specifying -1 as both host and instance ids. Finally, the last argument indicates whether a direct connection is needed or not, but we will come back to this later.

Now let’s see some roles. Processes can register/unregister a role with this kind of API:

mw_register_role("db:master");
mw_unregister_role("db:master");

Pretty simple, isn’t it? All you need to do is give a name to your role. If we want to use a more complex role, with a unique in cluster constraint, we do have another function to do so:

mw_register_unique_role("db:master", role_cb);

The only difference is the need for a callback, which takes as arguments the name of the role and an enum value. This enum represents the status of the role. The callback is called when MiddleWare grants the role to a process: the new owner gets a MW_ROLE_OWNER status in its callback, the others get the MW_ROLE_TAKEN value.
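As an illustration, a callback matching that description could look like the sketch below. Only the MW_ROLE_OWNER and MW_ROLE_TAKEN names come from the text above; the enum type name, the exact callback signature and the g_is_db_master flag are assumptions made for this example, since the MiddleWare headers are not shown here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical declarations mirroring the description above; the real
 * MiddleWare types may differ. */
typedef enum mw_role_status_sketch_t {
    MW_ROLE_OWNER,  /* this process has been granted the role */
    MW_ROLE_TAKEN,  /* another process owns the role */
} mw_role_status_sketch_t;

/* Application state updated by the callback (assumption for the demo). */
bool g_is_db_master;

/* Called with the role name and its new status. */
void role_cb(const char *role, mw_role_status_sketch_t status)
{
    if (strcmp(role, "db:master") != 0) {
        return; /* not a role this process cares about */
    }
    g_is_db_master = (status == MW_ROLE_OWNER);
}
```

The process can then enable its write path only while g_is_db_master is true, and fall back to read-only behavior otherwise.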

On the client side, if you want to talk to the process holding a role, all you have to do is:

mw_channel_t *chan = mw_new_channel_to_role("db:master", false);

And chan can now be used to send messages to our process which registered the "db:master" role.

How does this (wonderful) functionality work?

The key of MiddleWare is its routing tables. But to understand how it works, I need to introduce to you another concept of our product at Intersec: the master-process. No doubt it will ring a bell, as it is a common design pattern.

In our product, a single process is responsible for launching every sub-process and for monitoring them. This process is called the master process. It does not do much, but our products could not work without it. It detects when one of its children goes down and relaunches it if needed. It also handles communications with other software instances.

Now that you know what a master is in our Intersec environment, let’s go back to MiddleWare and its routing tables.

By default, the routing is done by our master process: every message is transmitted to the master which forwards it to the right host and then the right process.

The master maintains routing tables in order to be resilient to network connectivity issues. Those routing tables are built using a path-vector-like algorithm.
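To give an idea of what "path-vector" means, here is a generic sketch of the core rule of such algorithms: a route carries the full list of hosts it crosses, which lets a host reject any route that already goes through itself. This is a textbook illustration with hypothetical names (route_t, route_accept), not the actual MiddleWare implementation.

```c
#include <stdbool.h>

#define MAX_PATH_LEN 8

/* A route to `dest`. The path lists the hosts to cross, starting with the
 * neighbor that advertised the route and ending with `dest`. */
typedef struct route_t {
    int dest;
    int path[MAX_PATH_LEN];
    int path_len;            /* 0 means "no route known" */
} route_t;

static bool path_contains(const route_t *r, int host)
{
    for (int i = 0; i < r->path_len; i++) {
        if (r->path[i] == host) {
            return true;
        }
    }
    return false;
}

/* Consider the route `adv` advertised by a neighbor of host `self`.
 * Returns true when `cur` was replaced by the advertised route. */
bool route_accept(route_t *cur, int self, const route_t *adv)
{
    if (adv->dest != cur->dest || adv->path_len == 0) {
        return false;               /* unrelated or empty advertisement */
    }
    if (path_contains(adv, self)) {
        return false;               /* would create a routing loop */
    }
    if (cur->path_len != 0 && cur->path_len <= adv->path_len) {
        return false;               /* the current route is at least as short */
    }
    *cur = *adv;                    /* next hop is now cur->path[0] */
    return true;
}
```

When a link goes down, the routes going through it are dropped and the next advertisement from another neighbor fills the gap, which is what makes this kind of table resilient to connectivity issues.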

So let’s take a look at another picture, which shows the communication in more detail:

As we can see, MiddleWare opens connections between each master process and its children. There are also connections between the masters. From the developer’s standpoint, this is completely transparent. One can ask for a channel from the Core daemon to the Connector one, or a channel between the two Computation daemons for example, and then start to send/receive messages on these channels. MiddleWare will route these messages from the child lib to the master on the same host, then to the master on the receiving host, to finally transfer them to the destination process.

In case you expect a large amount of data to go through a channel, it is still possible to ask for a direct connection to a process during the creation of that channel. MiddleWare will still handle all the connection management complexity and from that point, everything will work exactly the same. Note that in our implementation we never have the guarantee that a message will go through a direct link, as MiddleWare will still route the queries through the master if the direct link is not ready yet. Moreover, every communication from a service to another will use the direct link as soon as it exists.

Tradeoffs

Having such a layer in a piece of software does not come without some drawbacks. The use of MiddleWare creates an overhead, which is the cost of the abstraction: maintaining the routing tables adds a bit of traffic each time a process starts or stops, or when roles are registered or unregistered.

As start-up and shutdown are not critical parts of the execution for us, it is fine to have a small overhead here. In the same way, role registrations are not frequent, so it is not an issue to add some operations during this step.

Finally, high traffic may put some load on our master process, which must route the messages. This is not a big issue either, as our master does not do much besides message routing. The main responsibility of this process is to monitor its children: no complex calculation or time-consuming operations here. Moreover, if heavy traffic is expected between two daemons, it is good practice to ask for a direct link. This decreases the load on the master and therefore the risk of impacting MiddleWare.

We do write complex software, and like everyone doing so, we need a way to structure our source code. Our choice was to go modular. A module is a singleton that defines a feature or a set of related features and exposes some APIs in order for other modules to access these features. Everything else, including the details of the implementation, is private. Examples of modules are the RPC layer, the threading library, the query engine of a database, the authentication engine, … Then, we can compose the various daemons of our application by including the corresponding modules and their dependencies.

Most modules maintain an internal state and as a consequence they have to be initialized and deinitialized at some point in the lifetime of the program. An internal rule at Intersec is to name the constructor {module}_initialize() and the destructor {module}_shutdown(). Once defined, these functions have to be called, and this is where everything becomes complicated when your program has tens of modules with complex dependencies between them.
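One possible way to tame that complexity is to describe each module with its dependencies and let a small helper call the constructors and destructors in the right order. The sketch below is an illustration under our own assumptions: module_t, module_require() and module_release() are hypothetical names, and the two demo modules only record the order in which they are called.

```c
#include <stddef.h>

#define MAX_DEPS 4

typedef struct module_t {
    const char      *name;
    int            (*initialize)(void);
    void           (*shutdown)(void);
    struct module_t *deps[MAX_DEPS];  /* NULL-terminated unless full */
    int              refcnt;          /* how many users required the module */
} module_t;

/* Drop one reference; shut the module down, then its dependencies in
 * reverse order, when the last reference goes away. */
void module_release(module_t *m)
{
    int n = 0;

    if (--m->refcnt > 0) {
        return;
    }
    m->shutdown();
    while (n < MAX_DEPS && m->deps[n]) {
        n++;
    }
    while (--n >= 0) {
        module_release(m->deps[n]);
    }
}

/* Initialize the dependencies first, then the module itself. */
int module_require(module_t *m)
{
    int i;

    if (m->refcnt > 0) {
        m->refcnt++;
        return 0;
    }
    for (i = 0; i < MAX_DEPS && m->deps[i]; i++) {
        if (module_require(m->deps[i]) < 0) {
            goto rollback;
        }
    }
    if (m->initialize() < 0) {
        goto rollback;
    }
    m->refcnt = 1;
    return 0;

  rollback:
    while (--i >= 0) {
        module_release(m->deps[i]);
    }
    return -1;
}

/* Demo modules: "rpc" depends on "thr"; calls are logged in g_order. */
static int  g_order[8], g_count;
static int  thr_initialize(void) { g_order[g_count++] = 1; return 0; }
static void thr_shutdown(void)   { g_order[g_count++] = -1; }
static int  rpc_initialize(void) { g_order[g_count++] = 2; return 0; }
static void rpc_shutdown(void)   { g_order[g_count++] = -2; }

module_t thr_module = { "thr", thr_initialize, thr_shutdown, { NULL }, 0 };
module_t rpc_module = { "rpc", rpc_initialize, rpc_shutdown, { &thr_module }, 0 };
```

Requiring rpc_module initializes thr first and rpc second; releasing it shuts them down in the opposite order, without the caller having to know the dependency graph.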

Introduction

Back in 2009, Snow Leopard was quite an exciting OS X release. It didn’t focus on new user-visible features but instead introduced a handful of low-level technologies. Two of those technologies, Grand Central Dispatch (a.k.a. GCD) and OpenCL, were designed to help developers benefit from the new computing power of modern computer architectures: multicore processors for the former and GPUs for the latter.

Alongside the GCD engine came a C language extension called blocks. Blocks are the C-based flavor of what is commonly called a closure: a callable object that captures the context in which it was created. The syntax for blocks is very similar to the one used for function pointers, with the exception that the pointer star * is replaced by a caret ^. This allows inline definition of callbacks, which can often improve the readability of the code.
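For comparison, this is how a closure is usually emulated in plain C without blocks: the callback is a function pointer and the "captured" context travels separately through a void pointer. The names below (count_if, greater_than, count_greater_than) are made up for this sketch.

```c
#include <stddef.h>

/* A predicate callback receiving its context explicitly. */
typedef int (*int_pred_f)(int value, void *ctx);

/* Count the elements of `tab` accepted by `pred`. */
size_t count_if(const int *tab, size_t len, int_pred_f pred, void *ctx)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (pred(tab[i], ctx)) {
            n++;
        }
    }
    return n;
}

/* The predicate has to live far from its call site, and it must unpack
 * its context by hand. */
int greater_than(int value, void *ctx)
{
    const int *threshold = ctx;

    return value > *threshold;
}

size_t count_greater_than(const int *tab, size_t len, int threshold)
{
    /* `threshold` is "captured" by passing its address as the context. */
    return count_if(tab, len, greater_than, &threshold);
}
```

With blocks, and assuming a variant of count_if that takes a block parameter, the predicate could be written inline at the call site as ^(int value) { return value > threshold; }, with threshold captured automatically.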

Introduction

The most used custom allocators at Intersec are the FIFO and the Stack allocators, detailed in a previous article. The stack allocator is extremely convenient, thanks to the t_scope macro, and the FIFO is well fitted to some of our use cases, such as inter-process communication. It is thus important for these allocators to be optimized extensively.

We are two interns at Intersec, and our objective for this 6-week internship was to optimize these allocators as far as possible. Optimizing an allocator can have several meanings: it can be in terms of memory overhead, resistance to contention, performance… As the FIFO allocator is designed to work in single-threaded environments, and the t_stack is thread-local, we will only cover performance and memory overhead.

In the third post of the memory series we briefly explained locality and why it is an important principle to keep in mind while developing a memory-intensive program. This new post is going to be more concrete and explains what actually happens behind the scene in a very simple example.

This post is a follow-up to a recent interview with a (brilliant) candidate. As a subsidiary question, we presented him with the following two structure definitions:

struct foo_t {
    int len;
    char *data;
};

struct bar_t {
    int len;
    char data[];
};

The question was: what is the difference between these two structures, and what are the pros and cons of each of them? For the remainder of the article we will suppose we are working on an x86_64 architecture.

By coincidence, an intern asked more or less at the same time why we were using bar_t-like structures in our custom database engine.
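One visible difference is how each structure gets allocated. The sketch below uses hypothetical helpers (foo_new, foo_delete, bar_new) that are not part of the original question: foo_t needs two allocations because its data lives elsewhere, while bar_t, with its flexible array member, stores the header and the data in a single chunk.

```c
#include <stdlib.h>
#include <string.h>

struct foo_t {
    int len;
    char *data;   /* points to a separately allocated buffer */
};

struct bar_t {
    int len;
    char data[];  /* flexible array member: the data follows the header */
};

/* Two allocations: one for the header, one for the data. */
struct foo_t *foo_new(const char *s)
{
    struct foo_t *foo = malloc(sizeof(*foo));

    if (!foo) {
        return NULL;
    }
    foo->len = (int)strlen(s);
    foo->data = malloc((size_t)foo->len + 1);
    if (!foo->data) {
        free(foo);
        return NULL;
    }
    memcpy(foo->data, s, (size_t)foo->len + 1);
    return foo;
}

void foo_delete(struct foo_t *foo)
{
    if (foo) {
        free(foo->data);
        free(foo);
    }
}

/* One allocation: header and data are contiguous, a single free() is enough. */
struct bar_t *bar_new(const char *s)
{
    size_t len = strlen(s);
    struct bar_t *bar = malloc(sizeof(*bar) + len + 1);

    if (!bar) {
        return NULL;
    }
    bar->len = (int)len;
    memcpy(bar->data, s, len + 1);
    return bar;
}
```

On x86_64 with the usual ABI, sizeof(struct foo_t) is 16 bytes (the int, 4 bytes of padding, then the 8-byte pointer) while sizeof(struct bar_t) is 4 bytes, since the flexible array member does not count in the size of the structure.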

Introduction

Here we are! We spent 4 articles explaining what memory is, how to deal with it and what kind of problems you can expect from it. Even the best developers write bugs. A commonly accepted estimate seems to be around a few tens of bugs per thousand lines of code, which is definitely quite huge. As a consequence, even if you have mastered all the concepts covered by our articles, you’ll still probably have a few memory-related bugs.

Memory-related bugs may be particularly hard to spot and fix. Let’s take the following program as an example:

#include <stdio.h>

#define MAX_LINE_SIZE 32

static const char *build_message(const char *name)
{
    char message[MAX_LINE_SIZE];

    sprintf(message, "hello %s!\n", name);
    return message;
}

int main(int argc, char *argv[])
{
    fputs(build_message(argc > 1 ? argv[1] : "world"), stdout);
    return 0;
}

This program is supposed to take a message as argument and print “hello <message>!” (the default message being “world”).

The behavior of this program is completely undefined: it is buggy, yet it will probably not crash. The function build_message returns a pointer to some memory allocated in its stack frame. Because of how the stack works, that memory is very likely to be overwritten by a later function call, possibly by fputs. As a consequence, if fputs internally uses enough stack memory to overwrite the message, the output will be corrupted (and the program may even crash); otherwise the program will print the expected message. Moreover, the program may overflow its buffer because it uses the unsafe sprintf function, which puts no limit on the number of bytes written.

So, the behavior of the program varies depending on the size of the message given in the command line, the value of MAX_LINE_SIZE and the implementation of fputs. What’s annoying with this kind of bug is that the result may not be obvious: the program “works” well enough with simple use cases and will only fail the day it will receive a parameter with the right properties to exhibit the issue. That’s why it’s important that developers are at ease with some tools that will help them to validate (or to debug) memory management.
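For reference, one possible way to fix both issues is sketched below: the caller owns the buffer, so nothing points into a dead stack frame, and snprintf bounds the write. The print_message helper is an addition made for this sketch; it is not part of the original program.

```c
#include <stdio.h>

#define MAX_LINE_SIZE 32

/* Format the message into a caller-provided buffer.
 * Returns the snprintf() result: the length the full message needs, which
 * is >= size when the output had to be truncated. */
int build_message(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "hello %s!\n", name);
}

/* Build and print the message; returns -1 when it does not fit in
 * MAX_LINE_SIZE bytes or cannot be written. */
int print_message(const char *name, FILE *out)
{
    char message[MAX_LINE_SIZE];
    int len = build_message(message, sizeof(message), name);

    if (len < 0 || (size_t)len >= sizeof(message)) {
        return -1;
    }
    return fputs(message, out) == EOF ? -1 : 0;
}
```

The buffer now lives in the frame of the function that uses it, and an oversized name is reported as an error instead of silently corrupting the stack.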

In this last article, we will cover some free tools that we consider should be part of the minimal toolkit of a C (and C++) developer.

malloc() is not the one-size-fits-all allocator

malloc() is extremely convenient because it is generic. It does not make any assumptions about the context of the allocation and the deallocation. The two may immediately follow each other, or be separated by a whole job execution. They may take place in the same thread, or not… Since it is generic, every allocation is treated like any other, meaning that long-term allocations share the same pool as short-term ones.

Consequently, the implementation of malloc() is complex. Since memory can be shared by several threads, the pool must be shared and locking is required. Since modern hardware has more and more physical threads, locking the pool at every single allocation would have disastrous impacts on performance. Therefore, modern malloc() implementations have thread-local caches and will lock the main pool only if the caches get too small or too large. A side effect is that some memory gets stuck in thread-local caches and is not easily accessible from other threads.

Since chunks of memory can get stuck at different locations (within thread-local caches, in the global pool, or just simply allocated by the process), the heap gets fragmented. It becomes hard to release unused memory to the kernel, and it becomes highly probable that two successive allocations will return chunks of memory that are far from each other, generating random accesses to the heap. As we have seen in the previous article, random access is far from being the optimal solution for accessing memory.

As a consequence, it is sometimes necessary to have specialized allocators with predictable behavior. At Intersec, we have several of them to use in various situations. In some specific use cases we increase performance by several orders of magnitude.
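To give a feeling of what "specialized with predictable behavior" can mean, here is a minimal bump allocator sketch. It is a generic textbook example with hypothetical names (arena_t, arena_alloc), not one of the Intersec allocators: allocations are carved sequentially out of a single buffer, and everything is released at once.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct arena_t {
    uint8_t *base;  /* start of the backing buffer */
    size_t   size;  /* total size of the buffer */
    size_t   used;  /* bytes already handed out */
} arena_t;

/* `buf` is assumed to be suitably aligned by the caller. */
void arena_init(arena_t *a, void *buf, size_t size)
{
    a->base = buf;
    a->size = size;
    a->used = 0;
}

/* Hand out `len` bytes at the next 16-byte aligned offset, or NULL when
 * the arena is exhausted. No lock, no free list: just a pointer bump. */
void *arena_alloc(arena_t *a, size_t len)
{
    size_t start = (a->used + 15) & ~(size_t)15;

    if (start > a->size || len > a->size - start) {
        return NULL;
    }
    a->used = start + len;
    return a->base + start;
}

/* Release every allocation at once. */
void arena_reset(arena_t *a)
{
    a->used = 0;
}
```

Successive allocations are contiguous in memory, which is good for locality, and the cost of both allocation and release is constant. The price is the loss of generality: individual chunks cannot be freed.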

Developer point of view

In the previous articles we dealt with memory classification and analysis from an outer point of view. We saw that memory can be allocated in different ways with various properties. In the remaining articles of the series we will take a developer point of view.

At Intersec we write all of our software in C, which means that we are constantly dealing with memory management. We want our developers to have a solid knowledge of the various existing memory pools. In this article we will have an overview of the main sources of memory available to C programmers on Linux. We will also see some rules of memory management that will help you keep your program correct and efficient.

From Virtual to Physical

In the previous article, we introduced a way to classify the memory claimed by a process. We used 4 quadrants defined by two axes: private/shared and anonymous/file-backed. We also touched on the complexity of the sharing mechanism and the fact that all memory is ultimately requested from the kernel.

Everything we talked about was virtual. It was all about reservation of memory addresses, but a reserved address is not always immediately mapped to physical memory by the kernel. Most of the time, the kernel delays the actual allocation of physical memory until the time of the first access (or the time of the first write in some cases)… and even then, this is done with the granularity of a page (commonly 4KiB). Moreover, some pages may be swapped out after being allocated, that means they get written to disk in order to allow other pages to be put in RAM.

As a consequence, knowing the actual size of physical memory used by a process (known as resident memory of the process) is really a hard game… and the sole component of the system that actually knows about it is the kernel (it’s even one of its jobs). Fortunately, the kernel exposes some interfaces that will let you retrieve some statistics about the system or a specific process. This article enters into the depth of the tools provided by the Linux ecosystem to analyze the memory pattern of processes.
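As a first taste of those interfaces, the sketch below reads the resident memory of the current process from /proc/self/statm, whose second field is the number of resident pages. It is Linux-specific, and the function name resident_bytes is made up for this example.

```c
#include <stdio.h>
#include <unistd.h>

/* Resident memory of the calling process in bytes, or -1 on error.
 * Linux-specific: /proc/self/statm starts with the total program size and
 * the resident set size, both expressed in pages. */
long resident_bytes(void)
{
    long total_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    int n;

    if (!f) {
        return -1;
    }
    n = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (n != 2) {
        return -1;
    }
    /* Convert pages to bytes using the page size reported by the system. */
    return resident_pages * sysconf(_SC_PAGESIZE);
}
```

Calling this function before and after touching a freshly allocated large buffer is a simple way to observe that physical pages are only provided on first access.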