
Whether you know it or not, web sites collect a lot of information about your browsing habits. This information usually includes unique identifying information about you, which pages you are visiting and who you are interacting with. The information is primarily used for advertising purposes. Oftentimes it is also correlated across web sites, for example through trackers such as the Facebook “like” button.

The biggest problem with this kind of intrusion on privacy may be that you never know what the data collected about you is going to be used for. It might be stored for 1 day or 1000 years and may be used for advertising or fraud (especially if a legitimate web site gets hacked, which happens frequently enough) – it is completely out of your control.

There are many ways to mitigate these attacks on your privacy, and the general rule of security versus convenience applies here as well: if you want complete privacy, then it will likely be quite intrusive to your browsing habits.

The tips below should not affect your browsing habits at all, but still offer a much higher level of privacy. They are centred around Firefox, but there are probably equivalent solutions for other browsers (please add a comment if you have suggestions).

Delete cookies

When you visit any web site, it may request that your web browser store an HTTP cookie for it, which is a unique identifier. The next time you visit the same site, your web browser sends the same cookie, allowing the web site to track you over time and aggregate information about you.
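To make the mechanism concrete, here is a minimal sketch in Python of what the site does on its side. The cookie name visitor_id and the in-memory visits counter are made up for illustration; real sites use their own names and storage.

```python
import uuid
from http.cookies import SimpleCookie


def issue_cookie():
    """Server side: build a Set-Cookie value carrying a fresh unique ID.

    The cookie name "visitor_id" is hypothetical.
    """
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex
    return cookie["visitor_id"].OutputString()


def read_visitor_id(cookie_header):
    """Server side: recover the ID from the Cookie header sent by the browser."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel else None


# First visit: the site hands out an identifier.
set_cookie_value = issue_cookie()

# Later visits: the browser sends the same value back, so the site can
# count (and correlate) every request made by this one browser.
visits = {}
for _ in range(3):
    visitor = read_visitor_id(set_cookie_value)
    visits[visitor] = visits.get(visitor, 0) + 1

print(visits)  # one identifier, seen three times
```

Deleting the cookie when the browser closes means the next session starts with a new identifier, which breaks this long-term counting.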

All major web browsers have an option to not accept cookies at all, but this may lead to some web sites not functioning well. A less intrusive option is to have your web browser frequently delete all cookies. In Firefox (v. 24) you can delete all cookies each time you close the web browser in Edit -> Preferences -> Privacy -> Accept cookies from sites -> Keep until -> I close Firefox.

Disable common trackers

Ghostery is an excellent Firefox extension that disables trackers that it has in its blacklist, while not affecting your browsing experience.

Make sure blocking is on by clicking on the Ghostery icon at the top right -> Settings icon -> Options -> check the desired “Trackers” -> Save.

Delete Flash and Silverlight Cookies

A little-known feature of the Adobe Flash Player and the Silverlight plugin is that they store cookies too, independently of your web browser. You can see and delete Flash cookies at the Adobe Flash Player settings panel.

Ghostery has a feature to delete Flash and Silverlight cookies on browser exit. Click on the Ghostery icon at the top right -> Settings icon -> Options -> Advanced -> Check “Delete Flash and Silverlight cookies on exit” -> Save.

Disable HTTP referrer

When you click a link on any web site, the web site you are going to can actually see where you came from through the HTTP referer field. For example, web sites can capture your search keywords if you find them using a search engine. Again, a Firefox extension comes to the rescue: Referrer Control.
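As a rough illustration of how little effort this takes for the receiving site, the following Python sketch pulls the search terms out of a referer value. The search engine URL and the q parameter name are assumptions for the example; each search engine uses its own URL layout.

```python
from urllib.parse import parse_qs, urlparse


def search_terms_from_referer(referer, parameter="q"):
    """Return the search terms embedded in a referer URL, or None.

    The parameter name "q" is an assumption; it varies between search engines.
    """
    query = parse_qs(urlparse(referer).query)
    values = query.get(parameter)
    return values[0] if values else None


# Hypothetical referer value as the destination site would receive it.
referer = "https://search.example/search?q=cheap+flights+oslo"
print(search_terms_from_referer(referer))  # cheap flights oslo
```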

Tell sites to not track you

Modern web browsers allow you to notify web sites that you don’t want them to track you when you visit them. It is up to the sites themselves to choose if they want to honour your request, so the feature is not always useful. But Twitter, for example, claims to support Do Not Track, so it could help.

To turn it on in Firefox, go to Edit -> Preferences -> Privacy -> Tell sites that I do not want to be tracked.

Use several browsers

This is a simple trick that relies on the fact that web browsers are isolated environments: they do not share information between each other.

For example, you could use Opera for logging in to Gmail and Facebook, while you use Firefox for everything else. That way, Google and Facebook cannot reliably track your browsing habits outside their own sites (at least not with your name).

Also note that emails from sites often contain “special links” for you, even though they link to publicly available pages like blog posts. When you click them, the site knows that you clicked the link, since you’re the only one in the world that has this particular link. LinkedIn is one example, but most sites use this trick. If you would rather not tell the site that you clicked their link (and thus which of the links they sent you that you’re interested in), you could just search for the title of the link instead.

None of the above hides your IP address while you are surfing, which can be used by web sites to recognize you most of the time, depending on how often you change IP address and how many other people are sharing the ones you use. To alleviate this you could use the Tor network or a VPN service provider. This will usually have a noticeable effect on web response time, however.

Introduction

CFEngine and Puppet are configuration management tools that can help you automate IT infrastructure. Practical examples include adding local users, installing Apache, and making sure password-based authentication in sshd is turned off. The more servers and complexity you have, the more interesting such a solution becomes for you.

In this test, we set out to explore the performance of the tools as the environment scales, both in terms of nodes and policy/manifest size. Amazon EC2 is used to easily increase the size of the environment. This test is primarily a response to the comments in this older test.

I want to start with a few disclaimers:

I am affiliated with CFEngine (the company), and so it is extremely important for me to provide all the details so the test procedure and results can be scrutinized and reproduced. I would love for some of you to create independent and alternative tests.

The exact numbers in this test probably do not map directly to your environment, as everybody’s environment is a little different (especially in hardware, node count, policy/manifest). The goal is therefore to identify trends and degree of differences, not so much exact numbers.

For simplicity, all ports were left open during the tests (the “Everything” security group by Amazon was used).

Both tools were set to run every 5 minutes.

Test procedure

The policy server/puppet master were set up manually first, as described in detail below. For test efficiency reasons, they were set to accept new clients automatically (trusting all keys or autosigning).

Clients were added in steps of 50 every 15 minutes, up to 300 (there wasn’t enough time to go higher). The manifest/policy was changed twice during the test, to see what impact this had. These were the exact steps taken:

Time    Client count    Policy/manifest
0:00    50              Apache
0:15    100             Apache
0:30    150             Apache
0:45    200             Apache
1:00    200             Apache, 100 echo commands
1:15    250             Apache, 100 echo commands
1:30    250             Apache, 200 echo commands
1:45    300             Apache, 200 echo commands

CPU usage at the policy server/master was measured with Amazon CloudWatch. Client run-time was measured by picking a random client and invoking a manual run of each tool under the time utility. Each run-time was measured three times and the average was taken.
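The averaging itself is simple enough to script. The sketch below is not the procedure used in the test (that was the time utility, run by hand); it is a Python approximation that measures wall-clock time only, and the demo command is just the Python interpreter so the example runs anywhere.

```python
import subprocess
import sys
import time


def average_runtime(command, runs=3):
    """Run a command several times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Demo with a harmless command; on a client one would pass the agent
# command line instead, e.g. ["/var/cfengine/bin/cf-agent", "-K"].
demo = average_runtime([sys.executable, "-c", "pass"])
print(f"average over 3 runs: {demo:.3f} s")
```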

The policy/manifest was changed twice, by adding 100 echo commands each time (run /bin/echo 1, /bin/echo 2,… /bin/echo 100) to see how the tools handled a simple increase in work-size.

Setting up the CFEngine policy server

The Ubuntu 12.04 package was downloaded from the CFEngine web site. In order to save time and money, this package was uploaded to Amazon S3 for the clients to download (internal Amazon communication is free on EC2). Note that I could have used the CFEngine apt repository, but since CFEngine is just one package I chose to install it directly with dpkg. The following steps were carried out:

Setting up the Puppet master

It is important to note that the default Puppet master configuration is not production ready according to the documentation: “The default web server is simpler to configure and better for testing, but cannot support real-life workloads”.

The main reason for this is that Ruby does not support multi-threading, so puppetmasterd can only handle one connection at a time. This would limit the scale to just tens of clients, which is too low for our purposes.

The recommended way to get around this is to create an Apache proxy balancer with the passenger extension that receives all connections and hands them over to Puppet. There are some documents that describe this for Red Hat and Debian. The configuration is quite complex, so I used Ubuntu 12.04, which provides a package that contains the necessary configuration.

These steps were taken to install and set up the Puppet master with passenger:

Results

Server-side CPU usage

These graphs were taken directly from Amazon CloudWatch at the master/server instance.

CFEngine policy server CPU usage

Puppet master CPU usage

From the graphs, we can see that the Puppet master instance uses about 10 times as much CPU at 50 clients. At 300 clients with 200 echo commands, the Puppet master uses about 18 times as much CPU.

The most interesting points, though, are not where we increase the client count, but where we increase the client work in the policy/manifest.

This happens where we add 100 extra echo commands and the following 15 minutes when they are run. These areas are indicated by red lines in both graphs:

200 clients, 100 echo commands (until we go to 250 clients)

250 clients, 200 echo commands (until we go to 300 clients)

The reason this is more interesting is that users will probably extend the policy/manifest more frequently than add nodes. How does changing the policy/manifest impact the server?

We can clearly see that the Puppet master is heavily impacted by changing the manifest, while the CFEngine policy server seems unaffected by the changes (the load increase at the red lines).

Client-side execution time

The data for the client execution time was captured with three runs of time /var/cfengine/bin/cf-agent -K and time puppet apply --onetime --no-daemonize, respectively. The resulting data-files for CFEngine and Puppet are also provided. If we calculate averages, the graph for comparing will look like the following (the ods file is available here).

At 50 hosts with just the Apache configuration, CFEngine agents run 20 times faster than Puppet agents. At 300 hosts, with 200 echo commands, CFEngine agents run 166 times faster than Puppet agents.

Note that some spikes at 200c,100e and 250c,200e are to be expected since we added 100 more echo commands in the policy/manifest at these points. At 100c,a the Puppet agent had one very long run (as shown in the data files), which caused a spike there for Puppet.

The individual charts compare each tool to itself more easily — does the client execution time increase much when only the number of nodes increases?

For completeness, the client execution results are provided in tabular form below.

Environment    CFEngine time (seconds)    Puppet time (seconds)
50c,a          0.172                      3.427
100c,a         0.173                      19.24
150c,a         0.172                      3.63
200c,a         0.178                      3.63
200c,100e      0.481                      22.408
250c,100e      0.459                      32.56
250c,200e      0.742                      106.4
300c,200e      0.732                      121.86
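The “times faster” figures quoted earlier follow directly from this table. A small Python sketch that recomputes the ratio for every environment, using only the numbers above:

```python
# (environment, CFEngine seconds, Puppet seconds), copied from the table.
results = [
    ("50c,a", 0.172, 3.427),
    ("100c,a", 0.173, 19.24),
    ("150c,a", 0.172, 3.63),
    ("200c,a", 0.178, 3.63),
    ("200c,100e", 0.481, 22.408),
    ("250c,100e", 0.459, 32.56),
    ("250c,200e", 0.742, 106.4),
    ("300c,200e", 0.732, 121.86),
]


def speed_ratio(cfengine_seconds, puppet_seconds):
    """How many times longer the Puppet run took than the CFEngine run."""
    return puppet_seconds / cfengine_seconds


for environment, cfengine_seconds, puppet_seconds in results:
    ratio = speed_ratio(cfengine_seconds, puppet_seconds)
    print(f"{environment:>10}  {ratio:6.1f}x")
```

The first row comes out at roughly 20 and the last at roughly 166, matching the text. The 100c,a row is inflated by the single long Puppet run mentioned above.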

Final remarks

It is clear that CFEngine is much more efficient and vertically scalable than Puppet. This is probably due to two items:

Puppet’s architecture is heavily centralised: the Puppet master does a lot of work for every client – especially with catalog compilation. In contrast, CFEngine agents will interpret the policy in a distributed fashion. The CFEngine policy server is just a file server.

Puppet runs under the Ruby interpreter, which is far less efficient than running natively on the operating system.

The most interesting observation, I think, is that the Puppet master and agent performance were heavily influenced by the manifest complexity. When the manifest is small, increasing the agent count did not have much impact on the agent performance. However, as the manifest grew, the performance of all the agents (and the master) degraded significantly. It is also evident that as the master gets more loaded, all the Puppet agents run slower. This can also probably be attributed to the heavily centralised architecture of Puppet.

It would be interesting to create a more real-world policy/manifest to explore this further. The manifest in this test did not have many dependencies, and so the Puppet resource dependency graph was quite simple. If the dependency graph had been more realistic, would that have had an impact on the test results?

This post was primarily created in response to feedback received on an older post on the same matter. Please don’t hesitate to add your comments below!

NOTE: These tests were carried out in the beginning of 2011, with the newest stable versions of CFEngine and Puppet back then (CFEngine web site and Suse Linux repository, respectively). New versions of both solutions have been released since.

Introduction

CFEngine and Puppet are configuration management solutions that can help you automate IT infrastructure. Practical examples include adding local users, installing Apache, and making sure password-based authentication in sshd is turned off. The more servers and complexity you have, the more interesting such a solution becomes for you.

The companies and communities behind CFEngine and Puppet frequently make claims that their solution “is scalable”. They point to some users managing hundreds, thousands, even tens of thousands of servers with the tools. But what does all this mean? Are CFEngine and Puppet indistinguishable with respect to scalability?

In this post, we will highlight one aspect of scalability: how many clients a server can handle. As always, to find useful answers, we should do measurements — and stop listening to the marketing departments.

Test setup

Amazon EC2 is used to measure performance at the server while the number of clients is being increased in steps every 30 minutes. We start up with 25 clients, then go to 50, 100, 200, 400.

CPU usage at the central server and client execution time are measured. The Amazon CloudWatch system is used to monitor resource usage.

Policy/manifest details

Neither CFEngine nor Puppet will perform any action if you do not tell it to. During these tests, a policy or manifest to copy 20 configuration files from the server (totalling 140 kb) was used to have something realistic to test.

Doing file copies and templating is very common in the space of configuration management, merely because all Unix systems can be configured in terms of various configuration files.

Results

The CFEngine server was not much affected by the extra load of more clients. At 400 clients, the average server CPU usage is below 10%. Client execution time is pretty much constant at 7.85 seconds independent of the number of clients.

CFEngine server CPU usage

For Puppet, the story looks different. The Puppet server worked up to 50 clients, but stopped responding to client requests at 100. Reducing the number of clients back down to 50 made the Puppet server start responding again.

Another thing to note is that the CPU graph for Puppet has many more spikes than the smoother graph for CFEngine.

Puppet server CPU usage

Conclusions

We did not manage to find the maximum client count for CFEngine servers, but if you extrapolate the CPU graph, it should lie somewhere between 4000 – 5000 clients. The Puppet server started failing between 50 – 100 clients. The difference is mind-blowing!
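The extrapolation is back-of-the-envelope arithmetic. The Python sketch below assumes that server CPU usage grows linearly with the client count and that 100% CPU is the limit, both of which are simplifications. The 8% figure is a hypothetical reading of “below 10%”, not a measured value.

```python
def estimated_max_clients(clients, cpu_percent):
    """Linear extrapolation: clients supported when CPU usage reaches 100%."""
    return clients * 100 / cpu_percent


# Measured: 400 clients at just below 10% CPU.
print(estimated_max_clients(400, 10))  # 4000.0
# Hypothetical: if "below 10%" meant about 8%.
print(estimated_max_clients(400, 8))   # 5000.0
```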

We have just discussed the number of clients per server. Scalability is an abstract term, so there are definitely other interesting aspects to highlight. One example is, how easy is it to scale to multiple servers?

The Amazon Elastic Compute Cloud (EC2) allows you to quickly launch virtual servers in the cloud. If you create an AMI (Amazon Machine Image), you can deploy many instances of this image, only limited by your instance limit (20 initially, but you can request more).

Amazon Management Console

The instances can be managed through the AWS management console web-interface, for example to create multiple instances as shown above.

However, this console does not have a convenient method to terminate all the running instances – you have to terminate them one-by-one! This becomes a big problem if you, like me, are launching 1000s of instances for a few hours and then want to take them all down again. Since you are getting charged by the instance hour, you don’t want them running longer than necessary.

EC2 API tools

Fortunately, the EC2 API Tools provide an alternate interface to EC2. These tools need a Java runtime and a few environment variables to be set up. They seem to work best with Linux, but I have not tested them on Windows myself. There are some guides available on how to install them, for example the Ubuntu EC2 Starters Guide. If you already have an AWS account, you should get them working within 10 minutes on Ubuntu.

Now that you have command-line access to managing the instances, you have much more power to do mass-management of them.

Terminating all instances

Assuming you’ve got the EC2 API tools installed and working correctly as described above, you probably noticed a command that can help terminate the instances: ec2-terminate-instances. However, this command expects the IDs of the instances to terminate as parameters; there is no option to terminate all. Looking further at the tools, there is a command called ec2-describe-instances that returns the instance IDs of the running instances (among other things).

So our task is to get the running instance IDs from ec2-describe-instances and pass them as a parameter to ec2-terminate-instances. With some helper commands, this can be achieved as follows.
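One way to sketch the same idea in Python is shown below. It assumes that ec2-describe-instances prints tab-separated lines, that instance lines start with INSTANCE followed by the instance ID, and that the state appears as its own field. Check this against the output of your version of the tools before relying on it. The sample output is made up, and the sketch only builds the command line; it does not run anything against EC2.

```python
def running_instance_ids(describe_output):
    """Extract the IDs of running instances from ec2-describe-instances output.

    Assumes tab-separated lines of the form
    INSTANCE <instance-id> <ami-id> ... <state> ...
    """
    instance_ids = []
    for line in describe_output.splitlines():
        fields = line.split("\t")
        if len(fields) > 1 and fields[0] == "INSTANCE" and "running" in fields:
            instance_ids.append(fields[1])
    return instance_ids


def terminate_command(instance_ids):
    """Build the argument list for ec2-terminate-instances."""
    return ["ec2-terminate-instances", *instance_ids]


# Made-up sample output for illustration only.
sample_output = "\n".join([
    "\t".join(["RESERVATION", "r-11111111", "123456789012", "default"]),
    "\t".join(["INSTANCE", "i-aaaa1111", "ami-12345678", "", "", "running", "mykey"]),
    "\t".join(["INSTANCE", "i-bbbb2222", "ami-12345678", "", "", "terminated", "mykey"]),
    "\t".join(["INSTANCE", "i-cccc3333", "ami-12345678", "", "", "running", "mykey"]),
])

print(terminate_command(running_instance_ids(sample_output)))
```

To use it for real, the output of ec2-describe-instances would be captured with subprocess.run(..., capture_output=True, text=True) and the resulting command passed to subprocess.run as well.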

Be careful: this will terminate all your running instances, so don’t run it if you don’t mean it! If you have a lot of instances to terminate, ec2-terminate-instances may report that reading from the network socket failed (due to a timeout). But the termination will nonetheless succeed, as you can confirm from the AWS management console.

This has saved me hours of one-by-one termination from the web interface, and I hope it can help you too.

But in order to build useful applications in these environments, we often need some common libraries. In this article, we will have a look at how to compile the OpenSSL library and make a small application that uses it. Compiled OpenSSL libraries are available for download (see the link at the bottom), in case you don’t want to do the compilation yourself.

Prerequisites

We will be cross-compiling from Linux. If you want to use Windows only, please consider downloading the compiled OpenSSL binaries near the bottom of the page, or adjust the paths accordingly when building the library.

I have my 64-bit Windows build environment installed in /opt/mingw64, and the cross-compiler prefix is x86_64-w64-mingw32. I will target (build binaries for) 64-bit Windows in this article. Please adjust these variables according to your own build environment. i686-w64-mingw32 is the prefix for the 32-bit Windows cross-compiler.

Compiling OpenSSL

Follow the simple instructions on how to set up a Windows build environment on Linux. It is also possible to do this on Windows, but it is simpler and faster using Linux. Please leave a comment if you would like me to describe how to build on Windows.

Grab the desired OpenSSL source tarball. Use OpenSSL version 1.0.0 or newer; OpenSSL versions older than v1.0.0 are a bit harder to build on Windows, but let me know if you want to see how to do this. I’ll use OpenSSL version 1.0.0e in the following, but the steps should be identical for any version newer than 1.0.0.

Compile. Make sure the cross-compiler is in your path, or add it explicitly as shown below.
$ PATH=$PATH:/opt/mingw64/bin make
…

Install it.
$ sudo PATH=$PATH:/opt/mingw64/bin make install

We now have the OpenSSL libraries and headers for 64-bit Windows installed. Repeat the steps above with CROSS_COMPILE="i686-w64-mingw32-" and prefix /opt/mingw32 to build and install the 32-bit libraries for Windows.

A simple application

To confirm OpenSSL is working correctly, let’s create a small C application that generates a SHA-256 digest of a character string. It reads a string given as the argument, generates the digest and shows the computed digest. The digest-generating code is shown below, while the complete code is available for download.
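As an independent cross-check of what such a program should print, the same SHA-256 digest can be computed with Python’s standard hashlib module. This is not the C code of the application, only a reference value to compare its output against. The input “abc” is the well-known SHA-256 test string.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as lowercase hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(sha256_hex("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

If the Windows binary prints the same 64 hexadecimal characters for the same argument, the OpenSSL build is producing correct digests.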

I frequently encounter people trying to do a recursive copy with CFEngine, but want to ignore some subdirectories or files.

A typical example is that /var/cfengine/masterfiles is under version control on the policy server, and a lot of meta-data is copied down to the clients during policy updates. Fortunately, it is very easy to ignore certain subdirectories during copies with CFEngine. Consider the following example.

Highlights

Finally, we are ready for the CFEngine 3.2.0 release and we’ll have a look at the major changes. In 3.2.0, an exciting CFEngine Nova feature is brought into the community edition, and a set of out-of-the-box policies are released to standardise and simplify CFEngine installations. This will make bootstrapping new CFEngine’d hosts a breeze. As usual, a bunch of new features and bug fixes also made it into the release.

We will cover the following items this time.

The brand new cf-agent bootstrap feature

More convenient use of ifvarclass

Package-promises improvements

Editors to support the CFEngine language

Bootstrapping

If you look at cf-agent --help, you will notice that a new option, --bootstrap (-B), has been added to the community edition. Also note the --policy-server (-s) option – they go hand-in-hand. These options, when combined, allow you to specify the IP address of a policy server on the command line. The IP address will then be used when the bootstrapped node pulls policy updates. This is easier than the conventional way of putting a new node under management: manually deploying a custom policy and running cf-agent.

So say we have a policy server running at 192.168.1.1. On a fresh client, we run /var/cfengine/bin/cf-agent -Bs 192.168.1.1. Let’s break cf-agent’s actions down into a series of steps, to see what is actually happening.

Write the address from the --policy-server (-s) option to /var/cfengine/policy_server.dat

Check if /var/cfengine/inputs/failsafe.cf exists, run it and exit if it does

Write an embedded failsafe policy to /var/cfengine/inputs/failsafe.cf (we’ll explore this in a moment)

Run the embedded failsafe policy

The embedded failsafe pulls down a new policy from 192.168.1.1:/var/cfengine/masterfiles and starts cf-execd

Bootstrapping is now finished and the new node is under CFEngine management!

This means that all you need to have is a policy server sharing a policy under /var/cfengine/masterfiles and then run one command at the clients to put them under management!

Embedded policy

So how does this embedded policy look? It is actually a very simple policy mainly consisting of a file-copy promise and a promise to start cf-execd. You can actually see the embedded failsafe as failsafe.cf.cfsaved after a host has been successfully bootstrapped.

Since it is embedded in the CFEngine binaries, it is not very pretty with respect to indentation and Knowledge Management features, but it does the job. For your convenience and curiosity needs, I have uploaded the CFEngine embedded failsafe.cf. Also note that it cannot be changed since it is embedded (and this is why it is so simple), but the policy that gets pulled down is of course entirely up to you.

Bootstrapping the policy server

Until now, we have assumed that the policy server is already set up for us in advance. But, as a matter of fact, bootstrapping the policy server is almost exactly as easy. It will just use its own IP address, thus all hosts will use the exact same bootstrap command. However, /var/cfengine/masterfiles should be populated with the desired policy before bootstrapping it (otherwise, it has no policy to grab!). But CFEngine 3.2.0 also gives you a head start with the bundled policy.

The bundled policy

So what do you put in /var/cfengine/masterfiles on the policy server prior to doing any bootstrapping? In order to get policy updates, it should contain a promise to check for updates from the policy server, and the policy server needs to grant access to the directory with the policy updates (/var/cfengine/masterfiles by convention). But there are also a few other items users always want, like a promises.cf policy with the bundlesequence and inputs. Also, you might want to configure cf-execd’s email functionality. And then there’s the cf-serverd access configuration.

CFEngine 3.2.0 is distributed with a skeleton policy that lets you change these normal parameters without needing to write your own version of everything: just plug in the data in the right place. If you grab a tarball, you will find the bundled policy in the masterfiles subdirectory after you unpack it. Otherwise, you can always get it from the CFEngine core subversion repository.

Before you start using it, you can look through promises.cf to adjust it to your needs. In particular, have a look at the bundle common def. The acl slist there determines the access rules for cf-serverd.

What about trust?

As we know, simplicity and security are often conflicting. In order to make our easy bootstrapping work, two kinds of trust are assumed.

Bootstrapping host trusts the policy server’s public key and the policy it holds

Policy server trusts the new client’s public key and gives access to the policy

The first item is implicit from running the bootstrap command. We are essentially asking the client to go to the given IP address for policy updates. If we do not want to assume this kind of trust for some reason, we have to revert to the old way of copying the policy and/or the policy-server’s public key manually. An automatic bootstrap procedure will not be useful to us in this case.

However, we can do something with the second item. First off, we can allow only certain time windows of client trust in the server. We can also limit the (ranges of) IP-addresses we accept. By default we allow the Class B network to access the policy server. See the acl slist in bundle common def in promises.cf to configure it.
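To illustrate what limiting access to a range of IP addresses means in practice, here is a small Python sketch of the membership test. It is not CFEngine syntax and not how cf-serverd is implemented. The 192.168.0.0/16 network is an example value, chosen because a Class B network corresponds to a 16-bit prefix.

```python
import ipaddress


def is_allowed(client_ip, allowed_networks):
    """Return True if the client address falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(network) for network in allowed_networks
    )


# Example access list: a single /16 ("Class B" sized) network.
allowed = ["192.168.0.0/16"]

print(is_allowed("192.168.1.57", allowed))  # True
print(is_allowed("10.0.0.5", allowed))      # False
```

Narrowing the list to the subnets where new hosts actually appear reduces the number of machines that could obtain the policy.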

Improvements for ifvarclass

As you perhaps know, ifvarclass can be used to transform a variable into a class expression. With CFEngine 3.2.0 you can also make convenient expressions and tests directly in the ifvarclass expression using the new special functions and(), not(), or() and concat(). This can save you class definitions that you would use only once.

For example, you might want to copy down some configuration only to hosts having a certain binary. This can now be done easily with one promise, as shown below.

A pack of bugfixes

As usual, the CFEngine developers have been working hard to check and fix bugs reported by the community. In particular, the packages-promises have gotten a facelift.

The addupdatepackage policy is now fully functioning when using a repository-enabled manager such as yum or aptitude. Previously this policy was only supported when specifying package_file_repositories combined with non-repository package commands such as rpm or dpkg. For example, the following promise will make sure Apache is installed with a version at least 2.2. If Apache 2.2 or newer is already installed, no action is taken.

Also, for efficiency reasons, CFEngine caches the list of installed packages and their version (the output from package_list_command) in WORKDIR/state/software_packages.csv. A problem encountered by some users was that CFEngine did not actually update this cache often enough, so CFEngine might actually act on stale information. To alleviate this problem, package cache invalidation is introduced in this release. Also note that you can define the cache maximum age with the package_list_update_ifelapsed attribute. There were also a few other package-related issues that were addressed in the 3.2.0 release.
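The idea behind a maximum cache age can be sketched in a few lines of Python. This shows the general technique and not CFEngine’s implementation. The sketch treats the maximum age as a number of minutes, which is an assumption here, and uses a temporary file in place of the real software_packages.csv.

```python
import os
import tempfile
import time


def cache_is_stale(cache_path, max_age_minutes, now=None):
    """Return True if the cache file is missing or older than the maximum age."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(cache_path)
    except OSError:
        return True  # no cache yet, so the package list must be rebuilt
    return (now - modified) > max_age_minutes * 60


# Stand-in for the package list cache.
with tempfile.NamedTemporaryFile(suffix=".csv") as cache:
    written_at = os.path.getmtime(cache.name)
    after_10_minutes = cache_is_stale(cache.name, 60, now=written_at + 10 * 60)
    after_61_minutes = cache_is_stale(cache.name, 60, now=written_at + 61 * 60)

print(after_10_minutes, after_61_minutes)  # False True
```

When the check returns True, the agent would rerun the package list command and rewrite the cache before acting on it.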

For a complete list of the reported issues/feature requests that were resolved in CFEngine 3.2.0, have a look at the Change Log at the bug tracker.

Get it!

As usual, CFEngine 3.2.0 is provided not only as a source tarball, but also prepackaged for the most popular Linux distributions; the packages are available after logging into the Engine Room (free registration required). Users of the following distributions enjoy free packages, in both 32- and 64-bit versions.

Debian 5 and 6

Fedora 14 and 15

Red Hat Enterprise Linux 4, 5 and 6

Suse 9, 10 and 11

Ubuntu 8, 9, 10 and 11

Note that most distributions also maintain a CFEngine 3 package, but this is usually older and may not be built in a uniform way.

Please do not hesitate to leave a comment if you found this useful, have suggestions for improvements, or would like to see other CFEngine-related topics covered.

Streaming Windows media files (WMV and WMA) from Firefox in Ubuntu has been a known problem for a long time. At the same time, a lot of web media content is encoded in these formats, making it a real pain that they are not working.

There are various plugins that claim to work, but I have had some bad experiences: some did not work at all, while others lagged and stopped working after a few minutes. Now, let’s put an end to this and install a plugin that just works flawlessly.

You just need to install one package: gecko-mediaplayer. As usual, you can use the Ubuntu graphical tools, or do it on the command line:

sudo apt-get install gecko-mediaplayer

Now restart Firefox, and you should be able to play any Windows media files directly from your browser. Simple as that!

I have tested it in Ubuntu 10.04 (Lucid), but it should work on all recent versions of Ubuntu.

A common question from Cfengine users is how to configure how often cf-agent is being run. Setting a simple time-based schedule is very easy, but there are a few additional steps you might want to consider to ensure optimal scalability.

It is cf-execd that schedules the execution of cf-agent. In Cfengine 3, the default schedule will ensure that cf-agent is run every five minutes. To adjust this, all we need to do is add the control body for cf-execd and set the schedule body-part to our liking. For example, below we set cf-agent to run every fifteen minutes.

Splaytime

If you have a reasonably sized environment, you might also want to have a look at the splaytime body-part. This delays the execution of cf-agent by a random number of seconds within a bound. The purpose of this is to distribute the load on network resources evenly, so that your agents do not all start pulling from the network at the exact same second. So, instead of saying that we want to run cf-agent at the exact quarters of an hour, we can say that we want to run it four times an hour and make our configuration more scalable as follows.

Advanced schedules

Perhaps you have some specific time slot during the day you want your systems to spend all their resources on a specific task. Even though Cfengine 3 is the most lightweight configuration management system in existence, there might still be some shell commands or network transfers that are executed as part of your Cfengine policy. So let us adjust our schedule to run cf-agent every fifteen minutes, except from 12 PM to 6 PM.

As you might have noticed, each list element in schedule is actually a class expression, which makes it very flexible. Normally though, time-based class-expressions are used. Note the four hardclasses (automatically defined) that split the day into six-hour slots:

00 AM – 06 AM : Night

06 AM – 12 PM : Morning

12 PM – 06 PM : Afternoon

06 PM – 12 AM : Evening

We can create arbitrarily complex class expressions to create any schedule that fits our needs (perhaps use one of Monday, Tuesday, etc.). Remember that you can always run cf-promises -v to see all the classes that are currently defined on a given host, which gives you a hint of what you have at your disposal.
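The logic of such a schedule can be sketched in Python. This models the decision only and is not CFEngine policy syntax; the function names are made up. It maps an hour to one of the four six-hour slots listed above and then applies the rule “every fifteen minutes, except in the afternoon”.

```python
def day_slot(hour):
    """Map an hour (0-23) to the six-hour slot it falls in."""
    return ("Night", "Morning", "Afternoon", "Evening")[hour // 6]


def should_run(hour, minute):
    """Run on every quarter of an hour, except from 12 PM to 6 PM."""
    on_quarter = minute % 15 == 0
    return on_quarter and day_slot(hour) != "Afternoon"


print(day_slot(9), should_run(9, 15))    # Morning True
print(day_slot(13), should_run(13, 0))   # Afternoon False
print(day_slot(18), should_run(18, 0))   # Evening True
```

In a real policy the same effect comes from combining a time class for the minute with a negated Afternoon class in each schedule element.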

Internal workings

We finish off with a short note on the steps cf-execd actually takes to do the scheduling. cf-execd sleeps one minute at a time. When it wakes up, it checks whether any of the class expressions in schedule evaluates to true. If so, it gets ready to run cf-agent (or the command in exec_command). But first, it checks the splaytime body-part and delays the execution by a number of seconds (based on a hash of the host’s name) according to this setting.

If you want to test your new schedule, try running cf-execd in verbose mode (cf-execd -v). You would see something similar to the following once a class-expression in your schedule evaluates to true.

Highlights

First of all, thanks for the good feedback on the previous extended change log! It seems like this is something the Cfengine community is interested in, so I will continue the series. This time we cover the changes in both Cfengine 3.1.3 and 3.1.4, since they were released quite close to each other (January 22nd and January 31st). Some rather annoying bugs were discovered by the Cfengine community in 3.1.3, so the 3.1.4 release was brought forward.

Leak no more

Memory leaks occur when a program allocates memory, but does not release it again later. This is not always a problem, because operating systems always reclaim all memory when a process terminates. Releasing memory just before termination thus only results in unnecessary resource consumption (indeed the GNU C library does not by default release memory on process termination).

However, in daemons and long-running programs, repeated memory leaks are clearly an issue. Memory leaks usually manifest themselves in an ever-increasing size of the process’ virtual memory. Unfortunately, such bugs are extremely hard to track down, because it is not always clear where the leak happens and when a certain memory segment should be released (if it is released too soon, the process will crash). In Cfengine, this is further complicated by the fact that certain policies may cause more severe leaks than others because different execution paths are followed when running them.

But since multiple reports of severe leaks started to come from the Cfengine community, a lot of effort was put into debugging it (see the report on the bug tracker). It took one month (!) of iteration before all the leaks were tracked down, even with much help with testing from the community – especially Jonathan. On the positive side, these leaks (or anything like them) are very unlikely to reappear. But don’t hesitate to create a report if you think you have found a leak some day.

The main source of leaks turned out to be an error when releasing lists (struct Rlist). Also, when re-reading the policy, parts of the old one were never released. Since only cf-monitord and cf-serverd re-read the policy, they were the most affected components.

Jonathan from the Cfengine community provided a policy that caused a lot of leakage, which made debugging easier. We can also use it to illustrate the difference between Cfengine 3.1.2 and Cfengine 3.1.4. The graphs below show the resident set size (RSS) of the three Cfengine daemons, measured over one day. They pretty much speak for themselves. Thanks to the community members who helped fix this issue!

Command return codes

As of Cfengine 3.1.0, promises of type commands were flagged as repaired if they returned zero, and not kept otherwise. This made it possible to define a class in either case and run follow-up promises. In Cfengine 3.1.4, a much more flexible framework has been introduced. In addition, commands in packages-promises and transformer in files-promises have been incorporated. Now, Cfengine users can specify lists of return codes for which one of these promises should be considered kept, repaired or not kept. It’s often easier to understand by example, so let’s do just that. First let’s start with a simple shell script.

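#!/bin/sh
# saved to /tmp/retarg (make it executable with chmod +x /tmp/retarg)
# Exit with the return code given as the first argument.
exit "$1"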

So this script just exits with the code given as the first parameter, which must be from 0 to 255 on Unix. We will use this little script to demonstrate the new return code functionality in the following snippet.
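A minimal sketch of such a policy is shown here. The bundle and body names (returncodes_demo, example_returncodes) are made up for illustration, and the mapping of codes to outcomes is just one possible choice: 0 counts as repaired, 1 as failed and 5 as kept.

```cf3
body common control
{
  bundlesequence => { "returncodes_demo" };
}

bundle agent returncodes_demo
{
commands:
  # Change the argument to 0, 1 or 5 to see a different class defined.
  "/tmp/retarg 5"
    classes => example_returncodes;

reports:
  waskept::
    "The command promise was kept";
  wasrepaired::
    "The command promise was repaired";
  wasfailed::
    "The command promise failed";
}

body classes example_returncodes
{
  # Which return codes map to which promise outcome.
  kept_returncodes     => { "5" };
  repaired_returncodes => { "0" };
  failed_returncodes   => { "1" };

  # Classes to define for each outcome.
  promise_kept     => { "waskept" };
  promise_repaired => { "wasrepaired" };
  repair_failed    => { "wasfailed" };
}
```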

By running /tmp/retarg with arguments 0, 1 and 5, we see that the classes wasrepaired, wasfailed and waskept get defined, respectively. We may also use overlapping return codes in the *_returncodes lists, which could result in the promise getting multiple statuses (e.g. both repaired and failed). This might seem a bit strange, but gives the user total control. If the return code is not found in any of the lists, the promise does not get a status at all. When none of the lists are defined, Cfengine falls back to the default of zero meaning promise repaired, and anything else meaning promise failed.

This flexibility is also allowed in packages-promises, as demonstrated in the following.
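As a rough sketch, a packages promise can reference a classes body with return-code lists in the same way. The package name is hypothetical, the apt package_method is assumed to come from the standard library (cfengine_stdlib.cf), and the return codes listed are illustrative only; check what your package manager actually returns.

```cf3
bundle agent package_returncodes_demo
{
packages:
  "example-package"                      # hypothetical package name
    package_policy => "add",
    package_method => apt,               # assumed: defined in cfengine_stdlib.cf
    classes        => package_returncodes;

reports:
  package_repaired::
    "example-package was installed";
  package_failed::
    "Installing example-package failed";
}

body classes package_returncodes
{
  # Illustrative mapping of the package command's return codes.
  repaired_returncodes => { "0" };
  failed_returncodes   => { "1", "100" };

  promise_repaired => { "package_repaired" };
  repair_failed    => { "package_failed" };
}
```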

A complete self-contained policy demonstrating the new framework in all three promise types above is available for download here. In the reference manual, this is documented as part of the classes body.

Lock purging

As you probably know, Cfengine has a concept of locks. Locks ensure that promises are not checked too often, but also that repairing each promise does not take too long. These parameters are configurable through the ifelapsed and expireafter policy settings, available at a global and promise level. Since information about these locks needs to persist between runs of cf-agent, Cfengine keeps track of them in a database stored in /var/cfengine/state/cf_lock.* (the suffix depends on the dbm used). A hash of the promise attributes is used as the key for each entry in this database.
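To make the two levels concrete, here is a small sketch. The values, the bundle name lock_demo and the body name hourly_check are assumptions chosen for illustration; both settings are given in minutes.

```cf3
body agent control
{
  # Global defaults: do not re-check a promise within 5 minutes,
  # and give up on a repair that takes longer than 10 minutes.
  ifelapsed   => "5";
  expireafter => "10";
}

bundle agent lock_demo
{
commands:
  "/bin/echo hello"
    action => hourly_check;
}

body action hourly_check
{
  # Per-promise override of the global settings.
  ifelapsed   => "60";
  expireafter => "120";
}
```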

The problem with this is that sometimes the attributes change even though the promise really is the same. For example, if you have a commands-promiser “/bin/echo $(date)”, the promiser would seem to change each time cf-agent runs. As another example, you may want to delete files in /tmp that are more than 3 days old. Many of these files would never reappear (but some might), so keeping an entry for all of them in the lock database just increases its size for no reason. As a result, the lock database grows indefinitely, although very slowly (if you are not yet on Cfengine 3.1.4, check the size of yours).

Adding heuristic checks to decide whether a given promise belongs in the lock database would surely end in unexpected behaviour for some of the huge user base of Cfengine. A less risky approach, introduced in Cfengine 3.1.4, is to purge old locks automatically. cf-agent will run a lock-purging algorithm every month, deleting locks that are more than one month old. This should take care of the (slow) growth of the lock database, while still not risking unexpected behaviour.

Other improvements

A 30-second timeout on the recv() system call is introduced on Linux hosts. This means that any connection that waits to receive data will time out if no data is received within 30 seconds. The reason for introducing this is to prevent a remote system from causing components of Cfengine to hang indefinitely. A remote system may become unresponsive for a number of reasons, including network unreliability, high load, deadlocks (e.g. when trying to open a database that was uncleanly shut down), and kernel or driver bugs, most of which are outside of Cfengine’s control. Introducing a mechanism to back off after a certain time has elapsed is the only way we can protect ourselves from all these scenarios, while still allowing for self-healing when Cfengine retries the operation later. As the details of the socket API differ amongst OSes, Linux is the first one to get this support.

A new function ip2host() that does reverse DNS lookups is introduced. Note that DNS is often quite unreliable, and can thus cause cf-agent to hang for a while during the lookup.
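A short sketch of how the function might be used in a vars promise follows. The bundle name and the address are chosen purely for illustration.

```cf3
bundle agent reverse_lookup_demo
{
vars:
  # Resolve an IP address back to a host name.
  "name" string => ip2host("127.0.0.1");

reports:
  cfengine_3::
    "127.0.0.1 resolves to $(name)";
}
```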

Cfengine community members discovered that Cfengine sometimes ignored the architecture when considering packages-promises. This could cause Cfengine to believe that a given package was installed for all architectures, even though it was installed only for one. With Cfengine 3.1.4, this is handled correctly.

Two important issues causing segmentation faults in cf-serverd have been fixed. They were caused by race conditions in cf-serverd and thus appeared only on busy servers.

On Solaris global zones, Cfengine can now distinguish processes based on the zone they run in (previously a Nova feature). This means that a process restart promise in the global zone will not kill processes in other zones. However, a bug was causing this not to function properly, so processes in all zones were in fact killed. This is all resolved in Cfengine 3.1.4.

Get it!

As usual, Cfengine 3.1.4 is provided not only as a source tarball, but also prepackaged for the most popular Linux distributions; the packages are available after logging into the Engine Room (free registration required). Users of the following distributions enjoy free packages in both 32- and 64-bit versions.

CentOS 5

Debian 4, 5 and 6

Fedora 14

Red Hat Enterprise Linux 3, 4, 5 and 6

Suse 9, 10 and 11

Ubuntu 8, 9 and 10

Note that most distributions also maintain a Cfengine 3 package, but this is usually older and may not be built in a uniform way.

Thanks to the feedback on the last post, self-contained policies are now available for download alongside the snippets. Please do not hesitate to leave a comment if you found this useful, or have more suggestions for improvements.