Category: What makes it “tech”..

So, you are a sysadmin living in Egypt or managing servers in Egypt’s timezone. Or even a good, faithful Linux user. The government, in its infinite wisdom, decided that we should go back to DST. Are you sure you are ready for this?

Since I run a lot of servers in the Africa/Cairo timezone, mostly Ubuntu LTS and Debian, I looked to see if there was an update for the tzdata package in Ubuntu that would include this, but couldn’t find any (bug report?).

Although it’s not the best way to do this, I decided to create the timezone datafile myself. IANA is responsible for providing the datafiles. I downloaded the datafiles package, untarred it, and checked the africa file. I was happy to see this:

This should be it! However, this is not meant to be a permanent solution. You should update your tzdata package as soon as the next update is released.
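One quick way to sanity-check the result is to ask the system's timezone database what UTC offset it reports for Africa/Cairo on a date that should fall inside DST. The sketch below assumes Python 3.9+ with the zoneinfo module; the helper name and the sample date are illustrative only, so substitute a date you know should be covered by the new rules.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def utc_offset_hours(zone_name, when):
    """Return the UTC offset (in hours) of zone_name at local time `when`.

    Returns None if the zone data cannot be found on this system.
    """
    try:
        zone = ZoneInfo(zone_name)
    except ZoneInfoNotFoundError:
        return None
    # Attach the zone to the naive local time and read the offset back.
    return when.replace(tzinfo=zone).utcoffset() / timedelta(hours=1)


if __name__ == "__main__":
    # Arbitrary sample date (an assumption): pick one inside the DST period.
    sample = datetime(2014, 7, 1, 12, 0)
    print("Africa/Cairo offset:", utc_offset_hours("Africa/Cairo", sample))
```

If the printed offset is one hour ahead of the standard offset, the installed zone data knows about DST for that date.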

UPDATE: I just found out that Ubuntu has released a critical update to fix this problem. You don’t need to perform these steps now; just make sure to get the latest tzdata package. Not sure about Debian yet.

Ansible has a dedicated module to manage public keys: the authorized_key module. It’s a very nice module, with enough flexibility to do almost anything I can think of.

However, it does have one very annoying thing. While I was migrating our automation scripts to Ansible, I got to the point where I was working on the script that provisions our users. By default, we disable all password authentication and root SSH access. Only key-based access is allowed.

I found that I had to actually put the public SSH key strings inside the playbook vars. That’s just not cool. SSH keys are long, they might have specific options (although the authorized_key module allows you to configure that), and it’s harder to maintain the list of keys like this. So, I tried to work around this. My target was to add the public SSH keys for my users as static files in an Ansible role. Basically, I would be populating my group_vars files by reading files inside my roles.

First, I added the public key files in the ‘files‘ directory of the role I was using to configure the users.

Now, I have to find a way to “read” the key files and set them in the vars file. Fortunately, Ansible provides lookup plugins that allow me to do just that!

I am working on a playbook for configuring Apache 2.4 for a complex application. The plan is to run the application on an IaaS cloud(ish) platform. We need to control the Apache worker settings via the playbook since there will be several “flavors” of cloud instances with different sizes and configurations. I was considering using a template for the configuration file. But since I am playing..

I decided to take a shot at using the lineinfile module, which I find really cool! And to make this a bit more interesting, I wanted to do this using Ansible loops, not one configuration item at a time.

So, basically, this is the first shot at getting this done. There is a lot of room for improvement:

So, I was recently asked to check on an EC2 instance that had been spitting Nagios plugin errors for no apparent reason for a few days.

Basically, almost all NRPE checks would time out randomly. There was no load on the server and no disk IO that would cause something like this. Also, several commands were pretty slow on the command line. The most notable ones were commands run with sudo, especially since the same commands mostly worked fine when run as root.

Initially I checked dmesg for any file system (or disk?) issues. I found none of that. However, I did find several traces of OOM kills. I checked, and it turned out that the application running on that instance had eaten all the memory and crashed a few days earlier.

I tried to check the system logs for errors, and found that all logs had their last entries around the time the Nagios problem started, which was also when the application on the instance crashed.

So, rsyslog was dead. There was a pid file and everything, but no process running. I went back to dmesg and found that it had been killed by the OOM killer. By now I had a pretty good idea of what the problem was.

The theory is that most applications write to /dev/log (a UNIX socket) to send syslog messages to rsyslog. If rsyslog is dead, no one reads from that socket and its buffer fills up pretty quickly. Once that happens, any process trying to write to that socket has to block until buffer space is freed or the write times out.

sudo was particularly sensitive to this because PAM writes to auth.log any time sudo is used. With rsyslog dead, sudo had to wait until the log write attempt timed out.
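The buffer-filling behaviour is easy to reproduce in isolation. The sketch below is only an illustration of the theory, not a test against the real /dev/log: it assumes a connected pair of UNIX datagram sockets as a stand-in, where one end plays a dead reader and the other keeps sending. The writer is made non-blocking so the script gets an error at the point where a normal blocking writer would simply hang.

```python
import socket


def fill_unread_socket(max_messages=100000):
    """Send datagrams to a peer that never reads.

    Returns how many messages were accepted before the kernel refused
    more (or max_messages if it never refused).
    """
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Non-blocking: a full buffer raises an error instead of hanging.
    writer.setblocking(False)
    sent = 0
    try:
        while sent < max_messages:
            try:
                writer.send(b"<13>demo: pretend syslog line")
            except OSError:
                # Buffer is full; a blocking writer would stall here.
                break
            sent += 1
    finally:
        reader.close()
        writer.close()
    return sent


if __name__ == "__main__":
    print("messages accepted before the buffer filled:", fill_unread_socket())
```

The exact count depends on kernel buffer settings, but it is finite, which is the point: once the reader is gone, every further write has to wait.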

Simply restarting rsyslogd fixed the problem and everything was back to normal. I took a note-to-self that standard services such as rsyslogd should be monitored on any systems under our management, to avoid such situations.
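A check for this particular failure mode (pid file present, process gone) can be very small. This is a minimal sketch rather than a real Nagios plugin; the function names and the pid file path are assumptions, so adjust the path to wherever your distribution puts it.

```python
import os
import sys


def pid_is_running(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # Signal 0 only checks for existence.
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pidfile_is_live(path):
    """Return True only if the pid file is readable and its process is alive."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False
    return pid > 0 and pid_is_running(pid)


if __name__ == "__main__":
    # Hypothetical path; it varies between distributions.
    pidfile = "/var/run/rsyslogd.pid"
    if pidfile_is_live(pidfile):
        print("OK: rsyslogd is running")
        sys.exit(0)
    print("CRITICAL: rsyslogd is not running (stale or missing pid file)")
    sys.exit(2)
```

The exit codes follow the usual Nagios convention of 0 for OK and 2 for CRITICAL.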

Hello, World! This is my first technical post, I hope it’s useful to someone out there!

I am working on a very small tool that I need for a proof of concept. It’s basically a small TCP server in Python.
After creating a small skeleton using SocketServer, I found that the server itself works fine with no problem.

However, if I try to stop and start the server again to test any modifications, I get a random “socket.error: [Errno 98] Address already in use” error. This happens only if a client has already connected to the server.

Checking with netstat and ps, I found that although the process itself is no longer running, a socket is still holding the port in the “TIME_WAIT” state. Basically the OS waits for a while to make sure this connection has no remaining packets on the way.

My good friend mux mentioned that I should probably set the socket option “SO_REUSEADDR” to avoid this issue.

The socket(7) man page says about this:

SO_REUSEADDR

Indicates that the rules used in validating addresses supplied in a bind(2) call should allow reuse of local addresses. For AF_INET sockets this means that a socket may bind, except when there is an active listening socket bound to the address. When the listening socket is bound to INADDR_ANY with a specific port then it is not possible to bind to this port for any local address. Argument is an integer boolean flag.

When using the pure socket module, you can simply set this option using:
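A minimal sketch of that call, assuming a plain TCP listener (the helper name, host and port here are illustrative; port 0 just asks the OS for any free port). The option has to be set before bind() for it to affect the bind.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket with SO_REUSEADDR enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must happen before bind(), otherwise it has no effect on the bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


if __name__ == "__main__":
    listener = make_listener()
    print("listening on", listener.getsockname())
    listener.close()
```

With the option set, restarting the server no longer trips over connections from the previous run that are still in TIME_WAIT.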