Category: Linux System Administration

I’ve recently been moving some of our project from a single Redis server (or server with a replica) to the more modern Redis Cluster configuration. However, when trying to set up PHP sessions to use the cluster, I found there wasn’t a lot of documentation or examples. This serves as a walk-through for setting up PHP sessions to use a redis Cluster, specifically with Elasticache on AWS.

Once there servers are launched, make note of the Configuration Endpoint which should look something like: my-redis-server.dltwen.clustercfg.usw1.cache.amazonaws.com:6379

Finally, use these settings in your php.ini file. The exact location of this file will depend on your OS, but on modern Ubuntu instances, You can place it in /etc/php/7.0/apache2/conf.d/30-redis-sessions.ini

Note the special syntax for the save_path where is has seed[]=. You only need to put the main cluster configuration endpoint here. Not all of the individual instances as other examples online appear to use.

That’s it. Restart your webserver and sessions should now get saved to your Redis cluster.

IIn the even that something goes wrong, you might see something like this in your web server log files:

PHP Warning: Unknown: Failed to write session data (redis). Please verify that the current setting of session.save_path is correct (tcp://my-redis-server.dltwen.clustercfg.use1.cache.amazonaws.com:6379) in Unknown on line 0

AWS’s CloudWatch Logs was first available about a year ago, and to my estimation has gone largely unnoticed. The initial iteration was pretty rough, but some recent changes have made it more useful, including the ability to search logs, and generate events for monitoring in CloudWatch from log content.

Unfortunately, the Cloudwatch Logs agent just watches log files on disk and doesn’t act as a syslog server. An AWS blog post explained how to get the the Cloudwatch Logs Agent running inside a container and monitoring the log output from rsyslogd, but the instructions used Amazon’s ECS service, which still doesn’t quite offer the flexibility that CoreOS or Deis offer IMHO. ECS does some magic behind the scenes in passing credentials around that you have to do yourself when using CoreOS.

When trying to pull all of this together to work, I discovered a problem due to a bug in the overlayfs that is in current Deis Releases which causes the AWS Logs agent not to notice changes in the syslog files. A workaround is available that reformats the host OS back to btrfs to solve that particular problem

Evidently it is not possible to add an SSH key directly from an SSH agent. Instead, you can grep the public key from your ~/.ssh/authorized_keys file then, have deis use that key. Or if you only have one line in ~/.ssh/authorized_keys, you can just tell deis to use that file directly with the command

So far, I’m pretty impressed with cloudwatch logs. The interface for it isn’t as fancy, and search capability isn’t as deep as other tools like PaperTrail or Loggly, but the cost is significantly less, and I like the fact that you can store different log groups for different lengths of time.

I’m trying to get the cloudwatch logs agent to install as part of an automated script, and couldn’t find any easy instructions to do that, so here is how I got it working with a shell script on an Ubuntu 14.04 host

For over 7 years, I’ve hosted an email validation tool on this site. I developed the tool back when I was doing a lot with email, and when DKIM and SPF was still pretty tricky to get working. Over the years it has become the single most popular page on the site, and Google likes it pretty well for certain DKIM and SPF keywords. (And strangely “brandon checketts dkim” pops up on Google search suggest you you try to google my name.

In any case, I’ve moved that functionality over to its own site now at http://dkimvalidator.com/ so that it has its own place to call home. It also got a (albeit weak) visual makeover, and all of the underlying libraries have been updated to the latest versions so that they are once again accurate and up-to-date.

On Debian-based systems, files in /etc/cron.d:
– must be owned by root
– must not be group or world writable
– may be symlinked, but the destination must follow the ownership and file permissions above
– Filenames must contain ONLY letters, numbers, underscores and hyphens. (not periods)
– must contain a username in the 6th column

From the man page:

Files in this directory have to be owned by root, do not need to be executable (they are configuration files, just like /etc/crontab) and must conform to the same naming convention as used by run-parts(8): they must consist solely of upper- and lower-case letters, digits, underscores, and hyphens. This means that they cannot contain any dots.

The man page also provides this explanation to this strange rule:

For example, any file containing dots will be ignored. This is done to prevent cron from running any of the files that are left by the Debian package management
system when handling files in /etc/cron.d/ as configuration files (i.e. files ending in .dpkg-dist, .dpkg-orig, and .dpkg-new).

I had a customer complaining lately that messages sent via Gmail to one of my mail servers was occasionally receiving SMTP Authentication failures and bounce backs from Gmail. Fortunately he noticed it happening mainly when he sent a messages to multiple recipients and was able to send me some of the bounces for me to track it down pretty specifically in the postfix logs.

The Error message via Gmail was:

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 535 535 5.7.0 Error: authentication failed: authentication failure (SMTP AUTH failed with the remote server) (state 7).

This was a little odd, because the SMTP AUTH failure is what I would typically expect with a mistyped username and password. However, I could see that plenty of messages were being sent from the same client. By looking at the specific timestamp of the bounced message, I tracked down the relevant log segment shown below. It indicates 5 concurrent SMTPD sessions where the SASL authentication was successful on 4 of them and failed on the 5th.

So it seemed likely that the 4 child processes were in use and that Postfix couldn’t open a connection to a 5th simultaneous SASL authentication server, so it responded with a generic SMTP AUTH failure.

To fix, I simply added a couple of extra arguments to the saslauthd command that is run. I added a ‘-c’ parameter to enable caching, and ‘-n 10’ to increase the number of servers to 10. On my CentOS server, I accomplished that by modifying /etc/sysconfig/saslauthd to look like this:

# Directory in which to place saslauthd's listening socket, pid file, and so
# on. This directory must already exist.
SOCKETDIR=/var/run/saslauthd
# Mechanism to use when checking passwords. Run "saslauthd -v" to get a list
# of which mechanism your installation was compiled with the ablity to use.
MECH=rimap
# Additional flags to pass to saslauthd on the command line. See saslauthd(8)
# for the list of accepted flags.
FLAGS="-r -O 127.0.0.1 -c -n 10"

I have a system at home with a pair of 640 GB drives, that are getting full. The drives are configured in a Linux Software RAID1 array. Replacing those drives with a pair of 2 TB drives is a pretty easy (although lengthy process).

First, we need to verify that our RAID system is running correctly. Take a look at /proc/mdstat to verify that both drives are online:

After verifying that the system is running, I powered the machine off and removed the second hard drive and replaced it with the new drive. Upon starting it back up, Ubuntu noticed that the RAID system was running in degraded mode and I had to hit yes at the console to have it continue booting.

Once the machine was running, I logged in and created a similar partition structure on the new drive using the fdisk command. On my system, I have a small 200 partition for /boot as /dev/sdb1, a 1 GB swap partition, and then the rest of the drive is one big block for the root partition. The I copied the partition table that was on /dev/sda, but for the final partition, made it take up the entire rest of the drive. Make sure to set the first partition as bootable. The partitions on /dev/sdb now look like this:

The system is now running with one old drive and one new drive. The next step is to perform the same steps with removing the other old drive and rebuilding and re-adding it to the raid system. The steps are the same as above, except performing them with /dev/sda. I also had to change my BIOS to boot from the second drive.

Once both drives are installed and working with the RAID array, the final part of the process is to increase the size of the file system to the full size of our new drives. I first had to disable the EXT3 file system journal. This was necessary so that the online resize doesn’t run out of memory.

Edit /etc/fstab and change the file system type to ext2, and the arguments to include “noload” which will disable the file system journal. My /etc/fstab looks like this:

I will sometimes run into an error inside an OpenVZ guest where the system complains about running out of memory. This is usually seen as a message containing “Cannot allocate memory”. It may even occur when trying to enter into the container

This particular error can be difficult to troubleshoot because OpenVZ has so many different limits. The “Cannot allocate Memory” error is a pretty generic error, so Googling it doesn’t necessarily yield any useful results.

I’ve found that the best way to track down the source of the problem is to examine the /proc/user_beancounter statistics on the host server. The far right column is labeled ‘failcnt’ and is a simple counter for the number of times that particular resource has been exhausted. Any non-zero numbers in that column may indicate a problem. In this example, you can see that the dcachesize parameter is too small and should be increased

You can do a little investigation to see what each particular limit does. I will usually not dig too deep, but just double the value and try again. You can set a new value using the vzctl command like this:

I was fortunate to be selected to give a presentation at the 2010 Southeast Linux Fest held this year in Greenville, SC. The topic was MySQL replication which I picked from a similar presentation I gave about about 1.5 years ago at my local LUG. I’ve configured plenty of replicated servers and I think that I understand it well enough to explain it to others.

The 2-hour presentation is about half slides and half demo. Throughout the course of the presentation I set up a simple master-slave. Then I add a second slave. Taking it a step farther I set up the three servers to replicate in a chain, and finally I configure them to replicate in a full circle so that changes made on one are propagated to all of the others. I intentionally do things that break replication at certain points to show some of the limitations and configurable features that can help it to work.