I spent a good few hours searching google and trying to work out what had been installed that could be causing an issue with libc6. Though I was fairly certain nothing had been (only because this is 2 days after a migration to new servers, and I’m hitting this problem on 2/4 servers, but they have extremely similar setups).

Something is eating all the space. So I tried to run ncdu to look for large files (I know there’s other ways, but I like ncdu). But I hadn’t installed it on this new server and I can’t install it with apt-get broken.

Thinking /run is bound to be causing some issues (still not sure if it’s causing this particular issue), I reboot the server (bad move!). It locked up and had to be power cycled. Thankfully it’s a droplet and with DigitalOcean I can power cycle easily (I did try the console but it wouldn’t connect).

Anyway after a reboot, /run started at 5% used, but quickly grew to 70%. but I did managed to install ncdu, and with that I knew the problems I had with apt-get were being caused by a full /run.

After a quick (very quick) look at ncdu /run I could see hhvm.hhbc taking up approx 85Mb+

A quick check of the config and I can see hhvm is configured to do so. So I adjusted the config to put it in /var/cache/hhvm/hhvm.hhbc instead and update the systemd service script to create /var/cache/hhvm and set it’s owner.

Another reboot, everything seems fine and /run is now at 3% used.

And I’ve run apt-get upgrade successfully.

I’m thankful that I noticed, I really thought I’d screwed something on these 2 servers while migrating, and I could see another night of rebuilding new servers ahead of me.

Morale of the story: Check your not out of space when you get weird errors (yes the I/O should have rang some bells, but hey it is 2am).

I’ve recently noticed a problem on 3 of my Digital Ocean Servers. The APT package lists are not automatically updating every day. I try to keep all servers upto date, and rely on Nagios to inform me when there’s packages needed to be updated and that’s the main reason I noticed something was broken.

The 3 servers in particular are newer builds to the rest of the system, and they dont have near as much installed as the others, so at first I didn’t pay too much attention when other servers were going into warning state on nagios indicating updates but these 3 weren’t. However I would still connect to these servers and run my normal command:-

A few times these servers did install updates and I just thought it must have been my timing, that the package lists hadn’t yet been updated by the cron.daily.

But after this happening a few times, I decided to not run the above and see how long these servers would take for nagios to throw an alert. It never did and that got me a little worried.

Over the last few days I’ve been diagnosing what’s wrong. I started out with making sure cron is working properly. Then kept an eye on the file timestamps

ls -ltrh /var/lib/apt/lists/

Eventually getting to /etc/cron.daily/apt and checking through what was was doing on the working servers compared to the broken ones. I turned on VERBOSE and got a bit of info when running /etc/cron.daily/apt but it seemed to exist quite quicky.

Comparing it to a working server the important bit seemed to be around

I have no doubt that I will learn I’ve done bits wrong as I get further in, but I’m going to start with the following hosts config

[initial]
C3PO-1
[balancers]
[backends]
[databases]
[fileservers]

So the first thing I’m going to tackle is setting up new users. I’ve been playing with it for about an hour, although I got the user created very quickly I hit a problem with my setup. I need to send multiple ssh keys for some users (we use different keys on PC’s, Laptops, Mobiles). Every example I found seemed to a) want to pull a key from a file, b) just use one key.

After quite some playing, and trying different things I found a way. This in turn meant I had to slightly change the create user bit of the playbook.

I did think about pulling the keys in from the authorized_keys files, but not all users are allowed on the management, so I’d have to keep files for them and if I’m going that far, I may as well just keep them in group_vars. It doesn’t look as nice if you cat the file but it’s structured and makes sense.

I was worried that it may screw up the users own group as the manual says it delete’s all other groups except the primary, since I haven’t told it a primary using ‘group’ I thought it may be a problem, but thankfully it’s not this just worked.

Well I thought at this point I was pretty finished. Ha not a chance. I added a user for ansible to be able to connect as since I’ll be removing root ssh access. This means I’m going to need to let ansible sudo so I’d best sort that now. Here’s a bit of code I found and changed slightly. Make sure you change the username in the sudoers.d/ansible file

All working. It’s taken about 2 hours to get to the point of deploying a couple of users automatically. I’m not so sure this has saved me time in the long run 🙂 but it’s the first step in a much bigger project. I’m kind of glad it wasn’t just copy and paste others code and stuff broke, it gave me a chance to understand a bit more.

Part 2 will be coming soon. There we’ll lock down SSHd and apply the default firewall rules.

So If you read my last post (it was really long sorry), you’ll see right at the end the current deployment. I had tried a few managers to be able to deploy/scale the whole system, but it really overcomplicated the whole thing. Chef looked really good (I can’t remember the other one), but it was problematic and just didn’t suit.

Instead I kept with the scripts I had written for the time being. They are in no way good enough to share as they are very customised to my setup but they achieve what I need. However to run them takes quite a bit of initial manual work.

So what do I need from a system:-

It has to just work, not go installing stuff it depends on to run.

It has to be able to split the setup into an initial and running level.

It has to be able to be told easily about a new server and what role it will be, then do the initial setup and onto the relevant running level.

It MUST be simple to use and understand. I’m getting a bit sick of having to read through how to configure weird stuff because someone decided to do things completely differently to how you’d expect it to work.

It must have very low resource requirements. I want to run this from a management droplet that already runs nagios, so it can’t sit there just eating resources while it has nothing to do.

I saw a very quick video on Ansible with someone using it for Raspberry PI’s (something else I have far too many of) and thought right away I should look into that. So here I am. I’m going to do some testing with Ansible.

Initially all I’m looking for it to do is handle the initial setup that I already have scripts for.

Create a new user, add it to the sudo group, set a password and copy the SSH keys.

Reconfigure sshd to deny root connections.

Set the server’s timezone.

Set the keypad option in nanorc.

Set the .bashrc for root to use a red prompt.

Install some basic packages such as screen and htop

Create a swapfile

Install and configure NTP

Setup IPTables (I dont use UFW, I prefer to deploy an iptables file and have this restored when the loopback interface is loaded).

I’ve had a quick look on DigitalOcean community (I love the resources there), but the stuff about using Ansible seems a little more throw it all in one file rather than properly split them out like I saw in the video. I think splitting out each task is a must to be able to understand what’s going on and making changes.

The video I’m referring to is https://www.youtube.com/watch?v=ZNB1at8mJWY

If your looking for the gluster error ‘brick2.mount_dir not present’ jump to the end

Time for another post 🙂

I’ve been using DigitalOcean for some time now, and I’m still tweaking my setup. Once thing I really hope they sort soon is proper private lan between your own droplets, for now we just have to use a vpn between them.

Being responsible for a new website can give a lot of headaches, especially when you have to try to guess just how popular it will be. So about a year ago I setup a new droplet to host the new site, testing it was going well and I increased the droplet before we launched to handle a spike. Sadly I underestimated just how busy it would get, based on the numbers I was given I think I was about right but unfortunately those numbers were way off.

But each failure it just another learning curve 🙂 so as quickly as possible that was fixed, then the site got to normal volume so we scaled it back down (yes it’s a whole cost exercise especially when your paying for it). Then we had the lead up to Christmas, in an attempt to not repeat the problems at launch, I changed the whole configuration so that I could (if needed) take a server down while staying partly operational. This kind of worked and was needed when some brightspark promoted the site a day early and we hadn’t scaled back up!

Come the new year I decided it was time to seriously sort the infrastructure for the site. It now has an online store and it’s important it keep running, it’s not just a blog anymore. So I put in place the following setup (working around various obstacles).

DNS:

All websites name servers are pointing to cloudflare and they handle the first web connection. It works really well on their free tier, and changes (adding new servers) are pretty quick to take effect.

DigitalOcean Droplets:

1x Server running as a load balancer.

2x Servers running as webservers.

1x Server running as database server.

1x Server running as email (not quite running).

At the same time as making this setup I decided to ditch apache and move to nginx, so loadbalancer and webservers are running that.

Software:

4x Nginx (loadbalancer and webservers, and installed on database server for stats).

2x Syncthing (webservers) to keep the www folders in sync.

1x MySQL Server.

4x OpenVPN (connections between loadbalancer and webservers, webservers and database).

1x Redis Server (for session data, I tried nginx load balancing options but it still screwed up if I had to take one of the web servers out for maintenance, so installed Redis on the Database server.

As this progressed I dropped the VPN between loadbalancer and webservers and just use HTTPS/SSL instead. Syncthing already has it’s own SSL built-in so I could leave that over the semi-private LAN. but I really would like to change MySQL to be encrypted and drop the vpn from that too, but find info on doing this for wordpress seems to be non-existent at the moment.

Roll forward a few months, this has been working but still has areas to resolve. Such as syncthing: yes it keeps the folders in sync and is actually really good that I can also store them on another system easily. but it doesn’t listen to the OS for changes to files. Instead it polls every x seconds. Although there’s nothing much changing, updating plugins became a problem if you click the update button it downloads but then nginx sends your next request to another server and now the plugin.zip isn’t there so wordpress throws an error.

My whole reason for running syncthing was I wanted the files to be available on each server independently. So if Server A goes down it doesn’t matter Server B has all the files locally anyway. NFS would still give me a single point of failure. On looking into resolving this though I remembers GlusterFS. I’d played with it a long time ago, but dropped it as a solution (can’t remember what I was doing or why it wasn’t working). Now it’s time to try it again. downside I’m back to needing VPN’s and OpenVPN isn’t the easiest to quickly add a new server.

So I’ve done the following:

Added a new server just for the files (I don’t like gluster being in a 2 replica incase there’s a problem, there should be a majority who thinks they are holding the correct file).

Swapped out OpenVPN for Tinc, I have to say one of the best decisions. yes there are downsides, it creates a mesh (only doable with OpenVPN by running quagga for manually forcing routes) but I have no idea which Server is actually connected to which Servers. There’s no VPN status and I can’t see how much traffic has gone between 2 particular servers (iptables helps but it’s not 100%)

Added another new server for nagios and central logging.

There were a load of changes within a few weeks of each other, but I now have a setup I’m confident I can scale more quickly than ever before. Yes it has single points (load balancer, mysql) but I know if the load balancer has a problem it’s pretty static so can be wiped and redeployed quickly, as well as it will take a few minutes to open the webservers to the world and let cloudflare hit them directly. So MySQL is the real problem and I’ll be addressing that one soon enough.

So now onto today’s problem 🙂

I’ve had gluster running a few weeks, and I have our testing website (for theme changes etc) setup on our webservers behind the loadbalancer. The last few days I’ve need to do more extensive testing than just changing bits in a theme, so I’ve decided to split the tester site onto it’s own droplet (still behind the loadbalancer and with VPN to the databases). I thought I may as well make use of Gluster here too (yes it would be in a 2 replica setup itself and the fileserver. I don’t like that idea). So I brought up a new server and configured it: new users, firewall rules, tinc, nginx, php, etc.

I added gluster and copied the /etc/hosts entries over from the other servers. All looked good. I gluster peer probe ServerX and it worked, gluster peer status and I could see it fine. but on trying to add a new volume:

After upgrading my Ubuntu system first from 12.04 to 13.10 then onto 14.04.1 I started getting problems with xlib_shmIt wouldn’t start up, all the cameras were working fine from the web interface but nothing to the TV.

So I started troubleshooting and found the following error coming back at me.

It doesn’t really make any sense though, it take you down the path of increasing the shmmax an shmall. But this was all working fine before the update and these values haven’t been changed. Still I increased them just incase.

No joy.

I also tried to recompile xlib_shm incase (being upgraded a few times) too much had changed. This unfortunately fell flat on its face when it can’t compile there’s stuff missing (we’ll come back to that).

After going round in circles for quite a while, I found while checking the camera configs that each camera had been set to 8bit greyscale. I set this back to 24bit colour. And restart. hey presto! my tv is once again filled with cameras.

I do still have a problem at the moment though, although the cams are now displaying they are frozen. I’m not sure though if this is more to do with how I start xlib_shm, and think it may be crashing out but not completely enough to take the display away.

So if your having problems with xlib_shm after upgrading either your OS or Zoneminder, check that your cams are setup properly. I almost missed the colour as it’s nighttime and they are grey anyway 🙂

As for the compile problems. It looks as though upgrading through 2 versions of Ubuntu has removed some packages, I’ll have to try and work out what incase I need to compile again.

A friend recent;y asked me about DarkCoin, and I had to admit I hadn’t heard anything about it. Sure I have some bitcoins but only really to say I have them. So when he asked I just thought `ah there’s another one`. He then asked how someone would mine these things, So I showed him a few youtube videos of pretty impressive setups.

Anyway to the point, I ran through a quick demonstration of how the miner would work and setup one up on a Virtual Server (Droplet). So for anyone that’s interested here goes (p.s. I can’t say you’ll make a fortune from this method, it’s more proof of concept).

We’ll be using DigitalOcean for our server, so you’ll need an account. Signup here There’s always some promo codes for some credit at present SSDMAY10 give you an extra $10 credit. I personaly use DigitalOcean to host a few of my projects and they’re great for being able to test ideas.

Once you have your account setup, Create a new Droplet I normally use the Latest Ubuntu.So the specs I’m testing this on are:-

One thing to note, the above command will test under MY pool test login. You will not receive any coins doing this and should only be used as a test. You can stop this using CTRL+C

As long as that all seems to be ok, we should now join a pool. I’m currently using DarkCoin Official Pool I have no idea how ‘Official’ it is, but I was drawn to the graphs 🙂 I did briefly run a miner on windows using this guide and received coins directly into my wallet. Whereas using the pool I’ll have to cash anything out to my wallet and that will incur a charge. I would have also shown using the servers from the other guide, but at present they seem to be down I suppose that’s why this Pool does charge small admin % on each transaction. I can’t say I mind for good service.

I’ve had nagios running for years, so decided to play around with the alerts.Twitter seemed the obvious choice, it’s easy for a people to follow the twitter account that’s publishing the alerts, and great if you actively use twitter (I don’t, this was more of a ‘how would you’ rather than a need).First thing is to register a twitter account that nagios will publish as. I setup https://twitter.com/NagiosStarBOnce you’ve registered you need to edit your profile and add a mobile phone number (This is needed before you can change the app permissions later. Once you’ve done that you can delete the mobile number).Now head over to https://apps.twitter.com/ and create a new app Fill in Name, Description and Website (This isn’t particularly important, as we’re not pushing this app out to users). You’ll be taken straight into the new app (if not simply click on it).We need to change the Access Level, Click on ‘modify app permissions’I chose ‘Read, Write and Access direct messages’, although ‘Read and Write’ would be fine. Click ‘Update settings’ (If you didn’t add your mobile number to your account earlier, you;ll get an error).Now click ‘API Keys’You need to copy the API key and API secret (Please dont try to use mine).Now click ‘create my access token’ close to the bottom of the page.You also need to copy your ‘Access token’ and ‘Access tocken secret’

Now we move onto the notification script.Login to your Nagios server via SSH.You need to ensure you have python-dev & python-pip installed.

apt-get install python-dev python-pippip install tweepy

Then cd into your nagios libexec folder (mines at /usr/local/nagios/libexec)

You should now be able to see the Tweet on your new account (you may need to refresh the page). If all has gone well so far, you can now add your Nagios Configuration. Change Directory into your nagios etc

cd /usr/local/nagios/etc/

Edit your commands.cfg (mine is inside objects)

nano -w objects/commands.cfg

Where you choose to place the new configurations doesn’t really matter, but to keep things in order I choose just below the email commands. Copy and paste the following

I decided to create a specific contact and contact-group for this, but you can adjust as you wish, add the contact to other contact-groups if you wish. Now the last bit,Add the new contact group to the hosts & services, templates or host-groups and service-groups.How you decide to do this will depend on how you’ve set out your hosts, services, templates and contacts. For me I edit the each of the host files and add contact_groups nagiostwitter to each host and service.(IMPORTANT: this will override settings that are inherited from templates, so if you already have email notifications active you’ll either have to just add nagiostwitter to the template or add users to this). Dont forgot to , delimitedAn example host of mine

UPDATE:A few weeks ago I received an email from twitter telling me my application had been blocked for write operations. It also said to check the Twitter API Terms of Service. I didn’t think this would cause a problem, I’m not spamming anyone other than myself or users I’ve asked to follow the alerts. So I read the Terms of Service, and it’s all fine. I raised a support request with Twitter and had a very quick response saying “Twitter has automated systems that find and disable abusive API keys in bulk. Unfortunately, it looks like your application got caught up in one of these spam groups by mistake. We have reactivated the API key and we apologize for the inconvenience.”This did stop my alerts for a few days though.So just be aware of this.UPDATE 2:Thanks to a comment from Claudio to truncate messages over 140 characters. I’ve incorporated this recommendation into the code above.

I’ve been using one of my PI’s as a torrent server for some time. Recently I decided to refresh the entire system. This will NOT go into the legalities of downloading anything, I expect everyone to only be using this for downloading raspberry images 🙂

# Enable DHT support for trackerless torrents or when all trackers are down.# May be set to "disable" (completely disable DHT), "off" (do not start DHT),# "auto" (start and stop DHT as needed), or "on" (start DHT immediately).# The default is "off". For DHT to work, a session directory must be defined.## dht = autodht = off

## Do not modify the following parameters unless you know what you're doing.#

# Hash read-ahead controls how many MB to request the kernel to read# ahead. If the value is too low the disk may not be fully utilized,# while if too high the kernel might not be able to keep the read# pages in memory thus end up trashing.#hash_read_ahead = 10

# Interval between attempts to check the hash, in milliseconds.#hash_interval = 100

# Number of attempts to check the hash while using the mincore status,# before forcing. Overworked systems might need lower values to get a# decent hash checking rate.#hash_max_tries = 10

# Start The Plugins when Rtorrent Starts not when the page is first opened. If apache service is restart separately the plugins are likely to be stopped. Only really needed for RSS feeds.execute = {sh,-c,/usr/bin/php /var/www/php/initplugins.php &}

Save and Exit (ctrl+x then y then enter)We now need to perform a test run of rtorrent.

rtorrent

It should start without any problems. You may get a few warnings inside rtorrent, but it should still be running. To Exit press ctrl+q.You should now exit the rtorrent user.

exit

Finally we’re going to setup rtorrent to automatically start when the PI is powered up.

nano -w /etc/init.d/rtorrent

Copy and Paste the Following:-

#!/bin/bash

# To start the script automatically at bootup type the following command# update-rc.d torrent defaults 99

And that’s it. You could now start rtorrent using “/etc/init.d/rtorrent start”, but it’s just as easy to reboot and test that the startup scripts runs. Once you’ve reboot or started rtorrent you can access the webpage at http://{ip-address or name} Notes:-

This setup is meant to run internally, as such there is no security on the apache setup.

Personally I forward ports 51515-51520 on the router onto the PI, this makes a difference in download speed (much quicker) but as it's opening ports it's a security risk so you'll have to decided whether or not to.

I run this setup behind a vpn using ipredator.se, if there's any demand I'll write up another guide on how to configure that and ensure your traffic is locked to only go over the vpn.

Currently moving an OTRS installation from one server to another. This installation has been running fine for months, I’ll be upgrading as part of the process from 3.2.6 to 3.3.1 but to keep the migration simple I thought I’d just tar up the current OTRS installation and dump the database. Copy them over to the new server extract the files and restore the database and permissions.This all went well, so I configured apache (copied the existing apache config from the old server). Restart the apache server and tried accessing the page. Kept getting ‘Forbidden’ messages, everything pointing towards permissions, so checked and reran otrs.SetPermissions still no joy. As nothing seemed to be moving forward I decided to wipe the install, and perform a fresh install. Did this but then found that apache just wouldn’t start. This was down to alot of entries in various failes pointing to /opt/otrs/ rather than my installed /usr/local/otrs/ Once I sorted this I was back to encountering ‘Forbidden’ messages again.After alot of poking around and searching the net I found:-http://httpd.apache.org/docs/2.4/upgrading.htmlI hadn’t considered that apache may have been different between servers, a quick ‘apache2 -v’ on each server confirmed one server running 2.2 and the new running 2.4.

So to solve this I had to replace:- Order allow,deny Allow from all

with

Require all granted

all through the apache config. After a service apache2 restart this let me get to the installer.pl so with that working, I’ll be back to extracting the OTRS and database from the old server.

I’ll apologise for any mistypes, I’ve written all this from memory, while having a well deserved coffee.