I spent a good few hours searching google and trying to work out what had been installed that could be causing an issue with libc6. Though I was fairly certain nothing had been (only because this is 2 days after a migration to new servers, and I’m hitting this problem on 2/4 servers, but they have extremely similar setups).

Something is eating all the space. So I tried to run ncdu to look for large files (I know there’s other ways, but I like ncdu). But I hadn’t installed it on this new server and I can’t install it with apt-get broken.

Thinking /run is bound to be causing some issues (still not sure if it’s causing this particular issue), I reboot the server (bad move!). It locked up and had to be power cycled. Thankfully it’s a droplet and with DigitalOcean I can power cycle easily (I did try the console but it wouldn’t connect).

Anyway after a reboot, /run started at 5% used, but quickly grew to 70%. but I did managed to install ncdu, and with that I knew the problems I had with apt-get were being caused by a full /run.

After a quick (very quick) look at ncdu /run I could see hhvm.hhbc taking up approx 85Mb+

A quick check of the config and I can see hhvm is configured to do so. So I adjusted the config to put it in /var/cache/hhvm/hhvm.hhbc instead and update the systemd service script to create /var/cache/hhvm and set it’s owner.

Another reboot, everything seems fine and /run is now at 3% used.

And I’ve run apt-get upgrade successfully.

I’m thankful that I noticed, I really thought I’d screwed something on these 2 servers while migrating, and I could see another night of rebuilding new servers ahead of me.

Morale of the story: Check your not out of space when you get weird errors (yes the I/O should have rang some bells, but hey it is 2am).

I’ve recently noticed a problem on 3 of my Digital Ocean Servers. The APT package lists are not automatically updating every day. I try to keep all servers upto date, and rely on Nagios to inform me when there’s packages needed to be updated and that’s the main reason I noticed something was broken.

The 3 servers in particular are newer builds to the rest of the system, and they dont have near as much installed as the others, so at first I didn’t pay too much attention when other servers were going into warning state on nagios indicating updates but these 3 weren’t. However I would still connect to these servers and run my normal command:-

A few times these servers did install updates and I just thought it must have been my timing, that the package lists hadn’t yet been updated by the cron.daily.

But after this happening a few times, I decided to not run the above and see how long these servers would take for nagios to throw an alert. It never did and that got me a little worried.

Over the last few days I’ve been diagnosing what’s wrong. I started out with making sure cron is working properly. Then kept an eye on the file timestamps

ls -ltrh /var/lib/apt/lists/

Eventually getting to /etc/cron.daily/apt and checking through what was was doing on the working servers compared to the broken ones. I turned on VERBOSE and got a bit of info when running /etc/cron.daily/apt but it seemed to exist quite quicky.

Comparing it to a working server the important bit seemed to be around

In the last few weeks I’ve been rebuilding servers & all services. I like to do this every so often to clear out any old junk, and take the opportunity to improve/upgrade systems and documentation.

This time around it’s kind of a big hit though. While I’ve had some services running with HA in mind, most would require some manual intervention to recover. So the idea this time was to complete what I’d previously started.

So far there’s 2 main changes I’ve made:

Move from MySQL Master/Master setup to MariaDB cluster.

Get redis working as HA (which is why your here).

I’ll bore everyone in another post with the issues on MariaDB cluster. But this one concentrates on Redis.

The original setup was 2 redis servers running, 1 master 1 slave. With php session handler configured against a hostname which was listed in /etc/hosts. However this time as I’m running a MariaDB cluster, it kind of made sense to try out a redis cluster (dont stop reading yet). After reading lots, I decided on 3 Servers each running a master and 2 slaves. A picture here would probably help but you’ll just have to visualize. 1, 2 & 3 are Servers, Mx is Master of x, Sx is Slave of x. So I had 1M1, 1S2, 1S3, 2S1, 2M2, 2S3, 3S1, 3S2, 3M3.

This worked, in that if server 1 died, it’s master role was taken over by either 2 or 3. And some nagios checks and handlers brought it back as the master once it was back online. Now I have no idea if this was really a good setup (I didn’t test it for long enough), but 1 of the problems I encountered was where the PHP sessions ended up. I (wrongly) thought the php session would work with the cluster to PUT and GET the data, so I gave it all 3 master addresses. Reading info on redis, if the server your asking doesn’t have the data it will tell you which does, so I thought it’s not really a problem if 1M1 goes down and the master becomes 2M1 because the other 2 masters will know so will say the data is over there. In manual testing this worked. but PHP sessions doesn’t seem to work with being redirected (and this is also a problem later).

So after seeing this as a problem, I thought maybe a cluster is a bit overkill anyway and simplifying it to 1 Master and 2 Slaves would be fine anyway.

I wiped out the cluster configuration and started again, but I knew I was also going to need sentinel this time to manage which is the master (I’d looked at it before, but went for cluster instead. Time to read up again).

After getting a master up and running and then adding 2 slaves. I pointed PHP Sessions to all 3 servers (again a mistake). I was hoping (again) that the handler would be sensible enough to connect to each and if it’s a slave (read only) detect that it can’t write and move to the next. It doesn’t. It happily writes errors in the logs for UNKNOWN.

So I need a way for the session handlers to also know which is the current master, and just use this.

Basically this script is run whenever the master changes (I need to add some further checks to make sure <role> and <state> are valid, but this is being done for quick testing.

I’ve then changed the session path:

session.save_path = "tcp://REDIS-MASTER:6379"

and again in the pool.d/www.conf:

php_admin_value[session.save_path] = "tcp://REDIS-MASTER:6379"

Quite simply I’m back to using a hosts entry which points at the master. but the good thing (I wasn’t expecting) was I dont have to restart php-fpm for it to detect the change in IP address.

It’s probably not the most elegant way of handling sessions, but it works for me. The whole point in this is to be able to handle spikes in traffic by adding more servers N3, N4, etc. and providing they are sentinels with the script they will each continue to point to the master.

I did think about writing this out as a step by step guide with each configuration I use, but the configs are pretty standard and as it’s a bit more advanced than just installing a single redis to handle sessions, I think the above info should really only be needed by anyone with an idea of what’s where anyway.

I still dont really get why the redis stuff for PHP doesn’t follow what the redis server tells it i.e the data is over there. I suppose it will evolve. If your programming up your own use of redis like the big boys then you’ll be programming for that. I feel like I’m doing something wrong or missing something, I’m certain I’m not the only one who uses redis for PHP sessions across multiple servers behind a load balancer, but there is very little I could find googling beyond those that just drop a single redis instance into place. As this becomes single point of failure it’s pretty bizarre to me that every guide seems to go this way.

If anyone does read this and actually wants a step by step, let me know in the comments.

Just to finish off I’ll give an idea of my current (it’s always changing) topology.

2 Nginx Load Balancers (1 active, 1 standby)

2 Nginx Web Servers with PHP (running syncthing to keep files in sync, I tried Gluster FS which everyone raves about, but I found it crippling so I’m still looking for a better solution).

I have no doubt that I will learn I’ve done bits wrong as I get further in, but I’m going to start with the following hosts config

[initial]
C3PO-1
[balancers]
[backends]
[databases]
[fileservers]

So the first thing I’m going to tackle is setting up new users. I’ve been playing with it for about an hour, although I got the user created very quickly I hit a problem with my setup. I need to send multiple ssh keys for some users (we use different keys on PC’s, Laptops, Mobiles). Every example I found seemed to a) want to pull a key from a file, b) just use one key.

After quite some playing, and trying different things I found a way. This in turn meant I had to slightly change the create user bit of the playbook.

I did think about pulling the keys in from the authorized_keys files, but not all users are allowed on the management, so I’d have to keep files for them and if I’m going that far, I may as well just keep them in group_vars. It doesn’t look as nice if you cat the file but it’s structured and makes sense.

I was worried that it may screw up the users own group as the manual says it delete’s all other groups except the primary, since I haven’t told it a primary using ‘group’ I thought it may be a problem, but thankfully it’s not this just worked.

Well I thought at this point I was pretty finished. Ha not a chance. I added a user for ansible to be able to connect as since I’ll be removing root ssh access. This means I’m going to need to let ansible sudo so I’d best sort that now. Here’s a bit of code I found and changed slightly. Make sure you change the username in the sudoers.d/ansible file

All working. It’s taken about 2 hours to get to the point of deploying a couple of users automatically. I’m not so sure this has saved me time in the long run 🙂 but it’s the first step in a much bigger project. I’m kind of glad it wasn’t just copy and paste others code and stuff broke, it gave me a chance to understand a bit more.

Part 2 will be coming soon. There we’ll lock down SSHd and apply the default firewall rules.

So If you read my last post (it was really long sorry), you’ll see right at the end the current deployment. I had tried a few managers to be able to deploy/scale the whole system, but it really overcomplicated the whole thing. Chef looked really good (I can’t remember the other one), but it was problematic and just didn’t suit.

Instead I kept with the scripts I had written for the time being. They are in no way good enough to share as they are very customised to my setup but they achieve what I need. However to run them takes quite a bit of initial manual work.

So what do I need from a system:-

It has to just work, not go installing stuff it depends on to run.

It has to be able to split the setup into an initial and running level.

It has to be able to be told easily about a new server and what role it will be, then do the initial setup and onto the relevant running level.

It MUST be simple to use and understand. I’m getting a bit sick of having to read through how to configure weird stuff because someone decided to do things completely differently to how you’d expect it to work.

It must have very low resource requirements. I want to run this from a management droplet that already runs nagios, so it can’t sit there just eating resources while it has nothing to do.

I saw a very quick video on Ansible with someone using it for Raspberry PI’s (something else I have far too many of) and thought right away I should look into that. So here I am. I’m going to do some testing with Ansible.

Initially all I’m looking for it to do is handle the initial setup that I already have scripts for.

Create a new user, add it to the sudo group, set a password and copy the SSH keys.

Reconfigure sshd to deny root connections.

Set the server’s timezone.

Set the keypad option in nanorc.

Set the .bashrc for root to use a red prompt.

Install some basic packages such as screen and htop

Create a swapfile

Install and configure NTP

Setup IPTables (I dont use UFW, I prefer to deploy an iptables file and have this restored when the loopback interface is loaded).

I’ve had a quick look on DigitalOcean community (I love the resources there), but the stuff about using Ansible seems a little more throw it all in one file rather than properly split them out like I saw in the video. I think splitting out each task is a must to be able to understand what’s going on and making changes.

The video I’m referring to is https://www.youtube.com/watch?v=ZNB1at8mJWY

I should make it clear from the outset, this post isn’t going to be solving anything. I’ve spent about 3 days working on stuff and this is just bugging me.

Let’s go back to around this time last year. A couple of friends and I were working on how to get some money back from a facebook page one of them had setup a few years prior. I had been working with him on it since a few weeks after he set it up, and we’d pushed around ideas to sell some merchandise alongside it a few times, but never really got anywhere.

He had made a rash decision one night to use an ‘online website creation’ provider to get a site online. From the start I hated it. Not the idea, I think it was about time to get our own site running, but he spent a few weeks tweaking about 8 pages to look really good, using their WYSIWYG editor. Then wanted me to change a few things in code that wasn’t right. It was an absolute nightmare! I can’t remember the name of the site but I think it had a W E X somewhere in the title.

It was a paid “solution”, and I think cost about £40 per month by the time he’d added a mailing list option to capture email addresses (not actually handle any emails) and a few other bits.

Coming from more of an IT position, my main concern was around load/spikes being handled. There was very little information about how well they could handle this (and I think we found the reason why). We finally put this live posting the link to about 40k followers.

Watching the nice google analytics (that I had to add to each page, because they didn’t have a drop in the tracking code option), within seconds we hit around 150 hits per second. This continued for a few hours, but 2 problems became apparent:

The site was struggling, and we would probably have been hitting a higher number. We were getting positive feedback and people understood it was busy, but I still wasn’t happy we’re paying for them to handle this and it’s just not.

And this really hits onto point 1. He’d setup a site that was pretty static! There was nothing to get people coming back for more. Yes there was a news page that we could update, but other than the mailing list form there was nothing interactive. (So hitting point 1, static content should really have been able to be handled 100x better).

Anyway we took that for what it was, a basic site with a bit of info and something to get us started.

We already had a ‘Shop coming soon’ page, so the next thing we were trying to figure out was what are we going to sell and how?

Initially we thought t-shirts, and started looking at some of the ways we could do this. 2 main providers seemed to jump out zazzle and cottonpress (I think, it was a while ago). While both had some good offerings, neither really grabbed us. I can’t remember which, but one of them deleted an image we uploaded for copyright (it was our logo, and we had it plastered everywhere and the account was signed up with a domain name using the same logo), but they wanted us to fill in and snailmail/fax some copyright forms and reupload the logo. Considering we were only seeing what we could do at the time, we decided to drop them as an option. If we have to jump through these hoops with everything we do, we’ll spend more time filling in their paperwork than anything else.

Time went on, visits to the site died down (did I mention lack of content/interactivity), and we still hadn’t sorted out products, a store, a business.

We continued with the facebook page and still poked around ideas on how to get a site/shop running. I spent a good few weeks working with oscommerce (I’d previously used it a little for another project idea, but it never went live) and finally had something to show, a semi working shop front (it had no products).

We discussed that neither of us really had a clue about setting up a proper business. I’m all IT and have no interest in writing business plans or doing business meetings (I should mention that a previous role I was an IT Manager and regularly had to be part of “grown up” meetings, I’m a Tech, I hate people, I hate meetings, give me something broken – I’ll fix it, give me a problem – I’ll work out a solution. But in NO WAY do I want to be taking part in any more business meetings).

A few months later I was helping him move. Another friend of his was also helping. I’d met him once before but didn’t get chatting then. He mentioned he was in his last year of university and was studying business and finance! Just how he hadn’t thought of this before I dont know, but instantly we knew he was coming on board 🙂

We spent about 3 hours in McDonalds discussing what we had, what we’d like to do, and just how crap we’ve been so far. Within this conversation we said about selling t-shirts/mugs/bags etc. Just like a genii out of a lamp our 3rd comes out with ‘oh my almost father in-law does printing stuff, I know he does mugs. Shall I speak to him?’ Just like a match made in heaven, we suddenly had our missing piece! someone who should have more of an idea on the business side (or at least know someone to ask) and connections to a printers for the kind of stuff we want to sell. You just couldn’t make it up, he’d known this guy for a few years and never thought to asking him about business stuff.

Things started moving forward, slowly at first but at least they were moving. We met up with “almost” father in-law and went through some designs and processes. We setup a business. I continued work on the shop website and we took down the other that was costing too much and not really doing anything.

In around October time we were set. Nothing spectacular, about 9 mugs and a few t-shirts. The mugs would be the easiest, we just send the order to “almost” and he takes care of printing, packaging and sending. The t-shirts would be a little different as we’d have to get a template made for them, and couldn’t afford the cost until we had some orders in.

We launched, I had tried to over-spec the server(s) but in itself this was tricky. There were no real stats on how well oscommerce could perform on certain hardware. And scalable VPS’s such as DigitalOcean’s current system just didn’t exist. Scaling would mean taking a new 30 day server and moving everything over to it. Certainly not a 5 min job, and definitely not something to start an hour after we’ve launched. We’ll just have to bite the bullet and see.

My memory of launch night is fuzzy to say the least. I think I’d been working 36-48 hours trying to finish stuff off. I had a big list of checks and can’t remember doing half of them.

Our page audience had grown to about 70k, so I was very nervous. We launched the shop and watched. ping, email – it’s an order, ping, another, ping another. It was working. I have to say the server(s) held up pretty well. It wasn’t without problems, we did start seeing the site timing out on new connections for about 10 mins. but a swift kick of apache sorted that and it didn’t cause a problem again.

Finally we were running. The feedback again was good. We had a bunch of concerns like

Will the system work

Will the servers(s) hold up

What happens if it goes mental and we sell a thousand mugs

Can we really do this

I think all in all it went well. We could have done better, but it also could have been a lot worse. We ran with oscommerce for a few months. Shortly after launching the shop I had a discussion on just how were we going to get facebook and the shop incorporated? There was no obvious answer, then we hit a problem. Once of the posts to facebook got reported (it’s a humorous page, and we only every post stuff sent in to us), this showed us just how much we’re reliant on facebook. Suddenly we’re all logged out and the poster was blocked for 24 hours. Luckily facebook pretty much left the page alone (just delete the one post), so we played on it that one of our admins was in the dog house for an earlier post. But it still didn’t take long to realise if facebook wanted they can delete the page at a whim and we’ve suddenly lost all our content and fans!

This just didn’t sit right with me, and I started looking at how the hell to get a backup of OUR page and content. There was nothing. So I started looking at how can we do things differently, enter WORDPRESS.

I’d seen this name floating about for the last few months, but never really saw the point in using it. I dont blog, we dont blog, so what’s the point? (I’m still not entirely sure I understand the point) but it’s close to one of the best things I’ve spent weeks fiddling with.

I’d installed wordpress on our VPS for me to have a look around, it still wasn’t a site that we could really use, but as a CMS maybe I can find a way to connect to facebook and backup our stuff. There must be people who do this right? WRONG. There’s loads of plugins for wordpress & facebook, but I’ve only ever found 1 that takes your page and puts it as posts. To make matters worse, it’s flakey as f**k hadn’t been worked on in god knows how long, and of the very few comments in the code their in Chinese.

Now I would never describe myself as a coder. I’ve used Delphi and VB for writing some functional programs in the past, and had to program a few in VB.net when I was IT Manager (the old problem/solution thing), I could also write some ASP and PHP, really most of my stuff was drag and drop boxes and program them up for stuff. I did quite a bit of databases stuff within them, but that was it. There was absolutely no such thing as using classes (I think even think they existed). But as part of my job I had a dev team who did develop in PHP and VB.net, and they were always amazed when trying to tell me how something couldn’t work, that I could not only follow along but tell them why they were wrong and on several occasions when something broke could actually read their code and work out (normally the simple thing) a temporary fix.

And so it begins I now have no dev team, a bunch of php code and classes that really didn’t make much sense to me. Bit by bit I managed to work out what each bit was doing, then moved on to changing it so that it would run for us. I know it will seem like simple stuff (especially looking back), but things like:

Changing a hard coded loop to only pull 10 facebook posts, to take a limit from a setup interface where you can specify how many to pull.

Adding in Date ranges to pull from and to.

Improving the cron job, so it look from the time of the lastest post it was + 1 second.

Downloading any attached image and saving it to the server (huge accomplishment).

Changing the posts content that gets published and updating links back to the post it just posted.

There’s load of stuff I’ve had to do to this plugin to get it working, let alone better and working for us. Eventually I’d finished (your never really finished, I have a list of new changes to get to sometime). After running it on a fresh wordpress install, we suddenly have a complete backup of our facebook page, around 9k posts and images, all sat in wordpress. and what’s more automatically grabbing new stuff 🙂

I showed off my new achievement, personally I dont think it was appreciated just how much time and effort I had put into this. but it went down really well. We now had a blog! we now had a blog alongside our store and it was really starting to come together.

Over the next few weeks I kept working on improving the blog while managing the shop. Then suddenly a new disaster, our server had for some reason gone offline. Trying to connect via the backup terminal access just gave me a blank screen, something was wrong and I couldn’t get access to see what. To make matters worse, our provider had very nicely decided to cut back on it’s 24/7 support, and now only operated 8-8. At around midnight, it’s not exactly what you want to be finding out that the support has changed and no-one bothered to tell you. The ONLY thing I could do was email them, and hope someone picked it up soon. They didn’t! I spent a good few hours trying everything I could think of to get hold of someone or find a way to the console, but nothing. This had the effect of making me sleep through my alarm at 8am, but I woke at 9am and called them. After a few choice words, I was assured the tech team would look at it right away. I was so tired I fell back to sleep and woke again about mid day. The first thing I did was open our site, or I should say attempted to! it was still down!! Another call, more choice words, and me advising them I’m not going anywhere and if they cut me off I’ll just keep calling until I can speak to a “Tech”, then explaining the problem and what I had tried to tier 1 monkey, quickly got me escalated. I couldn’t stop laughing when I finally did get to tier 2, their tier 1 had placed me on hold to get someone then come back to me and said ‘I need you to take this call, this bloke knows what the f**k he’s on about, I’d put him to tier 3 but I can’t direct transfer’ to which I replied ‘Yes and I know how to work a phone system, Line 1 is the customer, you should be on Line 2 for that conversation’ 🙂 I have to be honest just that mistake made my day. Tier 1, 2 and the manager that called me back an hour later were mortified, but as I explained to him I’ve been a senior tech on phone support, I’ve been an IT manager, I’m guessing I hit a newish person and scared the crap out of them. I only care in getting this back online. To be fair to Tier 2, I was connected to the console while he was apologising. (This part really could have had it’s own post).

Anyway getting over that failure I started looking for another VPS provider, I had no problem with their VPS and generally it was a very stable system, but 8-8 support with no out of hours we’re really f**ked option, forced my hand. It had gone down very badly with the others that this had costs us money and there was no way I could argue it as I agreed the situation was crap.

I found another provider and started moving stuff, but it just wasn’t right. It was actually a previous colleagues company, but something just wasn’t right. So I kept looking. Then I found Digital Ocean, initially I started using them to test some wordpress plugins, but I loved that I could very quickly bring up new servers in a matter of minutes. This surely has to be better than waiting hours. And it was. Testing was going well, so I started moving everything over. Everything just worked, and where I had to contact their support for a few little things (1 account related can’t remember the others), I had a reply very quickly sometimes within minutes other times within 30 mins. I couldn’t fault their support and I wasn’t bounced around, they knew exactly what I needed and sorted it.

So here we have a medium spec’d Digital Ocean server, running our WordPress and OsCommerce solutions and handling both pretty well.

But being one to never settle, I kept tweaking stuff and looking at out options. I setup another server (droplet) for testing, another wordpress install later and I’m going through trying out the ecommerce plugins. I was blown away with WooCommerce! yes OsCommerce worked for us. and yes I had put in quite a bit of time customising it and getting it to work with our processes. but the whole feel of the interface was crap. Woocommerce was like a breathe of fresh air. It had a bunch of functionality, there’s loads more plugins, it’s far easier to customise, it works from the wordpress themes, and fits right in with our blog and doesn’t look disjointed.

I proposed we move over to this and it went down well. Well enough infact the the others wanted to get more involved, we spent weeks working on changes to the theme (that’d we’d paid about $50 for), I moved the shop over and made it live without telling our facebook audience. We started getting some sales via Woocommerce, and it was obvious that this just integrated well.

We were going to have a relaunch to show off the new blog and store, I think I managed to p*ss the others off, when WordPress brought in a new standard theme that worked even better with Woocommerce and I changed to it to show them. It was obvious that it did and we should stick with it, but it also meant the last few weeks of customising was wiped out (and they still bring up the time I wiped out a few weeks work when I changed the theme).

I would never say wordpress/woocommerce is perfect, I’ve found many issues along the way and had to find work arounds for a lot of stuff. I still dont truly feel like I know what I’m doing and there’s no way that we use wordpress to it’s full potential. Currently we have the blog and shop running, we have somewhere in the region of 10k posts and around 15k sales. We still don’t publish to the blog independent of facebook/twitter but it’s on the roadmap.

One thing that has caught us out a few times is DigitalOcean scaling. Because we very often have little traffic, I always keep the servers scaled down with the intention of boosting them up before we push anything new. On at least 2 occasions, we’ve forgotten this and overloaded our site.

I’ve also gone through a few different configurations just trying to find the best solution.

1st We had 1 server, that was mid range and just worked, but I knew this alone wouldn’t handle the traffic.

2nd I brought up 2 web servers and a database server. This wasn’t an ideal setup, loadbalancing was at DNS level, syncing was done via cron jobs, and the whole thing held together via a VPN to keep database connections secure. This had a bunch of problems.

Next I moved back to a single web server but kept a separate database. This was better, and around the time DigitalOcean allowed you to scale up easier (but not down you still had to wipe out the server to do so).

Because having a single web server just wasn’t enough, I went back to 2, but added in the new(ish) cloudflare CDN in front of the servers. This really helped (though I’m still not convinced really does CDN for us)

As part of the above, I tried incorporating GlusterFS (absolute disaster). From every web search I did GlusterFS looks to be THE solution. In practice for us it took a website responding (with some heavy graphics) in 3 seconds longest avg 2secs, to 30sec longest 18secs avg. I know everyone rave’s on about how great it is and how if it’s slow it’s something you’ve done. I dont believe this for a second. I’ve spent days at a time trying to make it better, but the simple truth is if the files are pulled locally I get the 2/3secs above. When using a Gluster mount point to the files (which are still actually local, Gluster on both web servers, mounted back to themselves), I get the 18-30secs. Both web servers have a private lan connection to use gluster in the same Digital Ocean Location and NO amount of tweaking or testing seems to every really improve this. It was only made worse during testing when I took down one of the servers, so that the other could only use itself to serve the files and this managed to take out the mountpoint until I restart, and still it served up the pages slowly. I thought the whole point in using Gluster (at least for me) was HA, no single point of failure. Having both servers offline if one goes down does not seem very HA to me.

The ONE thing I really want Digital Ocean to sort out is their private lan. In order to solve the issue of anyone else on the private lan being able to see my traffic between servers I’ve had to use VPN’s between them. This adds complications to the entire setup, and a private lan per account would be very welcomed.

I’m happy with the load balancers, though I would love for DigitalOcean to offer a proper loadbalancing solution.

The MySQL servers took some config to get replication working properly while also using SSL for the connection to each other and from the backend web servers.

I still haven’t managed to configure MySQL to be HA from the web servers, so at the moment this would be a manual switch. I’ve found HyperDB for wordpress, while should resolve this, but since I had to slightly change the wordpress config to do SSL for MySQL, HyperDB doesn’t seem to be able to use SSL so I need to work out how to do this. I find this really weird as once of the suggestions is to have your database remote, I really would have thought being remote (especially if using something like Amazon for the database) would mean you’d want to use SSL to keep your database traffic secure. It seems strange that this isn’t a fundamental option in HyperDB (unless I’m just not seeing it).

And the last part NFS Servers, I still need to find out how to keep these in sync (without using Gluster), I’ve previously used Syncthing to keep servers in sync, it works but is pretty much held together with tape (my configuration of it not the actual program). Once I have the NFS sync’d I also need to find a way for the web servers to use both HA.

I do feel like this configuration is the closest to the best I can achieve on a budget. Once I have the MySQL and NFS stuff worked out, I will then be able to scale any server without completely taking the site offline. Which will really help in being able to deal with spikes. It is not much easier to scale with Digital Ocean, but I’d still really want to know doing so or taking a server out for maintenance is fine because everything will just keep running.

If you’ve got this far, I really thank you for reading. I hope the next couple of posts will be my solutions to the MySQL SSL and NFS problems. It’s not 2:41am and I think I’ve been writing this for about 2 hours, so I’m going to sleep 🙂 leave a comment if you got this far, include the words ‘sleep deprived’ so I dont think it’s spam.

If your looking for the gluster error ‘brick2.mount_dir not present’ jump to the end

Time for another post 🙂

I’ve been using DigitalOcean for some time now, and I’m still tweaking my setup. Once thing I really hope they sort soon is proper private lan between your own droplets, for now we just have to use a vpn between them.

Being responsible for a new website can give a lot of headaches, especially when you have to try to guess just how popular it will be. So about a year ago I setup a new droplet to host the new site, testing it was going well and I increased the droplet before we launched to handle a spike. Sadly I underestimated just how busy it would get, based on the numbers I was given I think I was about right but unfortunately those numbers were way off.

But each failure it just another learning curve 🙂 so as quickly as possible that was fixed, then the site got to normal volume so we scaled it back down (yes it’s a whole cost exercise especially when your paying for it). Then we had the lead up to Christmas, in an attempt to not repeat the problems at launch, I changed the whole configuration so that I could (if needed) take a server down while staying partly operational. This kind of worked and was needed when some brightspark promoted the site a day early and we hadn’t scaled back up!

Come the new year I decided it was time to seriously sort the infrastructure for the site. It now has an online store and it’s important it keep running, it’s not just a blog anymore. So I put in place the following setup (working around various obstacles).

DNS:

All websites name servers are pointing to cloudflare and they handle the first web connection. It works really well on their free tier, and changes (adding new servers) are pretty quick to take effect.

DigitalOcean Droplets:

1x Server running as a load balancer.

2x Servers running as webservers.

1x Server running as database server.

1x Server running as email (not quite running).

At the same time as making this setup I decided to ditch apache and move to nginx, so loadbalancer and webservers are running that.

Software:

4x Nginx (loadbalancer and webservers, and installed on database server for stats).

2x Syncthing (webservers) to keep the www folders in sync.

1x MySQL Server.

4x OpenVPN (connections between loadbalancer and webservers, webservers and database).

1x Redis Server (for session data, I tried nginx load balancing options but it still screwed up if I had to take one of the web servers out for maintenance, so installed Redis on the Database server.

As this progressed I dropped the VPN between loadbalancer and webservers and just use HTTPS/SSL instead. Syncthing already has it’s own SSL built-in so I could leave that over the semi-private LAN. but I really would like to change MySQL to be encrypted and drop the vpn from that too, but find info on doing this for wordpress seems to be non-existent at the moment.

Roll forward a few months, this has been working but still has areas to resolve. Such as syncthing: yes it keeps the folders in sync and is actually really good that I can also store them on another system easily. but it doesn’t listen to the OS for changes to files. Instead it polls every x seconds. Although there’s nothing much changing, updating plugins became a problem if you click the update button it downloads but then nginx sends your next request to another server and now the plugin.zip isn’t there so wordpress throws an error.

My whole reason for running syncthing was I wanted the files to be available on each server independently. So if Server A goes down it doesn’t matter Server B has all the files locally anyway. NFS would still give me a single point of failure. On looking into resolving this though I remembers GlusterFS. I’d played with it a long time ago, but dropped it as a solution (can’t remember what I was doing or why it wasn’t working). Now it’s time to try it again. downside I’m back to needing VPN’s and OpenVPN isn’t the easiest to quickly add a new server.

So I’ve done the following:

Added a new server just for the files (I don’t like gluster being in a 2 replica incase there’s a problem, there should be a majority who thinks they are holding the correct file).

Swapped out OpenVPN for Tinc, I have to say one of the best decisions. yes there are downsides, it creates a mesh (only doable with OpenVPN by running quagga for manually forcing routes) but I have no idea which Server is actually connected to which Servers. There’s no VPN status and I can’t see how much traffic has gone between 2 particular servers (iptables helps but it’s not 100%)

Added another new server for nagios and central logging.

There were a load of changes within a few weeks of each other, but I now have a setup I’m confident I can scale more quickly than ever before. Yes it has single points (load balancer, mysql) but I know if the load balancer has a problem it’s pretty static so can be wiped and redeployed quickly, as well as it will take a few minutes to open the webservers to the world and let cloudflare hit them directly. So MySQL is the real problem and I’ll be addressing that one soon enough.

So now onto today’s problem 🙂

I’ve had gluster running a few weeks, and I have our testing website (for theme changes etc) setup on our webservers behind the loadbalancer. The last few days I’ve need to do more extensive testing than just changing bits in a theme, so I’ve decided to split the tester site onto it’s own droplet (still behind the loadbalancer and with VPN to the databases). I thought I may as well make use of Gluster here too (yes it would be in a 2 replica setup itself and the fileserver. I don’t like that idea). So I brought up a new server and configured it: new users, firewall rules, tinc, nginx, php, etc.

I added gluster and copied the /etc/hosts entries over from the other servers. All looked good. I gluster peer probe ServerX and it worked, gluster peer status and I could see it fine. but on trying to add a new volume:

Well it had to happen at some point, today I had a nice email from DigitalOcean saying they’ve disabled networking on one of my servers. This was because it’s ip address had been reported to RBL’s by several other servers.

Looking at the logs they included I was beating the s**t out of others wp-admin login pages. Now I know I wasn’t doing it personally, it was the first time in a long time I was in bed early and this seemed to start at 2am.

Luckily I could access my Droplet using the Console page, so after login I sat thinking ‘um…..’ where exactly do you start. The server normally has quite a bit of traffic so the logs are always cluttered. Needle in a haystack springs to mind.

I decided to run htop and see if the server was doing much without any traffic coming in. Oh yes /usr/bin/host is eating resources. So do I kill it or not. I decided not to at this point. Without networking I’m not doing anymore harm, and leaving it running may help find out what’s calling it.

It was a good call. I can’t give details of everything I did, I spent a hour hours checking through stuff. I do remember checking lsof and finding a link between a process id for host and a file within wpallimport’s uploads directory. So I had a look in there, followed by some further searching of google. 1 file in particular .sd0 seems to bring back results and this seems to be what’s caused it.

To get my server running again, I disabled the entire site within apache that was affected (luckily not a major site) and reboot the server. Once I was happy there were no cronjobs or anything calling on this script I mailed DigitalOcean and asked them to re-enable networking. They’re pretty speedy and within 15 mins had done it. A further reboot and my servers back online minus the one site I’ve disabled.

I expect the cleanup for this is going to take weeks of checking files, against backups while keeping as much as possible online.

I’m pretty confident I know what’s caused it, an out of date wordpress install with an out of date wpallimport install. It really goes to show that you have to check old stuff and keep it upto date.

The most annoying thing for me is that WordPress has a multisite option (which I use on 2 installs) and this allows me to keep plugins and everything upto date easily of sub-sites that are barely used. but it doesn’t extend to multiple domains which would really allow wordpress to be used across all my domains from one central console and then everything would be kept upto date in one go.

I know there’s a plugin for multisite domains, but I feel this is more of a hack of the wordpress system rather than wordpress properly designed to function with this in mind. I don’t want to install it and encounter many more problems.

It’s very bad admin on my part not having kept this site upto date, I’ll be the first to admit that but it’s easy to forget about installs you don’t use regularly. There must be some kind of nagios plugin to alert me to out of date plugins/versions for wordpress so I’ll be looking for that later in the week 🙂

A friend recent;y asked me about DarkCoin, and I had to admit I hadn’t heard anything about it. Sure I have some bitcoins but only really to say I have them. So when he asked I just thought `ah there’s another one`. He then asked how someone would mine these things, So I showed him a few youtube videos of pretty impressive setups.

Anyway to the point, I ran through a quick demonstration of how the miner would work and setup one up on a Virtual Server (Droplet). So for anyone that’s interested here goes (p.s. I can’t say you’ll make a fortune from this method, it’s more proof of concept).

We’ll be using DigitalOcean for our server, so you’ll need an account. Signup here There’s always some promo codes for some credit at present SSDMAY10 give you an extra $10 credit. I personaly use DigitalOcean to host a few of my projects and they’re great for being able to test ideas.

Once you have your account setup, Create a new Droplet I normally use the Latest Ubuntu.So the specs I’m testing this on are:-

One thing to note, the above command will test under MY pool test login. You will not receive any coins doing this and should only be used as a test. You can stop this using CTRL+C

As long as that all seems to be ok, we should now join a pool. I’m currently using DarkCoin Official Pool I have no idea how ‘Official’ it is, but I was drawn to the graphs 🙂 I did briefly run a miner on windows using this guide and received coins directly into my wallet. Whereas using the pool I’ll have to cash anything out to my wallet and that will incur a charge. I would have also shown using the servers from the other guide, but at present they seem to be down I suppose that’s why this Pool does charge small admin % on each transaction. I can’t say I mind for good service.