Meta

Tag: Nginx

Spent far too long trying out different ways to get this to work. I need to setup a server listening to the local IP address to restrict things like phpldapadmin to internal requests. But hit problems with nginx appending the location to the root path, and php having no idea where to get the files from.

We’re walking 114,893 requests to just the one server. My first instinct is to add the offending IP address to our IPTables BLOCK list and just drop the traffic altogether. Except this no longer works with CloudFlare since the IP addresses the firewall will see is theirs not the offender!

No problem, CloudFlare deals with this(!?) I can just turn on ‘Under Attack’ mode and let them handle it. This is where you quickly learn it doesn’t really do much. Personally I got a lovely 5 second delay when accessing our websites with ‘Under Attack’ activated. but our servers were still being bombarded with requests. So I added the offending IP addresses to the firewall on CloudFlare. Surely that will block them from even reaching our servers! While I can’t say it had no effect, out of a number of IP addresses I had added some were still hitting our servers quite heavily.

So the question becomes ‘How can I drop the traffic at nginx level?’. Nginx is configured to restore the real IP addresses, so I should be able to block the real offenders here not CloudFlare.

Turns out to be pretty easy. Add:

### Block spammers and other unwanted visitors ###
include blockips.conf;

Just add a deny line for each offending IP. Then reload nginx (service nginx reload). I’d recommend testing your new config first (nginx -t)

Now with that done of both servers I’m still seeing requests but they are now getting 403 errors and not allowed to hit any of the websites on our servers 🙂 after another 5 minutes of attack they clearly gave up as all the requests stopped.

But we’re not done. What if they’re just changing IP addresses? we need this to be automatic. Enter Fail2Ban. I haven’t used this in some time but I know what I need it to do.

Check all the websites log files for wp-login.php attempts

Write to /etc/nginx/blockips.conf

Reload nginx

Should be quite simple. Turns out it is, but it took hours trying to figure out how since all the guides seem to think you’ll be using Fail2Ban to configure IPTables.

It’s pretty simple and will need some tweaking. I’m not so sure 8 requests in 20 mins is very good. We do have customers on some of our sites who regularly forget their password. The regex does look at the 200 code, I read that a successful auth would actually give a 304. Not sure if this is correct so will need some testing. I also found other information on how to change the code to a 403 for failed login attempt. I think this would be a huge plus, but I’m not looking into that tonight.

A few tests using curl and I can see Fail2Ban has done it’s job, added the IP to the nginx blockips file and reload nginx 😀 I’m not too worried about syncing this across servers, as shown tonight they will balance well and I’m confident that within a few minutes of being bombarded they would both independently block the IP.

So there’s my morning of working around using CloudFlare but still keeping some form of block list. Hope this helps someone. Please comment to let me know if you found any of this useful/worth reading.

In the last few weeks I’ve been rebuilding servers & all services. I like to do this every so often to clear out any old junk, and take the opportunity to improve/upgrade systems and documentation.

This time around it’s kind of a big hit though. While I’ve had some services running with HA in mind, most would require some manual intervention to recover. So the idea this time was to complete what I’d previously started.

So far there’s 2 main changes I’ve made:

Move from MySQL Master/Master setup to MariaDB cluster.

Get redis working as HA (which is why your here).

I’ll bore everyone in another post with the issues on MariaDB cluster. But this one concentrates on Redis.

The original setup was 2 redis servers running, 1 master 1 slave. With php session handler configured against a hostname which was listed in /etc/hosts. However this time as I’m running a MariaDB cluster, it kind of made sense to try out a redis cluster (dont stop reading yet). After reading lots, I decided on 3 Servers each running a master and 2 slaves. A picture here would probably help but you’ll just have to visualize. 1, 2 & 3 are Servers, Mx is Master of x, Sx is Slave of x. So I had 1M1, 1S2, 1S3, 2S1, 2M2, 2S3, 3S1, 3S2, 3M3.

This worked, in that if server 1 died, it’s master role was taken over by either 2 or 3. And some nagios checks and handlers brought it back as the master once it was back online. Now I have no idea if this was really a good setup (I didn’t test it for long enough), but 1 of the problems I encountered was where the PHP sessions ended up. I (wrongly) thought the php session would work with the cluster to PUT and GET the data, so I gave it all 3 master addresses. Reading info on redis, if the server your asking doesn’t have the data it will tell you which does, so I thought it’s not really a problem if 1M1 goes down and the master becomes 2M1 because the other 2 masters will know so will say the data is over there. In manual testing this worked. but PHP sessions doesn’t seem to work with being redirected (and this is also a problem later).

So after seeing this as a problem, I thought maybe a cluster is a bit overkill anyway and simplifying it to 1 Master and 2 Slaves would be fine anyway.

I wiped out the cluster configuration and started again, but I knew I was also going to need sentinel this time to manage which is the master (I’d looked at it before, but went for cluster instead. Time to read up again).

After getting a master up and running and then adding 2 slaves. I pointed PHP Sessions to all 3 servers (again a mistake). I was hoping (again) that the handler would be sensible enough to connect to each and if it’s a slave (read only) detect that it can’t write and move to the next. It doesn’t. It happily writes errors in the logs for UNKNOWN.

So I need a way for the session handlers to also know which is the current master, and just use this.

Basically this script is run whenever the master changes (I need to add some further checks to make sure <role> and <state> are valid, but this is being done for quick testing.

I’ve then changed the session path:

session.save_path = "tcp://REDIS-MASTER:6379"

and again in the pool.d/www.conf:

php_admin_value[session.save_path] = "tcp://REDIS-MASTER:6379"

Quite simply I’m back to using a hosts entry which points at the master. but the good thing (I wasn’t expecting) was I dont have to restart php-fpm for it to detect the change in IP address.

It’s probably not the most elegant way of handling sessions, but it works for me. The whole point in this is to be able to handle spikes in traffic by adding more servers N3, N4, etc. and providing they are sentinels with the script they will each continue to point to the master.

I did think about writing this out as a step by step guide with each configuration I use, but the configs are pretty standard and as it’s a bit more advanced than just installing a single redis to handle sessions, I think the above info should really only be needed by anyone with an idea of what’s where anyway.

I still dont really get why the redis stuff for PHP doesn’t follow what the redis server tells it i.e the data is over there. I suppose it will evolve. If your programming up your own use of redis like the big boys then you’ll be programming for that. I feel like I’m doing something wrong or missing something, I’m certain I’m not the only one who uses redis for PHP sessions across multiple servers behind a load balancer, but there is very little I could find googling beyond those that just drop a single redis instance into place. As this becomes single point of failure it’s pretty bizarre to me that every guide seems to go this way.

If anyone does read this and actually wants a step by step, let me know in the comments.

Just to finish off I’ll give an idea of my current (it’s always changing) topology.

2 Nginx Load Balancers (1 active, 1 standby)

2 Nginx Web Servers with PHP (running syncthing to keep files in sync, I tried Gluster FS which everyone raves about, but I found it crippling so I’m still looking for a better solution).

If your looking for the gluster error ‘brick2.mount_dir not present’ jump to the end

Time for another post 🙂

I’ve been using DigitalOcean for some time now, and I’m still tweaking my setup. Once thing I really hope they sort soon is proper private lan between your own droplets, for now we just have to use a vpn between them.

Being responsible for a new website can give a lot of headaches, especially when you have to try to guess just how popular it will be. So about a year ago I setup a new droplet to host the new site, testing it was going well and I increased the droplet before we launched to handle a spike. Sadly I underestimated just how busy it would get, based on the numbers I was given I think I was about right but unfortunately those numbers were way off.

But each failure it just another learning curve 🙂 so as quickly as possible that was fixed, then the site got to normal volume so we scaled it back down (yes it’s a whole cost exercise especially when your paying for it). Then we had the lead up to Christmas, in an attempt to not repeat the problems at launch, I changed the whole configuration so that I could (if needed) take a server down while staying partly operational. This kind of worked and was needed when some brightspark promoted the site a day early and we hadn’t scaled back up!

Come the new year I decided it was time to seriously sort the infrastructure for the site. It now has an online store and it’s important it keep running, it’s not just a blog anymore. So I put in place the following setup (working around various obstacles).

DNS:

All websites name servers are pointing to cloudflare and they handle the first web connection. It works really well on their free tier, and changes (adding new servers) are pretty quick to take effect.

DigitalOcean Droplets:

1x Server running as a load balancer.

2x Servers running as webservers.

1x Server running as database server.

1x Server running as email (not quite running).

At the same time as making this setup I decided to ditch apache and move to nginx, so loadbalancer and webservers are running that.

Software:

4x Nginx (loadbalancer and webservers, and installed on database server for stats).

2x Syncthing (webservers) to keep the www folders in sync.

1x MySQL Server.

4x OpenVPN (connections between loadbalancer and webservers, webservers and database).

1x Redis Server (for session data, I tried nginx load balancing options but it still screwed up if I had to take one of the web servers out for maintenance, so installed Redis on the Database server.

As this progressed I dropped the VPN between loadbalancer and webservers and just use HTTPS/SSL instead. Syncthing already has it’s own SSL built-in so I could leave that over the semi-private LAN. but I really would like to change MySQL to be encrypted and drop the vpn from that too, but find info on doing this for wordpress seems to be non-existent at the moment.

Roll forward a few months, this has been working but still has areas to resolve. Such as syncthing: yes it keeps the folders in sync and is actually really good that I can also store them on another system easily. but it doesn’t listen to the OS for changes to files. Instead it polls every x seconds. Although there’s nothing much changing, updating plugins became a problem if you click the update button it downloads but then nginx sends your next request to another server and now the plugin.zip isn’t there so wordpress throws an error.

My whole reason for running syncthing was I wanted the files to be available on each server independently. So if Server A goes down it doesn’t matter Server B has all the files locally anyway. NFS would still give me a single point of failure. On looking into resolving this though I remembers GlusterFS. I’d played with it a long time ago, but dropped it as a solution (can’t remember what I was doing or why it wasn’t working). Now it’s time to try it again. downside I’m back to needing VPN’s and OpenVPN isn’t the easiest to quickly add a new server.

So I’ve done the following:

Added a new server just for the files (I don’t like gluster being in a 2 replica incase there’s a problem, there should be a majority who thinks they are holding the correct file).

Swapped out OpenVPN for Tinc, I have to say one of the best decisions. yes there are downsides, it creates a mesh (only doable with OpenVPN by running quagga for manually forcing routes) but I have no idea which Server is actually connected to which Servers. There’s no VPN status and I can’t see how much traffic has gone between 2 particular servers (iptables helps but it’s not 100%)

Added another new server for nagios and central logging.

There were a load of changes within a few weeks of each other, but I now have a setup I’m confident I can scale more quickly than ever before. Yes it has single points (load balancer, mysql) but I know if the load balancer has a problem it’s pretty static so can be wiped and redeployed quickly, as well as it will take a few minutes to open the webservers to the world and let cloudflare hit them directly. So MySQL is the real problem and I’ll be addressing that one soon enough.

So now onto today’s problem 🙂

I’ve had gluster running a few weeks, and I have our testing website (for theme changes etc) setup on our webservers behind the loadbalancer. The last few days I’ve need to do more extensive testing than just changing bits in a theme, so I’ve decided to split the tester site onto it’s own droplet (still behind the loadbalancer and with VPN to the databases). I thought I may as well make use of Gluster here too (yes it would be in a 2 replica setup itself and the fileserver. I don’t like that idea). So I brought up a new server and configured it: new users, firewall rules, tinc, nginx, php, etc.

I added gluster and copied the /etc/hosts entries over from the other servers. All looked good. I gluster peer probe ServerX and it worked, gluster peer status and I could see it fine. but on trying to add a new volume: