Amusingly, Rawhide changed the ABI, which the configure script had problems with. But I found a solution, which is going into ngx_pagespeed – github.com/pagespeed/ngx_pagespeed/issues/942. It’s a documentation fix until the code is brought up to the newer C++ standard.

Todo (done):

Fix up the config file – CentOS 6 doesn’t like the uncommented pid file set to /run/pid (the systemd default, which doesn’t exist under init). Fixed by commenting out the pid declaration in the config file.

My webserver’s running nginx 1.4.7, a version that hasn’t gotten non-bugfix attention since March 2013, according to the changelog. Oddly enough for a Fedora package, the version in koji is the stable branch – something that makes sense for CentOS/EPEL since that’s a long term support release, but not for Fedora. Doubly annoying was the fact that the packages for Fedora 20 (because Fedora 21 isn’t supported on OpenVZ yet) were one step behind the official stable – 1.4 instead of 1.6.
If I wanted mainline on Fedora, I was going to have to build it from source.

So I wrote a bunch about VPSes a few months ago, and what I thought my future looked like with them. Well, a bunch has changed since then, and will keep changing going forward, so let’s go:

Full out cloud hosting

Still good for buy-as-you-need systems, still not right for my use case. Other than reading about price drops in The Register/on Twitter, I’ll be passing over them.

Cloudy VPSes

Haven’t seen too much movement on this. I’m going to lump RamNode (www.ramnode.com) into the cloudy VPSes section, simply because they’re at 5 locations – sheer scale. Also, their prices are approaching Digital Ocean level. I’d still go with DO though – RamNode’s cheaper plans are OpenVZ-based, not KVM like DO’s.

Traditional VPSes

This is probably the most significant change – instead of migrating to a Digital Ocean droplet/Vultr node, I decided to go with Crissic.

I haven’t actually had my VPS with them go down in ~1 year (yet), and the weird SSH issues resolved themselves around December, so I decided to bite the bullet and extend my existing VPS plan with them. (Or should I say him – all the support tickets have been signed Skylar, so it’s looking like a one-man operation.) Crissic is consistently coming in the top 5 of the LowEndTalk forum provider poll, so that was additional validation.

With Nginx/PHP-FPM/MariaDB all going, I’m hovering around 300MB of RAM used. So I have a bunch of headroom, which is good.

I also picked up a bunch of NAT-ed VPSes – MegaVZ is offering 5 512MB nodes spread around the world for 20€, so I got them as OpenVPN endpoints. They’re likely going to be pressed into service as build nodes for a bunch of projects (moving Jenkins off my Crissic VPS), but those plans are still up in the air. We shall see…

Dedicated Server?

The most interesting thing I found was Server Bidding. Hetzner (massive German DC company) auctions off servers that were set up for customers who have since cancelled their service. Admittedly, most of their cheaper stuff is consumer grade (i7s instead of Xeons), but I can’t really complain. There’s an i7-3770 with 32GB of RAM and 2x 3TB drives going for €30.25 (~US$34) right now. And prices only go down over time (until someone takes it).

That mix of price/specs is pretty much unmatchable. KVM servers are ~$7/month for 1GB of RAM (and roughly scale linearly), so I’d be looking at $28/month for 4GB of RAM if I were to go the normal routes. And that’s totally ignoring the 2x 3TB HDDs (admittedly, I’d want those in RAID 1, so effectively 3TB only).

Tasks do not like having the remote_user changed mid-playbook if you specify an SSH password

Specifically, creating an ‘ansible’ user as the first task and then using it for everything in the rest of the playbook doesn’t work, because Ansible will always attempt to use the declared password for the newly created user, which promptly fails.

People debate the usefulness of a separate config account, since it’s effectively root + key-based login. They have a point, since I can get the same security by disabling password-based auth.

If your inventory file is +x, Ansible will attempt to execute it, even if it is a standard inventory list (INI-format plain text)

Particularly annoying for me because I’m running Ansible in a VirtualBox-powered VM and using shared folders – which translates to permissions of 777 in the shared folder in the Linux VM, and can’t be changed.

The VPS market is really really interesting to watch. Maybe it’s just me, but the idea of being able to get a year of a decent system for the price of a pizza is fascinating – and somewhat dangerous to my wallet. At my peak, I had 4 VPSes running at the same time – and each of them doing practically nothing. So I ran TF2 servers off the ones that could support it until the prepaid year was over.

It’s just so… attractive. My ‘own’ servers, without the expense of buying and housing pizza boxes that sound like hurricanes and are apparently good places for cat puke. And now I find myself needing to get one to power everything in the near future, because I’ve been unsatisfied with Dreamhost for about 2 years, and they finally pushed me over recently. So I decided to take notes, with the target of migrating everything of mine to a VPS in December.

My use case can be summed up simply as an always-on low-traffic webserver, hosting a personal blog (WordPress, while I look at Ghost), a MySQL database and a Jenkins build server (them Lightroom plugins!). Somewhat price-sensitive – I’m not made of money, but at the same time I’m not going to sweat the price of a coffee at Starbucks. Bonus points for extra space for hosting friends’ sites though. They can pay me back by buying me a drink or something.

Cloud Hosting

Cloud hosting is hard to quantify – off-hand, the big three are Amazon, Google and Microsoft Azure. But I’m going to expand my definition to cloud = anything that’s on demand with per-usage charges. Which opens it up.

The big 3 are pretty much unsuitable. Amazon’s still aiming toward companies that need to spin instances up and down – and quickly prices itself out compared to the alternatives. Even looking at the price of the heavy-use reserved instances, that’s $51/year upfront plus hourly fees for the smallest system (t2.micro with 1GB of RAM & a throttled CPU). But wait, that’s actually $4.25 + $2.16 = $6.41 per month (assuming the current $0.003/hour charge stays the same), which isn’t so bad.
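Checking that arithmetic quickly (using the upfront fee and hourly rate quoted above, and a 30-day month):

```python
# Monthly cost of a t2.micro heavy-use reserved instance, from the
# figures quoted above (2014 prices: $51/year upfront, $0.003/hour).
upfront_per_year = 51.00          # one-time reservation fee, USD
hourly_rate = 0.003               # USD per hour while running
hours_per_month = 720             # 30-day month

monthly_upfront = upfront_per_year / 12        # amortised upfront fee
monthly_hours = hourly_rate * hours_per_month  # running cost
total = monthly_upfront + monthly_hours

print(f"${monthly_upfront:.2f} + ${monthly_hours:.2f} = ${total:.2f}/month")
# → $4.25 + $2.16 = $6.41/month
```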

Except I had a wonderful manager in the past who pointed out that it’s not the cost of the compute power that was the killer when it came to building a company on EC2, it’s the cost of the outbound bandwidth. Data out is still an extra fee. Oh, and disk space is charged per GB as well. The wonders of all those options!

Google Compute Engine and Microsoft Azure are pretty much the same. Google wants $6.05 for their lowest system (0.6GB of RAM & 1 “Virtual Core”, likely a throttled CPU much like Amazon’s); Microsoft just states an estimated monthly price of $13/month for an A0 instance, with 1 core and 0.75GB of RAM. Here’s a nice difference though: it comes with 20GB of disk.

So the big 3 are out for me (I write, being able to look back after looking in other places). Let’s look at other providers, shall we?

Cloudy VPSes

So named because they are charge-per-use VPSes, but with all the ancillary stuff included in the price like traditional VPSes.

I haven’t done a really extensive search for these types of providers. The biggest known one is DigitalOcean, and I found Vultr on Twitter. And then I found Atlantic.Net somewhere else. (I think a Hacker News thread on DigitalOcean, amusingly.)

The biggest thing for me recently was the sudden addition of Atlantic.Net’s lowest-end system (www.atlantic.net/vps/vps-hosting/) – $0.99/month for 256MB of RAM, 10GB of disk and 1TB of transfer is pretty much unheard of. Making the ultra-low price point generally available is entirely new, and it’s going to be interesting to see if anyone in the cloudy space matches it – it’s actually cheaper than traditional VPS providers for a small always-on box. It’s definitely on my list to watch (and possibly get for things like a target for automatic MySQL backups).

Vultr appears to be trying to undercut DigitalOcean on the low end – more RAM/less space/same price for the cheapest box, same RAM/less space/cheaper for the next level up.

Atlantic.Net also appears to be trying to undercut DigitalOcean, just with more space and bandwidth instead of RAM. And literally cents cheaper! ($4.97 vs $5? Mhmm…)

Meanwhile, DigitalOcean coasts along on brand recognition and testimonials of friends.

So cloudy VPSes are far better for the single always-on server use case. How about traditional VPSes?

Traditional VPSes

Traditional VPSes and I have an interesting history. Prior to finding LowEndBox, VPSes were super expensive compared to shared hosting, e.g. www.1and1.com/vps-hosting – I still have a client/friend hanging onto his $50/month 1and1 VPS that I cringe about.

The downside of traditional VPSes is that the better prices mean prepaying annually. No pay-as-you-go here.

My history:

Virpus.com for 2 Xen VPSes in 2012 – I think they are one of the few who are doing Xen as opposed to OpenVZ or even KVM

Hostigation in 2013 – for a personal project that never actually happened

WeLoveServers in September 2013 – basic server for $19/year

Crissic in early 2014 – ostensibly better than WLS, $2/month, and I hosted a client site on it

Looking at the Virpus site, they’ve dropped their prices from $10/month to $5/month. I can’t remember what I paid for Hostigation, but I rarely even logged into that VPS.

WLS was at the very least decent. I used it mainly as an OpenVPN endpoint more than anything.

Crissic is good, but overly touchy on limits. I regularly get my SSH connection closed despite keepalives being set, and I can’t figure out why.

Overselling is a thing that happens though – CPU usage in particular is really limited, and suspensions happen. So I’m leery of moving my stuff to a traditional VPS provider without being able to test it. Especially if I just paid for a year of hosting.

New to me: RAM heavy VPSes

VPSDime (https://vpsdime.com/index.php) is a new-to-me provider. 6GB of RAM for $7 a month. Unfortunately, that’s offset by only 30GB of storage and 2TB of bandwidth vs WeLoveServers’ 6GB/80GB/10TB for $19/month. Then again, I don’t really use that much storage or bandwidth.

Bonus: Storage VPSes!

The main reason I kept DreamHost for 2 years after I got annoyed with them was that their AUP allowed me to use RAW images on my photography site, and finding space for 500GB of photos was not cheap anywhere else. With the introduction of their new DreamObjects service (read: S3 clone), that changed, and my last reason for keeping them disappeared.

VPSDime has interesting storage VPS offers. 500GB of disk for $7 is $2 more than Amazon Glacier (not accounting for the fact that I’m unlikely to get to use all 500GB), but offset that against the fact that it’s online (no retrieval times) and I don’t have to pay retrieval fees. Maybe I’ll get to ditch one of my external drives…
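For comparison, a quick sketch of the numbers – assuming Glacier’s then-current rate of $0.01/GB/month, which is what makes the $2 difference work out:

```python
# Glacier vs VPSDime storage VPS for 500GB, ignoring Glacier's
# retrieval fees and delays (which are the real downside).
glacier_per_gb_month = 0.01   # assumed Glacier storage price at the time, USD
size_gb = 500

glacier_monthly = glacier_per_gb_month * size_gb
vpsdime_monthly = 7.00

print(f"Glacier: ${glacier_monthly:.2f}/month, VPSDime: ${vpsdime_monthly:.2f}/month, "
      f"difference: ${vpsdime_monthly - glacier_monthly:.2f}")
# → Glacier: $5.00/month, VPSDime: $7.00/month, difference: $2.00
```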

Runner up: BuyVM – more established, but about double the price for the same amount of space

TL;DR

I’m inclined to see what Vultr’s 1GB/$7-a-month plan is like for hosting my personal stuff. Also thinking of getting a 1GB WLS VPS to be an OpenVPN endpoint/development server. Possibly Atlantic.Net’s cheapest service instead, but an extra 58 cents per month isn’t going to break the bank, and I have the advantage of not needing to give Atlantic.Net my credit card details. In the worst case I can walk away from WLS without worrying about my credit card being charged.

Yay for backups and Ansible making it possible to deploy a new server in minutes once I get the environment setup just right.

The main issue was naively assuming that GRUB was using the correct initramfs that was built. I installed Fedora, then did a yum update kernel to generate the initramfs image. But that didn’t work, so I tried various other things, not realising that the bugged initramfs image was still there – and because it was newer, GRUB was automatically using it instead of the initramfs I was testing.

The other issue was SELinux – the contexts of files no longer matched (oddly enough). Not sure how/why this happened, but it ended up that I couldn’t log in – the GDM login screen never came up, and logging into a console just dumped me right back to the blank console login. Had to boot into single user mode (append “single” to the kernel boot line in GRUB) and found out it was SELinux issues when I got AVC denial errors about /bin/sh not having permissions to execute. Running setenforce 0 fixed this.

superuser.com/a/368640/35094 appears to suggest that I could boot grub on the internal drive and hand off control to the grub on the SD Card, which means that I can get SD Cards for different distros and not have to worry about keeping copies of the various vmlinuz/initramfses on the hard drive /boot. Which would be the ideal situation.

Or how I spent 3 cents on Digital Ocean to play MvM with my friends for 2 hours.

Maybe the MvM servers were having issues, but 4 different people trying to create a game didn’t work (or at least TF2 kept saying ‘connection error’ – for everyone).

So I decided to try and spin up a server, like I used to do on EC2, except I decided to use the $10 of credit from Digital Ocean that I got when signing up, simply because Digital Ocean seemed a lot easier to use than Amazon’s Spot Instances.

I used Digital Ocean’s 1GB/1CPU node ($0.015/hour) in the Singapore location; no complaints about slowness/ping issues from the people in Singapore/Japan, but my ping in Ontario was ~330ms.

Why Digital Ocean: In terms of money, 3 cents a week isn’t going to kill me. But I still have some Amazon credit, so it’d be nice to use that up first.

Looking at the prices (as of Jul 20), Digital Ocean is definitely cheaper than Amazon – the cheapest instance that looks like it would work is the t2.small instance, and that’s 4 cents an hour. (I’m pretty sure a t2.micro instance won’t be good enough.) More interestingly, an m3.medium instance is ~10 cents an hour.

The m3.medium instance is interesting because it has a spot instance option – and the price when I checked it was 1.01 cents/hour. However, I’m pretty sure the OS install + the TF2 files would be larger than the 4GB of storage assigned to the instance, so I’d also need an additional EBS volume, say ~5GB. Those are $0.12/GB/month, so 5GB for ~2 hours would cost pretty much nothing.

However, there is one final cost: data transfer out. Above 1GB, Amazon will charge 19 cents for a partial GB. Assuming I play 4 weekends a month, I’m pretty sure the server will send more than 1GB of data (exceeding the free data transfer tier), so I’d be charged the 19 cents. Averaging this out over 4 weekends, I’d get charged ~5 cents a weekend. Thus, even with the actual compute cost being lower, EC2 would still cost me more than double what Digital Ocean does.
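Putting the weekend cost together (all prices as quoted above, for a ~2-hour session):

```python
# Rough cost of one ~2-hour MvM session: EC2 m3.medium spot instance
# vs a Digital Ocean 1GB droplet, using the prices quoted above.
spot_rate = 0.0101            # USD/hour, m3.medium spot price when checked
ebs_per_gb_month = 0.12       # USD per GB-month for EBS
hours = 2

compute = spot_rate * hours                      # ~2 cents
ebs = ebs_per_gb_month * 5 * hours / (30 * 24)   # 5GB for 2 hours: negligible
bandwidth = 0.19 / 4                             # 19 cents spread over 4 weekends
ec2_total = compute + ebs + bandwidth

do_total = 0.015 * hours                         # Digital Ocean, 1.5 cents/hour

print(f"EC2: ~{ec2_total * 100:.1f} cents, DO: ~{do_total * 100:.1f} cents")
# → EC2: ~6.9 cents, DO: ~3.0 cents
```

Which is where the ~7 cents a weekend comes from – more than double DO’s 3 cents, even with the cheaper spot compute.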

But this is still only 7 cents a weekend.

The $10 of credit from Digital Ocean will last me approximately 6 years of weekend playing, assuming no price changes. I have ~$15 of Amazon credit, so it looks like I’ll get 10 years of TF2 playing in – and I’m pretty sure we’ll have moved onto a new game long before then.
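Sanity-checking those lifespans (assuming weekend-only play at the per-weekend costs above):

```python
# How long the free credit lasts at the per-weekend costs worked out above.
do_credit, amazon_credit = 10.00, 15.00   # USD of credit on each platform
do_per_weekend, ec2_per_weekend = 0.03, 0.07
weekends_per_year = 52

do_years = do_credit / do_per_weekend / weekends_per_year
ec2_years = amazon_credit / ec2_per_weekend / weekends_per_year

print(f"DO credit: ~{do_years:.1f} years, then EC2 credit: ~{ec2_years:.1f} more years")
# → DO credit: ~6.4 years, then EC2 credit: ~4.1 more years
```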

Correcting errors in the Master File Table (MFT) mirror.
Correcting errors in the master file table's (MFT) BITMAP attribute.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.

My drive is back! And seemingly OK! I’m celebrating by setting up a Python script to recursively run through my current photo backup and the drive and compare file checksums.
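The comparison script is straightforward – this is a minimal sketch of the idea (not the actual script; the two roots would be the backup directory and the recovered drive’s mount point):

```python
import hashlib
import os

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1MB chunks so large RAW files don't blow up memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def checksum_tree(root):
    """Map each file's path (relative to root) to its checksum."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = file_sha256(full)
    return sums

def compare(backup_root, drive_root):
    """Report files whose contents differ, and files present on only one side."""
    backup, drive = checksum_tree(backup_root), checksum_tree(drive_root)
    for rel in sorted(backup.keys() & drive.keys()):
        if backup[rel] != drive[rel]:
            print(f"MISMATCH: {rel}")
    for rel in sorted(backup.keys() ^ drive.keys()):
        print(f"ONLY ON ONE SIDE: {rel}")
```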

How I did it

The key thing was to isolate the drive and not use it. If I hadn’t done that, it would have been utterly unrecoverable.

I also remembered the layout of the drive: Exactly 500GB on the first partition, and the rest of the disk as an NTFS partition. The first partition was a member of a RAID5 set, which had LVM setup on top of it.

I had used GParted to extend the NTFS partition backwards, to get the extra 500GB. However, this failed for… some reason. I’m not too clear.

TestDisk wasn’t successful – It identified the LVM partition, then promptly skipped everything between the LVM header and where the LVM header said the partition ended. Which meant it skipped to the middle of the drive, since the RAID5 set was 1TB in size. And thus TestDisk refused to restore the partition, because it doesn’t make sense that a 2 TB drive has 2 partitions which take up more than that.

Having tried a bunch of methods over 1.5+ years and failed each time, I decided to finally go all the way: wipe out the (screwed up) partition table and recreate it. I didn’t know the original commands run to create the partition setup, so I ended up booting into a Fedora 13 Live image and doing the ‘Install to Hard Disk’ option, selecting the disk as the install target. I was worried because it wouldn’t allow me to create a partition without also formatting it (kind of makes sense…), so I terminated the install process after seeing “Formatting /dev/sdd1 as ext4” – in other words, once the first partition was being formatted.

I then turned to fdisk to create the partition, selecting the defaults, which should have extended the partition to the end of the disk. However, there was some disagreement on what constituted the end of the disk, leaving me with ~2MB of unallocated space. When I had created the partition in Windows, it went all the way to the end of the disk. What this meant is that I ended up with a sector count mismatch (along the lines of “Error: size boot_sector 2858446848 > partition 2858446017”).

So I had a semi-working drive, just with a number screwed up. And what edits numbers on a drive? A hex editor, that’s what. So it was off to edit the NTFS boot sector, and the MBR. I had correct looking numbers from TestDisk’s analysis, so I plugged those in, and since I had the hex editor opened, I wiped out the LVM header at the same time.
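The number in question is the 64-bit total-sectors count in the NTFS boot sector. A sketch of reading it programmatically instead of eyeballing hex – the device path and partition offset here are illustrative, and the field layout is from the NTFS on-disk format:

```python
import struct

def read_ntfs_total_sectors(device, partition_offset_bytes=0):
    """Read the 64-bit 'total sectors' field from an NTFS boot sector.

    The NTFS boot sector starts with a 3-byte jump instruction, then the
    'NTFS    ' OEM ID, then the BPB; the total-sectors count is a
    little-endian u64 at offset 0x28 from the start of the sector.
    """
    with open(device, "rb") as f:
        f.seek(partition_offset_bytes)
        boot = f.read(512)
    if boot[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    (total_sectors,) = struct.unpack_from("<Q", boot, 0x28)
    return total_sectors
```

If this value is larger than the sector count in the MBR partition entry, you get exactly the “size boot_sector > partition” complaint above – the fix is making the two agree, which is what the hex editing did.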

Turned out wiping the LVM header was an excellent idea, because TestDisk then found the NTFS boot sector and allowed me to restore it.

After that, the disk still wouldn’t mount in Windows, but chkdsk at least picked it up. After letting chkdsk run overnight, I got my drive from August 2012 back, with (as far as I can tell) no data loss whatsoever.

We assume everything will go well, but we’re actually crap at estimating

We confuse progress with effort

Because we’re crap at estimating, managers are also crap at estimating

Progress is poorly monitored

If we fall behind, natural reaction is to add more people

Overly optimistic:

Three stages of creation: Idea, implementation, interaction

Ideas are easy, implementation is harder (and interaction is from the end user). But our ideas are flawed.

We approach a task as a monolithic chunk, but in reality it’s many small pieces.

If we use probability and say that we have a 5% chance of issues, we would budget 5% because it’s one monolithic thing.

But the real situation is that each of the small tasks has a 5% probability of being delayed. So the chance of at least one delay is 1 − 0.95^n, not 5% – and it rapidly approaches certainty as n grows.

Oh, and our ideas being flawed? Yeah… it’s a virtual certainty that the 5% buffer will be used.
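Working the arithmetic out: if each of n independent tasks has a 5% chance of slipping, the chance that at least one slips is 1 − 0.95^n:

```python
# Probability that at least one of n independent tasks slips,
# if each has a 5% chance of being delayed.
def p_any_delay(n, p=0.05):
    return 1 - (1 - p) ** n

for n in (1, 10, 50):
    print(f"{n:3d} tasks: {p_any_delay(n):.0%} chance of some delay")
# →   1 tasks: 5% chance of some delay
#    10 tasks: 40% chance of some delay
#    50 tasks: 92% chance of some delay
```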

Progress vs effort:

Wherein the man-month fallacy is discussed. It comes down to:

Adding people works only if they’re completely independent and autonomous. No interaction means each person gets assigned a smaller portion, which equates to being done faster.

If you can’t split up tasks, it’s going to take a fixed amount of time. The example in this case is childbearing – you can’t shard the child across multiple mothers. (“Unpartitionable task”)

Partitionable with some communication overhead – pretty much the best you can do in software. You incur a penalty when adding new people (training time!) and a communication overhead (making sure everyone knows what is happening)

Estimation issues:

Testing is usually ignored/the common victim of schedule slippage. But it’s frequently the time most needed because you’re finding & fixing issues with your ideas.

Recommended time division is 1/3 planning, 1/6 coding, 1/4 unit tests, 1/4 integration tests & live tests (I modified the naming)

Without check-ins, if the schedule slips, people only know towards the end of the schedule, when the product is almost due. This is bad because a) people are preparing for the new thing, and b) the business has invested in getting code out that day (purchasing & spinning up new servers, etc.)

Better estimates:

When a schedule slips, we can either extend the schedule, or force stuff to be done in the original timeframe (crunch time!). Like turning up the heat on an omelette, devs could increase intensity, but that rarely works out well.

It’s common to schedule to an end-user’s desired date, rather than going on historical figures.

Managers need to push back against schedules done this way, going instead for at least somewhat data-based hunches rather than wishes

Fixing a schedule:

Going back to progress vs effort.

You have a delayed job. You could add more people, but that rarely turns out well. The overhead of training new people & communication takes its toll, and you end up taking more time than if you had just stuck with the original team.

The recommended thing is to reschedule, and add sufficient time to make sure that testing can be done.

It comes down to this: the maximum number of people depends on the number of independent subtasks. You will need time, but you can get away with fewer people.

The central idea – that one can’t substitute people for months – is pretty true. I’ve read things that say it takes around 3 months to get fully integrated into a job, and I’ve found that to be largely true for me. (It’s one of my goals for future co-op terms to try and get that lowered.)

The concept of partitioning tasks makes sense, and again it comes back to services. If services are small, 1- or 2-person teams could easily take care of things, and with minimal overhead. When teams start spiraling larger, you have to be better at breaking things down into small tasks, so you can assign them easily and integrate them (hopefully) easily. It seems a bit random, but source control helps a lot here.

Estimation is tricky for me, and will continue to be, since it only comes with experience – various rules of thumb I’ve heard include: take the time you think you’ll need, double it, then double it again.

But it’s a knock-on effect – I estimate badly, tell my manager, he has bad data, so he estimates wrongly as well… I’ve heard stories that managers pad estimates. That makes sense, especially for new people. I know estimates of my tasks have been wildly off. Things that I thought would take days end up taking a morning. Other things, like changing a URL, expose a recently broken dependency, and then you have to fix that entire thing… yeah. A 5-minute fix became an afternoon+. One thing I’ll try to start doing is noting down how long I expect things to take me, and then comparing at the end to see whether or not I was accurate. Right now it’s a very handwavey “Oh, I took longer/shorter than I wanted, hopefully I’ll remember that next time!”