I have been planning to write about this for quite some time now (the last time at the end of April), and now, thanks to Microsoft's acquisition of GitHub and all that #movingtogitlab traffic, I am finally sitting down and writing these lines.

This post is not about Microsoft, GitHub or GitLab, nor about any other SaaS solution out there; the named companies and products are just examples. It's more about "do you really want to self-host?"

Every time a big company acquires, shuts down or changes an online service (SaaS, Software as a Service), you hear people say "told you so, you should have self-hosted from the beginning". And while I do run quite a lot of my own infrastructure, I think this statement is too general and does not work well for many users out there.

Software as a Service

There are many code-hosting SaaS offerings: GitHub (proprietary), GitLab (open core), Pagure (FOSS) to name just a few.
And while their licenses, ToS, implementations and backgrounds differ, they have a few things in common.

Benefits:

(sort of) centralized service

free (as in beer) tier available

high number of users (and potential collaborators)

high number of hosted projects

good (fsvo "good") connection from around the globe

no maintenance required from the users

Limitations:

dependency on the interest/goodwill of the owner to continue the service

some features might require signing up for a paid tier

Overall, SaaS is handy if you're lazy, just want to get the job done and benefit from others being able to easily contribute to your code.

First, many contributions happen because someone sees something small and wants to improve it, be it a typo in the documentation, a formatting error in the manpage or a trivial improvement of the code. But these contributions only happen when the complexity of submitting them is low. Nobody not already involved in OpenStack would submit a typo fix to their Gerrit, which needs a Launchpad account… A small web edit on GitHub or GitLab, on the other hand, is quickly done, because "everybody" has an account anyways.

Second, while it is called "self-hosting", in most cases it's more of a "self-running" or "self-maintaining" as most people/companies don't own the whole infrastructure stack.

Let's take this website as an example (even though it does not host any Git repositories): the webserver runs in a container (LXC) on a VM I rent from netcup. In the past, netcup used to get their infrastructure from Hetzner - however I am not sure that this is still the case. So worst case, the hosting of this website depends on me maintaining the container and the container host, netcup maintaining the virtualization infrastructure and Hetzner maintaining the actual data center. This also implies that I have to trust those companies and their suppliers as I only "own" the VM upwards, not the underlying infrastructure and not the supporting infrastructure (network etc).

SaaS vs Hosted

There is no silver bullet here. One important question is "how much time/effort can you afford?" and another is "which security/usability constraints do you have?".

Hosted for a dedicated group

If you need a solution for a dedicated group (your work, a big FOSS project like Debian or a social group like riseup), a hosted solution seems like a good idea. Just ensure that you have enough infrastructure and people to maintain it as a 24x7 service, or at least close to that, for a long time, as people will depend on your service.

The same also applies if you need/want to host your code inside your network/security perimeter.

Hosted for an individual

In contrast to a group, I don't think a hosted solution makes sense for an individual most of the time. The burden of maintenance quite often outweighs the benefits, especially as you'll have to keep track of (security) updates for the software and the underlying OS, as otherwise the "I own my data" benefit becomes "everyone owns me" quite quickly. You also have to pay for the infrastructure, even if the OS and the software are FOSS.

You're also probably missing out on potential contributors who might have an account on the common SaaS platforms, but won't submit a pull request for a small change if they have to register on your individual instance.

SaaS for a dedicated group

If you don't want to maintain your own setup (resources/costs), you can also use a SaaS platform for a group. Some SaaS vendors will charge you for certain features (they have to pay their staff and bills too!), but it's probably still cheaper than having the right people in-house, unless you have them anyways.

You also benefit from a networking effect, as other users of the same SaaS platform can contribute to your projects "at no cost".

SaaS for an individual

For an individual, a SaaS solution is probably the best fit, as it's free (as in beer) in most cases and allows the user to do what they intend to do, instead of shaving yaks and stacking turtles (aka maintaining infrastructure instead of coding).

And you again get the networking effect of the drive-by contributors who would not sign up for a quick fix.

Selecting the right SaaS

When looking for a SaaS solution, try to answer the following questions:

Do you trust the service to be present next year? In ten years? Is there a sustainable business model?

Do you trust the service with your data?

Can you move between SaaS and hosted easily?

Can you move to a different SaaS (or hosted solution) easily?

Does it offer all the features and integrations you want/need?

Can you leverage the network effect of being on the same platform as others?

Selecting the right hosted solution

And answer these when looking for a hosted one:

Do you trust the vendor to ship updates next year? In ten years?

Do you understand the involved software stack, and are you willing to debug it when things go south?

Can you get additional support from the vendor (for money)?

Does it offer all the features and integrations you want/need?

So, do you really want to self-host?

I can't speak for you, but for my part, I don't want to run a full-blown Git hosting just for my projects, GitHub is just fine for that. And yes, GitLab would be equally good, but there is little reason to move at the moment.

And yes, I do run my own Nextcloud instance, mostly because I don't want to backup the pictures from my phone to "a cloud". YMMV.

Background

Ansible, while being agent-less, is not interpreter-less and requires a working Python installation on the target machine. Up until Ansible 2.3 the minimum Python version was 2.4, which is available in EL5. Starting with Ansible 2.4 this requirement has been bumped to Python 2.6 to accommodate future compatibility with Python 3. Sadly Python 2.6 is not easily available for EL5 and people who want/need to manage such old systems with Ansible have to find a new way to do so.

First, I think it's actually not possible to effectively manage a RHEL5 system (or any other legacy/EOL system). Running ad-hoc changes in a mostly controlled manner, yes; but not fully managing them. Just imagine how much cruft might have collected on a system that was first released in 2007 (that's as old as Debian 4.0 Etch). To properly manage a system you need to be aware of its whole lifecycle, and that's simply not the case here. But this is not the main reason I wanted to write this post.

Possible solutions

liquidat's article shows three ways to apply changes to an EL5 system, which I'd like to discuss.

Use the power of RAW

Ansible contains two modules (raw and script) that don't require Python at all and thus can be used on "any" target. While this is true, you're also losing just about every nice feature and safety net that Ansible provides you with via its Python-based modules. The raw and script modules are useful to bootstrap Python on a target system, but that's about it. When using these modules, Ansible becomes a glorified wrapper around scp and ssh. With almost the same benefits, you could use that for-loop that has been lingering in your shell history since 1998.

Using Ansible for the sake of being able to say "I used Ansible"? Nope, not gonna happen.

Also, this makes all the playbooks that were written for Ansible 2.3 unusable and widens the gap between the EL5 systems and properly managed ones :(

Upgrade to a newer Python version

You can't just upgrade the system Python to a newer version in EL5, too many tools expect it to be 2.4. But you can install a second version, parallel to the current one.

There are just a few gotchas with that:
1. The easiest way to get a newer Python for EL5 is to install python26 from EPEL. But EPEL for EL5 is EOL and does not get any updates anymore.
2. Python 2.6 is also EOL itself and I am not aware of any usable 2.7 packages for EL5.
3. While you might get Python 2.6 working, what about all the libs that you might need for the various Ansible modules? The system ones will pretty surely not work with 2.6.
4. (That's my favorite) Are you sure there are no (init) scripts that check for the existence of /usr/bin/python26 and execute the code with that instead of the system Python? Now see 3, 2 and 1 again. Initially you said "but it's only for Ansible", right?
5. Oh, and where do you get an approval for such a change of production systems anyways? ;)

And yet, I still think that's the sanest solution available. Just make sure you don't use any modules that communicate with the world (which includes the dig lookup!) and only use 2.3 on an as-needed basis for EL5 hosts.
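If you go down this road, you also have to tell Ansible to use the parallel-installed interpreter instead of /usr/bin/python. One way to do that is the ansible_python_interpreter variable in the inventory; a minimal sketch (the hostnames are placeholders):

```ini
; inventory sketch: EL5 hosts get the parallel-installed Python 2.6
; (hostnames are placeholders)
[el5]
legacy1.example.com
legacy2.example.com

[el5:vars]
ansible_python_interpreter=/usr/bin/python26
```

This keeps the rest of your inventory pointed at the default system Python and only overrides the interpreter for the legacy group.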

Conclusion

First of all, please get rid of those EL5 systems. The Extended Life-cycle Support for them ends in 2020 and nobody even talks about support for the hardware they are running on. Document the costs and risks those systems are bringing into the environment and get the workloads migrated, please. (I wrote "please" twice in one paragraph, it must be really important.)

I called this post "Building Legacy 2.0" because I fear that's a recurring pattern we'll be seeing. On the one hand, legacy systems that need to be kept alive. On the other, the wish (and also pressure) to introduce automation with tools that are either not compatible with those legacy systems today or won't be tomorrow, as the tools develop much faster than the systems you control with them.

And by still forcing those tools into our legacy environments, we just add more oil to the fire. Instead of maintaining that legacy system, we now also maintain a legacy automation stack to pseudo-manage that legacy system. More legacy, yay.

I am using Chromium/Chrome as my main browser and I also use its profile/people feature to separate my work profile (bookmarks, cookies, etc) from my private one.

However, Chromium always opens links in the window (and thus profile) that was in the foreground last. And that is pretty much not what I want, especially if I open a link from IRC and it might lead to some shady rick-roll page.

Thankfully, getting the list of available Chromium profiles is pretty easy and so is displaying a few buttons using Python.

To do so I wrote cadmium, which scans the available Chromium profiles and allows you to start any of them, or Chromium's Incognito Mode. On machines with SELinux it can even launch Chromium in the SELinux sandbox.

While visiting our Raleigh office, I managed to crack the glass on the screen of my OnePlus 3. Luckily it was a clean crack from the upper left corner to the lower right one. The crack was not really interfering with either touch or display, so there was not much pressure to fix it.

eBay lists new LCD sets for 110-130€, and those still require the manual work of getting the LCD assembly out of the case, replacing it, etc. There are also glass-only sets for ~20€, but these require the complete removal of the glued glass part from the screen and reattaching it, nothing you want to do at home. But there is also still the vendor, who can fix it, right? The Internet suggested they would do it for about 100€, which seemed fair.

As people have been asking about the support experience, here is a quick write up what happened:

Opened the RMA request online on Sunday, providing a brief description of the issue and some photos

Monday morning answer from the support team, confirming this is way out of warranty, but I can get the device fixed for about 93€

After confirming that the extra cost was expected, I had a UPS sticker to ship the device to CTDI in Poland

UPS even tried a pick-up on Tuesday, but I was not properly prepared, so I dropped the device later at a local UPS point

It arrived in Poland on Wednesday

On Thursday the device was inspected, pictures taken, etc.

Friday morning I had a quote in my inbox, asking me to pay 105€: the service partner had decided to replace the front camera too, which was not part of the original 93€ estimate.

Paid the money with my credit card and started waiting.

The actual repair happened on Monday.

Quality controlled on Tuesday.

Shipped to me on Wednesday.

Arrived at my door on Thursday.

All in all 9 working days, which is not great, but good enough IMHO. And the repair is good, and it was not (too) expensive. So I am a happy user of a OnePlus 3 again.

Well, almost. Before sending the device in for repairs, I had to take a backup and wipe it. I would not send it in with my, even encrypted, data on it. And backups and Android are something special.

Android will backup certain data to Google, if you allow it to. Apps can forbid that. Sadly this also blocks non-cloud backups with adb backup. So to properly backup your system, you either need root or you create a full backup of the system in the recovery and restore that.

I did the backup using TWRP, transferred it to my laptop, wiped the device, sent it in, got it back, copied the backup to the phone, restored it and… was locked out of the device, it would not take my password anymore. Well, it seems that happens; just delete some files and it will be fine.

It's 2017, are backups of mobile devices really supposed to be that hard?!

For quite some time I wanted to have tuned in Debian, but somehow never motivated myself to do the packaging.
Two weeks ago I then finally decided to pick it up (esp. as mika and a few others were asking about it).

There was an old RFP/ITP 789592 without much progress, so I did the packaging from scratch (heavily based on the Fedora package).
gustavo (the owner of the ITP) also joined the effort, and shortly after the upstream release of 2.8.0 we had tuned in Debian (with a very short time in NEW, thanks ftp-masters!).

I am quite sure that the package is far from perfect yet, especially as the software is primarily built for and tested on Fedora/CentOS/RHEL. So keep the bugs, suggestions and patches coming (thanks mika!).

TL;DR: DNS for golov.de and other (14) domains hosted on my infra was flaky from 15th to 17th of May, which may have resulted in undelivered mail.

Yeah, I know, I haven't blogged for quite some time. Not even after I switched the engine of my blog from WordPress to Nikola. Sorry!

But this post is not about apologizing or at least not for not blogging.

Last Tuesday, mika sent me a direct message on Twitter (around 13:00) that read „problem auf deiner Seite?“ or "problem on your side/page?". Given side and page are the same word in German, I thought he meant my (this) website, so I quickly fired up a browser, checked that the site loads (I even checked both HTTP and HTTPS! :-)) and as everything seemed to be fine and I was at a customer, I only briefly replied "?". A couple of messages later we found out that mika had tried to send a screenshot (from his phone), but that got lost somewhere. A quick protocol change later (yay, Signal!) I got the screenshot. It said "<evgeni+grml@golov.de>: Host or domain name not found. Name service error for name=golov.de type=AAAA: Host found, but no data record of requested type". Well, yeah, that looks like a useful error message. And here the journey begins.

For historical nonsense golov.de currently does not have any AAAA records, so it looked odd that Postfix tried that. Even odder was that dig MX golov.de and dig mail.golov.de worked just fine from my laptop.

Still, the message looked worrying and I decided to dig deeper. golov.de is served by three nameservers: ns.die-welt.net, ns2.die-welt.net and ns.inwx.de, and dig was showing proper replies from ns2.die-welt.net and ns.inwx.de, but not from ns.die-welt.net, which is the master. That was weird, but gave me a direction to look in, and explained why my initial tests were OK. Another interesting data point was that die-welt.net was served just fine from all three nameservers.

Let's quickly SSH into that machine and look what's happening… Yeah, but I only had my work laptop with me, which does not have my root key (and I still have not managed to set up a Yubikey/Nitrokey/whatever). Thankfully my key was allowed to access the hypervisor, yay console!

Now let's really look. golov.de is served from the bind backend of my PowerDNS, while die-welt.net is served from the MySQL backend. That explains why one domain didn't work while the other did. The relevant zone file looked fine, but the zones.conf was empty. WTF?! That zones.conf is autogenerated by Froxlor, and I had upgraded it during the weekend to get Let's Encrypt support. Oh well, seems I hit a bug, damn. A few PHP hacks later I got my zones.conf generated properly again and all was good.

This weekend, Bernd Zeimetz organized a BSP at the offices of conova in Salzburg, Austria.
Three days of discussions, bugfixes, sparc removals and a lot of fun and laughter.

We squashed a total of 87 bugs: 66 bugs affecting Jessie/Sid were closed, 9 downgraded and 8 closed via removals. As people tend to care about (old)stable, 3 bugs were fixed in Wheezy and one in Squeeze. These numbers might not be totally correct, as we were kinda creative at counting… Marga promised a talk about "an introduction to properly counting bugs using the 'Haus vom Nikolaus' algorithm to the base of 7".

Speaking of numbers, I touched the following bugs (not all RC):

#741806: pygresql: FTBFS: pgmodule.c:32:22: fatal error: postgres.h: No such file or directory
Uploaded an NMU with a patch. The bug was introduced by the recent PostgreSQL development package reorganisation.

#744229: qpdfview: FTBFS synctex/synctex_parser.c:275:20: fatal error: zlib.h: No such file or directory
Talked to the maintainer, explaining the importance of the upload and verifying his fix.

#744300: pexpect: missing dependency on dh-python
Downgraded to wishlist after verifying the build dependency is only needed when building for Wheezy backports.

#744917: luajit: FTBFS when /sbin is not in $PATH
Uploaded an NMU with a patch, which later was canceled due to a maintainer upload with a slightly different fix.

#742943: nagios-plugins-contrib: check_raid: wants mpt-statusd / mptctl
Analyzed the situation, verified the status with the latest upstream version of the check and commented on the bug.

#732110: nagios-plugins-contrib: check_rbl error when nameserver available only in IPv6
Verified that the bug is fixed in the latest release and marked it as done.

#728087: thinkfan: Document how to start thinkfan with systemd
Applied Michael's patch to the Debian packaging.

#742515: blktap-dkms: blktapblktap kernel module failed to build
Uploaded an NMU with a patch based on the upstream fix.

#745598: libkolab: FTBFS in dh_python2 (missing Build-Conflicts?)
Uploaded an NMU with a patch against libkolab's cmake rules, tightening the search for Python to 2.7.

#745599: libkolabxml: FTBFS with undefined reference to symbol '_ZTVN5boost6detail16thread_data_baseE'
Uploaded an NMU with a patch against libkolabxml's cmake rules, properly linking the tests to the Boost libraries.

#746160: libkolabxml: FTBFS when both python2 and python3 development headers are installed
Filed the bug while working on #745599, then uploaded an NMU with a patch against libkolabxml's cmake rules, tightening the search for Python to 2.7.

#714045: blcr-dkms: blcr module is not built on kernel 3.9.1
Checked the status of the bug upstream and marked it as forwarded.

Let's assume you are a sysadmin and have to debug a daemon giving bad performance on one machine, but not on the other. Of course, you did not setup either machine, have only basic knowledge of the said daemon and would really love to watch that awesome piece of cinematographic art with a bunch of friends and a couple of beers. So it's like every day, right?

The problem with understanding running setups is that you often have to read configuration files. And when reading one is not enough, you have to compare two or more of them. Suddenly, a wild problem occurs: order and indentation do not matter (unless they do), comments are often just beautiful noise and why the hell did "that guy" smoke/drink/eat while explicitly setting ALL THE OPTIONS to their defaults before actually setting them as he wanted.

If you are using diff(1), you probably get to read a lot of differences that are no real differences at all. Want an example?

Two configuration files that differ only in key order, indentation and comments are actually the same, at least for some parsers. XTaran suggested using something like wdiff or dwdiff, which often helps, but not in a case like that. Others suggested vimdiff, which is nice, but not really helpful here either.

As there is a problem, and I love to solve these, I started a small new project: cfgdiff. It tries to parse two given files and give a diff of the content after normalizing it (merging duplicate keys, sorting keys, ignoring comments and blank lines, you name it). Currently it can parse various INI files, JSON, YAML and XML. That's probably not enough to be the single diff tool for configuration files, but it is quite a nice start. And you can extend it, of course ;)
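The core idea can be sketched in a few lines of Python: parse both files with a format-aware parser, dump them back in a normalized form (sections and keys sorted, comments and blank lines dropped), and only then diff. This is a simplified sketch of the approach, not cfgdiff's actual code:

```python
import configparser
import difflib
import io

def normalized(ini_text):
    """Parse INI text and dump it back with sections and keys sorted;
    comments and blank lines are dropped by the parser."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    out = io.StringIO()
    for section in sorted(parser.sections()):
        out.write(f"[{section}]\n")
        for key, value in sorted(parser.items(section)):
            out.write(f"{key} = {value}\n")
    return out.getvalue()

def cfg_diff(a, b):
    """Unified diff of two INI documents after normalization."""
    return list(difflib.unified_diff(
        normalized(a).splitlines(),
        normalized(b).splitlines(),
        lineterm=""))

# Same settings, different order plus a comment -> no difference left
left = "[main]\n# the port\nport = 80\nhost = example.com\n"
right = "[main]\nhost = example.com\nport = 80\n"
```

With this normalization, cfg_diff(left, right) comes back empty, while plain diff(1) would have flagged every line.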

When you run Puppet, it is very important to monitor whether all nodes have an up-to-date catalog and did not miss the last year of changes because of a typo in a manifest or a broken cron script. The most common solution to this is a script that checks /var/lib/puppet/state/last_run_summary.yaml on each node. While this is nice and easy in a small setup, it can get a bit messy in a bigger environment, as you have to do an NRPE call for every node (or integrate the check as a local check into check_mk).

Given a slightly bigger Puppet environment, I guess you already have PuppetDB running. Bonus points if you already let it save the reports of the nodes via reports = store,puppetdb. Given a central knowledge base about your Puppet environment, one could ask PuppetDB about the last node runs, right? I did not find any such script on the web, so I wrote my own: check_puppetdb_nodes.

The script requires a "recent" (1.5) PuppetDB and a couple of Perl modules (JSON, LWP, Date::Parse, Nagios::Plugin) installed. When run, the script will contact the PuppetDB via HTTP on localhost:8080 (obviously configurable via -H and -p, HTTPS is available via -s) and ask for a list of nodes from the /nodes endpoint of the API. PuppetDB will answer with a list of all nodes, their catalog timestamps and whether the node is deactivated. Based on this result, check_puppetdb_nodes will check the last catalog run of all not deactivated nodes and issue a WARNING notification if there was none in the last 2 hours (-w) or a CRITICAL notification if there was none for 24 hours (-c).

As a fresh catalog does not mean that the node was able to apply it, check_puppetdb_nodes will also query the /event-counts endpoint for each node and verify that the node did not report any failures in the last run (for this feature to work, you need reports stored in PuppetDB). You can modify the thresholds for the number of failures that trigger a WARNING/CRITICAL with -W and -C, but I think 1 is quite a reasonable default for a CRITICAL in this case.

Using check_puppetdb_nodes you can monitor the health of ALL your Puppet nodes with a single NRPE call. Or even with zero, if your monitoring host can access PuppetDB directly.
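The per-node decision logic is simple enough to sketch. The real script is written in Perl; this hypothetical Python fragment only illustrates how the catalog timestamp and deactivation flag from the /nodes endpoint map to Nagios states:

```python
from datetime import datetime, timedelta, timezone

# Thresholds matching the script's defaults (-w: 2 hours, -c: 24 hours)
WARN_AFTER = timedelta(hours=2)
CRIT_AFTER = timedelta(hours=24)

def classify_node(catalog_timestamp, deactivated, now=None):
    """Return the Nagios state for one node, based on the data the
    /nodes endpoint returns: catalog timestamp and deactivation flag."""
    now = now or datetime.now(timezone.utc)
    if deactivated:
        return "OK"  # deactivated nodes are skipped by the check
    if catalog_timestamp is None:
        return "CRITICAL"  # never compiled a catalog at all
    age = now - catalog_timestamp
    if age > CRIT_AFTER:
        return "CRITICAL"
    if age > WARN_AFTER:
        return "WARNING"
    return "OK"

# Illustrative timestamps
now = datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(minutes=30)   # ran recently -> OK
stale = now - timedelta(hours=5)      # missed a few runs -> WARNING
```

The failure-count check against /event-counts works the same way: compare the number of failed events in the last report against the -W/-C thresholds.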

Some days ago I got myself a new shiny Samsung 840 Pro 256GB SSD for my laptop. The old 80GB Intel was just too damn small.

Instead of just doing a pvmove from the old disk to the new one, I decided to set up the system from scratch. That is an awesome way to get rid of old and unused stuff, or at least move it to some lower-class storage (read: backup). One of the things I did not bother to copy from the old disk were my ~/Debian, ~/Grml and ~/Devel folders. I mean, hey, it's all in some kind of VCS, right? I can just clone it anew if I really want to. Nor did I copy many of my dotfiles; those are neatly gitted with the help of RichiH's awesome vcsh and a bit of human brains (no private keys on GitHub, yada yada).

After cloning a couple of my personal repos from GitHub to ~/Devel, I realized I was doing a pretty dumb job that a machine could do for me. As I was already using Joey's mr for my vcsh repositories, generating an mr config and letting mr do the actual job was the most natural thing to do. So was using Python Requests and GitHub's JSON API.
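The idea can be sketched roughly like this (a simplified sketch, not the actual script, and using only the standard library instead of Requests; the network fetch is kept separate so the interesting part, turning the API's JSON into mr stanzas, stands on its own):

```python
import json
from urllib.request import urlopen

def fetch_repos(user):
    """Fetch a user's public repositories from GitHub's JSON API."""
    with urlopen(f"https://api.github.com/users/{user}/repos") as resp:
        return json.load(resp)

def mr_stanza(repo, basedir="Devel"):
    """Turn one repository entry (as returned by the API) into an
    mr(1) config stanza."""
    return (f"[{basedir}/{repo['name']}]\n"
            f"checkout = git clone '{repo['clone_url']}' '{repo['name']}'\n")

# Example entry in the shape the API returns (only the keys we use)
repo = {"name": "cfgdiff",
        "clone_url": "https://github.com/evgeni/cfgdiff.git"}
```

Concatenate the stanzas for all fetched repos into ~/.mrconfig (or a file included from it), and a single mr checkout clones everything.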