2016 journal

I’m experimenting with some very basic semantic markup. I noticed that some sites have names rather than URLs in their Google search results. It turns out that this is part of Google’s push for publishers to add structured data markup to their sites. One of the simplest of these is metadata about the site itself: the WebSite schema. This can include a friendly name and also a search engine, which Google will show as part of the search results.

My initial attempts to add the search engine failed. I dutifully created a custom search engine (as per Google’s instructions) but the metadata markup failed to validate because the domains didn’t match. So I did something crazy: I created a forwarding rule, forwarding search.nimblemachines.com to cse.google.com, substituted my domain in the CSE URL, and put this into the site’s metadata markup. Miraculously, it validated!
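For anyone wanting to try the same thing, here is a minimal sketch of what WebSite metadata with a search action can look like, generated from Python. The site name and the search URL template are placeholders invented for illustration, not the values actually used on this site; only the schema.org property names are standard.

```python
import json

def website_jsonld(name, site_url, search_url_template):
    """Build schema.org WebSite metadata as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "name": name,
        "url": site_url,
        "potentialAction": {
            "@type": "SearchAction",
            # The template must contain the placeholder named in query-input
            "target": search_url_template,
            "query-input": "required name=search_term_string",
        },
    }
    return json.dumps(data, indent=2)

# Hypothetical values: the name and the search path are assumptions
markup = website_jsonld(
    "Nimble Machines",
    "http://www.nimblemachines.com/",
    "http://search.nimblemachines.com/cse?q={search_term_string}",
)
print('<script type="application/ld+json">')
print(markup)
print("</script>")
```

The printed script element would go in the page's head. Note that the search URL shares the site's domain, which is the constraint that the forwarding rule works around.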

The redirection (of the top-level index page to a journal page) is now gone. It created more problems than it solved.

I initially implemented it because it seemed like people were bookmarking the site root hoping to bookmark the content that was there right then, but because the site root dynamically mirrors the current year’s journal page, a year later their bookmark would point to totally different content. To forestall this potential confusion, I added code forcing a redirect to the current journal page; the user then bookmarks that specific journal page, which will remain valid in coming years.

But what about someone who actually wants to bookmark the site itself, and wants to see the new, dynamic home page, not the 2016 version of it? That was tricky. Or maybe impossible.

The “canonical url” meta tag of the site root now points to itself, not to the journal page it mirrors, and I’ve added a “noindex” meta tag to (hopefully) prevent search engines from suggesting that people visit the site root directly. Casual visitors will end up on pages, not on the root, and so confusion should be minimal. But people who, for whatever reason, really want to link to the site root can do so again.
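As a small illustration, the two head elements described above could be produced like this. It is only a sketch: the function name is made up, and the real site's page generator is not shown here.

```python
from html import escape

def root_head_tags(root_url):
    """Return the canonical link and robots meta elements for the site root."""
    return [
        # The canonical link points at the root itself, not the journal page
        f'<link rel="canonical" href="{escape(root_url, quote=True)}">',
        # Ask crawlers not to list the root in search results
        '<meta name="robots" content="noindex">',
    ]

for tag in root_head_tags("http://www.nimblemachines.com/"):
    print(tag)
```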

I’m pretty happy. Yesterday I breathed new life into my Nexus 7 (2012). It’s a snappy device again – just like when it was new!

After suffering for three weeks or so with Lollipop (5.1.1) on the Nexus 7, I rolled back to Jelly Bean (4.3), and the difference was astonishing.

While running Lollipop it was not unusual to wait seconds – even tens of seconds – after touching the screen before something happened. It was truly like watching paint dry, and probably the slowest, and most infuriating, computing experience of my life! (In contrast, the best and slickest probably remains the brief period that I owned an HP Chromebook 14 – the original Haswell version.)

This is not the first time I’ve reflashed my Nexus 7 back to an earlier release. (Back-flashing? Flash-backing?) But it’s the first time I’ve gone all the way back to Jelly Bean, and this time I won’t succumb to the belief that Things Will Be Better if I update (instead, things always got much worse).

I dug this out of my “sticky notes” page, which I just deleted (part of an ongoing project to remove cruft from this site):

A nice essay by Kyle Gann about John Cage, Henry Cowell, Conlon Nancarrow, Harry Partch, Henry Brant (spatial music), and Trimpin. If you are interested in truly “out there” musical visionaries, this might be for you.

The few people who are reading this site doubtless arrive here via web search; I don’t think anyone scans the “recent changes” page for new stuff. (And anyway, most additions of interest will be announced here.)

If you were a dedicated “recent changes” junkie, write me via the comments link in the footer and I’ll consider reversing this change. But I’m not expecting a flurry of email. ;-)

I’ve been trying to make sure that this site gets crawled and indexed reasonably well. I’ve got my robots meta tags, I’ve submitted a sitemap, and bing.com still says the same thing: that there are 22 pages on my site (the number is closer to 150), and that http://www.nimblemachines.com/all-pages/ has some kind of redirect error. Of course, before I submitted a sitemap, that file was the de facto list of all the live pages, and a great resource for Google et al: by crawling that one page they could find everything interesting on the site.

When submitting a sitemap to Google Webmaster Tools, you get immediate feedback: after publishing the file and pointing Google to it, seconds later it says “successfully parsed 145 URLs” or some such.

On Bing Webmaster Tools, the same process is rather unrewarding. After submitting the link to the sitemap file, the interface just shows “pending”. Later, it says “0 URLs”. I just re-submitted my sitemap (for the third or fourth time) but I’m not expecting much.

Why is Bing so broken?

It also means that Yahoo (if anyone still uses it) is broken too – they share crawling metadata.

I’m close to no longer caring, but I’m surprised that Bing is so messed up.
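For reference, the sitemap file under discussion is just a short XML document. The sketch below builds one and counts its entries, which is roughly the number a "successfully parsed N URLs" message reports. The function names are made up for illustration, and the second URL in the example is a placeholder.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

def count_urls(sitemap_xml):
    """Count the <loc> entries in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return sum(1 for _ in root.iter(f"{{{SITEMAP_NS}}}loc"))

sitemap = build_sitemap([
    "http://www.nimblemachines.com/all-pages/",
    "http://www.nimblemachines.com/example-page/",  # placeholder URL
])
print(sitemap)
print(count_urls(sitemap), "URLs")
```

Running the count locally is a quick sanity check that the file itself is well formed before blaming the search engine.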

I just read two blog posts about Ansible vs Salt (by Jens Rantil and Ryan Lane). The authors came to the same conclusions: Salt’s community was much better about bug fixes and accepting contributions; the Salt docs, while disorganized, were very deep and thorough; Salt was generally much faster; and Salt’s built-ins made things easier. Also, while Salt was initially more difficult, it paid off to stick with it.

I’m not sure exactly why I was reading about these tools. I’m not in the biz of “standing up” cloud servers or services. But I do find the whole thing fascinating. Last year I dug into Docker a bit, to try to understand it, and ended up discovering CoreOS – which appealed to me in part because their update mechanism was borrowed from ChromeOS, where it works very well. While CoreOS was originally written to work with Docker, the missions of the teams diverged; the CoreOS folks created their own container runtime, rkt (aka “Rocket”). I never tried to actually do anything with CoreOS, rkt, or Docker, however.

Digging into the idea of containers led me to a blog post by Poul-Henning Kamp, a longtime FreeBSD committer, about inventing jails and how happy he was to see the idea taking off in the form of containers on Linux. Curiously, he committed the jails support into FreeBSD in April 1999. Linux containers only started to take off in 2013 or so.

I’ve been interested in the fundamental problem of software deployment for a long time; I started installing (and upgrading!) operating systems in 1996. I’ve installed Linux, Windows, and FreeBSD more times than I can count, and struggled with various package managers. The imperative approach of overwriting the OS and applications in-place – whether via a package manager or by doing BSD’s make installworld – always seemed like a mistake. It’s very error-prone. Even a “successful” upgrade of a package can, by changing an underlying library, render another package unusable. This is one reason that the CoreOS update feature appealed to me – it is atomic – but CoreOS depends on containers to solve the rest of the deployment problem, and it’s not completely clear (to anyone!) how to do it correctly.

Frustration with all of this “culminated” in my discovery of Eelco Dolstra’s PhD thesis – The purely functional software deployment model – and the Nix package manager project, which I now desultorily use. I have a few programs on my Mac managed via the Darwin port of Nix, and I run NixOS – a Linux distro based on Nix – on a very creaky old laptop. Nix seems like the most elegant solution to this problem – even though it often seems beyond my ken. The Nix community have their own cloud deployment tool, NixOps, for deploying sets of machines running NixOS.

Someday I’ll write more about Nix and NixOS – they are both deep subjects, and I have a love-hate relationship with both!

And perhaps someday I’ll do a real cloud deployment and can talk about it in more than theoretical terms. ;-)

Sometimes Google really pisses me off by doing something obviously stupid. The latest? They decided to remove the Archived History database in Chrome, which stored browsing history that is older than 90 days. The “logic” of this? That the user is better off without this history, since the UI in Chrome does nothing useful (and sometimes does something confusing) with it. But instead of leaving existing Archived History files alone, they went ahead and deleted them, without asking the user.

In addition to this, the online help is now really confusing. It suggests that history more than 3 months old is still available somewhere, but the “instructions” (I use the word advisedly) for accessing it tell you to basically open the history page using the “hamburger” menu. Which shows the first few entries of the last 90 days of history. If you were patient enough to click the “Older” link until you got RSI, you’d find that nothing older than 90 days had been saved.

I discovered this only after spending the time to understand how the history database works, how to export it from sqlite into a couple of text files (urls and visits), and how to read and make sense of those. I was hoping to recreate my browsing history from last year – in a more usable form than the browser presents (ahem) – so I could dig up the interesting things I was reading about, and follow the paths of some of the “rabbit holes” I had gone down. I had been careful (so I thought) not to delete my history, for exactly this reason.
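A rough sketch of that kind of export, in Python rather than the sqlite command line: it joins the visits table to the urls table and converts Chrome's timestamps, which count microseconds from 1601-01-01 UTC. The demo runs against an in-memory stand-in containing only the columns the query uses; a real History file has many more columns, and the sample row is invented.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def chrome_time(microseconds):
    """Convert a Chrome timestamp (microseconds since 1601-01-01 UTC)."""
    return CHROME_EPOCH + timedelta(microseconds=microseconds)

def list_visits(conn):
    """Yield (visit time, url, title) for every visit, oldest first."""
    rows = conn.execute(
        "SELECT visits.visit_time, urls.url, urls.title "
        "FROM visits JOIN urls ON visits.url = urls.id "
        "ORDER BY visits.visit_time"
    )
    for visit_time, url, title in rows:
        yield chrome_time(visit_time), url, title

# Stand-in database with an invented row; point sqlite3.connect() at a
# copy of the real History file to use this on actual data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, title TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, url INTEGER, visit_time INTEGER);
    INSERT INTO urls VALUES (1, 'http://example.com/', 'Example');
    INSERT INTO visits VALUES (1, 1, 13096080000000000);
""")
for when, url, title in list_visits(conn):
    print(when.isoformat(), url, title)
```

It is safest to work on a copy of the History file, since the browser may hold a lock on the original while it is running.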

But then Google, without asking me, quietly deleted it. Thanks a lot, assholes!

I’m misrepresenting the situation – slightly. The more astute (pedantic?) among you will notice that the discussion linked to above is dated April 2014, so my Archived History disappeared then. The 2015 history that I thought I was accumulating was a mirage. It never existed. I was too naive and trusting of Chrome’s “benevolence” to realize that, because Chrome’s UI for browsing history is desperately broken, it only stores 90 days of history, rather than keeping it until the user deletes it.

Shouldn’t this be in bold print somewhere? If you asked a hundred Chrome users, how many would know that Chrome quietly and continually loses their older browsing history? Ten? Five? One?

What if the engineers had instead decided to keep this (possibly valuable) data around, and simply built an API to access it? Just because they are too lame to figure out how to use and display it doesn’t mean that someone else won’t come along and write a brilliant extension that does.

I just found what I think is a bug in Github Pages. I’m not entirely sure, but only because the behaviour that I saw was rather confusing. I’ll explain.

This web site is hosted using Github Pages, which I set up in the normal way: I created a repository with a special name under my user account, put some special files there (CNAME, 404.html, etc), set up a few DNS entries for my domain, and by “magic” this website appeared. I also created a tiny test site for my muFORTH project, also in the normal way: I created a gh-pages branch in the project, and put a single file there: index.html.

Because of how Github Pages maps project names to URLs, the muFORTH project URL is the same as the muFORTH page URL of this site, and that collision caused some odd behaviour – or that, at least, is my conjecture.

I found that some random subdirectories of nimblemachines.com/muforth/ would return a generic Github Pages 404 error, while others returned my custom 404 page (“page not found”). I tried doing this on three machines: a Linux laptop (using lynx!!), my Mac (Chrome), and my Nexus 7 (Chrome for Android). Actually I tried a fourth way too: using curl. What’s crazy is that I saw different behaviour in each case!

I figured that the problem could only be the gh-pages branch of muforth, so I deleted it – but the problem remained. However, a few hours later, I only get my 404 page no matter what URLs I try. The immediate failure to fix the problem was probably due to caching by Github’s CDN.

So, problem solved, but some time needed to pass.

I wanted to quickly write this up in case anyone else is having bizarre issues with this kind of URL collision on Github Pages between a personal site and a project site. I hope this helps.

I realize now that my previous solution to the problem of redirecting visitors to the current “journal” page (in 2016, the page you are now reading) isn’t ideal.

For one thing, non-Javascript-enabled clients – and that might include search-engine crawlers! – visiting www.nimblemachines.com will see nothing. Also, I am using Google Webmaster Tools and they need to see a “verify” meta tag, so they know that I own this domain. That tag was missing from the previous, simple index.html.

My first solution to fix this was to create a pseudo-page, which I called The tip of the iceberg, and which basically just contained a link to the current journal, and the normal footer links (all pages, recent changes, home). This was better – it had the right <meta> tags – but for clients that were Javascript-enabled, things were ugly. A page would flash by – you would briefly see the title “The tip of the iceberg” – before redirecting.

My second try – the current implementation – was to create a top-level index.html that is essentially a copy of the current journal page, but with a few changes:

the canonical link element points to the top-level, not the journal page; this way crawlers know that these are two different pages;

I want to “announce” two neat new features of this site: self-linking headings, and automatic current-year redirection. What do these mean in English?

Any heading on any page is a “self-link”: if clicked on, the heading moves to the top of the browser page, and the heading’s “fragment id” becomes the new URL in the address bar, which can then be easily shared or bookmarked. The way the fragment id’s are constructed – leaving out the initial year if found on a journal page – makes them look like blog URLs. (On non-journal pages the complete heading text is used as the fragment id.)
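To make the fragment id idea concrete, here is a minimal sketch of one way to derive an id from heading text. The exact rules used on this site are not spelled out above, so the lowercasing, the hyphenation, and the sample heading format are all assumptions; only the "drop the leading year on journal pages" behaviour comes from the description.

```python
import re

def fragment_id(heading, journal_page=False):
    """Turn heading text into a URL fragment id (assumed rules)."""
    text = heading.strip().lower()
    if journal_page:
        # Drop a leading four-digit year such as "2016 " if present
        text = re.sub(r"^\d{4}\s+", "", text)
    # Collapse every run of non-alphanumeric characters into one hyphen
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

# Hypothetical heading text, shown for both kinds of page
print(fragment_id("2016 January 05: Unicode", journal_page=True))
print(fragment_id("2016 January 05: Unicode"))
```

On a journal page the first call gives a short, blog-like id, while the second keeps the full heading text, year included.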

In addition to making linking and sharing easy, I want the “front page” of the site to be the journal for the current year. My previous solution was to symlink (on the server side) the toplevel index.html to the current journal page, say 2016. The problem with this is that if bookmarked or shared, in a year the URL will be wrong. Instead of pointing to 2016’s journal page it will point to 2017’s. And any heading self-links (as described above) will also fail.

The solution is for visitors to get redirected. This already happens to the “apex” domain. Github Pages redirects nimblemachines.com to www.nimblemachines.com. But it isn’t possible (as far as I can tell) to have Github Pages redirect to a page on the site. I ended up writing a totally simple new index.html that basically consists of one line of Javascript. The current index.html looks like this:
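As a rough sketch of the idea (not a verbatim copy of the file), a page of this kind can be generated with a few lines of Python. The journal path `/2016/` and the function name are assumptions made for illustration.

```python
def redirect_page(target):
    """Return a tiny HTML page whose one line of Javascript redirects."""
    return (
        "<!DOCTYPE html>\n"
        "<html><head><script>\n"
        # replace() avoids leaving the redirecting page in browser history
        f'window.location.replace("{target}");\n'
        "</script></head></html>\n"
    )

# Hypothetical path for the current year's journal page
print(redirect_page("/2016/"))
```

A page like this shows nothing at all to clients that do not run Javascript.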

Here is something for everyone who is annoyed by telemarketers. At least two people have created “bots” that can answer a call, and using simple logic and some prerecorded audio, can “pretend” to listen and respond to the caller. Both projects have recorded some of these calls, and they are priceless.

Ok, one more quick link before the day expires... I found an interesting talk by Bryan Cantrill, an engineer from Sun who invented DTrace, worked on Solaris (and OpenSolaris), briefly worked at Oracle (after the Sun acquisition), and now works for Joyent. His talk – about illumos, and why it was forked from OpenSolaris – is funny and cutting. If you’re interested in operating systems and the engineers who love them, you should enjoy this.

In my ongoing quest to make the web a better place, I exported my bookmarks from Delicious (which has since become disgusting), deleted my account, and ran the weird HTML that Delicious exported through a custom Lua filter, to create a simple page, bookmarks from delicious.

It’s unedited, at the moment. Some of the notes have been truncated (esp the early ones – I think the site had some database issues or something). There are going to be broken links – and links I might be embarrassed about today – but there are a lot of gems among the junk. It also stands as a history of what I was researching (and thinking about) over a four-year period.

Just like last year, I’ve totally, painfully, and egregiously neglected this site. Not only have I failed to write about the random things I’ve been thinking about and exploring, but the site itself is still broken and ugly – the lovely huerta tipográfica font I carefully chose is unreadable, the site is useless on mobile devices, etc.

I’d love to see 2016 be different. I’m thinking I’d like to have the journal pages be kind of a “microblogging” thing: random sharing of links, books, ideas, and other small things I happen across that I find interesting. I think what stops me from writing more than I do is the sense of commitment: a longer piece takes time and focus. So what happens is that a bunch of things that I’ve run across that are cool, interesting – even important! – get lost in the shuffle, and I never write about them. I have lists of things from 2014 that I still need to get to. Ugh.

So, I thought I’d start out with something that found me rather randomly: Unicode. I mean, I’ve known about Unicode for ever, but I’d never delved into it, and all that “wide character” support in Linux, BSD, and OSX just seemed like a lot of hooey. I’m not interested in writing in Japanese on this site, so why should I care? I had a hunch that all those sites where apostrophes show up as â€™ had something to do with Unicode, but I never bothered to understand why.
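The garbled apostrophe is easy to reproduce. One common cause, assumed here since it depends on how a particular site is misconfigured, is that text stored as UTF-8 gets decoded as Windows-1252. A short Python sketch:

```python
apostrophe = "\u2019"             # right single quotation mark
raw = apostrophe.encode("utf-8")  # three bytes in UTF-8
garbled = raw.decode("cp1252")    # each byte misread as its own character

print(raw.hex(" "))  # e2 80 99
print(garbled)       # â€™

# Reversing the mistake recovers the original character
print(garbled.encode("cp1252").decode("utf-8") == apostrophe)  # True
```

One character becomes three because UTF-8 needs three bytes for this code point, and Windows-1252 treats every byte as a separate character.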