Unsubstantiated Opinions and Meaningless Blather

The organization formerly known as “autarch-code” is now called “houseabsolute”. I think some folks may not have wanted to transfer a repo to an organization named “autarch-code”. The new name is hopefully a little less “all about Dave”. I also changed the picture, though I really miss the old one, because I thought it was hilarious. I’ve saved it here on this blog for posterity.

Am I insane? No, I’m not. Clearly. This is the product of a perfectly sane mind. Trust me.

If you have a lot of distributions, you may also have a lot of .travis.yml files. When I want to update one file, I often want to update all of them. For example, I recently wanted to add Perl 5.22 to the list of Perls I test with. Doing this by hand is incredibly tedious, so I wrote a somewhat grungy script to do this for me instead. It attempts to preserve customizations present in a given Travis file while also imposing some uniformity. Here’s what it does:

Finds all the .travis.yml files under a given directory. I exclude anything where the remote repo doesn’t include my username, since I don’t want to do this sort of blind rewriting with shared projects or repos where I’m not the lead maintainer.

Ensures I’m using the right repo for Graham Knop’s fantastic travis-perl helper scripts. These scripts let you test with Perls not supported by Travis directly, including Perl 5.8, dev releases, and even blead, the latest commit in the Perl repo. These helpers used to be under a different repo, and some of my files referred to the old location.

Uses --auto mode with these helpers if possible, which is the case when I don’t need to customize the Travis install or script steps.

Makes sure I’m testing with the latest minor version of every Perl from 5.8.8 (special-cased because it’s more common than 5.8.9) to 5.22.0, plus “dev” (the latest dev release) and “blead” (repo HEAD). If the distro has XS, I test with both threaded and unthreaded Perls; otherwise I can just use the default (unthreaded) build. If the distro is not already testing against 5.8.8, it won’t be added, since some of my distros are 5.10+ only.

Adds coverage testing with Perl 5.22 and allows blead tests to fail. There are all sorts of reasons blead might fail that have nothing to do with my code.

If possible, sets sudo: false in the Travis config to use Travis’s container-based infrastructure. This is generally faster to run and way faster to start builds. If I’m using containers, I take advantage of the apt addon to install aspell so Test::Spelling can do its thing.

Cleans up the generated YAML so the blocks are ordered in the way I like.

Feel free to take this code and customize it for your needs. At some point I may turn this into a real tool, but making it much more generic seems like more work than it’s worth at the moment.
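
To make the version-selection rule from the list above concrete, here is a minimal sketch in Go. It is not the script itself, just an illustration of the rule. The function name perlMatrix, the intermediate version numbers between 5.8.8 and 5.22.0, and the "-thr" suffix for threaded builds are all assumptions made for the example.

```go
package main

import "fmt"

// stablePerls lists one release per stable series after 5.8. Only 5.22.0 is
// named in the post; the other version numbers are assumptions.
var stablePerls = []string{
	"5.10.1", "5.12.5", "5.14.4", "5.16.3", "5.18.4", "5.20.2", "5.22.0",
}

// perlMatrix (a hypothetical helper) returns the Perls to test against.
// 5.8.8 is only kept if the distro already tests with it, and XS distros get
// a threaded variant of every entry. The "-thr" suffix is an assumed naming
// convention for threaded builds.
func perlMatrix(hasXS, alreadyTests588 bool) []string {
	versions := []string{}
	if alreadyTests588 {
		versions = append(versions, "5.8.8")
	}
	versions = append(versions, stablePerls...)
	versions = append(versions, "dev", "blead")

	if !hasXS {
		return versions
	}

	matrix := make([]string, 0, 2*len(versions))
	for _, v := range versions {
		matrix = append(matrix, v, v+"-thr")
	}
	return matrix
}

func main() {
	fmt.Println(perlMatrix(false, false)) // pure-Perl, 5.10+ only distro
	fmt.Println(perlMatrix(true, true))   // XS distro that already tests 5.8.8
}
```

Keeping the rule in one small function means the “never add 5.8.8 on my own” and “XS doubles the matrix” decisions each live in exactly one place.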

In a discussion on #moose-dev today, ether made the following distinction:

author tests are expected to pass on every commit; release tests only need to pass just before release

I think this is a good distinction. It also means that almost every single “xt” type test you might think of should probably be an author test. The only one we could come up with in #moose-dev that was obviously a release test was a test to check that Changes has content.

I’m sending PRs to various dzil plugins to move them to author tests, with the goal of being able to safely not run release tests under Travis.

During my Introduction to Go class last Thursday at YAPC::NA::2015, one of the class attendees, David Adler, asked a question along the lines of “why use Go?” That’s a good question, so here is my answer.

Let’s start by first talking about why we use Perl (or Ruby, Python, PHP, JS, etc.). Why use a dynamic language? There are a lot of reasons, but the basic answer is that these languages make it easy to get a system up and running quickly. They are easy to write, include a lot of useful features (regexes, IO, core libraries, etc.), they eliminate large classes of bugs, and generally get out of your way when coding. These languages perform well enough for many tasks, and so the fact that they are not as fast or memory efficient as they could be is not a concern.

But of course, sometimes speed and memory usage are a concern. I suspect that many dynamic language users reach for C or C++ when they need to optimize something. Here’s why …

In Perl, a basic scalar value is represented by a C struct called an SV (see perlguts for gory details). A quick check with Devel::Size tells me that a scalar containing the number 1 uses 24 bytes of memory on my system. A 3 byte string uses 42 bytes of memory. In a language like C, those values can use as little as 1 and 3 bytes respectively.

This isn’t an issue when dealing with hundreds or thousands of such values. The Perl program uses 24 times as many bytes for each integer, but when you’re just dealing with 5,000 integers, this only adds up to about 120 kB vs 5 kB. However, once you start dealing with millions of values (or more), this can become a problem. The program has to allocate memory, usually doing many small allocations. What’s worse is that operations on these values are slower. Integer math in Perl goes through many more steps than in C. Again, for a small number of operations this isn’t a problem, but for millions or billions of operations, the cost becomes significant.
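
The arithmetic is easy to check. Here is a small sketch in Go that uses the 24 byte figure quoted above for a Perl integer scalar and Go’s one byte int8 as the compact alternative. The names perlIntBytes and totalBytes are made up for the example, and an int8 only holds values from -128 to 127, so this is the best case rather than a fair like-for-like comparison.

```go
package main

import (
	"fmt"
	"unsafe"
)

// perlIntBytes is the Devel::Size figure quoted in the post for a scalar
// holding the number 1. It will differ between systems and Perl builds.
const perlIntBytes = 24

// totalBytes returns the memory needed to store count values of the given size.
func totalBytes(count, bytesPerValue int) int {
	return count * bytesPerValue
}

func main() {
	var small int8
	compactBytes := int(unsafe.Sizeof(small)) // 1 byte

	for _, n := range []int{5000, 5000000} {
		fmt.Printf("%d values: %d bytes as Perl scalars, %d bytes as int8\n",
			n, totalBytes(n, perlIntBytes), totalBytes(n, compactBytes))
	}
	// prints:
	// 5000 values: 120000 bytes as Perl scalars, 5000 bytes as int8
	// 5000000 values: 120000000 bytes as Perl scalars, 5000000 bytes as int8
}
```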

Of course, C and C++ have their own issues, including the difficulty of managing memory, the potential security holes, the segfaults, the double frees, and lots of other fun.

Enter Go. Go gives you a statically compiled language with the potential for carefully managing memory usage while also protecting you from the memory management bugs (and security holes) that C and C++ allow for.
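
As one small illustration of that protection, here is a sketch (safeGet is a made-up helper, not anything from a library) that reads past the end of a compact slice. In C this would be undefined behavior. In Go the runtime checks the index and panics, and the panic can be turned into an ordinary error.

```go
package main

import "fmt"

// safeGet returns values[i], converting the runtime panic from an
// out-of-range index into an error instead of crashing the program.
func safeGet(values []int8, i int) (v int8, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return values[i], nil
}

func main() {
	// One byte per element, allocated in a single block.
	values := []int8{1, 2, 3}

	v, err := safeGet(values, 1)
	fmt.Println(v, err) // prints: 2 <nil>

	_, err = safeGet(values, 10)
	fmt.Println(err != nil) // prints: true
}
```

In real code you would normally check the index yourself rather than lean on recover. The point is only that the bad read is caught instead of silently returning whatever happens to be in memory.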

So why use Go? I think that Go is a compelling option for any task that you’d do in C or C++ instead of a dynamic language. Go is fast to run, relatively easy to write, and comes with a pretty good set of core libraries. It gives you many of the niceties of a dynamic language while still offering memory efficiency and high speed.

As a huge plus, Go compiles down to static binaries that are incredibly easy to deploy. This will make your sysadmins or devops folks quite happy.

Of course, Go doesn’t replace C or C++ for all tasks. It’s a garbage collected language, which means that if you need complete control over memory allocation and freeing, it won’t cut it. I don’t expect to see an OS in Go any time soon.

Also, the language itself is missing out on some features that might be appealing for some systems. The example I often use is a database server. I would much rather try to write such a thing in a language like Rust than Go. Rust seems to combine low level optimizability with some nice high level features like generics and traits. If I were writing something complex like a database server (or a browser) I think I’d want those features. But Go is great for things like web application servers, command line tools, and anything else that isn’t a huge complicated system.

(And yes, I know there are people writing database servers in Go. I’m just saying that Go probably wouldn’t be my first choice for such a tool.)

I’m offering this class at such a low cost because I want to get some feedback on it before I give it at YAPC::NA::2015. If this goes well, I plan to give this class in Minneapolis again, but I’ll be charging more, so now’s your chance to take the class for as cheaply as it’ll ever be offered!

This blog post is a public announcement to say that my tuits for CPAN-related work will be in very short supply until after YAPC. I’m basically devoting all of my FOSS programming time to creating the slides and exercises for my Introduction to Go YAPC master class. As you might imagine, creating a one day class is a lot of work. My goal is to finish a teachable draft by May 29 so I can give the class here in Minneapolis on May 30 as a test run. If you’re interested in taking the class then, stay tuned to this blog for details.

This year at YAPC I’ll be giving two master classes. Why am I doing this? I don’t know, I think I may be insane. But that aside, here’s some info about said classes.

My first class is Introduction to Moose. I’ve been giving this class for a number of years, and it’s always been well-received. The class will take place on Sunday, June 7, the day before the conference proper begins. The cost of the class is a mere $175 for a full day! The format of the class consists of alternating lecture and exercise blocks, so you’ll be writing a lot of code over the course of the day. The class is aimed at intermediate Perl programmers with a basic understanding of OO who want to learn more about Moose.

Here’s what one past student said about the class:

Great class. I especially liked your problem sets. You gave out problems you expected your class to actually solve, and you allowed class time for solving them. This should be a basic expectation for any class, but it’s amazing how often teachers don’t do this.

The second class is Introduction to Go. This is a new class for me, and I’m excited to offer it. This class will take place on Thursday, June 11, the day after the conference proper ends. This class is also $175. Like the Moose class, the format is alternating lecture and exercise blocks, so you’ll get hands-on experience writing Go code. This class is aimed at people who already know one programming language and want to learn Go.

Somehow people seem to keep breaking into my Netflix account. Calling Netflix achieves little. Their go-to answer is to have me change my password and sign out all devices. In theory, this should keep hackers out. I’ve done this a number of times to no avail. Last night I changed the email associated with the account, as well as the password, and they’re back in tonight.

Edit: Someone on Hacker News asked how I know that the account was hacked. We only have two people in my household, my wife and me, and we each have our own profile on our Netflix account. I have never shared the password with anyone. I see activity on my profile of things that neither my wife nor I watched. Netflix also now shows you the devices that have been used with your account. I see devices from unknown IPs around the world.

Let me first dismiss some other possibilities before settling on Netflix itself having a problem.

Was my email account hacked? If the account (or the server hosting it) was hacked, the attacker would still need to change the password, which they haven’t done. So that’s ruled out.

Was my desktop computer from which I changed the password hacked? Possibly, but if so, these are the world’s most unambitious hackers. They haven’t bothered stealing any other account login info, including things like my Amazon info or credit cards stored in Chrome. If someone had hacked my desktop I’d have much bigger problems than someone using my Netflix account!

Edit: How do I know for sure my desktop wasn’t hacked? I haven’t done a forensic investigation, but it seems unlikely. I’m running an up to date Ubuntu machine and I use Chrome as my browser. I also have a reasonably sane firewall in place, fail2ban, and other security thingamabobs. It’s not impossible to break into (nothing is) but it’s not a particularly soft target.

How about the Xbox 360 we mostly use for watching Netflix? I don’t see how that’s possible without physical access to the machine. I doubt someone broke in just to hack our Xbox 360 and didn’t steal anything.

Did someone guess my Netflix password? Possible, but I use rather long passwords that would be pretty hard to brute force. If Netflix doesn’t have rate limiting in place, that’s a huge problem. That said, I don’t know how someone would know what email address is associated with my account. It’s not an address I’ve used for anything else, ever, and I changed it last night to a new, never-before-used address!

Did someone exploit a flaw in WPA2 to intercept wireless traffic from the Xbox 360, or otherwise intercept traffic between me and Netflix? If Netflix’s authentication system is entirely on SSL, I don’t see how this could possibly work.

So what possibilities does that leave? My guess is that there’s some fundamental brokenness in the authentication system that Netflix uses. Either that or put your conspiracy theory hat on and we can talk about inside men and women at Amazon and/or Netflix. Either way, I’m blaming this on Netflix, and I’m tempted to just cancel the account. Netflix could probably help improve security quite a bit by supporting 2-factor auth in order to authenticate a new device.

That all said, I’d love to hear a better theory, especially if it came with a solution.

This will run the relevant test in a loop over and over, stopping at the first failure. The reset in between each run makes it easy to hit Ctrl-Up in the terminal and go to the beginning of the test run that failed, rather than having a monster scrollback buffer.

About a million years ago (ok, more like 6 months) a kind soul by the name of Polina Shubina reported a small bug in my Markdent module. She was even kind enough to submit a PR that fixed the issue, which was that the HTML generated for Markdown tables (via a Markdown extension) always used </th> to close table cells.

However, there was one problem: there was no test for the bug. I really hate merging a bug fix without a regression test. I know myself well enough to know that without a test the chances of me reintroducing the bug later are pretty good.

Even more oddly, I thought for sure that this was already tested. Markdent is a tool for parsing Markdown, and includes some libraries for turning that Markdown into HTML. I knew that I tested the table parsing, and I didn’t think I was quite dumb enough to hand-write some HTML where I used </th> to close all the table cells.

I was correct. This was tested, and the expected HTML in the test was correct too. So what was going on?

It turned out that this problem went way back to when I first wrote the module. Comparing two chunks of HTML and determining if they’re the same isn’t a trivial task. HTML is notoriously flexible, and a simple string comparison just won’t cut it. Minor differences in whitespace between two pieces of HTML are (mostly) ignorable, tag attribute order is irrelevant, and so on.

I looked on CPAN for a good HTML diffing module and found squat. Then I remembered the HTML Tidy tool. I could run the two pieces of HTML I wanted to compare through Tidy and then compare the result. Tidy does a good job of forcing the HTML into a repeatable format.

Unfortunately, Tidy is a little too good. It turns out that Tidy did a really good job of fixing up broken tags! It turned my </th> into </td>, so my tests passed even when they shouldn’t. Using Tidy to test my HTML output turned out to be a really bad idea, since I wasn’t really testing the HTML my code generated.

This left me looking for an HTML diff tool again. I really couldn’t find much in the way of CLI tools on the Interwebs. CPAN has two modules which sort of work. There’s HTML::Diff, which uses regexes to parse the HTML. I didn’t even bother trying it, to be honest. (BTW, don’t blame Neil Bowers for this code. He’s just doing some light maintenance on it; he didn’t create it.)

Then there’s Test::HTML::Differences. This uses HTML::Parser, at least. Unfortunately, it tries a little too hard to normalize HTML, and it got seriously confused by much of the HTML in the mdtest Markdown test suite.

I also tried using the W3C validator to somehow compare errors between two docs. I ended up adding some validation tests to the Markdent test suite, which is useful, but it still didn’t help me come up with a useful diff between two chunks of HTML.

I finally gave up and wrote my own tool, HTML::Differences. It turned out to be remarkably simple to get something that worked well enough to test Markdent, at least. I used HTML::TokeParser to turn the HTML into a list of events, and then normalized whitespace in text events (except when inside a <pre> tag).
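
The normalization step is simple enough to sketch. The following is a rough Go illustration of the idea, not a port of HTML::Differences. It assumes the HTML has already been tokenized into a list of events, and the Event type, normalize, and sameHTML are made up for the example. It ignores attributes entirely and drops whitespace-only text outside of <pre>, which is an assumption on my part.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical token type: Kind is "start", "end", or "text",
// and Value is the tag name or the text content.
type Event struct {
	Kind  string
	Value string
}

// normalize collapses runs of whitespace in text events, except inside <pre>,
// and drops text events that end up empty. Tags are passed through untouched,
// so a wrong closing tag is never "fixed up".
func normalize(events []Event) []Event {
	out := make([]Event, 0, len(events))
	preDepth := 0
	for _, e := range events {
		switch e.Kind {
		case "start":
			if e.Value == "pre" {
				preDepth++
			}
		case "end":
			if e.Value == "pre" && preDepth > 0 {
				preDepth--
			}
		case "text":
			if preDepth == 0 {
				e.Value = strings.Join(strings.Fields(e.Value), " ")
				if e.Value == "" {
					continue
				}
			}
		}
		out = append(out, e)
	}
	return out
}

// sameHTML reports whether two event lists are identical after normalization.
func sameHTML(a, b []Event) bool {
	na, nb := normalize(a), normalize(b)
	if len(na) != len(nb) {
		return false
	}
	for i := range na {
		if na[i] != nb[i] {
			return false
		}
	}
	return true
}

func main() {
	expected := []Event{{"start", "td"}, {"text", "a b"}, {"end", "td"}}
	reformatted := []Event{{"start", "td"}, {"text", "  a \n b "}, {"end", "td"}}
	broken := []Event{{"start", "td"}, {"text", "a b"}, {"end", "th"}}

	fmt.Println(sameHTML(expected, reformatted)) // prints: true
	fmt.Println(sameHTML(expected, broken))      // prints: false
}
```

The second comparison is the one that matters: because nothing repairs the tags, a stray </th> shows up as a difference instead of being quietly corrected.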

Getting to this point took a while, especially since I was doing all of this in my free time. And that’s the story of why it took me six months to fix an incredibly trivial bug, and how testing HTML is trickier than I understood when I first started testing it with Markdent.

About

This is Dave Rolsky's blog. It contains blog posts. These posts contain ideas, mostly in the form of words. The words are made of letters, and each letter is made of pixels. The pixels are made of turtles.