Cookie Notice

2016/07/28

Our VMs were having problems last fall. Their connections to the file system would falter, causing a large number of processes sitting around waiting to write. The symptom we found was that the load averages would then rise incredibly high.

Like four-digits high.

So, I wrote something that would log load average once an hour. It was a convergence of lab need and an excuse to learn Log::Log4Perl. I also used Pushover to tell me when load average was greater than 20, as if I could do anything about it.

Eventually, those issues worked out. The evidence of file system hinkiness is that, on occasion, we try to save or open a file, it takes a few minutes — I have learned from experience that mkdir does not display atomicity — but we never see the high load averages and catastrophically long file access times of a few months ago.

But the logging never left my crontab.

I started looking at and playing with new things, and I wrote an API that allowed me to curl from several machines once an hour, and I would get Pushover notifications when machines were down.

(You can really thank Phil Sturgeon and his excellent book, Build APIs You Won't Hate, for that. I'm not quite there with my API, though. It'd probably make an interesting blog post, but it's built on pre-MVC technology.)

(And yes, I really like Pushover. In general, I turn off notifications for most apps and only pay attention to what I have Pushover tell me.)

Anyway, I'd get notifications telling me my web server is down, then pull out my phone and find the web server up and responsive. I'm putting that into MySQL, so a query told me that, on some hours, I'd get five heartbeats, some four, and some 3, so I was sure that the issue wasn't with the API.

I log and get Pushover notifications set at the @reboot section of my crontab, and that hadn't warned me recently, so I knew the machines were up, but not responding.

Then I remembered that I never stopped monitoring load averages, and started looking at those logs.

We see above that of the four VMs I monitor, all four fail to log multiple times, and many times, three of four VMs fail to run their crontabs. Since I had something more solid than "Hey, that's funny", I went to my admins about this. Looks like VMs are failing to authenticate with the LDAP server. My admins are taking it up the chain.

So, beyond how I make and parse logs, which might not be the best examples you can find, the message here is that it's hard to identify a problem unless you're tracking it, and even tracking something else might help you identify a problem.

2016/07/27

The first line between Twitter Use and Twitter Obsession is TweetDeck. That's when the update-on-demand single-thread of the web page gives way to multiple constantly-updated streams of the stream-of-consciousness ramblings of the Internet.

That's the first line.

The second line between Twitter use and Twitter obsession is when you want to automate the work. If you're an R person, that's twitteR. If you work in Python, that's tweepy.

And, if you're like me, and you normally use Perl, we're talking Net::Twitter.

Again, this is as simple as we can reasonably do, without pulling the keys into a separate file, which I, again, strongly recommend you do. (I personally use YAML as the way I store and restore data such as access tokens and consumer keys. I will demonstrate that in a later post.)