So, over on my del.icio.us
account, you'll find a link to download the Roku's firmware image. I
thought I should mention that in a venue more people will see. I'm
keen to share notes with anyone who's also poking at this. To prove
you're serious, please include the significance of 192.168.251.0 in
any introductory emails on Roku stuff. Feel free to pass this on to
others of a curious bent, as I'm sure there's plenty for people to
poke at.

The NED (Netflix Device) is quite a slick little box, and I highly
recommend it if you're a Netflix Watch-it-now addict like
me. Alfred Hitchcock Presents, The Outer
Limits, Little Britain... really, this is a great way to
watch classic television. It'd be great if Roku would fulfill
their obligations under the GPL, as I've got one of their boxes in my
grubby little paws as we speak. Whoops, spoke too soon! They have
released
source: Roku
Netflix Player GPL sources. Thank you, Roku!

In other news, I'm between jobs at the moment. Since I'm no longer
there, I can mention that I was
at Paglo Labs, which is a cool
idea. Someday I'll likely regret leaving the place. Unfortunately,
the role wasn't really a fit as time went on (but I stuck it out
through the crunch of the Public Beta launch). There are about four
people who seem to be warming to the idea of my employ, but I'm always
keen to find more potential positions. There are likely two more
going into the top of the hopper in the next day or two, but if you
know anyone looking for an information person, let me know (that's
stuff like data mining, search, or machine learning). Mo' options, mo'
better.

I'm currently in Chapel Hill, staying with Kristina. We've done Heathen
Children Get Presents Too Day, which is like Christmas, but for those
of us who don't fit into the "Christian" category. (I used to have
Atheist Children Get Presents Too Day for just myself, but broadened
it this year).

Between my folks and K, it's been a very good year for presents.
The real wins were a pocket hole jig and the charcoal Lamy Safari, in Fine. I also
got the appropriate converter, which is already loaded with Noodler's
Legal Lapis.

So, this is the first post using my new "blog engine," which is
just a messy set of perl scripts, make, and the C pre-processor
from GCC. At some point, I'll move over to something a bit more
appropriate, but this works wonderfully for now. I think.

I never quite got around to fixing it up and settling it in. Then
I stopped caring. Now, however, I feel like I should have a more
active presence here. I've got a few little projects going, and feel
like a personal outlet could be beneficial.

Therefore, I've dusted off the perl scripts, the macros, and the
Makefile, and am bringing the blog back.

(Every so often, I have "clever monkey" moments. These are little
creative flashes, insight or non-linear solutions to problems. They
make me think I might, in fact, be half as clever as people seem to
think I am. Which would make me twice as clever as I usually feel.)

For instance, the other evening, I wanted to wash the pillow slips for
my feather pillows. You know those little zipper thingies you put
around a feather pillow, to contain the feathers that sneak out?
Those things.

This process is akin to cracking open a nuclear containment vessel,
but instead of blue light that gives you cancer, you get a cloud of
loose little feathers that stick to everything and look silly. And,
if you simply wash the cover with feathers in it, they stick to the
inside of your washing machine and show up on your clothes for weeks
to come.

In the past, my solution to this has been to put on a nice linen
shirt, go outside, and flail the feathery pillow slip around. Then, I
carry out the pillow and whack it around for a bit. There were little
white drifts in my back yard last time, in California, in August. And
I still had the damned things in my hair for days.

What I really wanted was a second container for this. I considered
doing all that inside a trash bag, or a grocery bag. Maybe I could
tie the bag up, all puffy, and then beat it around for a bit. All
good ideas, but there's still the problem of opening the bag up. What
you need is a filter on the bag as you open it up and the air comes
out.

Or, you can just stick the whole assembly in the dryer, open it up,
and do the extraction in there. Then, close the dryer, and run air through it
for a little while. Et voila! All the feathers are handily
sequestered in your standard filter, the pillows are delightfully
fluffed, and the slips are wonderfully, well, de-fluffed.

Saturday was the unveiling of the Computer History
Museum's Difference
Engine. Unfortunately, I missed it because I was half-zonked on
the couch all day with something like strep throat. That was a
bummer, but it's okay, I had a bit of a religious revival.

Trolling around Google Video for something interesting, I came across
Google's Tech
Talks again. A total gold mine (and it'd be even better if people
knew how to use microphones).

Supporting
Scalable Online Statistical Processing was really interesting,
about using statistical mechanisms and randomized algorithms to
numerically optimize really hard SQL queries. I recommend it as a
different way of thinking about working with large datasets.

"I am interested in what you say and would like to subscribe to
your newsletter." It reaffirms my belief in the plausibility of
neural networks. There's a long way to go until they're practical,
but this demo makes it clear that the promise is still there. I'm
back in the fold, I believe again. No pulpit-pounding required, no
fire and brimstone. Just the promise of heaven in a clean, intuitive,
fundamentally simple model.

The best part, though, is that they publish all the code for this on
his website. Unfortunately, it's in Matlab; fortunately, it runs in
octave. When I've been up to it (i.e., the few hours I was less
feverish Sunday), I've started porting it to C++ with GSL, mostly to
fully understand the underlying structure.
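The structure under all of this really is fundamentally simple: a feedforward pass is just a matrix multiply and a nonlinearity per layer. Here's a minimal sketch in Python (everything here — the layer sizes, weights, and input — is invented for illustration, not taken from the code I'm porting):

```python
import math, random

def forward(x, layers):
    """Feedforward pass: each layer is (weights, biases); apply tanh after each."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

random.seed(0)
def rand_layer(n_in, n_out):
    """Random weights and biases for a layer, just to have something to run."""
    return ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [random.uniform(-1, 1) for _ in range(n_out)])

# A tiny 3-4-2 network applied to a made-up input vector.
net = [rand_layer(3, 4), rand_layer(4, 2)]
out = forward([0.5, -0.2, 0.8], net)
```

The whole trick, of course, is in learning the weights, but even this much makes the shape of the computation obvious.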

For now, though, I should get back to sleep. Tomorrow morning: to the
doctor's, then hopefully to work. I've run out of watchable movies on
Netflix on-demand, and, quite frankly, can't stand the idea of
sleeping away another day when there's cool stuff to be done.

I've recently been learning Haskell. As part of that, I'm
implementing Huffman Coding.
This is my first real project in the language. It's been overall
quite pleasant, and has taught me a lot.

The biggest lesson has been thoroughly meta: compression tasks are a great way to learn
a language/environment. For this project, I had to learn how to
use modules, do I/O, mangle arrays, and define tree structures. It
might not look like too much, but that's actually a huge amount of
stuff to shove into a couple weekends of hacking.
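For the curious, the algorithm itself is tiny: build a min-heap of trees weighted by symbol frequency, repeatedly merge the two lightest trees, then read the codes off the root-to-leaf paths. A sketch in Python (quicker to read here than my newbie Haskell; the function name and structure are mine):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table: repeatedly merge the two lightest trees."""
    # Heap entries are (weight, tiebreaker, tree); a tree is either a leaf
    # symbol or a (left, right) pair. The tiebreaker keeps heapq from ever
    # trying to compare two trees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:  # degenerate case: only one distinct symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")  # 'a' is most frequent, so shortest code
```

The Haskell version follows the same shape, with the heap swapped for a sorted list of trees and the walk written as a fold.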

Other things I've learned (or relearned), in no particular order:

You'll get a reasonable answer if you hop on IRC for help in #haskell.

"You remember the mental leap from imperative languages to OCaml?
I had about the same level of change going from OCaml to Haskell." --
My friend Evan, talking about
Haskell (he convinced me to try it, at least).

The language-of-the-month club is incredibly time-consuming. It
took forever to get really basic stuff down in a new language
(and I even had the benefit of having already coded some in OCaml).

A good book makes all the difference. I highly recommend The
Craft of Functional Programming. It's a great book, even if
you've already been programming in other functional languages.

This week, I did something tantamount to virtual suicide. I went
through my social networking profiles and disabled, deactivated, or
deleted most of them. After much consideration, it became clear that
social networking sites were bad for my relationships, bad for my
social life, and generally bad for me.

Social networking sites are the high-fructose corn syrup of social
interaction. High-fructose corn syrup in food is trouble for two
reasons: it doesn't nourish you, nor does it fill you up. Similarly,
social networking makes you feel like you're involved in another
person's life, without providing the nourishing fulfillment meaningful
interaction gives. You get a brief glimmer of fulfillment when you
read a snippet of your friend's life, but it wears off quickly,
leaving you needing more interaction. This means that you'll visit
the site again, which is perfect for the provider: they get another
set of ad views.

Don't believe me when I say it's less fulfilling? Let's try a
thought experiment. You might get a brief burst of warm fuzzies when
someone posts pictures of their newborn baby on facebook, but it's
short-lived. Contrast this with someone walking around the office
with cameraphone pictures of the same baby. There's a real difference
in the quality of these two interactions: the in-person interaction is
more affecting than the online one. There are myriad reasons for
this, but the important thing is that the in-person, in-depth,
in-excitement experience is much richer and more fulfilling than the
mediated, short, distant experience of facebook.

Of course, high-fructose corn syrup is okay in moderation, when
balanced out with something healthy. Enjoy a Coke when you go see a
movie or are stuck in the airport. It's a nice thing, in balance.
Similarly, social networking is fine if it's balanced out with more
healthy interactions. That comes down to a sort of self-control,
which is where I have a hard time, and what finally pushed me over the
edge.

I'm a bit of a workaholic, having a hard time with work-life
balance. I have an intense day job at a startup, where I'm a third of
the engineering team. When I get home in the evening, I tinker on
personal projects: more software, writing a book or two, and studying
new things. I find a lot of satisfaction in these things, and
therefore overdo it. In fact, I overdo it so much that my friends
won't see me (in person) for weeks at a time. But they'll see my
facebook status update every other day or so, "Josh looks forward to
taking a day off," or "Josh is finally shipping his pet project!"

Social networking sites make it too easy to "work friends in" around
your schedule. They're an enabler for this sort of thing, both in
scheduling and perception. If I had to go to dinner to see my
friends, I would make sure the few hours were well-spent, and connect
with the other people. I might even stop thinking about work for a
while. Social networking sites, however, reduce the cost of
socializing, which is a great thing for keeping in vague touch with
people. Unfortunately, this reduces the perceived value of
relationships: there's less social capital invested in these short
bursts of activity. This, in turn, makes them seem less meaningful to
the participants.

The perception of value is a funny thing. In dating, there's a reason
people play hard to get. It's the same reason that food you cooked
yourself tastes better. Somewhere in the back of our brains, there's
a tiny beancounter, keeping track of how much time, money, and emotion
we've put into things. This little accountant isn't always rational
or consistent, but generally, the more you put into a thing, the more
valuable you find it. By reducing the cost of relationships, social
networking sites accidentally trick us into thinking our relationships
are less valuable.

Of course, often the relationships are less valuable. It is
possible to hold truly deep and meaningful discussions with people
online. It's a great medium for this, just as television is a great
medium for teaching people. In television, you can show animated
graphs, moving diagrams, and demonstrate experiments, all with
expository notes (think Mythbusters). What's television
actually used for? Fear Factor, Maury Povich, and so
on. Similarly, social networking sites don't typically make good use
of the medium. They encourage lots of short interactions, which are
really great for ad revenue, but are terrible for meaningful
connections.

Some sites are better than others for this, and allow you to grow a
group of really great friends. This, though, also poses a problem.
There's always someone out there to listen and offer advice.
Therefore, you never have to think for yourself. Which means you
never make your own decisions/mistakes in a vacuum. And,
correspondingly, you're never forced into full independence.
Collaboration is a great tool for developing new ideas, but it might
not be the best thing for one's internal life, as it tends to
encourage this sort of promiscuous codependency.

Now that the accounts are closed and the bookmarks are deleted, what
do I do next? First, I set up a public Google
Calendar. Josh's
Google Calendar is a good first approximation of whether or not
I'm busy at a given time. It's also a good motivator, reminding
myself of the fact that I haven't seen people in X days, and maybe I
should get out more.

Next, it's time to really clean up my house, so I feel confident in
having people around more often. The war on slightly embarrassing
dustbunnies is nigh. After that, it's time to start collecting
people's phone numbers. Along with this, I need to get better about
calling people to hang out more often.

Maybe I should just create a new event on facebook and invite everyone.


(obdisclaimer: I don't speak for my employer here, and
this is my idea, not theirs.)

Quick summary: Applying machine learning to uncover bottlenecks,
predict system capacity, user growth, hardware purchases, and,
generally, everything you need to know when running a service-based
business. You give it your monitor data, and it gives you a
diagnostic and predictive model of your system.

Time Data Into Knowledge (TDIK) is an idea I had a little over a year
ago. Since I've talked with several people about it in that time,
it's no longer patentable in the US. And, since I haven't actually
done more than a proof of concept, I wanted to make a full public
disclosure of the idea, in the hopes that it would inspire someone.

Imagine you have a multi-tier application stack, with fairly complete
monitors. So, your frontend server, the message queues, databases,
and a few backend processes (bulk and on-line). Additionally, assume
you've got complete monitors in this stack: the normal machine
telemetry (CPU usage, disk capacity, network utilization, etc), as
well as application-specific stuff, like number of users, hits per
second, message queue depth, etc.

Traditional monitoring systems give you graphs of all this; you do the
analysis. The best you can hope for is a big display of all your
graphs together, then eyeball them for correlations. You can shuffle
them around to make it easier, but it's still human work. This kind
of correlation is great for fires, where you have a sudden large shift
in two variables and don't care about the precise magnitude of the
relations. It's no good for a more valuable, big-picture task:
capacity planning, where the relationships are less pronounced and
more complicated.

That's where TDIK comes in. There are ways to find out how correlated
two datasets are and then extract models of their relationship. You
can expand these out to any number of combinations, though it gets
much more computationally expensive. Once you have these models,
though, they're invaluable.
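The machinery to start with is nothing exotic: a Pearson correlation over pairs of telemetry series is enough to rank candidate relationships before fitting anything fancier. A sketch in pure Python (the metric names and numbers are invented for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monitor samples, one per minute.
hits_per_sec = [10, 20, 30, 40, 50, 60]
cpu_load     = [0.5, 0.9, 1.6, 2.1, 2.4, 3.0]
queue_depth  = [3, 1, 4, 1, 5, 2]

r1 = pearson(hits_per_sec, cpu_load)     # close to 1: strongly related
r2 = pearson(hits_per_sec, queue_depth)  # near 0: probably unrelated
```

Run every pair through this, keep the strong ones, and fit a simple model (even plain linear regression) to each surviving pair — that's the proof-of-concept version.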

You can make a model of your system yourself, using your knowledge of
it. Your model will probably be darned good, since you built the
thing. I've done these, and not only are they fun, they're handy.
But there are always dark corners lurking around. What's the
performance interaction of running MySQL and Squid on the same host,
for instance? They're both memory-intensive, especially when big
requests are getting bandied about. Ideally, I'd have separate
hardware for them, but, well, you know how that goes.

TDIK, since it's learning the model from scratch every time, will find
out how things interact on your particular system. It discovers
correlations that don't seem straightforward, but make sense after the
fact. Things like "Webserver load is highly correlated with the
number of user profile views in full mode" (you eventually discover
that someone accidentally left in the debug code that disables
template caching there).

TDIK's models are useful for more than troubleshooting, though. You
can also use them for planning. For instance, let's say your website
has N concurrent users. Would you like to know how many users the
current system can support? The model can tell you how many, and
which component will be your bottleneck. Or, perhaps you know that
you want to be able to support some number of users. The model could
tell you how much you'd need to scale each component in your current
system to get there.
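With even a crude linear model per component, the capacity question reduces to solving for where each component hits its limit; the bottleneck is whichever saturates first. A toy sketch (the slopes, intercepts, and limits are invented for illustration — in practice they'd come from the fitted models):

```python
# Hypothetical per-component models: load = slope * users + intercept,
# plus each component's saturation limit.
components = {
    "webserver_cpu": {"slope": 0.002, "intercept": 0.1, "limit": 1.0},
    "db_iops":       {"slope": 5.0,   "intercept": 200, "limit": 4000},
    "queue_depth":   {"slope": 0.01,  "intercept": 2,   "limit": 50},
}

def max_users(model):
    """How many users this component can carry before saturating."""
    return (model["limit"] - model["intercept"]) / model["slope"]

capacities = {name: max_users(m) for name, m in components.items()}
bottleneck = min(capacities, key=capacities.get)
print(bottleneck, round(capacities[bottleneck]))  # webserver_cpu 450
```

Inverting the same models answers the other question, too: pick a target user count, and read off how far each component has to scale to get there.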

What about firefighting? Your model reflects the steady-state
performance of the overall system. If you have the last few minutes
of monitor data, you can quickly re-correlate and see which components
don't fit the model. In fact, you can see exactly how much
they don't fit the model, and prioritize the order in which your team
checks things out.

But why firefight in the first place? Using that same correlation,
you can get alerts when the current state deviates appreciably from
the model. One thing I've always said about nagios alerts is that
"They're only as good as your experience and creativity." If you
don't know that a failure mode is waiting, you're not going to have a
nagios alert prepared for it. TDIK's model obviates that, since it
knows your system intimately. It will notice the increase in CPU time
versus page hits even if the raw number of hits is low (say, at
midnight), letting you identify and avert the morning meltdown.
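The alerting side is just measuring how far recent samples sit from what the model predicts — say, flagging residuals that are outliers relative to their own spread. A sketch (the data and threshold are invented for illustration):

```python
import statistics

def residual_alerts(observed, predicted, threshold=3.0):
    """Flag samples whose residual (observed - predicted) is an outlier
    relative to the residuals' own mean and standard deviation."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mu = statistics.mean(residuals)
    sigma = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mu) > threshold * sigma]

# Hypothetical: CPU time predicted from page hits by the learned model.
predicted = [1.0, 1.1, 1.0, 1.2, 1.1, 1.0, 1.1, 1.2]
observed  = [1.05, 1.08, 0.98, 1.25, 1.12, 2.4, 1.09, 1.18]  # sample 5 is off

alerts = residual_alerts(observed, predicted, threshold=2.0)  # flags index 5
```

Note that this fires on relative deviation, not absolute load — which is exactly why it catches the midnight anomaly a fixed nagios threshold would sleep through.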

"So where do I download or buy this product, or pay for the service?"
you ask. Well, it's mostly still vapor. I did a small
proof-of-concept, and found that modelling this many variables is
noisy. And, honestly, I haven't had time to make this happen. It
could probably be a startup, but I'm not certain of it yet. Feel free
to drop me an email at josh@joshisanerd.com if you
feel otherwise.