~ Science! Culture! Computational Engines!

I think a lot of the sturm und drang about the current state of science and
academia in general comes from the fact that the reality of science today is
wildly different than what we are told for our entire lives (duh). Society,
education, whatever, says that being a scientist is like being a
priest/priestess. We think what we will do is discover arcane secrets,
entering into a class divorced from the world through its connection to deeper
truths. The reality of being a scientist is that you are a monk. As much as
the job involves the priestly craft of divining ultimate reality, what it
really involves, day to day, is asceticism. To be a scientist in the age of
austerity is to take a vow of poverty, promising to forgo the worldly
possessions of the middle class that could easily have been yours years ago. It
means self-mortification, flagellating yourself daily with the scourge of
stress, self-doubt, and overwork. So we have a generation of young scientists
who don’t realize that the great ritual at the altar of Truth has a sacrifice
at its centre, and that sacrifice is you.

Heads up kids, if you are using vimrepress for blogging, and you want to use
the BlogList and one of the edit (BlogEdit, BlogNew, etc.) features, make sure
you use separate vim sessions. Vimpress keeps a “view” state, so it knows
whether your buffer can be pushed to WordPress, and the state is shared between
tabs. So if you start a list in a second tab after you’ve started editing,
vimrepress will not let you save your edit, unless you fire up another tab in
the edit view to knock the state back to edit.

Well, hopefully I will be blogging a bit more frequently. I just finished
writing a manuscript for a paper from the last 2 years of PhD work, and it was
no fun. I am, at this stage, pretty shitty at scientific writing. I’m hoping
some practice will help me improve. I’ve installed the wicked vim-repress, which will hopefully
make the whole affair more fun. I can’t stand writing with a tool other than
vim, and markdown is a bicycle to HTML’s 18-wheeler. Lighter and more fun to
use.

I’ve been working on building a pedal-powered twitter projection wall as part of a collaboration between Think|Haus and THAAT, to display at HIVEX. On Saturday, our hardware genius Gord got the frame and motor mount completed, and we decided to have a brief rave-light party in the factory.

So, I’m taking a class this year on high performance computing, and I figured I might as well kill two birds with one stone: write some blog posts, and also get some studying done. Let’s get to it!

What Is OpenMP?

OpenMP is an API for working with shared memory parallel computers. Essentially everyone now owns one of these machines, as any multi-core machine is a shared memory parallel machine. What it isn’t is a tool for GPU programming or programming on distributed memory systems (like a Beowulf cluster).

OpenMP is one of the fastest and easiest ways to squeeze extra performance out of modern multicore CPUs.

How to Set Up OpenMP?

Unlike some parallel tools (I’m looking at you CUDA 2 years ago), OpenMP is ridiculously easy to set up. If you are running a Debian-like system, it is just:

apt-get install libgomp1

And that’s it! All you need to do now is compile your code, as you normally would, with gcc and the -fopenmp flag.

One of the projects in my list of stuff I’ll get around to is making a 3D unprinter: a machine that can melt a thermoplastic object down and extrude it back into filament. McMaster has this cool course called Sustainable Future, and part of the course is for the students to do a real world project involving sustainability. I pitched the idea to the class, and I’ve got a team of 4 students now working with me to build one! We’re blogging here, and we’ve set up a github repo here. Watch our progress, we should have a good prototype by December.

I have always had a problem with the concept of intellectual property. The great western tradition of post-Enlightenment values has always placed the free flow of art and ideas on a pedestal, as a sacrosanct cornerstone of a just society. That the ideas living in our heads and flowing from our lips are the domain of no king, pope, or policeman is one of the most important cultural norms to have emerged from the Enlightenment into modern liberal democracies. The legal constructs associated with intellectual property, in my evaluation, cannot be reconciled with this. A corpuscle of information cannot be at once free to be spoken or expressed and also be the property of some individual or corporation. Information theory, the fantastic work pioneered by Claude Shannon, only swells my distaste for intellectual property. We know now that with simple coding, all information is reducible to a common binary form. Film, print, music, photography: all is merely a collection of ordered bits. This makes the idea of owning information all the more ridiculous, as the process can just as easily be reversed: a song can be represented by a string of Shakespeare quotations, a movie can be rendered as a musical score. As an illustration of this, I’ve written a short program that takes any file and converts it to a long, rambling nonsense-poem. Poetry as Piracy.

Making the Wordlists

The first step is generating a set of words to use in our poems, categorized by their grammatical type. To do this, I downloaded the English Wiktionary. I then used grep, sed, and awk to split it into plain lists of words: nouns, past-tense verbs, present-participle verbs, and adjectives. I then shuffled these lists, and trimmed them down so that the length of each was a power of 2. I didn’t need to do this, but it simplified the work slightly. In the end, I was left with 17 bits worth of information stored in each noun (131,072 words), 13 bits in each past-tense verb (8,192), 13 bits in each present-participle verb (8,192), and 15 bits in each adjective (32,768).
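The shuffle-and-trim step can be sketched in a few lines of Python. This is only an illustration of the idea, not the grep/sed/awk pipeline I actually used; the function name `trim_to_power_of_two` and the fixed `seed` are assumptions made for the example.

```python
import random


def trim_to_power_of_two(words, seed=0):
    """Shuffle a word list and keep the largest power-of-two prefix.

    Returns (kept_words, bits), where len(kept_words) == 2 ** bits, so each
    word chosen from the list carries exactly `bits` bits of information.
    The seed is a hypothetical parameter: encoder and decoder must end up
    with identical lists, so the shuffle has to be reproducible.
    """
    if not words:
        raise ValueError("need at least one word")
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    bits = len(shuffled).bit_length() - 1  # floor(log2(len))
    return shuffled[: 1 << bits], bits
```

A list of 10 words, for example, is cut down to 8 words, each worth 3 bits.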

Sentence Skeletons

I then decided on two rough sentence skeletons:

The ADJECTIVE NOUN PAST-VERBED the ADJECTIVE NOUN.

ADJECTIVE NOUN is PRESENT-VERBING the ADJECTIVE NOUN.

Each of those sentences can store 77 bits of information. A 1 Mb file, for example, will require roughly 10,000 sentences, or about a novelette worth of words. If that 1 Mb file were a copyrighted song, you would not in fact have the freedom to print and distribute your nice new novel (not that you would want to; it would be random nonsense).

Encoding the File

Now, 77 bits is a bit awkward. Just choosing between the two sentence types gives me 1 bit of information. I also get punctuation at the end. If I end each sentence with either a period, exclamation mark, two exclamation marks, or three exclamation marks, that gets me an extra two bits of information. This gets me up to 80 bits per sentence, or 10 bytes. I can now easily encode my data as nonsense poetry! I use the first bit to select which tense of verb, the next two decide if I get a period or exclamation series, and the rest determine the sentence itself. If the length of my file isn’t a multiple of 10 bytes, I simply add an additional line at the end:

All that remains are NUM memories and NUM regrets.

Here the first NUM is the base-10 representation of the remaining bytes, and the second NUM is the number of bytes remaining (needed because a long string of leading zeros gets truncated in converting to decimal).
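To make the bit layout concrete, here is a minimal Python sketch of how one 10-byte chunk could be split into word-list indices. The 1 + 2 + 77 split and the field widths come from the description above; the order of the word fields inside the 77 bits, the big-endian byte order, the mapping of punctuation codes, and every name in the code (`FIELDS`, `split_chunk`, `join_fields`, `render`, `tail_line`) are assumptions made for illustration. Placeholder tokens such as NOUN42 stand in for the real word lists.

```python
# Field widths in bits, most significant first. The order of the word
# fields within the 77 sentence bits is an assumption.
FIELDS = [
    ("sentence_type", 1),  # which of the two skeletons to use
    ("punctuation", 2),    # ".", "!", "!!" or "!!!"
    ("adjective_1", 15),
    ("noun_1", 17),
    ("verb", 13),
    ("adjective_2", 15),
    ("noun_2", 17),
]
CHUNK_BYTES = 10
PUNCTUATION = [".", "!", "!!", "!!!"]  # assumed mapping of codes 0..3


def split_chunk(chunk):
    """Split one 10-byte chunk into a dict of word-list indices."""
    if len(chunk) != CHUNK_BYTES:
        raise ValueError("expected exactly 10 bytes")
    value = int.from_bytes(chunk, "big")
    remaining = 8 * CHUNK_BYTES
    fields = {}
    for name, width in FIELDS:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


def join_fields(fields):
    """Inverse of split_chunk: pack the indices back into 10 bytes."""
    value = 0
    for name, width in FIELDS:
        value = (value << width) | fields[name]
    return value.to_bytes(CHUNK_BYTES, "big")


def render(fields):
    """Fill a sentence skeleton with placeholder tokens."""
    a1, n1 = f"ADJ{fields['adjective_1']}", f"NOUN{fields['noun_1']}"
    a2, n2 = f"ADJ{fields['adjective_2']}", f"NOUN{fields['noun_2']}"
    end = PUNCTUATION[fields["punctuation"]]
    if fields["sentence_type"] == 0:
        return f"The {a1} {n1} PAST{fields['verb']} the {a2} {n2}{end}"
    return f"{a1} {n1} is PRESENT{fields['verb']} the {a2} {n2}{end}"


def tail_line(rest):
    """Closing line for the bytes left over after the last full chunk."""
    return (f"All that remains are {int.from_bytes(rest, 'big')} memories "
            f"and {len(rest)} regrets.")
```

With this layout, leftover bytes 0x00 0x05 become "All that remains are 5 memories and 2 regrets." The byte count is what lets the decoder restore the leading zero byte.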

Decoding the File

Decoding the file is as simple as just reading in each line, checking what sentence type it is, and what the punctuation at the end is, and returning it to the original binary form!
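As a rough sketch of the decoding side, the Python below recovers the sentence-type bit and the punctuation bits from a line, and turns the closing "memories and regrets" line back into bytes. It is not my actual decoder: the names `read_line_header` and `read_tail`, the code assigned to each punctuation mark, and the big-endian byte order are all assumptions, and it detects the first skeleton simply by its leading "The ", which assumes no adjective in the word list is itself "The".

```python
import re

PUNCTUATION = [".", "!", "!!", "!!!"]  # assumed mapping of codes 0..3
TAIL = re.compile(r"All that remains are (\d+) memories and (\d+) regrets\.")


def read_line_header(line):
    """Return (sentence_type, punctuation_code) for one poem line."""
    line = line.strip()
    body = line.rstrip(".!")
    ending = line[len(body):]
    sentence_type = 0 if body.startswith("The ") else 1
    return sentence_type, PUNCTUATION.index(ending)


def read_tail(line):
    """Decode the closing line into the leftover bytes, or None if absent."""
    match = TAIL.fullmatch(line.strip())
    if match is None:
        return None
    value, count = int(match.group(1)), int(match.group(2))
    # The byte count restores any leading zero bytes lost in decimal form.
    return value.to_bytes(count, "big")
```

Recovering the remaining 77 bits is then a matter of looking up each word’s position in its word list.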

I’ve been warned that I sometimes veer too far in the direction of toolmaker, away from the standard path followed by most scientists. Try as I might, I cannot seem to avoid finding the process of doing science nearly as interesting as the goal of getting that science done. And so, my mind has been orbiting around a problem I suspect is endemic amongst all physicists, if not all scientists. That problem, captured so nicely by this PhD comic, is that of filesystem cruft. Science, being at its core an experimental art, produces for every successful idea a whole panoply of failed experiments, mistakes, and generally messed-up crap. Being paranoid creatures consumed by our own fears, along with the awareness that serendipity has been a cornerstone of great work, we are loath to sweep these ill-fated children of the mind into the trash where they (mostly) belong. And so those of us who rely on computers for most of our day-to-day work end up with home directories filled to the brim with old scripts, corrupted data files, a dozen different versions of the same list of values, and other digital detritus. And this situation makes for errors, confusion, thousand-yard stares, anal leakage, and other evils too foul to discuss in polite company. Just looking at my /home directory on my workstation at the University, I have more than 100,000 files sitting around, waiting for me to stare at them for a quarter hour trying to remember what they were for.

Inspired by a reddit image post (which I cannot for the life of me find again), I decided to take a series of photos of the sunset from my parents’ house at Cedar-by-the-Sea, Vancouver Island. I took many photos over the course of several hours using a digital camera fixed in position on a tripod.

I thought it would look good to blend the images one into the other, so I wrote a quick Python script using the Python Imaging Library. The script blends consecutive images using linear interpolation. An artistic choice to make was how wide the blended regions should be. I tried everything from relatively thin blending regions:

To almost completely blended images:

In the end, however, I decided that what looked the best was actually to have no blending, but rather sharp boundaries between the images. This actually accentuates the effect I was going for, which was to show the changing light over time. Blending the images together actually lessens the effect, rather than enhancing it as I had hoped. I plan to get the finished product printed and framed:

Here’s the code for the script I used (apologies for quick-and-dirtiness):
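The listing below is not that script. It is a minimal stand-in sketch of the strip-and-blend arithmetic described above, written in plain Python on nested lists of pixel values rather than with the Python Imaging Library. The function name `blend_strips`, the `blend_width` parameter, and the assumptions that all frames are the same size and that `blend_width` is no wider than one strip are made up for the example.

```python
def blend_strips(frames, blend_width):
    """Combine equally sized frames into one image made of vertical strips.

    frames: list of images in time order, each a list of rows of numbers.
    blend_width: number of columns around each strip boundary over which
        the two neighbouring frames are linearly interpolated; 0 gives
        sharp boundaries. Assumed to be at most one strip wide.
    """
    count = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    strip = width / count  # strip width in columns, possibly fractional
    half = blend_width / 2
    out = [[0.0] * width for _ in range(height)]
    for x in range(width):
        centre = x + 0.5
        nearest = round(centre / strip)  # index of the nearest boundary
        boundary = nearest * strip
        if 1 <= nearest <= count - 1 and abs(centre - boundary) < half:
            # Inside a blend region: weight slides from 0 to 1 across it.
            t = (centre - boundary + half) / blend_width
            left, right = frames[nearest - 1], frames[nearest]
            for y in range(height):
                out[y][x] = (1 - t) * left[y][x] + t * right[y][x]
        else:
            # Outside any blend region: copy straight from one frame.
            source = frames[min(int(centre // strip), count - 1)]
            for y in range(height):
                out[y][x] = source[y][x]
    return out
```

Calling it with `blend_width=0` gives the sharp-boundary version I settled on.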

So, as you are all fully aware, I have been silent for the past few weeks. Moving across the country can do that to you. Now that I am no longer living out of boxes, expect a rapid catchup as I make up the posts I missed.