Perhaps three months late for an announcement, and at the risk of totally reckless narcissism, I should mention that four of my projects are currently on display in the Design and the Elastic Mind exhibition at the Museum of Modern Art in New York. My work notwithstanding, I hear that the show is generating lots of foot traffic and positive reviews, which is a well-deserved compliment to curator Paola Antonelli.

There’s a New York Times article and slide show (too much linking to the Times lately, weird…) and a writeup in the International Herald Tribune that even mentions my Humans vs. Chimps piece.

The first wall as you enter the show is all of Chromosome 18, done in the style of this piece.

It’s a 3-pixel font at 150 dpi, so there are 37.5 letters per inch in either direction, and the wall is about 20 feet square, making 75 million letters total. Paola and her staff asked whether it was OK to put the text on the piece itself, which I felt was fine, as the nature of the piece is about scale, and the printing would not detract from that. The funny side effect of this was watching people at the opening take one another’s picture in front of the piece, most of them probably not realizing that the wall itself was part of the exhibition. Perhaps my most popular work so far, given the number of family photos in which it will be found.
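For the curious, the arithmetic can be sketched out as below. It assumes each character occupies a 4-pixel cell (the 3-pixel glyph plus a pixel of spacing), and a printed area just over 19 feet on a side; both are my own back-solving from the 37.5 letters per inch and 75 million totals above, not measurements of the actual wall.

```java
public class WallMath {
    // letters per inch when each character occupies a fixed pixel cell
    static double lettersPerInch(int dpi, int pixelsPerLetter) {
        return (double) dpi / pixelsPerLetter;
    }

    // total letters on a square wall of the given side length in inches
    static long totalLetters(double lettersPerInch, double sideInches) {
        double perSide = lettersPerInch * sideInches;
        return Math.round(perSide * perSide);
    }

    public static void main(String[] args) {
        double lpi = lettersPerInch(150, 4);               // 37.5
        System.out.println(lpi);
        // "about 20 feet square" (taking ~19.25 ft of printed area)
        System.out.println(totalLetters(lpi, 19.25 * 12)); // ~75 million
    }
}
```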

Also in the show is the previously mentioned Humans vs. Chimps project as seen below:

This image is about three feet wide so you can read the letters accurately. It’s found next to an identically sized print of isometricblocks depicting the CFTR region of the human genome (the region implicated in cystic fibrosis). The image was first developed for a Nature cover.

Finally, the Pac-Man print of distellamap is printed floor to ceiling on another wall in the exhibition. Unfortunately there was a glitch in the printing that caused the lines connecting portions of the code to be lost (because they’re too thin to see at a distance), but no matter.

Far more exciting than my own work, however, is the number of projects built with Processing that are in the show. It’s a bit humbling and the sort of thing that makes me excited (and relieved) to have some time this summer to devote to Processing itself.

Unlike most people with a heartbeat, I didn’t find Google Maps particularly interesting on arrival. I was a fan of the simplicity of Yahoo Maps at the time (but no longer, eek!) and Microsoft’s Terraserver had done satellite imagery for a few years. But the same way that Google Mars shows us something we’re even less familiar with than satellite imagery of Earth, there’s something really exciting about the possibility of seeing beneath the oceans.

Kottke and Freakonomics were kind enough to link over here, which has brought more queries about salaryper. Rather than piling onto the original web page, I’ll add updates to this section of the site.

I didn’t include the project’s back story with the 2008 version of the piece, so here goes:

Some background for people who don’t watch/follow/care about baseball:

When I first created this piece in 2005, the Yankees had a particularly bad year, with a team full of aging all-stars and owner George Steinbrenner hoping that a World Series trophy could be purchased for $208 million. The World Champion Red Sox did an ample job of defending their title, but as the second highest paid team in baseball, they’re not exactly young upstarts. The Chicago White Sox had an excellent year with just one third the salary of the Yankees, while the Cardinals are performing roughly on par with what they’re paid. Interestingly, the White Sox went on to win the World Series. The performance of Oakland, which in previous years far exceeded their overall salary, is the story, largely about General Manager Billy Beane, told in the book Moneyball.

Some background for people who do watch/follow/care about baseball:

I neglected to include a caveat on the original page that this is a really simplistic view of salary vs. performance. I created this piece because the World Series victory of my beloved Red Sox was somewhat bittersweet in the sense that the second highest paid team in baseball finally managed to win a championship. This fact made me curious about how that works across the league, with raw salaries and the general performance of the individual teams.

There are lots of proportional things that can be done too—the salaries especially exist across a wide range (the Yankees waaaay out in front, followed by another pack of big market teams, then everyone else).

There are far more complex things about how contracts work over multiple years, how the farm system works, and scoring methods for individual players that could be taken into consideration.

This piece was thrown together while watching a game, so it’s perhaps dangerously un-advanced, given the amount of time and energy that’s put into the analysis (and argument) of sports statistics.

That last point is really important… This is fun! I encourage people to try out their own methods of playing with the data. For those who need a guide on building such a beast, the book has all the explanation and all the code (which isn’t much). And if you adapt the code, drop me a line so I can link to your example.

I have a handful of things I’d like to try (such as a proper method for doing proportional spacing at the sides without overdoing it), though the whole point of the project is to strip away as much as possible, and make a straightforward statement about salaries, so I haven’t bothered coming back to it since it succeeds in that original intent.

It’s April again, which means that there are messages lurking in my inbox asking about the whereabouts of this year’s Salary vs. Performance project (found in Chapter 5 of the good book). I got around to updating it a few days ago, which means now my inbox has changed to suggestions on how the piece might be improved. (It’s tempting to say, “Hey! Check out the book and the code, you can do anything you’d like with it! It’s more fun that way.” but that’s not really what they’re looking for.)

One of the best messages I’ve received so far is from someone I strongly suspect is a statistician, wishing to see a scatter plot of the data rather than its current representation. Who else would be pining for a scatter plot? There are lots of jokes about the statistically inclined that might cover this situation, but… we’re much too high-minded to let things devolve to that (actually, it’s more of a pot-kettle-black situation). If prompted, statisticians usually tell better jokes about themselves anyways.

At any rate, as it’s relevant to the issue of how you choose representations, my response follows:

Sadly, the scatter plot of the same data is actually kinda uninformative, since one of your axes (salary) is more or less fixed all season (might change at the trade deadline, but more or less stays fixed) and it’s just the averages that move about. So in fact if we’re looking for more “accurate”, a time series is gonna be better for our purposes. In an actual analytic piece, for instance, I’d do something very different (which would include multiple years, more detail about the salaries and how they amortize over time, etc).

But even so, making the piece more “correct” misses the intentional simplifications found in it, e.g. it doesn’t matter whether a baseball team was 5% away from winning, it only matters whether they’ve won. At the end of the day, it’s all about the specific rankings, who gets into the playoffs, and who wins those final games. The piece isn’t intended as an analytical tool, but as something that conveys the idea of salary vs. performance to an audience that by and large cares little about 1) baseball and 2) stats. That’s not to say that it’s about making something zoomy and pretty (and irrelevant), but rather about how you engage people with the data in a way that teaches them something in the end and gets them thinking about it.

Now to get back to my inbox and the guy who would rather have the data sonified since he thinks this visual thing is just a fad.

Some favorite error messages while working on the All Streets project (mentioned below). I was initially hoping to use Illustrator to open the generated PDF files (generated from Processing), but Venus informed me that it was not to be:

I’m having difficulties as well. Why did I pay for this software?

Generally, Photoshop is far better engineered, so I was hoping that it would be able to rasterize the PDF file instead, never mind the vectors and all.

Oh come on… Just admit that you ran out of memory and can’t deal. Meanwhile, Eugene was helping out with the site, from the other end of iChat:

From the New York Times, a piece about Predictably Irrational by Dan Ariely. I’m somewhat fascinated by the idea of our general preoccupation with holding on to things, particularly as it relates to retaining data (see previous posts referencing Facebook, Google, etc.)

Our natural tendency is to keep everything, in spite of the consequences. Storage capacity in the digital realm is only getting larger and cheaper (as its size in the physical realm continues to get smaller), which only feeds this tendency further. Perhaps this is also why more individuals don’t question Google claiming a right to keep messages from their Gmail account after the messages, or even the account, have been deleted.

Ariely’s book describes a set of experiments performed at M.I.T.:

[Students] played a computer game that paid real cash to look for money behind three doors on the screen… After they opened a door by clicking on it, each subsequent click earned a little money, with the sum varying each time.

As each player went through the 100 allotted clicks, he could switch rooms to search for higher payoffs, but each switch used up a click to open the new door. The best strategy was to quickly check out the three rooms and settle in the one with the highest rewards.

Even after students got the hang of the game by practicing it, they were flummoxed when a new visual feature was introduced. If they stayed out of any room, its door would start shrinking and eventually disappear.

They should have ignored those disappearing doors, but the students couldn’t. They wasted so many clicks rushing back to reopen doors that their earnings dropped 15 percent. Even when the penalties for switching grew stiffer — besides losing a click, the players had to pay a cash fee — the students kept losing money by frantically keeping all their doors open.

(Emphasis mine.) I originally came across the article via Mark Hurst, who adds:

I’ve said for a long time that the solution to information overload is to let the bits go: always look for ways to delete, defer, or otherwise avoid bits, so that the few that remain are more relevant and easier to handle. This is the core philosophy of Bit Literacy.

Put another way, do we need to take more personal responsibility for subjecting ourselves to the “information overload” that people so happily buzzword about? Is complaining about the overload really an issue of not doing enough spring cleaning at home?

Mr Rumsfeld showed the picture to illustrate how backward the northern regime really is – and how oppressed its people are. Without electricity there can be none of the appliances that make life easy and that we take for granted, he said.

“Except for my wife and family, that is my favourite photo,” said Mr Rumsfeld.

“It says it all. There’s the south, the same people as the north, the same resources north and south, and the big difference is in the south it’s a free political system and a free economic system.”

I’ve vowed to myself not to make this page about politics, so I won’t get into the fatuous arguments of a warmonger (oops), but I think the two fascinating things here are:

1. This image, this “information graphic,” would be of such great importance to a person that he would see fit to mention it even in reference to photos of his wife and children. This is a strong statement for any image, even if he is being dramatic.

2. The use of images to make or score political points. There’s some great stuff buried in recent Congressional testimony about the Iraq War, for instance, that I want to get to soon.

In regards to #1, I’m trying to think of other images to which people maintain such a personal relationship (particularly those whose job is not info graphics—Tufte’s preoccupation with Napoleon’s March doesn’t count.)

New work, now posted. All of the streets in the lower 48 United States: an image of 26 million individual road segments. This began as an example I created for one of my students in the fall of 2006, and I just recently got a chance to document it properly.

Nothing particularly genius about this piece—it’s mostly just a matter of collecting the data and creating the image. But it’s one of those cases where even in a (relatively) raw format, the data itself is quite striking.

The data in this piece comes from the U.S. Census Bureau’s TIGER/Line data files. The data is first parsed and filtered (to remove non-street features) using Perl. Next, using Processing, the latitude and longitude coordinates are transformed using an Albers equal-area conic projection (which gives it that curvy surface-of-the-Earth look that we’re used to), and then plotted to an enormous image that’s saved to the disk. The steps are similar to the preprocessing stages described in Chapter 6 of Visualizing Data.
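For reference, the projection step itself is only a few lines. Here’s a sketch in Java, using the conventional parameters for a conterminous-U.S. Albers projection (standard parallels 29.5° and 45.5°, origin at 96°W, 23°N); I won’t swear these are the exact parameters used for the print, and this is the spherical form rather than the ellipsoidal one.

```java
public class Albers {
    // Standard parallels and origin commonly used for the lower 48
    static final double PHI1 = Math.toRadians(29.5);
    static final double PHI2 = Math.toRadians(45.5);
    static final double PHI0 = Math.toRadians(23.0);
    static final double LAM0 = Math.toRadians(-96.0);

    // Projection constants (unit sphere)
    static final double N = (Math.sin(PHI1) + Math.sin(PHI2)) / 2;
    static final double C = Math.cos(PHI1) * Math.cos(PHI1) + 2 * N * Math.sin(PHI1);
    static final double RHO0 = Math.sqrt(C - 2 * N * Math.sin(PHI0)) / N;

    // Project (longitude, latitude) in degrees to x/y on the unit sphere
    static double[] project(double lonDeg, double latDeg) {
        double lam = Math.toRadians(lonDeg);
        double phi = Math.toRadians(latDeg);
        double theta = N * (lam - LAM0);
        double rho = Math.sqrt(C - 2 * N * Math.sin(phi)) / N;
        return new double[] { rho * Math.sin(theta), RHO0 - rho * Math.cos(theta) };
    }

    public static void main(String[] args) {
        double[] xy = project(-96.0, 23.0); // the projection origin maps to (0, 0)
        System.out.println(xy[0] + ", " + xy[1]);
    }
}
```

Scale and translate the resulting x/y to the output image size, and that’s the “curvy” look.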

I had originally hoped to use this piece to show patterns in street naming, but I didn’t manage to find as much as I had hoped. For instance, I expected names of local trees and flowers to be tied to the geographic regions where they’re found. However, cookie cutter suburban neighborhood developments seem to have obliterated any such connection. “Magnolia” is such a nice sounding, outdoorsy word; who wouldn’t want it adorning their street corner? Local flora be damned.

There are, however, a few other interesting tidbits in the data that I hope to cover in a future project. Real work be damned.

The quote is primarily in regards to Web 2.0 (cough), and I couldn’t agree more.

“Praising companies for providing APIs to get your own data out is like praising auto companies for not filling your airbags with gravel. I’m not saying data export isn’t important, it’s just aiming kinda low. You mean when I give you data, you’ll give it back to me? People who think this is the pinnacle of freedom aren’t really worth listening to.”

There’s nothing worse than someone keeping a journal or blog and having it go stale, so I’ve watched in horror during the forty-day Lenten fast since I last had a chance to post. Things should be better in the next few weeks.

My guide is Mark Twain who, speaking in The Innocents Abroad, lampooned blogging so accurately a short 139 years ago.

One of our favorite youths, Jack, a splendid young fellow with a head full of good sense, and a pair of legs that were a wonder to look upon in the way of length and straightness and slimness, used to report progress every morning in the most glowing and spirited way, and say:

“Oh, I’m coming along bully!” (he was a little given to slang in his happier moods.) “I wrote ten pages in my journal last night – and you know I wrote nine the night before and twelve the night before that. Why, it’s only fun!”

“What do you find to put in it, Jack?”

“Oh, everything. Latitude and longitude, noon every day; and how many miles we made last twenty-four hours; and all the domino games I beat and horse billiards; and whales and sharks and porpoises; and the text of the sermon Sundays (because that’ll tell at home, you know); and the ships we saluted and what nation they were; and which way the wind was, and whether there was a heavy sea, and what sail we carried, though we don’t ever carry any, principally, going against a head wind always – wonder what is the reason of that? – and how many lies Moult has told – Oh, every thing! I’ve got everything down. My father told me to keep that journal. Father wouldn’t take a thousand dollars for it when I get it done.”

“No, Jack; it will be worth more than a thousand dollars – when you get it done.”

“Do you? – no, but do you think it will, though?”

“Yes, it will be worth at least as much as a thousand dollars – when you get it done. May be more.”

“Well, I about half think so, myself. It ain’t no slouch of a journal.”

But it shortly became a most lamentable “slouch of a journal.” One night in Paris, after a hard day’s toil in sightseeing, I said:

“Now I’ll go and stroll around the cafes awhile, Jack, and give you a chance to write up your journal, old fellow.”

His countenance lost its fire. He said:

“Well, no, you needn’t mind. I think I won’t run that journal anymore. It is awful tedious. Do you know – I reckon I’m as much as four thousand pages behind hand. I haven’t got any France in it at all. First I thought I’d leave France out and start fresh. But that wouldn’t do, would it? The governor would say, ‘Hello, here – didn’t see anything in France?’ That cat wouldn’t fight, you know. First I thought I’d copy France out of the guide-book, like old Badger in the for’rard cabin, who’s writing a book, but there’s more than three hundred pages of it. Oh, I don’t think a journal’s any use – do you? They’re only a bother, ain’t they?”

“Yes, a journal that is incomplete isn’t of much use, but a journal properly kept is worth a thousand dollars – when you’ve got it done.”

“A thousand! – well, I should think so. I wouldn’t finish it for a million.”

Wonderful project that shows power usage mapped to a green cloud, projected into the sky and onto the output of the Salmisaari power plant in Helsinki. From their description:

Every night from the 22 to the 29 of February 2008, the vapour emissions of the Salmisaari power plant in Helsinki will be illuminated to show the current levels of electricity consumption by local residents. A laser ray will trace the cloud during the night time and turn it into a city scale neon sign. Nuage Vert is a communal event for the area of Ruoholahti, which anticipates esoteric cults centred on energy and transforms an active power plant into a space for art, a living factory. In tandem, as a reversal of conventional roles whereby the post-industrial factory is turned into space for culture, Kaapeli (the cultural factory) becomes the site of operation and Salmisaari (the industrious factory) becomes the site of spectacle.

An excellent post from Joel Spolsky about the file format specifications that were recently released by Microsoft (to comply with or avoid more anti-trust / anti-competition mess).

Last week, Microsoft published the binary file formats for Office. These formats appear to be almost completely insane. The Excel 97-2003 file format is a 349 page PDF file. But wait, that’s not all there is to it!

This is a perfect example of the complexity of parsing, and dealing with file formats (particularly binary file formats) in general. As Joel describes it:

A normal programmer would conclude that Office’s binary file formats:

are deliberately obfuscated

are the product of a demented Borg mind

were created by insanely bad programmers

and are impossible to read or create correctly.

You’d be wrong on all four counts.

Read the article for more insight about parsing and the kind of data that you’re likely to find in the wild. While you’re at it, his Back to Basics post covers similar ground with regard to proper programming skills, and also gets into the issues of file formats (binary versus XML, and how you build code that reads it).

Joel is another (technical) person whose writing I really enjoy. In the course of digging through his page a bit, I also was reminded of the Best Software Writing I compilation that he assembled, a much needed collection because of the lack of well chosen words on the topic.

A New York Times article from February about the difficulty of removing your personal information from Facebook. I believe that in the days that followed Facebook responded by making it ever-so-slightly possible to actually remove your account (though still not very easy).

Further, there is the network effect of information that’s not “just” your own. Deleting a Facebook profile does not appear to delete posts you’ve made to “the wall” of any friends, for instance. Do you own those comments? Does your friend? It’s a somewhat similar situation in other areas—even if I chose not to have a Gmail account, because I don’t like their data retention policy, all my email sent to friends with Gmail accounts is subject to those terms I’m unhappy with.

Regardless, this is an enormous issue as we put more of our data online. What does it mean to have this information public? What happens when you change your mind?

Facebook stands out because it’s a scenario of starting college (at age 17 or 18, or now even earlier), having a very different view of what’s public and private, and that view evolving over time. You may not care to have things public at the time, but one of the best things about college (or high school, for that matter) is that you move on. Having a log of your outlook and attitude (and the photos to prove it) stored on a company’s servers means that there are more permanent memories of the time which are out of your control. (And you don’t know who else besides Facebook is storing it—search engine caches, companies doing data mining, etc. all take a role here.) Your own memories might be lost to alcohol or willful forgetfulness, but digital copies don’t behave the same way.

The bottom line is an issue of ownership of one’s own personal information. At this point, we’re putting more information online—whether it’s Facebook or having all your email stored by Gmail—but we haven’t figured out what that really means.

One of the chapters that I had to cut from Visualizing Data was about scenarios—building interactive “what if” tools that help you quickly try out several possibilities. This is one of the most useful aspects of dynamic visualization—being able to try out different ideas in a quick way (and safe, as in non-destructive, since Undo is always nearby). Hopefully I’ll be able to cover this sometime soon.

At any rate, one such scenario-building tool is Slate’s Delegate Calculator, where you can drag primitive sliders back and forth and see the possibilities for delegate outcomes for Hillary and Obama.

I’ve seen complaints about its math, but it seems to do an OK job for a big-picture look at the likelihood of different outcomes. Getting the math 100% right is impossible (unless you have a far more complicated interface) because the delegate selection process is different in each state. It appears that none of the states wanted to be seen using the same approach as another, and with fifty states going their own way, things got pretty random (Texas: we’ll have a caucus and a primary).

Found this on Slashdot, but their headline, “Microsoft Developing News Sorting Based On Political Bias,” made it sound a lot more interesting than it may be. The idea of mining text data to tease out mythical media biases and leanings sounds fascinating. What sort of axes could be determined? Could we see how different kinds of language are used, or ways that particular code words or phrases infect news coverage?

Unfortunately, the research project from Microsoft looks like it’s just procuring link counts from “liberal” and “conservative” blogs, and gauging the vigor of commentary on either side. Does this make you uneasy yet?

We are politically binary: the world has devolved into conservative and liberal! (Or not, yet why do people insist on it?) The representation seems almost entirely U.S.-centric, right down to the red and blue coloring on either side. Red states! Blue states! Red blogs! Blue Blogs! A maleficent Dr. Seuss has infected our political outlook.

What about those other axes, where are they? Of all the things to cull from political discourse, liberal vs. conservative must be one of the least interesting. Did you need a team of six from Microsoft, plus all the computing power at their disposal, to tell you that one article or another ruffled more feathers on either side of this simplified spectrum?

There’s so much to be learned from propagation of phrases and ideas in the news; why hasn’t there been a more sophisticated representation of it? (Because it’s hard?) The Daily Show has shown this successfully (cutting together clips of several people repeating something like “axis of evil” or something about “momentum” for a candidate).

Blogs are not real. When you turn off the computer, they go away. The internet is not a place, and is too divorced from actual reality to be a useful gauge of most social phenomena. Using blogs as input for a kind of meta-study seems like a poor way to acquire data.

The problems I cite are a bit unfair since they haven’t posted much on their site (looks like they’re presenting a paper…soon?) so the reaction is just based on what they’ve provided. I knew Sumit Basu back at the Media Lab and I think it’s safe to assume there’s more going on…

Halfway through The Fog of War by Errol Morris (of The Thin Blue Line, or the Apple “Switch” ad campaign depending on your persuasion), Robert S. McNamara (Secretary of Defense for the Kennedy and Johnson administrations) describes proportionality in war:

Why was it necessary to drop the nuclear bomb if [General Curtis] LeMay was burning up Japan? And he went on from Tokyo to firebomb other cities. 58% of Yokohama. Yokohama is roughly the size of Cleveland. 58% of Cleveland destroyed. Tokyo is roughly the size of New York. 51% of New York destroyed. 99% of the equivalent of Chattanooga, which was Toyama. 40% of the equivalent of Los Angeles, which was Nagoya. This was all done before the dropping of the nuclear bomb, which by the way was dropped by LeMay’s command.

The gruesome description is abetted by a different kind of proportionality—that when placed in the context of size with regard to U.S. cities, these numbers become more “real.” I found this set particularly striking for how ordinary the cities were—Cleveland and Chattanooga, in addition to the usual New York and Los Angeles. The huge metropolitan areas may be too abstract for many, but Cleveland!?—those are actual people!

The entire transcript is also on Errol Morris’ site—amazing. Why don’t more studios do this? It’s great to be able to study it more closely, and was enough to convince me to purchase (rather than just rent) the movie.

Every college has a hot-ticket class. Maybe it’s the subject matter (serial killers! sailing!) or maybe it’s a celebrity professor (George Tenet! Toni Morrison!). Whatever it is, everybody wants to get in.

And, of course, not everybody can. So how do you decide who gets a seat and who’s disappointed?

If you’re Patricia de Castries, you make everybody sleep outside your door. Ms. de Castries, assistant director of the Stanford Language Center, teaches a wildly popular wine-tasting course at the university. Often more than 100 would-be connoisseurs compete for the 60 spots, so on the eve of registration students show up with pillows and sleeping bags, hoping to get their names on the list. “It’s tough,” says Ms. de Castries, “but if you want to be in the class, you do it.”

Covers the range from MIT’s technical approach to Wharton’s free market approach, where students at the latter bid on courses using a point system. Sadly, the article now seems to be blocked except for those academic-types who have access to a subscription.

I’ve always been uncomfortable with the idea of David Brin’s The Transparent Society, because it offers an over-simplified solution to a very complex problem. While it appeals to our general obsession with finding simple solutions, it fails to actually address that problem. Rather than a revolutionary or provocative idea, it’s in fact an argument for maintaining the status quo.

I’ve never quite been able to parse it out properly, but was pleased to see that Bruce Schneier (the Chuck Norris of the security industry) addressed Brin’s argument this week for Wired:

When I write and speak about privacy, I am regularly confronted with the mutual disclosure argument. Explained in books like David Brin’s The Transparent Society, the argument goes something like this: In a world of ubiquitous surveillance, you’ll know all about me, but I will also know all about you. The government will be watching us, but we’ll also be watching the government. This is different than before, but it’s not automatically worse. And because I know your secrets, you can’t use my secrets as a weapon against me.

This might not be everybody’s idea of utopia — and it certainly doesn’t address the inherent value of privacy — but this theory has a glossy appeal, and could easily be mistaken for a way out of the problem of technology’s continuing erosion of privacy. Except it doesn’t work, because it ignores the crucial dissimilarity of power.

Schneier’s most recent book is Beyond Fear (which I’ve not yet had a chance to read), and he also has an excellent monthly mailing list (that I read all the time) covering topics like privacy and security. He is a gifted writer who can explain both the subtleties of the privacy debate and the complexities of security in terms that are informative for technologists and interesting for anyone else.

Chapters 9 and 10 (acquire and parse) are secretly my favorite parts of Visualizing Data. They’re a grab bag of useful bits based on many years of working with information (previous headaches)… the sort of things that come up all the time.

Page 327 (Chapter 10) has some discussion about little-endian versus big-endian, the way in which different computer architectures (Intel vs. the rest of the world, respectively) handle multi-byte binary data. I won’t repeat the whole section here, though I have two minor errata for that page.

First, an error in formatting, which sets the phrase “network byte order” in the wrong style. The other problem is that I mention that little-endian versions of Java’s DataInputStream class can be found on the web with little more than a search for DataInputStreamLE. As it turns out, that was a big fat lie, though you can find a handful if you search for LEDataInputStream (even though that’s a goofier name).

To make it up to you, I’m posting proper DataInputStreamLE (and DataOutputStreamLE) classes, which are a minor adaptation of code from the GNU Classpath project. They work just like DataInputStream and DataOutputStream, but swap the bytes around for the Intel-minded. Have fun!

I’ve been using these for a project and they seem to be working, but let me know if you find errors. In particular, I’ve not looked closely at the UTF encoding/decoding methods to see if there’s anything endian-oriented in there. I tried to clean it up a bit, but the javadoc may also be a bit hokey.
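For the curious, the heart of the adaptation is tiny. Here’s a sketch of just the readInt() case (not the actual classes posted above, only the idea): read four bytes and assemble them in the opposite order from what DataInputStream does.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class LittleEndian {
    // Read a 4-byte little-endian int from a stream. DataInputStream.readInt()
    // assembles the same four bytes in big-endian (network) order instead.
    static int readIntLE(InputStream in) throws IOException {
        int b0 = in.read(), b1 = in.read(), b2 = in.read(), b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) throw new EOFException();
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    // Convenience wrapper for reading from an in-memory buffer
    static int readIntLE(byte[] data) {
        try {
            return readIntLE(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = { 0x78, 0x56, 0x34, 0x12 };
        System.out.println(Integer.toHexString(readIntLE(data))); // prints 12345678
    }
}
```

The shorter types (readShort, readLong, and so on) follow the same pattern with fewer or more bytes.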

The terms “big-endian” and “little-endian” come from Gulliver’s Travels by Jonathan Swift, published in England in 1726. Swift’s hero Gulliver finds himself in the midst of a war between the empire of Lilliput, where people break their eggs on the smaller end per a royal decree (Protestant England) and the empire of Blefuscu, which follows tradition and breaks their eggs on the larger end (Catholic France). Swift was satirizing Henry VIII’s 1534 decision to break with the Roman Catholic Church and create the Church of England, which threw England into centuries of both religious and political turmoil despite the fact that there was little doctrinal difference between the two religions.

The United Nations has just launched a new web site to house all their data for all you kids out there who wanna crush Hans Rosling. The availability of this sort of information has been a huge problem in the past (Hans’ talks are based on World Bank data that costs real money), and while the U.N. has been pretty good about making things available, a site whose sole purpose is to disseminate usable data is enormous.

Dominic Allemann has developed a Swiss version of the zipdecode example from chapter six of Visualizing Data. This is the whole point of the book—to actually try things out and adapt them in different ways and see what you can learn from it.

Switzerland makes an interesting example because it has far fewer postal codes than the U.S., though the dots are quite elegant on their own. With fewer data points, I’d be inclined to 1) make the individual points larger without letting them overwhelm, or 2) work with the colors to make the contrast more striking (since changing the point size is likely to be too much), and 3) get the text into mixed case (in this example, Gossau SG instead of GOSSAU SG). Something as minor as avoiding ALL CAPS helps get us away from the representation looking too much like COMPUTERS and DATABASES, and instead into something meant for regular humans. Finally, 4) with the smaller (and far more regular) data set, it’s not clear whether the zoom even helps; it might be better off without it.
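
For point 3, the fix can be as small as a helper like this hypothetical one, which assumes that two-letter tokens are canton abbreviations (SG, ZH, and so on) that should stay uppercase:

```java
public class MixedCase {
    // Convert an ALL CAPS place name like "GOSSAU SG" to "Gossau SG".
    // Assumption: tokens of two letters or fewer are canton codes
    // and are left as-is; everything else gets initial-capped.
    static String toMixedCase(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.split(" ")) {
            if (sb.length() > 0) sb.append(' ');
            if (word.length() <= 2) {
                sb.append(word);  // keep canton codes like "SG" uppercase
            } else {
                sb.append(Character.toUpperCase(word.charAt(0)))
                  .append(word.substring(1).toLowerCase());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toMixedCase("GOSSAU SG"));  // Gossau SG
    }
}
```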

I’m in the midst of rolling out a web site redesign. The former site’s (un)design was assembled just after I finished my Ph.D.; I expected it to be bad enough to force myself to make a proper site. Three and a half years passed, with even friends who weren’t designers (including my future mother-in-law) taking exception. The redesign was done by my friend Eugene Kuo, who couldn’t deal with it any longer.

I’m currently building out the design and hooking up all the pages (including a handful of projects that weren’t linked before). The navigation at the top will slowly begin to work as this process continues. For instance, the “projects” link currently points to my old site, which is missing anything I’ve done in the past four years. The big images on the home page will soon be rotating through projects, while the new projects page will provide a better visual overview of what’s inside.

I’ve not had a chance to try these out with an actual project yet, but the Google Chart API seems to be a decent way to get Tufte® compliant chart images using simple web requests. Just pack the info for the chart’s appearance and data into a specially crafted URL and you’re set.
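
For instance, a small line chart might be requested with a URL assembled like this sketch (the cht, chs, and chd parameter names are from the Chart API documentation, meaning chart type, size, and text-encoded data; the values here are made up for illustration):

```java
public class ChartUrlSketch {
    // Pack chart type, size, and data into a single Chart API request URL.
    static String buildChartUrl(int[] values) {
        StringBuilder data = new StringBuilder("t:");  // "text" data encoding
        for (int i = 0; i < values.length; i++) {
            if (i > 0) data.append(',');
            data.append(values[i]);
        }
        // cht=lc is a line chart; chs is the image size in pixels
        return "http://chart.apis.google.com/chart?cht=lc&chs=200x80&chd=" + data;
    }

    public static void main(String[] args) {
        System.out.println(buildChartUrl(new int[] { 10, 25, 18, 40, 33 }));
    }
}
```

Request that URL as an image and the chart comes back as a PNG, no JavaScript required.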

It’s a nice idea for a service, and I also appreciate that Google has kept it simple, rather than implementing it as a pile of obfuscated and strangely crafted embedded JavaScript (like, say, Google Maps or their newer search APIs after discontinuing the SOAP protocol).

A beautiful infographic from a September 2007 article about the restoration of the Guggenheim, depicting the cracks in the concrete walls. From the image:

Since the Guggenheim Museum opened in 1959, Frank Lloyd Wright’s massive spiral facade has been showing signs of cracking, mainly from seasonal temperature fluctuations that cause the concrete walls, built without expansion joints, to contract and expand.

The image is partly striking for the contrast between the NYT-style geometric graphic and pale colors mixed with the organic shape of the cracks. Wonderful.

Sent from one of my former students at CMU (you know who you are, drop me a line if it was you…I’ve lost the original message!)

Somewhere between the “most important” and “only useful” thing about the wide availability of map data, GPS systems, and the sort of mash-up type things that are all the rage is the ability to actually annotate map information in a useful way by combining these features.

An unfortunately titled article from the Technology Review describes a system being used in Iraq to help soldiers with their counterinsurgency efforts.

The new technology … is a map-centric application that … officers … can study before going on patrol and add to upon returning. By clicking on icons and lists, they can see the locations of key buildings, like mosques, schools, and hospitals, and retrieve information such as location data on past attacks, geotagged photos of houses and other buildings (taken with [GPS-equipped] cameras), and photos of suspected insurgents and neighborhood leaders. They can even listen to civilian interviews and watch videos of past maneuvers. It is just the kind of information that soldiers need to learn about Iraq and its perils.

It’s a wonder that such systems aren’t the norm, and the software described seems quite straightforward. Going a step further, I found this quote intriguing:

“It is a bit revolutionary from a military perspective when you think about it, using peer-based information to drive the next move … Normally we are used to our higher headquarters telling the patrol leader what he needs to think.”

Not so much the cliché of technology being an enabler or democratizer (that can’t be a word, can it?). Rather, there’s something interesting about how the strength of a military structure (in discipline and rote effectiveness) is derived in part from top-down control, which lies in direct contradiction to how information—of any kind, really—needs to move around the organization for it to be effective. What does it mean when an approach like this one works in such contrast to tradition?

Blake Tregre found a typo on page 55 of Visualizing Data in one of the comments:

// Set the value of m arbitrarily high, so the first value
// found will be set as the maximum.
float m = MIN_FLOAT;

That should instead read something like:

// Set the value of m to the lowest possible value,
// so that the first value found will automatically be larger.
float m = MIN_FLOAT;

This also reminds me that the Table class used in Chapter 4 makes use of Float.MAX_VALUE and -Float.MAX_VALUE, which are inherited from Java. Processing has constants named MAX_FLOAT and MIN_FLOAT that do the same thing. We added the constants because -Float.MAX_VALUE seems like especially awkward syntax when you’re just trying to get the smallest possible float. The Table class was written sometime before the constants were added to the Processing syntax, so it uses the Java approach.
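
Written out as plain Java rather than Processing, the idiom in question looks something like this sketch:

```java
public class MaxFinder {
    // Find the largest value in an array, seeding with the lowest
    // possible float so the first element is always larger.
    static float maxValue(float[] values) {
        float m = -Float.MAX_VALUE;  // MIN_FLOAT in Processing
        for (float v : values) {
            if (v > m) m = v;
        }
        return m;
    }

    public static void main(String[] args) {
        // works even when every value is negative
        System.out.println(maxValue(new float[] { -3.5f, -12f, -0.25f }));  // -0.25
    }
}
```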

There is a Float.MIN_VALUE in Java; however, the spec does a very unfortunate thing: MIN_VALUE is defined as “A constant holding the smallest positive nonzero value of type float,” which sounds promising until you realize it just means a very tiny positive number, not the minimum possible value for a float. It’s not clear why they thought this would be a more useful constant (or useful at all).

And to make things even more confusing, Integer.MAX_VALUE and Integer.MIN_VALUE behave more the way you might expect: MIN_VALUE is in fact the lowest (most negative) value for an int. Had they used the same definition as Float.MIN_VALUE, then Integer.MIN_VALUE would equal 1, which illustrates just how silly it is to define it that way for the Float class.
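
A quick sketch showing how the constants actually behave in Java:

```java
public class MinValueDemo {
    public static void main(String[] args) {
        // Float.MIN_VALUE is a tiny positive number, not the most negative float
        System.out.println(Float.MIN_VALUE > 0);    // true
        // the smallest possible float is the negation of the largest
        System.out.println(-Float.MAX_VALUE < 0);   // true
        // Integer.MIN_VALUE behaves as expected: the most negative int
        System.out.println(Integer.MIN_VALUE);      // -2147483648
    }
}
```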

I missed seeing it live, but was told about it by a baffled friend who muttered about watching CNN and that the anchors were having a little too much fun with a new touch-screen toy while they covered returns for the primaries. The Washington Post provides more details.

Standing in front of an oversize monitor, King began poking, touching and waving at the screen like an over-caffeinated traffic cop. Each movement set in motion a series of zooming maps and flying pie charts, which King was then able to position around the screen at will.

The story also references Tim Russert’s much-talked-about (and I’d-forgotten-about) whiteboard scribbling for the 2000 election, which has me wondering about which presentation was actually more informative for viewers.

Visualization works because our eyes are the highest bandwidth channel for getting information into our brains. Researchers working to restore sight have found that the second best place may be the tongue, due to its high density of nerve endings. It’s an amazing testament to the adaptability of the brain that it can begin perceiving visual/spatial information from the sensors of another organ.

Researchers at the University of Wisconsin-Madison are developing this tongue-stimulating system, which translates images detected by a camera into a pattern of electric pulses that trigger touch receptors. The scientists say that volunteers testing the prototype soon lose awareness of on-the-tongue sensations. They then perceive the stimulation as shapes and features in space. Their tongue becomes a surrogate eye.

Earlier research had used the skin as a route for images to reach the nervous system. That people can decode nerve pulses as visual information when they come from sources other than the eyes shows how adaptable, or plastic, the brain is, says Wisconsin neuroscientist and physician Paul Bach-y-Rita, one of the device’s inventors.

An example of how cartoonists embed sophisticated ideas inside their drawings, videos from the Washington Post of caricaturist John Kascht describing his process. I especially liked the idea of Obama not smiling (in spite of the positive persona the campaign has been selling), and the description of McCain’s head as a “clenched fist” couldn’t be more apt. These are impressions that will stick with me next time I see all these candidates.

On Obama: “There’s a messianic aura about him. … That air of destiny really registers all across his face and in his body language as well. He shines. Light literally bounces off the guy from everywhere … And yet for all of the surface appeal of him, I’m drawn to the unsmiling images of him, where he has his head tipped back with an almost aristocratic bearing. Seems very telling somehow. As a work in progress he’s completely fascinating to watch and to draw.”

On Hillary: “It seems to fit that Clinton’s cheeks are her most prominent features. Cheeks aren’t exactly the windows to the soul, but Hillary Clinton’s not exactly a ‘peek inside my soul’ kind of person, anyway. … Her round facial features seem to balance on top of one another, and along the same lines her head seems to balance on top of her narrow shoulders like a boulder on a pyramid. I find it really interesting that this graphic profile that she cuts—of all of these elements in precarious alignment—is such a perfect metaphor for her political balancing act.”

On McCain: “His jaw gives him away…it’s an anger barometer. During debates when he’s being challenged by an opponent, he bites down hard, and you know what he really wants to do is go to the podium next door and smack somebody. … He’s got a head like a clenched fist, and it expands with every passing year. … His small, dark eyes are watchful and wary. Whether he’s smiling or talking he bares his teeth; they’re choppers really, and they flash with metal. They look like weapons. His skin isn’t skin so much as hide.”

On Mitt: “Mitt Romney is both the easiest and the hardest of the candidates to caricature. … He seems less like an individual person than a ‘type’ of person. He’s what central casting might come up with for the game show host type or the Ward Cleaverish 50’s dad type. … Because of the heavy ridge of his brow and his deep-set eyes, it’s tough to even see his eyes, much less find a twinkle in them. But his hair sparkles. That’s what we end up making eye contact with. It’s off-putting rather than inviting.”

Wonderfully simple explanation of how to draw an eye. Karl used to be the graphics editor at Newsweek, and now teaches in the journalism school at Sparty.

I thought I’d share a short video I just made on how to draw an eye. I think it’s fun… Skip to the end if you’re in a hurry, though it’s only a couple of minutes long. Please pass it along to any budding artists! I plan to do a series of drawing instruction videos over time and this is the first.

Karl put together a fun conference last year. Conference might not be the right word (the attendees were the speakers, and the speakers the only attendees); really it was a handful of info geeks hanging out in Newport discussing each other’s work, but we certainly had a good time.

Information visualization is the process of converting abstract information, like raw numbers, into visual form. Visualization is about representing phenomena, like weather, that already have a physical manifestation. Then there’s open your damn eyes, where you just stare at the thing you’re studying. Researchers at Children’s Hospital in Boston have created a see-through zebrafish, allowing them to watch cancer grow in the fish’s body.

Visualizing Data is my 2007 book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. When first published, it was the only book for people who wanted to learn how to actually build a data visualization in code.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of the 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.