Sunday, June 21, 2009

I received this message today from what seems to be a somewhat sophisticated spammer. I receive academic spam regularly, usually from conferences in Florida that accept computer-written papers. I usually just delete them, but this one was interesting. They managed to grab my name, my thesis title, and my institution and fill them into what is certainly a form letter. Note the extra spaces before the commas and the two periods after my thesis title (and the odd use of two closing statements).

In the course of a research at the Library ofUniversity of Toronto , We came across a reference to your thesis on Head-driven probabilistic parsing for word lattices. .

As we would like to make your work available to a larger audience, I am wondering if you may be interested in publishing your thesis in the form of a printed book.

Your reply including an e-mail address to which I can send an e-mail with further information in an attachment will be greatly appreciated.

I am looking forward to hearing from you.

Sincerely yours, Kind Regards,

ToolaseeMarooodamoothoo Acquisition Editor"

A quick internet search for "LAP Lambert Academic Publishing" leads to some interesting blog posts from other people challenging this shady practice. Apparently, if you follow through with their invitation, the first thing they ask for is your banking information, ostensibly to deposit the funds from all those sales. To be fair, from their Amazon.com listings (which I won't link to; they don't deserve the hits), and from many experience reports on blogs, they do actually turn your PDF into a 'book' and send you 5 copies. So, if you want five free copies of your thesis, maybe this is a good scheme.

However, since my thesis is freely available in full on my website [pdf], I can't imagine why I would want to sell it on Amazon at a hugely inflated price. It won't be a bestseller. I won't make money. It would just be a rip-off for the couple of people (at most) who might buy it. I guess now that we have print-on-demand publishing, which is what LAP Lambert uses, anything can be a book. It ushers in a new age of vanity publishing. I can't say I have much respect for people who feel their research gains legitimacy because they put a hard cover on it and charge US$100+. In case my website ever goes offline, my thesis is also freely available forever at the Library of Canada. So, LAP Lambert, no, I don't need your services to make my thesis more widely available.

"LAP Lambert Academic Publishing" is the English-language division of VDM Verlag. At first I thought 'oh, that must be related to Springer-Verlag', a well-respected academic publishing house. It turns out that 'Verlag' simply means 'publishing house', and the two are not affiliated.

[Edit 2015 - an earlier version of this post called out the Acquisition Editor's name as being funny. This was insensitive and offensive, and I apologize. Please everyone stop talking about the names of the editors - it is irrelevant to the issue.]

Thursday, June 18, 2009

The new Thomson Reuters iPhone app from the folks at Reuters Labs is really great. It has easy and fast access to a huge collection of Reuters information from around the world. Also, you can read news offline, which is helpful for iPod touch owners like me. But, there is one irksome design issue, at least for visualization fans like me.

It has a charting tool to review the history of financial indices. At first glance, it seems really slick. InfoVis for the iPhone! Yay! However, when you actually play with it, you realize that as you pan through time, the y-axis is constantly rescaling. This does not allow for true visual comparison between different time periods. The scale should be set to fit the data across all the available time periods; then panning would not be so disconcerting. An option to zoom in, or to locally optimize the scale, could still be provided.

Check out the elastic y-axis in this video. I've noted three different y-values that all appeared at roughly the same height as I panned through the data. Also, the bottom of the y-axis is set to maximize the visual variance in the time period. This means that in the example in the video, the y-axis origin is at about 8,000. This helps a viewer understand the local variance, but exaggerates the overall impression of variance. Whether this is a good idea depends on the task. I guess most financial analysts are interested in recent history, but it would be nice to be able to zoom out to a zero origin as a sanity check on relative fluctuations.
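The fix I'm proposing is cheap to state in code. Here's a minimal sketch (the function names and the 5% padding are my own invention, not anything from the actual app): compute the scale once over the whole series, instead of over each visible window.

```python
def fixed_axis_range(series, pad=0.05):
    """One y-range covering the whole series.

    Computing min/max over all the data (rather than per visible
    window) keeps heights visually comparable as the user pans.
    """
    lo, hi = min(series), max(series)
    margin = (hi - lo) * pad
    return lo - margin, hi + margin


def elastic_axis_range(series, window, start, pad=0.05):
    """What the app does instead: rescale to fit only the visible window."""
    visible = series[start:start + window]
    lo, hi = min(visible), max(visible)
    margin = (hi - lo) * pad
    return lo - margin, hi + margin
```

With the fixed range, the axis is identical at every pan position; with the elastic one, the same data value lands at a different height depending on where you've panned, which is exactly the problem in the video.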

Thursday, June 4, 2009

Enrico Bertini writes the interesting Visuale blog, and recently posted a piece arguing that our research quest for 'Sensemaking' misses the forest for the trees: in the creation and study of analysis processes, we are not actually supporting realistic scenarios where decision support is needed in a timely manner. Specifically, he says "visualization is useless if it doesn't help people take actions". While I don't necessarily agree that all our InfoVis research is barking up the wrong tree, I see his point. Some projects, such as my own Uncertainty Lattices, are specifically designed to help people make fast decisions about data. However, it is true that in the InfoVis community, and especially in the sensemaking community, we seem to focus on process before results.

I see his point in that many of the solutions we develop as researchers are decoupled from actual use. I think Shneiderman & Plaisant addressed this somewhat in their paper on MILCs (multi-dimensional in-depth long-term case studies). The problem is indeed structural: we cannot prove real usefulness without long-term deployments, and the incentive for such deployments is low in academia (and these sorts of experiments are time-consuming). We cannot become toolbuilders for business without careful (and publishable) follow-up evaluations. So, what is the solution?

I think we could be doing great InfoVis research but also having an impact in the analytics world, especially business analytics. We need to partner more with those real world users of data... I would be elated to see some of the great ideas I see every year at InfoVis and other venues actually become real products. There is a gaping hole between the great research we do and the market.

However, I'm not sure that adding the constraints Enrico mentions would necessarily lead to better design, however much design can be improved by explicit constraints. Even a cursory look at the bulk of the currently available commercial business analytics tools shows that they would never have been acceptable to the 'academic' audience (due to poor information design and layout, and the breaking of well-known constraints of human perception). On top of that, they are almost all ugly.

I recently saw a deployed visual analytic tool using dark blue text on a purple background. It was illegible. But it was deployed and paid for. And it was working for the customer. I would argue that deployment success, and the ability to provide insight over exploration, are not indicators of quality design. This is the age-old mystery of product adoption by the market. Perhaps it is a factor of providing that immediacy Bertini mentions: decision support in a short time; the answer rather than a lengthy exploration process. The hated fuel gauges might do that better than my own VisLinks. Great, if we are going for speed and quality of decisions and not depth of insight or potential for discovery. We need to separate the two, as they can't be supported the same way. Sensemaking is not about providing a single answer. That's artificial intelligence, or maybe even 'smart graphics'.

I agree completely on Data Mining vs. Visualization... I would sum it up by saying the 'vs.' needs to become '&'. I think the strength for the future lies in closer ties between the two. We have 'data manipulations' as a step in every version of the InfoVis pipeline and in all visual analytics process diagrams, but too often the visualization is actually of some surface data, or of the outputs of data mining. I think a closer coupling of the two, bringing vis in as a 'box-opening' tool for data mining, will be important. My own thesis research has been looking at just this for statistical linguistic processes such as translation and information retrieval, and I hope to do more of it in the future.

Tuesday, June 2, 2009

There are many things about the iTunes interface that are irritating, but by far the most annoying is the fact that it does not maintain a 'live' library -- changes to your music outside iTunes are virtually impossible to propagate back to the library. One solution is to delete and re-scan completely, but then you lose playlists, play counts, etc. Another solution used to be to use the great program iTunes Library Updater, but it does not work with newer versions of iTunes.

I just followed a complicated process outlined on Paul Mayne's blog, but it only works for files that do not have a duplicate. If you delete one copy of a duplicated file, the Smart Playlist method doesn't see the file as missing, because the duplicate is still there. So, you can't clear up situations like this, where an album was accidentally duplicated, then removed:

To make matters worse, you can sort the iTunes table by any column except the indicator column containing the exclamation point. So, it seems the only solution (at least for Windows users) is to ctrl-click every second song in lists like this. That would be a long and tedious process, prone to accidentally deleting the wrong lines. I guess I'll have to do the total delete-rebuild operation.

There are several possible easy fixes to this:

Live monitoring of music folders, as in Windows Media Player

A "remove missing tracks" button

A sort on missing status, to put all (!) files in a contiguous list

With mature software like iTunes, I don't understand why this feature has not been created. A simple web search yields many complex workarounds -- obviously it's not just me wanting to do this.
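For what it's worth, iTunes of this vintage also keeps a plain-XML copy of the library ('iTunes Music Library.xml'), which at least makes the missing files easy to find outside iTunes. Here is a minimal sketch (my own code, assuming the standard plist layout with a 'Tracks' dictionary and file:// 'Location' entries); it only reports the dead entries, since the removal still has to happen inside iTunes:

```python
import os
import plistlib
from urllib.parse import urlparse
from urllib.request import url2pathname

def find_missing_tracks(library_xml_path):
    """Return (track_id, name) pairs for tracks whose file is gone from disk."""
    with open(library_xml_path, "rb") as fp:
        library = plistlib.load(fp)
    missing = []
    for track_id, track in library.get("Tracks", {}).items():
        location = track.get("Location")
        if location is None:
            # No Location at all: already a dead link.
            missing.append((track_id, track.get("Name", "?")))
            continue
        # Location is a file:// URL; convert it back to a local path.
        path = url2pathname(urlparse(location).path)
        if not os.path.exists(path):
            missing.append((track_id, track.get("Name", "?")))
    return missing
```

Run against the library XML, this would at least produce a checklist of what to hunt down or remove by hand; it can't touch the library database itself, which is the real problem.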