You might think journalism and data science don’t really go together, but on that, I differ. Below are some thoughts on the topic and lessons we can draw from data science on how to make journalism better and more effective in these times.

I read a couple of items in this month’s Fortune magazine that I thought were worth passing along.

The first was a small article by Brian Dumaine about the work being done at Applied Proteomics to identify cancer before it develops. At Applied Proteomics, they use mass spectrometry to capture and catalog 360,000 different pieces of protein found in blood plasma, and then let supercomputers crunch the data to identify anomalies associated with cancer. The company has raised $57 million in venture capital and is backed by Microsoft co-founder Paul Allen. You can read the first bit of the article here.

The second is from the Word Check callout, showing how access to information is making the world a better place:

wasa: Pronounced [wah-SUH]

(noun) Arabic slang: A display of partiality toward a favored person or group without regard for their qualifications. A system that drives much of life in the Middle East — from getting into a good school to landing a good job.

Ever wonder what your own personal network looks like? You are likely connected to many different groups (family, friends, community, work), but do you know how they are connected? Or are they connected at all? Are you the glue that connects these various groups?

This is a great age we’re living in, and I’m glad to be involved with developing lots of really advanced technologies. One of the technology areas that I’m really fascinated with has been pushed forward by Stephen Wolfram. He created the industry-standard computing environment Mathematica, which now serves as the engine behind his company’s newest creation, Wolfram|Alpha. (I’ve written a few posts on Wolfram|Alpha in the past, and you can read them here and here.)

J. Edgar entered theaters this weekend, and my wife and I had the opportunity to see it last night. Unfortunately, the movie only put the exclamation point on a disappointing evening (it was raining, and at Macaroni Grill we waited an hour and fifteen minutes for our meal after we ordered – well, you get the picture…). While we had really looked forward to seeing this film, I’d have to rate it between 2 and 2.5 stars out of 5.

The movie was directed by Clint Eastwood, who won Academy Awards for Unforgiven and Million Dollar Baby. J. Edgar Hoover was played quite well by Leonardo DiCaprio, whom the Academy has toyed with by giving him three acting nominations but not yet the award he deserves.

[Aside: DiCaprio is a fantastic actor, and I enjoy watching him on the screen. He’s received Oscar nominations for What’s Eating Gilbert Grape, The Aviator, and Blood Diamond. However, he starred in Titanic, which tied the record for the most Oscars ever, yet he didn’t receive a nomination for it?! He was great in Catch Me If You Can, Inception, Shutter Island, and particularly The Departed, but got no Oscar nods for them. Letter to the Academy: Honor the man, pronto – don’t make him wait 25+ years like you made Martin Scorsese wait… OK, stepping down from soapbox…]

Through Eastwood’s telling of the story, we find that there were three main people in Hoover’s life: his mother, Annie Hoover; Clyde Tolson, his number two in the Bureau and his life companion; and Helen Gandy, Hoover’s personal secretary. These three people really did make up the whole of Hoover’s world: they defined him, nurtured him, and protected him.

What’s clear from the picture is that J. Edgar Hoover had an incredibly logical mind (he even invented a card catalog system when he worked at the Library of Congress) and was a true innovator in the area of criminal and forensic science. His recognition of the use of fingerprint forensics to solve crimes was genius, and he certainly was constructive in fighting communism as a radical force in the United States, and in fighting organized crime elements in the big cities.

However, the part of Hoover’s legacy for which he will be most remembered is his surveillance (in some cases, illegal) of public figures such as President John F. Kennedy and Dr. Martin Luther King Jr., and the secret “personal and confidential” files that he kept on people in order to coerce high-ranking U.S. officials into giving him his way.

Hoover had elements of genius, but some real shortcomings. Certainly the times in which he lived didn’t allow him to live his life as transparently as he might have liked. Yet, other flawed parts of his character showed through quite clearly regardless of the times.

Eastwood did as good a job (I think) as he could with Hoover’s life (and the screenplay), and I personally think that DiCaprio did a great acting job in portraying Hoover as a human being, even though he ran the FBI with an iron fist during his 48-year tenure, intimidating Attorneys General and Presidents in the process. We do get the sense of the strong personal bonds Hoover had with his mother, Tolson, and Gandy, even though he didn’t always treat them well. The acting is very solid – Naomi Watts is very good as Helen Gandy, Dame Judi Dench plays Hoover’s mother amazingly well, and Armie Hammer is quite good as Hoover’s companion Clyde Tolson.

However, while I like nonlinear storylines with flashbacks to fill in the timeline, the timeline for this movie goes back and forth a bit too much for my taste – it actually made it hard to figure out where I was in the story. Plus, the story itself was somewhat slow at times, which made the movie seem longer than it really was.

Overall, I enjoyed learning about Hoover and his life (both public and private), but I probably would have enjoyed a one-hour documentary more than Eastwood’s two-and-a-half-hour drama. If you want to see good actors, J. Edgar might be good (as a rental), but if you want to know more about J. Edgar Hoover, there’s probably a good documentary out there.

In this week’s Nerd Pride Friday segment, I wanted to highlight a documentary that was released on DVD a few weeks ago called The People v. George Lucas.

For those of you who (like me) are big Star Wars fans, you’ll probably appreciate this documentary. Star Wars has been a solid part of the popular culture since the first movie was released in 1977. However, as new home movie technologies come out (VHS, widescreen, DVD, Blu-ray…), the movies themselves have changed, because they’ve been re-edited by Lucas and his team to add, delete, or change some of the content.

People LOVE these films, and to some, changing them feels like changing a bit of themselves along with them. This has led to discomfort for some and outrage from others about the modification of the films they grew up loving. The People v. George Lucas is a documentary about this very phenomenon.

Examples of some of the changes include:

Changing the Mos Eisley cantina scene where Han Solo meets Greedo. Han kills Greedo in both versions, but in the original, Han simply shoots Greedo because he is tired of the conversation and the pressure Greedo is putting on him. In the revised version, Greedo shoots at Han first, giving Han “justification” for killing Greedo. This slight revision changes the whole nature of Han’s character, who was originally a “bad guy turned good guy”. This has led to T-shirts that say “Han Shot First” as a mantra for the original films…

In the original Star Wars (which has now been renamed Star Wars Episode IV: A New Hope to align itself with the other five movies in the series), we never saw Han Solo confront Jabba the Hutt – Jabba was always this character we heard about through the dialogue. In the re-edited version, we see old footage where they do meet. Certainly, Lucas wanted this scene in the original film, but the special effects technology didn’t exist to do it well. I like the included scene, but there’s a weird part where Han walks behind Jabba and has to step on and walk over his tail – would Han really do this? It’s clunky but necessary, only because of the way the scene was filmed way back when…

Revised releases of the films edited two actors out of scenes, one of them altogether. Sebastian Shaw, the actor playing Anakin Skywalker/Darth Vader in Star Wars Episode VI: Return of the Jedi, is removed from a scene at the end of the film and replaced with Hayden Christensen, the actor who plays Anakin in the three prequel movies; at least Shaw is still in the film in other scenes. However, the original hologram version of Emperor Palpatine in Star Wars Episode V: The Empire Strikes Back, voiced by Clive Revill, was completely replaced with Ian McDiarmid, the actor who plays Palpatine in the prequels. Some like the new continuity that the revisions provide; others feel for the actors who were completely removed from the historic film series…

There are even more changes in the Blu-ray releases of the original trilogy, including Darth Vader saying “No” several times as he picks up Emperor Palpatine and tosses him into the Death Star reactor core; computer-generated rocks in front of R2-D2 while he’s hiding in the canyon, which are “magically not there” after he comes out of hiding; and Greedo shooting first – again – but this time with slightly fewer frames than in the previous release.

I, for one, like some changes (I like to see the previously deleted scenes, which add to the backdrop of the Star Wars universe), but I also feel for those who have seen the originals changed from what they remember.

I posted previously about the ongoing discussion of privacy, but I’ve found another post on GigaOM about the same topic. According to the article, the Supreme Court of the United States heard oral arguments on Tuesday in a case that could decide how connected the concept of big data is to constitutional expectations of privacy.

The case, United States v. Jones, is specifically about whether police needed a search warrant to place a GPS device on a suspect’s car and monitor his movements for 28 days. Several justices, however, seized upon a very important question: How much data is too much before allowable surveillance crosses the line into an invasion of privacy? This is a really nice post, and if you’re interested in the constitutional issues regarding privacy (for example, an appellate court has found that warrantless GPS tracking is a violation of the Fourth Amendment), I’d recommend that you take time to read the article…

These two posts do highlight interesting differences in privacy and who controls our data. We sometimes have a knee-jerk reaction to institutions that keep data on us and then use it for other purposes (whether they benefit us or not). George Orwell’s 1984 and the Big Brother metaphors with which we’re all familiar deal with government controlling the data and what it can do with it – that’s what the US v. Jones case is really all about.

However, in the private world where we interact with companies and people more directly, it’s not really a Big Brother issue, because we give up our privacy all the time – there’s no legal requirement to give up data; we do it by choice. We willingly give up our privacy in order to benefit from technology – little bit by little bit. If we want a website to provide us great recommendations (say Netflix), the company is going to have to know more about us – what we like, and what we don’t like.

It seems a bit “Big Brother”, but even people store data about us all the time – they’re called memories. Some are good and some are bad; people remember what we enjoy and what we hate. People who become our friends are the ones that become great matches for us – they enjoy our humor, they know what we like to discuss, and look out for us when we’re not around.

Companies will be trying to do that as well, but of course, it’s all about trust. Just as we trust our friends with all that they know about us, we hope to trust companies with all the data they store about us. That’s probably the biggest thing we need to wrestle with in the Age of Big Data – how to establish trust between people and the machines that will be keeping and using the data they have about us…

Here’s a neat little interview conducted by Internet Evolution’s Todd Watson of Michael Lewis and Billy Beane. Watson was attending the Information OnDemand event this past month, where one of the key themes of the event was the idea of putting business analytics into practice to help improve business outcomes. Watson felt that Beane did a great job of this in the business of baseball, and Lewis did a great job of writing about this, so he got both together for this interview.

Billy Beane is the general manager of the Oakland Athletics, and he has changed the way that Major League Baseball teams use data to build their rosters. Michael Lewis is the author of Moneyball, which documents Beane’s efforts to build a winning baseball franchise while limited by a payroll that is dwarfed by those of his competitors.

Lewis’ book was recently turned into a major motion picture featuring Brad Pitt as Beane and Jonah Hill as the statistics whiz who helps Beane turn the A’s around.

Here’s just a little bit from Watson’s interview on how Lewis got turned on to writing about Beane and the A’s:

Todd Watson: One of the key themes of the IOD event has been “turning insight into action,” and that seems to be a theme prevalent in some of your books — most notably Moneyball and The Big Short. I’m curious, in terms of baseball managers who are using sabermetrics to make more informed decisions, I’m really interested in how you got turned on to that topic and also just how that came to be and what inspired you to write the book?

Michael Lewis: It was really simple. I was living in Billy’s backyard in Berkeley so I was paying attention to the A’s. I didn’t know… I wasn’t a baseball fanatic, but I did know there was this payroll issue and I got interested in that.

I got interested in that in the first place, because at first I thought I was going to write a piece about the A’s. I think it was when Jose Canseco got this giant deal, and he was being paid something like $8 million, and the right fielder and left fielder were being paid something like $150,000, and I wanted to know if the outfielders were pissed!

And, how they felt when Jose Canseco dropped a fly ball. (Laughter) And I was going to come out and write about that, and then I started thinking about it, and I realized there were these huge discrepancies from team to team. And then I wondered, so how does the whole team feel about being poor?

I enjoyed Moneyball, both the movie and the book. I have mentioned before that Lewis is a really great author – I wrote another post about Michael Lewis’ book on the 2008 global financial meltdown called The Big Short…

Today, Audrey Watters of O’Reilly Radar posts her interview with Terence Craig, co-author of Privacy and Big Data, about the impacts of big data on personal privacy. Craig makes the claim that data transparency will eventually trump anonymity, meaning that our lives will be less private in the future as we all take advantage of the technologies that come from the new information age.

Here’s a quick Q&A between Watters and Craig on the subject of privacy:

Assuming that data can’t be anonymized and companies don’t have malicious plans for our personal data, what expectations can we have for privacy?

Terence Craig: We’ve moved back to our evolutionary default for privacy, which is essentially none. Hunter-gatherers didn’t have privacy. In small rural villages with shared huts between multi-generational families, privacy just wasn’t really available there.

The question is how do we address a society that mirrors our beginnings, but comes with one big difference? Before, anyone who knew the intimate details of our lives were people we had met physically, and they were often related to us. But now the geographical boundary has been erased by the Internet, so what does that mean? And how are we as a society going to evolve to deal with that?

With that in mind, I’ve given up on the idea of digital privacy as a goal. I think you have to if you want to reap the rewards of being a full participant in a digitized society. What’s important is for us to make sure we have transparency from the large institutions that are aggregating data. We need these institutions to understand what they’re doing with data and to share that with people so we, in aggregate, can agree whether or not this is a legitimate use of our data. We need transparency so that we — consumers, citizens — can start to control the process. Transparency is what’s important. The idea that we can keep the data hidden or private, well … that horse has left the stable.

Unfortunately, I can’t find a really good reason to post this – except the fact that I LOVE Seinfeld, and I thought these T-shirts were some of the coolest things I’ve run across in a while. Since there was a Seinfeld episode about Jerry going to see “Plan 9 from Outer Space”, I thought I’d use that “sciency” connection to brag about the cool T-shirts.