Sunday, March 31, 2013

Jake Porway, who was a data scientist at the New York Times R&D labs, has a great perspective on why multi-disciplinary teams are important to avoid bias and bring different perspectives into data analysis. He discusses a story where data gathered by Uber in Oakland suggested that prostitution arrests increased on Wednesdays, but increased arrests didn't necessarily imply increased crime. He also outlines the data analysis done by Grameen Foundation, where the analysis of Ugandan farm workers could result in the farmers being "good" or "bad" depending on which perspective you consider. This story validates one more attribute of my point of view regarding data scientists: data scientists should be design thinkers. Working in a multi-disciplinary team to let people champion their perspectives is one of the core tenets of design thinking.

One of the viewpoints of Jake that I don't agree with:

"Any data scientist worth their salary will tell you that you should start with a question, NOT the data."

In many cases you don't even know what question to ask. Sometimes an anomaly or a pattern in data tells a story, and that story informs what questions we might ask. I do see that many data scientists start with a question in mind and then pull in the data they need, but I advocate the other side, where you bring in the sources and let the data tell you a story. Referring to design, Henry Ford once said, "Every object tells a story if you know how to read it." Listen to the data (a story) without any preconceived bias and see where it leads you.

You can only ask what you know to ask, and that limits your ability to unearth groundbreaking insights. Chasing a perfect answer to a perfect question is a trap that many data scientists fall into. In reality, what the business wants is a good-enough answer to a question, or an insight that is actionable. In most cases getting to an answer that is 95% accurate requires little effort, but getting the remaining 5% requires disproportionately more time for a disproportionately low return.

Strive for precision, not accuracy. The first answer could well be of low precision. That's perfectly acceptable as long as you know what the precision is and you can continuously refine it until it is good enough. Being able to rapidly iterate and reframe the question is far more important than knowing upfront what question to ask; data analysis is a journey, not a step in the process.
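To make "know what the precision is and refine it" concrete, here is a minimal sketch in Python. It reads "precision" as the width of the margin around an estimate, which is my assumption, and it uses synthetic data. The names `estimate_with_margin` and `refine` are made up for illustration. The margin uses a normal approximation, so treat it as a rough guide rather than a rigorous interval.

```python
import math
import random
import statistics


def estimate_with_margin(sample):
    """Return (mean, margin), where margin is an approximate 95% half-width."""
    mean = statistics.fmean(sample)
    # Standard error of the mean; 1.96 is the z-value for ~95% under a
    # normal approximation (an assumption, reasonable for large samples).
    stderr = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, 1.96 * stderr


def refine(population, sizes, seed=0):
    """Estimate the mean from progressively larger random samples."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        sample = rng.sample(population, n)
        results.append((n, *estimate_with_margin(sample)))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data set: 100,000 synthetic values.
    population = [rng.gauss(50.0, 15.0) for _ in range(100_000)]
    for n, mean, margin in refine(population, [100, 1_000, 10_000]):
        print(f"n={n:>6}: {mean:.2f} +/- {margin:.2f}")
```

The first pass over 100 rows gives a usable answer with a wide margin. Each later pass costs more data for a smaller improvement, which is the diminishing return described above.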

Friday, March 15, 2013

Hopefully you really have a good answer for this. Getting hacked is no longer a distant probability; it's a harsh reality. The most recent incident was Evernote losing customer information including email addresses and passwords to a hacker. I'm an Evernote customer and I watched the drama unfold from the perspective of an end user. I have no visibility into what level of security response planning Evernote had in place but this is what I would encourage all the critical services to have:

Prevent

You are only as secure as your weakest link; do anything and everything that you can to prevent such incidents. This includes hardening your systems, educating employees on social engineering, and enforcing security policies. Broadly speaking, there are two kinds of incidents: hijacking of one or more specific accounts, and unauthorized access to a large set of data. Both can be devastating, and they need to be prevented differently. Evernote did turn on two-factor authentication, but that doesn't solve the problem of data being stolen from their systems. Google has done an outstanding job hardening their security to prevent account hijacking. Explore shared-secret options where partial data loss doesn't lead to compromised accounts.
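One possible reading of "shared-secret options" is secret splitting: store a sensitive value as several pieces on separate systems, so that stealing one piece reveals nothing. The sketch below is a simple XOR-based split in Python where all pieces are required to rebuild the value. It is an illustration under that assumption, not a description of what any of the companies mentioned here do, and the function names are hypothetical.

```python
import secrets


def split_secret(secret: bytes, shares: int = 2) -> list[bytes]:
    """Split `secret` into `shares` pieces; every piece is needed to rebuild it."""
    if shares < 2:
        raise ValueError("need at least two shares")
    # All but the last piece are uniformly random bytes.
    pieces = [secrets.token_bytes(len(secret)) for _ in range(shares - 1)]
    # The last piece is the secret XORed with every random piece.
    last = secret
    for piece in pieces:
        last = bytes(a ^ b for a, b in zip(last, piece))
    return pieces + [last]


def combine_shares(pieces: list[bytes]) -> bytes:
    """XOR all pieces together to recover the original secret."""
    result = bytes(len(pieces[0]))  # all-zero bytes of the right length
    for piece in pieces:
        result = bytes(a ^ b for a, b in zip(result, piece))
    return result
```

With this scheme, an attacker who copies one data store gets bytes that are indistinguishable from random noise. The trade-off is that losing any single piece makes the value unrecoverable, so threshold schemes may be a better fit where availability matters.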

Mitigate

If you do get hacked, is your system instrumented to respond to such an incident? This includes locking accounts down, taking critical systems offline, assessing the extent of the damage, and so on. In the case of Evernote, I found out about the breach from Twitter long before Evernote sent me an email asking me to change my password. This approach has a major flaw: if someone already had my password (hard to recover from a salted and hashed value, but still), they could have logged in, changed the password, and gained full access to my account. And this move, logging in and changing the password, wouldn't have raised any alarms on the Evernote side since that's exactly what they would expect users to do. A pretty weak approach. A slightly better way would have been to ask users to reset the password and then follow up with an email verification process before users could access the account.
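The "slightly better way" could look something like the following Python sketch. It locks the account, issues a one-time token that would be sent to the address on file, and unlocks only when that token comes back. Everything here is hypothetical: the `ResetRegistry` class, the in-memory storage, and the one-hour expiry are assumptions for illustration, and sending the email is left out.

```python
import hashlib
import hmac
import secrets
import time

TOKEN_TTL_SECONDS = 3600  # assumed expiry window for the emailed token


class ResetRegistry:
    """In-memory stand-in for a real account store (illustration only)."""

    def __init__(self):
        self._locked = set()
        self._pending = {}  # account_id -> (token_hash, expires_at)

    def lock_account(self, account_id):
        """Block logins and return a one-time token to email to the owner."""
        self._locked.add(account_id)
        token = secrets.token_urlsafe(32)
        # Store only a hash, so a later leak of this table does not leak tokens.
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._pending[account_id] = (digest, time.time() + TOKEN_TTL_SECONDS)
        return token

    def can_login(self, account_id):
        """A locked account cannot log in, even with the correct old password."""
        return account_id not in self._locked

    def confirm_reset(self, account_id, token, now=None):
        """Unlock the account only if the emailed token matches and is fresh."""
        now = time.time() if now is None else now
        entry = self._pending.get(account_id)
        if entry is None:
            return False
        digest, expires_at = entry
        supplied = hashlib.sha256(token.encode()).hexdigest()
        # compare_digest avoids leaking match position through timing.
        if now > expires_at or not hmac.compare_digest(digest, supplied):
            return False
        del self._pending[account_id]  # the token is single use
        self._locked.discard(account_id)
        return True
```

The point of the design is that knowing the stolen password is no longer enough: the attacker would also need access to the owner's mailbox before the account opens again.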

Manage

If the accounts did get hacked, and the hackers did get control over certain accounts and access to certain sensitive information, what would you do? It turns out that companies don't have a good answer, or any answer, for this. They just hope such things won't happen to them, but that is no longer realistic. There have been horror stories about people losing access to their Google accounts. Such accounts are then used for malicious activities, such as sending emails to all your contacts asking them to wire money because you were robbed in . Do you have a multi-disciplinary SWAT team (tech, support, and communication) identified for when you end up in such a situation? And, lastly, have you tested your security response? The impact of many catastrophes, natural or otherwise, such as floods, earthquakes, and terrorist attacks, can be reduced if people are prepared to anticipate and respond. Getting hacked is no different.