computer science

Cool infographic about our core technology at Hunch, what we call the “taste graph.” My favorite stat is that we have >25,000 API clients making >400,000 calls per day (many clients seem to be devs building not-yet-released apps). An upcoming infographic will focus just on API usage and growth. And yes, crazy as it seems, we have Assembly and C code, which was necessary to optimize core inner loops that are extremely computationally intensive.

A platform is a technology or product upon which many other technologies or products are built. Some platforms are controlled by a single corporation: e.g. Windows, iOS, and Facebook. Some are controlled by standards committees or groups of companies: e.g. the web (html/http), RSS, and email (smtp).

Platforms succeed when they 1) are financially sustainable, and 2) have a sufficient number of developers who are financially sustainable. Fostering a successful developer community means convincing developers (and, possibly, investors in developers) that the platform is a worthwhile investment of time and money.

Developers who create applications for platforms take on all the usual risks related to launching a new product, but in addition take on platform-specific risks, namely:

Platform decline: the platform will decline or go away entirely.

Subsumption risk: the platform will subsume the functionality of the developer’s application.

The most successful platforms try to mitigate these risks for developers (not just the appearance of these risks). One way to mitigate platform decline risk is to launch the platform after the platform’s core product is already successful, as Facebook did with its app platform and Apple did with its iOS platform. Platforms that are not yet launched or established can use other methods to reassure developers; for example, when Microsoft launched the first Xbox they very publicly announced they would invest $1B in the platform.

To mitigate subsumption risk, the platform should give developers predictability around the platform’s feature roadmap. Platforms can do this explicitly by divulging their product roadmap but more often do it implicitly by demonstrating predictable patterns of feature development. Developers and investors are willing to invest in the iOS platform because – although Apple will take 30% of the revenue – it is highly unlikely that Apple will, say, create games to compete with Angry Birds or news to compete with The New York Times. Similarly, Facebook has thus far stuck to “utility” features and not competed with game makers, dating apps, etc.

Platforms that are controlled by for-profit businesses that don’t yet have established business models have special challenges. These companies are usually in highly experimental modes and therefore probably don’t themselves know their future core features. The best they can do to mitigate developers’ risks is to 1) provide as much guidance as possible on future features, and 2) when developer subsumption is necessary, do so in a way that keeps the developer ecosystem financially healthy – for example, by acquiring the subsumed products.

The least risky platforms to develop on are successful open platforms like the web, email, and Linux. These platforms tend to change slowly and have very public development roadmaps. In the rare case where a technology is subsumed by an open platform, it is usually apparent far in advance. For example, Adobe Flash might be subsumed by the canvas element in HTML5, but Adobe had years to see HTML5 approaching and adjust its strategy accordingly. The predictability of open platforms is the main reason that vast amounts of wealth have been created on top of them and investment around them continues unabated.

I think you could make a strong argument that the most important technologies developed over the last decade are a set of systems that are sometimes called “collective knowledge systems”.

The most successful collective knowledge system is the combination of Google plus the web. Of course Google was originally intended to be just a search engine, and the web just a collection of interlinked documents. But together they provide a very efficient system for surfacing the smartest thoughts on almost any topic from almost any person.

The second most successful collective knowledge system is Wikipedia. Back in 2001, most people thought Wikipedia was a wacky project that would at best end up being a quirky “toy” encyclopedia. Instead it has become a remarkably comprehensive and accurate resource that most internet users access every day.

Other well-known and mostly successful collective knowledge systems include “answer” sites like Yahoo Answers, review sites like Yelp, and link sharing sites like Delicious. My own company Hunch is a collective knowledge system for recommendations, building on ideas originally developed by “collaborative filtering” pioneer Firefly and the recommendation systems built into Amazon and Netflix.

Dealing with information overload

It has been widely noted that the amount of information in the world and in digital form has been growing exponentially. One way to make sense of all this information is to try to structure it after it is created. This method has proven to be, at best, partially effective (for a state-of-the-art attempt at doing simple information classification, try Google Squared).

It turns out that imposing even minimal structure on information, especially as it is being created, goes a long way. This is what successful collective knowledge systems do. Google would be vastly less effective if the web didn’t have tags and links. Wikipedia is highly structured, with an extensive organizational hierarchy and set of rules and norms. Yahoo Answers has a reputation and voting system that allows good answers to bubble up. Flickr and Delicious encourage users to explicitly tag items instead of trying to infer tags later via image recognition and text classification.
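To make the “structure at creation time” point concrete, here is a minimal sketch of an explicit tag index. It is not how Flickr or Delicious actually work; the class name `TagIndex` and its methods are hypothetical, made up for illustration. The point is only that when users supply tags up front, lookup becomes a trivial set operation rather than an inference problem.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical toy index: maps user-supplied tags to item ids."""

    def __init__(self):
        # tag -> set of item ids that users labeled with that tag
        self._items_by_tag = defaultdict(set)

    @staticmethod
    def _normalize(tag):
        # Minimal normalization so "Beach" and " beach " are the same tag.
        return tag.strip().lower()

    def add(self, item_id, tags):
        """Record the tags a user attached when creating the item."""
        for tag in tags:
            self._items_by_tag[self._normalize(tag)].add(item_id)

    def find(self, *tags):
        """Return the ids of items carrying every one of the given tags."""
        if not tags:
            return set()
        matches = [
            self._items_by_tag.get(self._normalize(tag), set()) for tag in tags
        ]
        return set.intersection(*matches)


index = TagIndex()
index.add("photo1", ["Sunset", "beach"])
index.add("photo2", ["beach"])
both = index.find("sunset", "beach")  # items tagged with both terms
```

The alternative, inferring “sunset” from pixels after the fact, needs a classifier and still gets things wrong; here the structure comes for free from the person who knows the item best.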

Importance of collective knowledge systems

There are very practical, pressing needs for better collective knowledge systems. For example, noted security researcher Bruce Schneier argues that the United States’ biggest anti-terrorism intelligence challenge is to build a collective knowledge system across disconnected agencies:

What we need is an intelligence community that shares ideas and hunches and facts on their versions of Facebook, Twitter and wikis. We need the bottom-up organization that has made the Internet the greatest collection of human knowledge and ideas ever assembled.

The same could be said of every organization, large and small, formal and informal, that wants to get maximum value from the knowledge of its members.

Collective knowledge systems also have pure academic value. When Artificial Intelligence was first being seriously developed in the 1950s, experts optimistically predicted they’d create machines that were as intelligent as humans in the near future. In 1965, AI expert Herbert Simon predicted that “machines will be capable, within twenty years, of doing any work a man can do.”

While AI has had notable victories (e.g. chess), and produced an excellent set of tools that laid the groundwork for things like web search, it is nowhere close to achieving its goal of matching – let alone surpassing – human intelligence. If machines will ever be smart (and eventually try to destroy humanity?), collective knowledge systems are the best bet.

Design principles

Should the US government just try putting up a wiki or micro-messaging service and see what happens? How should such a system be structured? Should users be assigned reputations and tagged by expertise? What is the unit of a “contribution”? How much structure should those contributions be required to have? Should there be incentives to contribute? How can the system be structured to “learn” most efficiently? How do you balance requiring up front structure with ease of use?

These are the kinds of questions you might think are being researched by academic computer scientists. Unfortunately, academic computer scientists still seem to model their field after the “hard sciences” instead of what they should be modeling it after: social sciences like economics or sociology. As a result, computer scientists spend a lot of time dreaming up new programming languages, operating system architectures, and encryption schemes that, for the most part, sadly, nobody will ever use.

Meanwhile the really important questions related to information and computer science are mostly being ignored (there are notable exceptions, such as MIT’s Center for Collective Intelligence). Instead most of the work is being done informally and unsystematically by startups, research groups at large companies like Google, and a small group of multi-disciplinary academics like Clay Shirky and Duncan Watts.

Someone asked me the other day whether I thought the United States was vulnerable to a large scale “cyber” attack. While I have no doubt that any particular organization can be compromised, what comforts me at the national level is the sheer diversity of our systems. We have – unintentionally – employed a very effective defensive strategy known as “security through diversity.”

Every organization’s IT system is composed of multiple layers: credential systems, firewalls, intrusion detection systems, tripwires, databases, web servers, OS builds, encryption schemes, network topologies, etc. Due to a variety of factors (competitive markets for IT products, lack of standards, diversity of IT managers’ preferences), most institutions make independent and varied choices at each layer. This, in turn, means that each institution requires a customized attack in order to be penetrated. It is therefore virtually impossible for a single software program (virus, worm) to infiltrate a large portion of them.
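A back-of-the-envelope model shows why this works. The sketch below rests on simplifying assumptions that are mine, not measured facts: every institution picks independently and uniformly among the same number of options at each layer, and a single attack only succeeds against the one exact stack it was written for. The function names and parameters are hypothetical.

```python
import random


def fraction_vulnerable(num_layers, options_per_layer):
    """Expected share of institutions whose whole stack matches one attack.

    Assumes independent, uniform choices at every layer (a simplification).
    """
    return (1.0 / options_per_layer) ** num_layers


def simulate(num_institutions, num_layers, options_per_layer, seed=0):
    """Monte Carlo version of the same toy model."""
    rng = random.Random(seed)
    # The single configuration the attack was written for: option 0 everywhere.
    target = tuple(0 for _ in range(num_layers))
    hits = 0
    for _ in range(num_institutions):
        # Each institution independently picks one option per layer.
        stack = tuple(rng.randrange(options_per_layer) for _ in range(num_layers))
        if stack == target:
            hits += 1
    return hits / num_institutions


expected = fraction_vulnerable(num_layers=5, options_per_layer=4)
observed = simulate(100_000, num_layers=5, options_per_layer=4)
```

With 5 layers and 4 equally popular options at each, the formula gives 1/1024, or roughly 0.1% of institutions. Real markets are far less uniform than this (some products dominate their layer), so treat the number as an illustration of the compounding effect, not an estimate.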

On the web, a particular form of uniformity that can be dangerous is the centralized login system, like Facebook Connect. But this is preferable to the current dominant “single sign on system”: most regular people use the same weak password over and over for every site because it’s too hard to remember more than that (let alone multiple strong passwords). This means attackers only need to penetrate one weak link (like the recent RockYou breach), and they get passwords that likely work on many other sites (including presumably banking and other “important” sites). At least with Facebook Connect there is a well funded, technically savvy organization defending its centralized repository of passwords.
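The password reuse problem is easy to see in a toy model. The sketch below is purely illustrative: the site names, users, and the `exposed_accounts` function are all made up, and it keeps passwords in plain text only to make the comparison obvious (a real site should never store them that way).

```python
def exposed_accounts(passwords_by_site, breached_site):
    """Return {site: [users]} for accounts that reuse the leaked password.

    passwords_by_site maps site name -> {user: password}. This is a
    hypothetical toy structure, not a model of any real system.
    """
    leaked = passwords_by_site[breached_site]
    exposed = {}
    for site, accounts in passwords_by_site.items():
        if site == breached_site:
            continue
        # A user is exposed on this site if the password leaked from the
        # breached site is the same one they use here.
        reused = sorted(
            user for user, password in accounts.items()
            if leaked.get(user) == password
        )
        if reused:
            exposed[site] = reused
    return exposed


sites = {
    "weak-forum": {"alice": "hunter2", "bob": "letmein"},
    "bank": {"alice": "hunter2", "bob": "Xk9!vq2#"},
    "email": {"alice": "hunter2", "bob": "letmein"},
}
result = exposed_accounts(sites, "weak-forum")
```

In this example, breaching the weakest site hands over alice's bank and email accounts and bob's email account, even though neither of those sites was attacked directly.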

I first heard the phrase “security through diversity” from David Ackley who was working on creating operating systems that had randomly mutated instances (similar ideas have since become standard practice, e.g. stack and address space randomization). It struck me as a good idea and one that should be built into systems intentionally. But meanwhile we get many of the benefits unintentionally. The same factors that frustrate you when you try to transfer your medical records between doctors or network the devices in your house are also what help keep us safe.

I got my first computer (TRS-80 Model 1) in 1980 at the age of 8. I got my second computer – an Atari 800 – two years later. I was living in Springfield, Ohio. Very few people were interested in computers in that area then. The only people who seemed to be were engineers at the nearby Air Force base, Wright-Patterson. Every month, I used to get my parents to drive me over to meet the engineers there for Atari “user group” meetings.

Like most computer enthusiasts back then, I wanted to program video games. This of course was pre-internet and before the PC boom, so information on computer programming was scarce. At the user group meetings we would trade information as basic as what memory locations performed what functions, or new techniques people had developed (vsync interrupt, page 6 techniques – old school readers will know what I mean). After a while I became increasingly frustrated by the lack of technical information, so I decided to write a letter to Atari asking them for manuals. I got a handwritten letter back from Alan Kay, who was already quite famous at the time and was working at Atari, along with a giant box full of manuals and technical documentation. I’ve never met the man but I give him a lot of credit for my lifelong interest in computers.

I was reminded of this yesterday when I had the pleasure to meet with Om Malik. Om took time to meet with me years ago when I was struggling to get SiteAdvisor off the ground. No other popular bloggers would meet with me, but Om spent over an hour listening to me talk and giving me advice. I was introduced to Om by Ron Conway who invested in my company despite the fact that the industry experts he introduced me to as part of diligence hated my idea.

People never forget who helps them when they are struggling. It’s a cliche, perhaps, but true – and a good thing to always keep in mind. Thanks Alan, Om, and Ron.