Beware the data monopoly

21 July 2013

I’m convinced that the future of software lies in data. Data has always been important, but now we actually have cheap ways of analyzing it, and data extraction and machine learning algorithms keep improving. We’re also tethered to our digital devices, which are collecting tons of data that’s waiting to be analyzed.

I worry that it’s going to get increasingly difficult to build a software startup in the future as large companies develop data monopolies. Imagine trying to write language translation software without access to Google’s data, or trying to do audio transcription using only publicly available data. It’s going to be impossible to compete by relying on publicly available data sources while large companies build out their internal data monopolies - especially when they use their existing products to subsidize the cost of collecting this data. Data also begets more data. When companies give us great experiences, we’re willing to provide more and more information, which is then used to launch new products that have us surrendering even more data.

No matter how good an algorithm is, it still needs data to be useful, and I hope we’re not shooting ourselves in the foot by volunteering our data so easily. I’d love to see companies that collect user-contributed information be required to share it back with their users so that they can take it to other services. It’s not going to solve everything but it’s a step in the right direction.

Successful startups have always had to overcome challenges, so the data monopoly problem will just be more of the same and will hopefully lead to some new approaches. An example that comes to mind is how Duolingo is able to generate revenue by selling document translations: the documents are transformed into language lessons, which the community then completes for free. I’m excited to see new business models that are able to innovate past this data gap.