Since the sequence of the human genome was unravelled in the early noughties, a vast amount of data has been generated – so much, in fact, that innovative tools are needed to make full use of it. Enter Genestack.

Even after just five years, Genestack is a classic Cambridge firm that is flourishing as scientific research trailblazes new pathways. Developing the potential of new genomic datasets requires huge resources – if only in terms of intellectual capital. The Genestack route has been to build a bioinformatics platform that “helps you streamline big data R&D by providing a flexible data management infrastructure, visual analytics tools, pipelines and reports”.

So a bit like Linguamatics, then, I say to founder Misha Kapushesky over coffee at the firm’s UK base at the station end of Hills Road? Linguamatics mines text for speech, you mine biological data for possible genomic applications?

“Maybe,” says Misha. “Yes, it could be, but I’d have to think some more about that.”

Misha has an incredible back story which includes having lived in three science-based hubs – St Petersburg, Boston in the US, and Cambridge. Raised in Russia’s second city, he left in 1991 with his family to go to Boston. There he stayed for some years until he went to Cornell University to study Mathematics & Comparative Literature. “I was going to be a literary professor but got sidetracked,” he says wryly.

At Cornell he was introduced to a Harvard-based scientist studying the embryology of the South African clawed toad, so Misha found himself “spending weekends at Harvard getting to understand the frog’s Wnt signalling pathway, which is important for understanding cancers.

“I wasn’t a very good lab person but I got into the computing side and when we had a visitor from Oxford talking about mathematical biology I was fascinated. We struck up a conversation and became friends and I ended up going to Oxford.”

What, just like that?

“He invited me as a visitor for a year in 1998, then I went back to Cornell to do some comparative literature stuff, and I met the woman who became my wife.”

Oh, where was that?

“It was in Prague during the Millennium celebrations – she’s Russian – and I ended up working on a couple of start-ups in Boston, also for a short period with Microsoft in Seattle, then it was decision time, the dotcom era was collapsing in on itself and we decided to go and live in Europe. I read about bioinformatics in a magazine article which talked about a company called DeCode, an Icelandic company which in 2000 had a remit to decode the DNA of the Icelandic population, but my wife wasn’t a big fan of Iceland.”

Unravelled: Sequencing each individual’s DNA is an incredible mission, and it has become affordable in an astonishingly short period of time

Who were you working for at this point?

“I was sent by the EBI, the European Bioinformatics Institute, it’s the world’s top academic institute, a bit like the World Bank, created by a number of European countries.” The EBI is based in Hinxton.

And Cambridge?

“I got a job as a scientific application programmer in Cambridge and came here for a contract which began on December 23, 2000. And stuck around.”

By way of completing the circle Genestack has a team of developers in St Petersburg. Misha had “almost completely severed ties but kept the language and a strong cultural connection”. The link was renewed for reasons that were “serendipitous – our investor has a big office in Russia and one of the key people who helped Genestack start up, Nicholai, is also Russian, so they brought together the first couple of programmes and it helped us and it’s evolved from there”.

Of the division of labour Misha says that “generally but not 100 per cent of the time St Petersburg is more the core software engineering and the bioinformatics, and the client-facing work is done in Cambridge”.

What has been achieved in terms of trans-national business models is in itself noteworthy. “St Petersburg is a very vibrant city, technology-wise, and there’s a tremendous interest in bioinformatics there. I’m proud to say I’ve made some contribution to this interest. In 2010 I was invited to give a series of talks on bioinformatics and organised the Bioinformatics Institute, which has led to the establishment of several research groups, some start-ups and even a bioinformatics hackathon.”

Any immediate competitors?

“Not so much in the UK though our biggest one has a presence in the UK. It’s a global market – having competition is good. And the market is quite big. Just for the informatics investment in pharma – not counting agri-genomics, health or consumer pharma – annual investment is of the order of £8 billion. And that’s growing at between 15 and 20 per cent globally.”

How’s that split up?

“If you look at the way it’s split then a good chunk of that is in the US. If you look at the rest of the world then the UK and Europe together is almost the same size, and the UK is on a par with Europe, so the UK is a big powerhouse of pharma R&D.

“As a proportion of genomics and sequencing the production costs are going down – it’s getting cheaper – and data management is going up. The progression of this technology goes in leaps and bounds. The last leap was six or seven years ago which was NGS – next-generation sequencing. That allowed David Cameron to announce the 100,000 Genomes Project.” This project meant 100,000 UK patients with cancer and rare diseases had their entire genome decoded, leading to targeted therapies which could make chemotherapy “a thing of the past”.

The Human Genome Project, which started in 1990, sequenced the first full human genome at a cost of $1 billion. Today, as Misha points out, “all across the world people are generating genetic data at tremendous scale”.

Illumina: $100 genome now within reach

“Then, earlier this year, Illumina announced the $100 genome, so it’s becoming commoditised, but finding the right data and the right tools to work with this data is where the challenge is. Everybody is struggling with that.”

Except Genestack, which sits at the intersection of biology and computer science and is riding the wave with considerable aplomb. The mission is to find similarities and differences between different genomes, to see important links and draw correct conclusions, and for this bioinformatics is crucial. From inception five years ago Genestack has expanded its services from the pharma sector to agri-genomics and consumer/retail.

Agri-genomics involves analysing how plant genetics work – “plants have bigger genomes than humans” – to optimise crop yields, and retail is another sector altogether.

One of Genestack’s clients is Unilever. Let’s assume they are working on how to perfect a new perfume. One group produces the data for how to make it, another analyses this data, a third offers an interpretation, another makes recommendations, and a fifth has to make management decisions. But it could be food – to work out how one type of produce “reacts with different compounds”.

“How do you integrate all that, especially with health and food safety issues and regulations in mind?” asks Misha rhetorically.

Genestack started out as “an idea to build an infrastructure that I could use at the European Bioinformatics Institute” and now it’s morphed into this huge project. Surely with the speed of this sector’s progress it’s not always possible to see where it’s heading?

“Overall our mission is to develop a system which helps us develop data tools and knowledge which can serve as the easiest and simplest foundation at any scale of bioinformatics. To do that, working with clients helps us as it drives our system into that but it also stimulates us to invest internally – it’s great fun, and also a beefy challenge.”