Although Summer is starting to ebb into Autumn in the northern hemisphere, it’s just getting going south of the equator, so there is still time to profile another start-up in our ongoing series!

Introducing Mendeley

Today I’m very happy to introduce you to Mendeley, a London-based startup that harnesses cloud computing to help the academic community manage existing libraries of research, discover new research and collaborate with researchers around the world. They are simultaneously building the world’s largest crowd-sourced database of research, covering all disciplines from Arts to Zoology. Mendeley’s software also anonymously aggregates all usage data in the cloud, tracking which articles are being read, by whom, when and how often.

Like a lot of great ideas, Mendeley began with its founders setting out to solve their own problem: they came up with the concept while studying for higher degrees in business, psychology and machine learning. The team includes many people with backgrounds in software development, academia and publishing.

I spoke to Dan Harvey, a Data Mining Engineer at Mendeley about how they came to use AWS:

“We started out buying our own hardware 3–4 years ago. Initially our main reasons for using AWS were due to being able to scale up far more quickly and cheaply than we could ourselves for document storage. Over time this is still true with regard to cost and scaling, but the elastic properties of EC2 mean we only have to pay for resources when we are using them. More recently we’re finding that AWS gives our developers more flexibility to have the resources they need to test out new code and ideas, rather than stepping on one another’s toes on shared servers.”

Mendeley are using a wide collection of AWS services to power their fast-growing business, which now manages over 100 million papers.

“We wanted to produce previews of these documents for use on our article pages on the web. This was done using a combination of Elastic Beanstalk to host a Java app to render PDFs into raw images, S3 to store the data, CloudFront to serve the images to end users, and SQS to glue this all together”, said Dan.
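The glue between those services is the queue: the web tier enqueues a render request, a worker renders the page and uploads the image to S3, and CloudFront serves it. A minimal sketch of that message flow is below — the message schema, bucket name and CloudFront domain are invented for illustration, not Mendeley’s actual setup.

```python
import json

# Hypothetical names -- illustrative only, not Mendeley's actual configuration.
PREVIEW_BUCKET = "example-previews"
CDN_DOMAIN = "d1234example.cloudfront.net"

def preview_key(document_id: str, page: int) -> str:
    """Derive the S3 key under which a rendered page image would be stored."""
    return f"previews/{document_id}/page-{page:03d}.png"

def handle_message(body: str) -> str:
    """Parse a render-request message (as it might arrive from SQS) and
    return the CloudFront URL the web tier would embed once the page
    image has been rendered and uploaded to S3."""
    msg = json.loads(body)
    key = preview_key(msg["document_id"], msg["page"])
    # In the real pipeline, a worker (the Java app on Elastic Beanstalk,
    # in Mendeley's case) would fetch the PDF here, render the page to an
    # image, and upload it to s3://{PREVIEW_BUCKET}/{key}.
    return f"https://{CDN_DOMAIN}/{key}"
```

Because SQS decouples the web tier from the render workers, either side can be scaled independently — the queue simply absorbs bursts of render requests.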

Data driven

With such a rich collection of documents and data, Mendeley also provides tailored recommendations to its users, making use of Elastic MapReduce, and Mahout. Dan Harvey continues:

“Our latest use of AWS is with the Apache Mahout project. This is distributed collaborative filtering on top of the Hadoop framework; we use it to provide tailored recommendations for our users. We have our own Hadoop cluster internally but chose EMR for this because Mahout requires a different task granularity to our existing workload; we can optimise Hadoop on EMR for the specific recommendation task. It also allows us to have a simple way of calculating the daily cost of recommendations, based on the on-demand EC2 instances EMR uses with each run; with a multi-use Hadoop cluster it is very hard to allocate costs between the different tasks that run on the shared infrastructure. Finally, when we’re done running recommendations, we can shut the cluster down and it costs us nothing.”
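At its core, the item-based collaborative filtering that Mahout distributes across Hadoop is co-occurrence counting: papers that often appear together in users’ libraries are likely to interest the same readers. Here is a toy, single-machine sketch of that idea — a simplification for illustration, not Mendeley’s pipeline or Mahout’s code.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(libraries):
    """Count how often each pair of papers appears together in a library.

    `libraries` is an iterable of sets of paper IDs, one set per user.
    """
    counts = defaultdict(int)
    for papers in libraries:
        for a, b in combinations(sorted(papers), 2):
            counts[(a, b)] += 1
    return counts

def recommend(libraries, user_papers, top_n=3):
    """Score papers the user doesn't have by how often they co-occur
    with papers the user does have, and return the top_n IDs."""
    counts = cooccurrence(libraries)
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in user_papers and b not in user_papers:
            scores[b] += n
        elif b in user_papers and a not in user_papers:
            scores[a] += n
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

At Mendeley’s scale the co-occurrence matrix is far too large for one machine, which is why the same computation is expressed as Mahout map-reduce jobs — and why a transient, right-sized EMR cluster that can be shut down after each run is a good fit.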

Introduction to AWS

Dan will join us to talk about Mendeley’s use of AWS in more detail at our upcoming Introduction to AWS event in London, where newcomers to the cloud can learn about how to build scalable, elastic applications on AWS. Attendance is free, but you’ll need to register.

More information

Mendeley have their own API, with which developers can build applications… for science! The Mendeley Binary Battle, an API competition judged by Amazon CTO Werner Vogels and others, runs until the end of September.

If you’re a start-up running on AWS, don’t forget that there is still time to enter this year’s AWS Start-up Challenge, a worldwide competition with prizes at all levels including $100,000 in cash and AWS credits for the grand prize winner. Learn more, and enter today.