Amr Awadallah is the chief technology officer and co-founder of Hadoop distributor Cloudera.
Previously, he was the vice-president of product engineering at Yahoo. On a recent visit to London,
he briefed Computer Weekly on how the supplier is evangelising the concept of an ‘enterprise
data hub’ in counterpoint to the established enterprise data warehouse.

There are two audiences for big data: those who are interested in the concept and those who
are doing it. How do you approach these?

There are people at all stages: some not sure what to do and dabbling, and customers who are all
in. It’s normal with any new technology to have an adoption cycle.

Is there less of a need to evangelise with big data technologies? Aren’t the problems more
obvious than usual?

No, we are still at the beginning. There are some use cases that are about operational
efficiency, just saving money. People do get these right away. But to sell the full vision of what
we are calling an ‘enterprise data hub’ – that does require more evangelising, though customers
have been receptive.

Cloudera’s mission is to enable customers to use all their data to ask bigger questions. ‘All’
is a key word. It’s not big data, but all data. It’s having a holistic view of your customers.

The example I like to give of ‘all data’ is the ATM machine. Ten years ago the only thing
recorded was the explicit transaction. Today, we can collect implicit information, such as your
face, how you interact with the touchscreen, whether you have a mobile device with the bank’s app,
and the information around scanning pictures of cheques. This all makes fraud detection better.

‘Asking bigger questions’ is important. Traditional software has been focused on using SQL to
ask questions. Now, SQL is powerful, but there are a lot of questions you can’t ask with it. You
can’t do image processing or voice recognition in SQL. You cannot scan PDFs using it.

The ultimate use case for us is a ‘customer 360’, having a 360-degree view of the customer. That
solves the data silos problem, data from different channels. Our platform allows the breaking down
of those silos.

Cloudera is a Hadoop distributor. What makes this positioning a development?

It’s not a departure from what we have been doing. But it’s a better language for business.
Eighty per cent of Hadoop distributions are ours. But we have technologies alongside Hadoop. Also,
Hadoop itself is morphing, as with YARN opening it up. Five years ago, all you could do with Hadoop
was a MapReduce operation. YARN allows other applications to run on top of the data, such as
interactive SQL, which [Cloudera’s] Impala allows you to do.

We also now have a natively integrated search function. We have integration with SAS, and Splunk
– with Hunk running natively on Hadoop. Also, Informatica’s ETL engine runs natively inside
the Cloudera platform.

The analogy we like to use is that we are the smartphone of data, as opposed to the SLR digital
camera. Enterprise data warehouses are the SLR digital cameras of the data world. They are
expensive and they only do one thing – in the case of the data warehouse, run queries over
structured data. The ‘enterprise data hub’ is like a smartphone. The smartphone is convenient and
applications can all share the data. It is the same with us. The model is that the applications
come to your data instead of your farming out your data to silos, which prevents a 360 view.

Our approach is more economical than traditional enterprise data warehousing. The cost for a
terabyte of data with us is around $1,000. You can pay $100,000 per terabyte to store data that you
don’t use in traditional data warehouses, let’s say data you haven’t looked at for six months. We
offer an active archiving system for that.
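
As a rough illustration of the arithmetic behind that claim, the sketch below compares the two per-terabyte figures quoted above for a block of rarely used data. The 50 TB volume and the function name are assumptions made for the example, not figures from Cloudera.

```python
# Per-terabyte costs as quoted in the interview (US dollars).
WAREHOUSE_COST_PER_TB = 100_000
HUB_COST_PER_TB = 1_000


def archive_saving(cold_tb, warehouse_cost=WAREHOUSE_COST_PER_TB, hub_cost=HUB_COST_PER_TB):
    """Return the saving from moving `cold_tb` terabytes of unused data
    out of a warehouse and into a cheaper archive tier."""
    return cold_tb * (warehouse_cost - hub_cost)


# Hypothetical volume: 50 TB of data nobody has queried for six months.
saving = archive_saving(50)
print(f"Estimated saving: ${saving:,}")
```

On these quoted figures the gap is two orders of magnitude per terabyte, so the saving scales almost one-for-one with the warehouse cost of whatever is archived.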

We do work with Teradata on the integration front. And we have partnerships with Oracle, with
its Big Data Appliance, and with HP on the Vertica system. There will be use cases for which an SLR
camera is the right device.

A phrase you often hear attributed to big data projects in large companies and organisations
is ‘science project’. Are they getting beyond that to enterprise deployments?

First, 60% of the Fortune 500 are using Cloudera, in production, not in science projects. Three
of the top four credit card companies in the world are using us for fraud detection. Now, these
production use cases don’t necessarily add up to the 360-degree view. About 20 of our 300 paying
customers are doing that, though none in the UK or Europe as yet.

Europe is where the US was two years ago. In the US, there is the Federal government (by which I
do not just mean Intelligence) and there is Monsanto. Monsanto use the platform to collect
experimental data from sensors in fields. They measure temperature, soil composition, humidity, the
rate of growth of the plants. They are looking to come up with more efficient seeds for different
environments across the world. They reckon that over the next ten years humans will consume more
than over the last 100 years. I would never have envisaged the Monsanto use case for our technology
when we started out five years ago.

Sectorally, what is the customer base like?

The top sectors for us are retail, web companies (including eBay), telecoms (both infrastructure
and the mobile device manufacturers Nokia, Motorola Mobility and RIM), oil and gas, genomics,
smart energy, and automotive and construction equipment makers.

It is a large organisation thing. This is not a small business technology, save for the web
startups, such as Box.net, King.com, and so on. Anywhere where there is a data explosion.

How would you sum up the business value of what you are trying to do?

We are trying to provide agility, to lower the cost of curiosity. There is a high cost of
curiosity for organisations today. For example, at Yahoo I ran the IT infrastructure. The business
would come to me wanting, say, a new column for a data model. That’s hard work with an enterprise
data warehouse. It takes weeks, months.

So, I’d say: “how much value will this create for the business?” And they would say: “we can’t
tell you the value till you add the column”. That prevents the business from innovating. You need a
system that is much more flexible, so you can add new columns and data types quickly. Hadoop gives
you that. You can experiment more easily. The SLR camera will not go away, but it will be kept for
the right use cases.
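
The flexibility described here is often called schema-on-read: raw records are stored as they arrive, and the columns are chosen when the data is read rather than when it is loaded. The following is a minimal sketch of that idea in plain Python, not Hadoop or Cloudera code. The event log, the field names and the `read_events` helper are all hypothetical.

```python
import json

# Hypothetical raw event log: one JSON object per line, stored as-is.
# The third record carries a new "device" field that older records lack.
RAW_EVENTS = """\
{"user": "a1", "amount": 20}
{"user": "b2", "amount": 35}
{"user": "c3", "amount": 50, "device": "mobile"}
"""


def read_events(raw, columns):
    """Apply a schema at read time: keep the requested columns,
    using None where a record does not have the field."""
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rows.append({col: record.get(col) for col in columns})
    return rows


# The original view and the view with the extra column read the same stored data.
original_view = read_events(RAW_EVENTS, ["user", "amount"])
extended_view = read_events(RAW_EVENTS, ["user", "amount", "device"])
print(extended_view)
```

Asking for the new column is a change to the read call only; nothing already stored has to be rewritten, which is the contrast being drawn with altering a warehouse table.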
