Done well, big data offers businesses the chance to gain a competitive edge by understanding their customers and staying ahead of market trends.

But managing and storing huge volumes of data requires careful planning. Data security, meeting the requirements of regulators and ensuring critical data is properly backed up are major challenges for the CIO.


But big data doesn’t necessarily mean big infrastructure, a meeting of IT leaders at Computer Weekly’s 500 Club heard. Like space and time, big data is a relative concept, and it does not always mean analysing petabytes of information.

Big data is any data that is too big, moves too fast or doesn’t fit the constraints of your existing databases, says Robert White, executive director for the infrastructure group at investment bank Morgan Stanley (see panel below).

“You only need to move into this paradigm when you are exceeding what you can do with the technology that you have,” he told the meeting. “What is big data to me, may not be big data to you.”

Big data may have become a big issue for IT suppliers over the last couple of years. But the truth is that IT departments have been processing large volumes of fast-moving data for far longer. The finance industry got to grips with big data 15 years ago, and the principles learned then apply just as well today.

“Ten or 15 years ago, we were working with time series databases,” says White. “It wasn’t called big data, but you look at what it was doing and it was a kind of fire hydrant of non-stop market data, and we captured it.”

The best place to start is to focus on asking the right questions. If you know what you want to find out from your data, then many of the questions over which choice of infrastructure to use begin to become clearer.

Real-time processing is not essential

Often you can get the answers you need without having to process data in real time.

A definition of big data

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

Source: O’Reilly Strata

Businesses can save themselves a lot of money and effort by aggregating data, and analysing it in a more leisurely way.

Most organisations don’t need to respond to every data tick, every like on Facebook, or every hashtag on Twitter, says White.

“Don’t get sucked in, thinking I must pull everything off the internet that has anything to do with my organisation, and what we need now is a big data solution. That is going to be a very expensive way forward,” he said.
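White's point about aggregating rather than reacting to every tick can be sketched in a few lines. The Python sketch below, with entirely hypothetical event data and function names, batches social media mentions and summarises them per day instead of processing each one in real time:

```python
from collections import Counter
from datetime import datetime, timezone

def aggregate_mentions(events):
    """Summarise a batch of social media events per day, rather
    than reacting to each one as it arrives."""
    daily = Counter()
    for event in events:
        day = event["timestamp"].date().isoformat()
        daily[day] += 1
    return dict(daily)

# Hypothetical batch collected over two days, not streamed live
events = [
    {"timestamp": datetime(2013, 5, 1, 9, 30, tzinfo=timezone.utc)},
    {"timestamp": datetime(2013, 5, 1, 14, 5, tzinfo=timezone.utc)},
    {"timestamp": datetime(2013, 5, 2, 11, 0, tzinfo=timezone.utc)},
]
summary = aggregate_mentions(events)  # {'2013-05-01': 2, '2013-05-02': 1}
```

A daily summary like this is far cheaper to produce and store than a system that responds to every individual event.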

How to manage unstructured data

By its very nature, big data is often unstructured and does not fit neatly into the relational databases used by most organisations. Videos, social media posts and Twitter comments are not easy to manage.


There are specialist database technologies that can analyse unstructured data. IBM, for example, offers a database called Optim, which is capable of analysing unstructured data from a wide range of sources. The database creates a dictionary, which is able to pull together data on the same subject from different data streams.

But for many organisations, it may well be easier and more cost-effective to convert unstructured data into a format that will work with their existing systems.

“One of the decisions you have to make in your organisation is do you put the investment into dealing with that unstructured data or do you invest in a conversion process,” says White.

Rather than taking raw feeds from Twitter and Facebook, it might make more sense to process the data, add some structure to it and use your existing infrastructure to process it.
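As an illustration of that conversion step, the hypothetical Python sketch below flattens a raw, nested social media record into a flat row that an existing relational schema could store. The field names are assumptions for illustration, not any real feed's API:

```python
def to_row(raw):
    """Flatten a raw, nested social media record into the columns
    of an existing relational schema (field names are assumed)."""
    return {
        "author": raw.get("user", {}).get("screen_name", ""),
        "text": raw.get("text", ""),
        "created_at": raw.get("created_at", ""),
    }

# Hypothetical raw record, as a feed might deliver it
raw = {
    "user": {"screen_name": "example"},
    "text": "Great service!",
    "created_at": "2013-05-01",
}
row = to_row(raw)  # now ready for a conventional relational table
```

Once the data is in rows and columns, all the organisation's existing tooling applies to it.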

Big data suppliers

Most big IT suppliers offer good big data technology and it often makes sense to stick with the suppliers you already have an established relationship with.

SAP: owner of Sybase and creator of the Hana in-memory database, effectively a relational database with a cache layer built on top.

Source: Robert White, Morgan Stanley

“If your organisation is very good at dealing with relational data, and has all the tools for that, maybe you should be looking to convert the data instead into a format that you are used to dealing with and can extract value from,” says White.

If analysing unstructured data is essential, there are specialist tools out there that will help you. But if your focus is analysing customers’ comments on social media, it may make more sense to hire an agency to do the work for you.

Companies such as Amazon use a mixture of computer algorithms and human analysis to interpret meaning from social media. For example, it still takes a human to work out whether an exclamation mark in a product review indicates sarcasm or a genuine compliment. So outsourcing this work can be an effective solution.

Another option is to use a third party to clean and aggregate your data before you analyse it, says White. “Go to a supplier who has good credence in your world, and a reputation for cleaning data, and who has some understanding of it,” he said.

Managing historic data

Once you know what data you want to analyse, it is worth considering how long you will need to keep it.

Regulators require Morgan Stanley, as a financial services firm, to record data for 7 to 10 years.

The problem is, says White, that you may back up the data from version 7 of a database. But by the time you need to restore it, the supplier has moved on to version 12, which is incompatible with the original data.

There are two ways to deal with the issue. One common approach is to migrate back-ups to the latest version of the database whenever the supplier upgrades.


“So that means you have to actually consciously migrate back-ups,” he says. “It's quite a lot of hassle to deal with.”

Morgan Stanley’s approach is to save data in a generic format, almost a text file, that can be adapted to any future version of the database.
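That archiving idea – serialising rows to a plain, version-independent text format – might look something like this minimal Python sketch. CSV is chosen purely for illustration; Morgan Stanley's actual format is not described in detail:

```python
import csv
import io

def export_generic(rows, fieldnames):
    """Serialise database rows to plain CSV text - a generic,
    human-readable format that does not depend on any particular
    database version."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()   # column names survive alongside the data
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical rows exported for long-term retention
rows = [{"id": "1", "price": "10.5"}]
text = export_generic(rows, ["id", "price"])
```

Because the output is plain text with the column names embedded, it can be re-imported into whatever database version is current when the data is finally needed.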

Ensuring data does not deteriorate over time is another potential headache.

In practice, uploading historic data into the database will usually give you confidence that data has not been corrupted, says White.

But if you want further assurance, it is possible to run test processes and introduce checksums – mathematical functions that let you verify data has not been altered – so you can be sure.
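A checksum scheme of the kind White mentions can be sketched with Python's standard hashlib: record a digest when the data is archived, then recompute it at restore time to confirm the bytes are unchanged. The sample data is hypothetical:

```python
import hashlib

def checksum(data: bytes) -> str:
    """Hex digest recorded alongside the archived data at backup time."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical archived record
archived = b"trade records 2005"
recorded = checksum(archived)  # written to the backup catalogue

# Years later, at restore time: recompute and compare
assert checksum(archived) == recorded
```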

Most organisations don’t go quite that far. “I think it's quite a neglected area,” says White. “You always have 101 things to do on the list, and that is probably going to be 102.”

Do you need a data scientist?

Data scientists are in short supply and they are commanding high salaries.

But most organisations will find existing employees more than equal to the work required, says White.

“It's true that, in some professional fields, you are going to have to employ data scientists, but I don’t think that is going to be the case for most,” he says.

Chris in marketing and Joe in accounts will know the right questions to ask, if you give them the tools to look for the answers, he says.

These days, compression technology means it is possible for non-specialists to analyse huge databases on standard office spreadsheets such as Excel. Files of up to 8Gbytes are not uncommon.

“Business users love things like that because it’s a product they are familiar with and they use every day,” says White. “Suddenly it gives them an ability to process the big data volumes we are providing on an infrastructure level, in a tool they can use.”

Locking down the organisation

One of the challenges for any organisation is how to separate personal data from corporate data on the IT infrastructure.

Financial services company Morgan Stanley has sidestepped the issue by banning all personal data from the organisation.

The move is essential for a regulated company that has to guard against market-sensitive data leaking from the company’s trading floors.

“We have different devices for work and personal use so people are not allowed to use their own BlackBerry to log into the firm's systems,” says Robert White, executive director of Morgan Stanley’s infrastructure group.

However, as employees become more used to using their own mobile phones and computers at work, regulated companies will need to remain vigilant.

There has been some talk, for instance, about banning personal mobile phones on trading floors, to ensure market-sensitive data is not passed to third parties.

And Morgan Stanley locks down all PCs, so employees cannot use external social networking sites.

“Definitely regulators are worried about social media, almost as a way of doing insider trading,” says White.

Regulators have the capability to monitor company IP addresses for potential breaches, White revealed.

“Fortunately we can hide behind the regulators a little bit. We can say that even if we wanted to permit personal devices or social media, the regulators would just not be happy about that,” he says.

The challenge with multiple spreadsheets will be to ensure the company has a “single version of the truth”. Companies will need to get a lot smarter about centralising their spreadsheets.

Choice of supplier

Choosing which supplier to go with is always difficult. The good news is that most of the major suppliers have got to grips with big data. So it makes sense to work with the suppliers you have already built relationships with, says White.

Companies such as Oracle, IBM and Microsoft have been around a long time, and they are aware of the pitfalls of data analysis.

“Don’t necessarily think it’s a new world, and jump to a different provider,” he says. “If you are used to dealing with these people, leverage those relationships.”

These companies have strong developer communities and they are thought leaders in their fields. Their big data offerings may not be there yet, but it is only a matter of time before non-relational databases reach the level of maturity that relational databases enjoy today.

Keep calm and carry on

Fundamentally, however, big data is no different to any other IT problem. IT has already gone through batch data processing and real-time data processing.

Now it is big data processing, but while the technology is changing, the same principles of common sense and good business practice apply.

“Don’t let the hype suck you in,” says White. “Keep your head up. Don’t panic, and just apply the same logic that you would apply to everything else that you do.”

Enterprise storage strategy checklist

Sean Sadler, head of infrastructure at King’s College, offers a checklist.

Optimise your network

Consider the speed and frequency with which you need to access data.

If network performance is critical, you will need tiered storage: fast, low-capacity disks for high-performance applications and slower, high-capacity disks for less critical ones.

Consolidating your systems will improve the performance of your network, particularly if your systems are distributed.

If you have high-performance computing requirements, make sure you have the disk capacity necessary to support them.
