When To Use Big Data

With all the current excitement around big data, many are eager to get started with a big data project. But not all projects are big data projects.
Remember, just having large amounts of data does not necessarily require big data technologies. Thus, project managers must be capable of advising
management not only about how, but also when, the business can and should take advantage of big data technologies.

Do You Really Need a Big Data Solution?

Initial Considerations

Before beginning a big data project, companies must ask the following basic questions:

“The first, most obvious question is ‘Why do this at all?’ There should be a compelling use case, a competitive driver, cost driver, or some other
issue that has been identified where the application of big data technologies is in the critical path to solving the problem. Typical drivers
include the information type (for example, under-utilized structured information sources), or the volume of information (retention of IP logs),
but in any case, you need to identify exactly why you are pursuing this path.”

“One of the most important things you should look for is a compelling
ROI (return on investment). That is to say, find something for which you can put a value on the cost of the problem before you plan
a solution” (see Selecting Your First Big Data Project).

Are you using the basic data you already have in a way that engages consumers?

“The biggest reason that investments in big data fail to pay off, though, is that most companies don’t do a good job with the
information they already have. They don’t know how to manage it, analyze it in ways that enhance their understanding, and then make
changes in response to new insights. Companies don’t magically develop those competencies just because they’ve invested in high-end analytics tools.
They first need to learn how to use the data already embedded in their core operating systems, much the way people must master arithmetic
before they tackle algebra. Until a company learns how to use data and analysis to support its operating decisions, it will not be in a
position to benefit from big data” (see You May Not Need Big Data After All).

Additional Considerations to Help Determine if a Big Data Solution Is Required

Once you have answered the above questions, there are additional considerations. Remember, just because your data is too
big for Excel does not necessarily mean your data is “big data”.

A Test to Determine If You Need a Big Data Solution

The following offers a 4-point test based on volume, velocity, variety, and variability of data to help determine if you
really need a big data solution:

Points to Consider When Determining If You Need a Big Data Solution

The following excerpt from SyonCloud provides several
points for consideration to help determine if a Big Data solution is required.

If your relational databases do not scale to your traffic needs at an acceptable cost of hardware and/or licenses.

If the normalized schema of your relational database has become too complex. If too many tables hold just a tiny proportion of the overall data.
If you can no longer print the ERD on a single A3 page.

If your business applications generate lots of supporting and temporary data that does not really belong in the main data store.
Such data includes customers' search results, visited pages, historical share prices, contents of abandoned shopping carts, and so on.

Your database schema is already denormalized in order to improve response times of your applications.

When joins in relational databases slow the system down to a crawl.

Relational data doesn’t map well to typical programming structures that often consist of complex data types or hierarchical data.

Data such as XML is especially difficult because of its hierarchical nature. Complex objects that contain objects and lists inside of them
do not always map directly to a single row in a single table.
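
The mismatch described above can be illustrated with a minimal Python sketch. The order object, its field names, and the three target tables are hypothetical and chosen only for illustration; they are not taken from the excerpt. One nested object has to be split across several relational tables, while a document store could keep it as a single record.

```python
import json

# Hypothetical order object with a nested customer and a list of line items.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_relational_rows(doc):
    """Split one nested document into rows for three hypothetical tables."""
    orders = [(doc["order_id"], doc["customer"]["name"])]
    customers = [(doc["customer"]["name"], doc["customer"]["city"])]
    order_items = [
        (doc["order_id"], item["sku"], item["qty"]) for item in doc["items"]
    ]
    return {"orders": orders, "customers": customers, "order_items": order_items}


# Relational form: four rows spread over three tables.
rows = to_relational_rows(order)

# Document form: the whole object serialized as one record.
as_document = json.dumps(order)
```

Reading the order back from the relational form requires joining the three tables, whereas the document form is restored with a single `json.loads(as_document)` call.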

If documents from different sources require a flexible schema or no schema at all. If input data must be kept in its original formats.

If ETL (Extract, Transform, Load) is required on source data. NoSQL engines or Map/Reduce can perform ETL steps and produce output suitable for
loading into an RDBMS.
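
As a rough illustration of the ETL point above, the following sketch imitates the map and reduce phases in plain Python and loads the result into an in-memory SQLite table. The log format, the `page_hits` table, and the function names are assumptions made for this example. A real deployment would run the same phases on a distributed engine rather than in a single process.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw input: "<date> <method> <path> <status>", one bad line.
RAW_LINES = [
    "2024-01-05 GET /home 200",
    "2024-01-05 GET /cart 200",
    "malformed line",
    "2024-01-06 GET /home 500",
]


def map_phase(line):
    """Extract: emit (path, 1) for each well-formed line, skip the rest."""
    parts = line.split()
    if len(parts) == 4:
        yield parts[2], 1


def reduce_phase(pairs):
    """Transform: sum the emitted counts per path."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def load(totals):
    """Load: write the aggregated rows into a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_hits (path TEXT PRIMARY KEY, hits INTEGER)")
    conn.executemany("INSERT INTO page_hits VALUES (?, ?)", sorted(totals.items()))
    conn.commit()
    return conn


pairs = [pair for line in RAW_LINES for pair in map_phase(line)]
totals = reduce_phase(pairs)
conn = load(totals)
result = conn.execute("SELECT path, hits FROM page_hits ORDER BY path").fetchall()
```

The malformed line is dropped in the map phase, so only clean, aggregated rows reach the relational table.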

If missing data can be ignored when the volume of data is large enough. The law of Big Data is “More data beats clever algorithms.”

When flexibility is required for analytics. Flexibility allows experimenting with which questions to ask before defining a fixed data model.

In some NoSQL databases (HBase, for example), each data element or each document is versioned. This enables queries for values at a specific time in history.
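
The idea of querying a value as of a specific time can be sketched with a small in-memory store. The `VersionedStore` class, its method names, and the integer timestamps are hypothetical and do not correspond to the API of any particular NoSQL product.

```python
import bisect


class VersionedStore:
    """Toy key-value store that keeps every timestamped version of a value."""

    def __init__(self):
        # key -> (sorted list of timestamps, list of values in the same order)
        self._versions = {}

    def put(self, key, timestamp, value):
        """Record a new version without overwriting earlier ones."""
        timestamps, values = self._versions.setdefault(key, ([], []))
        index = bisect.bisect_right(timestamps, timestamp)
        timestamps.insert(index, timestamp)
        values.insert(index, value)

    def get_as_of(self, key, timestamp):
        """Return the latest value written at or before `timestamp`, or None."""
        if key not in self._versions:
            return None
        timestamps, values = self._versions[key]
        index = bisect.bisect_right(timestamps, timestamp)
        return values[index - 1] if index else None


store = VersionedStore()
store.put("price:ACME", 100, 9.50)
store.put("price:ACME", 200, 10.25)
```

With these two writes, `store.get_as_of("price:ACME", 150)` returns the older value, because the second version did not yet exist at timestamp 150.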

When we need to utilize outputs from many existing systems. For example, in order to prepare a relevant offer for a customer, we need
information from the billing system, from the customer's historical orders, from orders of similar customers, and from the stock and CRM systems.
Traditional integration of all these systems is expensive and not very flexible.

When we need to analyze unstructured data such as documents and log files, or semi-structured data such as CSV files, forms, and
exports from other systems.

More General Tips to Consider Before Starting a Big Data Project

And here are a few more general tips to consider before undertaking a big data project:

A big data analytics solution should be a business decision, not an IT decision.