What You Need To Know About Big Data, But Were Too Afraid To Ask

I reconnected with an old friend a few days ago (the magic of Facebook). Then on Sunday he called me (Philly to Norway, so he clearly thought this was important) with a single question. “So Jane, you said you’re working with big data analytics. What actually is big data?”

Now, he is an IT guy and understands most things about computers. But big data has been defined in so many different ways that many people are left completely bamboozled.

My personal favourite statistic – a full 9% of the organisations that have already invested in big data projects have trouble understanding what big data is. You have to wonder what they invested in. The Emperor’s new clothes?

In sympathy, I offer up this hypothetical Q&A to anyone out there confused about big data but too afraid to ask.

Q: Is big data just data that is big?

A: Although the name would suggest that, we actually use the term big data to describe any data that, for one or more reasons, doesn’t fit in the traditional database software tools that have been used for analysis and business intelligence over the last few decades. For example, it could be that it doesn’t fit well in a relational database for querying (the pixels of an image), or that it needs a different type of processing before it can be usefully joined with other data (time series data from equipment).
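To make the time series point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the equipment IDs, the readings, the platform names and the choice of hourly averaging. The idea is simply that raw readings arrive at irregular times, so they are first rolled up into one value per piece of equipment per hour, and only then lined up against ordinary master data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (equipment id, ISO timestamp, temperature in deg C)
readings = [
    ("pump-1", "2024-01-01T00:05", 71.0),
    ("pump-1", "2024-01-01T00:35", 73.0),
    ("pump-1", "2024-01-01T01:10", 80.0),
    ("pump-2", "2024-01-01T00:20", 65.0),
]

# Hypothetical master data, the kind of table a relational database handles well
equipment = {"pump-1": "Platform A", "pump-2": "Platform B"}


def hourly_means(rows):
    """Average the readings for each (equipment id, hour) pair."""
    buckets = defaultdict(list)
    for equipment_id, timestamp, value in rows:
        hour = timestamp[:13]  # keep "YYYY-MM-DDTHH", drop the minutes
        buckets[(equipment_id, hour)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Step 1: reduce the irregular time series to one value per hour
summary = hourly_means(readings)

# Step 2: the summary now joins cleanly onto the master data
joined = [
    (equipment_id, equipment[equipment_id], hour, average)
    for (equipment_id, hour), average in sorted(summary.items())
]

for row in joined:
    print(row)
```

Real sensor feeds would need far more care (gaps, bad readings, time zones), but the shape of the problem is the same: process first, join second.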

Q: Haven’t we always had big data in Oil and Gas?

A: Yes! Seismic surveys and sensor data stored in historians are two examples. Because they are big and unwieldy and don’t behave well in typical database tools, we have restricted them to predefined workflows and application silos. As a consequence, we are unwittingly preventing ourselves from finding accurate answers to critical business issues. The current big data movement is all about making it possible to use this awkward, logistically-challenging data in new ways to answer more questions.

Q: What’s the current big data movement all about?

A: It’s about being able to use all the data at our disposal - whether it is images, video, audio, natural language text, machine-readable text, sensor data, or plain old-fashioned relational data in a database. Whether there are megabytes of it, or terabytes of it. Whether it is information from a snapshot in time, or data constantly streaming in.

Q: But how? The whole point is that this data is difficult to manage.

A: It will take different IT solutions to manage and query this data compared with what has been used for “traditional” data. There’s much that we can learn from dotcomland – guys like Yahoo, Google, and eBay – who are pioneering new tools and techniques. The types of data they use on a daily basis are very similar to the ones the Oil and Gas industry has historically struggled with. They interrogate terabytes of web server logs to get a deeper understanding of customer interactions; they use natural language processing, including sentiment analysis, to analyse social media content. And with the proliferation of the Internet of Things, which encompasses “wearables” like Fitbit or Apple Watch, sensor data is a big focus of theirs too.

Here’s what the transport industry is doing with its big data. Sensor data (a big data source) that monitors engine behaviour and performance can be combined with engine or vehicle master data, repair history, and service and utilisation history (all data sources that operators had before big data came along), allowing them to predict much more accurately when an engine will fail. For train operators, airlines and delivery companies, this means that they can now plan to take a vehicle out of service for preventative maintenance instead of waiting for the vehicle to break down, leaving passengers or goods stranded.
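For the curious, the combining step can be sketched in a few lines of Python. This is only a toy: the engine IDs, the numbers and both thresholds are invented, and a simple two-part rule stands in for the statistical models a real operator would use to predict failure. What it does show is the principle of putting a sensor-derived figure next to service history that was already on file, and deciding from both together.

```python
# All records below are hypothetical, invented purely for illustration.

# Sensor-derived summary per engine (the "big data" source)
sensor_summary = {
    "E100": {"avg_vibration_mm_s": 7.2},
    "E200": {"avg_vibration_mm_s": 2.1},
    "E300": {"avg_vibration_mm_s": 6.8},
}

# Service history per engine (data the operator already had)
service_history = {
    "E100": {"hours_since_service": 4200},
    "E200": {"hours_since_service": 3900},
    "E300": {"hours_since_service": 800},
}

# Assumed thresholds; real limits would come from the manufacturer or a model
VIBRATION_LIMIT_MM_S = 6.0
SERVICE_INTERVAL_HOURS = 4000


def needs_maintenance(engine_id):
    """Flag an engine only when both data sources point the same way."""
    vibration = sensor_summary[engine_id]["avg_vibration_mm_s"]
    hours = service_history[engine_id]["hours_since_service"]
    return vibration > VIBRATION_LIMIT_MM_S and hours > SERVICE_INTERVAL_HOURS


flagged = [eid for eid in sorted(sensor_summary) if needs_maintenance(eid)]
print("Schedule for preventative maintenance:", flagged)
```

Neither source on its own would single out the same engine: high vibration alone also picks up one that was serviced recently, and hours alone miss what the sensors are saying.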

Now translate that to Production Operations. If we combine our sensor data with well information, maintenance records, subsurface geology and topside conditions (like weather), we could improve our maintenance plans, logistics, supply chain – lowering costs for the business while avoiding unplanned shut-ins.