
The era of radically different competition is here. It is a tsunami that has transformed entire industries and left numerous casualties in its wake. Just as Gutenberg’s printing press changed the world, the move toward big data is creating an equally tectonic shift in business and society. Transform or be left behind.

Consider the fate of Borders.1 In 1971, the company opened its first store in Ann Arbor, Michigan, when the book industry was a very different place. In 2011, 40 years later, the bookstore chain closed its doors. So what happened? Borders fell behind the curve in embracing the Web and the digital world of data. Not understanding that the rules of the game had changed, Borders had outsourced its online bookselling to Amazon.com, so anyone who visited Borders.com was redirected to Amazon. Under the old rules, this seemed like a smart decision. In the new world, however, it was a serious problem.

Riding on Amazon’s coattails to leverage its strengths ignored the fact that competing in the digital world was itself the competitive priority. Relinquishing control to another company simply cut into Borders’ own customer base. Also, not understanding that the world had become a digital place, Borders did not embrace e-books the way Amazon and Barnes & Noble did. Walking into Borders was like walking into a bookstore of yesteryear. The outcome was predictable.

The competitive world Borders lived in was one where booksellers tracked which books sold and which did not. Loyalty programs could help tie purchases to individual customers. That was about it. Then shopping moved online, and the ability to understand and track customers changed dramatically. Online retailers could track not only what customers bought but also what they looked at, how they navigated through the site, how long they lingered on each page, and how they were influenced by promotions and page layouts. They could now develop microsegments of individual customers and groups based on countless characteristics, and then create individually targeted promotions. Algorithms were then developed to predict which books individual customers would like to read next. These algorithms were self-teaching, performing better every time a customer responded to a recommendation. Traditional retailers like Borders simply couldn’t access this kind of information, and they could not compete in a timely manner.
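The chapter does not say which algorithms online booksellers actually use. The following is a minimal sketch of the general idea. It uses a simple item-to-item co-occurrence approach, a swapped-in technique rather than Amazon’s method, to recommend titles that often appear alongside a customer’s past purchases. The function names `build_cooccurrence` and `recommend` and the sample titles are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List


def build_cooccurrence(baskets: Iterable[List[str]]) -> Dict[str, Counter]:
    """Count how often each pair of titles appears in the same customer's history."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        # permutations over the de-duplicated basket yields both (a, b) and (b, a)
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co: Dict[str, Counter], history: List[str], k: int = 3) -> List[str]:
    """Rank titles the customer does not own by co-occurrence with titles they do own."""
    scores: Counter = Counter()
    for title in history:
        scores.update(co.get(title, Counter()))
    for title in history:
        scores.pop(title, None)  # never recommend something already owned
    return [title for title, _ in scores.most_common(k)]


# Hypothetical purchase histories, one list per customer
baskets = [
    ["Dune", "Foundation", "Hyperion"],
    ["Dune", "Foundation"],
    ["Dune", "Neuromancer"],
    ["Foundation", "Hyperion"],
]
co = build_cooccurrence(baskets)
print(recommend(co, ["Dune"], k=1))  # ['Foundation']
```

Each new purchase history added to `baskets` changes the counts. That is one simple way such a system can improve as customers respond. Production recommenders are far more sophisticated.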

And Amazon? With its Kindle e-book readers, and by convincing hundreds of publishers to release their books in the Kindle format, the company has cornered the market. It has “datafied” books, turning them into a quantified format that can be tabulated and analyzed.2 This allows Amazon to do everything from recommending books to using algorithms to find links among the topics of books that might not otherwise be apparent. By embracing the digital age, technology, and data-driven decisions, the company is moving well beyond wanting to be the biggest bookstore on the Internet; it is moving toward being the most dominant retailer in the world. Amazon understands that this means using big data and technology to manage its entire supply chain in a synchronized manner. In fact, Jeff Bezos, Amazon’s CEO, is known for demanding rigorous quantification of customer reactions before rolling out new features.3 Data and technology have been used to coordinate everything from customer orders to fulfillment, inventory management, labor, warehousing, transportation, and delivery.

Amazon is not the only one. Leading-edge companies across the globe have scored successes in their use of big data. Consider Walmart, Zara, UPS, Tesco, Harrah’s, Progressive Insurance, Capital One, Google, and eBay.4 These companies have succeeded in this game-changing environment by embracing and leading the change. They have used big data analytics to extract new insights and create new forms of value in ways that have changed markets, organizations, and business relationships.

1.1 Big Data Basics

To fully understand the impact of big data analytics, we first need to have a clear idea of what it actually is. In this section we explain big data basics. We define the key concepts of big data analytics and explain how these concepts find novel applications across business. This will set up the book’s subsequent discussions of big data analytics applications in supply chain management.

1.1.1 Big Data

Big data is, simply put, lots of data. More specifically, the term big data refers to data sets so large that they can no longer fit into the memory that computers use for processing. This data can be captured, stored, communicated, aggregated, and analyzed.5 There is no fixed size threshold for big data, such as a particular number of terabytes or gigabytes, because it is a moving target: as technology advances, the size of data sets considered big data will also increase.
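One common way to work with data that does not fit in memory is to process it as a stream of chunks and keep only a small running summary. The sketch below computes an average this way. The names `read_in_chunks` and `streaming_mean` are hypothetical. The example illustrates the idea under that assumption and does not represent any specific big data tool.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks so that only one chunk is held in memory at a time."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # start a fresh list; the yielded one is not reused
    if chunk:
        yield chunk


def streaming_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Average a possibly huge stream while storing only a running count and total."""
    count = 0
    total = 0.0
    for chunk in read_in_chunks(values, chunk_size):
        count += len(chunk)
        total += sum(chunk)
    return total / count if count else float("nan")


# range() is lazy, so the full million values are never materialized at once
print(streaming_mean(range(1, 1_000_001), chunk_size=10_000))  # 500000.5
```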

As the volume of data has grown, so has the need to revamp the tools used to analyze it. That is why new processing technologies such as Google’s MapReduce and its open source equivalent, Hadoop, were developed. These technologies enable companies to manage far larger quantities of data than before. Most important, unlike in the past, the data no longer needs to be arranged in the neat rows and columns of traditional data sets before today’s technology can analyze it.
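To illustrate the MapReduce programming model, here is a single-machine Python sketch of its classic word-count example. The three phases (map, shuffle, reduce) follow the model’s general shape. The function names are hypothetical, and this is not Hadoop’s or Google’s API. A real framework would run the map and reduce steps in parallel across many servers. Note that the input is free text rather than rows and columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # In a cluster, each map call would run on the node that stores that document.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data big insights", "Big data tools"]))
# {'big': 3, 'data': 2, 'insights': 1, 'tools': 1}
```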

Big data comes in different forms and includes all kinds of data from every source imaginable. It can be structured or unstructured. It can be numerical sequences, voice, text, or conversations. It can come in the form of point-of-sale (POS), radio-frequency identification (RFID), or Global Positioning System (GPS) data, or in the form of Twitter feeds, Facebook posts, call center records, or consumer blogs. Today’s advanced analytical tools allow us to extract meaning from all of these types of data.

1.1.2 Analytics

Analytics is the application of math and statistics to these large quantities of data. When we apply math and statistics to big data, which is often called big data analytics, we can gain insights into the world around us as never before. In particular, we can infer the probability, or likelihood, that something will happen.

We are used to this in our everyday lives. We are accustomed to e-mail filters that estimate the likelihood that a message is spam, and to spell checkers that estimate the likelihood that the typed letters “teh” were meant to be “the.” The key is that these systems perform well because they are fed lots of data on which to base their predictions. Moreover, the systems are built to improve themselves over time by keeping track of the best signals and patterns to look for as more data is fed in. Think about “teaching” your e-mail filter that a certain type of e-mail is spam by labeling similar e-mails as spam.
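The chapter does not specify how spam filters compute their estimates. A classic textbook approach is naive Bayes, sketched below as an assumption. The filter counts words in e-mails the user has labeled as spam or not spam (“ham”), then combines those counts into a probability for a new message. The class `NaiveBayesSpamFilter` and its methods are hypothetical, and real filters use many more signals.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Minimal multinomial naive Bayes filter that improves as labeled e-mails are added."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts: Counter = Counter()

    def learn(self, text: str, label: str) -> None:
        """Record one labeled e-mail; label must be 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Add-one smoothed prior, so a class with no examples is not impossible
            log_p = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # Laplace smoothing: unseen words get a small nonzero likelihood
                log_p += math.log(
                    (self.word_counts[label][w] + 1) / (total_words + len(vocab) + 1)
                )
            log_scores[label] = log_p
        # Convert log scores to a probability in a numerically stable way
        m = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - m)
        ham = math.exp(log_scores["ham"] - m)
        return spam / (spam + ham)


spam_filter = NaiveBayesSpamFilter()
spam_filter.learn("win a free prize now", "spam")
spam_filter.learn("claim your free money", "spam")
spam_filter.learn("meeting notes for monday", "ham")
spam_filter.learn("lunch on monday", "ham")
print(round(spam_filter.spam_probability("free prize"), 2))  # 0.83
```

Every call to `learn` updates the word counts. Labeling more e-mails therefore directly changes future estimates, which is the kind of “teaching” the paragraph describes.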

It is through big data that Walmart learned that customers prefer to stock up on the sugary treat Pop-Tarts during a hurricane,6 eBay identified which Web designs generate the highest sales,7 and Progressive Insurance learned how to optimize insurance premiums by risk category.8

Even small companies have benefited. Consider the online music equipment retailer The Musician’s Friend. Using basic analytics, the company compared different versions of its Web page to identify customer preferences. The preferred version generated a 35 percent increase in sales over the original home page. This simple change resulted in a measurable improvement in return on investment (ROI).9
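The chapter does not describe how The Musician’s Friend measured its results. A common way to compare two page versions is an A/B test. The sketch below computes the relative lift of a variant page over the original, along with a two-sided p-value from a two-proportion z-test. The visitor and order counts are invented for illustration and were chosen only so that the lift comes out at 35 percent. Here, conversion rate stands in for sales.

```python
import math
from typing import Tuple


def ab_test(orders_a: int, visitors_a: int, orders_b: int, visitors_b: int) -> Tuple[float, float]:
    """Compare page A (original) with page B (variant).

    Returns the relative lift of B's conversion rate over A's, and a two-sided
    p-value from a two-proportion z-test using the pooled conversion rate.
    """
    rate_a = orders_a / visitors_a
    rate_b = orders_b / visitors_b
    lift = (rate_b - rate_a) / rate_a
    pooled = (orders_a + orders_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, p_value


# Hypothetical counts: 2.0% vs. 2.7% conversion on 10,000 visitors each
lift, p_value = ab_test(200, 10_000, 270, 10_000)
print(f"lift = {lift:.0%}")  # lift = 35%
print(f"p-value = {p_value:.4f}")
```

A small p-value suggests that the difference is unlikely to be due to chance alone. With small samples, even a large observed lift may not be reliable.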

1.1.3 Big Data and Analytics: The Perfect Duo

To set the record straight, big data without analytics is just lots of data. We’ve been accumulating a lot of data for years. Analytics without big data is simply mathematical and statistical tools and applications. Tools such as correlation and regression, for example, have been around for decades. In fact, Google’s director of research, Peter Norvig, explained it well by saying: “We don’t have better algorithms. We just have more data.”10

However, it is the combination that makes the difference. Only by combining big data and analytics can we get truly meaningful insights and turn information into business intelligence (see Figure 1.1). Big data and analytics also build on each other: continued application of even simple analytical tools results in their improvement, refinement, and sophistication. Consider that as you flag more and more e-mails as spam, the filter “learns” and becomes better at correctly identifying spam. It is for this reason that we use the term big data analytics throughout this book to refer to the application of analytics to these large data sets.

1.1.4 New Computing Power

How can companies extract intelligence from these huge amounts of data? It is made possible by today’s massive computing power, available at a lower cost than ever before. Large data, coupled with larger and more affordable computing power, means that you can do things at a large scale that cannot be done at a smaller one. Improvements in computing have produced large advances in capability, enabling high-level analytics to be performed on these large and unstructured data sets.

Processing power has increased over the years much as predicted by Moore’s law.11 The law is named after Intel cofounder Gordon E. Moore, who observed that the number of transistors that can be placed on a chip doubles about every two years. In practice, this has meant that the computing power that can be purchased for the same amount of money has roughly doubled every two years, and the observation has held up well for decades. We have seen computers become faster and memory more abundant. Similarly, storage capacity has expanded through cloud computing, which refers to the ability to access highly scalable computing resources through the Internet. Cloud computing is often available at a lower cost than equivalent in-house installations because resources are shared across many users. Further, the performance of the algorithms that drive so many of our systems has also improved. The gains from big data are therefore a combination of the size of current data sets, rapidly increasing processing capability, and improved algorithms.
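If we take the “doubles about every two years” reading at face value, the compounding is easy to underestimate. Let P_0 be computing power per dollar today and t be elapsed years (symbols introduced here for illustration):

```latex
P(t) = P_0 \cdot 2^{t/2},
\qquad
P(10) = P_0 \cdot 2^{5} = 32\,P_0,
\qquad
P(20) = P_0 \cdot 2^{10} = 1024\,P_0
```

A decade of doubling every two years therefore yields roughly 32 times the computing power for the same money. Two decades yields roughly a thousandfold increase.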

Hadoop is an open source technology platform that has received a great deal of buzz because it was designed to solve problems involving lots of data. In fact, Hadoop was specifically designed to handle big data that is a mix of structured and unstructured data that does not fit neatly into tables. It grew out of the MapReduce and distributed file system techniques that Google developed for its own use in indexing the Web and examining user behavior to improve its algorithms. Yahoo! then furthered its development for enterprise purposes. Hadoop distributes both data and processing across many servers. Data is spread over a large number of machines, and each machine works on its own portion, which greatly improves computing capability.

A related technique used in distributed databases is sharding. When tables are divided and distributed across multiple servers, the number of rows in each table on each server is reduced. This reduces index size and substantially improves search performance. Each such partition is called a database shard. A shard can be placed on separate hardware, and multiple shards can be spread across multiple machines, which significantly improves performance. The segment of the database placed on a shard can be based on real-world segmentation, such as separating Canadian customers from American customers. This makes it especially easy to query a particular segment of the data or to compare segments.
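Segment-based sharding, such as separating Canadian and American customers, can be illustrated with a toy in-memory store. This Python sketch uses the hypothetical classes `Customer` and `ShardedCustomerStore` and assigns each customer to a shard by country. As a result, a query about Canadian customers scans only the Canadian shard. In a real system, each shard would sit on its own server. This is not the API of Hadoop or of any particular database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Customer:
    customer_id: int
    country: str
    annual_spend: float


class ShardedCustomerStore:
    """Toy store that places each customer on a shard chosen by country.

    Each shard is an in-memory list here; in practice it would be a separate server.
    """

    def __init__(self) -> None:
        self.shards: Dict[str, List[Customer]] = defaultdict(list)

    def insert(self, customer: Customer) -> None:
        # Segment-based sharding: the shard key is a real-world attribute
        self.shards[customer.country].append(customer)

    def query_segment(self, country: str) -> List[Customer]:
        # Only one shard is scanned; .get avoids creating empty shards
        return list(self.shards.get(country, []))

    def compare_segments(self) -> Dict[str, float]:
        # Average annual spend per shard, e.g. Canadian vs. American customers
        return {
            country: sum(c.annual_spend for c in rows) / len(rows)
            for country, rows in self.shards.items()
            if rows
        }


store = ShardedCustomerStore()
for c in [Customer(1, "CA", 120.0), Customer(2, "US", 200.0),
          Customer(3, "CA", 80.0), Customer(4, "US", 300.0)]:
    store.insert(c)
print([c.customer_id for c in store.query_segment("CA")])  # [1, 3]
print(store.compare_segments())  # {'CA': 100.0, 'US': 250.0}
```

Sharding by a real-world attribute makes segment queries cheap. However, it can leave shards unevenly sized when one segment is much larger than the others. Hash-based sharding spreads rows more evenly, at the cost of that convenience.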

1.1.5 New Problem Solving

Just a few years ago, the topic of analytics and computing power would have concerned only a few data geeks. Today, however, big data is an imperative for business leaders in every industry and sector, from health care to manufacturing. The ability to capture, store, aggregate, and combine data, and then perform deep analyses, has become accessible to virtually all organizations. This will continue as the costs of computing power, digital storage, and cloud computing keep dropping. These advancements will further break down technology barriers and level the playing field, especially for small and medium-sized firms. Just consider that today an individual can purchase a disk drive with the capacity to store all of the world’s music for less than $600.12 In fact, the cost of storing a terabyte of data has fallen from $1 million in the 1970s to $50 today.13
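The storage-cost figures imply a steep and steady decline. Take the two endpoints at face value and assume a span of roughly 40 years. That span is an approximation introduced here, because the source gives only “the 1970s” and “today.”

```latex
\frac{\$1{,}000{,}000}{\$50} = 20{,}000,
\qquad
\log_2 20{,}000 \approx 14.3,
\qquad
\frac{40\ \text{years}}{14.3} \approx 2.8\ \text{years}
```

In other words, the cost of storing a terabyte would have halved roughly every three years over that period.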

What does this mean for business? Simply put, increasingly sophisticated analytical techniques combined with growing computer horsepower mean extracting unparalleled business insights. It means new and revolutionary problem-solving capability. Big data is not about the data itself. It is about the ability to solve problems better than ever before. And now pretty much everyone can do it.