“Amazon EFS enables us to be more experimental. We can prototype whatever we want, and we know the storage will be there.”

Jon Pospischil, Cofounder, Custora

The Challenge

Next-Generation Segmentation

Data is a gold mine for today’s brands, enabling them to personalize offers, increase engagement, and boost sales. However, with so much data flowing from so many sources—orders, websites, email, customer relationship management (CRM) systems, and more—it can be challenging for marketing teams to get a unified view of their customers. Additionally, getting value from that data requires applying complex predictive models that are costly to develop and run.

Custora brings all that data together in one place, unifies it into individual customer profiles, and enriches those profiles with its predictive models. The models surface preferences for products, price points, and discounts, delineate the customer lifecycle, and identify the customers who are most likely to return products. The result is similar to traditional customer segmentation, only faster, more accurate, and more useful.

“Instead of sending out requests to the BI team and waiting weeks for results, we enable marketers to get insights and deploy campaigns immediately,” says Jon Pospischil, cofounder of Custora. “We help them to earn more with what they already spend on marketing while increasing customer engagement and building lifetime loyalty.” Custora’s success speaks for itself: the company counts 7 of the 20 largest retailers in the United States among its customers.

Why Amazon Web Services

Big Data, Big Results

Delivering the next generation of customer engagement requires highly scalable computing and storage to ingest, transform, and analyze massive amounts of data—which is why Custora’s solution is built entirely on Amazon Web Services (AWS). Using Amazon Elastic Compute Cloud (Amazon EC2) gives Custora the ability to ingest almost any amount of data, run sophisticated predictive models, and obtain results in minutes rather than hours or days.

The data Custora uses for its analytics comes from retail point-of-sale systems, e-commerce websites, customer databases, and third-party systems, typically in nightly or weekly batches. A large cluster of Amazon EC2 instances performs Extract, Transform, and Load (ETL) processes to normalize the data for analysis. Then, the data is pushed from a caching layer into a variety of analytics tools, such as Hadoop, Apache Spark, or Custora’s custom predictive-modeling pipeline.

“Our predictive-modeling pipeline uses more than 20 different models, all operating on the same underlying data structure,” says Pospischil. “There are a lot of computations that can be shared across all the different models. We built a caching layer that takes intermediate results and shares them across the modeling processes to increase compute efficiency.”
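The idea of sharing intermediate results across models can be illustrated with a minimal sketch. This is not Custora's implementation; the class name `SharedResultCache`, the key scheme, and the use of pickle files are all assumptions made for illustration. Each intermediate result is stored under a hash of its name and parameters, so the first model that needs it computes it and later models read it back from the shared directory.

```python
import hashlib
import json
import os
import pickle
import tempfile
from pathlib import Path


class SharedResultCache:
    """Hypothetical file-backed cache for intermediate results shared by several models."""

    def __init__(self, root):
        # In a real deployment, root would be a directory visible to every worker.
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name, params):
        # Derive a stable file name from the computation name and its parameters.
        key = json.dumps({"name": name, "params": params}, sort_keys=True)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.pkl"

    def get_or_compute(self, name, params, compute):
        path = self._path(name, params)
        try:
            with open(path, "rb") as fh:
                return pickle.load(fh)  # cache hit: reuse the stored result
        except FileNotFoundError:
            pass  # cache miss: fall through and compute
        result = compute()
        # Write to a temporary file first, then rename it into place,
        # so readers never observe a partially written result.
        fd, tmp_name = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(result, fh)
        os.replace(tmp_name, path)
        return result
```

A call such as `cache.get_or_compute("order_totals", {"window_days": 30}, build_order_totals)` would run the hypothetical `build_order_totals` function only once, no matter how many models ask for the same result with the same parameters.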

Operational Complexity, Managed

Both the data-ingestion ETL pipeline and the analytics caching layer are foundational to Custora’s ability to deliver its services. Both use massively parallel processing to speed results, meaning multiple Amazon EC2 instances need to be able to access data at the same time.

Initially, to enable parallel processing, Custora used a self-managed RAID cluster of Amazon Elastic Block Store (Amazon EBS) volumes shared over the Network File System (NFS) protocol. However, as the company grew, it found it had to regularly increase volume sizes, which required it not only to set up new volumes but also to replicate many terabytes of data. “Whenever we did those transitions, there was downtime,” says Pospischil. “Additionally, replicating and resizing volumes required significant time from our operations staff.”

The company decided to adopt Amazon Elastic File System (Amazon EFS) to address these issues. Amazon EFS provides a simple, scalable file system for use with Amazon EC2 instances. It scales elastically, growing and shrinking storage automatically. Because Amazon EFS uses the standard NFSv4.1 protocol, Custora was able to seamlessly integrate it with existing applications and tools. Multiple Amazon EC2 instances can access Amazon EFS at the same time, supporting the parallel-processing capabilities that enable Custora to speed results to its customers. With Amazon EFS, Custora has consolidated 81 TB of data into a single, unified solution.
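As a rough illustration of why shared access matters for parallel processing, the sketch below has several workers read input shards from one shared directory and write their results back to it. It makes several assumptions: threads stand in for separate Amazon EC2 instances, a temporary local directory stands in for the mounted file system, and the `input`/`output` layout and function names are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_shard(shared_root, shard_name):
    """Read one input shard from the shared directory and write its sum to output/."""
    root = Path(shared_root)
    text = (root / "input" / shard_name).read_text()
    total = sum(float(value) for value in text.split())
    (root / "output" / shard_name).write_text(f"{total}\n")
    return shard_name, total


def run_workers(shared_root, max_workers=4):
    """Process every shard concurrently; all workers use the same directory tree."""
    root = Path(shared_root)
    (root / "output").mkdir(exist_ok=True)
    shards = sorted(path.name for path in (root / "input").iterdir())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda shard: process_shard(shared_root, shard), shards)
        return dict(results)


def make_demo_root():
    """Create a small demo tree; a real run would point at the shared mount instead."""
    root = Path(tempfile.mkdtemp())
    (root / "input").mkdir()
    for i in range(3):
        (root / "input" / f"shard-{i}.txt").write_text(f"{i}\n{i + 1}\n")
    return root
```

Calling `run_workers(make_demo_root())` processes the three demo shards and returns their totals keyed by shard name. Because every worker sees the same files, no data has to be copied between workers before or after the run.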

The Benefits

Innovation Without Hesitation

“The major benefit of moving to Amazon EFS is the reduced operational burden,” says Pospischil. “With Amazon EFS, we don’t have to think about how much storage we have provisioned, even when we’re taking on large customers.” The company no longer has to overprovision storage or perform rigorous capacity planning, which enables it to use the fully managed Amazon EFS solution at a cost comparable to self-provisioned Amazon EBS storage.

Simplified operations lead to greater uptime because the company no longer has to stop processes to mount new volumes. “We initially considered mounting a new volume for each customer, but there was no partitioning strategy that worked for all our customers,” says Pospischil. “With Amazon EFS, our file system scales to meet our needs.”

This elasticity also makes it easier for Custora to try new things. “Amazon EFS enables us to be more experimental,” says Pospischil. “If we were bound by volume sizes, we had to be more methodical about testing new things to make sure we didn’t run out of space. Now, we can prototype whatever we want, and we know the storage will be there.”