How I built a tool that saved millions for an e-commerce giant

The data analytics industry has seen a boom all across the world, with double-digit growth forecast. Everyone’s analyzing data in one form or another to drive industries and governments. With growing competition and products competing head-on, the need to analyze data as fast as possible becomes essential. Whether in sales, marketing, finance, or the customer domain, there is a dire need to build products that enhance this process and help us make data-driven decisions as fast as possible.

A product to further advance the capabilities of your product?

I built a simple product that could be used by the sales, marketing, and finance teams to drill down on their core metrics without engineering help. The motivation for building such a product was three-fold:

large pre-processing times required to query hundreds of terabytes,

a large number of weekly ad hoc requests,

the need to consolidate redundant business performance reports required by different teams.

The product layout had to be simple so that it could be rolled out across the organization without much training. At the same time, it had to solve the most critical problems for the business.

The key to building an amazing product: focus on your customers and their needs, and the right product follows.

Building the PRD

The first step was to understand the business teams and the existing solutions they used. I discussed the requirements with several business teams, such as customer, marketing, and finance. This allowed me to understand the most critical metrics for each team and how they created an impact across the business.

With an initial view of business cases at hand and the required metrics, the next challenge for me was to design a product that was simple to use and presented metrics visually to get quick insights.

Product Concept Design

In the product concept design phase, I combined the requirements of different teams into a uniform workflow. This was critical from two perspectives:

Redundant data requests require huge amounts of cluster time, which is expensive,

A uniform workflow is more object-oriented and follows basic computer science principles and data normalization patterns, which makes the code more maintainable and scalable.

The product was structured with the following views:

Customer Engagement

A conversion funnel is a must-have for the e-commerce industry. It was added to this view to better understand how users convert from visitors to customers. Analyzing metrics such as visits and add-to-cart events across marketing channels was important, as it helped identify the factors that drive user conversions.

A sales tree was added alongside to help drill down into each visit and checkout step and understand the business at a granular level. Placing both in the same view made intuitive sense, as one is a byproduct of the other.
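As a rough illustration of the funnel idea, here is a minimal Python sketch (not the original R Shiny code). It counts distinct users at each stage per marketing channel and computes the conversion rate from the previous stage. The stage names, the event tuple layout, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical funnel stages, ordered from first touch to purchase.
STAGES = ["visit", "add_to_cart", "checkout", "purchase"]


def funnel_by_channel(events):
    """Count distinct users per stage per channel and compute step conversion.

    `events` is an iterable of (user_id, channel, stage) tuples.
    Returns {channel: [(stage, users, conversion_from_previous_stage), ...]}.
    """
    users = defaultdict(lambda: defaultdict(set))
    for user_id, channel, stage in events:
        users[channel][stage].add(user_id)

    result = {}
    for channel, by_stage in users.items():
        rows = []
        previous = None
        for stage in STAGES:
            count = len(by_stage.get(stage, set()))
            if previous is None:
                rate = None  # The first stage has nothing to convert from.
            elif previous == 0:
                rate = 0.0  # Avoid dividing by zero when the prior stage is empty.
            else:
                rate = count / previous
            rows.append((stage, count, rate))
            previous = count
        result[channel] = rows
    return result


# Made-up sample events, for illustration only.
sample_events = [
    ("u1", "email", "visit"),
    ("u1", "email", "add_to_cart"),
    ("u1", "email", "checkout"),
    ("u1", "email", "purchase"),
    ("u2", "email", "visit"),
    ("u2", "email", "add_to_cart"),
    ("u3", "search", "visit"),
    ("u4", "search", "visit"),
]

funnel = funnel_by_channel(sample_events)
for channel, rows in funnel.items():
    print(channel, rows)
```

Counting distinct users rather than raw events keeps a user who revisits a stage from inflating the funnel.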

Deep Dive Analysis for Business Verticals

This view showed performance for each business vertical at its most granular level to help detect performance anomalies. A click on any vertical in a chart, e.g., ‘grocery’ in the department chart, updates the other charts to show revenue for grocery through each marketing channel, and so on.

Consolidated KPIs for Tracking Performance

This view provided rolling two-year data for 50+ KPIs across 10+ levels, such as marketing channel, browser, and device. The data included metrics such as revenue, sales, margin, and conversion rate. The table had advanced filtering and sorting options for variables and values, along with data download capabilities.
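To make the idea of KPIs across levels concrete, here is a small Python sketch of rolling the same rows up by different levels. The field names (`channel`, `device`, `visits`, `orders`, `revenue`) and the sample rows are hypothetical and are not taken from the actual tool.

```python
from collections import defaultdict


def roll_up(rows, level):
    """Aggregate additive metrics by one level and derive the conversion rate."""
    totals = defaultdict(lambda: {"visits": 0, "orders": 0, "revenue": 0.0})
    for row in rows:
        bucket = totals[row[level]]
        bucket["visits"] += row["visits"]
        bucket["orders"] += row["orders"]
        bucket["revenue"] += row["revenue"]
    # Ratios are recomputed from the summed counts rather than averaged.
    for bucket in totals.values():
        bucket["conversion_rate"] = (
            bucket["orders"] / bucket["visits"] if bucket["visits"] else 0.0
        )
    return dict(totals)


# Made-up rows, for illustration only.
sample_rows = [
    {"channel": "email", "device": "mobile", "visits": 100, "orders": 10, "revenue": 500.0},
    {"channel": "email", "device": "desktop", "visits": 300, "orders": 10, "revenue": 700.0},
    {"channel": "search", "device": "mobile", "visits": 200, "orders": 4, "revenue": 160.0},
]

by_channel = roll_up(sample_rows, "channel")
by_device = roll_up(sample_rows, "device")
print(by_channel["email"])
print(by_device["mobile"])
```

Recomputing the conversion rate after summing matters: averaging the per-row rates for email (10% and about 3.3%) would give a different and misleading number than the true 5%.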

A prototype of the design was reviewed with the business teams to cross-check it against various business cases.

Engineering Challenges

To build a successful product, it is very important to focus on a well-designed data pipeline and to plan both for scale and for future engineering failures. A two-step approach was put together to combat such problems:

Data Pipeline & Validation

The data backend was restructured to provide data rolled up for various levels and time frames. The new table-creation scripts ran for nearly 6 hours. They were structured to run in parallel with other aggregation queries to unify the tables.

Maintaining data integrity and sanity across data sources, along with optimized queries, were the key factors here. Hence, a step was added to compare the latest data point with the past week to check that data had loaded correctly in the backend.
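The check described above could look something like the following Python sketch. The article does not spell out the exact rule, so comparing the latest value against the mean of the prior seven days, and the 50% tolerance, are assumptions chosen purely for illustration.

```python
from statistics import mean


def load_looks_valid(daily_values, tolerance=0.5):
    """Return True if the latest value is within `tolerance` of the prior 7-day mean.

    `daily_values` is ordered oldest to newest; the last item is the latest load.
    The tolerance is a relative deviation (0.5 means 50%), an assumed threshold.
    """
    if len(daily_values) < 8:
        return False  # Not enough history to compare against.
    latest = daily_values[-1]
    baseline = mean(daily_values[-8:-1])  # The seven days before the latest one.
    if baseline == 0:
        return latest == 0
    return abs(latest - baseline) / abs(baseline) <= tolerance


# Made-up daily revenue figures, for illustration only.
normal_load = [100, 104, 98, 101, 99, 103, 102, 110]
partial_load = [100, 104, 98, 101, 99, 103, 102, 12]

print(load_looks_valid(normal_load))
print(load_looks_valid(partial_load))
```

A check like this will not catch subtle errors, but it is cheap and flags the common failure where a load is empty or only partially complete.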

Caching & Preprocessing

Data snapshots were saved to guard against cluster failures and incorrect data loads. This prevented wrong data from loading in the tool frontend and helped build a more trusted framework for the business.
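One way to picture the snapshot fallback is the Python sketch below. It is an assumed design, not the original implementation: `fetch`, `validate`, the JSON snapshot file, and the sample values are all hypothetical. Fresh data replaces the snapshot only when it loads and passes validation; otherwise the last good snapshot is served.

```python
import json
import os
import tempfile


def load_with_snapshot(fetch, validate, snapshot_path):
    """Return (data, source): fresh data if it loads and validates, else the snapshot."""
    try:
        data = fetch()
        if validate(data):
            # Write to a temp file, then rename, so readers never see a partial file.
            directory = os.path.dirname(os.path.abspath(snapshot_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "w") as handle:
                json.dump(data, handle)
            os.replace(tmp_path, snapshot_path)
            return data, "fresh"
    except Exception:
        # In this sketch, a cluster failure surfaces as an exception from fetch().
        pass
    # Raises FileNotFoundError if no good snapshot has ever been saved.
    with open(snapshot_path) as handle:
        return json.load(handle), "snapshot"


def failing_fetch():
    raise RuntimeError("cluster unavailable")


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "kpi_snapshot.json")
    # First load succeeds and saves a snapshot.
    first = load_with_snapshot(lambda: {"revenue": 1200}, lambda d: d["revenue"] >= 0, path)
    # Second load fails, so the saved snapshot is served instead.
    second = load_with_snapshot(failing_fetch, lambda d: True, path)
    # Third load returns invalid data, so the snapshot is served again.
    third = load_with_snapshot(lambda: {"revenue": -5}, lambda d: d["revenue"] >= 0, path)

print(first, second, third)
```

Returning the source alongside the data lets the frontend tell users when they are looking at a snapshot rather than the latest load.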

The final product was developed in R Shiny, which offers design flexibility and multiple data manipulation libraries for parallel processing.

Product Impact

The product is being used across the business to drive insights and generate quick reports and visualizations. This helps teams dive into the key factors for maximizing business growth.

The product also eliminated the need to focus on individual automated reports and redundant data querying. This helped teams focus on achieving business goals.

Time and Cost Savings

Number of dev hours / query = 2

Number of running hrs / query = 1–3 hrs (avg. 2)

Number of requests / week = 30–60 (average 45)

Total hrs / week spent = (2 dev hrs + 2 running hrs) × 45 requests = 180 hrs

Average salary of a data engineer = $125k / annum ≈ $65 / hr

Total savings over 2 years =

$65 / hr × 180 hrs / week × 100 working weeks (over 2 years) ≈ $1.2 million

(This doesn’t include the time that would have been lost to cluster failures, which caching prevented.)

Tool development time = 3 months × 3 people
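For anyone who wants to plug in their own numbers, the estimate above can be reproduced with a few lines of Python. The constants are the article's figures; the variable names are just labels chosen for this sketch.

```python
DEV_HOURS_PER_QUERY = 2
RUN_HOURS_PER_QUERY = 2   # average of the 1-3 hr range
REQUESTS_PER_WEEK = 45    # average of the 30-60 range
HOURLY_RATE_USD = 65      # the article's hourly figure for a $125k salary
WORKING_WEEKS = 100       # working weeks over 2 years

# Hours spent per week on ad hoc requests before the tool existed.
hours_per_week = (DEV_HOURS_PER_QUERY + RUN_HOURS_PER_QUERY) * REQUESTS_PER_WEEK

# Cost of those hours over the two-year window.
total_savings = HOURLY_RATE_USD * hours_per_week * WORKING_WEEKS

# Effort invested in building the tool: 3 months x 3 people.
dev_person_months = 3 * 3

print(hours_per_week)     # 180
print(total_savings)      # 1170000
print(dev_person_months)  # 9
```

The exact product is $1.17 million, which rounds to the $1.2 million quoted above.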

Product Scaling

This product is currently being scaled to include views for customer forecasting and future business performance. This can provide a holistic view of business performance connecting past, present and future data.

The old solution, which required manual work, could not scale to accommodate the company’s increasing analytical demands. The current solution eliminated the need to manually query the data and provided a consolidated solution for the company’s growing needs.

I am passionate about building products that help increase team productivity and, in turn, drive revenue for the company.

Let me know if you have faced a similar problem within your team and how you built solutions around it.