Seattle's Best Tech Team


At Zulily, our need to perform various dependent actions on different schedules has continued to grow. Sometimes, we need to communicate inventory changes to advertising platforms. At other times, we need to aggregate data and produce reports on the effectiveness of spend, conversions of ads or other tasks. Early on, we knew that we needed a reliable workflow management system. We started with Apache Airflow version 1.8 two years ago and have continued to add more hardware resources to it as the demands of our workloads increased.

Apache Airflow is a workflow management system that allows developers to describe workflow tasks and their dependency graph as code in Python. This allows us to keep a history of the changes and build solid pipelines step-by-step with proper monitoring.

Our Airflow deployment runs a large majority of our advertising management and reporting workflows. As our usage of Airflow increased, we have made our Airflow deployment infrastructure more resilient to failures by leveraging the new KubernetesPodOperator. This post describes our journey with Airflow from Celery to the KubernetesPodOperator.

Our First Airflow 1.8 Deployment using Celery Executor

With this version of Airflow we needed to maintain many separate services: the scheduler, RabbitMQ, and workers.

Components and Concepts

A DAG is a complete workflow definition, written as code, composed of tasks and their dependencies on other tasks. The AWS Elastic File System (EFS) share contains the code for the DAGs.

Git Syncer is responsible for polling Zulily’s Gitlab at regular 5-minute intervals, pulling the DAG code, and placing it on the AWS EFS.

AWS EFS is the file share that has all the DAG code. It is mounted on the Webserver pod and Scheduler pod.

Webserver pod hosts the Airflow UI, which shows running tasks and task history, lets users start and stop tasks, and allows viewing logs of tasks that have already completed.

Scheduler pod reads the DAG code from AWS EFS and reads the scheduling data from the Airflow Metadata DB and schedules tasks on the Worker pods by pushing them on the RabbitMQ.

Airflow Metadata DB contains the scheduling information and history of DAG runs.

Workers dequeue the tasks from RabbitMQ and execute them, copying the logs to S3 when done.

Advantages

Having the ability to add more worker nodes as loads increased was a plus.

By using the Git Syncer we were able to sync code every 5 minutes. From a developer point of view, after merging code in master branch it would automatically get pulled to production machines within 5 minutes.

Disadvantages

Multiple single points of failure: RabbitMQ, GitSyncer

All DAGs and the Airflow scheduler comprised one application that shared packages across the board. This meant there was one gigantic Pipfile, and each package in it had to be compatible with all the others.

All the DAGs had to be written in Python which restricted the ability to re-use existing components written in Java and other languages.

Our Current Airflow 1.10.4 Deployment using KubernetesPodOperator

In Airflow version 1.10.2 a new kind of operator called the KubernetesPodOperator was introduced. This allowed us to reduce setup steps and make the overall setup more robust and resilient by leveraging our existing Kubernetes cluster.

Differences and New Components

DAG continues to be a Python definition of dependencies. We used the KubernetesPodOperator to define all our DAGs. As a result, our DAG becomes a tree of task containers. We used the LocalExecutor to run our DAGs from the scheduler.

Temporary Task Pods run Task Containers which operate like any other container and contain the business logic needed for that task. The key benefit is that there is no need to bundle in any Airflow specific packages in the task container. This is a game changer as it allows us to use pretty much any code, written in any language, that can be made into a container, to be used as tasks inside an Airflow DAG. For our Python DAGs, it also breaks up the giant Pipfile into smaller Pipfiles, one per Python task container, making the package dependencies much more manageable.

GitSyncer goes away. Git Syncer polled Zulily’s Gitlab every 5 minutes. We avoid that by using the kubectl cp command during CI/CD: after a developer merges code to master, a CI/CD step copies the updated DAG definition to the AWS EFS. This push-based approach eliminates the need to poll every 5 minutes.

Webserver and Scheduler containers both run on the same pod. Starting from the Airflow Kubernetes deploy yaml, we removed the portions for setting up the git sync and created one pod with both Webserver and Scheduler containers. This simplified deployment. We used a minimal version of the Airflow Dockerfile for our Webserver and Scheduler containers.

Scheduling: When the scheduler needs to schedule a new task, using the Kubernetes API, it creates a temporary worker pod with the container image specified and starts it. After the task has completed, the logs are copied over to S3. Then the worker pod ends. By following this approach, the task worker containers are automatically distributed over the whole Kubernetes cluster. In order to increase processing capacity of Airflow, we simply need to scale up Kubernetes which we do using Kops.

It creates the short-lived Task pod with the Task container image specified.

Task executes and finishes.

Logs are copied to S3.

Task pod exits.

Resiliency & Troubleshooting

Using a single pod for Airflow webserver and scheduler containers simplifies things; Kubernetes will bring this pod up if it goes down. We do need to handle orphaned task pods.

Worker pods are temporary. If they error out, we investigate, fix and re-run. The KubernetesPodOperator provides the option to keep older task containers around for troubleshooting purposes.

As before, Airflow Metadata DB is a managed AWS RDS instance for us.

The DAGs volume is also an AWS EFS. In the unlikely event that we lose it in a catastrophe, we can always restore from our source repository.

Kubernetes cluster has been working for us without any major resiliency issues. We have it deployed on AWS EC2s and use Kops for cluster management. AWS EKS is something we are exploring at the moment.

Conclusion

Airflow continues to be a great tool helping us achieve our business goals. Using the KubernetesPodOperator and the LocalExecutor with Airflow version 1.10.4, we have streamlined our infrastructure and made it more resilient in the face of machine failures. Our package dependencies have become more manageable and our tasks have become more flexible. We are piggy-backing on our existing Kubernetes infrastructure instead of maintaining another queuing and worker mechanism. This has enabled our developers to devote more time to improve customer experience and to worry less about infrastructure. We are excited about future developments in Airflow and how they enable us to drive Zulily business!


Learn how Zulily and Sounders FC get the most out of their metrics!

On Tuesday, September 10th, Zulily was proud to partner with Seattle Sounders FC for a tech talk on data science, machine learning and AI. This exclusive talk was led by Olly Downs, VP of Data & Machine Learning at Zulily, and Ravi Ramineni, Director of Soccer Analytics at Sounders FC.

Zulily and Sounders FC both use deep analysis of data to improve the performance of their enterprises. At Zulily, applying advanced analytics and machine learning to the shopping experience enables us to better engage customers and drive daily sales. For Sounders FC, the metrics reflect how each player contributes to the outcome of each game; understanding the relationship between player statistics, training focus and performance on the field helps bring home the win. For both organizations, being intentional about the metrics we select and optimize for is critical to success.

We would like to thank everyone who attended the event for a great night of discussion and for developing new ties within the Seattle developer community. For any developers who missed this engaging discussion, we invite you to view the full presentation and audience discussion:

Acknowledgments:

Thanks to Olly Downs and Ravi Ramineni for presenting their talks, Sounders FC for hosting, and Luke Friang for providing a warm welcome. This would not have been possible without the many volunteers from Zulily, Bellevue School of AI for co-listing the event, as well as all the attendees for making the tech talk a success!

“As you grow in your career, you are being sought for your leadership and critical thinking skills, and for your ability to diagnose and solve problems, not regurgitate facts.”– Kelly Wolf, VP of People at Zulily

“I wouldn’t be where I am if it wasn’t for my mentors. We need to push more, take more risk to support each other and come together as a community. It doesn’t matter if you’re a man or a woman, we all need to work together.” – Kat Khosrowyar, Head Coach at Reign Academy, former Head Coach of Iran’s national soccer team, Chemical Engineer

“I am not a developer, but currently mentor a female developer. She drives the topic, and I act as a sounding board. Working on a predominately male team, she needed a different confidante to work through issues, approach, development ideas and career path goals.” – Jana Krinsky, Director of Studio at Zulily

“When you have confidence in yourself, when you think ‘I’m going to own it, this is going to happen because I’m going to make it happen,’ it matters. As women, we can’t use apologetic language like ‘Sorry, whenever you have a second, I would like to speak to you’ — we don’t need to be sorry for doing our jobs. Women need to start changing those sentences to, ‘when would be a good time to talk about this project?’ and treating people as your equal, not as someone who’s above you.” – Celia Jiménez Delgado, right wing-back for Reign FC + Spain’s national soccer team, Aerospace Engineer

“We all have to find our courage. Because if you want to grow and be in a leadership role, that’s going to be a requirement. I think identifying that early in your career is a great way to avoid some pitfalls, down the road.” – Angela Dunleavy-Stowell, CEO at FareStart, Co-Founder at Ethan Stowell Restaurants


Hello, I’m Han. I am from Turkey but moved to the Greater Seattle area 5 years ago. I finished high school in Bellevue, went to Bellevue College, transferred to the University of Washington Computer Science department, and am now going into my senior year. This summer, I worked for 3 months as an intern on Zulily’s Member Engagement Platform (MEP) team. This post is about my internship journey.

This was my first internship, so I came to my first day of work with one goal: learning. But I didn’t know it was going to become something bigger.

During my first month, I worked on a Java project in which I downloaded data from an outside source and used our customer data to map customers to their time zones. During that time, I learned about AWS services such as ECS, Lambda, Route53, Step Functions, etc. I learned containerized deployments and created CI/CD and CloudFormation files.

In my second month, I started working on the UI. Before my internship, I had never done any front-end UI work, but during that month I learned how to work on a React app and use JavaScript. I worked with engineering and Marketing to implement features in the MEP UI.

At the beginning of my third month, I was working on our Facebook Messenger bot, implementing features that users could use in the Messenger app. By then I was working on projects on both the front end and the back end, getting my hands dirty in every part of the stack. I was learning, deploying and helping.

In the beginning, either my manager or other engineers would assign me tasks. But after 2 months, I was picking up my own. I was picking tasks, working with engineers and marketing, going to design meetings, and helping other engineers.

Before the start of the internship, my friends and my adviser told me that I should expect to be given one big project that I would work on somewhere in the corner by myself. They also warned me that my project would probably never be deployed or used. But here at Zulily that wasn’t the case at all. I was working with the team, as a part of the team, not as an outsider intern. I was deploying new features every week. Features I can look back at, show others and be proud of. I was coming to work every day with a goal to finish tasks and leaving with the feeling of accomplishment. I felt like I was part of a bigger family. In my team, everyone was helping each other, they were working together in order to succeed together, just like a team, just like a family.

Now that I’m at the end of my internship, I’m leaving with a lot of accomplishments, a lot more knowledge, and going back to school to finish my last year. In conclusion, I believe interning at Zulily was the perfect internship. I, as an intern, learned about the company, the team, the workflow, the projects. I learned new engineering skills, learned how to work in a different environment with a team, accomplished a lot of tasks and finally contributed to the team and the company in general.


Introduction

Here at Zulily, we offer thousands of new products to our customers at a great value every day. These products are available for about 72 hours; to inform existing and potential customers about our ever-changing offerings, the Marketing team launches new ads daily for these offerings on Facebook.

To get the biggest impact, we only run the best-performing ads. When done manually, choosing the best ads is time-consuming and doesn’t scale. Moreover, the optimization lags behind the continuously changing spend and customer activation data, which means wasted marketing dollars. Our solution to this problem is an automated, real-time ad pause mechanism powered by Machine Learning.

Predicting CpTA

Marketing uses various metrics to measure ad efficiency. One of them is CpTA or Cost per Total Activation (see this blog post for a deeper dive on how we calculate this metric). Lower CpTA means spending less money to get new customers so lower is better.

To pause ads with high CpTA, we trained a Machine Learning model to predict the next-hour CpTA using the historical performance data we have for ads running on Facebook. If the model predicts that the next-hour CpTA of an ad will exceed a certain threshold, that ad will be paused automatically. The marketing team is empowered to change the threshold at any time.
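The decision itself is simple once the model's prediction is in hand. A minimal sketch (function and field names are hypothetical; `predict_next_hour_cpta` stands in for a call to the hosted model service):

```python
# Pause an ad when the model's predicted next-hour CpTA exceeds the
# Marketing-set threshold. Lower CpTA is better, so we pause the outliers.

def should_pause(ad_features: dict, threshold: float,
                 predict_next_hour_cpta) -> bool:
    """Return True if the predicted next-hour CpTA is above the threshold."""
    predicted = predict_next_hour_cpta(ad_features)
    return predicted > threshold


# Example with a stubbed model: spend / activations from the last hour.
fake_model = lambda f: f["spend_last_hour"] / max(f["activations_last_hour"], 1)
ad = {"spend_last_hour": 120.0, "activations_last_hour": 3}
print(should_pause(ad, threshold=30.0,
                   predict_next_hour_cpta=fake_model))  # True: 40.0 > 30.0
```

Because the threshold is an input rather than a constant, Marketing can change it at any time without touching the model or the service.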

Ad Pause Service

We host the next-hour CpTA model as a service and have other wrapper microservices deployed to gather and pass along the real-time predictor data to the model. These predictors include both relatively static attributes about the ad and dynamic data such as the ad’s performance for the last hour. This microservice architecture allows us to iterate quickly when doing model improvements and allows for tight monitoring of the entire pipeline.

The end-to-end flow works as follows. We receive spend data from Facebook for every ad hourly. We combine that with activation and revenue data from the Zulily web site and mobile apps to calculate the current CpTA. Then we use CpTA threshold values set by Marketing and our next-hour CpTA prediction to evaluate and act on the ad. This automatic flow helps manage the large number of continuously changing ads.

Results and Conclusion

The automatic ad pause system has increased our efficiency through the Facebook channel and gave Marketing more time to do what they do best: getting people excited about fresh and unique products offered by Zulily. Stay tuned for our next post where we take a deeper dive into the ML models.


Remember Bart Simpson’s punishment for being bad? He had to write the same thing on the chalkboard over and over again, and he absolutely hated it! We as humans hate repetitive actions, and that’s why we invented computers – to help us optimize our time to do more interesting work.

At zulily, our Marketing Specialists previously published ads to Facebook individually. However, they quickly realized that creating ads manually was limiting to the scale they could reach in their work: acquiring new customers and retaining existing shoppers. So in partnership with the marketing team, we worked together to build a solution that would help the team use resources efficiently.

At first, we focused on automating individual tasks. For instance, we wrote a tool that Marketing used to stitch images into a video ad. That was cool and saved some time but still didn’t necessarily allow us to operate at scale.

Now, we are finally at the point where the entire process runs end-to-end efficiently, and we are able to publish hundreds of ads per day, up from a handful.

Here’s how we engineered it.

The Architecture

Sales Events

Sales Events is an internal system at zulily that stores data about all the sales events we run; we typically launch 100+ sales each day, which can include 9,000 products, with each sale lasting three days. Each event includes links to the relevant products and product images. The system exposes the data through a REST API.

Evaluate an Event

This component holds the business logic that allows us to pick events that we want to advertise, using a rules-based system uniquely built for our high-velocity business. We implemented the component as an Airflow DAG that hits the Sales Events system multiple times a day for new events to evaluate. When a decision to advertise is made, the component triggers the next step.

Make Creatives

In this crucial next step, our zulily-built tool creates a video advertisement, which is uploaded to AWS S3 as an MP4 file. These creatives also include metadata used to match Creatives with Placements downstream.

Product Sort

A sales event at zulily could easily have dozens if not hundreds of products. We have a Machine Learning model that uses a proprietary algorithm to rank products for a given event. The Product Sort is available through a REST API, and we use it to optimize creative assets.

Match Creatives to Placements

A creative is a visual item that needs to be published so that a potential shopper on Facebook can see it. The resulting advertisement that the shopper sees is described by a Placement. A Placement defines where on Facebook the ad will go and who the audience for the ad should be. We match Creatives with Placements using Match Filters defined by Marketing Specialists.

Define Match Filters

Match Filters allow Marketing Specialists to define rules that will pick a Placement for a new Creative.

These rules are based on the metadata of Creatives: “If a Creative has a tag X with the value Y, match it to the Placement Z.”
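A hedged sketch of how such tag-based rules can be evaluated (the filter list, tag names, and placement ids below are made up for illustration):

```python
# Each Match Filter maps one (tag, value) pair in a Creative's metadata to a
# Placement. A Creative can match several filters and get several Placements.

match_filters = [
    {"tag": "category", "value": "shoes",   "placement": "feed-women-25-44"},
    {"tag": "category", "value": "toys",    "placement": "feed-parents"},
    {"tag": "season",   "value": "holiday", "placement": "stories-broad"},
]


def match_placements(creative_metadata: dict) -> list:
    """Return every Placement whose rule matches the Creative's tags."""
    return [
        f["placement"]
        for f in match_filters
        if creative_metadata.get(f["tag"]) == f["value"]
    ]


creative = {"category": "shoes", "season": "holiday"}
print(match_placements(creative))  # ['feed-women-25-44', 'stories-broad']
```

Keeping the rules as data rather than code is what lets Marketing Specialists define and adjust them without an engineering change.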

MongoDB

Once we match a Creative with one or more Placements, we persist the result in MongoDB. We use a schemaless database rather than a SQL database because we want to be able to extend the schema of Creatives and Placements without updating table definitions. MongoDB (version 3.6 and above) also gives us a change stream, which is essentially a log of the changes happening to a collection. We rely on this feature to automatically kick off the next step.

Publish Ads to Facebook

Once the ad definition is ready and the new object is pushed to the MongoDB collection, we publish the ad to Facebook through a REST API. Along the way, the process automatically picks up the videos from S3 and uploads them to Facebook. Upon a successful publish, the process marks the Ad as synced in the MongoDB collection.

Additional Technical Details

While this post is fairly high level, we want to share a few important technical details about the architecture that can be instructive for engineers interested in building something similar.

Self-healing. We run our services on Kubernetes, which means that the service auto-recovers. This is key in an environment where we only have a limited time (in our case, typically three days) to advertise an event.

Retry logic. Whenever you work with an external API, you want retry logic to minimize downtime due to external issues. We use exponential retry, but every use case is different. If the number of retries is exhausted, we write the event to a Dead Letter Queue so it can be processed later.
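The pattern looks roughly like this (a self-contained sketch with hypothetical names; in production the dead letter queue would be a service such as SQS, not a list):

```python
# Retry with exponential backoff; on exhaustion, park the failure in a
# dead-letter queue for later processing instead of losing it.
import time


def call_with_retry(fn, max_retries=4, base_delay=0.01, dead_letter=None):
    """Call fn, retrying with exponentially growing delays. Returns fn's
    result, or None after writing the final error to dead_letter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries - 1:
                if dead_letter is not None:
                    dead_letter.append(exc)   # an SQS DLQ in production
                return None
            time.sleep(base_delay * (2 ** attempt))  # 10ms, 20ms, 40ms...


# Example: an API call that fails twice, then succeeds.
dlq = []
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

print(call_with_retry(flaky, dead_letter=dlq))  # 'ok' on the third attempt
```

The DLQ keeps the failure visible and replayable, which matters when the external outage lasts longer than the retry window.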

Event-driven architecture. In addition to MongoDB change streams, we also rely on message services such as AWS Kinesis and SQS (alternatives such as Kafka and RabbitMQ are readily available if you are not in AWS). This allows us to de-couple individual components of the system to achieve a stable and reliable design.

Data for the users. While it’s not shown directly on the diagram, the system publishes business data it generates (Creatives, Placements, and Ads) to zulily’s analytics solution where it can be easily accessed by Marketing Specialists. If your users can access the data easily, it’ll make validations quicker, help build trust in the system and ultimately allow for more time to do more interesting work – not just troubleshooting.

In Marketing Tech, one of our jobs is to tell customers about zulily offers. These days everything and everyone goes mobile, and Mobile Push notifications are a great way to reach customers.

Our team faced a double-sided challenge. Imagine that you have to ferry passengers across a river. There’ll be times when only one or two passengers show up every second or so, but they need to make it across as soon as possible. Under other circumstances, two million passengers will show up at once, all demanding an immediate transfer to the opposite bank.

One way to solve this is to build a big boat and a bunch of small boats and use them as appropriate. While this works, the big boat will sit idle most of the time. If we build the big boat only, we will be able to easily handle the crowds, but it will cost a fortune to service individual passengers. Two million small boats alone won’t work either because they will probably take the entire length of the river.

Fortunately, in the world of software we can solve this challenge by building a boat that scales. Unlike the Lambda architecture with two different code paths for real-time and batch processing, an auto-scaling system offers a single code path that can handle one or one million messages with equal ease.

Let’s take a look at the system architecture diagram.

Campaigns and one-offs are passengers. In the case of a campaign, we have to send potentially millions of notifications in a matter of minutes. One-offs arrive randomly, one at a time.

An AWS Kinesis Stream paired with a Lambda function make a boat that scales. While we do need to provision both to have enough capacity to process the peak loads, we only pay for what we use with Lambda, and Kinesis is dirt-cheap.

We also ensure that the boat doesn’t ferry the same passenger multiple times, which would result in an awful customer experience (just imagine having your phone beep every few minutes). To solve this problem, we built a Frequency Cap service on top of Redis, which gave us a response time under 50ms per message. Before the code attempts to send a notification, it checks with the Frequency Cap service if the send has already been attempted. If it has, the message is skipped. Otherwise, it is marked as “Send Attempted”. It’s important to note that the call to the Frequency Cap API is made before an actual send is attempted. Such a sequence prevents the scenario where we send the message and fail to mark it accordingly due to a system failure.
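The check-then-mark logic can be sketched as follows. This is a self-contained illustration with a plain dict standing in for Redis; in the real service the same operation would be a single atomic Redis SET with the NX (set-if-absent) and EX (TTL) options, which is what keeps the sub-50ms response time. All names here are hypothetical:

```python
# Frequency Cap sketch: return True exactly once per (customer, message)
# within the TTL. Called BEFORE the send, so a crash after marking can only
# skip a notification, never duplicate one.
import time


class FrequencyCap:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> expiry timestamp (Redis in production)

    def try_mark_send(self, customer_id: str, message_id: str) -> bool:
        """Mark the send as attempted; False means it was already attempted."""
        key = f"{customer_id}:{message_id}"
        now = time.time()
        expiry = self._store.get(key)
        if expiry is not None and expiry > now:
            return False          # duplicate within the TTL: skip the send
        self._store[key] = now + self.ttl
        return True               # first attempt: mark and proceed


cap = FrequencyCap(ttl_seconds=3600)
print(cap.try_mark_send("cust42", "promo-123"))  # True: first attempt
print(cap.try_mark_send("cust42", "promo-123"))  # False: duplicate skipped
```

Ordering the mark before the send trades a rare missed notification for a guarantee of no duplicates, which is the right trade for customer experience.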

Another interesting challenge worth explaining is how we line up millions of passengers to board the boat efficiently. Imagine that they all arrive without a ticket, and ticketing times vary. Yet the boat departs at an exact time that cannot be changed. We solve this by ticketing in advance (the Payload Builder EMR service) and gathering passengers in a waiting area (files in S3). At the exact time, we open multiple doors from the waiting area (multithreading in the Kinesis Loader Java service), and the passengers make their way onto the boat (the Kinesis Stream). The AWS Step Functions service connects the Payload Builder and the Kinesis Loader into a workflow.

In summary, we built a system that can handle one or one million Mobile Push notifications with equal ease. We achieved this by combining batch and streaming architecture patterns and adding a service to prevent duplicate sends. We also did some cool stuff in the Payload Builder service to personalize each notification so check back in a few weeks for a new post on that.


A critical part of any e-commerce company is getting product to its customers. While many of the customer experience discussions that you hear about companies focus on their website and apps or customer service and support, we often forget to think about those companies delivering their customers’ products when the company said they would. This part of the promise made (or implied) for customers is critical for building trust and providing a great end-to-end customer experience. Most large e-commerce companies operate — or pay someone else to operate — one or more “fulfillment centers”, which is where products are stored and combined with other items that need to be sent to the customer. zulily’s unique business model means we work with both big brands and boutique, smaller vendors with a variety of different capabilities, and so our products are inspected for quality, frequently bagged to keep clothing from getting dirty and often need barcoding (as many smaller vendors may not have them). The quality of zulily’s fulfillment processes drives our ability to deliver on our promises to customers and zulily’s software drives those fulfillment processes.

All fulfillment center systems start with a few basic needs: be able to receive products in from vendors, store products in a way that they can be later retrieved, and ship the product to customers. “Shipping product out,” also known as “outbound,” is the most expensive operation inside the fulfillment center, so we have invested heavily in making it efficient. The problem seems simple at first glance. You gather product for customer shipment, put products in boxes, put labels on the boxes, and hand the box to UPS or USPS, etc. The trick is making this process as efficient as possible. When zulily first started, each associate would walk the length of the warehouse picking each item and sorting it into 1 of 20 shoebox-sized bins they had in their cart, with each bin representing a customer shipment. Once all of the shipments had been picked, the picker delivered the completed cart to a packing station. The job of collecting products to be shipped out is known as “picking,” and when our warehouse was fairly small, this strategy of one person picking the whole order worked fine. As the company grew, our warehouses did too – some of our buildings have a million square feet of storage spread over multiple floors. Now pickers were walking quite a long way to complete just 20 shipments. We could have just increased the size or quantity of the carts, but this is a solution that costs more as the company grows. In addition, concerns about safety related to pulling more or larger carts and the complexities of taking one cart to multiple floors of a building make this idea impractical, to say the least.

A pick cart. Each of the 20 slots on the cart represents a single customer shipment. The picker, guided by an app on a mobile device, walks the storage area until they’ve picked all of the items for the 20 shipments. We call this process “pick to shipment” because no further sorting is necessary to make sure each shipment is fully assembled.

We needed a solution that would allow pickers to spend less time walking between bins and more time picking items from those bins. We have developed a solution such that the picking software tries to keep a given picker within a zone of 10-20 storage aisles and invested in a conveyor system to carry the picked items out of the picking locations. The picker focuses on picking everything that can be picked within their zone and there’s no need for a picker to leave a zone unless they are needed in another zone. The biggest difference from the old model is that the picker is no longer assembling complete shipments. If you ordered a pair of shoes and a t-shirt from zulily, it’s unlikely that those two items would be found in the same zone due to storage considerations. Instead of an individual picker picking for 20 orders, we now have one picker picking for many orders at the same time, but staying within a certain physical area of the building. This is considerably more efficient for the pickers, but it means that we now needed a solution to assemble these zone picks into customer shipments.

The picker picks for multiple shipments into a single container. Because the sorting into customer shipments happens later, this solution is called “pick to sort”.

In order to take the efficiently picked items and sort them into the right order to be sent to our customers, we have implemented a sorting solution that uses a physical solution we call a “put wall”. A put wall looks like a large shelf with no back divided into sections (called “slots”), each measuring about one foot cubed. Working at these put walls is an employee (called a “putter”) whose job is to take products from the pick totes and sort them into a slot in that put wall. Each slot in the wall is assigned to a shipment. Once all the products needed for a given shipment have been put into the slot, an indicator light on the other side of the wall lets a packer know that the shipment is ready to be placed into a box and shipped out to our customer. In larger warehouses, having just one put wall is not practical because putters would end up having to move too much distance and all the efficiency gained in packing would be lost on the putting side, so defining an appropriate size for each put wall is critical. This creates an interesting technical challenge as we have to make sure that the right products all end up in the put wall at the right time. Our picking system has to make sure that once we start picking a shipment to a wall, all the other products for that shipment also go to that wall as quickly as possible. This challenge is made more difficult by the physical capacity of the put walls. We need to limit how much is going to the wall to avoid a situation where there is no slot for a new shipment to go. We also have to make sure that each of the walls has enough work so we don’t have idle associates. When selecting shipments to be picked, we must include shipments that are due out today, but also include future work to make the operation efficient. To do this, we have pickers rotate picking against different put walls to make sure that they get an even spread of work.
A simple round-robin rotation would be naive, since throughput of the put walls is determined by humans with a wide range of different work rates. In order to solve this problem, we turn to control theory to help us select a put wall for a picker based on many of the above requirements. We also need to make sure that when the first product shows up for a shipment there is room in the wall for it.

As totes full of picked items are conveyed to the put wall, a putter scans each item and puts them into a slot representing a customer shipment. He is guided by both his mobile device and flashing lights on the put wall which guide him to the correct slot.

As we scaled up our operation, we initially saw that adding more pickers and put walls was not providing as much gain in throughput as we expected. In analyzing the data from the system, we determined that one of the problems was how we were selecting our put walls. Our initial implementation would select a wall for a shipment based on that wall having enough capacity and need. The problem with this approach is that we didn’t consider the makeup of each of the shipments. If you imagine a shipment that is composed of multiple products spread throughout the warehouse, you have situations where a picker has to walk through their zone N times, where N is the number of put walls we are using at any given time. As we turn on more and more put walls, that picker will have to walk through the zone that many more times. We realized that if we could create some affinity between zones and walls, we could limit the number of put walls that a picker needs to pick for and make them more efficient. We did this by assigning put walls a set of zones and trying to make the vast majority of shipments for that put wall come from those zones. While we sometimes need larger sets than normal to cover a given shipment, we can overall significantly improve pick performance and increase the overall throughput for putters and packers.
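The actual wall-selection scoring is proprietary, but a simplified sketch of the zone-affinity idea looks like this (all names and data structures are hypothetical): among walls with free slots, prefer the one whose assigned zones cover the most of the shipment's pick zones.

```python
# Pick a put wall for a new shipment: require free capacity, then maximize
# overlap between the wall's assigned zones and the shipment's pick zones,
# so pickers cross fewer walls per pass through their zone.

def choose_put_wall(shipment_zones, walls):
    """walls: list of dicts with 'id', 'zones' (set of zone ids), and
    'free_slots' (int). Returns the chosen wall id, or None if full."""
    candidates = [w for w in walls if w["free_slots"] > 0]
    if not candidates:
        return None
    best = max(candidates,
               key=lambda w: len(w["zones"] & set(shipment_zones)))
    return best["id"]


walls = [
    {"id": "A", "zones": {"z1", "z2"}, "free_slots": 3},
    {"id": "B", "zones": {"z3", "z4"}, "free_slots": 5},
]
print(choose_put_wall(["z3", "z1", "z4"], walls))  # 'B' covers two zones
```

A real selector would also weigh wall workload and due dates, per the requirements above, rather than affinity alone.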

And that’s really just the beginning of the story for a small part of our fulfillment center software suite. As the business grows, we continue to find new ways to further optimize these processes to make better use of our employees’ time and save literally millions of dollars while also increasing our total capacity using the same buildings and people! This is true of most of the software in the fulfillment space – improved algorithms are not just a fun and challenging part of the job, but also critical to the long-term success of our business.


zulily is an e-commerce company that is changing the retail landscape through our ‘browse and discover’ business model. Long term, there are tremendous international expansion opportunities as we change the way people shop across the planet. At zulily we’re building a future where the simultaneous release of 9,000 product styles across 100+ events can occur seamlessly in multiple languages across multiple platforms. From an engineering perspective, as we expand globally the size and scope of our technical challenge is nothing short of localization’s Mount Everest climb.

Navigating steep localization challenges is not new to companies expanding globally, yet zulily faces a uniquely thrilling ride ahead. For me personally, this is incredibly exciting. Prior to coming to zulily, one role I held at my former company was global readiness – ensuring the products and services the company delivered to customers were culturally, politically and geographically appropriate. I was in the unique position of mitigating the company’s risk of negative press, boycotts, protests, lawsuits or being banned by governments. The content my team reviewed was never considered life-threatening until 2005, when the cultural editor of Jyllands-Posten in Denmark commissioned twelve cartoonists to draw cartoons of the Islamic prophet Muhammad. Their publication led to the loss of life and property. Suddenly my team took notice. Having content thoughtfully considered and ready for global markets took on a grave seriousness, moving from a ‘nice-to-have’ to a ‘must-have’ risk management function. While my role was to ensure the political correctness (neutrality) of content across 500+ product groups, the group I led rarely dealt with the size and scope of technical localization challenges that zulily faces as we expand globally.

As companies expand globally, it’s important for employees to shift their mindset from a U.S.-centric perspective to a global view. This paradigm leap in how the employees of an organization see themselves is significant. When people increasingly think globally about their role and the impact of their decisions on a global audience – specifically in how we enable our platforms, tools and systems to be ‘world-ready’ – opportunities for growth and development naturally occur while cultural content risks shrink. Today all of the content on our eight country-specific sites is in English, yet we are now thinking about how to tackle bigger challenges in the future, such as supporting multiple languages. The technical implementation for international expansion has given our developers and product managers a new appreciation for the importance of global readiness and the challenge of ‘going international’.

zulily offers over 9,000 product styles through over 100 merchandising events on a typical day. As we expand into new markets around the globe, we face an extraordinary challenge from both an engineering and an operations perspective. Imagine: every day at 6 a.m. PT we publish the content equivalent of a daily edition of The New York Times (about half a million new words per day, or over 100 million new words per year). Another way to conceptualize the problem: each day we offer roughly the same number of SKUs as you would find in a typical Costco store. Launching a new Costco store every day is difficult enough in one language, yet as we scale our offerings globally, the complexity of simultaneously producing this extremely high volume of content in multiple languages grows exponentially. Arguably, no e-commerce company in the world publishes the volume of text that zulily produces on a daily basis. Further, the technical challenge is amplified because this massive volume of content must be optimized to work across multiple platforms – iPhone, iPad, Android and web-based devices. In fact, over 56% of our orders are placed on mobile devices. From a user-experience standpoint, text expansion and contraction across platforms and languages become a significant issue: European languages such as French, German and Italian may require up to 30% more space than English, while double-byte languages such as Chinese and Japanese will require less.
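The expansion problem above can be budgeted for mechanically. Here is a small illustrative sketch – the factors are rough industry rules of thumb (the 30% European expansion mentioned above, less for double-byte scripts), not measured zulily data – that estimates whether a translated string will still fit a fixed-width UI slot:

```python
# Rough per-locale expansion factors relative to English; illustrative only.
EXPANSION = {"en": 1.0, "fr": 1.3, "de": 1.3, "it": 1.3, "zh": 0.8, "ja": 0.8}

def fits(text_en, locale, max_chars):
    """Estimate whether the translation of an English string fits a slot
    of max_chars characters; unknown locales get a conservative 1.2x."""
    return len(text_en) * EXPANSION.get(locale, 1.2) <= max_chars
```

A check like this, run at copy-authoring time, flags strings that will overflow a button or banner in German long before a translator ever sees them.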

The content on our sites is produced in-house each day by our own talented copywriters and editors. Not unlike publishing a newspaper, the deadline for a 6 a.m. launch of fresh, new product styles is typically the night before. Last-minute edits can happen as late as midnight! While this makes for an exciting and dynamic environment, it requires some of the brightest engineering and operational minds on the planet to bring it all together with the quality and performance we expect.

From a technical perspective, our recent global expansion to Mexico, Hong Kong and Singapore faced typical localization hurdles. We needed to implement standard solutions: accommodating additional address line fields in the shipping address, ensuring proper currency symbols are displayed, and adding rules that allow us to process orders without postal codes, which are not required in Hong Kong.
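Rules like these are usually captured in per-country configuration. The following is a hypothetical sketch – the field names and rule values are illustrative assumptions, not zulily’s actual checkout schema – showing how postal-code and address-line rules from the countries above might be expressed and enforced:

```python
# Hypothetical per-country checkout rules; values are illustrative.
COUNTRY_RULES = {
    "US": {"postal_code_required": True,  "currency": "USD", "address_lines": 2},
    "MX": {"postal_code_required": True,  "currency": "MXN", "address_lines": 3},
    "HK": {"postal_code_required": False, "currency": "HKD", "address_lines": 3},
    "SG": {"postal_code_required": True,  "currency": "SGD", "address_lines": 2},
}

def validate_shipping_address(country, address):
    """Return a list of validation errors for the given address dict."""
    rules = COUNTRY_RULES[country]
    errors = []
    if rules["postal_code_required"] and not address.get("postal_code"):
        errors.append("postal_code is required")
    if len(address.get("lines", [])) > rules["address_lines"]:
        errors.append("too many address lines")
    return errors
```

Keeping the rules in data rather than branching code means entering a new market is a configuration change, not a code change.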

Address Field Requirement Example (original table not reproduced; bold entries marked fields unique as compared to the U.S. requirements).

As we continue to move into new markets we’ll be driven to apply new solutions to traditional localization problems simply based upon the sheer volume of content and arduous daily production requirements. These two forces alone combine to drive creativity, invention and technological breakthroughs that will accelerate our growth and expansion. Our team at zulily is now exploring various strategies in engineering and operations to tackle the local/regional cultural differences across markets while also bringing zulily to our customers in their own language. We continue to hire the brightest minds to help us invent solutions for a new way of shopping. At zulily, we tell our customers ‘something fresh every day’. Our engineers enable that reality by creating something fresh every day. Our ambition to grow globally will give everyone at zulily that opportunity.