How to UX without interface

Allegro is a leading Central European e-commerce platform, offering a vast diversity of new
and pre-owned products. The search engine is the main entry point to the allegro.pl product stock.
Designing a set of UX metrics for a SaaS solution or a social networking site is a must. Hardly anyone
would question whether it is worth the time to measure the effect of changes through an A/B test or
a focus group and to monitor the impact on metrics. When it came to the search engine, a back office
product with almost no interface, we had our doubts. We are the search team behind the Allegro search
engine. This is the complete interface of our product:

And that is it. Not much is visible to users, but huge challenges are hidden behind it.

We hid a mistake in this article, and we have a surprise for those who find it (details at the end).

Ghost in the machine

In the past we focused on stability and response time metrics related to our infrastructure. This was the
right approach and the fundamental starting point. There is no point in measuring Net Promoter Score
or user delight if the search has stability issues, right? This is especially true
with an infrastructure as vast as ours, in terms of both hardware and software. We currently support:

all this to assure 99.95% availability with a maximum response time of 2 seconds.
The logic behind the sorting scenarios for Best Match (the relevance sorting order of search results and
category listings) is powered by several feeders and workers that provide and calculate data for our
Lambda architecture solutions based on Badoop, Spark and Kafka.

Based on this technical infrastructure we started to measure and monitor several metrics, such as turnover and
the number of transactions (a constant A/B test with 4 treatment groups and 1 control group), response times
(Zabbix, etc.) and measurements with URLs and cookies. Finally, we created a dedicated dashboard in our
Business Intelligence solution. But in time, these metrics proved to be insufficient.
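A constant A/B test like the one mentioned above needs a stable way of deciding which group a visitor belongs to. The snippet below is only a minimal sketch of one common technique, deterministic hash-based bucketing, and not a description of our production setup. The group names, the `experiment` label, the use of a cookie-like `user_id` and the equal group sizes are all assumptions made for illustration.

```python
import hashlib

# Hypothetical group names: 1 control group and 4 treatment groups.
GROUPS = ("control", "treatment_1", "treatment_2", "treatment_3", "treatment_4")


def assign_group(user_id: str, experiment: str = "best_match") -> str:
    """Map a user (e.g. a cookie identifier) to one of the groups.

    The same user_id and experiment name always give the same group,
    so a returning visitor keeps seeing the same variant.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and fold it into a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(GROUPS)
    return GROUPS[bucket]


if __name__ == "__main__":
    for uid in ("cookie-123", "cookie-456", "cookie-789"):
        print(uid, "->", assign_group(uid))
```

Hashing the experiment name together with the identifier means that a different experiment would shuffle users into different groups, which helps keep experiments independent of each other.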

Reaching beyond RPS

We concluded that our current set of metrics shows us only part of the picture. The number
of transactions and turnover are too general and fluctuate too much over time to reflect users’ satisfaction.
They do not allow us to evaluate the effect of our smaller changes, and they are too coarse to show us the
subtleties of users’ reactions. We also could not see how many users received no search results at all, nor
analyze the reasons for it. Darn! We had to think of a new approach, and instead of monitoring Solr and our
single search field, we decided to spy on real humans.

Start, fail and proceed

The first attempt to implement user-oriented metrics took a lot of time and ended in failure, as we had insufficient
know-how to do this with only developers and a product owner. We drew conclusions and asked our UX Division for help.
Fortunately they also felt that search was an unjustly UX-neglected area and were happy to assist. We received the
support of dedicated usability and user survey specialists. Piotr and Alicja from the User eXperience team joined us
and became part-time members of our technical team, supplying us with the user perspective and an array of testing
tools. Yeah!

Stepping out of the comfort zone

Selecting the most relevant metrics was probably the most difficult part, second only to comprehending how badly we
needed them. We wanted tools that would show us a perspective completely detached from transactional data and focused
instead on the users’ point of view. We believe that customer satisfaction is far more important than an immediate
increase in turnover: it is widely regarded as a leading indicator of consumer purchase intentions and loyalty. And a
loyal customer is far more likely to make further purchases in the future.

How we fell in love with HEART

Lately our organization has become more focused on measuring user experience not only qualitatively, but also
quantitatively. This lets us collect data automatically and often, and interpret UX data unambiguously: everyone
who accesses the report can easily say whether the quality of using our product increased or decreased during the
last few weeks. Using Google’s framework called HEART seemed a great way to start. To make the most of HEART you
have to work in teams of people with different skills, who debate together what the signals of users’
product-related happiness, engagement, adoption, retention and task success are, and which metrics measure them.
Afterwards data analysts do the magic and include all of the information in easily accessible and readable reports
updated daily or weekly. HEART seems like a reasonable methodology for almost every part of Allegro.

Trouble in paradise

Well, almost… HEART for a search engine is a tough nut to crack. First of all, everybody knows and can identify a
search engine. Second, the vast majority of our users regard it as the only way to explore Allegro, so they use it
regardless of the possible negative emotions it causes. How can we talk about adoption, retention or engagement in this
case? Also, only the search engine’s text input is visible to our users; every other aspect is hidden. Where does it
demonstrate its quality then? Should we measure success by listings, offer views or purchases? How can we
minimize the bias related to different designs of these pages and types of items sold at allegro.pl? After a few intense
meetings we agreed to use HEART only as an inspiration and find a custom way to accurately measure the quality of our
search engine’s UX.

After many long and lively debates we chose 10 main metrics which will allow us to approach search results differently
and from a broader perspective, such as:

how long it takes the user to find an interesting offer, from the moment the user enters the phrase to the moment they open an offer

satisfaction from the search results (measured by surveys)

percentage of queries returning no results

number of users who have seen an empty search result at least once today

etc…
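To make the metrics listed above more concrete, here is a minimal sketch of how three of them could be computed from a search log. The `SearchEvent` record and all of its field names are a hypothetical schema invented for this example, and the choice of the median for the time-to-offer metric is also an assumption. Survey-based satisfaction is left out because it does not come from the log.

```python
import statistics
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class SearchEvent:
    """One search request (hypothetical log schema)."""
    user_id: str
    query: str
    searched_at: datetime
    result_count: int
    # When the user first opened an offer from this listing, if ever.
    first_offer_view_at: Optional[datetime] = None


def zero_result_rate(events):
    """Percentage of queries that returned no results."""
    if not events:
        return 0.0
    empty = sum(1 for e in events if e.result_count == 0)
    return 100.0 * empty / len(events)


def users_with_empty_result(events, day):
    """Number of distinct users who saw an empty listing at least once on `day`."""
    return len({e.user_id for e in events
                if e.result_count == 0 and e.searched_at.date() == day})


def median_seconds_to_offer(events):
    """Median time from entering a phrase to opening an offer, in seconds."""
    durations = [(e.first_offer_view_at - e.searched_at).total_seconds()
                 for e in events if e.first_offer_view_at is not None]
    return statistics.median(durations) if durations else None


# Made-up sample data, for illustration only.
events = [
    SearchEvent("u1", "rower", datetime(2015, 4, 1, 10, 0, 0), 120,
                datetime(2015, 4, 1, 10, 0, 30)),
    SearchEvent("u1", "rowerr", datetime(2015, 4, 1, 10, 1, 0), 0),
    SearchEvent("u2", "lego", datetime(2015, 4, 1, 11, 0, 0), 45,
                datetime(2015, 4, 1, 11, 0, 10)),
    SearchEvent("u3", "xyzzy", datetime(2015, 4, 1, 12, 0, 0), 0),
    SearchEvent("u3", "xyzzy 2", datetime(2015, 4, 2, 12, 0, 0), 0),
]

if __name__ == "__main__":
    print(zero_result_rate(events))
    print(users_with_empty_result(events, date(2015, 4, 1)))
    print(median_seconds_to_offer(events))
```

Counting distinct users rather than empty responses matters here: one user retyping a misspelled phrase five times inflates the query-level rate, but still counts as a single affected person.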

At the moment we are setting up a dashboard in our room to display the new metrics in an appealing and colorful
manner, all in one place. This will be a clear and easily accessible way of monitoring our users’ satisfaction.
Everyone in our room can see how happy (or not) our users are today. They can monitor the impact of deployed
changes and optimize according to users’ responses.

Future of measuring user satisfaction

It is too early to fully analyze the results of our new metrics. Our HEART-inspired approach towards measuring
users’ experience has just been released, and as of today we have only fractional data, with no historical trends.
For our initial metrics we have chosen empty search responses and weekly aggregated users’ satisfaction. Along with
these metrics, we strive to collect feedback from users and to analyze the queries that ended up with an empty
listing. Thanks to this we have already managed to find a couple of causes of empty listings and will continue our
work towards combating the empty search results screen. In the future we plan to develop more metrics; this is a
constant work in progress.
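As an illustration of what “weekly aggregated users’ satisfaction” can mean in practice, the sketch below groups survey answers by ISO calendar week and averages them. The 1 to 5 score scale, the `(day, score)` input shape and the use of a plain mean are assumptions made for this example, not a description of our actual survey.

```python
from collections import defaultdict
from datetime import date


def weekly_satisfaction(responses):
    """Average survey score per ISO week.

    `responses` is an iterable of (day, score) pairs, where `day` is a
    datetime.date and `score` is a number on a hypothetical 1-5 scale.
    Returns a dict mapping (iso_year, iso_week) to the mean score.
    """
    buckets = defaultdict(list)
    for day, score in responses:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        buckets[(iso[0], iso[1])].append(score)
    return {week: sum(scores) / len(scores)
            for week, scores in sorted(buckets.items())}


# Made-up survey answers, for illustration only.
responses = [
    (date(2015, 4, 1), 4),
    (date(2015, 4, 3), 2),
    (date(2015, 4, 6), 5),
    (date(2015, 4, 8), 4),
]

if __name__ == "__main__":
    for week, mean_score in weekly_satisfaction(responses).items():
        print(week, mean_score)
```

Aggregating by week smooths out day-to-day noise, at the cost of reacting more slowly to a single bad deployment.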

Bonus: found our “mistake”?

In this article we misspelled the name of one of the tools we use. If you know the
correct name, send it to us through this form; the first three people will have the chance to name one of our servers
and become its godfather / godmother! Send your suggested name with your answer. The form will be active throughout
April 2015.