How do we scale translation quality and speed at Unbabel?

July 25, 2017

Building the world’s translation layer is a fantastic mission. For us, it means becoming a transversal, pervasive service that can remove communication barriers anywhere, anytime, using a combination of artificial intelligence technologies (machine translation and an assortment of machine learning mechanisms) and a growing global community of bilinguals.

This also means ingesting, processing and distributing a massive amount of data every second while making sure that our customers’ standards for quality and speed are met. In simplified form, a translation flows through our system like this:

A translation job is requested via our API or through integrations with platforms such as Salesforce and Zendesk;

Our AI-powered translation pipeline picks up the job and sends it for machine translation through an engine that has been customised for the content type and, in some cases, for the specific client;

As soon as the machine translation engine finishes, the Quality Estimation (QE) system assesses whether the output quality is good enough and, if so, the translation is shipped straight back to the system that requested it;

When the QE system decides that a human is needed for final adjustments, the text is broken into several pieces and sent to our community;

As the community finishes the last adjustments, the pieces are grouped back together and the final text is shipped accordingly.
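To make the flow above more concrete, here is a minimal Python sketch of the routing decision. It is not our production code: the `QE_THRESHOLD` value, the naive sentence-based chunking and the `fake_translate`, `fake_estimate_quality` and `fake_post_edit` stand-ins are all hypothetical, chosen only so the example runs end to end.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical cut-off: QE scores at or above this ship without human review.
QE_THRESHOLD = 0.8


@dataclass
class Job:
    job_id: str
    source_text: str
    content_type: str  # e.g. "support-ticket"; used to pick a customised engine
    client: str


def split_into_chunks(text: str, max_sentences: int = 2) -> List[str]:
    """Naively split text into chunks of at most `max_sentences` sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def process_job(
    job: Job,
    translate: Callable[[str, str, str], str],
    estimate_quality: Callable[[str, str], float],
    post_edit: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (final_text, route), where route is "machine" or "human"."""
    mt_output = translate(job.source_text, job.content_type, job.client)
    if estimate_quality(job.source_text, mt_output) >= QE_THRESHOLD:
        return mt_output, "machine"  # good enough: ship straight back
    # Otherwise split, send each piece to the community, then regroup.
    edited = [post_edit(chunk) for chunk in split_into_chunks(mt_output)]
    return " ".join(edited), "human"


# Hypothetical stand-ins so the sketch runs without real services.
def fake_translate(text: str, content_type: str, client: str) -> str:
    return text.upper()


def fake_estimate_quality(source: str, target: str) -> float:
    return 0.95 if len(source) <= 40 else 0.5


def fake_post_edit(chunk: str) -> str:
    return chunk.lower()


if __name__ == "__main__":
    job = Job("j1", "Hello there.", "support-ticket", "acme")
    print(process_job(job, fake_translate, fake_estimate_quality, fake_post_edit))
```

In this sketch, raising the threshold sends more jobs to human post-editors and lowering it ships more machine output directly, which is exactly the quality/speed trade-off the QE step has to manage.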

Even simplified, that’s rather a lot of moving parts from one end to the other. So how do we maintain quality and speed at scale with such a setup?

Scaling quality

The Unbabel AI evolves continuously, not only because our AI teams work every day to improve the core algorithms, but also because it keeps learning from its own results and from the proprietary data we gather.

One type of data we have access to is linguistic annotations of the work we’ve translated. A global team of expert linguists works around the clock to annotate our translations with qualitative and quantitative information, which lets us map what is already working and what can be improved, further enhancing the pipeline’s ability to produce higher-quality results consistently.
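As an illustration of how annotations can become numbers a pipeline can track, the sketch below weights each annotated error by severity and normalises by word count. The severity weights are an assumption in the spirit of MQM-style (Multidimensional Quality Metrics) scoring; this post does not say which scheme or weights are used, and `Annotation`, `quality_score` and `errors_by_category` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

# Assumed severity weights in the spirit of MQM-style scoring; illustrative only.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass(frozen=True)
class Annotation:
    category: str  # e.g. "accuracy", "fluency", "terminology"
    severity: str  # one of the keys in SEVERITY_WEIGHTS


def quality_score(annotations: Iterable[Annotation], word_count: int) -> float:
    """Score out of 100: subtract the severity-weighted penalty per 100 words, floored at 0."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(SEVERITY_WEIGHTS[a.severity] for a in annotations)
    return max(0.0, 100.0 - 100.0 * penalty / word_count)


def errors_by_category(annotations: Iterable[Annotation]) -> Dict[str, int]:
    """Count annotated errors per category, to see where the pipeline struggles."""
    return dict(Counter(a.category for a in annotations))


if __name__ == "__main__":
    notes = [Annotation("fluency", "minor"), Annotation("accuracy", "major")]
    print(quality_score(notes, word_count=100))  # 94.0
    print(errors_by_category(notes))
```

The per-category counts are what would point to "what can be improved", while the single score makes it easy to compare results over time.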

We can also revisit our original machine learning models and update them on a regular basis. More data means better models, which means better machine translation quality.

AI training automation

The key to having a healthy development pipeline is to make small increments and deploy often. This is a general best practice and can be found in any DevOps book. We take this mantra one step further and apply it to our AI training as well.

We automated the full process by having an autonomous scheduling pipeline that:

Runs data through an anonymisation process, so that any personally identifiable information or otherwise sensitive data is removed; we don’t need or want this data, and removing it eliminates the risk of it moving beyond where it should stay;

Retrieves the results of model training and checks them against a sanity test suite;

Updates AI servers with the new models.
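Anonymisation in a real pipeline needs far broader coverage than a couple of patterns. Purely as a minimal sketch, assuming simple regular expressions for two kinds of personal data (email addresses and phone numbers), a redaction pass could look like the following. The patterns and the `<EMAIL>`/`<PHONE>` placeholder tokens are hypothetical; they would miss some personal data and also redact some harmless digit runs, such as ISO-formatted dates.

```python
import re

# Hypothetical patterns; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def anonymise(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    # Emails first, so digits inside an address are never treated as a phone number.
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


if __name__ == "__main__":
    print(anonymise("Contact jane.doe@example.com or +351 912 345 678."))
    # Contact <EMAIL> or <PHONE>.
```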

Since we deliver client- and category-based domain adaptation, this process runs for every model, keeping our machine translation engines up to date with the freshest data we can provide.
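Because the same steps run for every client- and category-specific model, a scheduler can simply loop over domains and promote a new model only when it passes the sanity checks. The sketch below assumes a single hypothetical quality `score` per model and a maximum allowed regression; the domain names, the `train` callable and the `passes_sanity` rule are placeholders, not a description of the actual test suite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Model:
    domain: str   # hypothetical key, e.g. "client-a/support"
    version: int
    score: float  # hypothetical offline quality metric; higher is better


def passes_sanity(candidate: Model, current: Optional[Model], max_regression: float = 0.5) -> bool:
    """Accept a candidate unless it is worse than the live model by more than max_regression."""
    if current is None:
        return True  # nothing deployed yet for this domain
    return candidate.score >= current.score - max_regression


def retrain_all(
    domains: Iterable[str],
    train: Callable[[str], Model],
    deployed: Dict[str, Model],
) -> Dict[str, str]:
    """Retrain every domain model; promote only candidates that pass the checks."""
    outcome = {}
    for domain in domains:
        candidate = train(domain)
        if passes_sanity(candidate, deployed.get(domain)):
            deployed[domain] = candidate  # the "update AI servers" step
            outcome[domain] = "deployed"
        else:
            outcome[domain] = "kept previous"  # the live model keeps serving
    return outcome
```

Keeping the previous model live when a candidate fails means a bad training run degrades nothing; it simply waits for the next scheduled cycle.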

Delivery

Our software engineering architecture is designed to scale painlessly, both vertically and horizontally. With a microservices model, we can scale very specific areas of the translation pipeline up and down independently, which boosts overall efficiency.
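One common way to scale a single microservice horizontally is a proportional rule that sizes the replica count so per-replica load moves toward a target. The core of the rule documented for the Kubernetes Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). This post doesn’t say which orchestrator or metrics are used, so treat the function, service names and numbers below as hypothetical.

```python
import math
from typing import Dict, Tuple


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical services: (current replicas, queued jobs per replica).
    services: Dict[str, Tuple[int, float]] = {
        "machine-translation": (3, 90.0),
        "quality-estimation": (2, 240.0),
    }
    for name, (replicas, load) in services.items():
        print(name, desired_replicas(replicas, load, target_metric=100.0))
    # machine-translation 3
    # quality-estimation 5
```

Because each service is scaled from its own metric, a backlog in quality estimation adds QE replicas without touching the translation engines, which is the kind of targeted scaling a microservices model allows.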

The majority of our servers run on container technologies, so updating an AI server is as easy as updating any other part of the system. Microservices and containers go hand in hand with the concept of an immutable architecture, in which all stateless parts are disposable, replaceable and upgradeable in real time and, with the proper methodologies, without downtime.
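The "no downtime" part comes from never mutating a running instance: a new container is started from the new image and checked, and only then is an old one retired. The following is a toy simulation of that rolling replacement, assuming a hypothetical `is_healthy` check and made-up image tags like `mt-engine:v2`; container orchestrators provide rolling updates with many more safeguards than this.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)  # frozen: an instance is replaced, never modified in place
class Instance:
    name: str
    image: str


def rolling_update(
    instances: List[Instance],
    new_image: str,
    is_healthy: Callable[[Instance], bool],
) -> Tuple[List[Instance], bool]:
    """Replace instances one at a time; return (pool, completed)."""
    pool = list(instances)
    for old in instances:
        replacement = Instance(name=f"{old.name}-next", image=new_image)
        if not is_healthy(replacement):
            # Abort: the remaining old instances are still serving traffic.
            return pool, False
        pool.append(replacement)  # start the new instance first...
        pool.remove(old)          # ...then retire the old one, so capacity never drops
    return pool, True


if __name__ == "__main__":
    fleet = [Instance(f"mt-{i}", "mt-engine:v1") for i in range(3)]
    updated, ok = rolling_update(fleet, "mt-engine:v2", lambda inst: True)
    print(ok, [inst.image for inst in updated])
```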

Scaling speed

For everything we’ve covered so far, our response times are essentially real-time. However, once these outputs are distributed to our community of bilingual post-editors, there may still be considerable work to do to reach the quality our customers expect and demand from us.

It’s not that human beings are slow at translating (indeed, most research shows they are many times faster at post-editing machine translation than at translating from scratch), but understanding the context, potential technical detail, linguistic ambiguities and other cultural sensitivities will always take some time if the result is to be acceptable.

Improving Unbabel editor user interfaces

Whether it’s our web-based interface or mobile apps on the two major platforms, the user experience we offer our global community of more than 50,000 bilinguals is one of the most important pieces of the whole puzzle.

If we want to continue to scale high quality in the fastest possible time, then we need to relentlessly improve the tools so our editors can work on translation tasks anytime, anywhere.

We are in constant contact with our incredible, engaged community, and we take pride in listening to and acting on their feedback. We run user testing on our existing interfaces and on the ones we’re considering deploying, and we measure how editors interact with the platform to find novel ways to improve things even further. Anything that removes friction and enables our community to do better work, faster and with greater satisfaction, is absolutely crucial to our mission.

AI+Human = quality and speed at scale

Quality and speed might sometimes be perceived as conflicting goals, and scaling both is indeed an immense technological challenge, but it’s key to our success as a business.

We’re often asked, “Won’t translation be solved by AI alone?” But decades of research, and billions spent on it, have shown that there will always be a gap in fully grasping the ambiguity, idiosyncrasies and paradoxes of human language.

Only by nurturing the symbiotic relationship between artificial and human intelligence can we build the world’s translation layer.