Translation Error Rate (TER) is a metric used by Machine Translation specialists to determine the amount of post-editing required for machine translation jobs. This automatic metric measures the number of edit actions required to bring a translated segment in line with one of the reference translations. It is quick to use, language independent and corresponds well with post-editing effort. When tuning your KantanMT engine, we recommend a maximum score of 30%. A lower score means less post-editing is required!
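As a rough illustration, the core of the calculation can be sketched in a few lines of Python. Note that full TER (as defined by Snover et al.) also counts block shifts of whole phrases as single edits; this simplified sketch counts only word-level insertions, deletions and substitutions, and the example sentences are invented:

```python
# Simplified TER sketch: word-level edit distance divided by reference length.
# Full TER also allows phrase "shift" edits, which this illustration omits.

def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def simplified_ter(hypothesis, reference):
    """Edits needed to match the reference, per reference word."""
    hyp, ref = hypothesis.split(), reference.split()
    return word_edit_distance(hyp, ref) / len(ref)

score = simplified_ter("the cat sat on mat", "the cat sat on the mat")
print(f"TER = {score:.0%}")  # one insertion over six reference words: 17%
```

A segment scoring 17% would sit comfortably below the recommended 30% ceiling, signalling a light post-editing load.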

How to use TER in KantanBuildAnalytics™

The TER scores for your engine are displayed in the KantanBuildAnalytics™ feature. You can get a quick overview or snapshot in the Summary tab. For a more in-depth analysis, and to calculate the amount of post-editing required for the engine’s MT output, select the ‘TER Score’ tab, which takes you to the ‘TER Scores’ page.

Place your cursor on the ‘TER Scores Chart’ to see the Translation Error Rate for each segment. If you hover the cursor over a segment, a pop-up will appear on your screen with details of that segment under these headings: ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.

To see a breakdown of the TER scores for each segment in table format, scroll down. You will see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.

To see an even more detailed breakdown of a particular segment, click on the triangle beside its number.

To download the TER scores of all segments, click on the ‘Download’ button on the ‘TER Scores’ page.

This is one of the many features included in KantanBuildAnalytics, which can help the Localization Project Manager improve an engine’s quality after its initial training. To see other features used in KantanBuildAnalytics please see the links below.

Regardless of what we do in our professional careers, there is one thing that we all have in common: the desire to get more done, be more productive and achieve the results we want…yesterday! For Machine Translation or Localization engineers, this means finding the quickest way to get their MT engines ready to translate files.

KantanBuildAnalytics™ is a feature that solves the problem of how to quickly improve an engine after its initial training with minimum cost and effort. This post will teach you how to use KantanBuildAnalytics to get your KantanMT engines ready to translate faster.

Let’s look at some of the features available in KantanBuildAnalytics:

Fluency Analysis – work with segment level BLEU scores to find out how relevant your training data is and how it impacts engine fluency.

Training Data Reject Reports – see any training data segments that have been rejected from the engine, and the reason for their rejection, in a downloadable Excel file.

Timeline – like your Facebook timeline, see your MT engine’s history, with every action taken to improve the engine. It even lets you archive versions, so if something goes wrong in retraining you can go back to an earlier version.

How to use KantanBuildAnalytics

After logging in to KantanMT.com, you will be directed to the ‘My Client Profiles’ page, landing in the ‘Client Profiles’ section. The last profile you were working on will be ‘Active’.

My Client Profiles Dashboard, KantanMT.com

To use KantanBuildAnalytics with a profile other than the ‘Active’ profile, click on the profile you want to use, and make sure that the selected profile has at least one successfully completed ‘Build’ job.

Then click on the ‘Build Analytics’ tab on the ‘My Client Profiles’ page.

Selecting KantanBuildAnalytics™ on an active KantanMT profile.

This will take you to the ‘KantanBuildAnalytics’ page, where the ‘Summary’ tab is selected by default. The Summary tab gives you an overview of your KantanMT engine’s performance measurements.

And of course, for the Excel lovers, it’s possible to download the full summary report as an Excel spreadsheet, so the engine’s performance information can be analysed to suit your organisation’s specific style requirements. To download the report, click on the ‘Download summary report’ button.

To deep tune the engine, click on the ‘Deep Tune’ button. Be warned, though: this is a thorough tuning of the engine and will take a lot of time; the bigger the MT engine, the longer the tuning process takes.

Download KantanBuildAnalytics Summary Report

A ‘Tune Engine’ pop-up window will now appear on your screen. Click on the ‘OK’ button if you want to deep tune, or on ‘Cancel’ if you no longer wish to deep tune the engine.

To see how many segments in the training data were rejected, click on the ‘Rejects Report’ tab. This takes you to the ‘Rejects Report’ page, where you will see a list of segments and the reasons they were rejected.

Generating your KantanBuildAnalytics Rejects Report

To download an Excel version of the rejects report, click on the ‘Download’ button.

These features help MT or Localization Engineers build and develop better performing KantanMT engines. Read more about these features below, or contact a member of our sales team to start using our platform now!


In our last blog post I discussed some of the Key Performance Indicators (KPIs) used by SMT developers to estimate the performance quality of their KantanMT engines. These KPIs help developers understand what aspects of their SMT engine are performing well and which need improvement.

In this blog I’m going to dive deep into F-Measure, a KPI which can provide insight into the relevancy of your training data, the engine’s overall performance, and the suitability of an SMT engine for a particular domain or content type.

What is F-Measure?

F-Measure is a KPI which measures the precision and recall capabilities of an SMT system. It can also be viewed as a measure of translation accuracy and relevancy.

Bursting Red Balloons

In SMT, we can look at precision as a percentage of retrieved words that are relevant and recall (sometimes referred to as sensitivity) as the percentage of relevant words that are retrieved.

This is best explained using a thought experiment: imagine a box containing 10 red balloons and a few green balloons. Suppose we burst 5 balloons at random and 3 of these are red – we can calculate our precision as 3/5 (60%) and our recall as 3/10 (30%).

These two calculations offer a good estimate of how accurately we are able to burst red balloons – the higher these values, the better the chances that we will burst more red balloons.

So what has this thought experiment got to do with SMT systems?

Precision & Recall

Precision and recall are closely related to the notion of accuracy. Since SMT systems are based on pattern recognition, it is helpful to see how accurate they are at retrieving words and, more importantly, how relevant this retrieval is.

F-Measure is a calculation of both precision and recall and is expressed as a ratio.
If we go back to our balloon bursting experiment, precision was calculated as 60% and recall as 30%. To combine these two values into a single ratio, we can use the F-Measure formula as follows:

F-Measure = 2 × (precision × recall) / (precision + recall) = 2 × (0.6 × 0.3) / (0.6 + 0.3) = 0.4

Source: Statistical Machine Translation by Philipp Koehn

In simple terms – we’re just not good at bursting red balloons 🙂
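The harmonic-mean calculation above is simple enough to sketch in Python. The numbers come straight from the balloon experiment; only the function name is my own invention:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-Measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balloon experiment: 3 of the 5 burst balloons were red (precision 60%),
# out of 10 red balloons in the box (recall 30%).
precision = 3 / 5
recall = 3 / 10
print(f"F-Measure = {f_measure(precision, recall):.2f}")  # F-Measure = 0.40
```

Because it is a harmonic mean, F-Measure is dragged down sharply by whichever of the two values is weaker, which is exactly why our poor recall hurts the overall score.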

F-Measure and SMT engines

Using F-Measure we can get a general sense of the accuracy with which an SMT engine can retrieve words. If we examine the distribution of these scores across a set of reference translations, we can gain helpful insights which we can use to improve the training data and boost engine performance.

Here’s an example of an F-Measure distribution:

Screenshot of KantanBuildAnalytics F-Measure distributions

The overall F-Measure score for this particular SMT engine is 72%. This is a good value, and we can say that this engine is highly accurate at retrieving words for its target language and domain, i.e. it has high precision in word retrieval and the retrieved words are relevant to the target domain.

Also, the distribution of these scores across the reference translation set shows that the majority of them (60% of the total reference translation set) are in the 70-100% range. The distribution graph also shows that approximately 20% of the reference translations score less than 40%. By examining these low-scoring segments we can check whether words/terminology are missing, and then create additional training material to improve the performance of the engine.

Closing remarks…

F-Measure is a good starting point for understanding the quality of an SMT engine, but it has a major shortcoming: while it measures the recall and precision capabilities of an SMT engine, it doesn’t take into account the order in which the words are retrieved.

So, as in the famous sketch with Andre Previn and Morecambe and Wise, we may know all the notes but not necessarily in the right order:

One more thing…
In order to improve the F-Measure score, an engine must become aware of word order, which is sometimes referred to as fluency. In the next post I will look at BLEU (Bilingual Evaluation Understudy) and examine how this metric helps us to further understand the quality of SMT engines.

KantanMT’s new BuildAnalytics technology illustrates the distribution of F-Measure, BLEU, and TER scores across our members’ SMT engines. It also generates a Gap Analysis, highlighting missing words in members’ training data, and provides KantanMT members with training data rejects reports – great information that helps members of KantanMT.com develop a deep understanding of how their SMT engines work, and how to improve their performance.