M

said
over 2 years ago

I share this question!

Igor Kmitowski

said
over 2 years ago

Hello,

CafeTran offers its own machine translation engine, which consists of several features: exact, fuzzy, fragment, and subsegment matching, as well as auto-assembling and auto-suggestion. Please see this link for details of the subsegment thresholds: http://cafetran.wikidot.com/subsegment-match.

Igor

kamonchanok.k15

said
over 2 years ago

Hi Igor,

Thank you.

I've come across that. Unfortunately, I still don't understand how to achieve what I want.

I want CT to only show the yellow hits (i.e. the ones with full target text) as most of the time the orange hits are wrong, and I don't want CT to use any guessed subsegment in AA.

Kwang

Igor Kmitowski

said
over 2 years ago

Hi Kwang,

Yes, the 'guessing' results may vary depending on the language pair. The accuracy of hits should increase as your TM gets larger and larger since CT has more data to analyze. The blue hits are low-accuracy "guesses", the orange ones are medium-accuracy while the purple ones are the highest accuracy hits. Hover the mouse over a hit number and you will see the full context of the 'guess'.

Igor

M

said
over 2 years ago

Sorry to cut in, but I haven't recognized these colors.

Where do they appear?

Masato

kamonchanok.k15

said
over 2 years ago

Hi Masato

They appear where your fuzzy matches/fragments are shown.

You have to use Matching Type: Fuzzy and Hits, though, for that TM.

Kwang

kamonchanok.k15

said
over 2 years ago

Hi Igor,

So it seems I have to increase the Subsegment to Virtual threshold to increase the probability of accurate hits.

Other questions now arise.

How can I allow CT to show only a subsegment/to treat it as a hit only when it contains more than, let's say, 5 words?

By enabling only Fuzzy, will CT still use subsegment matches in AA?

Is it possible to not allow CT to use subsegment matches in AA?

BTW, why do I not have blue hits, but I have magenta, orange, and yellow hits?

Kwang

Igor Kmitowski

said
over 2 years ago

The Subsegment to Virtual threshold tells the program that, after that number of hits, it should treat the subsegment the same as an exact fragment. It is then a sort of 'virtual' or 'guessed' exact fragment.

> How can I allow CT to show only a subsegment/to treat it as a hit only when it contains more than, let's say, 5 words?

Set the Subsegment to Auto threshold very high. Then CT will pick only 'sure' candidates for AA.
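As a rough illustration of how the two thresholds described above might interact, here is a minimal Python sketch. It is not CafeTran's code: the function name `classify_subsegment`, its parameters, and the use of `>=` for "after that number of hits" are all assumptions made for the example.

```python
def classify_subsegment(source_hits, virtual_threshold, auto_threshold):
    """Hypothetical reading of the two thresholds discussed in this thread.

    source_hits: number of TM hits found for the source subsegment.
    virtual_threshold: stands in for "Subsegment to Virtual threshold".
    auto_threshold: stands in for "Subsegment to Auto threshold".
    """
    return {
        # Assumption: at or above this count, the guess is treated like an exact fragment.
        "treated_as_exact_fragment": source_hits >= virtual_threshold,
        # Assumption: at or above this count, the guess may be used in auto-assembling.
        "used_in_auto_assembling": source_hits >= auto_threshold,
    }


# With a very high Auto threshold, even a frequent subsegment stays out of AA.
print(classify_subsegment(7, virtual_threshold=5, auto_threshold=1000))
```

Under these assumptions, raising the Auto threshold far above any realistic hit count effectively keeps guessed subsegments out of auto-assembling.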

Igor

Igor Kmitowski

said
over 2 years ago

> BTW, why do I not have blue hits, but I have magenta, orange, and yellow hits?

Blue hits turn to yellow when you set the dark theme.

Igor

kamonchanok.k15

said
over 2 years ago

Thank you!

M

said
over 2 years ago

A question about hits.

I understand hits are about subsegments or parts of a source sentence, but I don't understand how CT can identify subsegments of TM target sentences that correspond to those source subsegments (especially when Japanese is involved).

Is the hits feature mainly designed for language pairs with a word separator (space)?

Peace,

Masato

Igor Kmitowski

said
over 2 years ago

Hi Masato,

CafeTran uses two steps to detect hits for source subsegments in TM:

1. By looking for exact fragments in the TM. This should always produce an exact match for a given source subsegment.

2. By analyzing the frequencies of hits on the source and target sides. This is a statistical approach, and the results get more accurate as the number of hits increases. The different hit colors indicate the accuracy of the hit. The higher the number of source hits, the higher the probability that the target hit is accurate. Japanese hits are analyzed at the character level, while languages with a defined word separator are analyzed at the word level.
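To make the statistical step more concrete, here is a minimal Python sketch of one way such a frequency analysis could work at the character level. It is only an illustration, not CafeTran's actual algorithm: the names `char_ngrams`, `guess_target`, and `TM`, the n-gram length limits, and the Dice-style score are all assumptions chosen for the example.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=6):
    """Set of character n-grams in text (no word separator needed)."""
    return {text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)}


def guess_target(source_sub, tm, n_min=2, n_max=6):
    """Return (best target n-gram, source hits, score) for a source subsegment.

    tm is a list of (source_segment, target_segment) pairs.
    """
    hit_targets = [tgt for src, tgt in tm if source_sub in src]
    source_hits = len(hit_targets)
    if source_hits == 0:
        return None, 0, 0.0

    # Count, per segment, which target n-grams occur in the hit segments
    # and which occur anywhere in the TM.
    in_hits = Counter(g for tgt in hit_targets
                      for g in char_ngrams(tgt, n_min, n_max))
    in_all = Counter(g for _, tgt in tm
                     for g in char_ngrams(tgt, n_min, n_max))

    def score(gram):
        # Dice coefficient: high when the n-gram appears in (almost) exactly
        # the segments whose source contains the subsegment.
        return 2 * in_hits[gram] / (source_hits + in_all[gram])

    # Prefer the highest score; among ties, prefer the longer n-gram.
    best = max(in_hits, key=lambda g: (score(g), len(g)))
    return best, source_hits, score(best)


# Toy English-Japanese TM, invented for this example.
TM = [
    ("the cost of labor", "労働のコスト"),
    ("the cost of materials", "材料のコスト"),
    ("the price of materials", "材料の価格"),
    ("labor is cheap", "労働は安い"),
]

print(guess_target("the cost of", TM))
```

With only two source hits the guess rests on very little evidence, which matches the point above that accuracy improves as the number of hits grows. A word-level variant for languages with a word separator would tokenize on spaces instead of taking character n-grams.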

Igor

M

said
over 2 years ago

Hi Igor,

I'm asking this question because in most (not all) cases, the whole target segment (Japanese) is shown as a "guess" as follows.

So, first I thought that the "hits" function is designed to display TM target segments whose source segments contain certain words appearing in the current source segment, rather than finding possible subsegment pairs that could ultimately be used for auto-assembling.

Is it possible, if there is a large enough TM, for CT to pick up a certain part of the target segment as a possible Japanese equivalent for, say, "the cost of"?

Thanks always,

Masato

Igor Kmitowski

said
over 2 years ago

Hi Masato,

Yes, it is possible. One of the tuning options for the target hits is "Subsegment minimal length difference" in Edit > Options > Memory tab. As the two terms in an English-Japanese pair may differ significantly in length, try lowering this setting for your language pair.