In this paper, we present a Cross Language Information Retrieval (CLIR) approach using corpus driven query suggestion. We have used corpus statistics to gather a clue on selecting the right query terms when the translation of a specific query is missing or incorrect. The derived set of queries are ranked to select the top ranked queries. These top ranked queries are further used to perform query formulation. Using the re-formulated weighted query, we perform cross language information retrieval. The results are compared with the results of CLIR system with Google translation of user queries and CLIR with the proposed query suggestion approach. We have English and Tamil corpus of FIRE 2012 dataset and analyzed the effects of the proposed approach. The experimental results show that the proposed approach performs well with the incorrect translation of the queries.

All information may not be available in all languages. Suppose a user may query for some
information in one language. The information may not be present in that language but
may be available in another language that could fulfil their information needs. To
support users to access information present in a different language, we require
information retrieval systems for different language pairs. Such system are called
Cross Language Information Retrieval (CLIR) systems. The
language of the user query is referred to as Source Language (SL)
and the language in which information is sought is the Target
Language (TL).

In the simplest implementation of a CLIR system, a query given in the source language needs to
be translated in the target language. For this, one may use a SL-TL bilingual
dictionary or any other available SL-TL machine translation system. In Natural
Language Processing(NLP), the same concepts may be expressed by different terms or
phrases. This is called Synonymy. Also a term or a phrase can have
multiple meanings. This is referred to as polysemy. These
variations create problems for monolingual searches, but the effects are more in
cross language retrieval. The translated query may not be able to retrieve document
in the target language because the concepts may be expressed in the target language
using different terms. Secondly the bilingual SL-TL dictionary may be incomplete and
the query terms may not have right mapping to a query term in the target language.
Even if a dictionary is large, it may not have coverage for technical terms, named
entities and so on. Such terms occur very commonly in a user query. Thirdly the
SL-TL translation system may be inaccurate and the terms in the source language may
be translated to wrong terms in the target language.

There may occur several issues in the translation process of the query from SL to TL. The translation process may result in the following issues:

some query terms may not be translated because they are absent in the dictionary.

some query terms may be wrongly translated.

In some cases, even when a query term is translated from the source to the target language, the translated term may not be appropriate.

We have listed three queries in Figure 1. In this table, the first column shows the original query in Tamil language, the second column shows the actual query intent of user information needs, and the third column shows the dictionary based translation of these three query in the English.

Figure 1 List of query terms in Tamil and English with their meaning in the correct query context.

Let us look at the queries listed in Figure 1. While
using the dictionary based query translation, the query terms underlined in the
first column are not translated from source language to the target language. In the
first query, the query terms "vengai" has a correct translation in the
dictionary: leopard. But this query term has another correction
translation: vengai tree (a kind of tree known for its strength)
which is not found in the dictionary. In the context of the query, the term,
vengai tree is the right translation. In the second query, the
query terms doosu (dust in English) and
padindha (ingrained in English) are not found
in the dictionary. In the third query, the query term velli has
three different correct translations: day in a week or
moon or silver metal. Out of these three
senses, the query term moon is the correct translation and its
sense is appropriate to the actual context of the query.

There may also exist a case in which we may not be able to find the translation of a compound
term in the dictionary. For example, the Google translation tool may not be able to
translate a term: marachchattam in the second query. This term is a
compound word composed of the terms: maram (tree
in English) and chattam (this term has two translations in English:
law and reaper or frame). In
this case, wooden frame or wooden reaper is the
correct translation that fits with the context of the actual query.

Sometimes, the correct query translation can be obtained by referring to the entire corpus. Since we can only retrieve documents present in the corpus, we may look for a set of query terms which co-occur together in the given corpus.

In this work, we want to use corpus based evidence for translating the query so that the query is appropriate with respect to the corpus. It is not about whether the translation is good or bad, but we are concerned about whether the query retrieves the documents or not. Secondly, the retrieved documents should satisfy the information needs as expressed in the original query. Here we propose a general methodology that makes use of the corpus in order to find the translation of the query terms used for document retrieval task.

In this paper, we have worked on cross language information retrieval with Tamil-English
datasets of FIRE corpus1 and run
experiments for adhoc news document retrieval. In this work, the
query is given in Tamil language which is the source language and documents
retrieved are in English which is our target language.

The paper is organized as follows: The next section presents a comprehensive the review of literature related to various strategies in cross lingual information retrieval. Section III presents motivations and objective of this research work. Then we describe the underlying cross lingual information retrieval problem and the issues associated with CLIR systems in Section IV. Then in Section V, we describe our proposed CLIR approach in the context of Indian language pairs. Then we present our experimental results in Section VI. Finally Section VII concludes the paper.

II. Existing work

We have presented some work related to the query translation issues in cross language information retrieval. Here we describe some dictionary based approaches related to our proposed approach.

A bi-lingual machine readable dictionary or thesaurus based query translation is well studied for different language pairs in cross language information Retrieval [1], [2], [3], [4].

Xi and Cho [5]. proposed a method to automatically construct a dictionary based on co-occurrence from English-Chinese parallel corpus for query translation. They used different approaches to calculate the candidate translation equivalent pair correlation degree.

Hull and Grefenstett [6] developed a multilingual IR systems at Xerox which translated French queries and English documents. Their approach works as follows: after morphological analysis, each term is replaced with its inflectional root and the system forms a translated query by looking at each root in the bilingual transfer dictionary. Missing terms are kept unchanged in the translated query. This translated query is then used to perform monolingual document retrieval.

Ballesteros and Croft [2] proposed phrasal translation approach to handle multi-term phrases in cross language information retrieval. In this work, authors focused on the local context analysis to find words and phrases related to each query. They compared this approach with local feedback approach to address the errors associated with the dictionary based translation of words and phrases.

Gelbukh [7] presented a thesaurus-based information retrieval system that enriches the query with the whole set of the equivalent forms. Their approach considers enriching the query only with the selected forms that really appear in the document base and thereby providing a greater flexibility.

Oard and Ertunc [8] proposed a translation based indexing approach in which translation and indexing processes area integrated to improve query time efficiency. This approach uses machine readable bilingual dictionary in which the document's language is the source language and query language is the target language. The idea is to add every possible translation of each document - language term in the index.

Garain et al. [9] described an approach
to deal the transliteration of out-of-vocabulary (OOV) terms in English into Bengali
to improve English-Bengali cross language information retrieval. They used a
statistical translation model as a basis for transliteration, and present evaluation
results on the FIRE 2011 datasets. Authors used Indri system with #syn operator to
handle OOV terms.

Recently, Ali Hosseinzadeh et al. [10]
presented a set of experiments in which the impact of applying Google and Bing
translation systems for query translation across multiple language pairs has been
compared for two very different cross language information retrieval tasks.

III. Objectives

Since the document retrieval process depends on the translation of the user query, getting the correct translation of the user query is of great interest. There could be many issues in getting the right translation. Terms present in the dictionary may have multiple meaning and it is essential to identify and choose the right meaning appropriate for the user information needs. Alternatively, query terms in the source language may or may not be present in the dictionary (for example, name of a person or a place). So the actual query terms in source language has to be mapped appropriately to its related query terms in the target language. So in the presence of resources like incomplete dictionary, inaccurate machine translation system, and insufficient tools, we have to identify the appropriate translation for the original user query. Also there might exist multiple translations for a given query. The right translation pertaining to user information needs to be identified from multiple translation outputs. The underlying corpus evidence may suggest a clue on selecting a suitable query that could eventually perform better document retrieval. In order to do this, we want to use the corpus in order to find the most appropriate query translation that could be used for better document retrieval.

IV. Cross language information retrieval

In this section, we describe the basic working principles of a cross language information
retrieval system. Users search for some information in a language of their choice
and we call this language the source language. The user looks for
information present either in the query language or in a different language which we
call the target language. Some cross-language IR systems first
perform the translation of the user query given in the source language to the target
language. Then using the translated query, the CLIR system performs document
retrieval in that target language and translates the retrieved documents in the
source language so that the users can get the relevant information in a language
that is different from their own.

In CLIR systems, translation and ranking are two major tasks.

A. Translation Task

In CLIR systems, either a query or a document has to be translated from SL to TL. We describe below both these methods:

Query Translation: Since a query is very short and
contains a few terms, it is convenient to translate it from SL to TL
and this task is much easier than translating the whole document.
Then the translated query is used for monolingual retrieval in the
target language.

Document Translation: Often query translation
suffers from certain ambiguities in the translation process, and
this problem is amplified when queries are short and
under-specified. In these queries, the actual context of the user is
hard to capture and this results in translation ambiguity. From this
perspective, document translation appears to be more capable of
producing more precise translation due to richer contexts.

In this work, query translation is much simpler compared with the document translation. So we used query translation to map the user query from source language to the target language and then performed monolingual document retrieval in the target language.

B. Document Ranking

Once documents are retrieved and translated back into the source language, a ranked list has to be presented based on their relevance to the actual user query in the source language. So a good ranking methodology is important in cross language information retrieval.

V. The proposed CLIR system

We present an approach to improve the cross lingual document retrieval using corpus driven
query suggestion (CLIR-CQS) approach. We have approached this problem of improving
query translation process in the cross language information retrieval by
accumulating the corpus evidence and using such evidences to re-formulate the user
query for better information retrieval. Here we assumed that a pair of languages:
(SL, TL) is chosen and a dictionary D is given
for this pair of languages.

A. Identifying Missing / Incorrect Translations

Any query translation system (using either a dictionary based or machine translation based
approach) translates the user query given in the source language
SL into the target language TL. For every
query term, we may either get one more terms correct meaning from the
dictionary. Such terms are referred to as synonyms. But there
are terms that have multiple meanings. Such terms are referred to as
polysemy terms. These polysemy terms having multiple
meaning may result in multiple terms during the translation. Since the
dictionary may have limited number of entries, we may have missing or incorrect
translation of the user query in language TL. To compensate for
the missing translation of query terms, we could use co-occurrence statistics
from the entire corpus. But this would take a substantial amount of query
processing time in an online system. So we use an initial set of document for
this purpose. We explore additional terms for partially translated query using
co-occurrence of terms in this initial set of retrieved documents. We present an
approach that handles the missing or incorrect translation of the user query and
improves the retrieval of information in the target language
TL.

Let us look at a case in which we have a partially correct translation of the original query.
In this case, some query terms are translated into the target language and some
are not. In case of missing translations, we use the co-occurrence statistics of
query terms in language SL and their translated terms in
language TL to identify the probable terms for missing
translations of query terms that could result in better retrieval of cross
lingual information retrieval.

B. Corpus Driven Query Suggestion Approach

In this section, we describe the Corpus driven Query Suggestion(CQS) approach for the missing or incorrect translations.

Let qTL be the translation (may be a correct or partially correct
or incorrect translation) of qSL. We consider the
case in which some query terms are translated into the target language and some
are not. In case of missing translations, we use the co-occurrence statistics of
query terms in the initial set of retrieved documents in language
SL and their translated terms in language
TL to identify the probable terms for missing translations
of query terms that could result in better retrieval of cross lingual
information retrieval. We say that two terms co-occur if any only if they appear
in the same text segment. In our experiments, we used paragraphs as the unit of
text segment.

Initial Retrieval: At first, the user query in
language SL is given to the search engine which
performs monolingual document retrieval in the source language
SL and retrieve top n documents in that
language: CCSL,Q. From this initial set
of documents, segment the text into paragraph units. From these text
segments, extract the list of terms that co-occur with any of the
query terms in the text segments. Let
QCOSL be the list of all terms that
co-occur with the original query term in the corpus:
CCSL,Q.

Query Translation: Now using a bi-lingual
dictionary, for each of the original query terms, find its
translations in the target language TL. Here we may
or may not find the translation for all query terms in the target
language. We assume that we are able to find the translation for at
least one query term. Then using the translated query
Q', we perform monolingual document retrieval
in the target language TL and retrieve top
n documents:
CCSL,Q' . We identify and extract
text segments from these n documents in the target
language TL based on paragraphs. From these text
segments, we extract the list of terms that co-occur with any of the
translated query terms. Let QCOTL be the
list of all terms that co-occur with the translated query
Q' in the corpus:
CCTL,Q'n.

From these two lists: QCOSL and
QCOTL, we organize the terms in the source and
target language as shown in Figure 2.

In this figure, each node corresponds to a term. More specifically large circled nodes represent the actual query terms in the source language and small circled nodes are the terms that co-occur with the query terms in the source language. Similarly, each big hexagon shaped node represents the translated query term in the target language and each small hexagon shaped node denotes the terms that co-occur with the translated query term in the target language. As shown in Figure 2, Group (A) contains actual query terms and terms that co-occur with these actual query terms, both in source language. Similarly, Group (B) contains the translated query terms and terms that co-occur with these translated query terms, both in the target language.

To understand the proposed methodology, we present an example illustrated in Figure 3 to show the mapping between the co-occurring terms so that the set of probable terms could be selected for incorrect or missing translations.

Fig. 3 Example that shows the identification of probable terms for incorrect translation.

We find all terms that co-occur with the query terms in SL using the
retrieved set of documents in SL. Similarly we find all terms
that co-occur with the translated query terms in TL using the
retrieved set of documents in TL. Since the number of
co-occurring terms may be very large, we can afford to select only a few of them
due to the actual query processing time in online. So we propose a method to
score the co-occurring terms. Based on the score, we may select the terms which
are more important. The method used for scoring is given below:

Weighting of query terms Using the initial set of documents, we compute the weights of the
terms that co-occur with the query terms. We consider the initial set of top
n documents retrieved for the user query in the source
language SL.

Let Q = q1; q2;...; qp be the query terms
and COq be the set of terms that co-occur in the
same text segment as term q.

We get the set of terms, say QCOq, that co-occur with all query
terms Q as follows:

(1)

Next, we describe the proposed approach for weighting of co-occurring terms in detail. At
first, for a given user query, we retrieve top n documents using any standard
monolingual IR system. Then for every term in the actual query
Q, we obtain the set of terms that co-occur in the same
text segment. In order to get a clue on the importance of the co-occurring
terms, we compute a weight for each term. In order to compute the weight, we
first define the term frequency and inverse document frequency of a co-occurring
term as follows:

The term frequency tf of each term is defined as the number of times it
occurs in n text segments. The idf of a term,
say qi is computed as follows:

(2)

The weight of each co-occurring term cti ∈
QCOq, (1 ≤ i ≤
|QCOQ|) is computed using the equation:

(3)

Here we explain three different approaches to estimate the weight of the term frequency (in the above equation) across n segments of the retrieved documents:

1) Term Frequency (tf): The tf of a term
cti is defined as the number of
time it occurs in n text segments.

2) Logarithmic Term Frequency (log tf): Logarithmic value of
the term frequency of the term cti

3) Average Term Frequency (avg tf): The avg
tf of a term cti is the
ratio between the total number of occurrences of the given term in
n text segments and the total number of text
segments in which that term occurs.

In this formulation, we use averageTF as an indicator for those terms that
could either be an entity or the term that tells the type of an entity. For
example, named entities may score more term weight giving an indication that its
equivalent translation may not exist in the dictionary. Based on the weights, we
select the co-occurring terms which are more important in exploring the query
terms for missing translations in the target language.

Next, we create a bipartite network by connecting the nodes in the group (A) with the nodes in group (B).

c. Bipartite Network: We have two different list of terms: one in the
source language SL and another one in the target language TL. Now we
create a bipartite network with the terms in
QCOSL and
QCOTL. Here each term is
considered as a node and we add link between a node in
QCOSL and a node in
QCOTL as follows: A term
qSL ∈
QCOSL is connected to a term
qTL ∈
QCOTL if <
qSL; qTL
> is found in the dictionary D. We have
illustrated an example showing the links between the terms in the
SL and the terms in TL in
Figure 3.

d. Term Importance: Next we perform the scoring of the co-occurring terms
of correct translations in the target language TL.
This scoring is used to find a list of candidate terms for missing
translations in the target language. We use the bipartite network to
find the importance of the terms in
QCOTL. For this, we estimate term
score tscore(qj) for
each term qj that has a link to a term
in QCOSL.

This term score tscore(qj) is calculated as
follows:

For each term qj 1 ≤ j ≤
|QCOTL|,
tscore(qj) of
jth term in QCOTL is
computed as:

(4)

where α is a factor to scale the term weights in the retrieved document collection.

e. Identify Probable Query terms for Missing Translations: Based on the
computed term scores, we sort the terms in
QCOTL and selected the terms with
high term scores. Since our method may not be able to get the exact
matching terms for missing translations, we add multiple terms that
represent different aspects of the missing translation. Then the
number of terms with high score is chosen as follows: We assume that
a set of topics denoted it by ntopics, would be
better to represent the missing terms in TL during
the translation process. Let nt be the number of
terms (in the original query) for which no equivalent translation
exists. Now we choose (ntopics×nt) terms from
QCOTL associated with each
missing query term. This list of probable terms is a representative
list for missing or incorrect translation of the query terms in the
target language TL.

f. Query Formulation: Using the list of probable terms and their
associated term scores, we perform a new weighted query formulation.
In this query formulation, we use the term score of each probable
term in the target language TL as the boost factor
and form a single weighted query. We give more boosting score for
the terms for which the one correct translation exists in the
dictionary. Otherwise, we distribute the score equally likely to all
correct translations. The reformulated weighted query is used by the
searcher to perform document retrieval.

g. Document Retrieval and Ranking: Now using the new weighted and
reformulated query, we perform document retrieval using BM25 as the
ranking function as described in Section VI-A. In fact, we use the
default parameters of BM25 ranking approach unchanged.

h. Output: Finally, we return the ranked list of top k documents from the
retrieved and ranked set of documents.

We present the pseudocode of the proposed approach in Algorithm V-B0h.

3: Using the dictionary based approach, find the translation of the user query in the
target language TL

4: Using this translated query, retrieve an initial set of n documents in the target language

5: Using the documents retrieved for the actual query, identify co-occurring terms of
the actual query terms in SL. We call this
list as QCOSL.

6: Using the documents retrieved for the translated query, identify the co-occurring
terms in the target language TL. We call
this list as QCOTL.

7: For each term in QCOSL and
QCOTL, we compute a term
weight.

8: Based on the term weights, we select the top scoring terms in
QCOSL and
QCOTL separately.

9: Using the selected top scoring terms, we create a bipartite network: Terms are
referred to as nodes. An edge from a node x in QCOSL to a
node y in QCOTL is added if and
only if the pair < x, y > exists in
the dictionary

10: Now compute the term importance score for each term in the target language using the Equation 4

11: Based on the term importance score, rank the terms in
QCOTL and choose top
d terms with their term importance
score.

12: Formulate a single new weighted query in the target language using the terms and their term importance scores which are used as boost factors

13: Perform document retrieval in the target language TL using the
newly formed weighted query and BM25 as the ranking
function.

14: Generate the final ranked list of documents in the target language
TL

15: return top k documents in TL as final
search results

Output: The ranked list of top k documents in the target language
TL.

In the next section, we present the details of our experiments with the proposed cross language document retrieval approach.

VI. Experimental results

A. Corpus

In this experiment, we considered cross language information retrieval approach on
Tamil and English languages. We have used
the multi-lingual adhoc news documents collection of FIRE2 datasets for our experiments. More
specifically, we have used English and Tamil corpus of FIRE 2012 dataset and
analyzed the effects of the proposed approach. FIRE 2012 is an incrementally
added collection documents from FIRE 2008, FIRE 2010 and FIRE 2011 corpus. The
coverage of documents in each of FIRE 2012 collection is listed in Table I. In this collection, Tamil
collection contains news documents more than English collection. English news
documents consists of more news documents at the national level where as Tamil
news collection covers more regional news.

Table I FIRE 2012 ad hoc dataset used in this CLIR experiment.

We have considered a set of 10 queries selected in Tamil which are listed in
Table II. We have used a
Tamil-English bilingual dictionary with 44,000 entries in which there are 20,778
unique entries and 21,135 terms have more than one meaning. We have used this
dictionary for translating query terms and also to map the terms co-occurring
with the correctly translated pairs.

Table II List of queries used in our experiments

We use Lucene3 for indexing and retrieval system with Okapi Best Matching 25 (BM25) ranking used in this paper.

a) Okapi BM25: To rank the final set of the retrieved documents, Okapi BM25
[11], [12] may be used as a ranking function. BM25 retrieval function
ranks a set of documents based on the query terms appearing in each document,
regardless of the inter-relationship between the query terms within a document.
Given a query Q and a document D, the similarity score between them is computed
using BM25 ranking function as follows:

(5)

where tfi,D is the term frequency of the query term
i in the document D; |D|
is the length of the document D and avgdoclength is the average
document length in the text collection; k1,
k1 ∈ {1:2; 2:0} and b, b = 0.75 are
parameters; and idfi is the inverse document
frequency of the query term i.

Then we retrieve top 20 documents for each query to perform the scoring of candidate terms. We have used three different approaches in weighting the co-occurring terms:

- Term Frequency: We use the standard counting of the term frequency across
n text segments to compute the weight of each
co-occurring term cti in
SL and TL languages:

(6)

- Logarithmic Term Frequency: We use logarithmic function to scale the term
frequency using the following equation:

(7)

- Average Term Frequency:avg tf is used to compute the weight of the
co-occurring term cti in both
languages:

(8)

where m denotes the total number of segments in which the term
cti occurs.

We have found that Average Term Frequency (avgt f )
captures query terms that maps to the translations of the query terms in the
target language related to the actual context of the user query. Table III shows the lists of top terms
ranked by different term weighting approaches.

Table IV shows the monolingual retrieval
performance with FIRE 2012 Tamil Corpus. This is to show the coverage of news
documents for the selected query terms in the source language:
Tamil.

Table IV Selected queries in Tamil, the dictionary translations in English and the retrieval efficiency in Tamil monolingual retrieval.

* Calcutta and Telegraph are the most frequent terms occur in most of the documents. So these terms are not included in our derived query terms.

B. Comparisons

We have considered 4 different methods to evaluate the proposed approach.

- CLIR with Dictionary Based Approach (CLIA-DICT): In this experiment, we
used the dictionary based approach to translate the user query in
source language into the target target language and then documents
are retrieved in the target language.

- CLIR with Google Translation Tool (CLIA-GTT): In this experiment, we
considered the machine translated query in English using Google
translation tool4 in
the period between January 30 and February 9, 2015.

- CLIR with the Proposed Approach (CLIA-CQS): The proposed approach is
applied to perform translation of the query from SL
to TL using the procedure described in section V-B.
In this experiment, we have used top 20 documents are used as the
initial set of relevant documents and additional terms are explored
for the missing translations.

- CLIR with the Manual Reference Translation (CLIA-REF): Finally we have
manually translated the user query into the target language and then
performed document retrieval in the target language.

We have manually evaluated the top 10 retrieved documents for each query in the 3-points
scale: relevant (1.0), partially relevant (0.5) and irrelevant (0). We used the
measure precision @ top n documents for each
query and tabulated the results. Table V
presents the details of our experiments done in CLIR with Dictionary based
Translation (CLIR-DICT); CLIR with machine translation of user queries with
Google translation tool (CLIR-GTT); CLIR with the proposed corpus based query
selection approach (CLIR-CQS); and CLIR with Manual Reference Translation
(CLIR-REF). We used Google translation tool (GTT)5 to translate the user query given in Tamil language
into English language.

Consider the query ID 1. In this query, there are three Tamil query terms: { Vengai,
Marangal, Kadaththal }. The term Vengai may refer
to two variations: Vengai, type of a tree whose botanical name
is Pterocarpus marsupium or leopard, and
animal; Marangal, trees: the correct translation; and finally
Kadaththal may refer to at least three variations:
trafficking or smuggling or
stealing. This would give 2 x 1 x 3 = 6 different queries.
We identify a set of terms that boosts these query variations and then choose
top k terms to form the single weighted query using query terms weighting
approach.

During the evaluation of the proposed approach, we have used 3-points scale for making
relevant judgments. We have considered top 10 documents for each query and
manually evaluated the retrieved results using the metric: precision @
top k documents. The preliminary results show that the proposed
approach is better in disambiguating the query intent when query terms that have
multiple meanings are given by the users. The average access time for terms set
in Tamil is 765.3 milliseconds and 97.8 milliseconds in English. Since the
retrieval of initial set of documents and finding co-occurrence terms from this
initial set of documents take very negligible amount of time (less than 2
seconds even for top 50 documents), we did not consider the retrieval time
comparison in this work.

VII. Conclusion

We have presented a cross language document retrieval approach using corpus driven query suggestion approach. In this work, we have used corpus statistics that could provide a clue on selecting the right query terms when translation of a specific query term is missing or incorrect. Then we rank the set of the derived queries and select the top ranked queries to perform query formulation. Using the re-formulated weighted query, cross language information retrieval is performed. We have presented the comparison results of CLIR with Google translation of the user queries and CLIR with the proposed corpus based query suggestion. The preliminary results show that the proposed approach seems to be promising and we are exploring this further with graph based approach that could unfold the hidden relationships between query terms in a given pair of languages.

Acknowledgment

Authors gratefully acknowledge the support extended by Dr. Philip O'Reilly of University College Cork, Cork, Ireland during the last stage of this work.