A Study of English-Vietnamese Statistical
Machine Translation
Hoang Cuong
Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi
Supervised by
Prof. Pham Bao Son
A thesis submitted in fulfillment of the requirements for the degree of
Master of Computer Science
December, 2012
ORIGINALITY STATEMENT
‘I hereby declare that this submission is my own work and to the best of my knowledge
it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree
or diploma at Vietnam National University, Hanoi or any other educational institution,
except where due acknowledgement is made in the thesis. Any contribution made to the
research by others, with whom I have worked at Institute for INFOCOMM Research,
Singapore (I2R), Vietnam Institute for Advanced Study in Mathematics, Hanoi (VIASM) or elsewhere, is explicitly acknowledged in the thesis. I also declare that the
intellectual content of this thesis is the product of my own work, except to the extent that
assistance from others in the project’s design and conception or in style, presentation
and linguistic expression is acknowledged.’
Signed ........................................................................
APPROVAL
I, the supervisor, hereby approve that the Thesis in its current form is ready for submission as the final version at the University of Engineering and Technology, Vietnam National University, Hanoi.
Prof. Pham Bao Son
ABSTRACT
Previous work from the Vietnamese statistical machine translation (SMT) research community has focused only on a few high-level topics in the field, and some of it is based on rather simple ideas. We lack a fundamental study of the core of an SMT system that would provide a solid basis for statistical English-Vietnamese translation, and we also lack large, high-quality bilingual corpora. This work aims to overcome these problems.
We present a fundamental study of English-Vietnamese statistical machine translation. We investigate the core components of any SMT system, such as exploiting bilingual corpora, improving word alignment, and improving phrase translation modeling quality. We also focus on developing a better evaluation metric for tuning SMT systems. Above all, we strive to provide a fundamental and solid basis for building and improving the overall performance of English-Vietnamese SMT systems.
Though we focus on the English-Vietnamese pair, in every aspect we also deploy and compare our research on the English-French pair to gain a deeper view. We hope this work will serve as a solid foundation for other studies on deploying and improving SMT for English-Vietnamese machine translation systems.
Publications:
• Cuong Hoang, Cuong-Anh Le, Thai-Phuong Nguyen, Bao-Tu Ho. Exploiting Non-Parallel Corpora for Statistical Machine Translation. In Proceedings of the International Conference on Information and Communication Technologies (RIVF 2012).1
• Cuong Hoang, Cuong-Anh Le, Son-Bao Pham. A Systematic Comparison Between Various Statistical Alignment Models for Statistical English-Vietnamese Phrase-Based Translation. In Proceedings of the 4th International Conference on Knowledge and Systems Engineering (KSE 2012).

1 Best Student Paper Award
• Cuong Hoang, Cuong-Anh Le, Son-Bao Pham. Refining Lexical Translation Training Scheme for Improving the Quality of Statistical Phrase-Based Translation. In Proceedings of the 3rd International Symposium on Information and Communication Technology (SoICT 2012).
• Cuong Hoang, Cuong-Anh Le, Son-Bao Pham. Improving the Quality of Word Alignment by Integrating Pearson's Chi-square Test Information. In Proceedings of the International Conference on Asian Language Processing (IALP 2012).
ACKNOWLEDGEMENTS
Life is so valuable when we find something worth chasing.
First, I would like to express my deep gratitude to my supervisor, Prof. Pham Bao Son, who has been my iconic researcher in Vietnam since I was a freshman. I also want to thank Prof. Le Anh Cuong for his very careful supervision, even though he did not register me as his student at the school. To both of them, I owe their patient guidance and support throughout the years.
I would like to give my honest appreciation to my other, unofficial supervisors - Prof. Ho Tu Bao (JAIST, Japan), Prof. Zhang Min (I2R, Singapore), and Prof. Nguyen Xuan Long (Michigan, USA) - for their great support. From diverse perspectives, they have helped my passion for Computer Science grow intensely.
I sincerely acknowledge Vietnam National University, Hanoi. I want to thank some of my best teachers - Dr. Nguyen Van Vinh, Dr. Nguyen Phuong Thai, and Prof. Nguyen Le Minh (JAIST, Japan) - for many useful discussions. Especially, I want to give my honest appreciation to the Statistical Language Processing Laboratory at the Institute for Infocomm Research, Singapore (I2R), where I work, for the infrastructure and countless other support. I would like to thank my Chinese friends here - Yun Huang, Prof. Yue Zhang, Jun Sun, and Yanxia Qin. I wish I could work with them for as long as possible. I also want to express my appreciation to some of my best friends - Dinh Xuan Nhat and Nguyen Dao Thai - for their help with my work.
Finally, this thesis would not have been possible without the support and love of my family - Dad, Mum, and my sister Lan Ni and her small family. Without their support in so many ways, I am sure I could not have finished my Master's degree in this way!
And ...
To love, My Mimosa ♥ !!!
Table of Contents
1 Introduction
1.1 Statistical Machine Translation - An Overview
1.2 Literature Survey on English-Vietnamese Machine Translation
1.3 Our Work
1.4 Thesis Contents
1.4.1 Exploiting non-parallel corpora for statistical machine translation
1.4.2 Systematic comparison between various statistical alignment models for statistical English-Vietnamese phrase-based translation
1.4.3 Improving word alignment models
1.4.4 Improving phrase translation modeling
1.4.5 Developing an evaluation metric for SMT
List of Figures
1.1 The architecture of the translation approach based on source-channel models
1.2 An example of word alignments between the pair of English-French
1.3 An example of phrase alignments between the pair of English-German
1.4 The architecture of the translation approach based on log-linear models
List of Tables
List of Abbreviations
AFA Average Fraction of best Alignment
EM Expectation-Maximization
GIS Generalized Iterative Scaling
HMM Hidden Markov Model
MERT Minimum Error Rate Training
MIRA Margin Infused Relaxed Algorithm
ME Maximum Entropy
NLP Natural Language Processing
SMT Statistical Machine Translation
TER Translation Edit Rate
Chapter 1
Introduction
Machine translation (MT) is the automatic translation from one natural language into another by computer. It has long been a key application in the field of natural language processing (NLP). Statistical machine translation (SMT) is an approach to machine translation in which we treat the translation task as a machine-learning problem: from the analysis of many "samples" of human-produced translation, the probabilistic parameters of the translation system are estimated on the basis of statistical models.
1.1
Statistical Machine Translation - An Overview
SMT treats translation as a machine-learning problem. This means that we apply a learning algorithm to a large body of previously translated text, known variously as a parallel corpus, parallel text, bitext, or multitext. The learner is then able to translate previously unseen sentences. Basically, the quality of any SMT system depends crucially on the quantity, quality, and domain of the data. As a result, constructing bilingual corpora containing millions of words is a vital step in building a statistical machine translation system.
We are given a source sentence $f = f_1^J = f_1, \ldots, f_j, \ldots, f_J$, which is to be translated into a target sentence $e = e_1^I = e_1, \ldots, e_i, \ldots, e_I$. Among all possible target sentences, we will choose the sentence with the highest probability:

$$\hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J) \qquad (1.1)$$
According to Bayes' decision rule, we can equivalently to Eq. 1.1 perform the following maximization:

$$\hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \qquad (1.2)$$

This approach is referred to as the source-channel approach to SMT and was first proposed by (Brown et al., 1993). The architecture of the translation approach based on source-channel models is depicted in Fig. 1.1 (Och & Ney, 2002).
Figure 1.1: The architecture of the translation approach based on source-channel models
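To make the decision rule of Eq. 1.2 concrete, the toy sketch below picks the candidate that maximizes the product of a language model and a translation model probability. All probabilities and the tiny French-English vocabulary are invented for illustration and are not taken from any real model.

```python
import math

# Toy source-channel scoring (Eq. 1.2): pick e maximizing Pr(e) * Pr(f|e).
# All probabilities below are invented for illustration.
language_model = {"the house": 0.6, "house the": 0.05}          # Pr(e)
translation_model = {("la maison", "the house"): 0.4,           # Pr(f|e)
                     ("la maison", "house the"): 0.4}

def best_translation(f, candidates):
    # Work in log space to avoid underflow on longer sentences.
    def score(e):
        return math.log(language_model[e]) + math.log(translation_model[(f, e)])
    return max(candidates, key=score)

print(best_translation("la maison", ["the house", "house the"]))  # -> the house
```

Note how the translation model alone cannot distinguish the two candidates here; the language model breaks the tie, which is exactly the division of labor the source-channel decomposition intends.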
Basically, the two "features" in this model are the translation model ($\Pr(f_1^J \mid e_1^I)$) and the language model ($\Pr(e_1^I)$). For the language model, it is comparatively easy to model the "generative language", yet its role is very important: building a language model from large monolingual corpora can significantly improve the quality of an SMT system (Brants et al., 2007). Notably, syntactic language models are also gaining attention in language modeling research (Charniak et al., 2003).
For the translation model, statistical translation models were initially word-based. The idea of word-based translation can be traced back to (Brown et al., 1990), who introduced the idea of an alignment between a pair of strings as an object indicating, for each word in the French string, the word in the English string from which it arose. We take an example from (Brown et al., 1993). Alignments are shown graphically, as in Figure 1.2, by drawing lines, which we call connections, from some of the English words to some of the French words. For the estimation of word translation parameters, we use word-based translation models such as IBM Models 1-5 (Brown et al., 1993), the Hidden Markov Model (HMM) (Vogel et al., 1996), or IBM Model 6 (Och & Ney, 2003).
Figure 1.2: An example of word alignments between the pair of English-French
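As an illustration of how such word-based models are trained, the following is a minimal sketch of IBM Model 1 estimation with expectation-maximization. The two-sentence corpus is invented; real training corpora are of course vastly larger, and the full model also handles NULL alignments, which this sketch omits.

```python
from collections import defaultdict

# Minimal IBM Model 1 EM training sketch: estimate word translation
# probabilities t(f|e) from a toy English-French corpus (invented data).
corpus = [("the house".split(), "la maison".split()),
          ("the".split(), "la".split())]

f_vocab = {f for _, fs in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization

for _ in range(10):                           # a few EM iterations
    count = defaultdict(float)                # expected counts c(f, e)
    total = defaultdict(float)                # expected counts c(e)
    for es, fs in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)    # normalization over alignments
            for e in es:
                delta = t[(f, e)] / z         # posterior alignment probability
                count[(f, e)] += delta
                total[e] += delta
    for (f, e), c in count.items():           # M-step: renormalize counts
        t[(f, e)] = c / total[e]

# EM resolves the ambiguity: "house" converges toward "maison", "the" toward "la".
```

Even though "the house"/"la maison" is ambiguous on its own, the second sentence pair anchors "the" to "la", and EM redistributes the remaining probability mass so that "house" ends up aligned to "maison".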
However, significant advances were made with the introduction of phrase-based models (Koehn et al., 2003). We take an example from (Koehn, 2010). Phrase alignments are shown graphically, as in Figure 1.3, by drawing lines, which we call connections, from groups of English words to groups of German words.
Figure 1.3: An example of phrase alignments between the pair of English-German
In fact, the best performing SMT systems are based in some way on phrases (groups of words). The basic idea of phrase-based translation is to break a given source sentence into phrases, translate each phrase, and finally compose the target sentence from these phrase translations (Koehn et al., 2003; Och & Ney, 2004). However, the phrase learning step, a vital component of a phrase-based SMT system, relies heavily on the alignments between words.
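To illustrate how phrases are read off the word alignments, here is a sketch of the standard consistency criterion for phrase extraction: a phrase pair is kept only if all alignment links of its words stay inside the pair and at least one link exists. The toy sentence pair and alignment are invented, and the usual expansion over unaligned words is omitted.

```python
# Sketch of consistency-based phrase extraction from a word alignment.
# Toy data: "green house" <-> "maison verte" with crossed links (invented).
src = ["green", "house"]
tgt = ["maison", "verte"]
links = {(0, 1), (1, 0)}          # (source_index, target_index)

def extract_phrases(src, tgt, links, max_len=3):
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            ts = [t for (s, t) in links if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            # consistency: no word inside the target span may be linked
            # to a source word outside [i1, i2]
            if all(i1 <= s <= i2 for (s, t) in links if j1 <= t <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
    return pairs

print(extract_phrases(src, tgt, links))
# -> [('green', 'verte'), ('green house', 'maison verte'), ('house', 'maison')]
```

Note that the crossed links are handled naturally: the two-word phrase pair captures the reordering that no single-word pair can express, which is why alignment errors propagate so directly into the phrase table.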
One of the main disadvantages of phrase-based translation is that the way phrases are extracted, i.e., phrase translation modeling, is unrelated to the lexical probability values estimated by word-based translation models. Some research has also tried to extract phrases automatically without relying on the alignments (Marcu & Wong, 2002). However, it has been shown in practice that this does not yield better results than the traditional approach (Koehn et al., 2003).
Another important disadvantage of the pure phrase-based translation model is the difficulty of integrating linguistic information: it is very hard to use linguistic information to improve phrase extraction (Koehn et al., 2003). Some research focuses on a syntactic transformation model in the pre-processing phase which reorders the structure of the source sentence so that it is closer to the structure of the target sentence (Collins et al., 2005). Basically, the idea of pre-processing is simple, but it can yield a significant improvement.
Lately, the modern SMT paradigm has been moving from the source-channel approach to a generalization of it, the log-linear model. Basically, instead of using only two features, we can combine many features (including both of the above) in a mixture framework, which contains the source-channel approach as a special case. We combine many features $h_m(e_1^I, f_1^J)$, each weighted by $\lambda_m$ ($m = 1, \ldots, M$). We have the decision rule:

$$\hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J) = \arg\max_{e_1^I} \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \qquad (1.3)$$
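A toy sketch of this decision rule: each candidate is scored by a weighted sum of feature functions. The two features and the weights here are invented stand-ins for real model scores; in practice the weights would be tuned on a development set.

```python
# Toy log-linear scoring (Eq. 1.3): score(e) = sum_m lambda_m * h_m(e, f).
# Feature values below are invented log-probabilities for illustration.
def h_language_model(e, f):
    return {"the house": -0.5, "house the": -2.3}[e]

def h_translation_model(e, f):
    return -0.7        # identical for both candidates in this toy example

features = [h_language_model, h_translation_model]
weights = [1.0, 0.6]   # the lambda_m, normally tuned on a development set

def decode(f, candidates):
    score = lambda e: sum(w * h(e, f) for w, h in zip(weights, features))
    return max(candidates, key=score)

print(decode("la maison", ["the house", "house the"]))  # -> the house
```

Adding a new knowledge source then amounts to appending one more function to `features` and one more weight, which is what makes the framework so extensible compared to the fixed two-feature source-channel model.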
The architecture of the translation approach based on the log-linear model is depicted in Fig. 1.4 (Och & Ney, 2002).
This approach was suggested by (Och & Ney, 2002; Och & Ney, 2004). Deploying the log-linear model, we can obtain a significantly better translation system, and by now almost all state-of-the-art translation frameworks use this model (Och & Ney, 2004; Mariño et al., 2006; Chiang, 2007). Basically, using more features helps us gain a much better translation system. As a result, much research focuses on automatically finding appropriate features, whose number may range from 10-15 (Och & Ney, 2002) to thousands (Chiang et al., 2009) or even millions.
Figure 1.4: The architecture of the translation approach based on log-linear models
Another big advantage of this approach is the easy integration of phrase-based translation with syntax-based translation models. Basically, syntax-based translation is based on the idea of translating syntactic units, rather than single words or strings of words (Yamada, 2003; Galley et al., 2004). Syntax-based translation has received wide attention following the advent of strong stochastic parsing techniques (Collins, 2003; Steedman, 2000). One realization of this integration is hierarchical phrase-based translation (Chiang, 2005; Chiang, 2007). In practice, hierarchical phrase pairs improve translation accuracy significantly compared with a state-of-the-art phrase-based system.
Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria; from the early days, the GIS (Generalized Iterative Scaling) algorithm (Darroch & Ratcliff, 1972) has been used. However, a general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. Interestingly, (Och, 2003) presented alternative training criteria for log-linear statistical machine translation models which are directly related to translation quality and yield significantly better results.
We therefore encounter a very interesting problem: given a development corpus, how do we develop a metric that leads to better weight optimization? In the field, BLEU (Papineni et al., 2002) is widely considered the standard automatic SMT evaluation method. The original idea of BLEU is to compare SMT output with expert reference translations in terms of the statistics of short sequences of words (word n-grams, $1 \le n \le N$, with $N$ fixed).
Following BLEU, researchers have developed other variants, such as METEOR (Banerjee & Lavie, 2005) or NIST (Doddington, 2002). This idea is elegant in its simplicity, and the n-gram matching approach has frequently been reported to correlate well with human judgement (Burch et al.). Integrating these evaluation metrics, we use MERT (Och, 2003) at a small scale or MIRA (Margin Infused Relaxed Algorithm) (Crammer et al., 2006) at a larger scale to tune the weights. However, the traditional n-gram approach has many problems, which we will examine later.
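As a concrete illustration of the n-gram statistics that BLEU builds on, here is a sketch of the modified (clipped) n-gram precision for a single reference. The real BLEU score additionally combines several n-gram orders and applies a brevity penalty, both omitted here.

```python
from collections import Counter

# Modified n-gram precision sketch (the core statistic behind BLEU):
# candidate n-gram counts are clipped by their counts in the reference.
def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / max(1, sum(cand.values()))

# "the" appears 3 times in the candidate but is clipped to its single
# occurrence in the reference, so the unigram precision is 2/4 = 0.5.
print(modified_precision("the the the cat", "the cat sat", 1))  # -> 0.5
```

The clipping is what prevents a degenerate candidate that repeats one common word from scoring perfect precision; higher-order n-grams then reward fluency and local word order.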
1.2
Literature Survey on English-Vietnamese Machine Translation
Since the 1990s, much research has focused on English-Vietnamese machine translation from a statistical perspective. Basically, these studies address high-level aspects of SMT, such as adapting syntax-based translation systems to this language pair or incorporating linguistic and syntactic information into the translation model. Others try to improve supporting components, such as exploiting bilingual corpora or improving word alignment quality. We take a glance at notable studies below.
Some research focuses on a syntactic transformation model in the pre-processing step; that is, reordering the structure of the source sentence so that it is closer to the structure of the target sentence. This idea is inspired by (Collins et al., 2005). Also in the pre-processing step, a dependency-based parser together with a set of additional "hand-crafted" rules has been used to generate the transformation (Hoang et al., 2008). In addition, reordering at the chunk level and incorporating a global reordering model into the decoder can improve not only the quality but also the speed of the system (Nguyen et al., 2008b).
Another problem is how to incorporate linguistic and syntactic information directly into the translation model. As noted above, syntactic information can be integrated into phrase-based SMT using syntax-based SMT approaches, and it is expected that combination models will outperform the best phrase-based systems in the near future. This is a very active topic in SMT. Among these approaches, Tree-to-String methods (Nguyen & Shimazu, 2006b; Nguyen et al., 2008a) apply to language pairs where a syntactic parser is available for the source language, while Tree-to-Tree methods (Cowan et al., 2006) apply to language pairs where both languages have a syntactic parser. Linguistic knowledge can also be introduced in the preprocessing phase using morphological analysis or a POS tagger on the source sentence (Nguyen & Shimazu, 2006a). A more detailed survey on this topic can be found in (Ho et al., 2008).
Others try to improve word alignment quality through simple adaptations for the English-Vietnamese pair. Some use heuristic constraint rules (Hung & Cuong, 2012); others try to improve word disambiguation (Nguyen & Dinh, 2012; Nguyen et al., 2012) or the quality of word reordering (Hoang et al., 2008; Thi & Dinh, 2008; Van Nguyen et al., 2009). In fact, word reordering is one of the most difficult problems for the English-Vietnamese pair (Hoang et al., 2012d): the reordering between the two languages is not consistent. For example, the first word in the source sentence may align to the first word in the target sentence, but it may equally align to the last word or elsewhere. Most existing approaches are too simple to overcome this complex problem; we will examine this phenomenon in a later section.
Translation quality is often disappointing when a phrase-based machine translation system deals with long sentences. Because of the syntactic structure discrepancy between the two languages, the translation output will not preserve the same word order as the source. When a sentence is long, it should be partitioned into several clauses, and word reordering in the translation should be done within clauses, not between clauses. (Hung et al., 2012) propose a rule-based technique to split long Vietnamese sentences based on linguistic information.
In addition, some research focuses on constructing parallel corpora (Dang & Ho, 2007; Hung & Cuong, 2010; Do et al., 2010b; Do et al., 2010a). Since Vietnamese is a low-resource language, parallel corpora for the English-Vietnamese pair are extremely scarce. Non-parallel corpora, by contrast, are much more widely available from various resources in different domains, such as Wikipedia or news websites, although we have to bear the lower quality of the articles written in these languages. However, the methods in those works, whether supervised or unsupervised, are usually too simple and not suitable for extracting bitext from "noisy" non-parallel corpora. This is one of the main disadvantages of much previous research: the work cannot be deployed on large training data.
1.3
Our Work
In fact, previous work has focused only on a few high-level topics in the field, and some of it is based on rather simple ideas. We lack a fundamental study of the core of an SMT system that would provide a solid basis for statistical English-Vietnamese translation, and we also lack large, high-quality bilingual corpora. This work aims to overcome these problems.
We present a fundamental study of English-Vietnamese statistical machine translation. That is, we do solid work on the core of any SMT system: exploiting bilingual corpora, improving word alignment and phrase translation modeling quality, and developing a better evaluation metric for tuning SMT systems. We make a great effort to create a fundamental and solid basis for building English-Vietnamese SMT systems.
Though we focus on the English-Vietnamese pair, in every aspect we also deploy and compare our research on the English-French pair to gain a deeper view and more significant conclusions. Many of our methods can be applied equally to any language pair. We also develop LGIZA, a small and fast IBM Models toolkit, which is freely available1. We design LGIZA to be easily extended by other researchers, so they can deploy their ideas in the simplest way.
In the process of preparing this work, we submitted parts of it to several international conferences2. Some anonymous reviewers described our work as a "solid work" or a "good document for researchers in the field", which has been a great motivation to carry on. We hope our research will provide a reliable foundation for other studies on deploying and improving SMT for English-Vietnamese machine translation systems.
1.4
Thesis Contents
This work covers the aspects mentioned above; each is treated in a separate chapter. We list the tasks below:
• Exploiting non-parallel corpora for statistical machine translation
• Systematic comparison between various statistical alignment models for statistical
English-Vietnamese phrase-based translation
1 LGIZA is available on: />
2 Some others are still in review or have not yet been submitted.
• Improving word alignment models
• Improving phrase translation modeling
• Developing a better evaluation metric for tuning SMT system
1.4.1
Exploiting non-parallel corpora for statistical machine translation
Constructing bilingual corpora containing millions of words is a vital step in building a statistical machine translation system. This task is extremely difficult for low-resource languages: no large, high-quality parallel corpus of bitext data exists. In addition, some traditional non-parallel corpora, such as Wikipedia, are extremely "noisy" because of the low quality of the articles written in those languages.
This work will overcome that problem. We deploy an efficient framework for extracting bitext from non-parallel corpora, whose core is our proposed similarity measure. As the vital component, that similarity metric is used for each classifying decision, with significantly better precision and recall than previous works. We test the performance of our framework on the low-resource Vietnamese language and obtain more than 5 million words of English-Vietnamese bitext data.
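By way of illustration only, the sketch below scores a candidate English-Vietnamese sentence pair by bilingual dictionary coverage. This is not the similarity measure proposed in this thesis; the three-entry dictionary is invented, and Vietnamese diacritics are omitted for simplicity.

```python
# Toy dictionary-coverage score for ranking candidate sentence pairs mined
# from comparable corpora. NOT the thesis's proposed measure; the tiny
# dictionary is invented and Vietnamese diacritics are omitted.
dictionary = {("house", "nha"), ("book", "sach"), ("this", "nay")}

def coverage_score(en_sentence, vi_sentence):
    en, vi = en_sentence.split(), vi_sentence.split()
    hits = sum(1 for e in en for v in vi if (e, v) in dictionary)
    # Normalize by the longer side so long unrelated sentences score low.
    return hits / max(len(en), len(vi))

# A likely-parallel pair scores high; an unrelated pair scores zero.
print(coverage_score("this house", "nha nay"))   # -> 1.0
print(coverage_score("this house", "sach"))      # -> 0.0
```

In a real mining pipeline, a score of this kind would only be one signal among several (length ratio, position in the document, etc.), and candidate pairs above a tuned threshold would be kept as bitext.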
1.4.2
Systematic comparison between various statistical alignment
models for statistical English-Vietnamese phrase-based translation
In statistical phrase-based machine translation, the phrase learning step relies heavily on word alignments. This chapter provides a systematic comparison of various statistical alignment models applied to statistical English-Vietnamese phrase-based machine translation. We also investigate a heuristic method for elevating the translation quality obtained with higher word-alignment models by improving the quality of lexical modeling. In detail, we show experimentally that refining the lexical translation appears to be an appropriate way to let "higher" word-based translation models efficiently "boost" their merits. We hope this work will be a reliable comparison benchmark for other studies on using and improving statistical alignment models for English-Vietnamese machine translation systems.
1.4.3
Improving word alignment models
Refining the training scheme
Under word-based alignment, frequent words with consistent translations can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations in the training corpora generally do not have statistically significant evidence for confident alignments (Ker & Chang, 1997). In this work, we propose a bootstrapping algorithm to capture the alignments of such words. Interestingly, we avoid making any explicit assumption concerning the language pair used. We evaluate experimentally on two phrase-based translation systems, English-Vietnamese and English-French, and the experiments show a significant boost in overall quality for both tasks.
Improving the divergence of parameters
Better modeling of the distortion parameter in word-based alignment models is one of the most difficult problems in improving the quality of a statistical machine translation system. It is especially tough for English-Vietnamese, because the difference in grammatical structure between this language pair is quite pronounced. In this work, we focus on improving the training scheme of alignment models to elevate the overall quality of the translation system. That is, we propose an algorithm which makes the word translation parameters more divergent, reducing the search space for finding the best "Viterbi" alignment sequences. The experimental results confirm that our method significantly boosts the quality of the system on various test sets.
1.4.4
Improving phrase translation modeling
Phrase translation modeling, which is very heuristic and noisy, is one of the most difficult problems in statistical phrase-based translation. The noise causes several disadvantages: (1) it is hard to obtain a clearly improved result in the system's final evaluation (BLEU, NIST, METEOR, etc.) after improving word alignment quality; (2) improving word alignment consequently has less impact on research, even though its quality is still an open question. This work will overcome that problem. We point out the inconsistency problem of phrase translation modeling and propose a novel method which directly integrates the word-to-word translation parameters into phrase translation weight estimation. This method deeply reduces the effect of the noise phenomenon. We evaluate our approach on the WMT10 French-to-English task and show significant improvements on parallel data sets of different scales. In addition, we show much larger improvements for the upgraded systems trained on these tasks when word alignment quality is improved.
1.4.5
Developing an evaluation metric for SMT
Tuning the parameters of a log-linear model is an important step in finding the best-fitting weights for the model. Modern tuning techniques directly use automatic evaluation metrics as the training criteria for optimizing the system, which makes good evaluation metrics essential. Many machine translation evaluation metrics have been proposed since the seminal BLEU metric, and they have been found to outperform BLEU, as demonstrated by better correlations with human judgments. We hope that training machine translation systems with these new metrics can lead directly to advances in automatic machine translation. To our knowledge, however, there has been no unambiguous report of improving a state-of-the-art machine translation system over its BLEU-tuned baseline.
In this work, we present a novel automatic evaluation metric, entitled SEMI. It is based on a phrasal-overlap measurement scheme and especially favours a grading scheme with longer n-gram matches. We evaluate our metric on the WMT10 French-to-English task and show that it is the first evaluation metric to lead directly to significant advances in automatic machine translation on parallel data sets of different scales.
Bibliography
Abdul-Rauf, S., & Schwenk, H. (2009). On the use of comparable corpora to improve SMT performance. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (pp. 16-23). Stroudsburg, PA, USA: Association for Computational Linguistics.
Abdul-Rauf, S., & Schwenk, H. (2009). Exploiting comparable corpora with TER and TERp. Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora (pp. 46-54). Stroudsburg, PA, USA: Association for Computational Linguistics.
Abdul-Rauf, S., & Schwenk, H. (2011). Parallel sentence generation from comparable corpora for improved SMT. Machine Translation, 25, 341-375.
Achananuparp, P., Hu, X., & Shen, X. (2008). The evaluation of sentence similarity measures. Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery (pp. 305-316). Berlin, Heidelberg: Springer-Verlag.
Adafre, S. F., & de Rijke, M. (2006). Finding Similar Sentences across Multiple Languages in Wikipedia. Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, 62-69.
Banerjee, S., & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments (pp. 65-72).
Banerjee, S., & Pedersen, T. (2003). Extended gloss overlaps as a measure of semantic relatedness. Proceedings of the 18th International Joint Conference on Artificial Intelligence (pp. 805-810). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Bertoldi, N., Haddow, B., & Fouet, J.-B. (2009). Improved Minimum Error Rate Training in Moses. The Prague Bulletin of Mathematical Linguistics, 91, 7-16.