Regression Model for MTE (Machine Translation Evaluation)
→ segment-level MTE metric for to-English language pairs.
→ estimates translation quality as a real number from an MT hypothesis t and a reference translation r.
→ uses universal sentence embeddings to capture global information that character- or n-gram-based metrics, which only check surface string matches, cannot obtain.
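To make that limitation concrete, here is a toy sketch (not from the paper) of clipped n-gram precision, the kind of surface matching that BLEU-style metrics rely on. A paraphrase sharing no words with the reference scores zero even though its meaning is close. The helper name `ngram_precision`, the whitespace tokenization, and the lowercasing are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(hypothesis: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram precision of a hypothesis against a single reference.

    Hypothetical helper for illustration: whitespace tokenization, lowercasing,
    no smoothing. Only exact surface matches are credited.
    """
    hyp = ngrams(hypothesis.lower().split(), n)
    if not hyp:
        return 0.0
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    hyp_counts = Counter(hyp)
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
    return matched / len(hyp)


# A paraphrase with no shared words gets no credit at all.
print(ngram_precision("a feline was sitting", "the cat sat"))  # 0.0
# A near-copy with one extra word still scores high.
print(ngram_precision("the cat sat down", "the cat sat"))  # 0.75
```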
Universal Sentence Embeddings
- InferSent
supervised model: learns sentence embeddings through a classification task on the Stanford Natural Language Inference (SNLI) dataset
sentence pair → sentence embedding vectors u and v
- Quick-Thought
unsupervised model: trained on input sentences together with their surrounding context; downstream tasks: document classification
- Universal Sentence Encoder
unsupervised learning by predicting neighboring sentences, plus supervised learning on conversational input-response and natural language inference tasks
downstream tasks: document classification, semantic textual similarity
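InferSent feeds its classifier a combination of the two sentence vectors u and v: their concatenation (u, v), the absolute element-wise difference |u - v|, and the element-wise product u * v. A minimal sketch of that combination follows, with plain Python lists standing in for real embeddings. The function name `combine_features` is hypothetical, and applying the same combination to the hypothesis embedding t and the reference embedding r as input to the MTE regressor is an assumption of this sketch.

```python
from typing import List

Vector = List[float]


def combine_features(u: Vector, v: Vector) -> Vector:
    """InferSent-style pair features: (u, v, |u - v|, u * v), concatenated.

    Hypothetical helper; u and v are sentence embeddings of equal dimension.
    Using it for the MT hypothesis t and reference r is an assumption here.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]  # |u - v|, element-wise
    product = [a * b for a, b in zip(u, v)]        # u * v, element-wise
    return u + v + abs_diff + product              # list concatenation


features = combine_features([1.0, 2.0], [3.0, 0.0])
print(features)       # [1.0, 2.0, 3.0, 0.0, 2.0, 2.0, 3.0, 0.0]
print(len(features))  # 8, i.e. 4x the embedding dimension
```

The feature vector is four times the embedding dimension; the |u - v| and u * v terms expose element-wise agreement between the two sentences directly, rather than leaving the downstream model to learn it from the raw concatenation alone.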
Dataset (to-English)
WMT16 test: segment-level WMT15 data (2,000 instances in total), split 9:1 into training and dev sets
WMT17 test: segment-level WMT15 + WMT16 data (5,360 instances in total), split 9:1 into training and dev sets
After that: segment-level WMT15 + WMT16 + WMT17 data (9,280 instances in total), split 9:1 into training and dev sets
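A 9:1 split of this kind can be sketched as below. The shuffling, the fixed seed, and the helper name `split_train_dev` are assumptions for illustration; the notes do not say how instances were assigned to each side. With the instance counts above, a 9:1 split gives 1,800/200, 4,824/536, and 8,352/928 training/dev instances.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_dev(
    data: Sequence[T], train_ratio: float = 0.9, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data and split it into training and dev sets (9:1 by default).

    Hypothetical helper; a fixed seed keeps the split reproducible.
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


for total in (2000, 5360, 9280):  # WMT15; WMT15+16; WMT15+16+17
    train, dev = split_train_dev(range(total))
    print(total, len(train), len(dev))
# 2000 1800 200
# 5360 4824 536
# 9280 8352 928
```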
→ two regression models are compared: MLP (multi-layer perceptron) and SVR (support vector regression)
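For intuition, scoring with an MLP regressor reduces to a forward pass from the pair-feature vector to one real-valued score. The sketch below uses a single ReLU hidden layer with hand-set toy weights; the layer count, activation, weights, and function names are assumptions, and neither training nor the SVR alternative is shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(x: float) -> float:
    """Rectified linear unit."""
    return max(0.0, x)


def mlp_score(features: Vector, w1: Matrix, b1: Vector, w2: Vector, b2: float) -> float:
    """Forward pass of a one-hidden-layer MLP regressor (hypothetical sketch).

    hidden = ReLU(W1 @ features + b1); score = w2 . hidden + b2
    Each row of w1 holds the input weights of one hidden unit.
    """
    hidden = [relu(sum(w * x for w, x in zip(row, features)) + b) for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Toy weights: 2 input features, 2 hidden units.
w1 = [[0.5, -1.0], [1.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [2.0, 0.5]
print(mlp_score([1.0, 2.0], w1, b1, w2, b2=0.1))  # 1.6
```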
Results
- segment-level Pearson correlation
- system-level Pearson correlation
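Both evaluation levels use Pearson's r between metric scores and human judgments: at the segment level each pair is (metric score, human score) for one sentence, and at the system level there is one pair per MT system. A plain-Python sketch of the coefficient follows; the function name and the example scores are hypothetical. In Python 3.10+, `statistics.correlation` computes the same quantity.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four segments: metric predictions vs human judgments.
metric = [0.1, 0.4, 0.35, 0.8]
human = [0.0, 0.5, 0.3, 0.9]
print(round(pearson_r(metric, human), 3))
```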
Conclusion
→ universal sentence embeddings capture the similarity between the MT hypothesis and the reference more accurately than surface-level character or n-gram matching