[SentenceSimilarity] RUSE: Regressor Using Sentence Embeddings for Automatic Machine Translation Evaluation (Review)

joannekim0420 2021. 8. 12. 15:25

Regression Model for MTE (Machine Translation Evaluation)

→ a segment-level MTE metric for to-English language pairs.

→ estimates translation quality as a real number from an MT hypothesis t and a reference translation r.

→ uses universal sentence embeddings to capture global information that character- or N-gram-based metrics, which only check surface string matches, cannot obtain.

Universal Sentence Embeddings

  • InferSent

supervised model: learns sentence embeddings through a classification task on the Stanford Natural Language Inference (SNLI) dataset

sentence pair → one sentence embedding vector per sentence, u and v
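The InferSent paper does not feed u and v to its classifier as-is: it combines them by concatenation (u, v), element-wise product u * v, and absolute element-wise difference |u − v|. A minimal pure-Python sketch of that combination, assuming embeddings are plain lists of floats; `combine_features` is a hypothetical helper name, and the order of the four parts is an arbitrary choice here.

```python
from typing import List


def combine_features(u: List[float], v: List[float]) -> List[float]:
    """Build an InferSent-style pair feature: [u; v; |u - v|; u * v]."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same dimensionality")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]  # |u - v|, element-wise
    product = [a * b for a, b in zip(u, v)]        # u * v, element-wise
    return list(u) + list(v) + abs_diff + product


features = combine_features([1.0, 2.0], [3.0, -1.0])
print(features)  # [1.0, 2.0, 3.0, -1.0, 2.0, 3.0, 3.0, -2.0]
```

The combined vector is four times the embedding size; the |u − v| and u * v parts expose element-wise agreement between the two sentences directly instead of leaving it for the classifier to learn from the concatenation alone.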

  • Quick-Thought

unsupervised model: learns from an input sentence together with its context (surrounding sentences); applied to document classification tasks
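Quick-Thought frames context prediction as classification: one encoder f embeds the input sentence, another encoder g embeds each candidate sentence, and a softmax over the inner products f(s)·g(c) scores which candidate is the true neighboring sentence. The sketch below only shows that scoring step, assuming the encodings are already computed as lists of floats; `context_probabilities` is a hypothetical name, and training (maximizing the probability of the real context sentences) is not shown.

```python
import math
from typing import List


def context_probabilities(f_s: List[float], candidates: List[List[float]]) -> List[float]:
    """Softmax over inner products between the input encoding and each candidate encoding."""
    scores = [sum(a * b for a, b in zip(f_s, g_c)) for g_c in candidates]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = context_probabilities([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]])
best = max(range(len(probs)), key=probs.__getitem__)
print(best)  # 0: the candidate most aligned with f(s)
```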

  • Universal Sentence Encoder

unsupervised learning: predicting neighboring sentences; supervised learning: conversational input-response and natural language inference tasks

applied to document classification and semantic textual similarity tasks
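For semantic textual similarity, the Universal Sentence Encoder paper scores a sentence pair by turning the cosine similarity of the two embeddings into an angular similarity, 1 − arccos(cos(u, v)) / π. A minimal sketch, assuming non-zero embeddings given as plain lists of floats; `angular_similarity` is a hypothetical name.

```python
import math
from typing import List


def angular_similarity(u: List[float], v: List[float]) -> float:
    """1 - arccos(cosine(u, v)) / pi, in [0, 1]; assumes u and v are non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return 1.0 - math.acos(cosine) / math.pi


print(angular_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(angular_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.5
```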

DATASET (to-English)

WMT16 test = segment-level WMT15 (2,000 instances in total), split 9:1 into training and dev sets

WMT17 test = segment-level WMT15 + WMT16 (5,360 instances in total), split 9:1 into training and dev sets

After that, segment-level WMT15, WMT16, and WMT17 (9,280 instances in total), split 9:1 into training and dev sets
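The 9:1 splits above can be sketched with a seeded shuffle. This is only an illustration under assumptions: the review does not say how instances were assigned to each side, so random shuffling with a fixed seed and truncating the training share to an integer are choices made here, and `split_train_dev` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_dev(items: Sequence[T], train_ratio: float = 0.9,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into train/dev parts at `train_ratio`."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


for name, total in [("WMT15", 2000), ("WMT15+16", 5360), ("WMT15+16+17", 9280)]:
    train, dev = split_train_dev(range(total))
    print(name, len(train), len(dev))
# WMT15 1800 200
# WMT15+16 4824 536
# WMT15+16+17 8352 928
```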

→ two regression models are compared: MLP (Multi-Layer Perceptron) and SVR (Support Vector Regression)
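To make the MLP side concrete, here is a minimal one-hidden-layer MLP regressor in plain Python, trained with per-example gradient descent on squared error. It is a sketch, not RUSE's configuration: the hidden size, tanh activation, learning rate, epoch count, and the toy 2-d data are all assumptions, and in RUSE the inputs would instead be features built from the hypothesis and reference embeddings. `train_mlp_regressor` and `mse` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Tuple


def train_mlp_regressor(X: List[List[float]], y: List[float], hidden: int = 8,
                        lr: float = 0.05, epochs: int = 300,
                        seed: int = 0) -> Callable[[List[float]], float]:
    """Fit a one-hidden-layer tanh MLP to real-valued targets with per-example SGD."""
    rng = random.Random(seed)
    dim = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x: List[float]) -> Tuple[List[float], float]:
        # Hidden activations h_j = tanh(W1[j] . x + b1[j]); output is a linear read-out.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, target in zip(X, y):
            h, pred = forward(x)
            err = pred - target  # derivative of 0.5 * (pred - target) ** 2 w.r.t. pred
            for j in range(hidden):
                # Gradient at hidden pre-activation j, computed with the old w2[j].
                grad_pre = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_pre
                for k in range(dim):
                    W1[j][k] -= lr * grad_pre * x[k]
            b2 -= lr * err

    return lambda x: forward(x)[1]


def mse(predict: Callable[[List[float]], float], X: List[List[float]], y: List[float]) -> float:
    return sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


# Toy data: 2-d "features" with a linear target standing in for human scores.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(40)]
y = [0.6 * a - 0.4 * b for a, b in X]
model = train_mlp_regressor(X, y)
mean_y = sum(y) / len(y)
print(mse(model, X, y) < mse(lambda x: mean_y, X, y))  # True
```

An SVR would slot into the same place: it also maps a feature vector to a real-valued score, but fits a kernel-based model with an epsilon-insensitive loss instead of a neural network with squared error.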

RESULT

  • segment-level Pearson correlation

  • System-level Pearson correlation
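Both results are reported as Pearson's r between metric scores and human judgments. A stdlib-only sketch of the coefficient; `pearson` is a hypothetical name, and the scores below are made-up toy values, not numbers from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


metric_scores = [0.1, 0.4, 0.35, 0.8]  # toy metric outputs per segment
human_scores = [-0.5, 0.2, 0.1, 1.0]   # toy human judgments for the same segments
print(round(pearson(metric_scores, human_scores), 3))  # 0.998
```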

CONCLUSION

→ universal sentence embeddings capture the similarity between the MT hypothesis and the reference more accurately than character- or N-gram-based matching

Paper: https://aclanthology.org/W18-6456.pdf