NLP/Paper Review

[SentenceSimilarity] SentSim: Crosslingual Semantic Evaluation of Machine Translation (Review)

joannekim0420 2021. 8. 18. 09:25

FOCUS

  1. Using multilingual BERT removes the need for a reference sentence.
  2. Sentence semantic similarity linearly combines sentence embeddings and word embeddings → captures both word-level and compositional semantics.

METHODS

  • WMD (Word Mover's Distance)

→ The distance between semantically similar words of document A and document B

(= computing the semantic distance between two text documents by aligning semantically similar words and capturing the word traveling flow between them, using the vectorial relationship between their word embeddings)
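The alignment idea above can be sketched with the *relaxed* WMD lower bound, where each word simply moves its mass to its nearest neighbor in the other document (the full WMD solves an optimal-transport problem; the toy 2-D embeddings here are hypothetical, for illustration only):

```python
import numpy as np

def relaxed_wmd(doc_a, doc_b, embeddings):
    """Relaxed Word Mover's Distance: each word in doc_a moves all of its
    mass to its nearest word in doc_b (a lower bound on the true WMD)."""
    dists = []
    for w_a in doc_a:
        # distance from w_a to its closest word in doc_b
        d = min(np.linalg.norm(embeddings[w_a] - embeddings[w_b]) for w_b in doc_b)
        dists.append(d)
    # uniform word weights, as in the bag-of-words formulation
    return float(np.mean(dists))

# toy 2-D embeddings (hypothetical)
emb = {
    "obama": np.array([1.0, 0.0]),
    "president": np.array([0.9, 0.1]),
    "speaks": np.array([0.0, 1.0]),
    "talks": np.array([0.1, 0.9]),
}

d = relaxed_wmd(["obama", "speaks"], ["president", "talks"], emb)
```

A small distance here means the two documents use semantically close words even though no word overlaps.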

 

  • BERTScore

Computes the semantic similarity between a reference sentence and a machine-generated sentence.

 

  • SSS (Semantic Sentence Similarity)

Computes sentence similarity as the cosine distance between two vectors that each summarize a sentence.
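With the sentence vectors in hand (in practice produced by a semantically fine-tuned encoder such as Sentence-BERT), the score is just cosine similarity; a minimal sketch:

```python
import numpy as np

def sss(sent_emb_a, sent_emb_b):
    """Semantic Sentence Similarity: cosine similarity of two sentence vectors."""
    a, b = np.asarray(sent_emb_a, dtype=float), np.asarray(sent_emb_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```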

 

  • SENTSIM 

(A: sentence-level metric, B: token-level metric)

To extend semantic similarity to the token level, SentSim combines the cosine similarity of semantically fine-tuned sentence embeddings with contextual word embeddings.
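Conceptually, SentSim fuses the sentence-level score (SSS) with the token-level score (BERTScore). A hypothetical sketch with a simple weighted average; the paper's exact combination scheme may differ:

```python
def sentsim(sss_score, bertscore_f1, alpha=0.5):
    """Illustrative combination of a sentence-level score (SSS) and a
    token-level score (BERTScore F1); weighting is an assumption here."""
    return alpha * sss_score + (1 - alpha) * bertscore_f1
```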

 

DATASET

  • Multi-30k (2018) - English-German / English-French image description dataset (2,000 sentence tuples each)
  • WMT17 - German, Chinese, Latvian, Czech, Finnish, Turkish, Russian (to-English) / Russian, Chinese (from-English) (560 sentence tuples) → main experimental data
  • WMT20 - Sinhala, Nepali, Estonian (to-English) / German, Chinese, Romanian, Russian (from-English) (1,000 sentence tuples) → crosslingual evaluation

 

WMT17 Example

Compared against the reference sentence, BERTScore gives higher scores to sentences that negate the meaning of the original.

SSS gives high scores to MT1 and MT2 and low scores to MT3 and MT4.

→ Combining BERTScore and SSS = SentSim

 

RESULT

  • Pearson correlation with human scores for Multi-30k with RoBERTa-Base in the SRC-MT (Source - Machine Translation) and MT-REF (Machine Translation - Reference) settings.

   For the latter, German-to-German and French-to-French are evaluated as monolingual tasks.

  • Pearson correlation with human scores for WMT-17 with RoBERTa-Base in the SRC-MT (Source - Machine Translation) setting.

→ The gap between the Multi-30k and WMT-17 results comes from sentence length: Multi-30k sentences are 12-14 words, while WMT17 sentences are longer, so performance is lower with BERTScore (which relies on word alignment) and higher with WMD (which considers the whole sentence).

 

  • Pearson correlation with human scores for the WMT-20 dataset with RoBERTa-Base in the SRC-MT setting.

  • Examples from various datasets, including comparisons among BERTScore, SSS, and SentSim (SSS + BERTScore).

 

 

Paper: https://aclanthology.org/2021.naacl-main.252.pdf

GitHub: https://github.com/Rain9876/Unsupervised-crosslingual-Compound-Method-For-MT