[SentenceSimilarity] BERTTune: Fine-Tuning Neural Machine Translation with BertScore

joannekim0420 2021. 7. 26. 11:14

Terms and Techniques

BLEU Score only counts exact n-gram matches → BERTScore evaluates with context taken into account. (https://openreview.net/pdf?id=SkeHuCVFDr)

BERTScore uses a pretrained BERT model to compute contextual embeddings.

(The score is built from the cosine similarities between the normalized contextual embeddings of the predicted sentence and the reference sentence.)

R_BERT = recall

P_BERT = precision

F_BERT = F1 score (the harmonic mean of recall and precision; F_BERT does not depend on exact word identity, so the more similar the embeddings, the higher the score.)
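A minimal sketch of how the three scores can be computed by greedy matching over cosine similarities. The 2-dimensional toy vectors stand in for BERT contextual embeddings and are assumptions for illustration; the idf weighting and baseline rescaling that BERTScore optionally applies are left out.

```python
import math

def unit(v):
    """Scale a vector to length 1 so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def bert_score(ref_emb, pred_emb):
    """Greedy-matching precision, recall and F1 over token embeddings."""
    ref = [unit(v) for v in ref_emb]
    pred = [unit(v) for v in pred_emb]
    # sim[i][j] = cosine similarity of reference token i and predicted token j
    sim = [[sum(a * b for a, b in zip(r, p)) for p in pred] for r in ref]
    # Recall: each reference token is matched to its most similar predicted token.
    recall = sum(max(row) for row in sim) / len(ref)
    # Precision: each predicted token is matched to its most similar reference token.
    precision = sum(max(col) for col in zip(*sim)) / len(pred)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy embeddings (hypothetical): two reference tokens, two predicted tokens.
ref_emb = [[1.0, 0.0], [0.0, 1.0]]
pred_emb = [[1.0, 0.0], [1.0, 1.0]]
precision, recall, f1 = bert_score(ref_emb, pred_emb)
```

The second predicted token is not identical to any reference token but still earns partial credit, which is the behavior that exact n-gram matching lacks.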

Cosine Similarity

Cosine similarity measures the similarity between two vectors of an inner product space.

It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction.

It is often used to measure document similarity in text analysis.
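As a small illustration of the definition above, here is a sketch in plain Python. The function name and the example vectors are chosen for this note only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])       # perpendicular vectors
opposite = cosine_similarity([1.0, 1.0], [-1.0, -1.0])       # opposite directions
```

Only direction matters: scaling a vector by a positive constant leaves the similarity unchanged.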

Teacher Forcing

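Teacher forcing is the technique where the target word is passed as the next input to the decoder.

The decoder's strength, conditioning on its previous predictions, becomes a cause of slow training when those predictions are wrong.

Feeding in the ground truth as input speeds up the early stage of training.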
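The difference in decoder inputs can be sketched as follows. The `BOS` token, the helper names, and the example sentences are hypothetical and only serve to show the shift-by-one pattern.

```python
BOS = "<s>"  # hypothetical start-of-sentence token

def teacher_forcing_inputs(target_tokens):
    """Decoder inputs under teacher forcing: the gold target shifted right by one."""
    return [BOS] + target_tokens[:-1]

def free_running_inputs(predicted_tokens):
    """Decoder inputs without teacher forcing: the model's own previous predictions."""
    return [BOS] + predicted_tokens[:-1]

target = ["the", "cat", "sat"]
predicted = ["the", "dog", "sat"]  # hypothetical model output with one wrong word

tf_inputs = teacher_forcing_inputs(target)     # the error never reaches the decoder
fr_inputs = free_running_inputs(predicted)     # the wrong word "dog" is fed back in
```

With teacher forcing, the wrong word at step 2 does not contaminate the input at step 3, which is why early training is faster.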

Goal of the Paper

The authors fine-tune existing NMT models, which are trained at the token level, with the sentence-level BERTScore, expecting improved performance.

A prerequisite for fine-tuning a model with BERTScore is that end-to-end differentiability is guaranteed.

BERTScore itself operates on word embeddings and is differentiable, but feeding it categorical predictions (obtained by argmax or sampling) breaks differentiability.

→ To overcome this problem, the paper introduces three kinds of soft predictions: (1) Gumbel-softmax, (2) sparsemax, and (3) dense vectors.
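To make the contrast concrete, here is a sketch of two of the soft predictions computed from the same logits. It assumes that the dense vector is the ordinary softmax output and uses the standard sorting-based sparsemax algorithm; the logits are made up for illustration.

```python
import math

def softmax(logits):
    """Dense probability vector: every entry is strictly positive."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(logits):
    """Euclidean projection of the logits onto the probability simplex.

    Unlike softmax, it can assign exactly zero probability to low-scoring words.
    """
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(sorted(logits, reverse=True), start=1):
        cumsum += z
        # The support is the largest k for which this condition still holds.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k
    return [max(x - tau, 0.0) for x in logits]

logits = [1.0, 0.8, 0.1]            # hypothetical scores over a 3-word vocabulary
dense = softmax(logits)             # all three entries are positive
sparse = sparsemax(logits)          # the last entry is exactly zero
hard = logits.index(max(logits))    # argmax: a discrete index, no useful gradient
```

Both soft outputs are probability vectors that vary smoothly (or piecewise linearly) with the logits, whereas the argmax index jumps discontinuously.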

How BERTScore Works and How Fine-Tuning Is Done

1. The sentences in the reference and the prediction are converted into their corresponding static word embeddings using the embedding matrix.

2. These static word embeddings are fed into the language model to produce contextualized embeddings.

3. BERTScore is computed.

→ F_BERT is optimized.

However, the F_BERT score cannot be used directly because the argmax function is discontinuous. So, instead of the hard decision of argmax, soft predictions that preserve differentiability are used.

(Condition: the NMT model must have the same target vocabulary as the pretrained model.)
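One way to read step 1 for a soft prediction is as a probability-weighted average of the rows of the embedding matrix. The sketch below assumes that reading; the toy matrix `E` and the probability vectors are hypothetical.

```python
def soft_embedding(probs, embedding_matrix):
    """Expected embedding: sum over the vocabulary of p(word) * embedding(word)."""
    dim = len(embedding_matrix[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_matrix))
        for d in range(dim)
    ]

# Toy embedding matrix (hypothetical): 3 vocabulary words, 2 dimensions.
E = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

one_hot = [0.0, 1.0, 0.0]   # hard decision: picks row 1 exactly
soft = [0.2, 0.7, 0.1]      # soft prediction: mixes all three rows

hard_vec = soft_embedding(one_hot, E)
soft_vec = soft_embedding(soft, E)
```

A one-hot vector recovers the usual embedding lookup, so the soft version is a strict generalization. This also shows why the vocabulary condition matters: the probability vector must index the same rows as the pretrained model's embedding matrix.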

Gumbel Softmax

Gumbel-softmax is a reparameterization technique; using it gives a benefit in exploration. → fully differentiable
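A minimal sketch of drawing one Gumbel-softmax sample in plain Python. The temperature value, the logits, and the function signature are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed categorical sample.

    Gumbel noise g = -log(-log(u)), u ~ Uniform(0, 1), is added to each logit,
    then a temperature-scaled softmax is applied. The randomness sits in the
    noise, so the output is a differentiable function of the logits.
    """
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = gumbel_softmax([2.0, 1.0, 0.1], temperature=0.5, rng=rng)
```

Lower temperatures push the sample toward a one-hot vector, while the added noise lets lower-scoring words win occasionally, which is where the exploration benefit comes from.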

The NMT transformer is trained until convergence on the validation set, and fine-tuning with the F_BERT loss is carried out as part of this training process (teacher forcing is applied to all models).

Datasets

German - English (de-en) 152K

Chinese - English (zh-en) 156K

English - Turkish (en-tr) 141K

English - Spanish (en-es) 172K

Results

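BLEU, F_BERT, and MS results for the Dense Vector, Sparsemax, and Gumbel Softmax models on each language pair (F_BERT and MS are contextual-embedding-based metrics)

→ Gumbel-softmax gave the best results on de-en (German-English) and en-tr (English-Turkish).

→ Dense Vector and Sparsemax had only a marginal effect, though Dense Vector alone gave good results on zh-en (Chinese-English).

→ On en-es (English-Spanish), none of them gave good results (the en-es baseline is so strong that further gains are hard to obtain).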

Entropy of the probability vectors generated on de-en

→ The models fine-tuned with dense vectors and Gumbel-softmax are sparser than the original baseline NMT model.

→ Sparsemax is denser.

→ Overall, the sparsity of the predictions corresponds to the improvement (= the sparser, the greater the improvement).
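Entropy as a sparsity measure can be illustrated with a short sketch. The two probability vectors are made up for this note; lower entropy means the probability mass is concentrated on fewer words.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

peaked = [0.97, 0.01, 0.01, 0.01]   # nearly one-hot (sparse-like) prediction
flat = [0.25, 0.25, 0.25, 0.25]     # uniform (dense) prediction

peaked_entropy = entropy(peaked)
flat_entropy = entropy(flat)        # equals log(4), the maximum for 4 words
```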

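Change in BLEU, F_BERT, and MS of the Gumbel Softmax model over epochs on de-en

→ Both the F_BERT score and the MS score peak around epoch 1,200 and then decline.

→ The BLEU score kept declining, but since this did not show up on the test set, the test set and the training set are presumed to have similar data distributions.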

Paper: https://arxiv.org/pdf/2106.02208.pdf
