
[MTQE] Review: Error detection and error correction for improving quality in MT and human post-editing

joannekim0420 2021. 11. 22. 15:42

Objective: Define error detection and correction rules that support MT quality estimation (QE) and make human post-editing more accurate and easier.

 

DATA

  • 50 texts translated from English to Italian using Google Translator
  • tourism, client support and e-commerce domain

TERM

  • Determiner 
    group of words we use to describe nouns
    ex) Possessives:my, your, his, her ...
          Quantifiers:(a) few, some, many...
          Numbers: one, two, three ...
  • Articles
    subcategory that falls under Determiner
    ex) A, An, The

ERROR annotated corpus

Total number of errors annotated in the corpus per general error category

  • Determiners (237), Agreement (159), Word Order (106), and Tense/mood/aspect (101) are the most frequent categories, in that order.

Among these, the focus here is on Word Order errors in noun modification structures.

Word Order Error in noun modification structures

  • Named Entity (NE) classification is the task of classifying words or groups of words in a sentence into predefined classes such as Person, Organization, Location, etc. [1]
  • ADJP = adjective phrase
  • PP = prepositional phrase

Rules for error detection by the checker

→ check the order of the elements in the sentence

RULES FOR ERROR DETECTION (ENGLISH)

  • RULE 1
    when a named entity occurs in the target text and is preceded or followed by an adjective or a PP that modifies it
    (ADJP|PP) + PROPN → warning
    PROPN + (ADJP|PP) → warning

  • RULE 2
    When a named entity occurs in the target text within a PP as a modifier
    N + modifiesP + PROPN → warning

  • RULE 3
    If a noun or a PP precedes the head noun
    (N|PP)+N → warning

  • RULE 4
    If one of the sequences listed below is detected
    N + N → warning
    N + ADJ+ + N → warning
    ADJ+ + N + M → warning
    ADJ + ADJ+ + N + N+ → warning
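
The detection rules above amount to pattern matching over a sequence of part-of-speech tags. The following is a minimal sketch of that idea, not the checker used in the paper. It assumes the target sentence has already been tagged, and it uses a simplified, hypothetical tag set (N, ADJ, PROPN, DET) in which a single ADJ stands in for ADJP. Only the adjective cases of RULE 1 and the first two sequences of RULE 4 are covered; the function name `detect_warnings` is made up for illustration.

```python
import re

# Each rule is a regex over the space-joined tag sequence.
# \b keeps "N" from matching the last letter of "PROPN".
RULES = [
    ("RULE 1", r"\bADJ PROPN\b"),       # ADJ + PROPN
    ("RULE 1", r"\bPROPN ADJ\b"),       # PROPN + ADJ
    ("RULE 4", r"\bN N\b"),             # N + N
    ("RULE 4", r"\bN (?:ADJ )+N\b"),    # N + ADJ+ + N
]


def detect_warnings(tagged):
    """Return (rule name, matched words) for every pattern hit.

    `tagged` is a list of (token, tag) pairs using the hypothetical tag set.
    """
    tags = " ".join(tag for _, tag in tagged)
    warnings = []
    for name, pattern in RULES:
        for match in re.finditer(pattern, tags):
            # Convert the character offset into a token index by counting
            # the separators that come before the match.
            start = tags[:match.start()].count(" ")
            length = match.group(0).count(" ") + 1
            words = " ".join(tok for tok, _ in tagged[start:start + length])
            warnings.append((name, words))
    return warnings


sentence = [("il", "DET"), ("cliente", "N"), ("supporto", "N")]
for rule, words in detect_warnings(sentence):
    print(f"{rule}: check word order in '{words}'")
```

The checker only raises a warning here; deciding whether the order is actually wrong is left to the post-editor, which matches the "→ warning" notation in the rules.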

RULES FOR ERROR CORRECTION (ENGLISH)

  • RULE 5
    If an adjective modifies a noun in English and is a quality adjective, then the order in the target language should be noun + adjective
    ADJQ + N → N + ADJQ
  • RULE 6 
    If a noun precedes another noun in English and the first noun modifies the second, invert the order and convert the first noun into an adjective phrase or a PP
    N1 + modifiesN2 → N2 +(ADJP|PPN1)
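
RULE 5 is a local reordering, so it can be sketched as a swap over adjacent tagged tokens. This is an illustrative sketch under assumptions: the tagger is assumed to already mark quality adjectives with a hypothetical ADJQ tag, and the function name `apply_rule_5` is invented here. RULE 6 is not covered because it also requires generating an adjective phrase or a PP from the first noun.

```python
def apply_rule_5(tagged):
    """Swap every ADJQ + N pair into N + ADJQ.

    `tagged` is a list of (token, tag) pairs; ADJQ is a hypothetical tag
    for quality adjectives. A new list is returned; the input is unchanged.
    """
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJQ" and out[i + 1][1] == "N":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


phrase = [("una", "DET"), ("rossa", "ADJQ"), ("casa", "N")]
print(" ".join(tok for tok, _ in apply_rule_5(phrase)))
```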

 

Agreement Errors

  • Agreement (morphosyntactic covariation of two or more words in a sentence)
    → words a writer uses need to align in number and in gender
    ▷ number agreement : Subject–verb agreement

         ex ) The conclusion shows that variables X and Y are related.
                         singular  - singular
          ex ) The results show that variables X and Y are related.
                        plural - plural

    ▷ gender agreement : pronoun–antecedent agreement
           ex ) The man walked to his car.
           ex ) Students need to bring their own lunch.

 

difficulties

→ a word can have contrasting agreement features in the source and target languages

→ the source and target languages can have contrasting morphological systems, one being richer than the other

→ assessing the correct dependency between constituents in long or complex sentences 

 

RULES FOR ERROR DETECTION (ITALIAN)

  • RULE 7
    if a noun ending in a consonant occurs in the target text, check if its specifiers and modifiers are masculine.
    SPR* + N_consonant + MOD* → SPR*masc + N_consonant + MOD*masc

  • RULE 8
    if a noun ending in an -s occurs in the target text, check if it is a foreign word in plural form.

  • RULE 9
    when a named entity occurs in the target text co-occurring with specifiers and modifiers, ask the editor to check the agreement between all these elements
    SPR* + MOD* + PROPN + MOD* → warning

  • RULE 10
    if the quantifier "nessuno" or "chiunque" is part of the subject of a sentence, ask the editor to check if the head verb form of the sentence is singular
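
RULES 7, 8, and 10 depend only on surface forms, so they can be approximated with simple string checks. The sketch below is an assumption-laden illustration rather than the paper's implementation: the function names, the message texts, and the vowel list are all invented here, and the subject is assumed to be already identified and passed in as a list of tokens.

```python
# Vowels used to decide whether a noun ends in a consonant (assumed list).
VOWELS = set("aeiouàèéìòù")


def check_noun(noun):
    """Return editor prompts for RULE 7 and RULE 8 based on the noun ending."""
    if not noun:
        return []
    messages = []
    last = noun[-1].lower()
    if last.isalpha() and last not in VOWELS:
        messages.append("RULE 7: check that specifiers and modifiers are masculine")
    if last == "s":
        messages.append("RULE 8: check whether this is a foreign word in plural form")
    return messages


def check_subject(subject_tokens):
    """Return an editor prompt for RULE 10 if the subject contains a listed quantifier."""
    if any(tok.lower() in {"nessuno", "chiunque"} for tok in subject_tokens):
        return ["RULE 10: check that the head verb is singular"]
    return []


for word in ["computer", "jeans", "casa"]:
    print(word, check_noun(word))
print(check_subject(["Nessuno", "dei", "clienti"]))
```

A noun ending in -s triggers both RULE 7 and RULE 8 in this sketch, since -s is also a consonant; whether the original checker combines or prioritizes the two is not stated in the post.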

RULES FOR ERROR CORRECTION (ITALIAN)

  • RULE 11
    if a noun ending in "-tore" occurs in the target text, then its specifiers and modifiers are masculine
    SPR* + N_tore + MOD* → SPR*masc + N_tore + MOD*masc

  • RULE 12 (Italian)
    if a noun ending in "-ta", "-tu", "-trice", "-tite" or "-zione" occurs in the target text, then its specifiers and modifiers are feminine.
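
RULES 11 and 12 map noun suffixes to the gender expected on specifiers and modifiers, which can be sketched as a suffix lookup. This is a minimal illustration, assuming the suffixes exactly as written in the list above; the function name `expected_gender` is hypothetical, and the sketch only reports the expected gender rather than rewriting the surrounding words.

```python
# Suffixes copied as they appear in RULE 11 and RULE 12 above.
MASCULINE_SUFFIXES = ("tore",)
FEMININE_SUFFIXES = ("ta", "tu", "trice", "tite", "zione")


def expected_gender(noun):
    """Return the gender the noun's specifiers and modifiers should take.

    Returns "masculine", "feminine", or None when neither rule applies.
    """
    word = noun.lower()
    if word.endswith(FEMININE_SUFFIXES):
        return "feminine"
    if word.endswith(MASCULINE_SUFFIXES):
        return "masculine"
    return None


for noun in ["traduttore", "traduttrice", "stazione", "libro"]:
    print(noun, expected_gender(noun))
```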

RESULTS 

→ Google Translator outperforms Moses SMT in machine translation quality.

→ There are no rules that can handle VP-related agreement errors.

→ The rules are more effective for error detection than for error correction.

 

 

References

[1] F. Ahmad and M. Rahoman, "Named entity classification using dependency grammar," 2017 20th International Conference of Computer and Information Technology (ICCIT), 2017, pp. 1-7, doi: 10.1109/ICCITECHN.2017.8281836.

Paper reviewed

https://repositorio.ul.pt/bitstream/10451/33007/1/error%20detection_Comparin%26Mendes2017.pdf