Purpose: Define error detection and correction rules that support MT quality estimation (QE) and make human post-editing more accurate and easier.
DATA
- 50 texts translated from English to Italian using Google Translator
- tourism, client support and e-commerce domains
TERM
- Determiner
a group of words we use to describe nouns
ex) Possessives: my, your, his, her ...
Quantifiers: (a) few, some, many ...
Numbers: one, two, three ...
- Articles
a subcategory that falls under Determiner
ex) A, An, The
ERROR annotated corpus
Total number of errors annotated in the corpus per general error category
- Determiners (237), Agreement (159), Word Order (106) and Tense/mood/aspect (101) are the most frequent categories, in that order.
Among these, the notes below focus on noun modification structures within the Word Order category.
Word Order Error in noun modification structures
- Named Entity (NE) classification is the task of classifying words or groups of words in a sentence into predefined classes such as Person, Organization, Location, etc. [1]
- ADJP = adjective phrase
- PP = prepositional phrase
Rules for error detection by the checker
→ check the order of the elements in the sentence
RULES FOR ERROR DETECTION (ENGLISH)
- RULE 1
When a named entity occurs in the target text and is preceded or followed by an adjective or a PP that modifies it
(ADJP|PP) + PROPN → warning
PROPN + (ADJP|PP) → warning
- RULE 2
When a named entity occurs in the target text within a PP that acts as a modifier
N + P + PROPN → warning (the PP modifies N)
- RULE 3
If a noun or a PP precedes the head noun
(N|PP) + N → warning
- RULE 4
If one of the sequences listed below is detected
N + N → warning
N + ADJ+ + N → warning
ADJ+ + N + M → warning
ADJ + ADJ+ + N + N+ → warning
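The word-order detection rules can be read as patterns over a sequence of part-of-speech or chunk tags. The following is a minimal sketch of that idea, not the checker described in the paper: it assumes the input is already tagged with the labels used in these notes (PROPN, N, ADJ, ADJP, PP), and the names DETECTION_RULES and detect_word_order_warnings are hypothetical. Only Rules 1, 3 and 4 are covered, and the Rule 4 sequence containing M is left out because M is not defined in these notes.

```python
import re

# Each rule is a list of regular expressions over a space-separated tag string.
# The tag names follow the notes; the regex encoding itself is an assumption.
DETECTION_RULES = {
    "RULE 1": [r"\b(?:ADJP|PP) PROPN\b", r"\bPROPN (?:ADJP|PP)\b"],
    "RULE 3": [r"\b(?:N|PP) N\b"],
    "RULE 4": [
        r"\bN N\b",                      # N + N
        r"\bN(?: ADJ)+ N\b",             # N + ADJ+ + N
        r"\bADJ(?: ADJ)+ N(?: N)+\b",    # ADJ + ADJ+ + N + N+
    ],
}


def detect_word_order_warnings(tags):
    """Return the sorted names of the rules whose pattern occurs in the tag sequence."""
    tag_string = " ".join(tags)
    return sorted(
        name
        for name, patterns in DETECTION_RULES.items()
        if any(re.search(pattern, tag_string) for pattern in patterns)
    )


if __name__ == "__main__":
    print(detect_word_order_warnings(["DET", "ADJP", "PROPN"]))
    print(detect_word_order_warnings(["N", "ADJ", "ADJ", "N"]))
```

A match only raises a warning for the post-editor; it does not say that the order is wrong, which is consistent with the rules above being detection rules.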
RULES FOR ERROR CORRECTION (ENGLISH)
- RULE 5
If an adjective modifies a noun in English and the adjective is a quality adjective, then the order in the target language should be noun + adjective
ADJQ + N → N + ADJQ
- RULE 6
If a noun precedes another noun in English and the first noun modifies the second, invert the order and convert the first noun into an adjective phrase or a PP
N1 + N2 (N1 modifies N2) → N2 + (ADJP|PP)_N1
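Rule 5 is a local reordering, so it can be sketched as a swap over tagged tokens. The sketch below assumes the input is a list of (token, tag) pairs in which quality adjectives are already tagged ADJQ and nouns N; the function name apply_rule_5 and the example sentence are hypothetical. Rule 6 is not covered, since turning a noun into an adjective phrase or a PP needs lexical knowledge that these notes do not specify.

```python
def apply_rule_5(tagged_tokens):
    """Swap each quality adjective (ADJQ) with the noun (N) that immediately follows it."""
    result = list(tagged_tokens)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJQ" and result[i + 1][1] == "N":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return result


if __name__ == "__main__":
    sentence = [("una", "DET"), ("rossa", "ADJQ"), ("macchina", "N")]
    print(" ".join(token for token, _ in apply_rule_5(sentence)))
```

The input list is copied before reordering, so the original tagged sentence stays available for comparison during post-editing.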
Agreement Errors
- Agreement (morphosyntactic covariation of two or more words in a sentence)
→ words a writer uses need to align in number and in gender
▷ Number agreement : Subject–verb agreement
ex ) The conclusion shows that variables X and Y are related.
singular - singular
ex ) The results show that variables X and Y are related.
plural - plural
▷ Gender agreement : Pronoun–antecedent agreement
ex ) The man walked to his car.
ex ) Students need to bring their own lunch.
difficulties
→ a word can have contrasting agreement features in the source and target languages
→ the source and target languages can have contrasting morphological systems, one being richer than the other
→ assessing the correct dependency between constituents in long or complex sentences
RULES FOR ERROR DETECTION (ITALIAN)
- RULE 7
If a noun ending in a consonant occurs in the target text, check if its specifiers and modifiers are masculine.
SPR* + N_consonant + MOD* → SPR*masc + N_consonant + MOD*masc
- RULE 8
If a noun ending in -s occurs in the target text, check if it is a foreign word in plural form.
- RULE 9
When a named entity occurs in the target text co-occurring with specifiers and modifiers, ask the editor to check the agreement between all these elements
SPR* + MOD* + PROPN + MOD* → warning
- RULE 10
If the quantifier "nessuno" or "chiunque" is part of the subject of a sentence, ask the editor to check if the head verb form of the sentence is singular
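Rules 7, 8 and 10 depend only on surface forms, so they can be sketched as simple string checks that produce prompts for the post-editor. This is an illustrative sketch under assumptions, not the paper's implementation: it assumes nouns and the subject tokens have already been identified by an upstream tagger or parser, and the names VOWELS, noun_checks and subject_check are hypothetical.

```python
# Italian vowels, including accented forms; anything else alphabetic counts as a consonant.
VOWELS = set("aeiouàèéìíòóùú")
SINGULAR_QUANTIFIERS = {"nessuno", "chiunque"}


def noun_checks(noun):
    """Return the review prompts triggered by a target-text noun (Rules 7 and 8)."""
    if not noun:
        return []
    prompts = []
    last = noun[-1].lower()
    if last.isalpha() and last not in VOWELS:
        prompts.append("RULE 7: check that specifiers and modifiers are masculine")
    if last == "s":
        prompts.append("RULE 8: check whether this is a foreign word in plural form")
    return prompts


def subject_check(subject_tokens):
    """Return a Rule 10 prompt if the subject contains 'nessuno' or 'chiunque'."""
    if any(token.lower() in SINGULAR_QUANTIFIERS for token in subject_tokens):
        return ["RULE 10: check that the head verb form is singular"]
    return []


if __name__ == "__main__":
    print(noun_checks("jeans"))
    print(subject_check(["Nessuno", "dei", "clienti"]))
```

A noun ending in -s triggers both Rule 7 and Rule 8, since -s is also a consonant; the editor then decides which prompt is relevant.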
RULES FOR ERROR CORRECTION (ITALIAN)
- RULE 11
If a noun ending in "-tore" occurs in the target text, then its specifiers and modifiers are masculine.
SPR* + N_tore + MOD* → SPR*masc + N_tore + MOD*masc
- RULE 12 (Italian)
If a noun ending in "-tà", "-tù", "-trice", "-tite" or "-zione" occurs in the target text, then its specifiers and modifiers are feminine.
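Rules 11 and 12 amount to a suffix-to-gender lookup. The sketch below shows one way to express that lookup; it assumes the accented spellings -tà and -tù for the first two feminine endings, and the names MASCULINE_SUFFIXES, FEMININE_SUFFIXES and expected_gender are hypothetical. It only predicts the gender that specifiers and modifiers should take; rewriting them to agree would need a morphological lexicon that is outside these notes.

```python
MASCULINE_SUFFIXES = ("tore",)                                  # Rule 11
FEMININE_SUFFIXES = ("tà", "tù", "trice", "tite", "zione")      # Rule 12 (accents assumed)


def expected_gender(noun):
    """Return 'masculine', 'feminine', or None when neither rule applies."""
    word = noun.lower()
    if word.endswith(FEMININE_SUFFIXES):
        return "feminine"
    if word.endswith(MASCULINE_SUFFIXES):
        return "masculine"
    return None


if __name__ == "__main__":
    for noun in ["traduttore", "traduttrice", "città", "stazione", "libro"]:
        print(noun, expected_gender(noun))
```

Returning None for nouns outside both suffix lists keeps the sketch from guessing a gender where the rules say nothing.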
RESULTS
→ Google Translator outperforms Moses SMT in machine translation quality.
→ There are no rules that can handle VP-related agreement errors.
→ The rules are more effective for error detection than for error correction.
References
[1] F. Ahmad and M. Rahoman, "Named entity classification using dependency grammar," 2017 20th International Conference of Computer and Information Technology (ICCIT), 2017, pp. 1-7, doi: 10.1109/ICCITECHN.2017.8281836.
Source paper
https://repositorio.ul.pt/bitstream/10451/33007/1/error%20detection_Comparin%26Mendes2017.pdf