
Author: Goyal, Naman; Gao, Cynthia; Chaudhary, Vishrav; Chen, Peng-Jen; Wenzek, Guillaume; Ju, Da; Krishnan, Sanjana; Ranzato, Marc'Aurelio; Guzman, Francisco; Fan, Angela
Title: The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
  • Cord-id: hykb43oa
  • Document date: 2021-06-06
  • ID: hykb43oa
    Document: One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are of low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process. The resulting dataset enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems, as all translations are multilingually aligned. By publicly releasing such a high-quality and high-coverage dataset, we hope to foster progress in the machine translation community and beyond.
