Author: Oktem, Alp; DeLuca, Eric; Bashizi, Rodrigue; Paquin, Eric; Tang, Grace
Title: Congolese Swahili Machine Translation for Humanitarian Response
  • Cord-id: m5gtyvfl
  • Document date: 2021-03-19
    Document: In this paper we describe our efforts to make a bidirectional Congolese Swahili (SWC) to French (FRA) neural machine translation system with the motivation of improving humanitarian translation workflows. For training, we created a 25,302-sentence general domain parallel corpus and combined it with publicly available data. Experimenting with low-resource methodologies like cross-dialect transfer and semi-supervised learning, we recorded improvements of up to 2.4 and 3.5 BLEU points in the SWC-FRA and FRA-SWC directions, respectively. We performed human evaluations to assess the usability of our models in a COVID-domain chatbot that operates in the Democratic Republic of Congo (DRC). Direct assessment in the SWC-FRA direction demonstrated an average quality ranking of 6.3 out of 10 with 75% of the target strings conveying the main message of the source text. For the FRA-SWC direction, our preliminary tests on post-editing assessment showed its potential usefulness for machine-assisted translation. We make our models, datasets containing up to 1 million sentences, our development pipeline, and a translator web-app available for public use.
