Author: Abbad, Hamza; Xiong, Shengwu
Title: Multi-components System for Automatic Arabic Diacritization
  • Cord-id: w031hm7f
  • Document date: 2020_3_17
    Document: In this paper, we propose an approach to the automatic restoration of Arabic diacritics that stacks three components in a pipeline: a deep learning model, namely a multi-layer recurrent neural network with LSTM and Dense layers; a character-level rule-based corrector, which applies deterministic operations to prevent some errors; and a word-level statistical corrector, which uses context and distance information to fix some diacritization issues. The approach is novel in that it combines methods of different types and adds edit-distance-based corrections. We trained and tested our system on a large public dataset of raw diacritized Arabic text (Tashkeela), after cleaning and normalizing it. On a newly released benchmark test set, our system outperformed all the tested systems, achieving a diacritic error rate (DER) of 3.39% and a word error rate (WER) of 9.94% when taking all Arabic letters into account, and a DER of 2.61% and a WER of 5.83% when ignoring the diacritization of the last letter of every word.
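
The two metrics reported above can be illustrated with a minimal sketch. It assumes the common definitions: DER (diacritic error rate) is the share of characters whose predicted diacritics differ from the reference, and WER (word error rate) is the share of words containing at least one such character. The function names `split_diacritics` and `der_wer` are hypothetical, and the sketch counts every non-diacritic character rather than Arabic letters only, so it is not the authors' evaluation script and will not reproduce their exact figures.

```python
# Arabic diacritic marks (tashkil): fathatan .. sukun, U+064B to U+0652.
DIACRITICS = set("\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652")


def split_diacritics(word):
    """Return a list of (character, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def der_wer(reference, prediction, ignore_last=False):
    """Compute (DER, WER) for two diacritized versions of the same text.

    Assumption: both strings contain the same words and base characters
    and differ only in their diacritics.
    """
    ref_words = reference.split()
    pred_words = prediction.split()
    if len(ref_words) != len(pred_words):
        raise ValueError("texts must contain the same number of words")

    char_total = char_errors = word_errors = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_pairs = split_diacritics(ref_word)
        pred_pairs = split_diacritics(pred_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in pred_pairs]:
            raise ValueError("base characters differ: " + ref_word)
        if ignore_last:  # skip the last character of every word
            ref_pairs, pred_pairs = ref_pairs[:-1], pred_pairs[:-1]
        # Compare marks as sorted lists so that mark order does not matter.
        errors = sum(
            sorted(r_marks) != sorted(p_marks)
            for (_, r_marks), (_, p_marks) in zip(ref_pairs, pred_pairs)
        )
        char_total += len(ref_pairs)
        char_errors += errors
        word_errors += 1 if errors else 0

    der = char_errors / char_total if char_total else 0.0
    wer = word_errors / len(ref_words) if ref_words else 0.0
    return der, wer


# Two words; the prediction puts a damma instead of a fatha on the
# last character of the first word.
reference = "\u0643\u064e\u062a\u064e\u0628\u064e \u0639\u0650\u0644\u0652\u0645"
prediction = "\u0643\u064e\u062a\u064e\u0628\u064f \u0639\u0650\u0644\u0652\u0645"
print(der_wer(reference, prediction))
print(der_wer(reference, prediction, ignore_last=True))
```

In this toy example one of six characters and one of two words are wrong, and both errors vanish once the last character of each word is ignored. The same effect explains why the second pair of figures in the abstract is lower than the first.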
