
Author: Su, Junhao; Wu, Ye; Ting, Hing-Fung; Lam, Tak-Wah; Luo, Ruibang
Title: RENET2: High-Performance Full-text Gene-Disease Relation Extraction with Iterative Training Data Expansion
  • Cord-id: x8ufze5p
  • Document date: 2021_3_19
    Document: Background: Relation extraction is a fundamental task for extracting gene-disease associations from biomedical text. Existing tools have limited capacity, as they can extract gene-disease associations only from single sentences or abstract texts. Results: In this work, we propose RENET2, a deep learning-based relation extraction method that implements section filtering and ambiguous relations modeling to extract gene-disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset, addressing the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene-disease associations from an annotated full-text dataset, which was 27.22%, 30.30% and 29.24% higher than the best existing tools BeFree, DTMiner and BioBERT, respectively. We applied RENET2 to (1) ~1.89M full-text articles from PMC, finding ~3.72M gene-disease associations; and (2) the LitCovid article set, ranking the top 15 proteins associated with COVID-19, supported by recent articles. Conclusion: RENET2 is an efficient and accurate method for full-text gene-disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available at https://github.com/sujunhao/RENET2.
