Author: Marcus Ludwig; Louis-Félix Nothias; Kai Dührkop; Irina Koester; Markus Fleischauer; Martin A. Hoffmann; Daniel Petras; Fernando Vargas; Mustafa Morsy; Lihini Aluwihare; Pieter C. Dorrestein; Sebastian Böcker
Title: ZODIAC: database-independent molecular formula annotation using Gibbs sampling reveals unknown small molecules
Document date: 2019_11_16
ID: 03uonbrv_11
Snippet: But the ZODIAC score can be used to differentiate between true and incorrect annotations: For each dataset, we sort molecular formula annotations by the ZODIAC score, and calculate the rate of correct annotations for any subset of top-sc..... [bioRxiv preprint, doi: https://doi.org/10.1101/842740. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.]
Document: But the ZODIAC score can be used to differentiate between true and incorrect annotations: For each dataset, we sort molecular formula annotations by the ZODIAC score, and calculate the rate of correct annotations for any subset of top-scoring annotations. We find that high-scoring ZODIAC annotations are more likely to be correct, see again Fig. 2. For this evaluation, we also considered previously discarded compounds for which SIRIUS did not rank the correct molecular formula in the top 50; for these compounds, ZODIAC cannot find the correct molecular formula, so the best possible outcome is that the incorrect molecular formulas receive low ZODIAC scores. Selecting a ZODIAC score threshold of 0.9 results in more than 96.5% correct annotations while keeping 52.05% to 88.24% of the compounds of each dataset (Fig. 2). In comparison, spectral library search allowed us to annotate between 3.78% and 16.55% of a dataset, see Supplementary Table 1.

Novel molecular formulas. We now concentrate on novel molecular formulas, in the sense that these molecular formulas are not contained in the largest public molecular structure databases, PubChem [17] and ChemSpider [35]. As detailed in Section Materials & Methods, we cannot rule out that the molecular formula corresponds to, say, the compound minus a water loss instead of the full compound. Clearly, the structure of any compound with a novel molecular formula is also absent from the structure databases.
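
The evaluation described above (sort annotations by score, then compute the rate of correct annotations for every top-scoring subset and for a fixed score threshold) can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `correct_rate_by_rank` and `summarize_threshold`, the input layout (one score and one correctness flag per compound), and the choice of `>=` for the threshold comparison are all assumptions, and the numbers in the demo are made-up toy values rather than data from the paper.

```python
import math
from typing import List, Sequence, Tuple


def correct_rate_by_rank(scores: Sequence[float],
                         is_correct: Sequence[bool]) -> List[float]:
    """Rate of correct annotations among the k top-scoring ones, for k = 1..n.

    Hypothetical helper: `scores` holds one score per compound and
    `is_correct` says whether that compound's annotation is correct.
    """
    # Indices of the annotations, highest score first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rates = []
    hits = 0
    for k, i in enumerate(order, start=1):
        hits += bool(is_correct[i])
        rates.append(hits / k)
    return rates


def summarize_threshold(scores: Sequence[float],
                        is_correct: Sequence[bool],
                        threshold: float = 0.9) -> Tuple[float, float]:
    """Return (fraction of compounds kept, rate of correct annotations kept).

    Assumption: an annotation is kept when its score is >= threshold.
    """
    kept = [bool(c) for s, c in zip(scores, is_correct) if s >= threshold]
    frac_kept = len(kept) / len(scores) if len(scores) else 0.0
    rate = sum(kept) / len(kept) if kept else math.nan
    return frac_kept, rate


if __name__ == "__main__":
    # Toy values for illustration only (not results from the paper).
    toy_scores = [0.95, 0.40, 0.99, 0.91, 0.70]
    toy_correct = [True, False, True, False, True]
    print(correct_rate_by_rank(toy_scores, toy_correct))
    print(summarize_threshold(toy_scores, toy_correct, threshold=0.9))
```

Compounds whose correct molecular formula is not among the candidates simply enter with `is_correct` set to `False`, which is how the previously discarded compounds would count against the rate in this sketch.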