Author: Gilbert, S.; Mehl, A.; Baluch, A.; Cawley, C.; Challiner, J.; Fraser, H.; Millen, E.; Multmeier, J.; Pick, F.; Richter, C.; Tuerk, E.; Upadhyay, S.; Virani, V.; Vona, N.; Wicks, P.; Novorol, C.
Title: How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs
Document date: 2020-05-11
ID: i0iszg6w
Document:
Objectives: To compare breadth of condition coverage, accuracy of suggested conditions, and appropriateness of urgency advice of 8 popular symptom assessment apps with each other and with 7 General Practitioners (GPs).
Design: Clinical vignettes study.
Setting: 200 clinical vignettes representing real-world scenarios in primary care.
Intervention/comparator: Condition coverage, suggested-condition accuracy, and urgency advice performance were measured against the vignettes' gold-standard diagnoses and triage levels.
Primary outcome measures: Outcomes included (i) the proportion of conditions 'covered' by an app, i.e. not excluded because the patient was too young/old, pregnant, or comorbid; (ii) the proportion of vignettes in which the correct primary diagnosis was amongst the top 3 conditions suggested; and (iii) the proportion of 'safe' urgency advice (i.e. at the gold-standard level, more conservative, or no more than one level less conservative).
Results: Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users: in alphabetical order, Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; Your.MD: 64.5%. The top-3 suggestion accuracy (M3) of GPs was on average 82.1±5.2%. For the apps it was: Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps exclude certain user groups (e.g. younger users) or certain conditions; for these apps, condition-suggestion performance is generally higher when those vignettes are excluded. For safe urgency advice, the tested GPs averaged 97.0±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 S.D. of the GPs' mean: Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%. One app had safety performance within 2 S.D.s of the GPs: Your.MD: 92.6%. Three apps had safety performance outside 2 S.D.s of the GPs: Buoy: 80.0% (p<0.001); K Health: 81.3% (p<0.001); Mediktor: 87.3% (p=1.3×10⁻³).
Conclusions: The utility of digital symptom assessment apps relies upon coverage, accuracy, and safety. While no digital tool outperformed GPs, some came close, and the iterative nature of software improvement offers scalable improvements to care.
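For concreteness, below is a minimal Python sketch of how the three primary outcome measures described above could be computed per app. The record fields (covered, top3, advice_level, etc.) are hypothetical illustrations, not the study's actual data format, and the triage-level encoding (lower number = more urgent) is an assumption:

    # Sketch of the three outcome measures; field names and level encoding
    # are illustrative assumptions, not taken from the study's materials.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class VignetteResult:
        covered: bool                 # app offered a suggestion (user not excluded)
        top3: list[str]               # up to 3 suggested conditions, most likely first
        gold_diagnosis: str           # vignette's gold-standard primary diagnosis
        advice_level: Optional[int]   # urgency level advised (None if no advice given)
        gold_level: int               # gold-standard triage level (lower = more urgent)

    def coverage(results: list[VignetteResult]) -> float:
        """Proportion of vignettes for which the app offered any condition suggestion."""
        return sum(r.covered for r in results) / len(results)

    def top3_accuracy(results: list[VignetteResult]) -> float:
        """M3: proportion with the gold diagnosis among the top-3 suggestions."""
        return sum(r.gold_diagnosis in r.top3 for r in results) / len(results)

    def safe_advice(results: list[VignetteResult]) -> float:
        """Proportion of 'safe' urgency advice, computed only over vignettes
        where advice was provided: at the gold level, more conservative, or
        no more than one level less conservative."""
        advised = [r for r in results if r.advice_level is not None]
        # Assuming lower numbers mean more urgent: advice is safe if it is at
        # least as urgent as gold, or at most one level less urgent.
        safe = sum(r.advice_level <= r.gold_level + 1 for r in advised)
        return safe / len(advised)

Note that safe_advice is computed only over vignettes with advice provided, which matches the abstract's "for the vignettes with advice provided" framing and explains why an app can have low coverage but high safety performance.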