Author: Condor, Aubrey
Title: Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating Cord-id: va82gnps Document date: 2020_6_10
ID: va82gnps
Snippet: This project proposes using BERT (Bidirectional Encoder Representations from Transformers) as a tool to assist educators with automated short answer grading (ASAG) as opposed to replacing human judgement in high-stakes scenarios. Many educators are hesitant to give authority to an automated system, especially in assessment tasks such as grading constructed response items. However, evaluating free-response text can be time and labor costly for one rater, let alone multiple raters. In addition, so
Document: This project proposes using BERT (Bidirectional Encoder Representations from Transformers) as a tool to assist educators with automated short answer grading (ASAG) as opposed to replacing human judgement in high-stakes scenarios. Many educators are hesitant to give authority to an automated system, especially in assessment tasks such as grading constructed response items. However, evaluating free-response text can be time and labor costly for one rater, let alone multiple raters. In addition, some degree of inconsistency exists within and between raters for assessing a given task. Recent advances in Natural Language Processing have resulted in subsequent improvements for technologies that rely on artificial intelligence and human language. New, state-of-the-art models such as BERT, an open source, pre-trained language model, have decreased the amount of training data needed for specific tasks and in turn, have reduced the amount of human annotation necessary for producing a high-quality classification model. After training BERT on expert ratings of constructed responses, we use subsequent automated grading to calculate Cohen’s Kappa as a measure of inter-rater reliability between the automated system and the human rater. For practical application, when the inter-rater reliability metric is unsatisfactory, we suggest that the human rater(s) use the automated model to call attention to ratings where a second opinion might be needed to confirm the rater’s correctness and consistency of judgement.
Search related documents:
Co phrase search for related documents- Try single phrases listed below for: 1
Co phrase search for related documents, hyperlinks ordered by date