Inter-Rater Agreement for the Annotation of Neurologic Concepts in Electronic Health Records • November 16th, 2022
The extraction of patient signs and symptoms recorded as free text in electronic health records is critical for precision medicine. Once extracted, signs and symptoms can be made computable by mapping them to clinical concepts in an ontology. Extracting clinical concepts from free text is tedious and time-consuming, and prior studies have suggested that inter-rater agreement for clinical concept extraction is low. We examined inter-rater agreement for annotating neurologic concepts in clinical notes from electronic health records. After training on the annotation process, the annotation tool, and the supporting neuro-ontology, three raters annotated 15 clinical notes in three rounds. Inter-rater agreement between the three annotators was high for both text span and category label. A machine annotator based on a convolutional neural network agreed closely with the human annotators, though at a level below human inter-rater agreement. We conclude that high levels of inter-rater agreement are achievable for the annotation of neurologic concepts in clinical notes.
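The abstract does not state which agreement statistics were used, so the following is an illustrative sketch only: it computes pairwise span-level agreement as F1 over exact offset matches, and Cohen's kappa over category labels for spans both raters marked, two common choices for annotation studies. The function names, the (start, end, label) annotation format, and the sample annotations are all assumptions for the example, not the study's data.

```python
"""Illustrative only: pairwise agreement statistics for span annotations.
The paper's actual metrics and data are not given in this snippet."""

from collections import Counter

# Hypothetical annotation format: (start_offset, end_offset, category_label).
Annotation = tuple[int, int, str]


def span_f1(a: list[Annotation], b: list[Annotation]) -> float:
    """Span agreement scored as F1, with rater `a` treated as reference.
    Spans match only on exact (start, end) offsets."""
    spans_a = {(s, e) for s, e, _ in a}
    spans_b = {(s, e) for s, e, _ in b}
    if not spans_a and not spans_b:
        return 1.0
    matched = len(spans_a & spans_b)
    precision = matched / len(spans_b) if spans_b else 0.0
    recall = matched / len(spans_a) if spans_a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def label_kappa(a: list[Annotation], b: list[Annotation]) -> float:
    """Cohen's kappa on category labels, computed over the spans that
    both raters annotated (exact offset match)."""
    labels_a = {(s, e): lab for s, e, lab in a}
    labels_b = {(s, e): lab for s, e, lab in b}
    shared = sorted(set(labels_a) & set(labels_b))
    n = len(shared)
    if n == 0:
        return float("nan")
    observed = sum(labels_a[k] == labels_b[k] for k in shared) / n
    # Chance agreement from each rater's marginal label distribution.
    freq_a = Counter(labels_a[k] for k in shared)
    freq_b = Counter(labels_b[k] for k in shared)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Two hypothetical raters annotating the same clinical note.
    rater1 = [(10, 18, "weakness"), (42, 50, "ataxia"), (77, 85, "tremor")]
    rater2 = [(10, 18, "weakness"), (42, 50, "dysmetria"), (90, 97, "aphasia")]
    print(f"span F1: {span_f1(rater1, rater2):.2f}")        # 0.67
    print(f"label kappa: {label_kappa(rater1, rater2):.2f}")  # 0.33
```

Exact-offset matching is a deliberately strict criterion; annotation studies often also report a relaxed variant that counts overlapping spans as matches, which raises measured agreement.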