Inter-Annotator Agreement for a German Newspaper Corpus
This paper presents the results of an investigation of inter-annotator agreement for the NEGRA corpus, which consists of German newspaper texts. The corpus is syntactically annotated with part-of-speech and structural information. Agreement for part-of-speech tags is 98.6%; the labeled F-score for syntactic structures is 92.4%. The two annotations are used to create a common final version by discussing differences and through several iterations of cleaning. Initial and final versions are compared, and we identify categories that cause large numbers of differences as well as categories that are handled inconsistently.
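The NEGRA study reports two kinds of figures: plain percentage agreement for part-of-speech tags and a labeled F-score for syntactic structure. The sketch below shows one plausible way to compute such figures from two annotators' outputs; the token lists and (label, start, end) bracket tuples are illustrative assumptions, not the NEGRA annotation format itself.

```python
# Percentage agreement over POS tags and labeled F-score over constituent brackets.
# Token lists and (label, start, end) bracket tuples are illustrative assumptions.

def pos_agreement(tags_a, tags_b):
    """Fraction of tokens given the same POS tag by both annotators."""
    assert len(tags_a) == len(tags_b)
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)

def labeled_f_score(brackets_a, brackets_b):
    """Labeled F-score over constituents, each encoded as (label, start, end)."""
    set_a, set_b = set(brackets_a), set(brackets_b)
    common = len(set_a & set_b)
    precision = common / len(set_b) if set_b else 0.0
    recall = common / len(set_a) if set_a else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

tags_a = ["ART", "NN", "VVFIN", "ART", "NN"]
tags_b = ["ART", "NN", "VVFIN", "ART", "NE"]
print(pos_agreement(tags_a, tags_b))                      # 0.8

brackets_a = [("NP", 0, 2), ("NP", 3, 5), ("S", 0, 5)]
brackets_b = [("NP", 0, 2), ("PP", 3, 5), ("S", 0, 5)]
print(round(labeled_f_score(brackets_a, brackets_b), 3))  # 0.667
```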
Inter-annotator agreement for a speech corpus pronounced by French and German language learners
This paper presents the results of an investigation of inter-annotator agreement for the non-native and native French part of the IFCASL corpus. This large bilingual speech corpus for French and German language learners was manually annotated by several annotators. This manual annotation is the starting point that will be used both to improve the automatic segmentation algorithms and to derive diagnosis and feedback. Agreement is evaluated by comparing the manual alignments of seven annotators to the manual alignment of an expert for 18 sentences. Whereas results for the presence of the devoicing diacritic show a certain degree of disagreement between the annotators and the expert, there is very good consistency between annotators and the expert for temporal boundaries as well as for insertions and deletions. We find good overall agreement for boundaries between annotators and the expert, with a mean deviation of 7.6 ms and 93% of boundaries within 20 ms.
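The boundary figures quoted here (a mean deviation of 7.6 ms, 93% of boundaries within 20 ms) are summary statistics over absolute differences between an annotator's segment boundaries and the expert's. A minimal sketch, assuming boundary times in seconds that are already paired one-to-one (the paper's actual pairing procedure is not reproduced here):

```python
# Mean absolute boundary deviation and fraction of boundaries within a tolerance.
# Boundary times are assumed to be paired one-to-one between annotator and expert.

def boundary_agreement(annotator_boundaries, expert_boundaries, tolerance_s=0.020):
    deviations = [abs(a - e) for a, e in zip(annotator_boundaries, expert_boundaries)]
    mean_deviation = sum(deviations) / len(deviations)
    within_tolerance = sum(d <= tolerance_s for d in deviations) / len(deviations)
    return mean_deviation, within_tolerance

annotator = [0.112, 0.347, 0.590, 0.871]   # invented boundary times, in seconds
expert    = [0.105, 0.351, 0.602, 0.874]
mean_dev, frac = boundary_agreement(annotator, expert)
print(f"mean deviation: {mean_dev * 1000:.1f} ms, within 20 ms: {frac:.0%}")
```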
Inter-annotator Agreement on a Multilingual Semantic Annotation Task
Six sites participated in the Interlingual Annotation of Multilingual Text Corpora (IAMTC) project (Dorr et al., 2004; Farwell et al., 2004; Mitamura et al., 2004). Parsed versions of English translations of news articles in Arabic, French, Hindi, Japanese, Korean and Spanish were annotated by up to ten annotators. Their task was to match open-class lexical items (nouns, verbs, adjectives, adverbs) to one or more concepts taken from the Omega ontology (Philpot et al., 2003), and to identify theta roles for verb arguments. The annotated corpus is intended to be a resource for meaning-based approaches to machine translation. Here we discuss inter-annotator agreement for the corpus. The annotation task is characterized by the annotators' freedom to select multiple concepts or roles per lexical item. As a result, the annotation categories are sets, the number of which is bounded only by the number of distinct annotator-lexical item pairs. We use a reliability metric designed to handle partial agreement.
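Because each annotator may assign a set of concepts to a lexical item, agreement has to credit partial overlap between sets rather than only exact matches. The snippet below is an illustrative stand-in, not the paper's reliability metric: it scores overlap between two concept sets with a Jaccard ratio, one common ingredient of set-valued agreement measures.

```python
# Partial agreement between set-valued annotations via Jaccard overlap.
# Concept names are invented; this is not the IAMTC metric itself.

def set_overlap(concepts_a, concepts_b):
    """Jaccard overlap between two annotators' concept sets for one lexical item."""
    a, b = set(concepts_a), set(concepts_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Exact match scores 1.0, disjoint sets score 0.0, partial overlap falls in between.
print(set_overlap({"motion", "travel"}, {"motion", "travel"}))     # 1.0
print(set_overlap({"motion", "travel"}, {"motion", "transport"}))  # 0.333...
print(set_overlap({"motion"}, {"communication"}))                  # 0.0
```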
Inter-Annotator Agreement
The equation for $\hat{\sigma}_{\hat{\kappa}}$ also extends to $C$ categories. Drawback: $\hat{\kappa}$ uses only the diagonal and the marginals of the contingency table, discarding most of the information in the off-diagonal cells.
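The point about the diagonal and marginals can be seen directly in how Cohen's $\hat{\kappa}$ is computed from a $C \times C$ contingency table. A minimal sketch with invented counts:

```python
# Cohen's kappa from a C x C table of two annotators' labels: table[i][j] counts
# items labelled i by annotator A and j by annotator B. Only the diagonal and the
# row/column marginals enter the formula; individual off-diagonal cells do not.

def cohens_kappa(table):
    n = sum(sum(row) for row in table)
    c = len(table)
    row_marginals = [sum(row) for row in table]
    col_marginals = [sum(table[i][j] for i in range(c)) for j in range(c)]
    p_observed = sum(table[i][i] for i in range(c)) / n
    p_expected = sum(row_marginals[i] * col_marginals[i] for i in range(c)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

table = [
    [40,  5,  5],   # annotator A chose category 0
    [ 4, 20,  6],   # annotator A chose category 1
    [ 6,  4, 10],   # annotator A chose category 2
]
print(round(cohens_kappa(table), 3))  # 0.517
```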
Inter-annotator Agreement: By Hook or by Crook
Through an extended case study, this paper reveals the metaphorical skeletons hidden in the statistical cupboards of selective reporting, casting a new light on inter-annotator agreement (IAA) measures. Strategic decisions and their impacts on IAA were tracked in an extended corpus study of rhetorical functions in scientific research abstracts. A search of the research notes of the principal investigator yielded 142 notes tagged with #IAA, written between 2013 and 2017. The strategic decisions and their actual or perceived impacts on IAA were logged, and a root cause analysis was conducted to identify the causal factors that reduce IAA. The results show numerous strategic decisions which, using template analysis, were grouped into three categories: methodological, statistical and rhetorical. High IAA may be attributed to sound or cogent methodological choices, but it could also be due to manipulating the statistical smoke and rhetorical mirrors. With no standardized co…
Inter-Annotator Agreement on Spontaneous Czech Language
The goal of this article is to show that for some tasks in automatic speech recognition (ASR), especially the recognition of spontaneous telephony speech, the reference annotation differs substantially among human annotators and thus sets an upper bound on ASR accuracy. In this paper we focus on the evaluation of inter-annotator agreement (IAA) and of ASR accuracy in the context of imperfect IAA. We evaluated it using a part of our Czech Switchboard-like spontaneous speech corpus called Toll-free calls. This data set was annotated by three different annotators, yielding three parallel transcriptions. The results give us additional insights for understanding ASR accuracy.
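A common way to quantify how much parallel reference transcriptions differ is to score one annotator's transcription against another's with word error rate (WER), exactly as an ASR hypothesis would be scored against a reference. A minimal sketch; the example utterances are invented, not drawn from the Toll-free calls corpus:

```python
# Word error rate between two parallel transcriptions of the same utterance,
# computed with a word-level Levenshtein distance.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

annotator_1 = "no já nevím co bych k tomu řekl"
annotator_2 = "no já nevim co bych k tomu ještě řekl"
print(f"inter-annotator WER: {wer(annotator_1, annotator_2):.1%}")  # 25.0%
```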
LING83800: Inter-annotator agreement
When our models consist of supervised machine learning trained on human annotations, agreement between multiple human annotators imposes an upper bound on performance, if only because it is difficult, if not impossible, to measure system performance exceeding that of human annotators. And when human annotators do not show substantial agreement, this suggests that the annotation task may be ill-defined or too challenging.
Inter-Annotator Agreement for ERE Annotation
This paper describes a system for inter-annotator agreement analysis of ERE annotation, focusing on entity mentions and on how higher-order annotations such as EVENTS depend on those entity mentions. The goal of this approach is to provide both (1) quantitative scores for the various levels of annotation, and (2) information about the types of annotation inconsistencies that might exist. While primarily designed for inter-annotator agreement, it can also be considered a system for evaluation of ERE annotation.
Inter-annotator Agreement on Spontaneous Czech Language
The goal of this article is to show that for some tasks in automatic speech recognition (ASR), especially the recognition of spontaneous speech, the gold-standard annotation differs substantially among human annotators. In this paper we focus on the evaluation of inter-annotator agreement (IAA) and ASR accuracy in the context of imperfect IAA. We evaluated it on a part of our Czech Switchboard-like spontaneous speech corpus, which was annotated by three different annotators, yielding three parallel transcriptions. The results give us additional insights for understanding ASR accuracy.
Inter-Annotator Agreement in Linguistics: a critical review
Agreement indexes are widely used in Computational Linguistics and NLP to assess the reliability of annotation tasks.
Inter-annotator agreement on various levels of annotation in PDT
Inter-Annotator Agreement
This paper discusses different methods of estimating inter-annotator agreement in the manual annotation of Polish coreference and proposes a new BLANC-based annotation agreement metric. The commonly used agreement indicators are calculated for mention detection, semantic head annotation, near-identity markup and coreference resolution.
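BLANC scores coreference by averaging F-scores over two link types: pairs of mentions placed in the same entity (coreference links) and pairs placed in different entities (non-coreference links). The sketch below illustrates that core idea for two annotators over a shared mention set; it is a simplified illustration, not the adapted BLANC-based IAA metric proposed in the paper.

```python
# A BLANC-style agreement score between two coreference annotations of the same
# mention set: the mean of the F-scores over coreference and non-coreference links.

from itertools import combinations

def links(partition):
    """Split all mention pairs into coreference and non-coreference links."""
    entity_of = {m: i for i, entity in enumerate(partition) for m in entity}
    mentions = sorted(entity_of)
    coref, non_coref = set(), set()
    for a, b in combinations(mentions, 2):
        (coref if entity_of[a] == entity_of[b] else non_coref).add((a, b))
    return coref, non_coref

def f_score(links_a, links_b):
    if not links_a and not links_b:
        return 1.0
    common = len(links_a & links_b)
    p = common / len(links_b) if links_b else 0.0
    r = common / len(links_a) if links_a else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def blanc(annotation_a, annotation_b):
    coref_a, non_a = links(annotation_a)
    coref_b, non_b = links(annotation_b)
    return (f_score(coref_a, coref_b) + f_score(non_a, non_b)) / 2

# Two annotators grouping mentions m1..m4 into entities.
a = [{"m1", "m2"}, {"m3"}, {"m4"}]
b = [{"m1", "m2", "m3"}, {"m4"}]
print(round(blanc(a, b), 3))  # 0.625
```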
Inter-Annotator Agreement on a Linguistic Ontology for Spatial Language
In this paper, we present a case study for measuring inter-annotator agreement on a linguistic ontology for spatial language, namely the spatial extension of the Generalized Upper Model. This linguistic ontology specifies semantic categories, and it is used in dialogue systems for natural language about space in the context of human-computer interaction and spatial assistance systems. Its core representation for spatial language distinguishes how sentences can be structured and categorized into units that contribute certain meanings to the expression. This representation is evaluated here in terms of inter-annotator agreement: four uninformed annotators were instructed by a manual in how to annotate sentences with the linguistic ontology, and were assigned 200 sentences of varying length and complexity. Their resulting agreements are calculated together with our own 'expert annotation' of the same sentences. We show that linguistic ontologies can be evaluated with respect to …
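With four annotators (plus an expert) assigning categorical labels to the same sentences, one standard agreement coefficient for more than two annotators is Fleiss' kappa. A minimal sketch, using invented category counts rather than the paper's data:

```python
# Fleiss' kappa for N items, each labelled by the same number of annotators.
# rows[i][j] = how many annotators assigned category j to item i (invented data).

def fleiss_kappa(rows):
    n_items = len(rows)
    n_raters = sum(rows[0])                       # annotators per item
    n_total = n_items * n_raters
    # mean per-item observed agreement
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in rows
    ) / n_items
    # chance agreement from the overall category distribution
    n_categories = len(rows[0])
    p_j = [sum(row[j] for row in rows) / n_total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# 6 sentences, 4 annotators, 3 ontology categories.
counts = [
    [4, 0, 0],
    [3, 1, 0],
    [0, 4, 0],
    [2, 2, 0],
    [0, 1, 3],
    [0, 0, 4],
]
print(round(fleiss_kappa(counts), 3))  # 0.581
```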
Inter annotator agreement in discourse analysis
- organization of text into structural units by means of coherence (discourse relations) and cohesion (lexico-semantic relations)
Agreement is overrated:
Inter-Annotator Agreement (IAA) is used as a means of assessing the quality of NLG evaluation data, in particular its reliability. According to existing scales of IAA interpretation – see, for example, Lommel et al. (2014), Liu et al. (2016), Sedoc et al. (2018) and Amidei et al. (2018a) – most data collected for NLG evaluation fail the reliability test. We confirmed this trend by analysing papers published over the last 10 years in NLG-specific conferences (in total 135 papers that included some sort of human evaluation study). Following Sampson and Babarczy (2008), Lommel et al. (2014), Joshi et al. (2016) and Amidei et al. (2018b), such phenomena can be explained in terms of irreducible human language variability. Using three case studies, we show the limits of considering IAA as the only criterion for checking evaluation reliability. Given human language variability, we propose that for human evaluation of NLG, correlation coefficients and agreement coefficients …
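The abstract contrasts agreement coefficients with correlation coefficients for assessing evaluation reliability. The sketch below computes one of each for two annotators' Likert-style ratings of generated texts; the ratings and the choice of linearly weighted Cohen's kappa as the agreement coefficient are illustrative assumptions, not the paper's proposal.

```python
# Two views of reliability for two raters' 1-5 ratings of generated texts:
# a correlation coefficient (Pearson's r) and a chance-corrected agreement
# coefficient (linearly weighted Cohen's kappa). Ratings are invented.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def weighted_kappa(x, y, categories=(1, 2, 3, 4, 5)):
    """Linearly weighted Cohen's kappa for ordinal ratings."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(x)
    weight = lambda i, j: abs(i - j) / (k - 1)          # disagreement weight
    d_observed = sum(weight(idx[a], idx[b]) for a, b in zip(x, y)) / n
    px = [sum(a == c for a in x) / n for c in categories]
    py = [sum(b == c for b in y) / n for c in categories]
    d_expected = sum(weight(i, j) * px[i] * py[j] for i in range(k) for j in range(k))
    return 1 - d_observed / d_expected

rater_1 = [5, 4, 4, 2, 3, 5, 1, 2]
rater_2 = [4, 4, 5, 2, 2, 5, 2, 3]
print(f"Pearson r:      {pearson_r(rater_1, rater_2):.2f}")
print(f"weighted kappa: {weighted_kappa(rater_1, rater_2):.2f}")
```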