Annotation Scheme Clause Samples
Annotation Scheme. Fig. 4 shows the scheme used for the annotation of the corpus.
Annotation Scheme. Four annotation tasks are conducted in sequence on Amazon Mechanical Turk for answer sentence selection (Tasks 1-4), and a single task is conducted for answer triggering using only Elasticsearch (Task 5; see Figure 3.1 for the overview). Approximately two thousand sections are randomly selected from the 486 articles in Section 3.1.1. All the selected sections consist of 3 to 25 sentences; it has been empirically found that annotators experienced difficulties annotating longer sections accurately and in a timely manner. For each section, annotators are instructed to generate a question that can be answered by one or more sentences in the provided section, and to select the corresponding sentence or sentences that answer the question. The annotators are provided with the instructions, the topic, the article title, the section title, and the list of numbered sentences in the section (Table 3.2). [Figure 3.1: The overview of the data collection (Section 3.1.1) and annotation scheme (Section 3.1.2).] In Task 2, annotators are asked to create another set of ≈2K questions from the same selected sections, excluding the sentences selected as answers in Task 1. The goal of Task 2 is to generate questions that can be answered by sentences different from those used to answer the questions generated in Task 1. The annotators are provided with the same information as in Task 1, except that the sentences used as the answer contexts in Task 1 are crossed out (line 1 in Table 3.2). Annotators are instructed not to use these sentences to generate new questions. Although the annotation instructions encourage the annotators to create questions in their own words, annotators tend to generate questions with some lexical overlap with the corresponding contexts. The intention of Task 3 is to mitigate the effects of this tendency to generate questions with similar vocabulary and phrasing to their answer contexts; this is a necessary step in creating a corpus that evaluates reading comprehension rather than the ability to model word co-occurrences. The annotators are provided with the previously generated questions and answer contexts and are instructed to paraphrase these questions using different terms.
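Since this sample describes a pipeline rather than an implementation, the following is a minimal Python sketch of two steps it outlines: filtering sections to the 3-25 sentence range and presenting a section for Task 2 with the Task 1 answer sentences crossed out. All names (eligible_sections, task2_view, the section dictionary layout) are hypothetical assumptions for illustration, not the authors' code.

def eligible_sections(sections, min_len=3, max_len=25):
    """Keep only sections of 3-25 sentences, the range annotators
    could handle accurately and in a timely manner (assumed layout:
    each section is a dict with a "sentences" list)."""
    return [s for s in sections if min_len <= len(s["sentences"]) <= max_len]

def task2_view(section, task1_answer_ids):
    """Present the numbered sentences for Task 2 with Task 1 answer
    sentences crossed out, so new questions must target other sentences."""
    return [
        {"id": i, "text": sent, "crossed_out": i in task1_answer_ids}
        for i, sent in enumerate(section["sentences"])
    ]

# Example: a 4-sentence section whose sentence 2 answered a Task 1 question.
section = {"sentences": ["S0.", "S1.", "S2.", "S3."]}
for row in task2_view(section, task1_answer_ids={2}):
    marker = "~~" if row["crossed_out"] else ""
    print(f'{row["id"]}: {marker}{row["text"]}{marker}')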
Annotation Scheme. Upon creating a dataset consisting of software requirements, the next step is to annotate these requirements in order to train the parser. The main issue here lies in deciding how complex these annotations should be. Specifically, an annotation scheme that is very close to the ontology classes described in Section 2 would be ideal for training the parser (since this is the final desired result). However, such a scheme would be very difficult for annotators without sufficient background knowledge. As a result, we propose a multi-step annotation scheme in which decisions made in one iteration are further refined in later iterations. By adopting the class hierarchy introduced in Section 2, we can naturally divide each annotation iteration according to a level in the ontology. This means that in the first iteration, we ask annotators to simply mark all instances of actor, object, OperationType, and property that are explicitly expressed in a given requirement. After that, further refinements can be made (by more experienced annotators) in order to select more specific subclasses for each instance. Thus, we add one layer of sophistication from the class hierarchy in each iteration, resulting in step-wise refinements. In the final iteration, we can also add implicit but inferable relations between instances of concepts (e.g. the phrase "a user can delete his/her account" involves not only an action performed on "account" but also ownership of the "account" by the "user"). Consider the example of Figure 11. [Figure 11: a requirement annotated with has_actor and receives_action relations; at level 3, the instances are labeled useractor, action, and theme.] In this sentence, the first iteration would include annotating the "user" and the "account" as instances of ThingType and the "login" as an OperationType. The second iteration would include annotating the "user" as an actor, the "login" as an action, and the "account" as an object. After that, the next iteration would involve specifying the "user" as a useractor and the "account" as a theme. Finally, in this example we could also add one more iteration where we would specify the "account" as an object owned_by the "user". This relation is not explicitly given in the sentence; however, it is correct.
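As a rough illustration of the step-wise refinement this sample describes, here is a minimal Python sketch. The HIERARCHY table mirrors the classes named in the example (ThingType/OperationType at level 1, actor/action/object at level 2, useractor/theme at level 3), but the representation, the refine helper, and the walk-through are assumptions for illustration rather than the paper's actual annotation tooling.

# Each level-1 class maps to its level-2 subclasses, which in turn map to
# their level-3 subclasses (an assumed encoding of the ontology hierarchy).
HIERARCHY = {
    "ThingType":     {"actor": ["useractor"], "object": ["theme"]},
    "OperationType": {"action": []},
}

def refine(annotation, subclass):
    """Replace an instance's class with a more specific subclass, checking
    that the subclass sits exactly one level below in the hierarchy."""
    current = annotation["class"]
    children = []
    for lvl1, lvl2 in HIERARCHY.items():
        if current == lvl1:
            children = list(lvl2)          # level 1 -> level 2 candidates
        elif current in lvl2:
            children = lvl2[current]       # level 2 -> level 3 candidates
    if subclass not in children:
        raise ValueError(f"{subclass} does not refine {current}")
    annotation["class"] = subclass
    return annotation

# Iteration 1: coarse classes for "a user can log in to his/her account".
user = {"span": "user", "class": "ThingType"}
account = {"span": "account", "class": "ThingType"}
login = {"span": "login", "class": "OperationType"}

# Iteration 2: one level down the hierarchy.
refine(user, "actor"); refine(account, "object"); refine(login, "action")

# Iteration 3: most specific subclasses ("action" has no level-3 child).
refine(user, "useractor"); refine(account, "theme")

# Final iteration: add the implicit but inferable owned_by relation.
relations = [("account", "owned_by", "user")]
print(user, account, login, relations)

The point of the one-level-at-a-time check in refine is that each iteration only asks annotators (or more experienced reviewers) to choose among the immediate subclasses of what was already marked, matching the sample's claim that each iteration adds one layer of sophistication.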
