Data set Collection Sample Clauses

Data set Collection. From the total available corpus (70k documents), we currently have access to ~60,000 excavation reports and related documents, such as appendices, drawings and maps. These texts have been gathered by DANS (Digital Archiving and Networked Services) in the Netherlands over the past 20 years. We received the documents from DANS as PDF files and used the pdftotext tool (Glyph & Cog LLC, 1996) to convert them to plain text. This data set contains 30,152,318 lines and 657,808,600 words (as counted by the command line tool “wc”).

The texts are quite diverse. The dates of publication span decades, with the earlier reports having been scanned and OCR'd from hard copies created in the 1980s. A second temporal variation is the age of the artefacts found, which ranges from 200,000 BC to the present. The type of research also differs considerably between reports: some describe a short desk evaluation of a small area without any fieldwork, while others detail large excavations spanning multiple years, with detailed analysis by a team of specialists.

To obtain a sample that is representative across all these ranges, a random sampling strategy would not be ideal, so we instead selected documents manually, taking into account the variation described above. We selected a total of 15 documents as annotation candidates (~42,000 tokens). For the purposes of calculating the inter-annotator agreement (IAA) and evaluating the annotation guidelines, we manually selected roughly 100 sentences from these documents, containing all the entity types (Table 3.1, explained below) and specific difficult cases, as a validation set annotated by all annotators.

Table 3.1: Descriptions and examples for each entity type. Examples are translated from Dutch.

  • Artefact: An archaeological object found in the ground. Examples: axe, pot, stake, arrow head, coin.
  • Time Period: A defined (archaeological) period in time. Examples: Middle Ages, Neolithic, 500 BC, 4000 BP.
  • Location: A placename or (part of) an address. Examples: Amsterdam, ▇▇▇▇▇- ▇▇▇▇▇▇ ▇, ▇▇▇▇▇▇▇▇▇▇
  • Context: An anthropogenic, definable part of a stratigraphy. Something that can contain Artefacts. Examples: rubbish pit, burial mound, stake hole.
  • Material: The material an Artefact is made of. Examples: bronze, wood, flint, glass.
  • Species: A species’ name (in Latin or Dutch). Examples: cow, Corvus corax, oak.
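The line and word totals quoted above come from “wc”. As a rough illustration of what those two numbers measure, the sketch below reproduces the same kind of count in Python. It is not the authors' procedure: the function name and the `corpus_txt` directory are hypothetical, and it assumes that a line is a newline character and a word is a run of bytes separated by ASCII whitespace, which can differ slightly from “wc” in some locales.

```python
from pathlib import Path
from typing import Iterable, Tuple


def count_lines_and_words(paths: Iterable[Path]) -> Tuple[int, int]:
    """Return (newline count, whitespace-delimited word count) over all files."""
    total_lines = 0
    total_words = 0
    for path in paths:
        # Read bytes so that OCR or encoding damage cannot raise a decode error.
        with open(path, "rb") as handle:
            for raw_line in handle:
                # Count newline characters, so a final line without a
                # trailing newline adds words but not a line.
                if raw_line.endswith(b"\n"):
                    total_lines += 1
                # bytes.split() with no argument splits on ASCII whitespace.
                total_words += len(raw_line.split())
    return total_lines, total_words


if __name__ == "__main__":
    # Hypothetical directory holding the plain-text output of the PDF conversion.
    corpus_dir = Path("corpus_txt")
    lines, words = count_lines_and_words(sorted(corpus_dir.glob("*.txt")))
    print(f"{lines} lines, {words} words")
```

Streaming each file line by line keeps memory use flat, which matters for a corpus of several hundred million words.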

Related to Data set Collection

  • Data Collection The grant recipient will be required to provide performance data reports on a schedule delineated within Section A of this contract, Specific Terms and Conditions.

  • Income Collection, Transaction Processing, Account Administration of a basis point per annum on the average net assets of the Fund.

  • Master Servicer Collection Account
    (a) The Master Servicer shall establish and maintain in the name of the Trustee, for the benefit of the Certificateholders, the Master Servicer Collection Account as a segregated trust account or accounts. The Master Servicer Collection Account may be a sub-account of the Distribution Account. The Master Servicer will deposit in the Master Servicer Collection Account as identified by the Master Servicer and as received by the Master Servicer, the following amounts:
      (i) Any amounts withdrawn from a Protected Account or other permitted account;
      (ii) Any Monthly Advance and any Compensating Interest Payments;
      (iii) Any Insurance Proceeds, Liquidation Proceeds or Subsequent Recoveries received by or on behalf of the Master Servicer or which were not deposited in a Protected Account or other permitted account;
      (iv) The repurchase price with respect to any Mortgage Loans repurchased and all proceeds of any Mortgage Loans or property acquired in connection with the optional termination of the trust;
      (v) Any amounts required to be deposited with respect to losses on investments of deposits in an Account; and
      (vi) Any other amounts received by or on behalf of the Master Servicer and required to be deposited in the Master Servicer Collection Account pursuant to this Agreement.
    (b) All amounts deposited to the Master Servicer Collection Account shall be held by the Master Servicer in the name of the Trustee in trust for the benefit of the Certificateholders in accordance with the terms and provisions of this Agreement. The requirements for crediting the Master Servicer Collection Account or the Distribution Account shall be exclusive, it being understood and agreed that, without limiting the generality of the foregoing, payments in the nature of (i) prepayment or late payment charges or assumption, tax service, statement account or payoff, substitution, satisfaction, release and other like fees and charges and (ii) the items enumerated in Subsections 4.05(a)(i), (ii), (iii), (iv), (vi), (vii), (viii), (ix), (xi) and (xii) with respect to the Securities Administrator, need not be credited by the Master Servicer or the related Servicer to the Distribution Account or the Master Servicer Collection Account, as applicable. In the event that the Master Servicer shall deposit or cause to be deposited to the Distribution Account any amount not required to be credited thereto, the Securities Administrator, upon receipt of a written request therefor signed by a Servicing Officer of the Master Servicer, shall promptly transfer such amount to the Master Servicer from the Distribution Account, any provision herein to the contrary notwithstanding.
    (c) The amount at any time credited to the Master Servicer Collection Account shall be invested, in the name of the Trustee, or its nominee, for the benefit of the Certificateholders, in Permitted Investments as directed by Master Servicer. All Permitted Investments shall mature or be subject to redemption or withdrawal on or before, and shall be held until, the next succeeding Distribution Account Deposit Date. Any and all investment earnings on amounts on deposit in the Master Servicer Collection Account from time to time shall be for the account of the Master Servicer. The Master Servicer from time to time shall be permitted to withdraw or receive distribution of any and all investment earnings from the Master Servicer Collection Account. The risk of loss of moneys required to be distributed to the Certificateholders resulting from such investments shall be borne by and be the risk of the Master Servicer. The Master Servicer shall deposit the amount of any such loss in the Master Servicer Collection Account within two Business Days of receipt of notification of such loss but not later than the second Business Day prior to the Distribution Date on which the moneys so invested are required to be distributed to the Certificateholders.

  • Allocations of Finance Charge Collections The Servicer shall allocate to the Series 1997-1 Certificateholders and retain in the Collection Account for application as provided herein an amount equal to the product of (A) the Floating Allocation Percentage and (B) the Series 1997-1 Allocation Percentage and (C) the aggregate amount of Collections of Finance Charge Receivables deposited in the Collection Account on such Deposit Date.

  • Data Collection and Usage The Company and the Service Recipient collect, process and use certain personal information about Participant, including, but not limited to, Participant’s name, home address, telephone number, email address, date of birth, social insurance number, passport or other identification number, salary, nationality, job title, any shares or directorships held in the Company, details of all awards granted under the Plan or any other entitlement to shares awarded, canceled, exercised, vested, unvested or outstanding in Participant’s favor (“Data”), for purposes of implementing, administering and managing the Plan. The legal basis, where required, for the processing of Data is Participant’s consent.