De-duplicator Sample Clauses

De-duplicator. The Web contains many pages that duplicate other pages, in whole or in part. For instance, Xxxxxx et al. (2009) reported that, while building the WaCky corpora, de-duplication reduced the number of documents by more than 50%. Ignoring this phenomenon and keeping duplicate documents would make it harder to build a representative corpus. Therefore, the De-duplicator examines the main content of the stored documents in order to detect and remove near-duplicates. This module uses the de-duplication strategy (footnote 12) included in the Nutch framework, which builds a text profile for each page from quantized word frequencies and computes an MD5 hash for each page (see section 3.2).

An additional step has been integrated into the final version of FMC for the detection and removal of (near-)duplicates. Each document is represented as a list whose length equals the number of its paragraphs that lack a crawlinfo attribute. The elements of the list are the MD5 hashes of these paragraphs. Each list is then checked against all other lists. For each candidate pair, the intersection of the two lists is calculated. If the ratio of the intersection's cardinality to the cardinality of the shorter list exceeds a predefined threshold, the two documents are considered near-duplicates and the document with the shorter list is discarded.
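The paragraph-level check described above can be sketched as follows. This is a minimal illustration, not the FMC implementation: it assumes the caller has already filtered out paragraphs carrying a crawlinfo attribute, treats the hash lists as multisets when intersecting them, and uses a hypothetical threshold of 0.8, since the source only says the threshold is predefined. The function names are illustrative.

```python
import hashlib
from collections import Counter
from itertools import combinations


def paragraph_hashes(paragraphs):
    """Represent a document as the list of MD5 hashes of its paragraphs.

    Assumes `paragraphs` already excludes paragraphs with a crawlinfo attribute.
    """
    return [hashlib.md5(p.encode("utf-8")).hexdigest() for p in paragraphs]


def overlap_ratio(a, b):
    """|a intersect b| / length of the shorter list, treating lists as multisets."""
    if not a or not b:
        return 0.0
    common = sum((Counter(a) & Counter(b)).values())
    return common / min(len(a), len(b))


def deduplicate(docs, threshold=0.8):
    """Return the ids of the documents kept after pairwise near-duplicate removal.

    `docs` maps a document id to its list of paragraph strings.
    `threshold` is a hypothetical value chosen for illustration.
    """
    hashes = {doc_id: paragraph_hashes(paras) for doc_id, paras in docs.items()}
    discarded = set()
    for d1, d2 in combinations(hashes, 2):
        # Skip pairs involving a document that has already been discarded.
        if d1 in discarded or d2 in discarded:
            continue
        h1, h2 = hashes[d1], hashes[d2]
        if overlap_ratio(h1, h2) > threshold:
            # Discard the document with the shorter list (d2 on ties).
            discarded.add(d1 if len(h1) < len(h2) else d2)
    return [doc_id for doc_id in hashes if doc_id not in discarded]


example_docs = {
    "a": ["Intro paragraph.", "Shared body text.", "Shared conclusion."],
    "b": ["Shared body text.", "Shared conclusion."],
    "c": ["Something else entirely."],
}
print(deduplicate(example_docs))  # ['a', 'c']
```

Here document "b" is fully contained in "a" (ratio 1.0 against its own length), so the shorter "b" is dropped, while "c" shares nothing and is kept. Skipping already-discarded documents is a design choice of this sketch that avoids redundant comparisons.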
De-duplicator. The De-duplicator module described in Section 2.1.9 is also available as a standalone web service, accessible at xxxx://xxx.xxxx.xx/soaplab2-axis/#ilsp.ilsp_deduplicatormd5_row. The service has two mandatory parameters:
1. input: a file containing a list of URLs pointing to the files to be de-duplicated.
2. inputType: the type of the files to be de-duplicated. These can be text files or TO1 XML files similar to the ones generated by the FMC.
The service also has two optional parameters:
1. minimumTokenLength: during the calculation of the page profile, all tokens whose length is equal to or shorter than this value are discarded. The default value is 2.
2. quantValue: tokens whose frequency (after quantization) is below this value are discarded. The default value is 3.
The output is a text file containing a list of URLs pointing to the files that remain after de-duplication.
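The effect of minimumTokenLength and quantValue on the page profile can be illustrated with a small sketch loosely modeled on the Nutch text-profile approach. It is not the service's code: the tokenization rule (lower-cased runs of letters and digits), the use of quantValue as the quantization step (frequencies rounded down to a multiple of it), and the "token frequency" serialization that is hashed with MD5 are all assumptions made for illustration. Pages with the same signature would be treated as duplicates.

```python
import hashlib
import re
from collections import Counter


def text_profile_signature(text, minimum_token_length=2, quant_value=3):
    """Sketch of a text-profile signature (assumptions noted in comments).

    - Tokens are lower-cased runs of letters/digits (assumed tokenizer).
    - Tokens with length <= minimum_token_length are dropped.
    - Frequencies are rounded down to a multiple of quant_value (assumed
      quantization rule); tokens whose quantized frequency is below
      quant_value are dropped.
    - The remaining profile is serialized and hashed with MD5.
    """
    tokens = [t for t in re.findall(r"[^\W_]+", text.lower())
              if len(t) > minimum_token_length]
    profile = []
    for token, freq in Counter(tokens).items():
        quantized = (freq // quant_value) * quant_value
        if quantized >= quant_value:
            profile.append((token, quantized))
    # Most frequent first; ties broken alphabetically so the output is stable.
    profile.sort(key=lambda item: (-item[1], item[0]))
    serialized = "\n".join(f"{token} {freq}" for token, freq in profile)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


page_a = "The crawler fetched the page. " * 3 + "Banner text."
page_b = "The crawler fetched the page. " * 3 + "Footer link."
page_c = "Another crawler visited another site. " * 3

print(text_profile_signature(page_a) == text_profile_signature(page_b))
print(text_profile_signature(page_a) == text_profile_signature(page_c))
```

Pages a and b differ only in words that occur once, which quantization removes, so they receive the same signature (the first print shows True); page c has a different frequent-word profile (the second print shows False). Raising quantValue makes the signature coarser and more tolerant of small differences, at the cost of more false matches on short pages.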

Related to De-duplicator

  • Non-duplication. In the event that the Executive shall perform services for the Bank or any other direct or indirect subsidiary or affiliate of the Company or the Bank, any compensation or benefits provided to the Executive by such other employer shall be applied to offset the obligations of the Company hereunder, it being intended that this Agreement set forth the aggregate compensation and benefits payable to the Executive for all services to the Company, the Bank and all of their respective direct or indirect subsidiaries and affiliates.

  • No Duplication. The remedies provided in this Article 8 shall not be duplicative of any remedy available under the indemnification provisions of the Purchase Agreement.

  • No Duplicative Payment. The Company shall not be liable under this Agreement to make any payment of amounts otherwise indemnifiable hereunder if and to the extent that Indemnitee has otherwise actually received such payment under any insurance policy, contract, agreement or otherwise.

  • Previously Reviewed Receivable; Duplicative Tests. If any Review Receivable was included in a prior Review, the Asset Representations Reviewer will not conduct additional Tests on such Review Receivable, but will include the previously reported Test results in the Review Report for the current Review. If the same Test is required for more than one representation and warranty, the Asset Representations Reviewer will only perform the Test once for each Review Receivable, but will report the results of the Test for each applicable representation and warranty on the Review Report.

  • No Duplicative Payments. It is intended that the provisions of this Agreement will not result in duplicative payment of any amount (including interest) required under this Agreement. The provisions of this Agreement shall be construed in the appropriate manner to ensure such intentions are realized.

  • No Duplication; No Double Recovery. Nothing in this Agreement is intended to confer to or impose upon any Party a duplicative right, entitlement, obligation or recovery with respect to any matter arising out of the same facts and circumstances.

  • REPORT OF CONTRACT USAGE. All fields of information shall be accurate and complete. The report is to be submitted electronically via electronic mail utilizing the template provided in Microsoft Excel 2003, or newer (or as otherwise directed by OGS), to the attention of the individual shown on the front page of the Contract Award Notification and shall reference the Group Number, Award Number, Contract Number, Sales Period, and Contractor's (or other authorized agent) Name, and all other fields required. OGS reserves the right to amend the report template without acquiring the approval of the Office of the State Comptroller or the Attorney General.

  • REPAIRED OR REPLACED PARTS / COMPONENTS. Where the Contractor is required to repair, replace or substitute Product or parts or components of the Product under the Contract, the repaired, replaced or substituted Products shall be subject to all terms and conditions for new parts and components set forth in the Contract including Warranties, as set forth in the Additional Warranties Clause herein. Replaced or repaired Product or parts and components of such Product shall be new and shall, if available, be replaced by the original manufacturer’s component or part. Remanufactured parts or components meeting new Product standards may be permitted by the Commissioner or Authorized User. Before installation, all proposed substitutes for the original manufacturer’s installed parts or components must be approved by the Authorized User. The part or component shall be equal to or of better quality than the original part or component being replaced.

  • Meter Testing. Company shall provide at least twenty-four (24) hours' notice to Seller prior to any test it may perform on the revenue meters or metering equipment. Seller shall have the right to have a representative present during each such test. Seller may request, and Company shall perform, if requested, tests in addition to the every fifth-year test and Seller shall pay the cost of such tests. Company may, in its sole discretion, perform tests in addition to the fifth-year test and Company shall pay the cost of such tests. If any of the revenue meters or metering equipment is found to be inaccurate at any time, as determined by testing in accordance with this Section 10.2 (Meter Testing), Company shall promptly cause such equipment to be made accurate, and the period of inaccuracy, as well as an estimate for correct meter readings, shall be determined in accordance with Section 10.3 (Corrections).

  • After-Tax Basis. Indemnification under Section 11.1 and Section 11.2 shall be in an amount necessary to make the Indemnified Party whole after taking into account any tax consequences to the Indemnified Party of the receipt of the indemnity provided hereunder, including the effect of such tax or refund on the amount of tax measured by net income or profits that is or was payable by the Indemnified Party.
