

Local Defaulter. If the decomposer returns a string as ‘unknown’, the token still needs to be annotated with linguistic information; a tagger cannot work well with a tag ‘unknown’ that occurs in all kinds of possible contexts. Providing such annotations is the task of the defaulter. The component is called ‘local defaulting’ because only the unknown string itself is considered and no context information is used. Corpus-based extraction of information would be called ‘contextual defaulting’.

The following language resources are used:

• Lists of foreign words: They are used to check whether an unknown word comes from a foreign language. For this purpose, the word lists of the Language Identifier are re-used. Many unknown tokens in the test corpus are foreign-language words.

• Default endings: These resources are created by a training component that correlates linguistic information with string endings. Such information includes tags (BTag, STag, XTag), lemma formation, gender defaulting, etc. The training component takes a list of example words with their linguistic annotations and produces the longest common ending strings for each annotation. For the defaulting of the tag, it produces about 470 K correlations between endings and tag assignments; in the case of homographs, it also gives the relative weights of the different tags against each other, based on the training data. [14]

So far, only part-of-speech defaulting is done; other defaulting operations will concern lemma, gender, and others.

Modus operandi: At runtime, three defaulting steps are tried.

First, the foreign word dictionary is looked up to check whether the unknown string is a foreign word. If the word is found, it is marked as (a special kind of) ‘Common Noun’. [15]

Next, a strategy is applied to identify acronyms and other non-words that consist of a mixture of digits, uppercase letters, and lowercase letters; it is supposed to cover strings like ‘EU/2/08/091/004’ or ‘CRF12’. As for tag assignment, such strings can be common nouns (‘AKW’ = ‘Atomkraftwerk’) but also proper nouns (‘CSU’ = ‘Christlich-Soziale Union’). Therefore, they are treated as homographs, leaving it to later components to tag them properly.

Finally, the string undergoes local defaulting: its ending is looked up in the defaulter resource. This will always produce an assignment. The STag (or a set thereof, in case of homographs) is returned.

[14] For the current setup, only the STag defaulter is used; following versions will default more features if the approach turns out to be viable.

[15] A tag like “FW” as in the STTS tagset does not really help, as its distribution would be completely [...]; classifying them as nouns is considered to do the least damage.
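The three runtime steps can be illustrated with a minimal sketch. It is not the component's actual implementation: the resource contents, the tag names (`NN`, `NE`, `VVINF`, `ADJ`), the acronym heuristic, the equal homograph weights, and all function names are assumptions made for illustration only. The real system would use the Language Identifier word lists and the trained ending table with its own STag inventory.

```python
# Hypothetical toy resources (assumptions, not the project's data).
FOREIGN_WORDS = {"download", "meeting"}

# ending -> {tag: relative weight}; several tags model a homograph ending.
ENDING_TAGS = {
    "ung": {"NN": 1.0},
    "lich": {"ADJ": 1.0},
    "en": {"VVINF": 0.6, "NN": 0.4},
}

# Fallback used when no ending matches, so an assignment is always produced.
DEFAULT_TAGS = {"NN": 1.0}


def looks_like_acronym_or_nonword(token):
    """Assumed heuristic: a digit anywhere, or an uppercase letter after position 0."""
    has_digit = any(c.isdigit() for c in token)
    has_inner_upper = any(c.isupper() for c in token[1:])
    return has_digit or has_inner_upper


def default_by_ending(token):
    """Return the tag weights of the longest ending found in ENDING_TAGS."""
    lowered = token.lower()
    # Try the longest suffix first, then successively shorter ones.
    for start in range(len(lowered)):
        ending = lowered[start:]
        if ending in ENDING_TAGS:
            return ENDING_TAGS[ending]
    return DEFAULT_TAGS


def local_default(token):
    """Apply the three steps in order; only the string itself is inspected."""
    # Step 1: foreign word lookup, marked as a (special kind of) common noun.
    if token.lower() in FOREIGN_WORDS:
        return {"NN": 1.0}
    # Step 2: acronyms and non-words are treated as noun homographs;
    # the equal weights are an assumption.
    if looks_like_acronym_or_nonword(token):
        return {"NN": 0.5, "NE": 0.5}
    # Step 3: ending-based defaulting, which always yields an assignment.
    return default_by_ending(token)
```

Trying the longest suffix first mirrors the idea of "longest common ending strings" in the training resource, and the fallback entry reflects the statement that the last step always produces an assignment. Returning a dictionary of weighted tags rather than a single tag leaves the final decision for homographs to later components.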
