Sentence Splitting. The task of the LTSentenceSplitter is to detect sentence boundaries and insert <s> … </s> markup into the input text. The sentence splitter uses the following language resources for each language:
• lists of startwords. Startwords are words that indicate the beginning of a sentence when capitalised (like ‘The’).
• lists of endwords. Endwords are words that frequently occur before sentence-final punctuation (i.e. they indicate that a following period really marks the end of a sentence).
• lists of abbreviations. Abbreviations are further subcategorised into those that mostly occur in final position (like ‘etc.’), those that nearly always occur in non-final position (like ‘Dr.’), and others that can be used both ways.
The startwords and endwords were collected through a corpus analysis of the WaCky corpus and then manually corrected. They comprise about 12,000 entries per language.
Modus operandi: The SentenceSplitter identifies patterns that indicate a sentence boundary, checking the context around punctuation marks within a variable-length window.
WSDL xxxx://00.000.000.000:0000/xxxxxxxX0/xxxxxxxx/XxxxxxxxXxxxxxer?wsdl
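The way the three resource lists combine into a boundary decision can be illustrated with a small rule cascade. This is a minimal sketch under stated assumptions, not the LTSentenceSplitter's actual implementation: the word lists are tiny hypothetical samples (the real lists hold about 12,000 entries per language), the function names `is_boundary` and `split_sentences` and the order of the rules are illustrative choices, and the variable-length window is reduced to a one-token look-ahead after each punctuation mark.

```python
from typing import List, Optional

# Hypothetical miniature resource lists for illustration only; the real
# lists contain about 12,000 entries per language.
STARTWORDS = {"The", "This", "It", "We", "However"}
ENDWORDS = {"again", "today", "too", "here"}
ABBREV_FINAL = {"etc."}                   # mostly sentence-final
ABBREV_NONFINAL = {"Dr.", "Mr.", "e.g."}  # almost never sentence-final
ABBREV_BOTH = {"Inc.", "No."}             # can be used both ways

TERMINATORS = (".", "!", "?")


def is_boundary(token: str, nxt: Optional[str]) -> bool:
    """Decide whether `token` (ending in . ! or ?) closes a sentence,
    looking one token ahead (a simplification of the variable window)."""
    if nxt is None:
        return True                       # end of text closes the sentence
    nxt_word = nxt.lstrip("\"'(")         # ignore opening quotes/brackets
    capitalised = nxt_word[:1].isupper()
    if token in ABBREV_NONFINAL:
        return False                      # e.g. "Dr. Smith"
    if token.endswith(("!", "?")):
        return capitalised
    if nxt_word in STARTWORDS:
        return True                       # a capitalised startword opens a sentence
    if token in ABBREV_BOTH:
        return False                      # ambiguous abbreviation, no startword follows
    if token in ABBREV_FINAL:
        return capitalised
    if token[:-1].lower() in ENDWORDS:
        return True                       # endword: the period is a real sentence end
    return capitalised                    # fallback heuristic for ordinary words


def split_sentences(text: str) -> str:
    """Wrap each detected sentence in <s> ... </s>; whitespace is normalised."""
    tokens = text.split()
    sentences: List[str] = []
    current: List[str] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.endswith(TERMINATORS) and is_boundary(tok, nxt):
            sentences.append(" ".join(current))
            current = []
    if current:                           # trailing text without final punctuation
        sentences.append(" ".join(current))
    return " ".join(f"<s>{s}</s>" for s in sentences)


if __name__ == "__main__":
    print(split_sentences(
        "Dr. Smith arrived today. We met at Acme Inc. for lunch. "
        "The talk covered apples, pears etc. Everyone left."
    ))
    # <s>Dr. Smith arrived today.</s> <s>We met at Acme Inc. for lunch.</s> <s>The talk covered apples, pears etc.</s> <s>Everyone left.</s>
```

In this sketch ‘Dr.’ never ends a sentence, an ambiguous abbreviation such as ‘Inc.’ ends one only when a startword follows, and an endword such as ‘again’ forces a boundary even before a lowercase word.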
Appears in 2 contracts
Samples: Grant Agreement, Grant Agreement