Common use of Freeling Italian Clause in Contracts

Freeling Italian. The freeling_it web service hosted by CNR provides POS tagging and lemmatization using the Italian version of FreeLing. The FreeLing project was created and is currently led by Xxxxx Xxxxx; the tools were developed at the TALP Research Center of the Universitat Politècnica de Catalunya. The package consists of several language analysis libraries. The analyzer library contains a complete pipeline for tokenization, sentence splitting, lemmatization, tagging and morphological analysis of text in several languages, including Italian.

FreeLing reads from standard input and writes its results to standard output. The input is plain text (UTF-8 or ISO) and the output is a tab-separated file in which sentences are separated by an empty line. Each token is stored on a separate line, followed by its lemma and POS tag, with the fields separated by tabs. For further details, see Xxxxxxxx et al. (2006), Xxxxx et al. (2010) and the FreeLing page at xxxx://xxx.xxx.xxx.xxx/freeling.

Sentence splitting and tokenization are rule-based. Lemmatization is based on an Italian dictionary extracted from the Morph-it! lexicon developed at the University of Bologna. The lexicon contains over 360,000 forms corresponding to more than 40,000 lemma-POS combinations. POS disambiguation is performed using an HMM tagger, which, in the case of Italian, was trained on a manually annotated corpus of 300,000 words. The declared accuracy for Italian is 97% (Atserias et al. 2006). POS tags are represented by alphanumeric values that encode the EAGLES tagset.
Although TALP provides no documentation of the Italian tagset, it is similar to the Spanish tagset documented at xxxx://xxx.xxx.xxx.xxx/freeling/doc/userman/parole-es.html.

URL: xxxx://xxxx0.xxx.xxx.xx:8080/soaplab2-axis/#panacea.freeling_it_row
WSDL: xxxx://xxxx0.xxx.xxx.xx:8080/soaplab2-axis/typed/services/panacea.freeling_it?wsdl
PANACEA Catalogue Entry: xxxx://xxxxxxxx.xxxx.org/services/139
PANACEA MyExperiment Workflow(s) using the WS: xxxx://xxxxxxxxxxxx.xxxx.org/workflows/24

Table 14 WS Details for freeling_it

The freeling_it service accepts a set of optional parameters controlling multiword detection, named-entity recognition and output format. These parameters are briefly described in Table 15.

Parameter name    Semantics
multiword         Enables/disables multiword detection (yes/no)
ner               Type of NE recognition to be performed (none/basic, default is none)
output-format     Level of analysis to display in the results (token/splitted/tagged, default is tagged)

Table 15 Optional parameters for freeling_it
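
The optional parameters and the tabbed output format described above can be illustrated with a minimal Python sketch. It is not part of the freeling_it service or of FreeLing itself: the names `ALLOWED_PARAMS`, `check_params`, `Token` and `parse_tagged_output` are hypothetical, the allowed values are copied from Table 15, and the parser assumes the default tagged output with exactly the layout stated in the text (one token per line as form, lemma and POS tag separated by tabs, with an empty line between sentences). Any columns beyond the first three are ignored, which is also an assumption. The sample tags `X1` to `X3` are placeholders, not real EAGLES tags.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    """One output line: surface form, lemma and POS tag."""
    form: str
    lemma: str
    pos: str


# Allowed values as listed in Table 15 (the dict itself is a hypothetical helper).
ALLOWED_PARAMS = {
    "multiword": {"yes", "no"},
    "ner": {"none", "basic"},
    "output-format": {"token", "splitted", "tagged"},
}


def check_params(params: Dict[str, str]) -> Dict[str, str]:
    """Reject parameter names or values that Table 15 does not list."""
    for name, value in params.items():
        if name not in ALLOWED_PARAMS:
            raise ValueError(f"unknown parameter: {name!r}")
        if value not in ALLOWED_PARAMS[name]:
            raise ValueError(f"invalid value {value!r} for parameter {name!r}")
    return dict(params)


def parse_tagged_output(text: str) -> List[List[Token]]:
    """Split tabbed output into sentences, each a list of Token tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected form, lemma and POS tag, got: {line!r}")
        # Assumption: only the first three columns are of interest.
        current.append(Token(fields[0], fields[1], fields[2]))
    if current:
        # The last sentence may not be followed by an empty line.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    options = check_params({"multiword": "no", "ner": "none", "output-format": "tagged"})
    # Invented sample with placeholder tags, not actual service output.
    sample = "Il\til\tX1\ngatto\tgatto\tX2\n\nDorme\tdormire\tX3\n"
    for sentence in parse_tagged_output(sample):
        print([(tok.form, tok.lemma, tok.pos) for tok in sentence])
```

Keeping the parameter check separate from the parser means the same parser can be reused on output produced by a local FreeLing run, where no web service parameters are involved.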
