Text Mining. Authorized Users may use the Licensed Material to perform and engage in text mining/data mining activities for legitimate academic research and other educational purposes.
Text Mining. We apply a pattern-based natural language processing approach for finding exceptions in contract text. Pattern-based information extraction has been an active discipline in the past two decades. Despite their simplicity, linguistic pattern-based approaches yield surprisingly good results. We survey some important work in this area. Hearst [6] pioneered the pattern-based approach by using it for automatic acquisition of hyponyms from Grolier’s American Academic Encyclopedia. The hyponymy relation, such as that of apple to fruit, indicates the is-a relation. To extract such information, Hearst defines patterns of the type ’NP0 such as NP1’. For example, the phrase fruit such as apple (if sufficiently frequent) conveys the information that apple is a hyponym of fruit. Xxxxxxx and Charniak [38] apply a similar pattern-based approach to find nouns that satisfy part-of relations in the LDC North American News Corpus (NANC). The part-of relation indicates the part and the whole of entities, such as wheel to car. Xxxxxxx and Xxxxxxxx’x patterns are of the type ’NP0 of NP1’, which indicates a part-of relationship, as in basement of building, where basement is a part of building. Xxxxx and Moldova [39] extract causal relations from text using a similar approach on the TREC-9 data set, which is a collection of news articles. To extract causal relations from corpora, Xxxxx and Moldova use the most explicit intra-sentential pattern ’NP0 V NP1’, where V is a simple causative verb. Hearst evaluates her approach against WordNet and obtains a precision of 57.55%. Xxxxxxx and Charniak’s approach yields 55% accuracy for the top 50 words when evaluated against human-annotated data. Xxxxx and Moldova achieve 65.6% accuracy against the average performance of two human annotators on 300 relation pairs.
In this context, our results of nearly 90% precision indicate that contracts are a promising domain and perhaps that additional information can be mined from them. Xxxxxxx and Xxxxxxxx [40] use Hearst patterns [6] to mine business risk vocabularies and build a taxonomy. They identify potential risks in financial reports. Xxxxxxx and Xxxxxxxx use the Web as their corpus for vocabulary discovery and validation. In contrast, our system uses a set of contracts as its corpus, and its vocabulary discovery process is not based on the Hearst patterns. Xxxxxxxx and Xxxxxxx [41] use an approach based on machine learning to study contract documents. They employ a binar...
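As an illustration of the ’NP0 such as NP1’ pattern discussed above, the following is a minimal sketch rather than the method of any of the cited works. It assumes that a noun phrase can be approximated by a single lowercase word (a real system would use a chunker or parser), and the function name `extract_hyponym_pairs`, the `min_count` threshold, and the sample sentences are hypothetical.

```python
import re
from collections import Counter

# Assumption: a noun phrase is approximated by one lowercase word.
# Real systems identify NP boundaries with a chunker or parser.
SUCH_AS = re.compile(r"\b([a-z]+) such as ([a-z]+)\b")


def extract_hyponym_pairs(sentences, min_count=1):
    """Count (hyponym, hypernym) pairs matched by 'NP0 such as NP1'.

    Only pairs seen at least `min_count` times are returned, mirroring
    the "if sufficiently frequent" condition in the text.
    """
    counts = Counter()
    for sentence in sentences:
        for hypernym, hyponym in SUCH_AS.findall(sentence.lower()):
            counts[(hyponym, hypernym)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical sample corpus, for illustration only.
sentences = [
    "She buys fruit such as apple every week.",
    "Fruit such as apple keeps well in winter.",
    "He avoids fruit such as mango.",
]
pairs = extract_hyponym_pairs(sentences, min_count=2)
print(pairs)
```

With the threshold set to 2, only the pair (apple, fruit) survives, since the mango sentence occurs once. The other patterns mentioned above (’NP0 of NP1’, ’NP0 V NP1’) could be handled the same way by swapping the regular expression.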
Text Mining. The purpose of text mining is to process unstructured textual information and extract meaningful numerical indices from the text, in order to make the information contained in the text accessible to the different data mining algorithms (statistical and machine learning) (Aggarwal 2012). Within text mining, similarity detection (i.e., the detection of similar texts using either their syntactic or semantic properties) is an established field. In (Xxxx 2007), similarity is used to automatically predict the fixing effort, i.e., the person-hours spent on fixing an issue such as a software bug. Given a new issue report, the Lucene framework (xxxxx://xxxxxx.xxxxxx.xxx/core/) is used to query the database of resolved issues for textually similar reports (using the nearest neighbour approach), and their average fixing time is used as the prediction. The assignment of developers to bug reports has also been tackled from a similarity perspective:
● Xxxx (2009) presents a framework for automated assignment of bug-fixing tasks which infers knowledge about a developer's expertise by analysing the history of bugs previously resolved by the developer. It then applies a vector space model (VSM) to recommend experts for fixing bugs, matching the VSM representation of the new bug with the most similar developer VSM representation. In addition to similarity, other heuristics are taken into account, such as the current workload and preferences of the developer.
● Xxxxxxx (2012) proposes an algorithm to discover experts for fixing new software bugs which is based on the analysis of their textual information (e.g., summary and description attributes). Frequent terms are generated from this textual information, and term similarity is then used to identify appropriate experts (developers) for the newly reported software bug.
Text mining is used in combination with machine learning techniques in (Menzies 2008) to assist test engineers in assigning severity levels to defect reports.
The proposed algorithm is based on the automated extraction and analysis of textual descriptions from issue reports: text mining techniques are used to extract the relevant features of each report, while machine learning techniques are used to map these features to appropriate severity levels (taking into account the severity levels already assigned to other issues in order to construct rules about when a specific defect level should be assigned).
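The similarity-based effort prediction described above (query resolved issues for the most similar reports, then average their fixing times) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited implementation: it swaps Lucene for a hand-rolled TF-IDF vector space model with cosine similarity, and the function names, the smoothed IDF formula, the value of `k`, and the sample reports are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(texts):
    """Smoothed inverse document frequency over the resolved reports."""
    n = len(texts)
    df = Counter(term for text in texts for term in set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(text, idf):
    """TF-IDF vector as a dict; terms unseen in the corpus are ignored."""
    tf = Counter(tokenize(text))
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_effort(new_report, resolved, k=2):
    """Average the person-hours of the k most similar resolved reports."""
    idf = build_idf([text for text, _ in resolved])
    query = vectorize(new_report, idf)
    scored = [(cosine(query, vectorize(text, idf)), hours) for text, hours in resolved]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(hours for _, hours in nearest) / len(nearest)


# Hypothetical resolved issues: (report text, person-hours spent).
resolved = [
    ("crash when saving file", 4.0),
    ("application crash on save", 6.0),
    ("typo in help dialog", 1.0),
]
predicted = predict_effort("crash while saving document", resolved, k=2)
print(predicted)
```

The same `vectorize` and `cosine` helpers could, in principle, match a new bug report against per-developer profiles built from previously resolved bugs, which is the idea behind the VSM-based expert recommendation mentioned earlier; workload and preference heuristics are outside the scope of this sketch.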
Text Mining. Participating Institutions and their Authorized Users may, subject to prior notification and approval by the Licensor, using reasonable practices, engage in text processing, which is any kind of analysis of natural language text. The Licensor will make appropriate arrangements prior to the start of this activity to account for heavy usage and ensure continued access for the user. Text processing may include, but is not limited to, a process by which information is derived from text by identifying patterns and trends within natural language through text categorization, statistical pattern recognition, concept or sentiment extraction, and the association of natural language with indexing terms. Technology will not be used to hinder any uses granted under this section.