Identifying target sentences. Through online learning, the system accrues keywords from past answers (answer keywords) along with the words that co-occur with them in the corresponding sentences (contextual words). For example, given the answer keywords “diabetes” and “hypertension” in the sentence “The patient reports history of diabetes and hypertension”, the contextual words are “patient”, “report” and “history”. These answer keywords and contextual words, combined with customized vocabularies, can be used to identify sentences that are likely to contain answers, using the following methods.
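As an illustration of the example above, the following is a minimal sketch of how contextual words might be collected from one sentence. The stop-word list, the crude suffix stripping and the function names are assumptions made for this sketch; the text does not say how the system tokenizes or normalizes words.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "to", "is"}


def crude_stem(word):
    """Rough normalization (assumption): strip a trailing 's', e.g. 'reports' -> 'report'.

    A real system would more likely use a proper stemmer or lemmatizer.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def contextual_words(sentence, answer_keywords):
    """Return the words of `sentence` that co-occur with the answer keywords.

    Stop words and the answer keywords themselves are excluded; the remaining
    words are normalized and returned in order of first appearance.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    answers = {keyword.lower() for keyword in answer_keywords}
    result = []
    for token in tokens:
        if token in STOP_WORDS or token in answers:
            continue
        stem = crude_stem(token)
        if stem not in result:
            result.append(stem)
    return result


sentence = "The patient reports history of diabetes and hypertension"
words = contextual_words(sentence, ["diabetes", "hypertension"])
print(words)  # ['patient', 'report', 'history']
```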
a) Similarity-based search using the vector space model
1) The accumulated contextual keywords are represented by a query vector q in the vector space model. The query vector consists of the terms whose frequency f is above the acceptance threshold α, and each component w_i of q is the weight of an individual term.
2) The weight of an individual term in the query vector is calculated in the following steps.
Frequency. The frequency f is defined as the count c_i of term i divided by the number of accumulated instances n, that is, f = c_i / n.
Basic weight. For terms with frequency f above the acceptance threshold α, the basic weight b_i is computed from the count c_i and max(c), the maximum count among individual terms. A logarithm is applied to curve and smooth the counts.
Co-occurrence score. Employing ideas similar to term frequency-inverse document frequency (tf-idf), we promote terms that co-occur strongly with the value to be extracted, by means of a high co-occurrence score o_i. In its expression, g_i is the total count of term i in all processed documents, and g_i / n represents the average appearance of term i. Since c_i is the number of times term i co-occurs with the answer, a term with a stronger correlation with the answer receives a higher o_i score, indicating its higher distinguishing power. For example, c_i / (g_i / n) = 1 means that whenever term i appears, the sentence it belongs to always contains the value to be extracted. In this case, term i serves as a most reliable contextual word for identifying the target sentence. β is the parameter used for adjusting the scale of promotion.
Term weight. Finally, the term weight is calculated as w_i = b_i * o_i.
3) Each sentence t in a document is treated as an independent bag of words, represented by a vector d_t whose components w_i are the weights of the individual terms, or simply their numbers of occurrences.
4) The score of a sentence, S_t, is computed as follows: S_t = (q · d_t) / log(|d_t|). The expression q · d_t represents the dot product (X · Y = ∑ X_i Y_i) of the query vector q and the sentence vector d_t....
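The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: |d_t| is taken to be the number of tokens in the sentence, sentences with fewer than two tokens are scored 0 to avoid dividing by log(1) = 0, no stemming is applied, and the term weights are hand-picked placeholders rather than values of b_i * o_i. All function names and sample numbers are hypothetical.

```python
import math
import re
from collections import Counter


def select_query_terms(cooccurrence_counts, n, alpha):
    """Keep the terms whose frequency f = c_i / n is above the threshold alpha."""
    return {term for term, c in cooccurrence_counts.items() if c / n > alpha}


def bag_of_words(sentence):
    """Sentence vector d_t: each term mapped to its number of occurrences."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def score_sentence(query_weights, sentence):
    """S_t = (q . d_t) / log(|d_t|), with |d_t| assumed to be the token count."""
    d_t = bag_of_words(sentence)
    length = sum(d_t.values())
    if length < 2:
        # Assumption: log(1) = 0 would divide by zero, so score such sentences 0.
        return 0.0
    # Dot product over the query terms; a Counter returns 0 for absent terms.
    dot = sum(weight * d_t[term] for term, weight in query_weights.items())
    return dot / math.log(length)


# Hypothetical accumulated data: co-occurrence counts c_i over n = 10 instances.
counts = {"patient": 8, "history": 6, "today": 1}
terms = select_query_terms(counts, n=10, alpha=0.3)

# Placeholder weights standing in for w_i = b_i * o_i.
placeholder_weights = {"patient": 1.0, "history": 2.0, "today": 0.5}
query = {term: placeholder_weights[term] for term in terms}

sentences = [
    "Follow up in two weeks",
    "The patient reports history of diabetes and hypertension",
]
ranked = sorted(sentences, key=lambda s: score_sentence(query, s), reverse=True)
for s in ranked:
    print(round(score_sentence(query, s), 3), s)
```

Sentences sharing more heavily weighted query terms rank higher, while the log(|d_t|) divisor tempers the advantage that longer sentences would otherwise gain from simply containing more words.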
