Merging for increased precision Clause Samples
Merging for increased precision. UCAM's work focused on experimenting with a new method for merging subcategorization information acquired automatically with two different parsers, with the goal of obtaining a higher-precision subcategorization frame (SCF) resource, i.e. one in which only the information that the two resources agree on is retained. Unlike previous work, e.g. (▇▇▇▇▇▇ and ▇▇▇▇, 2005; ▇▇▇▇▇▇▇▇ et al., 2009), the focus here is on merging by taking the intersection of two resources. Treating language resource merging as (roughly) a union operation seems appropriate for manually developed resources, or in general when coverage is a priority. However, when working with automatically acquired resources, it may be worthwhile to merge by intersection instead.

UCAM tried to reduce the noise that taggers and parsers introduce into automatic SCF acquisition by combining two lexicons built with different parsers. For the experiment, performed on English data, the parsers used are the RASP parser and the unlexicalized Stanford parser. SCFs are acquired from both outputs by an adapted version of the SCF acquisition system of (▇▇▇▇▇▇ et al., 2007), a rule-based classifier that matches the Grammatical Relations (GRs) of each verb instance to a corresponding SCF. Since the classifier is based on the GR scheme adopted for RASP, and the scheme of the Stanford parser is different, a new version of the classifier has been developed for the

1 Note that SPs are not included in the TO at this time; since state-of-the-art SP models specify probabilistic relations between any verb-argument pair, they are not easily represented in a flat lexicon format. As anticipated in D6.1, inclusion of SPs is left for future work.
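
The contrast between merging by union and merging by intersection can be illustrated with a minimal Python sketch. It is not the UCAM implementation: it assumes, purely for illustration, that each lexicon is a mapping from a verb lemma to a set of SCF labels drawn from a shared inventory, and the lexicon contents, SCF labels and function names below are hypothetical.

```python
from typing import Dict, Set

# Assumed representation: verb lemma -> set of SCF labels (hypothetical labels).
Lexicon = Dict[str, Set[str]]


def merge_by_intersection(lex_a: Lexicon, lex_b: Lexicon) -> Lexicon:
    """Keep only the (verb, SCF) pairs that both lexicons agree on."""
    merged: Lexicon = {}
    for verb in lex_a.keys() & lex_b.keys():
        shared = lex_a[verb] & lex_b[verb]
        if shared:  # drop verbs with no frame in common
            merged[verb] = shared
    return merged


def merge_by_union(lex_a: Lexicon, lex_b: Lexicon) -> Lexicon:
    """Keep every (verb, SCF) pair found in either lexicon."""
    merged: Lexicon = {}
    for verb in lex_a.keys() | lex_b.keys():
        merged[verb] = lex_a.get(verb, set()) | lex_b.get(verb, set())
    return merged


# Toy lexicons standing in for the output of two parser-specific pipelines.
rasp_lexicon: Lexicon = {
    "give": {"NP", "NP_NP", "NP_PP", "ADJP"},
    "sleep": {"INTRANS", "NP"},
    "devour": {"NP"},
}
stanford_lexicon: Lexicon = {
    "give": {"NP", "NP_NP", "NP_PP"},
    "sleep": {"INTRANS", "PP"},
    "arrive": {"INTRANS"},
}

high_precision = merge_by_intersection(rasp_lexicon, stanford_lexicon)
high_coverage = merge_by_union(rasp_lexicon, stanford_lexicon)
```

In this toy example the intersection keeps only the frames proposed by both pipelines and drops verbs seen by just one of them, trading coverage for agreement, whereas the union keeps every frame from either source.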
