Abstract
This thesis introduces a new computational framework and annotation methodology for investigating textual entailment in a theory-based paradigm. This paradigm is premised on the assumption that entailment recognizers could be made more accurate if an explicit linguistic theory explains at least some of the data that they are designed to cover. Thus it is possible to begin with a theory that models a small set of linguistic phenomena and then augment it to cover increasingly complex syntactic constructions and semantic inferences. The proposed framework and annotation methodology allow human annotators to create entailment data that can be accounted for by a standard truth-conditional semantic model. The framework is an annotation platform that integrates a typed lexicon, a standard stochastic parser, a part-of-speech tagger, a lambda calculus engine and a first-order theorem prover with a graphical user interface. It is shown to be a logically sound proof system with respect to the semantic theory. Hence, when the platform successfully uses the annotations for deduction, this indicates that the underlying semantic theory accounts for the entailment. The platform is used within a methodology of Annotating-By-Proving: a premise and a conclusion of an entailment pair are considered well-annotated only if the annotations lead to a deduction from the premise to the conclusion. An extension of this methodology also covers non-entailing pairs. The proposed strategy gives annotators immediate feedback on whether their annotations generate an inferential path. This general approach is used for developing a semantic model incorporating some of the most common inferential phenomena in the Recognizing Textual Entailment (RTE) corpora: appositive, intersective and restrictive modification, as well as simple existential and universal quantification. Human annotators used the annotation platform to generate a dataset of annotated entailments explained by this semantic model. The resulting corpus, entitled SemAnTE (Semantic Annotation of Textual Entailment), consists of 600 pairs in a positive-negative ratio of 2:1 and is publicly available online (http://logiccommonsense.wp.hum.uu.nl/resources).
The explicitness of the semantic theory, the simplicity of its representations, and the standard conventions used for tagging parse trees all suggest that the model is learnable and holds promise for developing better-performing entailment recognizers.
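The Annotating-By-Proving idea can be illustrated with a toy sketch. It is not the thesis platform: there is no parser, lambda calculus engine or first-order theorem prover here, and every name (`NounPhrase`, `atoms`, `proves`, `well_annotated`) is hypothetical. The sketch assumes sentences of the form "Some (modifier) noun verb", reads them as an existential statement over a conjunction of atomic predicates, and treats a proof as a subset check on those atoms, which is sound for this small fragment only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class NounPhrase:
    """Hypothetical annotated noun phrase: noun, optional modifier, optional tag."""
    noun: str
    modifier: Optional[str] = None
    tag: Optional[str] = None  # "intersective", "restrictive", or None (unannotated)


def atoms(np: NounPhrase, verb: str) -> FrozenSet[str]:
    """Atomic predicates that hold of the existential witness of 'Some NP verb'."""
    facts = {verb}
    if np.modifier is None:
        facts.add(np.noun)
    elif np.tag == "intersective":
        # 'Dutch man' is read as dutch(x) and man(x).
        facts |= {np.modifier, np.noun}
    elif np.tag == "restrictive":
        # 'skillful surgeon' is read as skillful(surgeon)(x), which also yields surgeon(x).
        facts |= {f"{np.modifier}({np.noun})", np.noun}
    else:
        # Unannotated modifier: an opaque predicate that licenses no inference.
        facts.add(f"{np.modifier}({np.noun})")
    return frozenset(facts)


def proves(premise: FrozenSet[str], conclusion: FrozenSet[str]) -> bool:
    """Sound check for this fragment: every conclusion atom appears in the premise."""
    return conclusion <= premise


def well_annotated(p_np: NounPhrase, p_verb: str, c_np: NounPhrase, c_verb: str) -> bool:
    """A pair counts as well-annotated only if the annotations yield a deduction."""
    return proves(atoms(p_np, p_verb), atoms(c_np, c_verb))


if __name__ == "__main__":
    conclusion = NounPhrase("man")
    tagged = NounPhrase("man", "Dutch", "intersective")
    untagged = NounPhrase("man", "Dutch")
    # Feedback to the annotator: does the annotation support a proof?
    print(well_annotated(tagged, "sang", conclusion, "sang"))
    print(well_annotated(untagged, "sang", conclusion, "sang"))
```

In this sketch the same sentence pair passes or fails depending only on the modifier tag, which mirrors the feedback loop described above: a failed check tells the annotator that the current annotation does not yet support an inferential path.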