Data-oriented Probabilistic Syntactic Analysis

Keywords: Monte-Carlo parsing, automatic acquisition of probabilistic grammars
Contact Person: Martin Rajman
Phone: (+41 21) 693-5277

Project Description

Tree-substitution grammars and syntactic Monte-Carlo parsing: application to the automated acquisition of probabilistic grammars.

A central issue in current research on parsing and language modeling is how to integrate naturally occurring linguistic material (corpora, treebanks, ...) efficiently into the design of natural language parsers for specific applications. Simple high-coverage methods such as n-gram models miss the higher-order regularities required for reliable analysis, while laboriously hand-crafted computational grammars are often incomplete and ambiguous. The objective of our research is therefore to study how explicit linguistic knowledge (e.g. predefined syntactic trees) can be combined with probabilistic techniques to design improved methods for the automated acquisition of natural language parsers. In particular, we will concentrate on the acquisition of data-driven probabilistic parsers based on Monte-Carlo techniques and tree-substitution grammars.

This research project has been carried out by Jacques Han as part of a doctoral thesis at the École Nationale Supérieure des Télécommunications, under the supervision of Martin Rajman.

Last modified: Mon Apr 17 14:19:51 2000