Keywords: Textual document retrieval, distributional semantics, co-frequency
Contact Person: Martin Rajman
Phone: (+41 21) 693-5277
E-mail: Martin.Rajman@epfl.ch
This research project concerns the development of semantic models for textual document retrieval systems. The models we focus on are set within a framework based on "distributional semantics", in which semantic proximities are derived from co-frequency matrices computed on large textual corpora. Queries and documents are represented in a unified way, as projections in a vector space of pertinent terms. Different similarity measures will be tested to characterize the proximity between queries and documents.
A software prototype, called D-SIR, has been implemented in order to validate the approach.
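To make the general idea concrete, the following is a minimal sketch of a distributional retrieval model. It is not the D-SIR implementation. All function names are hypothetical, and several choices are assumptions made only for illustration: co-occurrence is counted at the level of whole documents, a text is projected by summing the co-frequency vectors of its words, and cosine is used as one possible similarity measure among those that could be tested.

```python
import math
from collections import Counter


def cofrequency_profiles(corpus, index_terms):
    """Map each word to its co-frequency vector over the index terms.

    Assumption: two words co-occur when they appear in the same document.
    """
    position = {term: i for i, term in enumerate(index_terms)}
    profiles = {}
    for doc in corpus:
        counts = Counter(doc)
        for word, n_word in counts.items():
            vec = profiles.setdefault(word, [0.0] * len(index_terms))
            for term, n_term in counts.items():
                # Skip the word itself so a profile reflects context only.
                if term in position and term != word:
                    vec[position[term]] += n_word * n_term
    return profiles


def project(tokens, profiles, dim):
    """Represent a query or document as the sum of its words' profiles."""
    vec = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(profiles.get(tok, ())):
            vec[i] += x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, corpus, index_terms):
    """Return (score, document index) pairs, best match first."""
    profiles = cofrequency_profiles(corpus, index_terms)
    dim = len(index_terms)
    q = project(query, profiles, dim)
    scores = [(cosine(q, project(doc, profiles, dim)), i)
              for i, doc in enumerate(corpus)]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    toy_corpus = [
        ["car", "engine", "road"],
        ["automobile", "engine", "road"],
        ["banana", "fruit", "apple"],
    ]
    for score, doc_id in rank(["automobile"], toy_corpus,
                              ["engine", "road", "fruit"]):
        print(doc_id, round(score, 3))
```

In this toy corpus, the query "automobile" is close to the first document even though that document never contains the word: both are projected onto the same index terms ("engine", "road") through their co-frequency profiles. Queries and documents go through the same `project` function, which is what representing them in a unified way amounts to in this sketch.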