Diploma Project Proposal: New metrics for Information Retrieval Systems


Chappelier Jean-Cédric
Office: INR 232
Tel: 021 / 693.66.83
Email: Jean-Cedric.Chappelier@epfl.ch
Gaussier Éric
Xerox Research Center Europe (XRCE)
Email: Eric.Gaussier@xrce.xerox.com
Rajman Martin
Office: INR 233
Tel: 021 / 693.52.77
Email: Martin.Rajman@epfl.ch


Introducing more semantics (i.e. "meaning") into Information Retrieval systems is one of the key challenges in the domain of Textual Information Retrieval. However, even the most efficient information retrieval systems are based on rather simple representations of documents.

The goal of this project is to implement and evaluate a fairly recent method, based on Support Vector Machines and Fisher kernels, that builds a new metric between documents from a priori semantic knowledge.
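As a rough illustration of what a Fisher-kernel metric between documents can look like, here is a minimal sketch. It is not the method of the references below: it assumes a simple smoothed unigram (multinomial) model of the corpus and approximates the Fisher information matrix by the identity under a square-root parameterization, in which case the kernel reduces to K(a, b) = sum over words w of n_a(w) * n_b(w) / theta_w. All function names are hypothetical and chosen for this example only.

```python
import math
from collections import Counter


def unigram_model(corpus, smoothing=1.0):
    """Estimate smoothed word probabilities theta_w from a tokenized corpus.

    Assumption: a plain unigram model stands in for the richer generative
    models that a real system would use.
    """
    counts = Counter(word for doc in corpus for word in doc)
    total = sum(counts.values()) + smoothing * len(counts)
    return {w: (c + smoothing) / total for w, c in counts.items()}


def fisher_kernel(doc_a, doc_b, theta):
    """K(a, b) = sum_w n_a(w) * n_b(w) / theta_w.

    Assumption: the Fisher information is approximated by the identity,
    so the kernel is a dot product of Fisher scores n(w) / sqrt(theta_w).
    Words unknown to the model are ignored.
    """
    na, nb = Counter(doc_a), Counter(doc_b)
    shared = na.keys() & nb.keys()
    return sum(na[w] * nb[w] / theta[w] for w in shared if w in theta)


def fisher_distance(doc_a, doc_b, theta):
    """Distance induced by the kernel: sqrt(K(a,a) - 2 K(a,b) + K(b,b))."""
    kaa = fisher_kernel(doc_a, doc_a, theta)
    kbb = fisher_kernel(doc_b, doc_b, theta)
    kab = fisher_kernel(doc_a, doc_b, theta)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(kaa - 2.0 * kab + kbb, 0.0))


if __name__ == "__main__":
    corpus = [
        ["svm", "kernel", "text"],
        ["kernel", "text", "retrieval"],
        ["football", "match"],
    ]
    theta = unigram_model(corpus)
    print(fisher_distance(corpus[0], corpus[1], theta))
    print(fisher_distance(corpus[0], corpus[2], theta))
```

Dividing by theta_w gives rare words more weight than frequent ones, much as inverse document frequency does. A kernel of this kind can also be supplied to a Support Vector Machine as a precomputed kernel matrix.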

Several interesting challenges related to implementing computational techniques that can handle very large document collections should also be taken into account. The final program will have to be tested on real-life, large-scale examples.

This project will take place in the framework of a collaboration between LIA and the Xerox research center located in Grenoble. The work is planned to take place at EPFL, at least at the beginning, with some visits to the Xerox site.
A continuation in the form of a funded internship of several months could also be considered. This is to be discussed with the candidate.


Comfortable with mathematics. Good C/C++ programming skills.
Basic knowledge of Information Retrieval (the TIDT course is recommended).




G. Siolas and F. d'Alché-Buc, Mixture of probabilistic PCAs and Fisher kernels for word and document modeling, Proc. of ICANN 2002, J. Dorronsoro (ed.), LNCS 2415, 2002.
G. Siolas and F. d'Alché-Buc, Support Vector Machines based on a Semantic Kernel for Text Categorization, Proc. of IJCNN'00, vol. 5, pp. 205-209, 2000.