The current boom in information technology has made an enormous amount of data and information available. However, retrieving the documents one needs is often difficult, and media such as the Internet lack the structure and organization required to find the documents relevant to a particular situation. Automatically detecting similarity between texts and determining how relevant documents are to a given query is therefore essential to making efficient use of this vast amount of data. At the LIA, research into the automatic structuring of documents and into probabilistic parsing is being undertaken to improve current retrieval and similarity approaches.
- EXTRACT: Automated Information Extraction from Classified Newspaper Advertisements
- Automatic Structuring of Textual Data: Applications to Text-Mining
- STING: Evaluation of scientific & technological innovation and progress in Europe through patents
- Distributional Semantics: application to retrieval from large textual bases