Industrial use of natural language processing techniques is subject to specific constraints (real-time operation, short development cycles, little linguistic expertise available locally, etc.) that are not particularly compatible with the methods usually applied in classical approaches to computational linguistics. However, recent advances in corpus-based linguistics open up a whole set of new possibilities. In particular, research in Natural Language Processing at the LIA focuses on text-mining (knowledge extraction from textual data), automatic production of syntactic tools, and evaluation of NLP tools.
Our text-mining methods are based on techniques developed for information retrieval using a Distributional Semantics approach. In such methods, semantic proximities are derived from co-frequency matrices computed on large textual corpora. Different similarity measures are used to characterize the proximity between queries and documents, which are represented in a unified way as projections in a high-dimensional vector space of pertinent terms.
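The general idea can be illustrated with a minimal sketch. It is not the LIA implementation: the function names (`cofrequency_matrix`, `project`, `cosine`), the fixed co-occurrence window, the summation of term profiles, and the choice of cosine as the similarity measure are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt


def cofrequency_matrix(corpus, window=2):
    """Count how often each pair of terms co-occurs within `window` tokens.

    `corpus` is an iterable of strings; tokenization is a plain
    whitespace split (an assumption made to keep the sketch short).
    """
    matrix = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[term][tokens[j]] += 1
    return matrix


def project(text, matrix):
    """Represent a query or a document as the sum of the co-frequency
    profiles of its terms, so both live in the same vector space."""
    vector = Counter()
    for term in text.lower().split():
        vector.update(matrix.get(term, {}))
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

With these helpers, documents could be ranked against a query by projecting both with `project` and sorting the documents by decreasing `cosine` score. Because the vectors are built from co-frequency profiles rather than from the terms themselves, a query and a document can be judged close even when they share no term, provided their terms occur in similar contexts.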
Methods for automatic production of syntactic tools aim to implement probabilistic techniques and models operating on textual corpora (raw or annotated texts) in order to adapt various generic algorithms to specific applications: part-of-speech tagging, speech recognition, information retrieval, etc.
- Automatic Structuring of Textual Data: Applications to Text-Mining
- INSPECT - Integration of acoustic and advanced linguistic models into speech understanding systems
- STING: Evaluation of scientific & technological innovation and progress in Europe through patents
- NLP-FPGA - Hardware NLP coprocessor
- INFOVOX - Interactive Voice Servers for Advanced Computer Telephony Applications
- EXTRACT: Automated Information Extraction from Classified Newspaper Advertisements
- GRACE - part-of-speech tagging evaluation
- ELSE - Evaluation of Language and Speech engineering
- Data-oriented probabilistic syntactic analysis
- Distributional Semantics: application to retrieval from large textual bases
- ISIS - Design of Advanced Vocal Information Servers
- Industrial tools for Natural Language Processing