Presentation of the FODA axis
The objective of the FODA axis (data mining and machine learning) is to address theoretical issues and develop innovative algorithms for analyzing large corpora of complex data, especially in the fields of Health and Environment as well as the Social Sciences and Humanities. The activities of the researchers in this axis are organised into the three main areas described below.
The three main areas of the axis:
Topological learning aims to find the most appropriate data representation, given the heterogeneous nature of complex data, their volume and their lack of linearity. Its objective is to exploit the duality between metric and topological spaces. It uses different similarity measures adapted to complex data in order to make their representation more discriminative, and it establishes a passage between topological and vectorial representations in order to circumvent the limitations of methods based on multidimensional scaling (MDS). Several topological learning workshops have been organized in conjunction with well-known international conferences such as ICDM and PKDD.
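As a rough illustration of moving from a metric view to a topological one, the sketch below builds a k-nearest-neighbour graph from a dissimilarity matrix. It is only a minimal example under stated assumptions, not the method developed by the axis: the function name knn_graph, the choice of a k-nearest-neighbour graph and the toy matrix D are all hypothetical.

```python
from typing import Dict, List, Set


def knn_graph(dissim: List[List[float]], k: int) -> Dict[int, Set[int]]:
    """Build a symmetric k-nearest-neighbour graph from a square dissimilarity matrix.

    Only the neighbourhood structure is kept: once the graph is built,
    the actual dissimilarity values are no longer needed.
    """
    n = len(dissim)
    graph: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        # Rank the other points by dissimilarity to i (ties broken by index).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (dissim[i][j], j),
        )
        for j in others[:k]:
            # Add the edge in both directions so the graph is undirected.
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Made-up toy dissimilarities with two tight groups: {0, 1} and {2, 3}.
D = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 1.0],
    [6.0, 5.0, 1.0, 0.0],
]
graph = knn_graph(D, k=1)
```

Any similarity or dissimilarity measure suited to the data could be used to fill the matrix; the graph then depends only on the ranking of neighbours, not on a vector embedding.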
The advantage of rule learning is that it provides understandable knowledge, even if its performance is sometimes lower than that of black-box methods. One objective is to develop theoretical methods that improve the performance of rule-based models while keeping the rules comprehensible. Examples include class imbalance and rule-based iterative procedures, for which original solutions have been proposed. Other goals include the analysis and development of measures of the interestingness of the extracted knowledge (see the QIMIE workshop at PAKDD 09).
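To make the idea of measuring the interest of a rule concrete, here is a minimal sketch that computes three standard measures (support, confidence and lift) for a rule "antecedent implies consequent" over a list of transactions. These are textbook measures chosen for illustration, not necessarily the ones studied by the axis; the function name rule_measures and the toy transactions are hypothetical.

```python
from typing import Dict, FrozenSet, List


def rule_measures(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Compute support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_ac / n
    # Confidence estimates P(consequent | antecedent).
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares confidence with the base rate of the consequent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Made-up toy transactions, for illustration only.
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"b"}),
    frozenset({"c"}),
]
measures = rule_measures(transactions, frozenset({"a"}), frozenset({"b"}))
```

A lift above 1 suggests that the antecedent makes the consequent more likely than its base rate, which is one simple way of ranking extracted rules.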
Data mining of complex corpora deals more specifically with the modeling and analysis of complex data (natural language text, images, metadata, etc.) and social networks, in collaboration with the DECCO axis. The complexity of these data (volume, heterogeneity, redundancy, noise) and the fact that they evolve over time call for new solutions. Applications include opinion mining on the Web, automatic construction of social networks, extraction of relevant entities (messages, roles), and semi-automatic annotation of documents, often based on semi-supervised clustering or learning.
Tanagra, a data mining platform
A variety of data mining methods have been implemented in TANAGRA, a platform under continuous development that is available to the scientific community interested in data mining and its applications.
Main members of the FODA axis
Secondary members of the FODA axis
Maître de Conférences
Maître de Conférences