Research topics
My research interests lie in the fields of artificial intelligence, machine learning and data mining. More precisely, I work on topic/concept extraction from various kinds of data. To this end, I develop new clustering techniques that can deal with sparse, structured datasets. I'm also interested in tracking topics/concepts over time and in evaluating their quality. Many applications can be addressed, especially in data mining and social media analysis. I'm also strongly motivated by studying the links with other fields, such as sociology and literature.
Keywords: artificial intelligence, machine learning, data mining, conceptual clustering, topic learning, (text+web+opinion) mining, social media analysis, role discovery, applications to digital humanities.
Ongoing projects
-
TIGA (action 14, 2020-2023) L’industrie [Re]connectée et intégrée à son territoire et à ses habitants (Industry [re]connected to and integrated with its territory and its inhabitants). You can find a brief description on the IMU website.
-
LIFRANUM (2020-2023, new ANR project) Identifying and structuring the corpus of digital French literatures. Please take a look at the dedicated webpage for more information.
-
Cartoweb: Cartography of digital French literatures (2018-2020) In this project led by G. Bonnet (MARGE, Université Lyon 3), we investigate the use of data mining techniques and visualization to study the production of francophone literature on the Web.
Past projects
-
What data mining tools for an epistemological study of Geography? (2015-2019) Today, we have a unique opportunity to revisit the history of science by leveraging recent and powerful automated tools, such as those provided by data mining and natural language processing. Geography is an ancient research field that has gathered many scientific contributions, published in numerous journals over more than a century. In this project, jointly led by I. Lefort (EVS lab) and S. Loudcher (ERIC), we aim at proposing new data mining techniques for analyzing the evolution of Geography over time. So far, we have tested topic modeling approaches that might automate the detection of metaphors in geographic corpora.
-
DyNoFlu: information dynamics and novelty in textual data streams (2017-2018) Textual data streams are ubiquitous nowadays and there is still room for advanced techniques to track the information flow. More precisely, we aim at studying the interplay between topical evolution and the particular texts that influence this evolution, in collaboration with EDF R&D. This project is funded by the Gaspard Monge foundation, Research Initiative in Industrial Data Science (IRSDI).
-
CArtography of the Open Data in Auvergne-Rhône-Alpes (2017-2018) In the CAODRA project, in collaboration with specialists in information and communication studies from the ELICO lab, we aim at visualizing the open data landscape of the Auvergne-Rhône-Alpes region. The analysis is based on the information posted on websites and on the textual descriptions of open data resources. This project benefits from joint funding from the Institut des Sciences de l'Homme (ISH) and the Complex Systems Institute (IXXI).
-
Measuring the editorial policy of an international online media (2016-2018) We all know that the information conveyed by different media can differ widely. But what about the same media outlet targeting different audiences? In this project, in close collaboration with sociologists from the Max Weber lab, we aim at comparing the information conveyed by the HuffingtonPost in different countries and languages (English, French, Brazilian Portuguese) by means of text mining techniques. The project is partly funded by the Complex Systems Institute (IXXI). It is linked to the digital journalism initiative at Lyon 2 and involves researchers from the Federal University of Paraná.
-
ANR ImagiWeb: capturing the opinionated image of entities (2012-2015) In this project, we designed an original framework for extracting opinions expressed about entities (e.g., public figures, companies, brands) in messages collected from the Web. This work was funded by the CONTINT program of the French National Research Agency. It involved six partners (half academic labs, half companies) for 42 months. It was conducted in close collaboration with social scientists (in particular, researchers in political studies from the CEPEL lab), from data collection and annotation to prototype evaluation. Fine-grained opinions are extracted from the messages, then associated with aspects describing the targeted entity (e.g., balance sheet of a politician, sales policy of a company). These opinions are then aggregated by evolutionary clustering to analyze their dynamics over time. The prototype has been tested on two case studies: the image of two French politicians on Twitter during the 2012 presidential election, and the image of a French energy company in the French blogosphere. See the dedicated website for more details.
-
Static and dynamic study of the vocabulary in the domain of nuclear medicine (2012-2013) Joint project between the ERIC and CRTT labs, funded by the University of Lyon 2.
-
Semantic analysis of complaints by using text mining techniques (2012) Project funded by the CNAF / CNEDI.
-
Controversy analysis at the frontier of information sciences and data mining (2010-2011) Joint project between the ERIC and ELICO labs, funded by the University of Lyon 2.
-
New data mining tools for the SSH (2009-2010) Joint project between the ERIC and LARHRA labs, funded by the University of Lyon 2.