05/03/26 – Irina Proskurina's PhD defense: Towards an Unbiased Compression of Large Language Models #llm


Irina's thesis defense will take place on March 5, 2026, on the Berges du Rhône (BdR) campus, in Room Léonie Villard, at 2:00 PM.

I am pleased to invite you to my PhD defense entitled « Towards an Unbiased Compression of Large Language Models. » The defense will be held in English on March 5 at 2:00 PM at the Berges du Rhône campus, 4 bis rue de l’Université, 69007 Lyon, in Room Léonie Villard. The room is located on the ground floor of the Palais Hirsch.
A campus map is available here: plan campus BdR.

Abstract:
The high memory demands of generative language models have drawn increasing attention to compression techniques such as quantization and pruning, which reduce computational cost, memory usage, and inference latency. However, recent empirical studies show that compression can amplify biased outputs and degrade performance on fairness benchmarks, and it remains unclear which specific layers and weight matrices are responsible for these effects.
In this thesis, we examine the relationship between compression and bias in large language models across multiple dimensions, including representational bias, multilingual fairness, toxicity, calibration, and ethical reasoning. We first introduce Histoires Morales, the first French benchmark for evaluating moral reasoning and ethical alignment in language models, enabling cross-lingual analysis of value-based judgements. Next, we present the Dice-Leaderboard, which supports external submissions and advances open science initiatives aimed at evaluating the unintended effects of compression in LLMs.
To address the impact of compression on group bias, we introduce Fair-GPTQ, the first quantization method explicitly designed to mitigate unfairness during compression. Fair-GPTQ augments the GPTQ quantization objective with group-fairness constraints. Models quantized with Fair-GPTQ generally preserve pre-compression baseline accuracy on zero-shot benchmarks, while reducing bias relative to half-precision models and retaining the memory and speed benefits of 4-bit quantization. Next, we extend this objective to compression-time weight pruning, explicitly controlling bias in the reconstruction error on texts containing stereotypes and anti-stereotypes.
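To give a rough intuition for what a compression-time fairness constraint can look like, here is a minimal, self-contained sketch of a layer-wise quantization objective augmented with a group penalty. This is not the Fair-GPTQ implementation: the 4-bit quantizer, the calibration data, and the penalty weight `lam` are all illustrative stand-ins, and the penalty simply discourages the reconstruction error from differing between stereotyped and anti-stereotyped calibration texts.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w, scale):
    """Uniform symmetric 4-bit quantization (integer levels -8..7);
    a simple stand-in for a GPTQ-style quantizer."""
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

def reconstruction_error(W, W_q, X):
    """Layer-wise output reconstruction error ||WX - W_q X||^2,
    the quantity GPTQ-style methods minimize on calibration data."""
    return float(np.sum((W @ X - W_q @ X) ** 2))

# Toy layer weights and two groups of calibration activations
# (stereotyped vs. anti-stereotyped texts) -- synthetic data for illustration.
W = rng.standard_normal((8, 16))
X_stereo = rng.standard_normal((16, 32))
X_anti = rng.standard_normal((16, 32))

scale = np.max(np.abs(W)) / 7
W_q = quantize_4bit(W, scale)

err_s = reconstruction_error(W, W_q, X_stereo)
err_a = reconstruction_error(W, W_q, X_anti)

lam = 0.5  # hypothetical weight of the group-fairness penalty
fair_objective = err_s + err_a + lam * abs(err_s - err_a)
print(f"stereo err={err_s:.2f}, anti err={err_a:.2f}, objective={fair_objective:.2f}")
```

In an actual compression method, an objective of this shape would be minimized when choosing the quantized (or pruned) weights, so that accuracy is preserved without skewing reconstruction quality toward one group of texts.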
Overall, our contributions advance research on LLM efficiency and fairness-aware compression and support the responsible deployment of compressed models at scale.

*************************************
Jury:

Antske Fokkens, Full Professor, Vrije Universiteit Amsterdam, Reviewer
Christophe Cerisara, Full Professor, CNRS LORIA, Université de Lorraine, Reviewer
Karën Fort, Full Professor, Université de Lorraine, Examiner
Alexandre Allauzen, Full Professor, Université Paris-Dauphine, Université PSL, Examiner
Julien Velcin, Full Professor, École Centrale de Lyon, Thesis Supervisor
Guillaume Metzler, Associate Professor, Université Lumière Lyon 2, Thesis Co-supervisor
