Publications of Jérôme Darmont

Reference (inproceedings)

A. Diouan, S. Loudcher, J. Darmont, E. Ferey, "Discovering Relationships in Data Lakes Using Large Language Models: An Industrial Case", 28th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2026), August 2026; Lecture Notes in Computer Science, Springer, Heidelberg, Germany.

BibTeX entry

@INPROCEEDINGS{dawak2026,
     Author = {Ahlame Diouan and Sabine Loudcher and Jérôme Darmont and Eric Ferey},
     Title = {Discovering Relationships in Data Lakes Using Large Language Models: An Industrial Case},
     Booktitle = {28th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2026)},
     Month = {August},
     Year = {2026},
     Series = {Lecture Notes in Computer Science},
     Publisher = {Springer},
     Address = {Heidelberg, Germany},
     Abstract = {Data lakes rely on metadata to remain usable, yet this metadata is often limited or weakly informative for column relationship discovery, especially in ERP-derived datasets with coded or abbreviated schema labels. We propose ColRel, a two-stage method that builds column embeddings from metadata and data available at ingestion time. In difficult cases, such as coded schemata, business dictionaries help better interpret column names and support the generation of short natural-language descriptions used in the second stage. Experiments on public benchmarks and an industrial ERP dataset show that ColRel is particularly effective in semantically related, weak-signal settings.},
     Keywords = {Data lakes, Schema matching, Metadata enrichment, Embeddings, Large language models}
}
