A. Diouan, S. Loudcher, J. Darmont, E. Ferey, « Discovering Relationships in Data Lakes Using Large Language Models: An Industrial Case ». 28th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2026), Graz, Austria. LNCS.
Abstract: Data lakes rely on metadata to remain usable, yet this metadata is often limited or weakly informative for column relationship discovery, especially in ERP-derived datasets with coded or abbreviated schema labels. We propose ColRel, a two-stage method that builds column embeddings from metadata and data available at ingestion time. In difficult cases, such as coded schemata, business dictionaries help interpret column names and support the generation of short natural-language descriptions used in the second stage. Experiments on public benchmarks and an industrial ERP dataset show that ColRel is particularly effective in semantically related, weak-signal settings.
