Publications of Jérôme Darmont
Reference (inproceedings)
A. Diouan, E. Ferey, J. Darmont, S. Loudcher, "About Relationships in Data Lakes", 28th International Database Engineered Applications Symposium (IDEAS 2024), Bayonne, France, September 2024; Lecture Notes in Computer Science, Springer, Heidelberg, Germany.
Abstract
In the era of Big Data, managing voluminous and heterogeneous data presents significant challenges for organizations. To tackle these challenges, the concept of a data lake has emerged as a promising solution, allowing the storage of raw data from diverse sources in their original format. An efficient metadata management system plays a crucial role in preventing a data lake from turning into an unusable data swamp by providing a structured framework for organizing, categorizing and establishing relationships between data entities. In this paper, we identify the various relationships from diverse domains found in the literature. Then, we categorize the types of relationships and propose a relationship typology that classifies relationships by similarity, containment, grouping and provenance. Finally, we aim to check whether goldMEDAL, a state-of-the-art generic metadata management model, adequately supports all such relationships.
Keywords
Data lakes, Data discovery, Semantic relationships, Big data