The Third International Workshop on Mining Complex Data - MCD'07 -
In Conjunction with ECML/PKDD 2007
Warsaw, Poland, September 17, 2007
Data mining and knowledge discovery, as stated in their early definitions, can today be considered stable fields, with numerous efficient methods and studies proposed to extract knowledge from data. Nevertheless, finding the famous golden nugget remains a challenge. The context has evolved since the KDD process was first defined, and knowledge must now be extracted from increasingly complex data.
In the framework of data mining, many software solutions have been developed for extracting knowledge from tabular data (typically obtained from relational databases). Methodological extensions have been proposed to deal with data from other sources, such as natural language (text mining) and images (image mining). KDD has thus evolved following a unimodal scheme instantiated according to the type of the underlying data (tabular data, text, images, etc.), which, in the end, always leads to working on the classical double-entry tabular format.
However, in a large number of application domains, this unimodal approach appears to be too restrictive. Consider for instance a corpus of medical files. Each file can contain tabular data such as results of biological analyses, textual data coming from clinical reports, and image data such as radiographs, echograms, or electrocardiograms. In a decision-making framework, treating each type of information separately has serious drawbacks. It therefore appears increasingly necessary to consider these different data simultaneously, thereby encompassing all their complexity.
Hence, a natural question arises: how can one combine information of different natures and associate it with the same semantic unit, for instance the patient? On a methodological level, one may also wonder how to compare such complex units via similarity measures. The classical approach consists in aggregating partial dissimilarities computed on components of the same type. However, this approach tends to produce superposed layers of information: it treats the whole entity as the sum of its components. By analogy with the analysis of complex systems, it appears that knowledge discovery in complex data cannot simply consist of concatenating the partial information obtained from each part of the object. The aim would rather be to discover more "global" knowledge that gives meaning to the components and associates them with the semantic unit. This fundamental information cannot be extracted by the approaches currently considered or by the available tools.
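As an illustration of the classical aggregation approach described above, the following minimal Python sketch computes one partial dissimilarity per component type and combines them in a weighted sum. Everything in it is an assumption made for illustration only: the PatientRecord structure, the choice of Euclidean and Jaccard distances, and the equal default weights are hypothetical and do not come from any particular method or tool.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Set


@dataclass
class PatientRecord:
    """Hypothetical complex unit: one patient with heterogeneous components."""
    lab_values: List[float]  # tabular component (e.g. biological analyses)
    report_terms: Set[str]   # textual component (terms from clinical reports)


def tabular_dissimilarity(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two numeric vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def text_dissimilarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard distance between two term sets (0 = identical, 1 = disjoint)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def aggregated_dissimilarity(p: PatientRecord, q: PatientRecord,
                             w_tabular: float = 0.5,
                             w_text: float = 0.5) -> float:
    """Weighted sum of the partial dissimilarities (weights are assumed)."""
    return (w_tabular * tabular_dissimilarity(p.lab_values, q.lab_values)
            + w_text * text_dissimilarity(p.report_terms, q.report_terms))


# Two made-up records, used only to exercise the functions.
p = PatientRecord([1.0, 2.0], {"fever", "cough"})
q = PatientRecord([4.0, 6.0], {"fever", "fatigue"})
print(aggregated_dissimilarity(p, q))
```

In this sketch each component contributes independently to the final score, so any relationship between the laboratory values and the report terms is lost. This is the "sum of its components" limitation discussed above.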
New data mining strategies must take into account the specificities of complex objects (the units with which the complex data are associated). These specificities are summarized hereafter:
The difficulty of Knowledge Discovery in complex data lies in all these specificities.
The reasons why the workshop is of interest this time.
The aim of this workshop will be to address issues related
to mining complex data. Since the whole knowledge discovery process
is involved, our goal will be to attract papers dealing with each step of
this process. Indeed, managing complex data within the KDD process requires
working on every step, from pre-processing (e.g. structuring and organizing)
to the visualization and interpretation (e.g. sorting or filtering) of the results,
via the data mining methods themselves (e.g. classification, clustering, frequent