Carlos Ordonez, Professor at the University of Houston.
Title: Integrating Database Systems and Data Mining Algorithms
Data mining remains an important research area in database systems and a major challenge in computer science. We present a review of processing alternatives, storage mechanisms, algorithms, data structures and optimizations that enable data mining on large data sets. We focus on the computation of well-known multidimensional statistical and machine learning models. We pay particular attention to SQL (together with UDFs) and MapReduce as two competing technologies for large scale "in-database" analytics, especially with parallel computing. We conclude with a summary of solved major problems and open research issues.
Carlos Ordonez received a degree in applied mathematics and an M.S. degree in computer science from UNAM University, Mexico, in 1992 and 1996, respectively. He received a Ph.D. degree in Computer Science from the Georgia Institute of Technology in 2000. Dr. Ordonez worked for six years extending the Teradata DBMS with data mining algorithms. He has collaborated on more than 20 data mining projects with companies that have large data warehouses. He is currently an Assistant Professor at the University of Houston. His research is centered on the integration of statistical and data mining techniques into database systems and their application to scientific problems.
Chedy Raïssi, researcher at INRIA Lorraine.
Title: Multidimensional skylines
Recently, skyline analysis has attracted a lot of interest due to its importance in multi-criteria decision making applications. In a multidimensional space where a preference is defined for each dimension, a point A dominates another point B if A is better (i.e., more preferred) than B on at least one dimension, and A is not worse than B on any dimension. Given a set of points, the skyline set contains the points that are not dominated by any other point. For example, a skyline analysis may answer the query of a customer who wishes to buy a flight ticket from Singapore to Paris with a preference for low prices, short travel time and few transits.
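The dominance test and the skyline set defined above can be sketched in a few lines of Python. This is a naive quadratic illustration, not the method presented in the talk; the flight data and the names `dominates` and `skyline` are hypothetical, and lower values are assumed to be preferred on every dimension.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b (assumption: lower is better everywhere).

    a must be no worse than b on every dimension and strictly better
    on at least one dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical flights: (price, travel time in hours, number of transits).
flights = [
    (900, 14, 1),
    (1200, 13, 0),
    (950, 16, 2),   # dominated by (900, 14, 1)
    (700, 20, 2),
    (1300, 15, 1),  # dominated by (900, 14, 1)
]

print(skyline(flights))
```

Each remaining flight is a different trade-off: none of them is beaten on price, travel time and transits at once.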
Traditional skyline computation has always been restricted to the data embedding space of a fixed dimensionality. Recently, the subspace skyline problem has attracted a fast growing amount of interest. Given a set of points in an n-dimensional space, users may be interested in different skyline queries in different subspaces of different dimensions.
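To make the subspace setting concrete, the sketch below computes one skyline per non-empty subset of dimensions by brute force. It only illustrates why the number of subspaces grows as 2^n - 1 and is not the algorithm of the talk; the sample points and the names `subspace_skyline` and `all_subspace_skylines` are hypothetical, and lower values are assumed to be preferred.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Subspace = Tuple[int, ...]


def dominates_on(a: Point, b: Point, dims: Subspace) -> bool:
    """Dominance restricted to the dimensions in dims (lower is better)."""
    no_worse = all(a[d] <= b[d] for d in dims)
    strictly_better = any(a[d] < b[d] for d in dims)
    return no_worse and strictly_better


def subspace_skyline(points: List[Point], dims: Subspace) -> List[Point]:
    """Points not dominated by any other point when only dims are compared."""
    return [p for p in points
            if not any(dominates_on(q, p, dims) for q in points)]


def all_subspace_skylines(points: List[Point]) -> Dict[Subspace, List[Point]]:
    """Brute force: one skyline for each of the 2^n - 1 non-empty subspaces."""
    n = len(points[0])
    return {
        dims: subspace_skyline(points, dims)
        for k in range(1, n + 1)
        for dims in combinations(range(n), k)
    }


# Hypothetical 2-dimensional data set.
points = [(1, 4), (2, 2), (4, 1), (3, 3)]
for dims, sky in all_subspace_skylines(points).items():
    print(dims, sky)
```

Even on this tiny example, the skyline changes with the subspace: a point such as (2, 2) belongs to the full-space skyline but to neither one-dimensional skyline.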
In this talk we will present the problem of efficient skycube computation. We introduce a novel approach that significantly reduces both the number of domination tests for a given subspace and the number of subspaces searched. Technically, we identify two types of skyline points that can be directly derived without using any domination tests. Moreover, based on formal concept analysis, we introduce two closure operators that enable a concise representation of skyline cubes. We show that this concise representation is easy to compute and develop an efficient algorithm which only needs to search a small portion of the huge search space.
Franck Duluc, Airbus Toulouse.
Franck Duluc is responsible for Research and Technology activities in the "Engineering & Maintenance" department of Airbus "Customer Services". He holds an engineering degree in Computer and Industrial Engineering from INSA Toulouse (1995), a DEA (INSA Toulouse - Université Paul Sabatier de Toulouse) in Automatic Control and Industrial Computing (1995), and a doctorate in Computer Science from Université Paul Sabatier de Toulouse (2000, databases). He has worked at Airbus (Aerospatiale) since 1996 and has spent these fifteen years in various activities related to the "Engineering and Maintenance" field.