Wednesday, September 13, 2000

TUTORIALS

Chair: Will Kloesgen, GMD


Tutorial 1
Chairs: H. Kargupta, Washington State University (USA)

email: hillol@eecs.wsu.edu

Title: An Introduction to Distributed Data Mining
Duration: 3 h
Description: Distributed Data Mining (DDM) is the semi-automatic discovery of patterns, associations, changes, anomalies, and statistically significant structures in data that is geographically distributed. The last few years have witnessed the development of different distributed data mining algorithms, frameworks, and systems. The goal of this tutorial is to provide researchers, practitioners, and advanced students with an introduction to some of the key ideas in this field. The basic trade-off is accuracy vs. performance. Moving distributed data to a central location will in general yield more accurate models; on the other hand, the costs can be prohibitive. In many practical applications, results that are less accurate but still very useful can be obtained by leaving some or all of the data in place. In this tutorial we describe approaches for distributed data mining and introduce several distributed data mining algorithms and systems. We will use a variety of examples from health care, scientific data mining, and business applications to illustrate our points.
Website: http://www.eecs.wsu.edu/~hillol/DKD/pkdd2000.html
 

Tutorial 2
Chairs: A. Hinneburg and D.A. Keim, University of Halle (Germany)

email: {hinneburg,keim}@informatik.uni-halle.de

Title: Clustering Techniques for Large Data Sets: From the Past to the Future
Duration: 3 h
Description: Because of fast technological progress, the amount of information stored in databases is rapidly increasing. In addition, new applications require the storage and retrieval of complex multimedia objects, which are often represented by high-dimensional feature vectors. Finding the valuable information hidden in those databases is a difficult task. Besides association rule mining and classification, cluster analysis is one of the basic techniques often applied in analyzing large data sets. The main goal of the tutorial is to provide an overview of the state of the art in cluster discovery methods for large databases, covering well-known clustering methods from related fields such as statistics, pattern recognition, and machine learning, as well as database techniques which allow them to work efficiently on large databases. The target audience consists of researchers and practitioners from statistics, databases, and machine learning who are interested in the state of the art of cluster discovery methods and their applications to large databases. The tutorial especially addresses people from academia who are interested in developing new cluster discovery algorithms, and people from industry who want to apply cluster discovery methods in analyzing large databases.
Website: http://hawaii.informatik.uni-halle.de/~hinnebur/PKDD2000/

Tutorial 3
Chairs: M. Spiliopoulou, Humboldt-Universität zu Berlin (Germany)

email: myra@wiwi.hu-berlin.de

Title: Data Analysis for Web Marketing and Merchandizing Applications 
Duration: 2h
Description:

  • INTENDED AUDIENCE: novice learners of statistical techniques, database researchers
  • GOAL: Juxtapose different methodologies applied by different research communities (mining researchers, economists, psychologists and HCI experts) to solve the same problem, bringing them to a common framework.
  • CONTENT: Discussion of analysis methodologies applied to web-oriented marketing and merchandizing. There are three main approaches to this problem:
    - Exploratory statistics (including mining techniques) applied to real usage data
    - In vivo experiments
    - Questionnaires
    This tutorial will place these approaches in a common framework, based on the notion of "success" and "success controlling" according to qualitative and quantitative criteria. Then, techniques applied by research groups for each of the above approaches will be discussed. This includes: problem specification, discussion of the methodology applied, measures for the validation of the results, and pros and cons of the technique.
Website: http://www.wiwi.hu-berlin.de/~myra/PKDD2000_TUTORIAL/

Tutorial 4
Chairs: W. Lehner, University of Erlangen-Nuremberg (Germany)

email: lehner@informatik.uni-erlangen.de

Title: Database Support for Business Intelligence Applications
Duration: 1h30'
Description: The tutorial introduces participants to existing database technologies that state-of-the-art relational database systems already provide for Business Intelligence applications. It will provide the knowledge needed to take advantage of these technologies and will motivate participants to think about requirements, extensions, and improvements. The overall goal of the tutorial is to close the gap between database researchers and experts in application areas, who are increasingly required to take advantage of an underlying database system.

Website: http://www6.informatik.uni-erlangen.de/~wolfgang/pkdd2000.html

Tutorial 5
Chairs: Yves Kodratoff, Djamel Zighed and Serge Di Palma (France)

email: yk@lri.fr, zighed@univ-lyon2.fr, sdipalma@univ-lyon2.fr

Title: Text Mining
Duration: 4h
Description: Text Mining (TM) deserves a tutorial because it is spreading so fast and is of such industrial importance. We want to present different aspects of Text Mining, which is why we decided to share this tutorial.

Detailed program

  • 8h30-10h30: Yves Kodratoff: "Computational-linguistics-based" TM
    A definition of TM based on Data Mining
    Taxonomies of concepts ("ontologies") and finding relations among concepts
    Shallow parsing and semi-automatic building of concept taxonomies
  • 10h30-10h45: Break
  • 10h45-12h45: Serge Di Palma
    Application of Sipina to TM