Tutorials - Exploratory Data Analysis

Subject Components Tutorial Dataset
Correspondence Analysis
Perform a Correspondence Analysis with TANAGRA. This tutorial is based on the book by Lebart, Morineau and Piron ("Statistique Exploratoire Multidimensionnelle", Dunod, 2000, pages 104 to 107).
Dataset
Define Status
CA
media
Clustering -- HAC
HAC -- Hybrid clustering on the IRIS dataset.
Dataset
Define Status
HAC
Group characterization
iris
Clustering -- K-means
Build clusters and validate them by comparison with pre-existing classes.
Multiple correspondence analysis
K-Means
Group characterization
Cross-tabulation
vote
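As a rough illustration of validating clusters against pre-existing classes, the sketch below cross-tabulates cluster labels with class labels and computes a purity score. It is not TANAGRA's implementation: the function names `cross_tabulate` and `purity` are hypothetical, and the labels are invented for the example rather than taken from the vote dataset.

```python
from collections import Counter


def cross_tabulate(clusters, classes):
    """Count how often each (cluster, class) pair occurs."""
    counts = Counter(zip(clusters, classes))
    rows = sorted(set(clusters))
    cols = sorted(set(classes))
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


def purity(table):
    """Share of examples that fall in the majority class of their cluster."""
    total = sum(sum(row.values()) for row in table.values())
    majority = sum(max(row.values()) for row in table.values())
    return majority / total


# Hypothetical labels, invented for illustration only.
clusters = [1, 1, 1, 2, 2, 2, 2, 1]
classes = ["democrat", "democrat", "republican", "republican",
           "republican", "republican", "democrat", "democrat"]

table = cross_tabulate(clusters, classes)
for cluster, row in table.items():
    print(cluster, row)
print("purity:", purity(table))
```

A purity close to 1 suggests that the clusters largely recover the known classes; a value near the share of the largest class suggests they do not.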
Variable Clustering (VARCLUS)
Using the variable clustering components and reading their results. Three approaches are available: VARKMEANS, VARHCA and VARCLUS. The presentation of the results is modeled on that of standard software; see http://www2.stat.unibo.it/ManualiSas/stat/chap68.pdf
Dataset
Define Status
VARKMEANS
VARHCA
VARCLUS
crime dataset
Feature Extraction -- SVD
NIPALS, a fast SVD or PCA algorithm, useful for high-dimensional datasets. Application to a protein classification problem.
NIPALS
Spv Learning
K-NN
Bootstrap
dataset
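To give an idea of how NIPALS works, here is a minimal sketch that extracts only the first principal component by alternating between loadings and scores. It is an illustration in plain Python, not the TANAGRA component: the function name `nipals_first_component` and the toy data are assumptions, and the sketch assumes that at least one column is not constant.

```python
import math


def nipals_first_component(rows, tol=1e-12, max_iter=500):
    """Return (scores, loadings) of the first principal component."""
    n, m = len(rows), len(rows[0])
    # Center each column.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Start from the column with the largest sum of squares.
    start = max(range(m), key=lambda j: sum(row[j] ** 2 for row in x))
    t = [row[start] for row in x]
    p = [0.0] * m
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: regress each column of x on the current scores t.
        p = [sum(x[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project each row of x on the normalized loadings p.
        t_new = [sum(x[i][j] * p[j] for j in range(m)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    return t, p


# Toy data, invented for illustration: the points lie on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
scores, loadings = nipals_first_component(data)
print(loadings)
```

Further components would be obtained by subtracting the outer product of scores and loadings from the centered data and repeating the same loop. Each pass only needs matrix-vector products, which is why the approach suits datasets with many variables.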
Canonical Discriminant Analysis
Canonical Discriminant Analysis: explaining the quality of wine from weather descriptors.
Canonical Discriminant Analysis
Wine Quality
Visual group exploration
Description of subgroups of examples using comparative descriptive statistics.
Group characterization
Group exploration
autos
(Predictive) Clustering Trees
Build clusters within a top-down induction of decision trees framework. The method selects the relevant descriptors in order to describe the groups.
CTP & CT
zoo
Combining HAC and PCA
Shows that combining data mining and visualization methods yields better knowledge extraction.
HAC
PCA
Correlation Scatterplot
cars
Clustering with Gaussian Mixture Models -- The EM Algorithm
We use a clustering algorithm based on a Gaussian mixture model, fitted with the Expectation-Maximization (EM) algorithm.
The right number of clusters can be determined with a resampling technique.
EM-CLUSTERING
EM-SELECTION
two gaussians
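The alternation between the E-step and the M-step can be sketched for the simplest case, a one-dimensional mixture of two Gaussians. This is a hedged illustration, not the EM-CLUSTERING component itself: the function names, the initialization (means at the minimum and maximum, unit variances, equal weights), the variance floor `min_var` and the sample values are all assumptions made for the example. Selecting the number of clusters by resampling is not covered here.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, n_iter=50, min_var=1e-6):
    """Fit a 1-D mixture of two Gaussians; return (weights, means, variances)."""
    # Assumed initialization: extreme points as means, unit variances.
    means = [min(xs), max(xs)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k])
                    for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            # Floor the variance so that a component cannot collapse on a point.
            variances[k] = max(var, min_var)
    return weights, means, variances


# Invented sample with two well-separated groups.
sample = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = em_two_gaussians(sample)
print(weights, means, variances)
```

Each point is softly assigned to both components, which is the main difference from K-means, where assignments are hard.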
Factor rotation for PCA
VARIMAX and QUARTIMAX rotations for PCA factors.
PCA
Factor Rotation

will be translated soon
crime
Choosing the right number of factors for PLS Regression
Determining the right number of factors for PLS Regression using a resampling technique.
PLS-FACTORIAL
PLS-SELECTION
protein


Ricco Rakotomalala.