Spv learning assessment

Contents

These components evaluate the performance (error rate) of supervised learning algorithms.

A component of this kind cannot be used on its own: it must be inserted after a meta-supervised learning component.

The following books and papers describe and compare these methods:
D. Zighed, R. Rakotomalala, "Graphes d'induction : Apprentissage et Data Mining", Hermès, pp.237-262, 2000.
R. Kohavi, "Wrappers for performance enhancement and oblivious decision graphs", PhD Thesis, Stanford University, 1995.
T. Dietterich, "Statistical tests for comparing supervised learning algorithms", Technical Report, Oregon State University, 1996.

Most of these components divide the dataset into a training set and a test set. In TANAGRA, this subdivision is performed at the first component, the data source component, so it is really the whole path (the sequence of operations) that is evaluated, and not only the last supervised learning component.
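To illustrate what evaluating the whole path means, here is a minimal Python sketch. It is not TANAGRA code: the function names (`fit_standardizer`, `fit_nearest_centroid`, `fit_path`, `path_error_rate`), the choice of a standardization step followed by a nearest-centroid learner, and the toy data are all assumptions made for the example. The point it shows is that the split happens first, every step of the path is fitted on the training part only, and the error rate is computed on the test part.

```python
import random

def fit_standardizer(rows):
    """Learn per-column mean and standard deviation from the given rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in c) / n) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, stds)]

def fit_nearest_centroid(rows, labels):
    """Learn one centroid per class; predict the class of the closest centroid."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    centroids = {lab: [sum(c) / len(g) for c in zip(*g)]
                 for lab, g in groups.items()}

    def predict(row):
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(row, centroids[lab])))
    return predict

def fit_path(rows, labels):
    """Hypothetical path: a preprocessing step followed by a learner."""
    transform = fit_standardizer(rows)
    predict = fit_nearest_centroid([transform(r) for r in rows], labels)
    return lambda row: predict(transform(row))

def path_error_rate(rows, labels, train_proportion=0.7, seed=0):
    """Split once, fit the whole path on the training part, score on the rest."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_proportion * len(idx)))
    train, test = idx[:cut], idx[cut:]
    model = fit_path([rows[i] for i in train], [labels[i] for i in train])
    errors = sum(model(rows[i]) != labels[i] for i in test)
    return errors / len(test)

# Toy data (assumed): the first column separates the classes, the second is noise.
rng = random.Random(1)
rows = ([[rng.gauss(0, 1), rng.gauss(100, 10)] for _ in range(50)]
        + [[rng.gauss(10, 1), rng.gauss(100, 10)] for _ in range(50)])
labels = ["a"] * 50 + ["b"] * 50
print(path_error_rate(rows, labels))
```

Because the standardizer is fitted inside `fit_path`, the measured error rate reflects the preprocessing and the learner together, which mirrors the idea that the sequence of operations is what gets assessed.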

Attributes status

None.

Spv learning assessment components

Train-Test
Description: Divide the dataset into a learning set and a test set: the model is built on the first sample, and the error rate is computed on the second one.
Parameters:
- Proportion of the dataset used for the training phase.
- Number of trials.

Cross-validation
Description: Cross-validation.
Parameters:
- Number of folds.
- Number of trials.
Note: Several studies show that 5x2-fold cv is often effective.
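The two assessment schemes and their parameters can be sketched in Python as follows. This is an illustration under stated assumptions, not TANAGRA's implementation: the function names (`train_test`, `cross_validation`, `fit_nearest_mean`), the way results are averaged over trials, the fold assignment, and the toy data are all choices made for the example.

```python
import random

def fit_nearest_mean(examples, labels):
    """Toy 1-D learner (assumed): predict the class whose mean is closest."""
    sums, counts = {}, {}
    for value, lab in zip(examples, labels):
        sums[lab] = sums.get(lab, 0.0) + value
        counts[lab] = counts.get(lab, 0) + 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    return lambda value: min(means, key=lambda lab: abs(value - means[lab]))

def train_test(fit, examples, labels, train_proportion=0.7, n_trials=1, seed=0):
    """Average test error rate over n_trials random train/test splits."""
    rng = random.Random(seed)
    n = len(examples)
    cut = int(round(train_proportion * n))
    rates = []
    for _ in range(n_trials):
        idx = list(range(n))
        rng.shuffle(idx)
        train_idx, test_idx = idx[:cut], idx[cut:]
        model = fit([examples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        errors = sum(model(examples[i]) != labels[i] for i in test_idx)
        rates.append(errors / len(test_idx))
    return sum(rates) / len(rates)

def cross_validation(fit, examples, labels, n_folds=10, n_trials=1, seed=0):
    """Average error rate over n_trials repetitions of n_folds-fold cross-validation."""
    rng = random.Random(seed)
    n = len(examples)
    rates = []
    for _ in range(n_trials):
        idx = list(range(n))
        rng.shuffle(idx)
        errors = 0
        for k in range(n_folds):
            test_idx = idx[k::n_folds]      # fold k: every n_folds-th shuffled example
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            model = fit([examples[i] for i in train_idx],
                        [labels[i] for i in train_idx])
            errors += sum(model(examples[i]) != labels[i] for i in test_idx)
        rates.append(errors / n)            # each example is tested once per trial
    return sum(rates) / len(rates)

# Toy data (assumed): two well-separated groups of values.
examples = list(range(20)) + list(range(100, 120))
labels = ["low"] * 20 + ["high"] * 20

print(train_test(fit_nearest_mean, examples, labels, train_proportion=0.7, n_trials=3))
# 5x2-fold cv: 2 folds, repeated over 5 trials.
print(cross_validation(fit_nearest_mean, examples, labels, n_folds=2, n_trials=5))
```

In this sketch the "number of trials" parameter simply repeats the procedure with a fresh shuffle and averages the error rates, which reduces the dependence of the estimate on one particular split.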

Last modification: January 21st, 2004.