Supervised learning

Concepts

These components implement supervised learning algorithms.

Be careful! These components must be embedded in a "meta-supervised" component.

Attribute status

The target attribute must be discrete; the input attributes can be continuous or discrete.

Supervised learning components

Each entry below gives the component name, its description and references, its parameters, and notes on its use.

Binary logistic regression
Maximum likelihood estimation, with the Levenberg-Marquardt optimization algorithm.

Based on J. Debord's library (http://ourworld.compuserve.com/homepages/JDebord/regnlin.htm).
Other references:
- L. Lebart, A. Morineau, M. Piron, "Statistique exploratoire multidimensionnelle", Ed. Dunod, pp. 290-294, 2000.
- R. Giraud, "L'économétrie", Collection QSJ - Presses Universitaires de France, pp. 67-75, 2000.

Notes:
- The target attribute must have exactly two values.
- Input attributes must be continuous.
- The constant (intercept) is always included.
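
For illustration only, here is a minimal sketch of maximum-likelihood fitting for a two-class logistic model. It uses plain gradient ascent rather than the Levenberg-Marquardt routine of the actual component, and all function names are hypothetical.

    import numpy as np

    def fit_logistic(X, y, lr=0.1, n_iter=5000):
        """X: (n, p) continuous inputs; y: (n,) labels coded 0/1."""
        X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # the constant is always included
        w = np.zeros(X1.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X1 @ w))            # predicted P(y = 1 | x)
            w += lr * X1.T @ (y - p) / len(y)            # log-likelihood gradient step
        return w

    def predict_logistic(w, X):
        X1 = np.hstack([np.ones((X.shape[0], 1)), X])
        return (1.0 / (1.0 + np.exp(-X1 @ w)) >= 0.5).astype(int)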

k-Nearest Neighbor (k-NN)
k-nearest neighbor classifier using the Heterogeneous Value Difference Metric (HVDM), so discrete descriptors can also be used.

- k-NN: D. Aha, D. Kibler, M. Albert, "Instance-based learning algorithms", Machine Learning, vol. 6, pp. 37-66, 1991.
- HVDM: D.R. Wilson, T.R. Martinez, "Improved heterogeneous distance functions", JAIR, vol. 6, pp. 1-34, 1997.

Parameters:
- Number of neighbors
Notes:
- Inputs can be discrete and/or continuous.
- No attribute standardization is necessary.
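
A rough sketch of the HVDM distance and of the majority vote over the k nearest neighbors is given below. The column coding (discrete values stored as integers in a numeric array) and all names are assumptions, not the component's actual interface.

    import numpy as np

    def hvdm_factory(X, y, is_discrete):
        """X: (n, p) numeric array (discrete columns coded as integers);
        y: (n,) class labels; is_discrete: one boolean per column."""
        n, p = X.shape
        classes = np.unique(y)
        sd = X.std(axis=0)
        cond = {}                                # P(class | value) tables for discrete columns
        for a in range(p):
            if is_discrete[a]:
                cond[a] = {v: np.array([np.mean(y[X[:, a] == v] == c) for c in classes])
                           for v in np.unique(X[:, a])}
        zero = np.zeros(len(classes))

        def dist(u, v):
            total = 0.0
            for a in range(p):
                if is_discrete[a]:               # value difference on the class profiles
                    d = np.sqrt(np.sum((cond[a].get(u[a], zero) - cond[a].get(v[a], zero)) ** 2))
                else:                            # difference scaled by four standard deviations
                    d = abs(u[a] - v[a]) / (4.0 * sd[a]) if sd[a] > 0 else 0.0
                total += d * d
            return np.sqrt(total)

        return dist

    def knn_predict(X, y, dist, query, k=3):
        nearest = np.argsort([dist(query, row) for row in X])[:k]
        values, counts = np.unique(y[nearest], return_counts=True)
        return values[np.argmax(counts)]         # majority vote among the k neighbors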

Multi-layer perceptron
Multi-layer perceptron, back-propagation algorithm.

- T. Mitchell, "Machine Learning", McGraw-Hill International Editions, pp. 86-126, 1997.
- K. Mehrotra, C. Mohan, S. Ranka, "Elements of Artificial Neural Networks", MIT Press, pp. 66-87, 1997.
- T. Hastie, R. Tibshirani, J. Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Springer, pp. 350-369, 2001.

Parameters:
Neural network architecture
- Use a hidden layer
- Number of neurons in the hidden layer
Learning parameters
- Learning rate
- Size of the pruning/validation set
- Attribute standardization
Stopping rule
- Maximum number of iterations
- Threshold on the error rate
- Error stagnation
- Gap for the error stagnation evaluation
Notes:
- Inputs must be continuous.
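
The following toy sketch shows a one-hidden-layer perceptron trained by back-propagation, with two of the stopping rules listed above (maximum number of iterations and error stagnation). Layer size, learning rate and names are illustrative defaults, not the component's.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_mlp(X, y, n_hidden=5, lr=0.1, max_iter=2000, stagnation_gap=50, tol=1e-4):
        """X: (n, p) standardized continuous inputs; y: (n,) labels coded 0/1."""
        rng = np.random.default_rng(0)
        W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
        W2 = rng.normal(scale=0.5, size=n_hidden)
        best_err, since_best = np.inf, 0
        for it in range(max_iter):                  # max-iteration stopping rule
            H = sigmoid(X @ W1)                     # hidden layer activations
            out = sigmoid(H @ W2)                   # output activation
            err = np.mean((y - out) ** 2)
            if err < best_err - tol:                # error-stagnation stopping rule
                best_err, since_best = err, 0
            else:
                since_best += 1
                if since_best >= stagnation_gap:
                    break
            delta_out = (out - y) * out * (1 - out)          # back-propagated gradients
            delta_hid = np.outer(delta_out, W2) * H * (1 - H)
            W2 -= lr * H.T @ delta_out / len(y)
            W1 -= lr * X.T @ delta_hid / len(y)
        return W1, W2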

Prototype-NN
Kernels (prototypes) are built off-line, for instance with a clustering algorithm. Each kernel is assigned a class membership; a new instance then receives the class of its nearest prototype.

It is an interpretation and a generalization of the approach suggested in Hastie et al. (pp. 411-433).

Parameters:
- Attribute used to define the kernels
- Standardization of attributes
Notes:
- Inputs must be continuous.
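
A hypothetical sketch of this scheme: prototypes are placed off-line by a crude k-means pass, each prototype receives the majority class of its cluster, and a new case takes the class of its nearest prototype. The use of k-means and every name are assumptions.

    import numpy as np

    def build_prototypes(X, y, n_proto=4, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), n_proto, replace=False)]
        for _ in range(n_iter):                                   # plain k-means pass
            assign = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
            for j in range(n_proto):
                if np.any(assign == j):
                    centers[j] = X[assign == j].mean(axis=0)
        labels = []
        for j in range(n_proto):                                  # majority class per prototype
            members = y[assign == j]
            vals, counts = np.unique(members if len(members) else y, return_counts=True)
            labels.append(vals[np.argmax(counts)])
        return centers, np.array(labels)

    def predict_prototype(centers, labels, query):
        return labels[np.argmin(((centers - query) ** 2).sum(-1))]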

ID3
Quinlan's ID3 algorithm, with some minor modifications.

- ID3: J.R. Quinlan, "Discovering rules by induction from large collections of examples", in D. Michie (ed.), Expert Systems in the Microelectronic Age, pp. 168-201, 1979.
- ID3-IV: J.R. Quinlan, "Induction of Decision Trees", Machine Learning, vol. 1, pp. 81-106, 1986.
- Survey: D. Zighed, R. Rakotomalala, "Graphes d'induction - Apprentissage et Data Mining", Ed. Hermes, 2000.

Parameters:
- Minimum size of a node to split
- Minimum size of a leaf
- Maximum depth of the tree
- Maximum entropy gain for a split
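
The sketch below illustrates ID3-style splitting on discrete attributes with the four stopping parameters listed above. The entropy-gain parameter is treated here as the smallest gain that still justifies a split, which is only one possible reading; the defaults and names are arbitrary.

    import numpy as np
    from collections import Counter

    def entropy(y):
        counts = np.array(list(Counter(y).values()), dtype=float)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def grow_tree(X, y, depth=0, min_split=10, min_leaf=5, max_depth=4, min_gain=1e-3):
        majority = Counter(y).most_common(1)[0][0]
        if len(y) < min_split or depth >= max_depth or entropy(y) == 0:
            return majority                                        # leaf
        best_gain, best_attr = 0.0, None
        for a in range(X.shape[1]):                                # attribute with the
            gain = entropy(y)                                      # largest entropy gain
            for v in np.unique(X[:, a]):
                mask = X[:, a] == v
                gain -= mask.mean() * entropy(y[mask])
            if gain > best_gain:
                best_gain, best_attr = gain, a
        if best_attr is None or best_gain < min_gain:
            return majority
        children = {}
        for v in np.unique(X[:, best_attr]):
            mask = X[:, best_attr] == v
            if mask.sum() < min_leaf:
                children[v] = majority                             # branch too small: leaf
            else:
                children[v] = grow_tree(X[mask], y[mask], depth + 1,
                                        min_split, min_leaf, max_depth, min_gain)
        return (best_attr, majority, children)

    def predict_tree(node, x):
        while isinstance(node, tuple):
            attr, majority, children = node
            node = children.get(x[attr], majority)
        return node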

Linear Discriminant Analysis
Linear discriminant analysis.

- R.A. Fisher, "The use of multiple measurements in taxonomic problems", Annals of Eugenics, vol. 7, pp. 179-188, 1936.
- K. Fukunaga, "Statistical Pattern Recognition", Academic Press, 1972.
- T. Hastie, R. Tibshirani, J. Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Springer, pp. 79-111, 2001.

Notes:
- Inputs must be continuous.
- Be careful about collinearity.
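
As a worked illustration, the classical discriminant score with a pooled within-class covariance matrix can be sketched as follows; the matrix inversion is exactly where collinear inputs cause trouble. Names and structure are assumptions.

    import numpy as np

    def fit_lda(X, y):
        classes = np.unique(y)
        n, p = X.shape
        means, priors = {}, {}
        pooled = np.zeros((p, p))
        for c in classes:
            Xc = X[y == c]
            means[c] = Xc.mean(axis=0)
            priors[c] = len(Xc) / n
            pooled += (Xc - means[c]).T @ (Xc - means[c])
        pooled /= (n - len(classes))                  # pooled within-class covariance
        inv = np.linalg.inv(pooled)                   # fails if the inputs are collinear
        return classes, means, priors, inv

    def predict_lda(model, x):
        classes, means, priors, inv = model
        scores = [x @ inv @ means[c] - 0.5 * means[c] @ inv @ means[c] + np.log(priors[c])
                  for c in classes]
        return classes[int(np.argmax(scores))]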

Naive Bayes
Naive Bayes algorithm.

- P. Domingos, M. Pazzani, "On the optimality of the simple Bayesian classifier under zero-one loss", Machine Learning, vol. 29, pp. 103-130, 1997.

Notes:
- Inputs must be discrete.
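
A small sketch of naive Bayes on discrete inputs follows. Laplace smoothing of the frequency counts is an assumption made here for robustness; the reference implementation may handle zero counts differently.

    import numpy as np

    def fit_naive_bayes(X, y):
        """X: (n, p) array of discrete values; y: (n,) class labels."""
        classes = np.unique(y)
        priors = {c: np.mean(y == c) for c in classes}
        cond = {}                                        # P(value | class) per attribute
        for a in range(X.shape[1]):
            values = np.unique(X[:, a])
            for c in classes:
                Xc = X[y == c, a]
                for v in values:
                    cond[(a, v, c)] = (np.sum(Xc == v) + 1) / (len(Xc) + len(values))
        return classes, priors, cond

    def predict_naive_bayes(model, x):
        classes, priors, cond = model
        best, best_score = None, -np.inf
        for c in classes:
            score = np.log(priors[c])
            for a, v in enumerate(x):
                score += np.log(cond.get((a, v, c), 1e-9))   # unseen value: tiny probability
            if score > best_score:
                best, best_score = c, score
        return best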

Radial basis function
Radial basis function network, off-line processing.

- K. Mehrotra, C. Mohan, S. Ranka, "Elements of Artificial Neural Networks", MIT Press, pp. 141-156, 1997.
- F. Blayo, M. Verleysen, "Les réseaux de neurones artificiels", QSJ, Presses Universitaires de France, pp. 67-73, 1996.

Parameters:
- Attribute that defines the kernels
- Other parameters: see the multi-layer perceptron.
Notes:
- Input attributes must be continuous.

I have some doubts about the actual implementation!
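
For what it is worth, a loose sketch of an off-line radial basis function network is given below: Gaussian kernels centred on prototypes (obtained here by a crude k-means pass) feed a linear output layer fitted by least squares. The kernel placement, the common width rule and every name are assumptions.

    import numpy as np

    def fit_rbf(X, y, n_kernels=5, n_iter=20, seed=0):
        """X: (n, p) continuous inputs; y: (n,) targets coded 0/1."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), n_kernels, replace=False)]
        for _ in range(n_iter):                                    # place the kernels off-line
            assign = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
            for j in range(n_kernels):
                if np.any(assign == j):
                    centers[j] = X[assign == j].mean(axis=0)
        width = np.mean(((X[:, None, :] - centers) ** 2).sum(-1))  # common kernel width
        Phi = np.exp(-((X[:, None, :] - centers) ** 2).sum(-1) / width)
        Phi = np.hstack([np.ones((len(X), 1)), Phi])               # bias column
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)                # linear output layer
        return centers, width, w

    def predict_rbf(model, X):
        centers, width, w = model
        Phi = np.exp(-((X[:, None, :] - centers) ** 2).sum(-1) / width)
        Phi = np.hstack([np.ones((len(X), 1)), Phi])
        return (Phi @ w >= 0.5).astype(int)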


Last modification: January 21st, 2004.