Discretization of continuous attributes: a survey of methods
S. RABASEDA-LOUDCHER, M. SEBBAN, R. RAKOTOMALALA
E.R.I.C-Lyon, Université Lumière Lyon 2
5, avenue P. Mendès-France, 69676 Bron Cedex
Tel.: 33-4-78-77-23-20, Fax: 33-4-78-77-23-75
Abstract:
Induction processes generally solve a classification problem by building a decision tree. Early versions of these processes handled only symbolic or ordered discrete attributes, even though in many problems the explanatory attributes can be either ordered or continuous. Learning systems should therefore ideally be able to handle continuous attributes. There are many potential ways of doing this; one is to convert continuous attributes into ordered discrete attributes automatically (and economically). This operation is called discretization. It divides the values of a numeric attribute into a small number of intervals and maps each interval onto a discrete symbol. Many discretization algorithms are available in the machine learning literature. This paper covers the main discretization methods, showing their advantages and drawbacks, and suggests a way to extend this set of methods.
Key words: decision trees, discretization, continuous attributes, Mood's runs test.
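
As a concrete illustration of the operation defined in the abstract (dividing the values of a numeric attribute into a small number of intervals and mapping each interval onto a discrete symbol), the following minimal Python sketch may help. It assumes equal-width intervals, which is only one simple way of choosing cut points and is not presented here as a method favoured by the paper. The function names, the sample `ages` values and the symbols "low", "mid" and "high" are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_intervals):
    """Return n_intervals - 1 cut points that split [min, max] into
    intervals of equal width (an assumed, deliberately simple strategy)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + k * width for k in range(1, n_intervals)]


def discretize(values, cut_points, symbols):
    """Map each numeric value to the symbol of the interval it falls into."""
    # bisect_right counts the cut points <= v, which is the interval index;
    # a value equal to a cut point is assigned to the upper interval.
    return [symbols[bisect_right(cut_points, v)] for v in values]


# Hypothetical continuous attribute and symbol set.
ages = [3, 17, 25, 38, 41, 59, 63]
cuts = equal_width_cut_points(ages, 3)
labels = discretize(ages, cuts, ["low", "mid", "high"])
```

With these sample values the cut points fall at 23 and 43, so the seven ages are mapped to two "low", three "mid" and two "high" symbols. Other ways of choosing cut points would keep `discretize` unchanged and replace only the function that produces them.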