Chapter 30. Classification Trees

Table of Contents

Introduction

Method description

The structure of Classification Trees
Tree building algorithm
Tree pruning
Null values

Usage

Data requirements
Model building and testing
Model application
Model statistics

Example

References

Introduction

Tree structures are a natural and convenient way of codifying knowledge. A classification tree is a special type of such a structure: it is a classifier, which means that for a given case the tree assigns a decision. Many algorithms can build a classification tree from a set of training data. One of the best known is the ID3 algorithm developed by J. R. Quinlan.
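To make the idea of a tree as a classifier concrete, the following is a minimal Python sketch, not the AdvancedMiner implementation. The class names (Leaf, Split), the attributes (age, income), the thresholds, and the decisions are all hypothetical and chosen only for illustration. It assumes binary splits on numerical attributes, where a case follows the "below" branch when its value is less than or equal to the threshold.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    decision: str  # the class assigned to every case that reaches this leaf


@dataclass
class Split:
    attribute: str      # name of the attribute tested at this node
    threshold: float    # numerical split point (hypothetical values below)
    below: "Node"       # branch taken when value <= threshold
    above: "Node"       # branch taken when value > threshold


Node = Union[Leaf, Split]


def classify(node: Node, case: dict) -> str:
    """Walk from the root to a leaf and return that leaf's decision."""
    while isinstance(node, Split):
        if case[node.attribute] <= node.threshold:
            node = node.below
        else:
            node = node.above
    return node.decision


# A hand-written example tree; a real tree would be built from training data.
tree = Split(
    "age", 30.0,
    Leaf("reject"),
    Split("income", 50000.0, Leaf("reject"), Leaf("accept")),
)

print(classify(tree, {"age": 40, "income": 60000}))
```

Classifying a case only requires one comparison per level of the tree, which is one reason model application is fast.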

Classification trees have two major functions: predictive and descriptive. Predictive means that trees can be used as classifiers to predict the class of previously unseen cases. Descriptive means that trees can be used to extract knowledge from the data: by inspecting the tree structure, the analyst can obtain a clearer view of the data.

Trees have further applications. One is to use them for clustering. A tree is first built for a set of training data; its leaves are then treated as clusters, and each case is assigned to the leaf it reaches. This also makes it possible to construct hybrid models, for example a classification tree combined with logistic regression.
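The use of leaves as clusters can be sketched as follows. This is an illustrative Python example under stated assumptions, not the module's API: the tree is a hand-written nested tuple of the form (attribute, threshold, left, right), leaves are plain strings naming the leaf, and the attribute names and thresholds are hypothetical.

```python
from collections import defaultdict

# Hypothetical tree: internal nodes are (attribute, threshold, left, right),
# leaves are strings that serve as cluster identifiers.
tree = ("age", 30.0, "leaf_A", ("income", 50000.0, "leaf_B", "leaf_C"))


def leaf_of(node, case):
    """Return the identifier of the leaf that the case reaches."""
    while not isinstance(node, str):
        attribute, threshold, left, right = node
        node = left if case[attribute] <= threshold else right
    return node


def cluster_by_leaf(tree, cases):
    """Group cases by the leaf they fall into; each leaf is one cluster."""
    clusters = defaultdict(list)
    for case in cases:
        clusters[leaf_of(tree, case)].append(case)
    return dict(clusters)


cases = [
    {"age": 25, "income": 90000},
    {"age": 40, "income": 20000},
    {"age": 55, "income": 70000},
    {"age": 22, "income": 15000},
]

clusters = cluster_by_leaf(tree, cases)
for leaf, members in sorted(clusters.items()):
    print(leaf, len(members))
```

In a hybrid model of the kind mentioned above, a separate model such as a logistic regression could then be fitted to the cases in each cluster.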

Another common use of trees is the analysis of attribute importance. Classification trees can help the analyst select the most important attributes for methods that benefit from attribute selection, such as regression. For example, the attributes used in the tree model can be considered important. The level at which an attribute occurs in the built tree is another heuristic measure of its importance: the higher an attribute appears in the tree structure, the more important it is considered.
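The level-based heuristic can be illustrated with a short Python sketch. It is an assumption-laden example rather than the way AdvancedMiner reports importance: the tree is a hypothetical nested tuple of the form (attribute, threshold, left, right) with string leaves, and each attribute is scored by the shallowest level at which it appears (the root is level 0).

```python
# Hypothetical tree: internal nodes are (attribute, threshold, left, right),
# leaves are strings holding the decision.
tree = (
    "age", 30.0,
    "reject",
    ("income", 50000.0, ("age", 60.0, "reject", "accept"), "accept"),
)


def attribute_levels(node, depth=0, levels=None):
    """Map each attribute used in the tree to the shallowest level it occurs at."""
    if levels is None:
        levels = {}
    if isinstance(node, str):  # a leaf tests no attribute
        return levels
    attribute, _threshold, left, right = node
    levels[attribute] = min(levels.get(attribute, depth), depth)
    attribute_levels(left, depth + 1, levels)
    attribute_levels(right, depth + 1, levels)
    return levels


levels = attribute_levels(tree)

# Attributes closer to the root come first; unused attributes do not appear.
ranking = sorted(levels, key=levels.get)
print(ranking)
```

Attributes that never appear in the dictionary were not used by the tree at all, which corresponds to the first heuristic described above.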

Compared to other classification methods, the tree algorithm is relatively fast. Both model building and model application are very efficient.

Another advantage of the method is that it places no specific requirements on the input data. It naturally supports both nominal and numerical attributes, and neither prior binarization nor standardization is required. Furthermore, the target variable does not have to be binary, and missing values are fully supported. The algorithm copes well with a very large number of input attributes, so no prior attribute selection is required.

Classification tree algorithms are relatively simple in concept, and understanding them does not require a broad background in the field.

The information presented in this chapter is sufficient for proper and successful use of the Classification Tree module in AdvancedMiner.

The Classification Tree module provides the capability of building and applying Classification Trees. The building process consists of loading data, tree building, and tree pruning. Pruning is performed after the tree is built, and its goal is to reduce overfitting. The built decision tree model can be viewed interactively and applied to new data. The created tree model can also be edited manually by pruning selected nodes, or even by changing split thresholds and values.
