This section describes model testing. There are four test tasks corresponding to the four mining tasks: ApproximationTestTask, ClassificationTestTask, SurvivalTestTask and TimeSeriesTestTask.
There is one additional test task, CalculateTestResultTask, which performs classification model testing on the basis of scored data instead of a classification model. This task is described at the end of this chapter.
The idea of testing is the same regardless of the kind of test task. However, even though the main steps are common, there are slight differences in parameters between the test tasks. For this reason this chapter is divided into four sections describing each test task individually. In each section the user will find all the steps required to perform the test task and descriptions of the test results.
As an example we will use the 'cholesterol' dataset, which can be found in '..\Client\scripts\scripts\data\approximation' in the AdvancedMiner directory. To perform the whole task follow these steps:
The particular errors have the following meaning:
No model specified - an input model (a model to be tested) must be specified.
No testData specified - a PhysicalData object containing the test data must be assigned to the test task.
No testDataTargetAttributeName specified - when PhysicalData is provided the target attribute must be set.
No testResults specified - the name under which the test results will be stored must be provided.
All the parameters except testDataTargetAttributeName can be set in a standard way by right-clicking on the test task node and selecting the 'Add' action. The last parameter, testDataTargetAttributeName, should be set in the Properties Window. The Properties Window may also contain some optional parameters.
Table 15.3. Approximation Test Task Options
| Option | Description | Possible values | Default value |
|---|---|---|---|
| Cut Prc | the percentage of extreme observations trimmed from each side when calculating the Error Histogram | integer numbers in [0; 100] | 5 |
| Number of Intervals | number of intervals used to calculate Error Histogram | positive integer numbers | 20 |
| Number of Intervals XY Plot | number of intervals used to calculate y vs yhat histogram | positive integer numbers | 20 |
| Target | name of the target variable | name of the target variable | NULL |
| Liberal Execution | if TRUE, 'liberal' execution is used (execution does not stop on minor errors) | TRUE / FALSE | TRUE |
If all the required parameters are set, the defined test task can be executed.
The whole process of creating and executing the test task can be done in a Gython script:
att = ApproximationTestTask() # create the test task object
att.setModelName('approximation_model') # if the required model exists
att.setTestDataName('cholesterol_pd') # specify the test data set
att.setTestDataTargetAttributeName('chol') # specify the target attribute
#att.setNumberOfIntervals(10) # optionally set other options
att.setTestResultName('approximation_results') # set the name of test results
save('approximation_tt', att)
execute('approximation_tt')
After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window and the result components - the objects contained in the test task object.
Basic Definitions:
$y_i$ - the observed target value for the $i$-th observation
$\hat{y}_i$ - the predicted target value for the $i$-th observation
$e_i = y_i - \hat{y}_i$ - the residual error for the $i$-th observation
Table 15.4. Approximation Test Statistics
| Name | Description |
|---|---|
| Mean Absolute Error | the mean of the absolute values of prediction errors for the test data: $MAE = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$ |
| Mean Actual Value | the mean of the actual values of the target attribute for the test data: $MAV = \frac{1}{N}\sum_{i=1}^{N} y_i$ |
| Mean Predicted Value | the mean of the predicted values of the target for the test data: $MPV = \frac{1}{N}\sum_{i=1}^{N} \hat{y}_i$ |
| RMS Error | the square root of the mean of the squared errors for the test data: $RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$ |
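For illustration only (plain Python, not part of the AdvancedMiner API), these statistics can be computed directly from their definitions; the lists below are made-up sample data:
from math import sqrt
y    = [1.2, 3.4, 2.0, 5.1]   # actual target values (made-up sample data)
yhat = [1.0, 3.0, 2.5, 4.8]   # predicted target values (made-up sample data)
N = len(y)
mae  = sum([abs(y[i] - yhat[i]) for i in range(N)])/float(N)           # Mean Absolute Error
mav  = sum(y)/float(N)                                                 # Mean Actual Value
mpv  = sum(yhat)/float(N)                                              # Mean Predicted Value
rmse = sqrt(sum([(y[i] - yhat[i])**2 for i in range(N)])/float(N))     # RMS Error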
Several table histograms are available as approximation test results; the Number of Intervals and Cut Prc options of the test task control how they are calculated. The histograms are built over:
$e_i$ (the residual error values)
$y_i$ (the observed target value for the i-th observation)
$\hat{y}_i$ (the predicted target value for the i-th observation)
As an example we will use the 'german credit' dataset, which can be found in '..\Client\scripts\scripts\data\classification' in the AdvancedMiner directory. To perform the whole task follow these steps:
The first four errors have the same meaning as in the case of the approximation test task. There is one additional error:
No positiveTargetValue specified, while the ComputeQuantiles option is set to TRUE - either this option should be set as inactive, or the positive category of the target attribute, necessary for testing, must be specified.
All the parameters except testDataTargetAttributeName and positiveTargetValue can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The last two parameters testDataTargetAttributeName and positiveTargetValue must be set in the Properties Window. The Properties Window may also contain some optional parameters.
Table 15.5. Classification Test Task Options
If all the required parameters are set, the defined test task can be executed.
The whole process of creating and executing the test task can be done in a Gython script:
ctt = ClassificationTestTask() # create the test task object
ctt.setModelName('classification_model') # if the required model exists
ctt.setTestDataName('german_credit_pd') # specify the test data set
ctt.setTestDataTargetAttributeName('Class') # specify the target attribute
ctt.setPositiveTargetValue('bad') # specify the positive target category
#ctt.setNumberOfQuantiles(20) # optionally set other options
ctt.setTestResultName('classification_results') # specify the name of the test results
save('classification_tt', ctt)
execute('classification_tt')
After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: test statistics in the Properties Window and result components - the objects contained in the test task object.
Table 15.6. Classification Test Statistics
| Name | Description |
|---|---|
| Accuracy | the accuracy of the model, i.e. the fraction of properly assigned cases among all test cases |
| Improperly Assigned | the number of cases for which the prediction is not equal to the actual target value |
| Properly Assigned Cases | the number of cases for which the prediction is equal to the actual target value |
| Total Cases | the total number of cases |
The other statistics available in AdvancedMiner are:
The Kolmogorov-Smirnov analysis consists of two curves: the true positive rate and the false positive rate (both statistics are described below), plotted for each value of the discrimination threshold. The bigger the distance between those two curves, the better the model is.
There are some additional statistics connected with the K-S Chart, for example the maximum distance between the two curves (the K-S statistic).
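For illustration only (plain Python, not the AdvancedMiner API), the K-S statistic can be computed from model scores and actual classes; all data below are made up and 'bad' is assumed to be the positive category:
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]                      # model scores (made-up sample data)
labels = ['bad', 'bad', 'good', 'bad', 'good', 'good', 'bad', 'good'] # actual classes (made-up sample data)
P = labels.count('bad')              # number of positive cases
N = len(labels) - P                  # number of negative cases
pairs = list(zip(scores, labels))
pairs.sort()
pairs.reverse()                      # descending score, i.e. sweep the threshold from high to low
tp = fp = 0
ks = 0.0
for score, label in pairs:
    if label == 'bad': tp = tp + 1
    else: fp = fp + 1
    dist = abs(tp/float(P) - fp/float(N))   # |TPR - FPR| at the current threshold
    if dist > ks: ks = dist                 # K-S statistic: the maximum distance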
The receiver operating characteristic (ROC) is an alternative measure of the quality of the classifier. Having one category selected as positive, ROC is a plot of the number of true positives (i.e. observations for which both data-observed and model-predicted target values are equal to the positive category) vs. the number of false positives (i.e. observations having a positive predicted category and a different observed category) for each value of the discrimination threshold (the probability of predicting the positive category) in the range from zero to one.
The best possible prediction model results in a graph that is a point in the upper left corner of the ROC plot, i.e. 100% sensitivity (all true positives are found) and 100% specificity (no false positives are found). A completely random predictor would yield a straight line at an angle of 45 degrees from the horizontal axis, from bottom left to top right (because as the threshold is raised, equal numbers of true and false positives would be let in). The results below this line would suggest a detector that gives wrong results consistently.
There are some additional statistics connected with ROC Chart:
ROC Area - the area under the ROC Curve (AUC). The area equal to 1 represents a perfect classifier model, the area equal to 0.5 represents a random prediction. Usually the area under the ROC curve above 0.9 indicates an excellent model, the values in the range 0.8-0.9 indicate a good model, the values in the range 0.7-0.8 indicate a fair model, and values below 0.7 indicate a poor model.
Gini Coefficient - a statistic very closely related to the ROC Area. It is equal to 2*AUC - 1.
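Using the same made-up data as in the K-S sketch above, the ROC Area and the Gini Coefficient can be approximated with the trapezoidal rule; again this is plain Python for illustration, not the AdvancedMiner API:
points = [(0.0, 0.0)]                # ROC curve points (FPR, TPR), starting at the origin
tp = fp = 0
for score, label in pairs:           # 'pairs', P and N as in the K-S sketch above
    if label == 'bad': tp = tp + 1
    else: fp = fp + 1
    points.append((fp/float(N), tp/float(P)))
auc = 0.0
for i in range(1, len(points)):      # trapezoidal rule for the area under the curve
    x0, y0 = points[i - 1]
    x1, y1 = points[i]
    auc = auc + (x1 - x0)*(y0 + y1)/2.0
gini = 2.0*auc - 1.0                 # Gini Coefficient = 2*AUC - 1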
The confusion matrix contains information about the actual and predicted classifications obtained by the Classification Test Task. The value in each cell is the number of examples for which the actual value of the target is given by the row and the value predicted by the model is given by the column. This representation implies that the numbers on the diagonal count correct classifications, while the other numbers count mistakes.
In the example below, the value 6.0 has the following interpretation: there are 6 cases which have been classified by the model as "van" but in fact they should be classified as "bus".
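Such a matrix can be tabulated directly from the actual and predicted class columns; for illustration only (plain Python, not the AdvancedMiner API, with made-up class values):
actual    = ['bus', 'van', 'bus', 'car', 'van']   # observed classes (made-up sample data)
predicted = ['van', 'van', 'bus', 'car', 'car']   # classes returned by the model
matrix = {}
for i in range(len(actual)):
    key = (actual[i], predicted[i])               # row = actual value, column = predicted value
    matrix[key] = matrix.get(key, 0) + 1
print(matrix.get(('bus', 'van'), 0))              # cases that really are 'bus' but were predicted as 'van'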
In the case of a binary classification problem the symbols TP (true positives), FN (false negatives), FP (false positives) and TN (true negatives) are defined. Thanks to this notation it is easy to define the additional statistics available in the targetAnalysis node, for example:
$TPR = \frac{TP}{TP + FN}$ (true positive rate)
$FPR = \frac{FP}{FP + TN}$ (false positive rate)
$ACC = \frac{TP + TN}{TP + FN + FP + TN}$ (accuracy)
In the case of multi-class problems, for the calculation of the additional statistics one class is defined as positive, while the remaining classes are treated together as one negative class.
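As an illustration (plain Python, not the AdvancedMiner API), this one-vs-rest convention can be sketched as follows; all class values are made up:
actual    = ['bus', 'van', 'bus', 'car', 'van']   # observed classes (made-up sample data)
predicted = ['van', 'van', 'bus', 'car', 'car']   # classes returned by the model
pos = 'van'                                       # the class treated as positive
tp = fn = fp = tn = 0
for i in range(len(actual)):
    if actual[i] == pos and predicted[i] == pos: tp = tp + 1   # true positive
    elif actual[i] == pos: fn = fn + 1                         # false negative
    elif predicted[i] == pos: fp = fp + 1                      # false positive
    else: tn = tn + 1                                          # true negative
tpr = tp/float(tp + fn)                           # true positive rate
fpr = fp/float(fp + tn)                           # false positive rate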
Lift is a way to measure the quality of a classification model. It is calculated and plotted for a given value of the target attribute, which is called the positive value. First, the observations are sorted in descending order according to the score they were given by the model. The horizontal axis represents the sorted observations using percentage points, e.g. 10% means the 10% of observations with the highest score.
The Lift value for a given percentage is the ratio of the density of positive target values within that percentage and the density of positive target values in the whole population. This shows how the classification model differs from the simplest possible model (random selection from population, for which the lift value is always 1) or other models.
Usually the lift value is calculated for discrete chunks called quantiles, e.g. if the lift is computed for 10 quantiles, then the first quantile is the best 10% of observations, the second contains the observations from 10% to 20%, and so on. The last contains the observations from 90% to 100%, i.e. the worst 10%.
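For illustration only (plain Python, not the AdvancedMiner API), the lift per quantile can be computed directly from this definition; the scores and classes below are made up:
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]   # model scores (made-up sample data)
labels = ['bad', 'bad', 'good', 'bad', 'good', 'good', 'good', 'bad', 'good', 'good']
positive = 'bad'                                   # the positive target category
quantiles = 5                                      # number of quantiles
pairs = list(zip(scores, labels))
pairs.sort()
pairs.reverse()                                    # best-scored observations first
base = labels.count(positive)/float(len(labels))   # density of positives in the whole population
size = int(len(pairs)/quantiles)                   # observations per quantile
lifts = []
for q in range(quantiles):
    chunk = pairs[q*size:(q + 1)*size]
    hits = 0
    for score, label in chunk:
        if label == positive: hits = hits + 1
    lifts.append((hits/float(len(chunk)))/base)    # lift of the (q+1)-th quantile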
In addition to KSAnalysis, ROCAnalysis and liftAnalysis, the user can also find general statistics.
The ROC and Lift are computed only if the Compute Quantiles option is enabled.
In this example we will use the 'HIV' data which can be found in '..\Client\scripts\scripts\data\survival' in the AdvancedMiner directory. To perform the whole task follow these steps:
The errors have the same meaning as in the approximation test task. There is one additional error:
No censoredCategory specified - the name of censored category must be specified.
All the parameters except censoredCategory can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The last parameter, censoredCategory, should be set in the Properties Window. The Properties Window may also contain some optional parameters.
Table 15.7. Survival Test Task Options
If all the required parameters are set, the defined test task can be executed.
The whole process of creating and executing the test task can be done in a Gython script:
stt = SurvivalTestTask() # create the test task object
stt.setModelName('survival_model') # if the required model exists
stt.setTestDataName('HIV_pd') # specify the test data set
stt.setTestDataTargetAttributeName('days') # specify the target attribute
stt.setCensorName('censor') # specify the censor attribute
stt.setCensoredCategory('Y') # specify the censored category
#stt.setFirstTimePoint(25) # optionally set other options
stt.setTestResultName('survival_results') # specify the name of the test results
save('survival_tt', stt)
execute('survival_tt')
After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the result components - the objects contained in the test task object.
Lift charts at each time point are available. The Lift component and its statistics and options are described in detail in Classification Test Task.
In this example we will use the 'stock_market' dataset, which can be found in '..\Client\scripts\scripts\data\time' in the AdvancedMiner directory. To perform the whole task follow these steps:
The list of errors is exactly the same as for the approximation test task. The Properties Window may also contain some optional parameters.
Table 15.8. Time Series Test Task Options
| Option | Description | Possible values | Default value |
|---|---|---|---|
| Liberal Execution | if TRUE, 'liberal' execution is used (execution does not stop on minor errors) | TRUE / FALSE | TRUE |
| Number of Time Points | the number of time points for which the test statistics will be calculated | positive integer numbers | 10 |
| Target | the name of the target attribute | name of an attribute | NULL |
| Time Point Attribute Name | the name of the time point attribute | name of an attribute | NULL |
If all the required parameters are set, the defined test task can be executed.
The whole process of creating and executing the test task can be done in a Gython script:
tstt = TimeSeriesTestTask() # create the test task object
tstt.setModelName('timeseries_model') # if the required model exists
tstt.setTestDataName('stock_market_pd') # specify the test data set
tstt.setTestDataTargetAttributeName('price_change') # specify the target attribute
tstt.setTimePointAttributeName('time') # specify the time attribute
#tstt.setNumberOfTimePoints(10) # optionally set other options
tstt.setTestResultName('timeseries_results') # specify the name of the test results
save('timeseries_tt', tstt)
execute('timeseries_tt')
After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window.
Mean Absolute Error (MAE) - the mean of the absolute values of prediction errors for the test data:
$MAE = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$
Mean Actual Value (MAV) - the observed mean of the whole series:
$MAV = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$
Actual Value Variance (AVV) - the observed variance of the whole series:
$AVV = \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2$
Mean Predicted Variance (MPV) - the mean value of the predicted variances:
$MPV = \frac{1}{N}\sum_{i=1}^{N} \hat{\sigma}_i^2$
RMS Error - the square root of the mean squared errors for the test data:
$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$
R-Squared Error:
$R^2 = 1 - \frac{\sum_{i=1}^{N} \hat{\sigma}_i^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$
where $\bar{y}$ is the current mean value of the series, $\hat{\sigma}_i^2$ denotes the model variance for the $i$-th observation, $y_i$ denotes the series value for the $i$-th observation and $N$ is the number of observations in the series.
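Following the formulas above, these statistics can be computed in a few lines of plain Python (for illustration only, not the AdvancedMiner API); y holds the observed series and var the model variances, all values made up:
y   = [10.0, 12.0, 11.5, 13.0, 12.5]   # observed series values (made-up sample data)
var = [0.30, 0.25, 0.40, 0.35, 0.30]   # model variances per observation (made-up sample data)
N = len(y)
mean = sum(y)/float(N)                 # mean value of the series
avv = sum([(v - mean)**2 for v in y])/float(N)           # Actual Value Variance
mpv = sum(var)/float(N)                                  # Mean Predicted Variance
r2  = 1.0 - sum(var)/sum([(v - mean)**2 for v in y])     # R-Squared Error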
This task performs a special kind of classification test, in which a scoring table is used instead of a classification model.
To perform this task the following objects are required: a scored table in the database, a physical data object for the scored data and a test task in the repository.
The scored table must contain the following columns:
the real target
the classes returned by the model
the probability (or score) returned by the model
Let us use the sample data presented below. After executing this script, the sample data set and the scored data will appear in the database.
Example 15.1. Preparing a classification test result task
table "input_table":
temp height Class
-50 1 "good"
-40 1 "good"
-30 1 "good"
-20 10 "good"
-10 10 "good"
10 10 "good"
20 20 "bad"
30 10 "good"
40 10 "good"
50 10 "bad"
60 5 "good"
70 20 "bad"
80 30 "bad"
90 1 "good"
100 10 "bad"
# a simple classification model
trans "scored_table" <- "input_table":
    score = temp/100.0
    newClass = "good"
    if score > 0.5:
        newClass = "bad"
The remaining steps are well known:
The particular errors have the following meaning:
no physicalDataName specified - a PhysicalData object containing the scored data must be assigned to the test task.
no target specified - the name of the real target variable must be set.
no predictedTarget specified - the name of a variable in the scored data containing the classes assigned by a model.
no positiveTargetValue specified - the positive category of the target variable must be set.
no testResultName specified - the name under which the test results will be stored must be provided.
no score specified - the name of the variable containing the score (probability) returned by the model must be set.
Only the parameters testResultName and physicalDataName can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The other parameters should be set in the Properties Window. Because the Calculate Test Result Task is almost the same as the Classification Test Task, the rest of the parameters in the Properties Window are the same (see Classification Test Task Options).
If all the required parameters are set, the defined test task can be executed.
The whole process of creating and executing the test task can be done in a Gython script:
ctrt = CalculateTestResultTask() # create the test task object
ctrt.setPhysicalDataName('scoredTable_pd') # specify the physical data object for the scored data
ctrt.setActualName("Class") # specify the real target attribute
ctrt.setPositiveTargetValue('bad') # specify the positive target category
ctrt.setPredictedName('newClass') # specify the name of the variable containing the classes assigned by the model
ctrt.setScore('score') # specify the name of the variable with score
#ctrt.setNumberOfQuantiles(25) # optionally set other options
ctrt.setTestResultName('testresult_results') # specify the name of the test result
save('testresult_tt',ctrt)
execute('testresult_tt')
The results of this task are exactly the same as for the Classification Test Task.