Model testing

This section describes model testing. There are four test tasks, corresponding to the four mining tasks:

  • ApproximationTestTask
  • ClassificationTestTask
  • SurvivalTestTask
  • TimeSeriesTestTask

There is one additional test task, CalculateTestResultTask, which performs classification model testing on the basis of scored data instead of a classification model. This task is described at the end of this chapter.

The idea of testing is the same regardless of the kind of test task. However, even though the main steps are common, there are slight differences in parameters between test tasks. For this reason this chapter is divided into sections describing each test task individually. In each section the user will find all the steps required to perform the test task and a description of the test results.

Note

  • We assume that a model of the appropriate kind exists in the repository. To learn how to build the required model, see the Model Building chapter.
  • We assume that the user is familiar with managing objects in the metadata repository. See the Metadata Repository chapter for an introduction to this subject.

Approximation Test Task

Creating the Test Task

As an example we will use the 'cholesterol' dataset, which can be found in '..\Client\scripts\scripts\data\approximation' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: ApproximationTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of missing settings:

Figure 15.18. Approximation Test Task Report


The particular errors have the following meaning:

  • No model specified - an input model (a model to be tested) must be specified.

  • No testData specified - a PhysicalData object containing the test data must be assigned to the test task.

  • No testDataTargetAttributeName specified - when PhysicalData is provided the target attribute must be set.

  • No testResults specified - the name of the model where the testing results will be placed must be provided.

All the parameters except testDataTargetAttributeName can be set in the standard way by right-clicking on the test task node and selecting the 'Add' action. The last parameter, testDataTargetAttributeName, should be set in the Properties Window. The Properties Window may also contain some optional parameters.

Table 15.3. Approximation Test Task Options

  • Cut Prc - the number of quantiles to trim from each side when calculating the Error Histogram. Possible values: integer numbers from [0; 100]. Default: 5.

  • Number of Intervals - the number of intervals used to calculate the Error Histogram. Possible values: positive integer numbers. Default: 20.

  • Number of Intervals XY Plot - the number of intervals used to calculate the y vs. yhat histogram. Possible values: positive integer numbers. Default: 20.

  • Target - the name of the target variable. Default: NULL.

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

If all the required parameters are set the defined test task can be executed.

Figure 15.19. Approximation Repository


The whole process of creating and executing the test task can be done in a Gython script:

att = ApproximationTestTask()                                     # create the test task object
att.setModelName('approximation_model')                           # if the required model exists
att.setTestDataName('cholesterol_pd')                             # specify the test data set
att.setTestDataTargetAttributeName('chol')                        # specify the target attribute
#att.setNumberOfIntervals(10)                                     # optionally set other options
att.setTestResultName('approximation_results')                    # set the name of test results

save('approximation_tt', att)
execute('approximation_tt')

Approximation Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window and the result components - objects contained in the test result object.

Test Statistics

Basic Definitions:

  • observed target value for the $i$-th observation: $y_i$
  • predicted target value for the $i$-th observation: $\hat{y}_i$
  • residual error for the $i$-th observation: $e_i = y_i - \hat{y}_i$

Table 15.4. Approximation Test Statistics

  • Mean Absolute Error - the mean of the absolute values of the prediction errors for the test data: $\frac{1}{N}\sum_{i=1}^{N} |e_i|$

  • Mean Actual Value - the mean of the actual values of the target attribute for the test data: $\frac{1}{N}\sum_{i=1}^{N} y_i$

  • Mean Predicted Value - the mean of the predicted values of the target for the test data: $\frac{1}{N}\sum_{i=1}^{N} \hat{y}_i$

  • RMS Error - the square root of the mean squared errors for the test data: $\sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^2}$
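
The statistics above are easy to reproduce outside AdvancedMiner. The following minimal sketch in plain Python (Gython is Python-based, but this is not the AdvancedMiner API; all names are illustrative) computes them for two small lists of observed and predicted values:

from math import sqrt

y    = [200.0, 230.0, 180.0, 250.0]    # observed target values y_i
yhat = [210.0, 220.0, 190.0, 240.0]    # predicted target values yhat_i
n = len(y)

errors = [y[i] - yhat[i] for i in range(n)]                # residual errors e_i

mean_absolute_error  = sum([abs(e) for e in errors]) / n   # Mean Absolute Error
mean_actual_value    = sum(y) / n                          # Mean Actual Value
mean_predicted_value = sum(yhat) / n                       # Mean Predicted Value
rms_error            = sqrt(sum([e * e for e in errors]) / n)   # RMS Error

print("MAE=%.2f RMSE=%.2f" % (mean_absolute_error, rms_error))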

Result Components

Several table histograms are available as approximation test results. Their resolution is controlled by the 'Number of Intervals' and 'Cut Prc' parameters of the test task; a sketch of the cutting and binning follows the list below.

  • errorHistogram - the table with the distribution of $e_i$ (the residual error values)
  • errorHistogramCut - the error histogram with 'CutPrc' quantiles cut on both sides
  • yHistogram - the table containing the distribution of $y_i$ (the observed target value for the i-th observation)
  • yHistogramCut - the yHistogram with 'CutPrc' quantiles cut on both sides
  • yhatHistogram - the table containing the distribution of $\hat{y}_i$ (the predicted target value for the i-th observation)
  • yhatHistogramCut - the yhatHistogram with 'CutPrc' quantiles cut on both sides.
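
To make the 'Cut' variants concrete, here is a minimal sketch in plain Python (illustrative only, not the AdvancedMiner implementation; the function name is hypothetical) that trims 'CutPrc' percent of the values on each side and bins the remainder into 'Number of Intervals' equal-width intervals:

def cut_histogram(values, number_of_intervals=20, cut_prc=5):
    values = sorted(values)
    k = int(len(values) * cut_prc / 100.0)       # observations trimmed per side
    trimmed = values[k:len(values) - k]
    lo, hi = trimmed[0], trimmed[-1]
    width = (hi - lo) / float(number_of_intervals) or 1.0   # guard: all values equal
    counts = [0] * number_of_intervals
    for v in trimmed:
        idx = min(int((v - lo) / width), number_of_intervals - 1)
        counts[idx] += 1
    return counts

print(cut_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5, 100], 4, 10))   # [2, 3, 2, 1]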

Classification Test Task

Creating the Test Task

As an example we will use the 'german credit' dataset, which can be found in '..\Client\scripts\scripts\data\classification' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: ClassificationTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of missing settings:

Figure 15.20. Classification Test Task Report


The first four errors have the same meaning as in the case of the approximation test task. There is one additional error:

  • No positiveTargetValue specified while the ComputeQuantiles option is set to TRUE - either this option should be set as inactive or the positive category of the target value, which is necessary for testing, must be specified.

All the parameters except testDataTargetAttributeName and positiveTargetValue can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The last two parameters, testDataTargetAttributeName and positiveTargetValue, must be set in the Properties Window. The Properties Window may also contain some optional parameters.

Table 15.5. Classification Test Task Options

  • Compute Quantiles - FALSE means that ROC and Lift will not be computed. Possible values: TRUE / FALSE. Default: TRUE.

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

  • Number of Quantiles - the number of quantiles used for Lift and ROC computation. Possible values: positive integer numbers. Default: 50.

  • Positive Binary Target Threshold - the threshold for the positive binary category; when the probability returned by the model is greater than or equal to this threshold, the observation is classified as belonging to the positive class. Possible values: real numbers from the interval (0, 1). Default: NaN.

  • Positive Target Value - the positive (event) category value for the target attribute. Possible values: the label of the selected class. Default: NULL.

  • Reweighting Mode - the way in which the observations in the test dataset are reweighted (illustrated in the sketch below): reweightCounters - only counters (e.g. in the confusion matrix, lifts etc.) take weights into account; reweightEverything - everything is reweighted (e.g. counters and quantiles in lift); noReweighting - observations are not reweighted and weights are ignored. Default: noReweighting.

  • Target - the name of the target attribute. Default: NULL.

  • Weight - the name of the weight attribute; taken into account only if Reweighting Mode is not set to noReweighting. Default: NULL.
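
The difference between the reweighting modes can be shown with a toy counter. In the sketch below (plain Python, not the AdvancedMiner implementation; all names are hypothetical) a weighted counter adds each observation's weight instead of 1, which is the reweightCounters behaviour; with reweighting off, every observation counts as 1, as in noReweighting:

def positive_counter(labels, weights, positive='bad', reweight=1):
    total = 0.0
    for i in range(len(labels)):
        if labels[i] == positive:
            if reweight:
                total = total + weights[i]      # reweightCounters
            else:
                total = total + 1.0             # noReweighting
    return total

labels  = ['bad', 'good', 'bad']
weights = [2.0, 1.0, 0.5]
print(positive_counter(labels, weights))                # 2.5 with weights
print(positive_counter(labels, weights, 'bad', 0))      # 2.0 without weights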

If all the required parameters are set the defined test task can be executed.

Figure 15.21. Classification Repository


The whole process of creating and executing the test task can be done in a Gython script:

ctt = ClassificationTestTask()                                  # create the test task object
ctt.setModelName('classification_model')                        # if the required model exists
ctt.setTestDataName('german_credit_pd')                         # specify the test data set
ctt.setTestDataTargetAttributeName('Class')                     # specify the target attribute
ctt.setPositiveTargetValue('bad')                               # specify the positive target category
#ctt.setNumberOfQuantiles(20)                                   # optionally set other options
ctt.setTestResultName('classification_results')                 # specify the name of the test results

save('classification_tt', ctt)
execute('classification_tt')

Classification Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window and the result components - the objects contained in the test result object.

Test Statistics

Table 15.6. Classification Test Statistics

  • Accuracy - the accuracy of the model (the fraction of properly assigned cases).
  • Improperly Assigned - the number of cases for which the prediction is not equal to the actual target value.
  • Properly Assigned Cases - the number of cases for which the prediction is equal to the actual target value.
  • Total Cases - the total number of cases.

Result Components

The other statistics available in AdvancedMiner are:

K-S Analysis

The Kolmogorov-Smirnov analysis consists of two charts: the true positive rate and the false positive rate (both statistics are described below), plotted for each value of the discrimination threshold. The bigger the distance between those two lines, the better the model is.

There are some additional statistics connected with the K-S Chart (a sketch of their computation follows the list):

  • K-S Statistics - the maximal distance between the True Positive Rate and False Positive Rate lines.
  • K-S Statistics Score Threshold - the score threshold yielding the maximal distance.
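
As an illustration, the sketch below (plain Python, not the AdvancedMiner implementation; the function name is hypothetical) sweeps the score threshold over all observed scores and tracks the maximal distance between the True Positive Rate and the False Positive Rate:

def ks_statistic(scores, labels):
    # labels: 1 for the positive class, 0 otherwise; assumes both classes occur
    pairs = sorted(zip(scores, labels))
    pairs.reverse()                               # highest scores first
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    tp = 0
    fp = 0
    best_dist = 0.0
    best_threshold = None
    for (score, label) in pairs:
        if label == 1:
            tp = tp + 1
        else:
            fp = fp + 1
        dist = abs(tp / float(pos_total) - fp / float(neg_total))
        if dist > best_dist:
            best_dist = dist
            best_threshold = score
    return best_dist, best_threshold    # K-S Statistics and its Score Threshold
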
ROC Analysis

The receiver operating characteristic (ROC) is an alternative measure of the quality of the classifier. Having one category selected as positive, ROC is a plot of the number of true positives (i.e. observations for which both data-observed and model-predicted target values are equal to the positive category) vs. the number of false positives (i.e. observations having a positive predicted category and a different observed category) for each value of the discrimination threshold (the probability of predicting the positive category) in the range from zero to one.

The best possible prediction model results in a graph that is a point in the upper left corner of the ROC plot, i.e. 100% sensitivity (all true positives are found) and 100% specificity (no false positives are found). A completely random predictor would yield a straight line at an angle of 45 degrees from the horizontal axis, from bottom left to top right (because as the threshold is raised, equal numbers of true and false positives would be let in). The results below this line would suggest a detector that gives wrong results consistently.

There are some additional statistics connected with the ROC Chart (a sketch follows the list):

  • ROC Area - the area under the ROC Curve (AUC). The area equal to 1 represents a perfect classifier model, the area equal to 0.5 represents a random prediction. Usually the area under the ROC curve above 0.9 indicates an excellent model, the values in the range 0.8-0.9 indicate a good model, the values in the range 0.7-0.8 indicate a fair model, and values below 0.7 indicate a poor model.

  • Gini Coefficient - a statistic very closely related to the ROC Area. It is equal to 2*AUC - 1.
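
Both statistics can be computed directly from scored observations. The sketch below (plain Python, illustrative only, not the AdvancedMiner implementation) uses the rank interpretation of the AUC: the probability that a randomly chosen positive observation is scored higher than a randomly chosen negative one.

def roc_area(scores, labels):
    # labels: 1 for the positive class, 0 otherwise
    pos = [scores[i] for i in range(len(labels)) if labels[i] == 1]
    neg = [scores[i] for i in range(len(labels)) if labels[i] == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins = wins + 1.0
            elif p == q:
                wins = wins + 0.5                 # ties count as half a win
    return wins / (len(pos) * len(neg))

auc  = roc_area([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0])
gini = 2 * auc - 1                                # Gini Coefficient = 2*AUC - 1
print("AUC=%.2f Gini=%.2f" % (auc, gini))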

Confusion Matrix and targetAnalysis

The confusion matrix contains information about the actual and predicted classifications obtained by the Classification Test Task. The value in each cell represents the number of examples for which the actual value of the target is given by the row and the value predicted by the model is given by the column. This representation implies that the numbers on the diagonal correspond to correct classifications, while the other numbers correspond to mistakes.

In the example below, the value 6.0 has the following interpretation: there are 6 cases which have been classified by the model as "van" but in fact should be classified as "bus".

Figure 15.22. Confusion matrix


In the case of a binary classification problem, the symbols TP, FN, FP and TN (true positives, false negatives, false positives and true negatives) are defined. With this notation it is easy to define the additional statistics available in the targetAnalysis node (a sketch follows the note below).

Figure 15.23. Confusion matrix - basic notation


Note

In the case of multi-class problems, for the calculation of the additional statistics one class is defined as positive, while the remaining classes are treated together as one negative class.
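
Using the TP, FN, FP and TN counters, several derived rates follow directly, as in the sketch below (plain Python; the exact set of statistics shown in the targetAnalysis node may differ, and the names here are illustrative):

def target_rates(tp, fn, fp, tn):
    # assumes none of the denominators is zero
    total = float(tp + fn + fp + tn)
    return {
        'accuracy':            (tp + tn) / total,
        'true positive rate':  tp / float(tp + fn),   # sensitivity
        'false positive rate': fp / float(fp + tn),
        'precision':           tp / float(tp + fp),
    }

print(target_rates(40, 10, 5, 45))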

Lift Analysis

Lift is a way to measure the quality of a classification model. It is calculated and plotted for a given value of the target attribute, which is called the positive value. First, the observations are sorted in descending order according to the score they were given by the model. The horizontal axis represents the sorted observations in percentage points, e.g. 10% means the 10% of observations with the highest scores.

The Lift value for a given percentage is the ratio of the density of positive target values within that percentage to the density of positive target values in the whole population. This shows how the classification model differs from the simplest possible model (random selection from the population, for which the lift value is always 1) or from other models.

Usually the lift value is calculated for discrete chunks called quantiles, e.g. if the lift is computed for 10 quantiles, then the first quantile contains the best 10% of observations, the second contains the observations from 10% to 20%, and so on; the last contains the observations from 90% to 100%, i.e. the worst 10%.
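
A minimal sketch of this computation in plain Python (illustrative only, not the AdvancedMiner implementation; it assumes the number of observations is divisible by the number of quantiles):

def lift_per_quantile(scores, labels, quantiles=10):
    # labels: 1 for the positive category, 0 otherwise
    pairs = sorted(zip(scores, labels))
    pairs.reverse()                                 # best scores first
    base_rate = sum(labels) / float(len(labels))    # positive density overall
    size = int(len(pairs) / quantiles)
    lifts = []
    for q in range(quantiles):
        chunk = pairs[q * size:(q + 1) * size]
        hits = sum([label for (score, label) in chunk])
        lifts.append((hits / float(size)) / base_rate)   # random model -> 1.0
    return lifts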

In addition to KSAnalysis, ROCAnalysis and liftAnalysis the user can find general statistics such as:

  • Maximal Score - the maximal score given to an observation by the model
  • Minimal Score - the minimal score given to an observation by the model
  • Positive Cases - the number of cases from the positive class in the test dataset
  • Cases - the total number of cases in the test dataset.

Note

The ROC and Lift are computed only if the Compute Quantiles option is enabled.

Survival Test Task

Creating the Test Task

In this example we will use the 'HIV' data which can be found in '..\Client\scripts\scripts\data\survival' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: SurvivalTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of necessary settings:

Figure 15.24. Test Reports


Note

SurvivalTestTask may be performed only for the Cox model.

The errors have the same meaning as in the approximation test task. There is one additional error:

  • No censoredCategory specified - the name of the censored category must be specified.

All the parameters except censoredCategory can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The last parameter, censoredCategory, should be set in the Properties Window. The Properties Window may also contain some optional parameters.

Table 15.7. Survival Test Task Options

  • Censor Name - the name of the censor attribute. Default: NULL.

  • Censored Category - the category indicating that the element is censored. Possible values: the label of the chosen class. Default: NULL.

  • First Time Point - the lower bound of the time scale interval in which the survival function values are calculated. Possible values: integer numbers. Default: NULL.

  • Last Time Point - the upper bound of the time scale interval in which the survival function values are calculated. Possible values: integer numbers. Default: NULL.

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

  • Number of Lift Quantiles - the number of quantiles used for Lift calculation. Possible values: positive integer numbers. Default: 50.

  • Number of Time Points - the number of time points at which the survival function will be calculated. Possible values: integer numbers. Default: 10.

  • Target - the name of the target attribute. Default: NULL.

If all the required parameters are set the defined test task can be executed.

Figure 15.25. Survival Repository


The whole process of creating and executing the test task can be done in a Gython script:

stt = SurvivalTestTask()                                        # create the test task object
stt.setModelName('survival_model')                              # if the required model exists
stt.setTestDataName('HIV_pd')                                   # specify the test data set
stt.setTestDataTargetAttributeName('days')                      # specify the target attribute
stt.setCensorName('censor')                                     # specify the censor attribute
stt.setCensoredCategory('Y')                                    # specify the censored category
#stt.setFirstTimePoint(25)                                      # optionally set other options
stt.setTestResultName('survival_results')                       # specify the name of the test results

save('survival_tt', stt)
execute('survival_tt')

Survival Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the result components - the objects contained in the test result object.

Result Components

Lift charts at each time point are available. The Lift component and its statistics and options are described in detail in the Classification Test Task section.

Time Series Test Task

Note

Model Testing for Time Series is an experimental feature.

Creating the Test Task

In this example we will use the 'stock_market' dataset, which can be found in '..\Client\scripts\scripts\data\time' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: TimeSeriesTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of necessary settings:

Figure 15.26. Test Report


The error list is exactly the same as for the approximation test task. The Properties Window may also contain some optional parameters.

Table 15.8. Time Series Test Task Options

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

  • Number of Time Points - the number of time points for which the test statistics will be calculated. Possible values: positive integer numbers. Default: 10.

  • Target - the name of the target attribute. Default: NULL.

  • Time Point Attribute Name - the name of the time point attribute. Default: NULL.

If all the required parameters are set the defined test task can be executed.

Figure 15.27. Time Series Repository


The whole process of creating and executing the test task can be done in a Gython script:

tstt = TimeSeriesTestTask()                                     # create the test task object
tstt.setModelName('timeseries_model')                           # if the required model exists
tstt.setTestDataName('stock_market_pd')                         # specify the test data set
tstt.setTestDataTargetAttributeName('price_change')             # specify the target attribute
tstt.setTimePointAttributeName('time')                          # specify the time attribute
#tstt.setNumberOfTimePoints(10)                                 # optionally set other options
tstt.setTestResultName('timeseries_results')                    # specify the name of the test results

save('timeseries_tt', tstt)
execute('timeseries_tt')

Time Series Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window.

Test Statistics

  • Mean Absolute Error (MAE) - the mean of the absolute values of the prediction errors for the test data: $\frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$

  • Mean Actual Value (MAV) - the observed mean of the whole series: $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$

  • Actual Value Variance (AVV) - the observed variance of the whole series: $\frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2$

  • Mean Predicted Variance (MPV) - the mean value of the predicted variances: $\frac{1}{N}\sum_{i=1}^{N} \sigma_i^2$

  • RMS Error - the square root of the mean squared errors for the test data: $\sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$

  • R-Squared Error - $R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$

where $\bar{y}$ is the mean value of the series, $\hat{y}_i$ denotes the model prediction for the $i$-th observation, $\sigma_i^2$ denotes the model variance for the $i$-th observation, $y_i$ denotes the series value for the $i$-th observation, and $N$ is the number of observations in the series.
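
For reference, the sketch below computes the R-Squared Error under the standard definition assumed above (plain Python, illustrative only, not the AdvancedMiner implementation):

def r_squared(y, yhat):
    n = len(y)
    mean_y = sum(y) / float(n)
    ss_res = sum([(y[i] - yhat[i]) ** 2 for i in range(n)])   # residual sum of squares
    ss_tot = sum([(y[i] - mean_y) ** 2 for i in range(n)])    # total sum of squares
    return 1.0 - ss_res / ss_tot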

Calculate Test Result Task

This task performs a special kind of classification test, in which a scoring table is used instead of a classification model.

Creating the Test Task

To perform this task the following objects are required: a scored table in the database, a physical data object for the scored data, and a test task in the repository.

The scored table must contain the following columns:

  • the real target

  • the classes returned by the model

  • the probability (or score) returned by the model

Let us use the sample data presented below. After executing this script, the sample data set and the scored data will appear in the database.

Example 15.1. Preparing a classification test result task

table "input_table":
    temp   height   Class
    -50    1        "good"
    -40    1        "good"
    -30    1        "good"
    -20    10       "good"
    -10    10       "good"
     10    10       "good"
     20    20       "bad"
     30    10       "good"
     40    10       "good"
     50    10       "bad"
     60    5        "good"
     70    20       "bad"
     80    30       "bad"
     90    1        "good"
     100   10       "bad"

# a simple classification model
trans "scored_Table""<-"input_Table":
    score = temp/100.0
    newClass = "good"
    if score>0.5:
        newClass ="bad"

The remaining steps are well known:

  1. create a physical data object for the scored data
  2. create the task: CalculateTestResultTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear containing a list of necessary settings:

Figure 15.28. Test Reports


The particular errors have the following meaning:

  • no physicalDataName specified - a PhysicalData object containing the scored data must be assigned to the test task.

  • no target specified - the name of the real target variable must be provided.

  • no predictedTarget specified - the name of the variable in the scored data containing the classes assigned by the model must be provided.

  • no positiveTargetValue specified - the positive category of the target variable must be set.

  • no testResultName specified - the name of the model where the testing results will be placed must be provided.

  • no score specified - the name of the variable containing the score (probability) from the model must be provided.

Only the parameters testResultName and physicalDataName can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The other parameters should be set in the Properties Window. Because the Calculate Test Result Task is almost the same as the Classification Test Task, the remaining parameters in the Properties Window are the same (see Classification Test Task Options).

If all the required parameters are set the defined test task can be executed.

Figure 15.29. Calculate Test Result Repository


The whole process of creating and executing the test task can be done in a Gython script:

ctrt = CalculateTestResultTask()                                            # create the test task object
ctrt.setPhysicalDataName('scoredTable_pd')                                  # specify the physical data object for the scored data
ctrt.setActualName("Class")                                                 # specify the real target attribute
ctrt.setPositiveTargetValue('bad')                                          # specify the positive target category
ctrt.setPredictedName('newClass')                                           # specify the name of the variable containing the classes assigned by the model
ctrt.setScore('score')                                                      # specify the name of the variable with score
#ctrt.setNumberOfQuantiles(25)                                              # optionally set other options
ctrt.setTestResultName('testresult_results')                                # specify the name of the test result

save('testresult_tt',ctrt)
execute('testresult_tt')

Calculate Test Result Task Results

The results of this task are exactly the same as for the Classification Test Task.