Model testing

This section describes model testing. There are four test tasks, corresponding to the four mining tasks:

  • ApproximationTestTask
  • ClassificationTestTask
  • SurvivalTestTask
  • TimeSeriesTestTask

There is one additional test task, CalculateTestResultTask, which performs classification model testing on the basis of scored data instead of a classification model. This task is described at the end of this chapter.

The idea of testing is the same regardless of the kind of test task. However, even though the main steps are common, there are slight differences in parameters between test tasks. For this reason this chapter is divided into sections describing each test task individually. In each section the user will find all the steps required to perform the test task and a description of the test results.

Note

  • We assume that a model of the appropriate kind exists in the repository. To learn how to build the required model, see the Model Building chapter.
  • We assume that the user is familiar with managing objects in the metadata repository. See the Metadata Repository chapter for an introduction to this subject.

Approximation Test Task

Creating the Test Task

As an example we will use the 'cholesterol' dataset, which can be found in '..\Client\scripts\scripts\data\approximation' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: ApproximationTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of missing settings:

Figure 15.18. Approximation Test Task Report


The particular errors have the following meaning:

  • No model specified - an input model (a model to be tested) must be specified.

  • No testData specified - a PhysicalData object containing the test data must be assigned to the test task.

  • No testDataTargetAttributeName specified - when PhysicalData is provided the target attribute must be set.

  • No testResults specified - the name of the model where the testing results will be placed must be provided.

All the parameters except testDataTargetAttributeName can be set in the standard way by right-clicking on the test task node and selecting the 'Add' action. The last parameter, testDataTargetAttributeName, should be set in the Properties Window. The Properties Window may also contain some optional parameters.

Table 15.3. Approximation Test Task Options

  • Cut Prc - the number of quantiles to trim from each side when calculating the Error Histogram. Possible values: integer numbers from [0; 100]. Default: 5.

  • Number of Intervals - the number of intervals used to calculate the Error Histogram. Possible values: positive integer numbers. Default: 20.

  • Number of Intervals XY Plot - the number of intervals used to calculate the y vs. yhat histogram. Possible values: positive integer numbers. Default: 20.

  • Target - the name of the target variable. Default: NULL.

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

If all the required parameters are set the defined test task can be executed.

Figure 15.19. Approximation Repository


The whole process of creating and executing the test task can be done in a Gython script:

att = ApproximationTestTask()                                     # create the test task object
att.setModelName('approximation_model')                           # if the required model exists
att.setTestDataName('cholesterol_pd')                             # specify the test data set
att.setTestDataTargetAttributeName('chol')                        # specify the target attribute
#att.setNumberOfIntervals(10)                                     # optionally set other options
att.setTestResultName('approximation_results')                    # set the name of test results

save('approximation_tt', att)
execute('approximation_tt')

Approximation Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window and the result components - objects contained in the test result object.

Test Statistics

Basic Definitions:

  • observed target value for the $i$-th observation: $y_i$
  • predicted target value for the $i$-th observation: $\hat{y}_i$
  • residual error for the $i$-th observation: $e_i = y_i - \hat{y}_i$

Table 15.4. Approximation Test Statistics

  • Mean Absolute Error - the mean of the absolute values of the prediction errors for the test data: $\frac{1}{N}\sum_{i=1}^{N} |e_i|$

  • Mean Actual Value - the mean of the actual values of the target attribute for the test data: $\frac{1}{N}\sum_{i=1}^{N} y_i$

  • Mean Predicted Value - the mean of the predicted values of the target for the test data: $\frac{1}{N}\sum_{i=1}^{N} \hat{y}_i$

  • RMS Error - the square root of the mean squared errors for the test data: $\sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^2}$
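
The statistics above are easy to reproduce outside AdvancedMiner. The following minimal sketch in plain Python (Gython is Python-based, but this is not the AdvancedMiner API; all names are illustrative) computes them for two small lists of observed and predicted values:

from math import sqrt

y    = [200.0, 230.0, 180.0, 250.0]    # observed target values y_i
yhat = [210.0, 220.0, 190.0, 240.0]    # predicted target values yhat_i
n = len(y)

errors = [y[i] - yhat[i] for i in range(n)]                # residual errors e_i

mean_absolute_error  = sum([abs(e) for e in errors]) / n   # Mean Absolute Error
mean_actual_value    = sum(y) / n                          # Mean Actual Value
mean_predicted_value = sum(yhat) / n                       # Mean Predicted Value
rms_error            = sqrt(sum([e * e for e in errors]) / n)   # RMS Error

print("MAE=%.2f RMSE=%.2f" % (mean_absolute_error, rms_error))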

Result Components

Several table histograms are available as approximation test results. Their resolution is controlled by the 'Number of Intervals' and 'Cut Prc' parameters of the test task; a sketch of the cutting and binning follows the list below.

  • errorHistogram - the table with the distribution of $e_i$ (the residual error values)
  • errorHistogramCut - the error histogram with 'CutPrc' quantiles cut on both sides
  • yHistogram - the table containing the distribution of $y_i$ (the observed target value for the i-th observation)
  • yHistogramCut - the yHistogram with 'CutPrc' quantiles cut on both sides
  • yhatHistogram - the table containing the distribution of $\hat{y}_i$ (the predicted target value for the i-th observation)
  • yhatHistogramCut - the yhatHistogram with 'CutPrc' quantiles cut on both sides.
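
To make the 'Cut' variants concrete, here is a minimal sketch in plain Python (illustrative only, not the AdvancedMiner implementation; the function name is hypothetical) that trims 'CutPrc' percent of the values on each side and bins the remainder into 'Number of Intervals' equal-width intervals:

def cut_histogram(values, number_of_intervals=20, cut_prc=5):
    values = sorted(values)
    k = int(len(values) * cut_prc / 100.0)       # observations trimmed per side
    trimmed = values[k:len(values) - k]
    lo, hi = trimmed[0], trimmed[-1]
    width = (hi - lo) / float(number_of_intervals) or 1.0   # guard: all values equal
    counts = [0] * number_of_intervals
    for v in trimmed:
        idx = min(int((v - lo) / width), number_of_intervals - 1)
        counts[idx] += 1
    return counts

print(cut_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5, 100], 4, 10))   # [2, 3, 2, 1]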

Classification Test Task

Creating the Test Task

As an example we will use the 'german credit' dataset, which can be found in '..\Client\scripts\scripts\data\classification' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: ClassificationTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of missing settings:

Figure 15.20. Classification Test Task Report


The first four errors have the same meaning as in the case of the approximation test task. There is one additional error:

  • No positiveTargetValue specified while the ComputeQuantiles option is set to TRUE - either this option should be set as inactive or the positive category of the target value, which is necessary for testing, must be specified.

All the parameters except testDataTargetAttributeName and positiveTargetValue can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The last two parameters, testDataTargetAttributeName and positiveTargetValue, must be set in the Properties Window. The Properties Window may also contain some optional parameters.

Table 15.5. Classification Test Task Options

  • Compute Quantiles - FALSE means that ROC and Lift will not be computed. Possible values: TRUE / FALSE. Default: TRUE.

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

  • Number of Quantiles - the number of quantiles used for Lift and ROC computation. Possible values: positive integer numbers. Default: 50.

  • Positive Binary Target Threshold - the threshold for the positive binary category; when the probability returned by the model is greater than or equal to this threshold, the observation is classified as belonging to the positive class. Possible values: real numbers from the interval (0, 1). Default: NaN.

  • Positive Target Value - the positive (event) category value for the target attribute. Possible values: the label of the selected class. Default: NULL.

  • Reweighting Mode - the way in which the observations in the test dataset are reweighted (illustrated in the sketch below): reweightCounters - only counters (e.g. in the confusion matrix, lifts etc.) take weights into account; reweightEverything - everything is reweighted (e.g. counters and quantiles in lift); noReweighting - observations are not reweighted and weights are ignored. Default: noReweighting.

  • Target - the name of the target attribute. Default: NULL.

  • Weight - the name of the weight attribute; taken into account only if Reweighting Mode is not set to noReweighting. Default: NULL.
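
The difference between the reweighting modes can be shown with a toy counter. In the sketch below (plain Python, not the AdvancedMiner implementation; all names are hypothetical) a weighted counter adds each observation's weight instead of 1, which is the reweightCounters behaviour; with reweighting off, every observation counts as 1, as in noReweighting:

def positive_counter(labels, weights, positive='bad', reweight=1):
    total = 0.0
    for i in range(len(labels)):
        if labels[i] == positive:
            if reweight:
                total = total + weights[i]      # reweightCounters
            else:
                total = total + 1.0             # noReweighting
    return total

labels  = ['bad', 'good', 'bad']
weights = [2.0, 1.0, 0.5]
print(positive_counter(labels, weights))                # 2.5 with weights
print(positive_counter(labels, weights, 'bad', 0))      # 2.0 without weights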

If all the required parameters are set the defined test task can be executed.

Figure 15.21. Classification Repository


The whole process of creating and executing the test task can be done in a Gython script:

ctt = ClassificationTestTask()                                  # create the test task object
ctt.setModelName('classification_model')                        # if the required model exists
ctt.setTestDataName('german_credit_pd')                         # specify the test data set
ctt.setTestDataTargetAttributeName('Class')                     # specify the target attribute
ctt.setPositiveTargetValue('bad')                               # specify the positive target category
#ctt.setNumberOfQuantiles(20)                                   # optionally set other options
ctt.setTestResultName('classification_results')                 # specify the name of the test results

save('classification_tt', ctt)
execute('classification_tt')

Classification Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window and the result components - the objects contained in the test result object.

Test Statistics

Table 15.6. Classification Test Statistics

  • Accuracy - the accuracy of the model (the fraction of properly assigned cases).
  • Improperly Assigned - the number of cases for which the prediction is not equal to the actual target value.
  • Properly Assigned Cases - the number of cases for which the prediction is equal to the actual target value.
  • Total Cases - the total number of cases.

Result Components

The other statistics available in AdvancedMiner are:

K-S Analysis

The Kolmogorov-Smirnov analysis consists of two charts: the true positive rate and the false positive rate (both statistics are described below), plotted for each value of the discrimination threshold. The bigger the distance between those two lines, the better the model is.

There are some additional statistics connected with the K-S Chart (a sketch of their computation follows the list):

  • K-S Statistics - the maximal distance between the True Positive Rate and False Positive Rate lines.
  • K-S Statistics Score Threshold - the score threshold yielding the maximal distance.
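
As an illustration, the sketch below (plain Python, not the AdvancedMiner implementation; the function name is hypothetical) sweeps the score threshold over all observed scores and tracks the maximal distance between the True Positive Rate and the False Positive Rate:

def ks_statistic(scores, labels):
    # labels: 1 for the positive class, 0 otherwise; assumes both classes occur
    pairs = sorted(zip(scores, labels))
    pairs.reverse()                               # highest scores first
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    tp = 0
    fp = 0
    best_dist = 0.0
    best_threshold = None
    for (score, label) in pairs:
        if label == 1:
            tp = tp + 1
        else:
            fp = fp + 1
        dist = abs(tp / float(pos_total) - fp / float(neg_total))
        if dist > best_dist:
            best_dist = dist
            best_threshold = score
    return best_dist, best_threshold    # K-S Statistics and its Score Threshold
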
ROC Analysis

The receiver operating characteristic (ROC) is an alternative measure of the quality of the classifier. Having one category selected as positive, ROC is a plot of the number of true positives (i.e. observations for which both data-observed and model-predicted target values are equal to the positive category) vs. the number of false positives (i.e. observations having a positive predicted category and a different observed category) for each value of the discrimination threshold (the probability of predicting the positive category) in the range from zero to one.

The best possible prediction model results in a graph that is a point in the upper left corner of the ROC plot, i.e. 100% sensitivity (all true positives are found) and 100% specificity (no false positives are found). A completely random predictor would yield a straight line at an angle of 45 degrees from the horizontal axis, from bottom left to top right (because as the threshold is raised, equal numbers of true and false positives would be let in). The results below this line would suggest a detector that gives wrong results consistently.

There are some additional statistics connected with the ROC Chart (a sketch follows the list):

  • ROC Area - the area under the ROC Curve (AUC). The area equal to 1 represents a perfect classifier model, the area equal to 0.5 represents a random prediction. Usually the area under the ROC curve above 0.9 indicates an excellent model, the values in the range 0.8-0.9 indicate a good model, the values in the range 0.7-0.8 indicate a fair model, and values below 0.7 indicate a poor model.

  • Gini Coefficient - a statistic very closely related to the ROC Area. It is equal to 2*AUC - 1.
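
Both statistics can be computed directly from scored observations. The sketch below (plain Python, illustrative only, not the AdvancedMiner implementation) uses the rank interpretation of the AUC: the probability that a randomly chosen positive observation is scored higher than a randomly chosen negative one.

def roc_area(scores, labels):
    # labels: 1 for the positive class, 0 otherwise
    pos = [scores[i] for i in range(len(labels)) if labels[i] == 1]
    neg = [scores[i] for i in range(len(labels)) if labels[i] == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins = wins + 1.0
            elif p == q:
                wins = wins + 0.5                 # ties count as half a win
    return wins / (len(pos) * len(neg))

auc  = roc_area([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0])
gini = 2 * auc - 1                                # Gini Coefficient = 2*AUC - 1
print("AUC=%.2f Gini=%.2f" % (auc, gini))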

Confusion Matrix and targetAnalysis

The confusion matrix contains information about the actual and predicted classifications obtained by the Classification Test Task. The value in each cell represents the number of examples for which the actual value of the target is given by the row and the value predicted by the model is given by the column. This representation implies that the numbers on the diagonal correspond to correct classifications, while the other numbers correspond to mistakes.

In the example below, the value 6.0 has the following interpretation: there are 6 cases which have been classified by the model as "van" but in fact should be classified as "bus".

Figure 15.22. Confusion matrix


In the case of a binary classification problem, the symbols TP, FN, FP and TN (true positives, false negatives, false positives and true negatives) are defined. With this notation it is easy to define the additional statistics available in the targetAnalysis node (a sketch follows the note below).

Figure 15.23. Confusion matrix - basic notation


Note

In the case of multi-class problems, for the calculation of the additional statistics one class is defined as positive, while the remaining classes are treated together as one negative class.
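
Using the TP, FN, FP and TN counters, several derived rates follow directly, as in the sketch below (plain Python; the exact set of statistics shown in the targetAnalysis node may differ, and the names here are illustrative):

def target_rates(tp, fn, fp, tn):
    # assumes none of the denominators is zero
    total = float(tp + fn + fp + tn)
    return {
        'accuracy':            (tp + tn) / total,
        'true positive rate':  tp / float(tp + fn),   # sensitivity
        'false positive rate': fp / float(fp + tn),
        'precision':           tp / float(tp + fp),
    }

print(target_rates(40, 10, 5, 45))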

Lift Analysis

Lift is a way to measure the quality of a classification model. It is calculated and plotted for a given value of the target attribute, which is called the positive value. First, the observations are sorted in descending order according to the score they were given by the model. The horizontal axis represents the sorted observations in percentage points, e.g. 10% means the 10% of observations with the highest scores.

The Lift value for a given percentage is the ratio of the density of positive target values within that percentage to the density of positive target values in the whole population. This shows how the classification model differs from the simplest possible model (random selection from the population, for which the lift value is always 1) or from other models.

Usually the lift value is calculated for discrete chunks called quantiles, e.g. if the lift is computed for 10 quantiles, then the first quantile contains the best 10% of observations, the second contains the observations from 10% to 20%, and so on; the last contains the observations from 90% to 100%, i.e. the worst 10%.
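
A minimal sketch of this computation in plain Python (illustrative only, not the AdvancedMiner implementation; it assumes the number of observations is divisible by the number of quantiles):

def lift_per_quantile(scores, labels, quantiles=10):
    # labels: 1 for the positive category, 0 otherwise
    pairs = sorted(zip(scores, labels))
    pairs.reverse()                                 # best scores first
    base_rate = sum(labels) / float(len(labels))    # positive density overall
    size = int(len(pairs) / quantiles)
    lifts = []
    for q in range(quantiles):
        chunk = pairs[q * size:(q + 1) * size]
        hits = sum([label for (score, label) in chunk])
        lifts.append((hits / float(size)) / base_rate)   # random model -> 1.0
    return lifts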

In addition to KSAnalysis, ROCAnalysis and liftAnalysis the user can find general statistics such as:

  • Maximal Score - the maximal score given to an observation by the model
  • Minimal Score - the minimal score given to an observation by the model
  • Positive Cases - the number of cases from the positive class in the test dataset
  • Cases - the total number of cases in the test dataset.

Note

The ROC and Lift are computed only if the Compute Quantiles option is enabled.

Survival Test Task

Creating the Test Task

In this example we will use the 'HIV' data which can be found in '..\Client\scripts\scripts\data\survival' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: SurvivalTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of necessary settings:

Figure 15.24. Test Reports


Note

SurvivalTestTask may be performed only for the Cox model.

The errors have the same meaning as in the approximation test task. There is one additional error:

  • No censoredCategory specified - the name of the censored category must be specified.

All the parameters except censoredCategory can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The last parameter, censoredCategory, should be set in the Properties Window. The Properties Window may also contain some optional parameters.

Table 15.7. Survival Test Task Options

  • Censor Name - the name of the censor attribute. Default: NULL.

  • Censored Category - the category indicating that the element is censored. Possible values: the label of the chosen class. Default: NULL.

  • First Time Point - the lower bound of the time scale interval in which the survival function values are calculated. Possible values: integer numbers. Default: NULL.

  • Last Time Point - the upper bound of the time scale interval in which the survival function values are calculated. Possible values: integer numbers. Default: NULL.

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

  • Number of Lift Quantiles - the number of quantiles used for Lift calculation. Possible values: positive integer numbers. Default: 50.

  • Number of Time Points - the number of time points at which the survival function will be calculated. Possible values: integer numbers. Default: 10.

  • Target - the name of the target attribute. Default: NULL.

If all the required parameters are set the defined test task can be executed.

Figure 15.25. Survival Repository


The whole process of creating and executing the test task can be done in a Gython script:

stt = SurvivalTestTask()                                        # create the test task object
stt.setModelName('survival_model')                              # if the required model exists
stt.setTestDataName('HIV_pd')                                   # specify the test data set
stt.setTestDataTargetAttributeName('days')                      # specify the target attribute
stt.setCensorName('censor')                                     # specify the censor attribute
stt.setCensoredCategory('Y')                                    # specify the censored category
#stt.setFirstTimePoint(25)                                      # optionally set other options
stt.setTestResultName('survival_results')                       # specify the name of the test results

save('survival_tt', stt)
execute('survival_tt')

Survival Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the result components - the objects contained in the test result object.

Result Components

Lift charts at each time point are available. The Lift component and its statistics and options are described in detail in the Classification Test Task section.

Time Series Test Task

Note

Model Testing for Time Series is an experimental feature.

Creating the Test Task

In this example we will use the 'stock_market' dataset, which can be found in '..\Client\scripts\scripts\data\time' in the AdvancedMiner directory. To perform the whole task follow these steps:

  1. create a physical data object for the test data
  2. create the task: TimeSeriesTestTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear with a list of necessary settings:

Figure 15.26. Test Report


The error list is exactly the same as for the approximation test task. The Properties Window may also contain some optional parameters.

Table 15.8. Time Series Test Task Options

  • Liberal Execution - if TRUE, 'liberal' execution is preferred (the task does not stop on minor errors). Possible values: TRUE / FALSE. Default: TRUE.

  • Number of Time Points - the number of time points for which the test statistics will be calculated. Possible values: positive integer numbers. Default: 10.

  • Target - the name of the target attribute. Default: NULL.

  • Time Point Attribute Name - the name of the time point attribute. Default: NULL.

If all the required parameters are set the defined test task can be executed.

Figure 15.27. Time Series Repository


The whole process of creating and executing the test task can be done in a Gython script:

tstt = TimeSeriesTestTask()                                     # create the test task object
tstt.setModelName('timeseries_model')                           # if the required model exists
tstt.setTestDataName('stock_market_pd')                         # specify the test data set
tstt.setTestDataTargetAttributeName('price_change')             # specify the target attribute
tstt.setTimePointAttributeName('time')                          # specify the time attribute
#tstt.setNumberOfTimePoints(10)                                 # optionally set other options
tstt.setTestResultName('timeseries_results')                    # specify the name of the test results

save('timeseries_tt', tstt)
execute('timeseries_tt')

Time Series Test Task Results

After execution, the test results object will appear in the Metadata Repository. It can be used to obtain detailed information about the test results: the test statistics in the Properties Window.

Test Statistics

  • Mean Absolute Error (MAE) - the mean of the absolute values of the prediction errors for the test data: $\frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$

  • Mean Actual Value (MAV) - the observed mean of the whole series: $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$

  • Actual Value Variance (AVV) - the observed variance of the whole series: $\frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2$

  • Mean Predicted Variance (MPV) - the mean value of the predicted variances: $\frac{1}{N}\sum_{i=1}^{N} \sigma_i^2$

  • RMS Error - the square root of the mean squared errors for the test data: $\sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$

  • R-Squared Error - $R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$

where $\bar{y}$ is the mean value of the series, $\hat{y}_i$ denotes the model prediction for the $i$-th observation, $\sigma_i^2$ denotes the model variance for the $i$-th observation, $y_i$ denotes the series value for the $i$-th observation, and $N$ is the number of observations in the series.
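
For reference, the sketch below computes the R-Squared Error under the standard definition assumed above (plain Python, illustrative only, not the AdvancedMiner implementation):

def r_squared(y, yhat):
    n = len(y)
    mean_y = sum(y) / float(n)
    ss_res = sum([(y[i] - yhat[i]) ** 2 for i in range(n)])   # residual sum of squares
    ss_tot = sum([(y[i] - mean_y) ** 2 for i in range(n)])    # total sum of squares
    return 1.0 - ss_res / ss_tot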

Calculate Test Result Task

This task performs a special kind of classification test, in which a scoring table is used instead of a classification model.

Creating the Test Task

To perform this task the following objects are required: a scored table in the database, a physical data object for the scored data, and a test task in the repository.

The scored table must contain the following columns:

  • the real target

  • the classes returned by the model

  • the probability (or score) returned by the model

Let us use the sample data presented below. After executing this script, the sample data set and the scored data will appear in the database.

Example 15.1. Preparing a classification test result task

table "input_table":
    temp   height   Class
    -50    1        "good"
    -40    1        "good"
    -30    1        "good"
    -20    10       "good"
    -10    10       "good"
     10    10       "good"
     20    20       "bad"
     30    10       "good"
     40    10       "good"
     50    10       "bad"
     60    5        "good"
     70    20       "bad"
     80    30       "bad"
     90    1        "good"
     100   10       "bad"

# a simple classification model
trans "scored_Table""<-"input_Table":
    score = temp/100.0
    newClass = "good"
    if score>0.5:
        newClass ="bad"

The remaining steps are well known:

  1. create a physical data object for the scored data
  2. create the task: CalculateTestResultTask
  3. choose the 'Test' action from the context menu to learn what to do next. A test report will appear containing a list of necessary settings:

Figure 15.28. Test Reports


The particular errors have the following meaning:

  • no physicalDataName specified - a PhysicalData object containing the scored data must be assigned to the test task.

  • no target specified - the name of the real target variable must be provided.

  • no predictedTarget specified - the name of the variable in the scored data containing the classes assigned by the model must be provided.

  • no positiveTargetValue specified - the positive category of the target variable must be set.

  • no testResultName specified - the name of the model where the testing results will be placed must be provided.

  • no score specified - the name of the variable containing the score (probability) from the model must be provided.

Only the parameters testResultName and physicalDataName can be set in the standard way by right-clicking on the test task node and choosing the 'Add' action. The other parameters should be set in the Properties Window. Because the Calculate Test Result Task is almost the same as the Classification Test Task, the remaining parameters in the Properties Window are the same (see Classification Test Task Options).

If all the required parameters are set the defined test task can be executed.

Figure 15.29. Calculate Test Result Repository


The whole process of creating and executing the test task can be done in a Gython script:

ctrt = CalculateTestResultTask()                                            # create the test task object
ctrt.setPhysicalDataName('scoredTable_pd')                                  # specify the physical data object for the scored data
ctrt.setActualName("Class")                                                 # specify the real target attribute
ctrt.setPositiveTargetValue('bad')                                          # specify the positive target category
ctrt.setPredictedName('newClass')                                           # specify the name of the variable containing the classes assigned by the model
ctrt.setScore('score')                                                      # specify the name of the variable with score
#ctrt.setNumberOfQuantiles(25)                                              # optionally set other options
ctrt.setTestResultName('testresult_results')                                # specify the name of the test result

save('testresult_tt',ctrt)
execute('testresult_tt')

Calculate Test Result Task Results

The results of this task are exactly the same as for the Classification Test Task.