Shorthand methods of building, testing and applying models

AdvancedMiner also includes a number of Gython commands designed to streamline the process of building, testing and applying classification, approximation and clustering models. These commands are described in this section.

Approximator

The approximator command creates a set of objects in the repository for building and testing an approximation model.

Syntax:

approximator(trainTableName, target, algorithmSettings[, testTableName = s]
[, prefix = s][, validationTableName = s][, usageDict = d][, usages = d][, sqlAttributes = d]
[, categorical = l][, numerical = l][, repoName = s][, scoringLanguage = langname])

Explanation of parameters:

  • trainTableName - the name of a database table with the training data.

  • target - the name of the target attribute in the training and testing datasets.

  • algorithmSettings - an object with algorithm settings. Valid objects are FeedforwardNeuralNetSettings, IRLSSettings, RegressionSettings and WeightedRegressionSettings.

  • testTableName - the name of a database table with the testing data; if omitted, the training data will be used.

  • prefix - a string which will be prepended to the names of all created objects; if omitted, the name of the training data table will be used.

  • validationTableName - the name of a database table with the validation data.

  • usageDict - a dictionary with optional usages for attributes; to specify the usage of an attribute, add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:

    usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
                        

  • usages - a dictionary with optional usages for attributes; to specify the usage of one or more attributes, add an entry to the dictionary with a UsageOption as the key and an attribute name or a list of attribute names as the value, e.g.:

    usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
                        

  • sqlAttributes - a dictionary with SQL attributes, with keys corresponding to attribute names and values corresponding to SQL statements.

  • categorical - a list of attributes which should be treated as categorical.

  • numerical - a list of attributes which should be treated as numerical.

  • repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.

  • scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
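The usageDict and usages parameters carry the same information in two orientations: attribute name to UsageOption, or UsageOption to attribute name(s). The plain-Python sketch below shows how a usages dictionary maps onto the equivalent usageDict form. UsageOption here is a hypothetical stand-in enumeration defined only for this illustration (the real class comes from AdvancedMiner), and the helper name usages_to_usage_dict is likewise invented. How AdvancedMiner treats an attribute given conflicting usages is not described above, so the sketch simply rejects such input.

```python
from enum import Enum


class UsageOption(Enum):
    # Hypothetical stand-in for AdvancedMiner's UsageOption, for illustration only.
    active = "active"
    inactive = "inactive"
    supplementary = "supplementary"


def usages_to_usage_dict(usages):
    """Convert {UsageOption: name or [names]} into {name: UsageOption}."""
    usage_dict = {}
    for option, names in usages.items():
        # A single attribute may be given as a plain string instead of a list.
        if isinstance(names, str):
            names = [names]
        for name in names:
            # Assumption of this sketch: one attribute cannot get two different usages.
            if name in usage_dict and usage_dict[name] != option:
                raise ValueError("conflicting usages for %r" % name)
            usage_dict[name] = option
    return usage_dict


usages = {UsageOption.inactive: ['att1', 'att2'],
          UsageOption.supplementary: 'att3'}
print(usages_to_usage_dict(usages))
```

Either form ends up describing the same per-attribute usage; usages is simply more compact when many attributes share one option.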

Example 15.8. Using the approximator command

approximator('iris', 'sepallength', LinearRegressionSettings() )
            

Classifier

The classifier command creates a set of objects in the repository for building and testing a classification model.

Syntax:

classifier(trainTableName, target, algorithmSettings[, testTableName = s]
[, positive = s][, prefix = s][, validationTableName = s][, usageDict = d][, usages = d]
[, ldTableName = s][, sqlAttributes = d][, categorical = l][, numerical = l][, testQuantiles = n]
[, inactiveAsDefault = 0 | 1][, repoName = s][, scoringLanguage = langname])

Explanation of parameters:

  • trainTableName - the name of a database table with the training data.

  • target - the name of the target attribute in the training and testing datasets.

  • algorithmSettings - an object with algorithm settings. Valid objects are BivariateProbitSettings, DiscriminantSettings, FeedforwardNeuralNetSettings, KohonenClassificationSettings, LogisticRegressionSettings and TreeSettings.

  • testTableName - the name of a database table with the testing data; if omitted, the training data will be used.

  • positive - the target category which will be treated as good for the purpose of computing quantiles in the classification test task; if omitted, quantiles will not be computed, and only the confusion matrix and target statistics will be computed.

  • prefix - a string which will be prepended to the names of all created objects; if omitted, the name of the training data table will be used.

  • validationTableName - the name of a database table with the validation data.

  • usageDict - a dictionary with optional usages for attributes; to specify the usage of an attribute, add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:

    usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
                        

  • usages - a dictionary with optional usages for attributes; to specify the usage of one or more attributes, add an entry to the dictionary with a UsageOption as the key and an attribute name or a list of attribute names as the value, e.g.:

    usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
                        

  • ldTableName - the table name for constructing logical data.

  • sqlAttributes - a dictionary with SQL attributes, with keys corresponding to attribute names and values corresponding to SQL statements.

  • categorical - a list of attributes which should be treated as categorical.

  • numerical - a list of attributes which should be treated as numerical.

  • testQuantiles - the number of quantiles used to compute Lift and ROC; if omitted, 50 quantiles are used.

  • inactiveAsDefault - if set to 1 will cause all attributes except the target to be inactive unless specified otherwise with usages or usageDict; if set to 0 (default) all attributes are assumed to be active.

  • repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.

  • scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
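The positive and testQuantiles parameters feed the Lift and ROC computations of the classification test. As a rough illustration of what a cumulative lift over quantiles measures, here is a textbook computation in plain Python. It is not AdvancedMiner's implementation: the function name is hypothetical, and the cut-off rule (integer division of q * n by the number of quantiles) is an assumption that may differ from how the product handles ties or uneven group sizes. The scores and the 'good'/'bad' labels are made-up data.

```python
def cumulative_lift(scores, targets, positive, n_quantiles=50):
    """Cumulative lift for each of n_quantiles cut-offs.

    Observations are sorted by descending score; for the q-th cut-off, the
    positive rate among the top q/n_quantiles of observations is divided by
    the positive rate in the whole dataset.
    """
    if not scores or len(scores) != len(targets):
        raise ValueError("scores and targets must be non-empty and equally long")
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(1 for _, t in ranked if t == positive) / n
    if overall_rate == 0:
        raise ValueError("the positive category never occurs")
    lifts = []
    for q in range(1, n_quantiles + 1):
        cutoff = max(1, q * n // n_quantiles)  # size of the top group
        top = ranked[:cutoff]
        top_rate = sum(1 for _, t in top if t == positive) / len(top)
        lifts.append(top_rate / overall_rate)
    return lifts


# Four scored observations, two quantiles, 'good' treated as the positive category.
print(cumulative_lift([0.9, 0.8, 0.3, 0.1], ['good', 'good', 'bad', 'bad'], 'good', n_quantiles=2))
# → [2.0, 1.0]
```

A lift of 2.0 in the first quantile means the top half of the ranking contains positives at twice the overall rate; the last quantile always covers the whole dataset, so its lift is 1.0.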

Example 15.9. Basic usage of the classifier command

classifier('german_credit', 'Class', TreeSettings() )
            

Example 15.10. More complex usage of the classifier command

classifier('vehicle', 'Class', TreeSettings(minNodeSize=5), positive = 'van' )
            

Example 15.11. Setting usage options with the classifier command

settings = TreeSettings()
settings.minNodeSizePrc = 15
classifier('german_credit', 'Class', settings, usages = {UsageOption.inactive: ['credit_amount', 'credit_history']} )
            

Clusterer

The clusterer command creates a set of objects in the repository for building and testing a clustering model.

Syntax:

clusterer(trainTableName[, algorithmSettings][, prefix = s]
[, validationTableName = s][, usageDict = d][, usages = d][, ldTableName = s][, sqlAttributes = d]
[, categorical = l][, numerical = l][, testQuantiles = n][, repoName = s][, scoringLanguage = langname])

Explanation of parameters:

  • trainTableName - the name of a database table with the training data.

  • algorithmSettings - an object with algorithm settings. Valid objects are KohonenClusteringSettings and KMeansSettings; by default, no algorithm settings are used.

  • prefix - a string which will be prepended to the names of all created objects; if omitted, the name of the training data table will be used.

  • validationTableName - the name of a database table with the validation data.

  • usageDict - a dictionary with optional usages for attributes; to specify the usage of an attribute, add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:

    usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
                        

  • usages - a dictionary with optional usages for attributes; to specify the usage of one or more attributes, add an entry to the dictionary with a UsageOption as the key and an attribute name or a list of attribute names as the value, e.g.:

    usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
                        

  • ldTableName - the table name for constructing logical data.

  • sqlAttributes - a dictionary with SQL attributes, with keys corresponding to attribute names and values corresponding to SQL statements.

  • categorical - a list of attributes which should be treated as categorical.

  • numerical - a list of attributes which should be treated as numerical.

  • repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.

  • scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
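One way to picture sqlAttributes is as derived columns added to the query that reads the training table. The plain-Python sketch below builds such a SELECT statement from a table name and a sqlAttributes-style dictionary. It is only a mental model under that assumption, not a description of the SQL AdvancedMiner actually generates; the helper name and the amount_k expression are hypothetical, and the `*, expr AS name` form assumes a SQL dialect that accepts further columns after `*`.

```python
def select_with_sql_attributes(table_name, sql_attributes):
    """Build a SELECT that adds one derived column per sqlAttributes entry.

    Keys are the new attribute names and values are SQL expressions evaluated
    against table_name. Entries are sorted by name so the output is stable.
    """
    derived = ["%s AS %s" % (expression, name)
               for name, expression in sorted(sql_attributes.items())]
    columns = ", ".join(["*"] + derived)
    return "SELECT %s FROM %s" % (columns, table_name)


print(select_with_sql_attributes('german_credit', {'amount_k': 'credit_amount / 1000'}))
# → SELECT *, credit_amount / 1000 AS amount_k FROM german_credit
```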

Example 15.12. Basic usage of the clusterer command

clusterer('german_credit')
            

Example 15.13. More complex usage of the clusterer command

clusterer('iris', KMeansSettings(), usages={UsageOption.inactive : ['Class', 'sepalwidth'] } )
            

Example 15.14. Setting usage options with the clusterer command

clusterer('german_credit', KMeansSettings(), usageDict = {'credit_amount' : UsageOption.inactive, 'credit_history' : UsageOption.inactive} )
            

Applier

The applier command offers a streamlined way to apply approximation, classification, clustering, time series and survival models.

Syntax:

applier(inputTableName, modelName[, outputTableName = s][, scoreColumnName = s]
[, targetColumnName = s][, clusterColumnName = s][, columnsToCopy = l][, positive = s]
[, clusterLevel = n][, clusterLevelFromLeaf = 0 | 1][, deleteObjects = 0 | 1][, customApplyItems = l]
[, prefix = s][, firstTimePoint = n][, lastTimePoint = n][, numberOfTimePoints = n][, repoName = s])
        

Explanation of parameters:

  • inputTableName - the name of the table to score; this table must contain all the attributes specified in the model signature.

  • modelName - the name of the model to use for scoring.

  • outputTableName - the name of the output table to create; if omitted, the name of the output table is formed by appending '_scored' to the name of the input table.

  • scoreColumnName - the name of the column containing the scores in the output table; if omitted, this column will be named 'score'.

  • targetColumnName - the name of the column with the predicted target values in the output table; if omitted, the name 'target' will be used (applicable to classification models).

  • clusterColumnName - the name of the column with the assigned cluster in the output table; if omitted, the name 'cluster' will be used (applicable to clustering models).

  • columnsToCopy - a list of columns to copy from the input table to the output table, e.g.:

    columnsToCopy = ['Class', 'sepallength']

    If a copied column should have a different name in the output table, specify the new name in a pair with the original name, e.g.:

    columnsToCopy = [('Class', 'real_target'), 'sepallength']

    This will copy the column 'Class' from the input table to the output table renaming it to 'real_target'.

  • positive - the positive target value to score for; if omitted when applying a statistical model, this value will be taken from the model itself (applicable to classification models).

  • clusterLevel - the level of the clustering model tree at which the observations will be assigned to clusters (applicable to hierarchical clustering models).

  • clusterLevelFromLeaf - if 0, clusterLevel is counted from the root towards the leaves; if 1, it is counted from the leaves towards the root (applicable to hierarchical clustering models).

  • deleteObjects - if 1, then the objects created in the repository will be deleted after applying the model; if 0 (default), the objects will not be deleted.

  • customApplyItems - a list of custom output items to add to ApplyOutput; they can be used, e.g., to obtain the second-best predicted target.

  • repoName - the name of the repository in which the objects will be created.
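The columnsToCopy rules and the default output table name can be summarized in a short plain-Python sketch. The function names are hypothetical, and rejecting two entries that would produce the same output column is an assumption of this sketch rather than documented applier behavior.

```python
def normalize_columns_to_copy(columns_to_copy):
    """Turn columnsToCopy entries into (input column, output column) pairs.

    A plain string keeps its name; an (original, new) pair renames the column.
    """
    pairs = []
    for entry in columns_to_copy:
        if isinstance(entry, str):
            source, target = entry, entry
        else:
            source, target = entry
        pairs.append((source, target))
    # Assumption of this sketch: output column names must be unique.
    targets = [target for _, target in pairs]
    if len(set(targets)) != len(targets):
        raise ValueError("two copied columns would share an output name")
    return pairs


def default_output_table_name(input_table_name):
    # If no output table name is given, '_scored' is appended to the input name.
    return input_table_name + "_scored"


print(normalize_columns_to_copy([('Class', 'real_target'), 'sepallength']))
# → [('Class', 'real_target'), ('sepallength', 'sepallength')]
print(default_output_table_name('iris'))
# → iris_scored
```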

Example 15.15. Streamlined building and applying of a clustering model

# after executing this script a table named 'german_credit_scored' should be created
clusterer('german_credit')
execute('german_credit_bt')
applier('german_credit_train', 'german_credit_model', clusterLevel=3, clusterLevelFromLeaf=0)
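
Example 15.15 requests clusterLevel=3 counted from the root (clusterLevelFromLeaf=0). The plain-Python sketch below illustrates the two counting directions on the root-to-leaf path of a single observation. It assumes that levels are counted from 0 and that counting from the leaf starts at the observation's own leaf node; neither convention is stated in the parameter descriptions, and the function and node names are hypothetical.

```python
def cluster_at_level(path, level, from_leaf=0):
    """Pick the node on a root-to-leaf path at the requested level.

    path: node identifiers ordered from the root to the leaf the observation
    falls into. Levels are 0-based in both directions (an assumption).
    """
    if not 0 <= level < len(path):
        raise ValueError("level %d is outside a path of length %d" % (level, len(path)))
    return path[len(path) - 1 - level] if from_leaf else path[level]


path = ['root', 'A', 'A.1', 'A.1.b']
print(cluster_at_level(path, 1))               # → A
print(cluster_at_level(path, 1, from_leaf=1))  # → A.1
```

In a tree whose leaves lie at different depths, counting from the root gives clusters of comparable granularity across branches, whereas counting from the leaf always stays a fixed number of steps above each observation's leaf.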
            

Example 15.16. Streamlined building, testing and applying of an approximation model

# after executing this script a table named 'iris_scored' should be created
approximator('iris', 'sepallength', LinearRegressionSettings() )
execute('iris_bt')
applier('iris_train', 'iris_model' )
            

Example 15.17. Streamlined building, testing and applying of a classification model with a secondary predicted target

# after executing this script a table named 'iris_scored' should be created and
# an Excel spreadsheet with the contents of this table will open
classifier('iris', 'Class', FeedforwardNeuralNetSettings(maxNumberOfIterations=10))
execute('iris_bt')
applier('iris_train', 'iris_model', customApplyItems=
    [ClassificationRankItem('2nd_target', ClassificationOutputType.predictedCategory, 1)] )
Office.createSpreadsheet().setCellDataArray(0,0, tableRead('iris_scored'))
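
In Example 15.17 the ClassificationRankItem is named '2nd_target' and given 1 as its last argument, which suggests zero-based ranking (0 for the best category). Under that assumption, the per-row selection can be sketched in plain Python as follows; the probabilities are made-up values, and ranked_category is a hypothetical helper, not part of AdvancedMiner.

```python
def ranked_category(probabilities, rank):
    """Return the category at a zero-based rank, ordered by descending probability.

    Ties keep the dictionary's insertion order because sorted() is stable.
    """
    ordered = sorted(probabilities, key=probabilities.get, reverse=True)
    if not 0 <= rank < len(ordered):
        raise ValueError("rank %d is out of range" % rank)
    return ordered[rank]


row = {'Iris-setosa': 0.1, 'Iris-versicolor': 0.6, 'Iris-virginica': 0.3}
print(ranked_category(row, 0))  # → Iris-versicolor
print(ranked_category(row, 1))  # → Iris-virginica
```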