AdvancedMiner also includes a number of Gython commands designed to streamline the process of building, testing and applying classification, approximation and clustering models. These commands are described in this section.
The approximator command creates a set of objects in the repository for building and testing an approximation model.
Syntax:
approximator(trainTableName, target, algorithmSettings[, testTableName = s]
[, prefix = s][, validationTableName = s][, usageDict = d][, usages = d][, sqlAttributes = d]
[, categorical = l][, numerical = l][, repoName = s][, scoringLanguage = langname])
Explanation of parameters:
trainTableName - the name of a database table with the training data.
target - the name of the target attribute in the training and testing datasets.
algorithmSettings - an object with algorithm settings. Valid objects are FeedforwardNeuralSettings, IRLSSettings, RegressionSettings and WeightedRegressionSettings.
testTableName - the name of a database table with the testing data; if omitted the training data will be used.
prefix - a string which will be pre-pended to the names of all created objects; if omitted the name of the training data table will be used.
validationTableName - the name of a database table with the validation data.
usageDict - a dictionary with optional usages for attributes; to specify the usage for an attribute add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:
usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
usages - a dictionary with optional usages for attributes; to specify usage for an attribute add an entry to the dictionary with key as the UsageOption and an attribute name or a list of attribute names as the value, e.g.:
usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
sqlAttributes - a dictionary with SQL attributes, with keys corresponding to attribute names and values to SQL statements.
categorical - a list of attributes which should be treated as categorical.
numerical - a list of attributes which should be treated as numerical.
repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.
scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
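For illustration only, several of the optional parameters above can be combined as in the following hypothetical call; the prefix value and the 'petalwidth' attribute are assumptions made for the sake of the example, not part of the reference:
# hypothetical sketch: build an approximation model of 'sepallength' on the 'iris' table,
# keeping 'Class' as a supplementary attribute and forcing 'petalwidth' to be numerical
approximator('iris', 'sepallength', RegressionSettings(),
             prefix = 'iris_reg',
             usages = { UsageOption.supplementary : 'Class' },
             numerical = ['petalwidth'])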
The classifier command creates a set of objects in the repository for building and testing a classification model.
Syntax:
classifier(trainTableName, target, algorithmSettings[, testTableName = s]
[, positive = s][, prefix = s][, validationTableName = s][, usageDict = d][, usages = d]
[, ldTableName = s][, sqlAttributes = d][, categorical = l][, numerical = l][, testQuantiles = n]
[, inactiveAsDefault = 0 | 1][, repoName = s][, scoringLanguage = langname])
Explanation of parameters:
trainTableName - the name of a database table with the training data.
target - the name of the target attribute in the training and testing datasets.
algorithmSettings - an object with algorithm settings. Valid objects are BivariateProbitSettings, DiscriminantSettings, FeedforwardNeuralNetSettings, KohonenClasificationSettings, LogisticRegressionSettings and TreeSettings.
testTableName - the name of a database table with the testing data; if omitted the training data will be used.
positive - the target category which will be treated as good for the purpose of computing quantiles in the classification test task; if omitted, quantiles will not be computed and only the confusion matrix and target statistics will be produced.
prefix - a string which will be pre-pended to the names of all created objects; if omitted the name of the training data table will be used.
validationTableName - the name of a database table with the validation data.
usageDict - a dictionary with optional usages for attributes; to specify the usage for an attribute add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:
usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
usages - a dictionary with optional usages for attributes; to specify usage for an attribute add an entry to the dictionary with key as the UsageOption and an attribute name or a list of attribute names as the value, e.g.:
usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
ldTableName - the table name for constructing logical data.
sqlAttributes - a dictionary with SQL attributes, with keys corresponding to attribute names and values to SQL statements.
categorical - a list of attributes which should be treated as categorical.
numerical - a list of attributes which should be treated as numerical.
testQuantiles - the number of quantiles used for the computation of Lift and ROC; if omitted 50 quantiles are used.
inactiveAsDefault - if set to 1, all attributes except the target are treated as inactive unless specified otherwise with usages or usageDict; if set to 0 (default), all attributes are assumed to be active.
repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.
scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
Example 15.9. Basic usage of the classifier command
classifier('german_credit', 'Class', TreeSettings() )
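The following hypothetical sketch shows how some of the optional parameters can be combined; the test table name, the 'good' target category and the 'Duration' attribute are assumptions used only for illustration:
# hypothetical sketch: build a decision tree on 'german_credit' with a separate test table,
# computing 20 test quantiles for the assumed positive category 'good'
classifier('german_credit', 'Class', TreeSettings(),
           testTableName = 'german_credit_test',
           positive = 'good',
           testQuantiles = 20,
           usageDict = { 'Duration' : UsageOption.inactive })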
The clusterer command creates a set of objects in the repository for building and testing a clustering model.
Syntax:
clusterer(trainTableName[, algorithmSettings][, prefix = s]
[, validationTableName = s][, usageDict = d][, usages = d][, ldTableName = s][, sqlAttributes = d]
[, categorical = l][, numerical = l][, testQuantiles = n][, repoName = s][, scoringLanguage = langname])
Explanation of parameters:
trainTableName - the name of a database table with the training data.
algorithmSettings - an object with algorithm settings. Valid objects are KohonenClusteringSettings and KMeansSettings; by default no algorithm settings are used.
prefix - a string which will be pre-pended to the names of all created objects; if omitted the name of the training data table will be used.
validationTableName - the name of a database table with the validation data.
usageDict - a dictionary with optional usages for attributes; to specify the usage for an attribute add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:
usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
usages - a dictionary with optional usages for attributes; to specify usage for an attribute add an entry to the dictionary with key as the UsageOption and an attribute name or a list of attribute names as the value, e.g.:
usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
ldTableName - the table name for constructing logical data.
sqlAttributes - a dictionary with SQL attributes, with keys corresponding to attribute names and values to SQL statements.
categorical - a list of attributes which should be treated as categorical.
numerical - a list of attributes which should be treated as numerical.
repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.
scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
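As an illustrative sketch only (the prefix value and the use of 'Class' as a supplementary attribute are assumptions, not part of the reference):
# hypothetical sketch: build a k-means clustering model on 'german_credit',
# keeping 'Class' supplementary so it does not influence the clusters
clusterer('german_credit', KMeansSettings(),
          prefix = 'gc_kmeans',
          usages = { UsageOption.supplementary : 'Class' })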
The applier command offers a streamlined way to apply approximation, classification, clustering, time series and survival models.
Syntax:
applier(inputTableName, modelName[, outputTableName = s][, scoreColumnName = s]
[, targetColumnName = s][, clusterColumnName = s][, columnsToCopy = l][, positive = s]
[, clusterLevel = n][, clusterLevelFromLeaf = 0 | 1][, deleteObjects = 0 | 1][, customApplyItems = l]
[, prefix = s][, firstTimePoint = n][, lastTimePoint = n][, numberOfTimePoints = n][, repoName = s])
Explanation of parameters:
inputTableName - the name of the table to score; this table must contain all the attributes specified in the model signature.
modelName - the name of the model to use for scoring.
outputTableName - the name of the output table to create; if omitted, the name of the output table will be obtained by suffixing '_scored' to the name of the input table.
scoreColumnName - the name of the column containing the scores in the output table; if omitted this column will be named 'score'.
targetColumnName - the name of the column with the predicted target values in the output table; if omitted, the name 'target' will be used. (applicable to classification models)
clusterColumnName - the name of the column with the assigned cluster in the output table; if omitted, the name 'cluster' will be used. (applicable to clustering models)
columnsToCopy - a list of columns to copy from the input table to the output table, e.g.:
columnsToCopy = ['Class', 'sepallength']
if it is necessary to change the name of the copied column in the output table, the new name should be specified in a pair with the original name, e.g.:
columnsToCopy = [('Class', 'real_target'), 'sepallength']
This will copy the column 'Class' from the input table to the output table, renaming it to 'real_target'.
positive - the positive target value to score for; if omitted in the case of applying a statistical model, this value will be taken from the model itself. (applicable to classification models)
clusterLevel - denotes the level of a clustering model tree at which the observations will be assigned to a cluster. (applicable to hierarchical clustering models)
clusterLevelFromLeaf - if 0 then clusterLevel is computed from root to leaf, if 1, it is computed from leaf to root. (applicable to hierarchical clustering models)
deleteObjects - if 1, then the objects created in the repository will be deleted after applying the model; if 0 (default), the objects will not be deleted.
customApplyItems - a list of custom output items to add to ApplyOutput; it can be used, for example, to obtain the second best predicted target.
repoName - the name of the repository in which the objects will be created.
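For illustration, the optional parameters above could be combined as in the following hypothetical call; the output table name, the score column name and the 'good' category are assumptions chosen for the example:
# hypothetical sketch: score 'german_credit' with a previously built classification model,
# copying the original 'Class' column as 'real_target' and writing to a custom output table
applier('german_credit', 'german_credit_model',
        outputTableName = 'german_credit_scores',
        scoreColumnName = 'probability',
        columnsToCopy = [('Class', 'real_target')],
        positive = 'good')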
Example 15.15. Streamlined building and applying of a clustering model
# after executing this script a table named 'german_credit_scored' should be created
clusterer('german_credit')
execute('german_credit_bt')
applier('german_credit_train', 'german_credit_model', clusterLevel=3, clusterLevelFromLeaf=0)
Example 15.16. Streamlined building, testing and applying of an approximation model
# after executing this script a table named 'iris_scored' should be created
approximator('iris', 'sepallength', LinearRegressionSettings() )
execute('iris_bt')
applier('iris_train', 'iris_model' )
Example 15.17. Streamlined building, testing and applying of a classification model with a secondary predicted target
# after executing this script a table named 'iris_scored' should be created and
# an Excel spreadsheet with the contents of this table will open
classifier('iris', 'Class', FeedforwardNeuralNetSettings(maxNumberOfIterations=10))
execute('iris_bt')
applier('iris_train', 'iris_model', customApplyItems=
[ClassificationRankItem('2nd_target', ClassificationOutputType.predictedCategory, 1)] )
Office.createSpreadsheet().setCellDataArray(0,0, tableRead('iris_scored'))