Shorthand methods of building, testing and applying models

Approximator

The approximator command creates a set of objects in the repository for building and testing an approximation model.

Syntax:

approximator(trainTableName, target, algorithmSettings[, testTableName = s]
[, prefix = s][, validationTableName = s][, usageDict = d][, usages = d][, sqlAttributes = d]
[, categorical = l][, numerical = l][, repoName = s][, scoringLanguage = langname])

Explanation of parameters:

trainTableName - the name of a database table with the training data,
target - the name of the target attribute in the training and testing datasets,
algorithmSettings - an object with algorithm settings. Valid objects are FeedforwardNeuralSettings, IRLSSettings, RegressionSettings and WeightedRegressionSettings.
testTableName - the name of a database table with the testing data; if omitted the training data will be used.
prefix - a string which will be pre-pended to the names of all created objects; if omitted the name of the training data table will be used.
validationTableName - the name of a database table with the validation data.
usageDict - a dictionary with optional usages for attributes; to specify the usage for an attribute, add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:
```
usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
```
usages - a dictionary with optional usages for attributes; to specify the usage for one or more attributes, add an entry to the dictionary with a UsageOption as the key and an attribute name or a list of attribute names as the value, e.g.:
```
usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
```
sqlAttributes - a dictionary of SQL attributes, with attribute names as keys and SQL statements as values.
categorical - a list of attributes which should be treated as categorical.
numerical - a list of attributes which should be treated as numerical.
repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.
scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
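
The usageDict and usages parameters carry the same information, keyed differently. The sketch below is a plain-Python illustration of how one form maps onto the other. The helper names are hypothetical and not part of AdvancedMiner, and plain strings stand in for UsageOption values so that the code runs outside the application.

```python
# Hypothetical helpers, not part of AdvancedMiner. Plain strings stand in
# for UsageOption values so the sketch runs in any Python interpreter.

def usage_dict_to_usages(usage_dict):
    """Turn {attribute: option} into {option: [attributes]}."""
    usages = {}
    for attribute, option in sorted(usage_dict.items()):
        usages.setdefault(option, []).append(attribute)
    return usages


def usages_to_usage_dict(usages):
    """Turn {option: attribute or [attributes]} into {attribute: option}."""
    usage_dict = {}
    for option, attributes in usages.items():
        # A single attribute name may be given instead of a list.
        if isinstance(attributes, str):
            attributes = [attributes]
        for attribute in attributes:
            usage_dict[attribute] = option
    return usage_dict


by_attribute = {'att1': 'inactive', 'att2': 'inactive', 'att3': 'supplementary'}
by_option = usage_dict_to_usages(by_attribute)
print(by_option)
# {'inactive': ['att1', 'att2'], 'supplementary': ['att3']}
```

Which form is more convenient depends on the data: the attribute-keyed form suits a few individually chosen attributes, while the option-keyed form is shorter when many attributes share one usage.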

Example 15.8. Using the approximator command

approximator('iris', 'sepallength', LinearRegressionSettings() )

Classifier

The classifier command creates a set of objects in the repository for building and testing a classification model.

Syntax:

classifier(trainTableName, target, algorithmSettings[, testTableName = s]
[, positive = s][, prefix = s][, validationTableName = s][, usageDict = d][, usages = d]
[, ldTableName = s][, sqlAttributes = d][, categorical = l][, numerical = l][, testQuantiles = n]
[, inactiveAsDefault = 0 | 1][, repoName = s][, scoringLanguage = langname])

Explanation of parameters:

trainTableName - the name of a database table with the training data,
target - the name of the target attribute in the training and testing datasets,
algorithmSettings - an object with algorithm settings. Valid objects are BivariateProbitSettings, DiscriminantSettings, FeedforwardNeuralNetSettings, KohonenClasificationSettings, LogisticRegressionSettings, TreeSettings.
testTableName - the name of a database table with the testing data; if omitted the training data will be used.
positive - the target category which will be treated as good for the purpose of computing quantiles in the classification test task; if omitted, quantiles will not be computed and only the confusion matrix and target statistics will be produced.
prefix - a string which will be pre-pended to the names of all created objects; if omitted the name of the training data table will be used.
validationTableName - the name of a database table with the validation data.
usageDict - a dictionary with optional usages for attributes; to specify the usage for an attribute, add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:
```
usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
```
usages - a dictionary with optional usages for attributes; to specify the usage for one or more attributes, add an entry to the dictionary with a UsageOption as the key and an attribute name or a list of attribute names as the value, e.g.:
```
usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
```
ldTableName - the table name for constructing logical data.
sqlAttributes - a dictionary of SQL attributes, with attribute names as keys and SQL statements as values.
categorical - a list of attributes which should be treated as categorical.
numerical - a list of attributes which should be treated as numerical.
testQuantiles - the number of quantiles used for the computation of Lift and ROC; if omitted 50 quantiles are used.
inactiveAsDefault - if set to 1 will cause all attributes except the target to be inactive unless specified otherwise with usages or usageDict; if set to 0 (default) all attributes are assumed to be active.
repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.
scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).
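
As a rough illustration of what the positive and testQuantiles parameters control, the following plain-Python sketch computes cumulative lift over score quantiles. It is a textbook formulation under assumed inputs (a list of scores and a list of actual categories), not the implementation used by the classification test task. The function name and the sample data are hypothetical.

```python
# Hypothetical illustration of cumulative lift; not AdvancedMiner code.

def cumulative_lift(scores, actuals, positive, quantiles):
    """Return the cumulative lift after each of `quantiles` score bins."""
    # Rank observations from the highest score to the lowest.
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    total_positive = sum(1 for _, actual in ranked if actual == positive)
    if total == 0 or total_positive == 0:
        raise ValueError('need at least one observation of the positive category')
    base_rate = total_positive / float(total)

    lifts = []
    for q in range(1, quantiles + 1):
        # Number of top-ranked observations covered by the first q quantiles.
        cutoff = int(round(q * total / float(quantiles)))
        if cutoff == 0:
            lifts.append(None)  # happens only with more quantiles than rows
            continue
        hits = sum(1 for _, actual in ranked[:cutoff] if actual == positive)
        # Lift = positive rate among the top-ranked rows / overall positive rate.
        lifts.append((hits / float(cutoff)) / base_rate)
    return lifts


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
actuals = ['van', 'van', 'bus', 'van', 'bus', 'saab', 'opel', 'bus']
print(cumulative_lift(scores, actuals, positive='van', quantiles=4))
```

Without a positive category there is no single rate to rank against, which is consistent with quantiles being skipped when positive is omitted. The last value is always 1.0, because the final quantile covers the whole table.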

Example 15.9. Basic usage of the classifier command

classifier('german_credit', 'Class', TreeSettings() )

Example 15.10. More complex usage of the classifier command

classifier('vehicle', 'Class', TreeSettings(minNodeSize=5), 'van' )

Example 15.11. Setting usage options with the classifier command

settings = TreeSettings()
settings.minNodeSizePrc = 15
classifier('german_credit', 'Class', settings, 'van', usages = {UsageOption.inactive: ['credit_amount', 'credit_history']} )
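
The interaction between inactiveAsDefault and the usages parameter can be modelled in a few lines of plain Python. This is a sketch of the documented behaviour only: the function name is hypothetical, strings stand in for UsageOption values, and the 'duration' column is an illustrative assumption about the table contents.

```python
# Hypothetical model of the documented defaults; not AdvancedMiner code.

def effective_usages(attributes, target, inactive_as_default=0, usages=None):
    """Return {attribute: usage} for every attribute except the target."""
    # With inactive_as_default = 1 every non-target attribute starts inactive;
    # with 0 (the default) every non-target attribute starts active.
    default = 'inactive' if inactive_as_default else 'active'
    resolved = dict((name, default) for name in attributes if name != target)

    # Explicit entries, keyed by usage option, override the default.
    for option, names in (usages or {}).items():
        if isinstance(names, str):
            names = [names]
        for name in names:
            resolved[name] = option
    return resolved


# Illustrative column list; 'duration' is an assumed column name.
columns = ['credit_amount', 'credit_history', 'duration', 'Class']

# Default: everything active except what is listed as inactive.
print(effective_usages(columns, 'Class',
                       usages={'inactive': ['credit_amount', 'credit_history']}))

# inactiveAsDefault = 1: everything inactive except what is listed as active.
print(effective_usages(columns, 'Class', inactive_as_default=1,
                       usages={'active': 'duration'}))
```

Both calls leave only 'duration' active, which shows that inactiveAsDefault = 1 is mainly a convenience for wide tables where just a few attributes should take part in modelling.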

Clusterer

The clusterer command creates a set of objects in the repository for building and testing a clustering model.

Syntax:

clusterer(trainTableName[, algorithmSettings][, prefix = s]
[, validationTableName = s][, usageDict = d][, usages = d][, ldTableName = s][, sqlAttributes = d]
[, categorical = l][, numerical = l][, testQuantiles = n][, repoName = s][, scoringLanguage = langname])

Explanation of parameters:

trainTableName - the name of a database table with the training data,
algorithmSettings - an object with algorithm settings. Valid objects are KohonenClusteringSettings and KMeansSettings; by default no algorithm settings are used.
prefix - a string which will be pre-pended to the names of all created objects; if omitted the name of the training data table will be used.
validationTableName - the name of a database table with the validation data.
usageDict - a dictionary with optional usages for attributes; to specify the usage for an attribute, add an entry to the dictionary with the attribute name as the key and a UsageOption as the value, e.g.:
```
usageDict = { 'att1' : UsageOption.inactive, 'att3' : UsageOption.supplementary }
```
usages - a dictionary with optional usages for attributes; to specify the usage for one or more attributes, add an entry to the dictionary with a UsageOption as the key and an attribute name or a list of attribute names as the value, e.g.:
```
usages = { UsageOption.inactive : ['att1', 'att2'], UsageOption.supplementary : 'att3' }
```
ldTableName - the table name for constructing logical data.
sqlAttributes - a dictionary of SQL attributes, with attribute names as keys and SQL statements as values.
categorical - a list of attributes which should be treated as categorical.
numerical - a list of attributes which should be treated as numerical.
repoName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used.
scoringLanguage - specifies the programming language in which the scoring code will be created (if applicable).

Example 15.12. Basic usage of the clusterer command

clusterer('german_credit')

Example 15.13. More complex usage of the clusterer command

clusterer('iris', KMeansSettings(), usages={UsageOption.inactive : ['Class', 'sepalwidth'] } )

Example 15.14. Setting usage options with the clusterer command

settings = KMeansSettings()
clusterer('german_credit', settings, usages = {UsageOption.inactive: ['credit_amount', 'credit_history']} )

Applier

The applier command offers a streamlined way to apply approximation, classification, clustering, time series and survival models.

Syntax:

applier(inputTableName, modelName[, outputTableName = s][, scoreColumnName = s]
[, targetColumnName = s][, clusterColumnName = s][, columnsToCopy = l][, positive = s]
[, clusterLevel = n][, clusterLevelFromLeaf = 0 | 1][, deleteObjects = 0 | 1][, customApplyItems = l]
[, prefix = s][, firstTimePoint = n][, lastTimePoint = n][, numberOfTimePoints = n][, repoName = s])

Explanation of parameters:

inputTableName - the name of the table to score; this table must contain all the attributes specified in the model signature.
modelName - the name of the model to use for scoring.
outputTableName - the name of the output table to create; if omitted, the name of the output table will be obtained by appending '_scored' to the name of the input table.
scoreColumnName - the name of the column containing the scores in the output table; if omitted this column will be named 'score'.
targetColumnName - the name of the column with the predicted target values in the output table; if omitted, the name 'target' will be used. (applicable to classification models)
clusterColumnName - the name of the column with the assigned cluster in the output table; if omitted, the name 'cluster' will be used. (applicable to clustering models)
columnsToCopy - a list of columns to copy from the input table to the output table, e.g.:
```
columnsToCopy = ['Class', 'sepallength']
```
If the copied column should have a different name in the output table, specify the new name in a pair with the original name, e.g.:
```
columnsToCopy = [('Class', 'real_target'), 'sepallength']
```
This will copy the column 'Class' from the input table to the output table renaming it to 'real_target'.
positive - the positive target value to score for; if omitted when applying a statistical model, this value will be taken from the model itself. (applicable to classification models)
clusterLevel - denotes the level of a clustering model tree at which the observations will be assigned to a cluster. (applicable to hierarchical clustering models)
clusterLevelFromLeaf - if 0 then clusterLevel is computed from root to leaf, if 1, it is computed from leaf to root. (applicable to hierarchical clustering models)
deleteObjects - if 1, then the objects created in the repository will be deleted after applying the model; if 0 (default), the objects will not be deleted.
customApplyItems - a list of custom output items to add to ApplyOutput; e.g. can be used to obtain the second best predicted target.
repoName - the name of the repository in which the objects will be created.
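
The two entry forms accepted by columnsToCopy (a plain column name, or a pair of original and new name) can be reduced to one uniform shape. The helper below is a hypothetical plain-Python sketch of that idea and is not part of AdvancedMiner.

```python
# Hypothetical helper, not part of AdvancedMiner.

def normalize_columns_to_copy(columns_to_copy):
    """Return a list of (input_name, output_name) pairs."""
    pairs = []
    for entry in columns_to_copy:
        if isinstance(entry, str):
            # A plain name is copied under its own name.
            pairs.append((entry, entry))
        else:
            # A pair gives the original name and the new name.
            source, renamed = entry
            pairs.append((source, renamed))
    return pairs


print(normalize_columns_to_copy([('Class', 'real_target'), 'sepallength']))
# [('Class', 'real_target'), ('sepallength', 'sepallength')]
```

Renaming a copied column is useful when its name would otherwise collide with a generated output column, for example a copied target column next to the predicted 'target' column.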

Example 15.15. Streamlined building and applying of a clustering model

# after executing this script a table named 'german_credit_scored' should be created
clusterer('german_credit')
execute('german_credit_bt')
applier('german_credit_train', 'german_credit_model', clusterLevel=3, clusterLevelFromLeaf=0)

Example 15.16. Streamlined building, testing and applying of an approximation model

# after executing this script a table named 'iris_scored' should be created
approximator('iris', 'sepallength', LinearRegressionSettings() )
execute('iris_bt')
applier('iris_train', 'iris_model' )

Example 15.17. Streamlined building, testing and applying of a classification model with a secondary predicted target

# after executing this script a table named 'iris_scored' should be created and
# an Excel spreadsheet with the contents of this table will open
classifier('iris', 'Class', FeedforwardNeuralNetSettings(maxNumberOfIterations=10))
execute('iris_bt')
applier('iris_train', 'iris_model', customApplyItems=
    [ClassificationRankItem('2nd_target', ClassificationOutputType.predictedCategory, 1)] )
Office.createSpreadsheet().setCellDataArray(0,0, tableRead('iris_scored'))
