AdvancedMiner supports building classification and approximation models with ABM, a tool for the automatic construction and updating of predictive models. ABM fully automates essential yet time-consuming steps of model construction, such as fast variable selection, variable interaction modelling, variable transformations, and best-model selection.
ABM is available in four working modes (three for classification models and one for approximation models):
Quick - enables obtaining an accurate model in a relatively short time
Advanced - uses more advanced methods for feature selection and data preparation
Gold - provides a more in-depth search through possible predictive modelling paths, therefore requiring more time for the modelling process
Approximation - builds an approximation model
The abm.backend.api.abmQuick command starts ABM in Quick mode (the Advanced and Gold modes are started analogously with abmAdvanced and abmGold)
Syntax:
abm.backend.api.abmQuick(tableName, target, targetCategory, inactiveVariables[,
qualityMeasureName = s][, cutoff = n][, samplingMode = s][, samplingSize = n][, samplingStratificationMode = s][, samplingPositiveTargetCategoryRatio = n][,
classificationThreshold = n][, prefix = s][, mrName = s][, abAliasName = s])
Explanation of parameters:
tableName - the name of a database table
target - the name of the target variable
targetCategory - predicted target category
inactiveVariables - variables which should be inactive, for example:
inactiveVariables = ['age', 'duration']
qualityMeasureName - the name of the quality measure used to choose the best model (LIFT, CAPTURED_RESPONSE, PRECISION, RECALL, ACCURACY), default: LIFT
cutoff - the data percentile chosen to optimize the quality measure (applies to LIFT and CAPTURED_RESPONSE), default: 0.1
samplingMode - the mode of sampling, default: MANUAL
samplingSize - the size of sampling, default: 30000
samplingStratificationMode - the mode of stratified sampling (NONE, CONST_NUM, CONST_RATIO, OVERSAMPLING), default: CONST_NUM
samplingPositiveTargetCategoryRatio - the proportion of the positive target category after stratified sampling, default: 0.5
classificationThreshold - the classification threshold, default: 0.5
prefix - a string which will be prepended to the names of all created objects
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
abAliasName - the name of the alias in which the objects will be created; if omitted, the default alias will be used
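To make the role of cutoff concrete, the sketch below computes lift and captured response for the top fraction of records ranked by score, using the usual textbook definitions. How ABM computes these measures internally is not specified here, so the function names, the rounding of the cutoff to a record count, and the sample data are assumptions made purely for illustration.

```python
def top_fraction(scores, labels, cutoff):
    """Return the labels of the top `cutoff` fraction of records by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Assumption: the cutoff is rounded to a whole number of records (at least 1).
    n_top = max(1, int(round(cutoff * len(ranked))))
    return [label for _, label in ranked[:n_top]]


def lift(scores, labels, cutoff=0.1):
    """Positive rate in the top fraction divided by the overall positive rate."""
    top = top_fraction(scores, labels, cutoff)
    top_rate = sum(top) / float(len(top))
    base_rate = sum(labels) / float(len(labels))
    return top_rate / base_rate


def captured_response(scores, labels, cutoff=0.1):
    """Share of all positive records that fall into the top fraction."""
    top = top_fraction(scores, labels, cutoff)
    return sum(top) / float(sum(labels))


# Hypothetical data: 10 records; label 1 marks the predicted target category.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(lift(scores, labels, cutoff=0.2))
print(captured_response(scores, labels, cutoff=0.2))
```

With cutoff = 0.2 the two best-scored records are inspected; both are positive, so the top fraction is far richer in positives than the data set as a whole, and two of the three positives are captured.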
The abm.backend.api.abmApproximation command starts ABM Approximation mode
Syntax:
abm.backend.api.abmApproximation(tableName, target, inactiveVariables[,
qualityMeasureName = s][, samplingMode = s][, samplingSize = n][, prefix = s][, mrName = s][, abAliasName = s])
Explanation of parameters:
tableName - the name of a database table
target - the name of the target variable
inactiveVariables - variables which should be inactive, for example:
inactiveVariables = ['age', 'duration']
qualityMeasureName - the name of the quality measure used to choose the best model (MAE, MAPE, RMSE, R_SQUARED), default: LIFT. Measures: MAE - Mean Absolute Error, MAPE - Mean Absolute Percentage Error, RMSE - Root Mean Squared Error, R_SQUARED - coefficient of determination (R-squared).
samplingMode - the mode of sampling, default: MANUAL
samplingSize - the size of sampling, default: 30000
prefix - a string which will be prepended to the names of all created objects
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
abAliasName - the name of the alias in which the objects will be created; if omitted, the default alias will be used
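The four approximation quality measures can be illustrated with their standard definitions. The sketch below is plain Python and does not call ABM; the function names and sample values are hypothetical, and whether ABM reports MAPE as a percentage or as a fraction is an assumption (a percentage is used here).

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / float(len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (undefined if an actual value is 0)."""
    total = sum(abs((a - p) / float(a)) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / float(len(actual)))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / float(len(actual))
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical target values and model predictions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mae(actual, predicted))
print(mape(actual, predicted))
print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```

For MAE, MAPE and RMSE lower values indicate a better model, while for R_SQUARED higher values are better.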
The abm.backend.api.exportAbmStatistics command exports statistics to Excel
Syntax:
abm.backend.api.exportAbmStatistics(abmSettingsName[,
mrName = s][, path = s][, fileName = s])
Explanation of parameters:
abmSettingsName - the name of the object with algorithm settings
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
path - path of the Excel file
fileName - the name of the Excel file
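Because path is an ordinary string, Windows paths that contain backslashes are safest written as raw strings, so that sequences such as \n or \t are not interpreted as escape characters. The sketch below relies only on standard Python string rules; the directory name is hypothetical.

```python
# Hypothetical directory name, used only to illustrate string escaping.
plain = 'C:\new_data'      # '\n' is interpreted as a newline character
raw = r'C:\new_data'       # raw string: the backslash is kept literally
escaped = 'C:\\new_data'   # doubled backslash: same value as the raw string

print(len(plain))      # one character shorter than the raw string
print(len(raw))
print(raw == escaped)
```

Either the raw string or the doubled backslash form can be passed as the path argument.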
The abm.backend.api.score command scores data with the chosen model
Syntax:
abm.backend.api.score(mrName, settingsName, inputTableName, outputTableName,
aliasName, copyColumnList[, classificationThreshold = n])
Explanation of parameters:
mrName - the name of the metadata repository
settingsName - the name of the object with algorithm settings
inputTableName - the name of the input table
outputTableName - the name of the output table
aliasName - the name of the alias
copyColumnList - the list of columns that should be copied into the output table
classificationThreshold - the threshold of classification
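The effect of classificationThreshold can be sketched as follows. This assumes that the threshold is compared with the predicted probability of the target category and that a probability equal to the threshold counts as positive; neither detail is confirmed here. The function name and the 'good' label are hypothetical, while 'bad' follows the target category used in the example.

```python
def classify(probabilities, threshold=0.5, positive='bad', negative='good'):
    """Assign the positive category when the probability reaches the threshold."""
    return [positive if p >= threshold else negative for p in probabilities]


# Hypothetical predicted probabilities of the target category.
probabilities = [0.10, 0.45, 0.50, 0.80]

print(classify(probabilities, threshold=0.5))  # ['good', 'good', 'bad', 'bad']
print(classify(probabilities, threshold=0.4))  # ['good', 'bad', 'bad', 'bad']
```

Lowering the threshold assigns more records to the target category, which typically raises recall at the cost of precision.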
The abm.backend.api.calculateTestResults command calculates model statistics in AdvancedMiner (only for classification models)
Syntax:
abm.backend.api.calculateTestResults(abmSettingsName, tableNames[, prefix = s][,
mrName = s][, abAliasName = s])
Explanation of parameters:
abmSettingsName - the name of the object with algorithm settings
tableNames - new database tables for which statistics are calculated. If omitted, statistics for the input table are calculated.
prefix - a string which will be prepended to the names of all created objects
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
abAliasName - the name of the alias in which the objects will be created; if omitted, the default alias will be used
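The exact set of statistics produced by calculateTestResults is not listed here. As an illustration of the classification measures named for qualityMeasureName (PRECISION, RECALL, ACCURACY), the sketch below computes them from actual and predicted categories using the standard definitions. The function names and sample data are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive category."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of records predicted positive that are actually positive."""
    return tp / float(tp + fp)


def recall(tp, fn):
    """Share of actually positive records that were predicted positive."""
    return tp / float(tp + fn)


def accuracy(tp, fp, fn, tn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / float(tp + fp + fn + tn)


# Hypothetical actual and predicted categories; 'bad' is the target category.
actual = ['bad', 'bad', 'good', 'good', 'bad', 'good']
predicted = ['bad', 'good', 'good', 'bad', 'bad', 'good']

tp, fp, fn, tn = confusion_counts(actual, predicted, positive='bad')
print(precision(tp, fp))
print(recall(tp, fn))
print(accuracy(tp, fp, fn, tn))
```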
Example 3.18. Creating a model, exporting statistics to Excel, scoring data, and calculating statistics in AdvancedMiner
# Table name:
tableName = 'german_credit'
# Target name:
target = 'Class'
# Predicted target category:
targetCategory = 'bad'
# Variables that should be inactive:
inactiveVariables = ['age', 'duration']
# The name of the quality measure (LIFT, CAPTURED_RESPONSE, PRECISION, RECALL, ACCURACY):
qualityMeasureName = 'LIFT'
# The data percentile chosen to optimize the quality measure:
cutoff = 0.1
# Sampling mode:
samplingMode = 'MANUAL'
# Sampling size:
samplingSize = 30000
# The mode of stratified sampling:
samplingStratificationMode = 'CONST_NUM'
# The proportion of the positive target category after stratified sampling:
samplingPositiveTargetCategoryRatio = 0.5
# Classification threshold:
classificationThreshold = 0.5
# Prefix of the names of all created objects:
prefix = 'quick'
# Metadata repository name:
mrName = mrRegistry().defaultRepository
# Alias:
abAliasName = dbAliasRegistry().getDefaultAliasName()

# Import of necessary functions:
import abm

# ABM Quick launching (analogously: abmAdvanced and abmGold):
abm.backend.api.abmQuick(tableName, target, targetCategory, inactiveVariables,
    qualityMeasureName, cutoff, samplingMode, samplingSize,
    samplingStratificationMode, samplingPositiveTargetCategoryRatio,
    classificationThreshold, prefix, mrName, abAliasName)

# Export of statistics to Excel (raw string keeps the backslash literal):
abm.backend.api.exportAbmStatistics(prefix+'abmSettings', path = r'C:\some_data', fileName = 'stats')

# Data scoring:
abm.backend.api.score(mrName, prefix+'abmSettings', 'german_credit', 'german_credit_score', abAliasName, ['Class'])

# Calculating statistics for new tables:
abm.backend.api.calculateTestResults('quickabmSettings', ['german_credit', 'german_credit'], 'scored_stats')