Building models by ABM

AdvancedMiner can build classification and approximation models with ABM, a tool for the automatic construction and updating of predictive models. ABM fully automates essential yet time-consuming steps of model construction, such as fast variable selection, variable interaction modelling, variable transformations, and best model selection.

ABM is available in four working modes (three for classification models and one for approximation models):

Quick - enables obtaining an accurate model in a relatively short time
Advanced - uses more advanced methods for feature selection and data preparation
Gold - provides a more in-depth search through possible predictive modelling paths, therefore requiring more time for the modelling process
Approximation - builds an approximation model

The abm.backend.api.abmQuick command starts ABM in Quick mode (the Advanced and Gold modes are started analogously, with abmAdvanced and abmGold).

Syntax:

abm.backend.api.abmQuick(tableName, target, targetCategory, inactiveVariables[, 
	qualityMeasureName = s][, cutoff = n][, samplingMode = s][, samplingSize = n][, samplingStratificationMode = s][, samplingPositiveTargetCategoryRatio = n][, 
	classificationThreshold = n][, prefix = s][, mrName = s][, abAliasName = s])

Explanation of parameters:

tableName - the name of a database table
target - the name of the target variable
targetCategory - predicted target category
inactiveVariables - variables which should be inactive, for example:
```
inactiveVariables = ['age', 'duration']
```
qualityMeasureName - the name of the quality measure used to choose the best model (LIFT, CAPTURED_RESPONSE, PRECISION, RECALL, ACCURACY), default: LIFT
cutoff - the data percentile chosen to optimize the quality measure (applies to LIFT and CAPTURED_RESPONSE), default: 0.1
samplingMode - the mode of sampling, default: MANUAL
samplingSize - the size of sampling, default: 30000
samplingStratificationMode - the mode of stratified sampling (NONE, CONST_NUM, CONST_RATIO, OVERSAMPLING), default: CONST_NUM
samplingPositiveTargetCategoryRatio - the proportion of the positive target category after stratified sampling, default: 0.5
classificationThreshold - the classification threshold, default: 0.5
prefix - a string which will be pre-pended to the names of all created objects
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
abAliasName - the name of the alias in which the objects will be created; if omitted, the default alias will be used
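
To make the relationship between cutoff and the LIFT and CAPTURED_RESPONSE measures more concrete, here is a minimal sketch in plain Python. It is an illustration of the textbook definitions only, not ABM's own implementation; the function name lift_and_captured_response and the sample data are hypothetical, and ABM may treat ties or rounding differently.

```python
def lift_and_captured_response(scores, labels, cutoff=0.1):
    """Return (lift, captured_response) for the top `cutoff` fraction of rows.

    scores -- predicted scores for the target category (higher = more likely)
    labels -- 1 if the row belongs to the target category, otherwise 0
    Hypothetical helper for illustration; not part of the abm package.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive row is required")

    # Rank rows from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Keep the top `cutoff` fraction of rows, but always at least one row.
    top_n = max(1, int(round(cutoff * len(ranked))))
    top_positives = sum(label for _, label in ranked[:top_n])

    # Captured response: share of all positives found in the top rows.
    captured_response = top_positives / float(total_positives)

    # Lift: positive rate in the top rows divided by the overall positive rate.
    top_rate = top_positives / float(top_n)
    base_rate = total_positives / float(len(ranked))
    return top_rate / base_rate, captured_response


# Made-up data: 10 rows, 3 of them positive.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift, captured = lift_and_captured_response(scores, labels, cutoff=0.2)
```

With cutoff = 0.2 the two best-scored rows are inspected. Both are positive, so the sketch reports a lift of about 3.33 (a positive rate of 1.0 against an overall rate of 0.3) and a captured response of about 0.67.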

The abm.backend.api.abmApproximation command starts ABM in Approximation mode.

Syntax:

abm.backend.api.abmApproximation(tableName, target, inactiveVariables[, 
	qualityMeasureName = s][, samplingMode = s][, samplingSize = n][, prefix = s][, mrName = s][, abAliasName = s])

Explanation of parameters:

tableName - the name of a database table
target - the name of the target variable
inactiveVariables - variables which should be inactive, for example:
```
inactiveVariables = ['age', 'duration']
```
qualityMeasureName - the name of the quality measure used to choose the best model (MAE, MAPE, RMSE, R_SQUARED), default: LIFT. Measures: MAE - Mean Absolute Error, MAPE - Mean Absolute Percentage Error, RMSE - Root Mean Squared Error, R_SQUARED - coefficient of determination (R-squared).
samplingMode - the mode of sampling, default: MANUAL
samplingSize - the size of sampling, default: 30000
prefix - a string which will be pre-pended to the names of all created objects
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
abAliasName - the name of the alias in which the objects will be created; if omitted, the default alias will be used
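
The four approximation measures listed above can be illustrated with a short plain-Python sketch. It follows the standard definitions and is not taken from ABM itself; the function name regression_measures and the sample values are hypothetical, and reporting MAPE as a percentage is an assumption.

```python
import math


def regression_measures(actual, predicted):
    """Return MAE, MAPE, RMSE and R_SQUARED for two equally long lists.

    Hypothetical helper for illustration; not part of the abm package.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    errors = [a - p for a, p in zip(actual, predicted)]
    squared_error_sum = sum(e * e for e in errors)

    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / float(n)

    # MAPE: mean absolute error relative to the actual value, in percent
    # (assumption). It is undefined when any actual value is zero.
    if any(a == 0 for a in actual):
        mape = None
    else:
        mape = 100.0 * sum(abs(e / float(a)) for e, a in zip(errors, actual)) / n

    # RMSE: square root of the mean squared error.
    rmse = math.sqrt(squared_error_sum / float(n))

    # R_SQUARED: 1 - residual sum of squares / total sum of squares.
    mean_actual = sum(actual) / float(n)
    total_sum_of_squares = sum((a - mean_actual) ** 2 for a in actual)
    if total_sum_of_squares == 0:
        r_squared = None  # undefined for a constant target
    else:
        r_squared = 1.0 - squared_error_sum / total_sum_of_squares

    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R_SQUARED": r_squared}


# Made-up target values and predictions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
measures = regression_measures(actual, predicted)
```

For MAE, MAPE and RMSE a lower value indicates a better model, while for R_SQUARED a higher value does, so the direction of "best" depends on the chosen measure.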

The abm.backend.api.exportAbmStatistics command exports statistics to Excel

Syntax:

abm.backend.api.exportAbmStatistics(abmSettingsName[, 
	mrName = s][, path = s][, fileName = s])

Explanation of parameters:

abmSettingsName - the name of the object with algorithm settings
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
path - path of the Excel file
fileName - the name of the Excel file

The abm.backend.api.score command scores data with the chosen model

Syntax:

abm.backend.api.score(mrName, settingsName, inputTableName, outputTableName, 
	aliasName, copyColumnList[, classificationThreshold = n])

Explanation of parameters:

mrName - the name of metadata repository
settingsName - the name of the object with algorithm settings
inputTableName - the name of the input table
outputTableName - the name of the output table
aliasName - the name of the alias
copyColumnList - the list of columns that should be copied into the output table
classificationThreshold - the threshold of classification
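
The effect of classificationThreshold can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the scoring code used by ABM: the helper names classify and precision_recall_accuracy are hypothetical, the data is made up, and treating a score equal to the threshold as a positive prediction is an assumption.

```python
def classify(scores, threshold=0.5):
    """Turn scores into 0/1 predictions.

    Assumption: a row is assigned to the target category when its score
    is greater than or equal to the threshold.
    """
    return [1 if score >= threshold else 0 for score in scores]


def precision_recall_accuracy(predicted, actual):
    """Return (precision, recall, accuracy) for 0/1 predictions and labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == 1 and a == 1)
    false_pos = sum(1 for p, a in pairs if p == 1 and a == 0)
    false_neg = sum(1 for p, a in pairs if p == 0 and a == 1)
    true_neg = sum(1 for p, a in pairs if p == 0 and a == 0)

    # Guard against division by zero when nothing is predicted or present.
    precision = true_pos / float(true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / float(true_pos + false_neg) if true_pos + false_neg else 0.0
    accuracy = (true_pos + true_neg) / float(len(pairs))
    return precision, recall, accuracy


# Made-up scores and true labels for six rows.
scores = [0.9, 0.7, 0.55, 0.4, 0.3, 0.1]
actual = [1, 1, 0, 1, 0, 0]

# Compare the default threshold with a lower one.
default_result = precision_recall_accuracy(classify(scores, 0.5), actual)
lowered_result = precision_recall_accuracy(classify(scores, 0.35), actual)
```

In this made-up data, lowering the threshold from 0.5 to 0.35 marks one more row as positive, which raises recall from about 0.67 to 1.0 and precision from about 0.67 to 0.75.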

The abm.backend.api.calculateTestResults command calculates model statistics in AdvancedMiner (only for classification models)

Syntax:

abm.backend.api.calculateTestResults(abmSettingsName, tableNames[, prefix = s][,
	mrName = s][, abAliasName = s])

Explanation of parameters:

abmSettingsName - the name of the object with algorithm settings
tableNames - new database tables for which statistics are calculated; if omitted, statistics are calculated for the input table
prefix - a string which will be pre-pended to the names of all created objects
mrName - the name of the metadata repository in which the objects will be created; if omitted, the default repository will be used
abAliasName - the name of the alias in which the objects will be created; if omitted, the default alias will be used

Example 3.18. Creating a model, exporting statistics to Excel, scoring data and calculating statistics in AdvancedMiner

# Table name: 
tableName = 'german_credit'

# Target name:
target = 'Class'

# Predicted target category:
targetCategory = 'bad'

# Variables that should be inactive:
inactiveVariables = ['age', 'duration']

# The name of quality measure (LIFT, CAPTURED_RESPONSE, PRECISION, RECALL, ACCURACY):
qualityMeasureName = 'LIFT'

# The data percentile chosen to optimize the quality measure:
cutoff = 0.1

# Sampling mode:
samplingMode = 'MANUAL'

# Sampling size:
samplingSize = 30000

# The mode of stratification sampling:
samplingStratificationMode = 'CONST_NUM'

# The proportion of positive target after stratification sampling:
samplingPositiveTargetCategoryRatio = 0.5

# Classification threshold:
classificationThreshold = 0.5

# Prefix of the names of all created objects
prefix = 'quick'

# Metadata repository name:
mrName = mrRegistry().defaultRepository

# Alias:
abAliasName = dbAliasRegistry().getDefaultAliasName()

# Import of necessary functions:
import abm

# Launching ABM Quick (analogously: abmAdvanced and abmGold):
abm.backend.api.abmQuick(tableName, target, targetCategory, inactiveVariables, qualityMeasureName, cutoff, samplingMode, samplingSize, samplingStratificationMode, samplingPositiveTargetCategoryRatio, classificationThreshold, prefix, mrName, abAliasName)

# Export of statistics to Excel:
abm.backend.api.exportAbmStatistics(prefix + 'abmSettings', path = 'C:\\some_data', fileName = 'stats')

# Data scoring:
abm.backend.api.score(mrName, prefix + 'abmSettings', 'german_credit', 'german_credit_score', abAliasName, ['Class'])

# Calculating statistics for new database tables:
abm.backend.api.calculateTestResults(prefix + 'abmSettings', ['german_credit', 'german_credit'], 'scored_stats')
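
Because a modelling run can take a long time, it may be worth checking the optional arguments before launching it. The following plain-Python sketch shows one way to do that. The helper check_quick_options is hypothetical and not part of the abm package; the allowed names are copied from the parameter lists above, while the range check on cutoff, samplingPositiveTargetCategoryRatio and classificationThreshold is an assumption.

```python
# Allowed values as listed in the parameter descriptions of abmQuick.
QUALITY_MEASURES = ('LIFT', 'CAPTURED_RESPONSE', 'PRECISION', 'RECALL', 'ACCURACY')
STRATIFICATION_MODES = ('NONE', 'CONST_NUM', 'CONST_RATIO', 'OVERSAMPLING')

# Assumption: these options are fractions greater than 0 and at most 1.
FRACTION_OPTIONS = ('cutoff', 'samplingPositiveTargetCategoryRatio',
                    'classificationThreshold')


def check_quick_options(options):
    """Return a list of problems found in a dict of optional abmQuick arguments.

    Hypothetical helper for illustration; an empty list means no problem
    was detected by these checks.
    """
    problems = []

    # Fall back to the documented defaults when an option is not given.
    measure = options.get('qualityMeasureName', 'LIFT')
    if measure not in QUALITY_MEASURES:
        problems.append('unknown qualityMeasureName: %r' % measure)

    mode = options.get('samplingStratificationMode', 'CONST_NUM')
    if mode not in STRATIFICATION_MODES:
        problems.append('unknown samplingStratificationMode: %r' % mode)

    for name in FRACTION_OPTIONS:
        if name in options and not (0 < options[name] <= 1):
            problems.append('%s should be between 0 and 1: %r' % (name, options[name]))

    return problems


options = {'qualityMeasureName': 'RECALL', 'cutoff': 0.1, 'prefix': 'quick'}
problems = check_quick_options(options)

# Inside AdvancedMiner, and only if `problems` is empty, the options could
# then be passed by keyword, following the syntax given for abmQuick:
# abm.backend.api.abmQuick(tableName, target, targetCategory,
#                          inactiveVariables, **options)
```

Passing the optional arguments by keyword also avoids having to supply every preceding argument just to change a later one.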
