Chapter 15. AdvancedMiner in Practice

Table of Contents

Model building
General rules
Approximation model building
Classification model building
Clustering model building
Survival model building
Model testing
Approximation Test Task
Classification Test Task
Survival Test Task
Time Series Test Task
Classification Test Result Task
Applying Models in AdvancedMiner
Basic concepts
Advanced concepts
Minimal set-up
Applying for different mining functions
Examples
Shorthand methods of building, testing and applying models
Approximator
Classifier
Clusterer
Applier
Experiments
Experiments project
Running experiments
Comparing models
Dictionary
Social Network Analysis
Building networks
Filtering networks
Analysing networks
Visualising networks

Model building

General rules

This section presents how to build a simple classification tree model. Other types of models are built in a similar way.

Loading data

The first thing to do is to decide what data we want to use to build the model. Make sure the default alias is properly set before proceeding. To define an alias, use the Services component and the Alias dialog, which are described in the Services section of the chapter AdvancedMiner Client Graphical User Interface.

This example uses the 'iris' data set. First we have to build PhysicalData:

pd = PhysicalData('iris')

and now we can create LogicalData:

ld = LogicalData(pd)

The above procedure can also be performed using the graphical interface:

Right-click the repository in the Projects window and choose New -> Physical Data, as shown below:

Figure 15.1. New Physical Data Object Creation


A window listing all tables appears. Select the one you are interested in ('iris') and click 'Next'.

Figure 15.2. Selecting the table


A window for entering the name of the new physical data object appears. Enter 'pd' and click 'Next'. The new PhysicalData object with that name appears in the Projects window. Logical data can be created in the same way by choosing the New -> Logical Data menu command. A window (part of which is shown below) listing the available physical data sets appears. Select the one you wish to use and click 'Finish'.

Figure 15.3. Selecting physical data for logical data object


Function settings

The second thing we have to do is to decide what kind of model we want to build and create a suitable FunctionSettings object:

cfs = ClassificationFunctionSettings()

We need to assign a data set to the FunctionSettings object:

cfs.logicalData = ld 

select the attribute that we want to use as the target:

cfs.getAttributeUsageSet().getAttribute('Class').setUsage(UsageOption.target)

and decide which learning algorithm to use:

cfs.algorithmSettings = TreeSettings()

The FunctionSettings object can also be created using the graphical interface: right-click the repository in the Projects component and choose New -> Mining Function from the context menu. Choose the settings type from the list and enter a name for the new FunctionSettings object.

Figure 15.4. Function Settings List


The new object, which includes an attributeUsageSet, appears in the Projects window. Next, use the object's context menu to add the corresponding logicalData and algorithmSettings objects:

Figure 15.5. Adding objects to function settings


After choosing algorithmSettings, a window listing all available settings appears, and you can select a suitable one (TreeSettings):

Figure 15.6. List of available algorithm settings


To finish creating the FunctionSettings object, open attributeUsageSet (by double-clicking it, or by right-clicking it and choosing Open from the context menu) and select one variable as the target:

Figure 15.7. Attribute Usage Selection


Other variables can be set as obligatory, active, inactive, etc. Active means that the variable can be used as an independent variable in the model. A variable set to inactive will not be used in the model. Some models allow a variable to be forced into the model by setting its usage type to obligatory.
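The usage types can be summarised with a small plain-Python sketch. This is not the AdvancedMiner API, which uses UsageOption and the attributeUsageSet shown above. The function split_attributes and its string codes are hypothetical. The sketch also assumes exactly one target attribute, as in the classification example. It only shows which attributes an algorithm may choose from, which ones it must use, and which ones it ignores.

```python
def split_attributes(usages):
    """Group attributes by usage type (hypothetical toy model, not AdvancedMiner code).

    usages: dict mapping attribute name -> one of
            "target", "active", "inactive", "obligatory".
    Returns (target, candidates, forced):
      target     - the single target attribute (assumed: exactly one),
      candidates - active attributes the algorithm may use,
      forced     - obligatory attributes the model must use.
    Inactive attributes appear in neither list.
    """
    allowed = {"target", "active", "inactive", "obligatory"}
    unknown = set(usages.values()) - allowed
    if unknown:
        raise ValueError("unknown usage type(s): %s" % sorted(unknown))

    targets = [name for name, usage in usages.items() if usage == "target"]
    if len(targets) != 1:
        raise ValueError("exactly one target attribute is assumed here")

    candidates = sorted(n for n, u in usages.items() if u == "active")
    forced = sorted(n for n, u in usages.items() if u == "obligatory")
    return targets[0], candidates, forced


# Hypothetical iris-like attribute names; only 'Class' comes from the text above.
target, candidates, forced = split_attributes({
    "Class": "target",
    "petal_length": "active",
    "petal_width": "obligatory",
    "sepal_width": "inactive",
})
print(target, candidates, forced)  # Class ['petal_length'] ['petal_width']
```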

Model building

Now we need to build a MiningBuildTask. First, however, PhysicalData and FunctionSettings must be saved in the Metadata Repository. Until now all objects were created locally, and objects have to be saved in the repository in order to be seen by AdvancedMiner Server. When a MiningBuildTask is created, the PhysicalData and FunctionSettings are referred to by their names:

save('physical_data', pd)
save('cfs_Tree', cfs)
            

We can now build the task:

mbt = MiningBuildTask('physical_data', 'cfs_Tree', 'model_name')

We have to save MiningBuildTask to be able to execute it:

save('mbt_Tree', mbt)

and finally we can execute this task:

execute('mbt_Tree')

As the result we get a model called 'model_name'. See the figure below for what objects appear in the repository after executing the code above.

Figure 15.8. Model building result

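A toy, in-memory model of a repository in plain Python can illustrate why the objects must be saved first. ToyRepository, ToyBuildTask and run_task are hypothetical names, not AdvancedMiner classes. The point is that a task stores only the names of its inputs and resolves them when it is executed. Every referenced object must therefore already be saved under that name.

```python
class ToyRepository:
    """In-memory stand-in for a metadata repository (illustration only)."""

    def __init__(self):
        self._objects = {}

    def save(self, name, obj):
        self._objects[name] = obj

    def load(self, name):
        if name not in self._objects:
            raise KeyError("'%s' is not saved in the repository" % name)
        return self._objects[name]


class ToyBuildTask:
    """Hypothetical analogue of MiningBuildTask: keeps only input names."""

    def __init__(self, data_name, settings_name, model_name):
        self.data_name = data_name
        self.settings_name = settings_name
        self.model_name = model_name


def run_task(repo, task_name):
    """Resolve the task's inputs by name and save a placeholder 'model'."""
    task = repo.load(task_name)
    model = {
        "data": repo.load(task.data_name),          # fails if never saved
        "settings": repo.load(task.settings_name),  # fails if never saved
    }
    repo.save(task.model_name, model)
    return model


repo = ToyRepository()
repo.save("physical_data", "iris table")
repo.save("cfs_Tree", {"algorithm": "tree", "target": "Class"})
repo.save("mbt_Tree", ToyBuildTask("physical_data", "cfs_Tree", "model_name"))
run_task(repo, "mbt_Tree")
print(repo.load("model_name")["data"])  # iris table
```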

The procedure of model building can also be carried out using the graphical interface: choose the New ->Task item in the context menu and select the type of the task from the list:

Figure 15.9. Tasks List


Click 'Next', enter the name for the created task and click 'Finish'.

The next step is to add to the newly created MiningBuildTask all the required elements (functionSettings, buildData and model):

Figure 15.10. Adding objects to build task


The model name reference is set by entering the model name in the creation wizard window and pressing 'Finish'.

Figure 15.11. Model reference


After all the steps above are completed, the created objects can be tested to check that no problems are reported. Save them by pressing the 'Save' button on the main toolbar or by choosing File -> Save from the menu.

To execute the created MiningBuildTask press F6 or right-click on the task object and choose 'Execute' from the context menu.

Note

The new model gets the name set in MiningBuildTask, even if a model with the same name already exists in the repository. In such a case the following naming convention is used: the newest model is always named as in MiningBuildTask, the oldest model is named 'model_name_1', the second created model is named 'model_name_2', and so on. This rule guarantees that existing models do not change their names.
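The naming rule can be simulated in plain Python. register_model is a hypothetical helper that works on a list of names and is not part of AdvancedMiner. The sketch rests on two assumptions about the rule. First, the model that previously held the plain name receives the next numeric suffix, calculated as the highest suffix already in use plus one. Second, models that already carry a suffix keep it.

```python
import re


def register_model(existing, model_name):
    """Return the list of model names after building a new model_name.

    existing: model names already in the repository, oldest first.
    If model_name is taken, the model holding it is renamed with the next
    numeric suffix (assumed: highest suffix in use + 1); the new model then
    takes the plain name. Names that already have a suffix are untouched.
    This is a hypothetical simulation, not AdvancedMiner code.
    """
    names = list(existing)
    if model_name in names:
        suffixed = re.compile(re.escape(model_name) + r"_(\d+)$")
        used = [int(m.group(1)) for m in map(suffixed.match, names) if m]
        next_suffix = max(used, default=0) + 1
        names[names.index(model_name)] = "%s_%d" % (model_name, next_suffix)
    names.append(model_name)
    return names


names = []
for _ in range(3):              # execute the same build task three times
    names = register_model(names, "model_name")
print(names)  # ['model_name_1', 'model_name_2', 'model_name']
```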

Approximation model building

The first step of approximation model building is to load the data. Some approximation methods require a special data type or data preprocessing. All the necessary information about the proper preparation of data can be found in the Data requirements section of the chapter describing the appropriate method in the Modules part.

Note

For every approximation method the target attribute has to be numerical.

The second step is to select an approximation method. The FunctionSettings object can be created as follows:

afs = ApproximationFunctionSettings()

If we test the newly created object, the following messages appear:

Figure 15.12. Approximation Function Settings - Test Reports


  • Logical data not provided

    LogicalData should be assigned. It can be done by:

    afs.logicalData = ld

    where 'ld' denotes the LogicalData object created before.

  • Algorithm settings not provided

    The choice of the method in AdvancedMiner is equivalent to the choice of the corresponding algorithm settings. In AdvancedMiner the following approximation methods are available: Linear Regression, Weighted Regression, IRLS, and Neural Networks.

    Table 15.1. Approximation Methods - Algorithm Settings

    Method                 Algorithm Settings Object Name
    Neural Networks        FeedforwardNeuralNetSettings
    IRLS                   IRLSSettings
    Linear Regression      RegressionSettings
    Weighted Regression    WeightedRegressionSettings

    Figure 15.13. Approximation Methods - Algorithm Settings

  • Target attribute not specified

    The target attribute should be specified. It can be done by:

    afs.targetAttributeName = 'Target_Attribute_Name'
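A short plain-Python sketch can mirror the three checks in the test report. It also applies the rule from the note above that the target must be numerical. It works on ordinary dictionaries rather than on AdvancedMiner objects. The dictionary keys reuse the attribute names from the code above. The function check_approximation_settings and its attribute_types argument are assumptions made for illustration. The attribute_types argument maps each attribute name to 'numerical' or 'categorical'. The settings names come from Table 15.1. The wording of the first three messages follows the test report, but the last two messages are invented for the sketch.

```python
# Settings names from Table 15.1, used as plain strings (not AdvancedMiner classes).
APPROXIMATION_SETTINGS = {
    "FeedforwardNeuralNetSettings",
    "IRLSSettings",
    "RegressionSettings",
    "WeightedRegressionSettings",
}


def check_approximation_settings(settings, attribute_types):
    """Return a list of problems, mimicking the test report (illustration only).

    settings: dict with optional keys 'logicalData', 'algorithmSettings'
              and 'targetAttributeName'.
    attribute_types: dict mapping attribute name -> 'numerical' or
                     'categorical' (an assumed input for this sketch).
    """
    problems = []
    if settings.get("logicalData") is None:
        problems.append("Logical data not provided")

    algorithm = settings.get("algorithmSettings")
    if algorithm is None:
        problems.append("Algorithm settings not provided")
    elif algorithm not in APPROXIMATION_SETTINGS:
        problems.append("%s is not an approximation method" % algorithm)

    target = settings.get("targetAttributeName")
    if not target:
        problems.append("Target attribute not specified")
    elif attribute_types.get(target) != "numerical":
        problems.append("Target attribute %s must be numerical" % target)
    return problems


# A freshly created, empty settings object reports the same three problems.
print(check_approximation_settings({}, {}))
```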

The last step is the creation and execution of MiningBuildTask. It is described in the Model building section.

Classification model building

The first step of classification model building is to load the data. Some classification methods require a special data type or data preprocessing. All the necessary information about proper data preparation can be found in the Data requirements section of the chapter describing the selected method in the Modules part.

Note

For every classification method the target attribute has to be categorical.

The second step is to select a classification method. A FunctionSettings object can be created as follows:

cfs = ClassificationFunctionSettings()

If we test the newly created object, the following messages appear:

Figure 15.14. Classification Function Settings - Test Reports

  • Logical data not provided

    LogicalData should be assigned. It can be done by:

    cfs.logicalData = ld

    where 'ld' denotes the LogicalData object created before.

  • Algorithm settings not provided

    The choice of the method in AdvancedMiner is equivalent to the choice of the corresponding algorithm settings. In AdvancedMiner the following classification methods are available: Kohonen Maps, Neural Networks, Classification Trees, Logistic Regression, Bivariate Probit, Discriminant Analysis.

    Table 15.2. Classification Methods - Algorithm Settings

    Method                   Algorithm Settings Object Name
    Bivariate Probit         BivariateProbitSettings
    Discriminant Analysis    DiscriminantSettings
    Neural Networks          FeedforwardNeuralNetSettings
    Kohonen Maps             KohonenClassificationSettings
    Logistic Regression      LogisticRegressionSettings
    Classification Trees     TreeSettings

    Figure 15.15. Classification Methods - Algorithm Settings

  • Target attribute not specified

    The target attribute should be specified. It can be done by:

    cfs.targetAttributeName = 'Target_Attribute_Name'

The last step is the creation and execution of MiningBuildTask. It is described in the Model building section.

Clustering model building

The first step of clustering model building is to load the data. All the necessary information about proper data preparation can be found under Data requirements in the Kohonen Maps section.

The second step is to select a clustering method. A FunctionSettings object can be created as follows:

cfs = ClusteringFunctionSettings()

If we test the newly created object, the following messages appear:

Figure 15.16. Clustering Function Settings - Test Reports

  • Logical data not provided

    LogicalData should be assigned. It can be done by:

    cfs.logicalData = ld

    where 'ld' denotes the LogicalData object created before.

  • Algorithm settings not provided

    The choice of the method in AdvancedMiner is equivalent to the choice of the corresponding algorithm settings. Kohonen Maps and KMeans are clustering methods available in AdvancedMiner. The algorithm settings can be provided by:

    cfs.algorithmSettings = KohonenClusteringSettings()

The last step is the creation and execution of MiningBuildTask. It is described in the Model building section.
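Tables 15.1 and 15.2 and the clustering example above show that one algorithm can serve more than one mining function. FeedforwardNeuralNetSettings appears in both tables. Kohonen Maps use KohonenClassificationSettings for classification but KohonenClusteringSettings for clustering. The plain-Python lookup below collects this information from this chapter only. The dictionary and the functions_for helper are illustrative and not part of AdvancedMiner. Settings that are not named in this chapter, such as the one for KMeans, are deliberately left out.

```python
# Mining function -> algorithm settings names, as listed in this chapter's
# tables and examples (illustrative and not an exhaustive product list).
SETTINGS_BY_FUNCTION = {
    "approximation": {
        "FeedforwardNeuralNetSettings", "IRLSSettings",
        "RegressionSettings", "WeightedRegressionSettings",
    },
    "classification": {
        "BivariateProbitSettings", "DiscriminantSettings",
        "FeedforwardNeuralNetSettings", "KohonenClassificationSettings",
        "LogisticRegressionSettings", "TreeSettings",
    },
    "clustering": {"KohonenClusteringSettings"},
}


def functions_for(settings_name):
    """Return the sorted list of mining functions that accept settings_name."""
    return sorted(function for function, names in SETTINGS_BY_FUNCTION.items()
                  if settings_name in names)


print(functions_for("FeedforwardNeuralNetSettings"))  # ['approximation', 'classification']
print(functions_for("TreeSettings"))                  # ['classification']
```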

Survival model building

In order to build a Cox survival model, create a MiningBuildTask with the given data and Cox settings. Recall that the Cox model requires two additional settings, censor and censoredValue:

Figure 15.17. Cox Survival Model Settings


The censor variable is set in functionSettings -> attributeUsageSet, and the censored category (censoredValue) is set in the functionSettings object itself.
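A plain-Python sketch can illustrate the meaning of the censor attribute and the censored category. The helpers split_censored and risk_set, together with the attribute names 'months' and 'status', are hypothetical. The sketch assumes that every censor value other than censoredValue marks an observed event. It also shows why censored rows are kept. In the Cox model, an observation belongs to the risk set at every event time up to and including its own (possibly censored) time.

```python
def split_censored(records, censor, censored_value):
    """Split records into observed events and censored observations.

    Hypothetical helper: any value of the censor attribute other than
    censored_value is assumed to mark an observed event.
    """
    events = [r for r in records if r[censor] != censored_value]
    censored = [r for r in records if r[censor] == censored_value]
    return events, censored


def risk_set(records, time_attr, t):
    """Records still under observation at time t (time >= t), censored or not."""
    return [r for r in records if r[time_attr] >= t]


rows = [
    {"months": 5, "status": "died"},
    {"months": 8, "status": "censored"},
    {"months": 12, "status": "died"},
]
events, censored = split_censored(rows, censor="status", censored_value="censored")
for e in events:
    # Each event is compared against everyone still at risk at that time.
    print(e["months"], len(risk_set(rows, "months", e["months"])))
# prints "5 3" and then "12 1"
```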

Applying the built model produces output data, which can be stored in a database or a file. The output contains the predicted survival time together with the data the model was applied to.