Description of nodes

Data

CSV File

Definition of a CSV file to be imported into AdvancedMiner, or of a file to which data should be exported.

To set the file path and name, choose Edit from the node's context menu. Using the node, the user can define flat files whose data is separated by specific delimiter characters. The first row in the file can contain headers. When importing data from a file, AdvancedMiner can automatically detect column types for further processing.

Xls/xlsx File

Definition of an MS Excel file to be imported into AdvancedMiner, or of a file to which data should be exported.

To set the file name and define how the file should be loaded, choose Edit from the file's context menu. After a file path is indicated, the system automatically detects all existing sheets and asks the user to indicate the sheet containing the data. The first line can contain column headers. AdvancedMiner automatically detects column types for further processing. The types suggested by the application can be modified by the user: to change column types, click Specify Column in the edit window of the application.

Table

Definition of a table in a database alias which was defined for a Workflow document.

A node represents a table, which can contain both source data and the results of processing. The node is created automatically by other nodes whenever results of processing are written to a database table. To see the data in the table, select View Data from the node's context menu. Apart from the data, the user can open an SQL editor where any SQL statement can be executed within the Workflow alias. To define a table, choose Edit from the node's context menu.

Data Source

A node defines a source of processing: a database table used to import/export data from aliases other than the one in which the Workflow operates.

A user can indicate any table from a database alias defined in the application. To perform further processing on such a source, the data should first be transferred to the Workflow's default database alias; thus, the subsequent node should be a Database Table. To define a table, select Edit from the node's context menu.

Data Exploration

Analyzer 2D

A node enables the user to generate histograms and a cross table for two selected variables. Data is presented both in tabular and graphical forms. A node requires one predecessor - a Database Table node.

To open the Analyzer's window, choose Open Analyzer 2D from the context menu of the node. For numerical variables with a large number of values, ranges are created. The analyzed variables should be indicated in the fields X variable and Y variable. Optionally, you can indicate a target variable and its target value. After selecting the analyzed variables, run the execution (F6).

Click squares in the left part of the screen to see the following statistics:

  • number of observations - (COUNT),
  • the percentage of observations in the cell to the entire set of observations (TOTAL%),
  • percentage of the total number of cases in the cell to the total in the specified column (COL%),
  • percentage of the total number of cases in the cell to the total in the specified row (ROW%).

If a target value is indicated, the user can also see:

  • percentage of observations in a cell with the target value of a target variable (TARGET%),
  • number of distinct values of the dependent variable in the observations of the cell (DISTINCT).

Eagle View

A node for graphical representation of the analyzed table. Each value in the table corresponds to a rectangle whose colour is associated with the value of the observation. A node requires one predecessor - a Database Table node.

To open the explorer, choose Eagle View from the node's context menu. Hover the mouse pointer over a rectangle to display its value. Clicking a value highlights all cells with the same value. The Where condition field under the chart lets the user limit the data presented in the chart; enter an SQL-like condition there. The Filter field lets you select the columns presented in graphical form (at least two columns must be selected). Column names should be separated with spaces.

Freq

A node enables analysis of distribution of all the variables. It produces both histograms and descriptive statistics of variables. A node requires one predecessor - a Database Table node.

To analyse data using Freq, select View Freq from its context menu. A window with a list of all the variables appears. To get results, click the selected variable and run processing (F6). To generate results for all variables, select all of them (Ctrl+A). To write the results to a database table, select Edit from the context menu of this node and, in the resulting window, check the Generate Output Table option. The proposed output table name can be modified. A description of Freq functionality can be found in the chapter describing Freq.

Apart from data exploration, the Freq node enables the user to create new variables based on the original ones. New variables are added as SQL attributes: click a variable in the Freq window and select Add SQL Attribute from the context menu. Next, give the name of the new variable and enter its definition using SQL syntax. Finally, close the node, click the Database node created from the Freq node, and run processing. The new variable will appear in the resulting table.
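
As an illustration of what an SQL attribute is, the sketch below derives a new column from existing ones with plain SQL. SQLite stands in for the Workflow database; the table and column names are made up, and this is not AdvancedMiner code:

```python
import sqlite3

# In-memory stand-in for a Workflow table (hypothetical names).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (income REAL, debt REAL)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(5000.0, 1000.0), (3000.0, 1500.0)])

# The SQL attribute: a new variable defined as an expression
# over the original columns.
rows = con.execute(
    "SELECT income, debt, debt / income AS debt_ratio FROM customers"
).fetchall()
print(rows)  # [(5000.0, 1000.0, 0.2), (3000.0, 1500.0, 0.5)]
```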

Diagrams

Column Diagram

A node requires one predecessor - a Database Table node. A column diagram is created on the basis of the data in this table. To set the variables displayed on the diagram, choose Properties from the node's context menu.

In the properties window, set the variable which determines categories (or the timestamp for time series) in the field Categories and the values in the field Values. Every observation (one row) makes up one column in the diagram. Observations belonging to one category are grouped together in the diagram. It is possible to display additional variables on the diagram: put their names in the fields Additional Values; in this case the categories should contain timestamps. If the values in the category field are not unique, the diagram presents the last observation for each category.

Pie diagram

A node requires one predecessor - a Database Table node. A series of pie diagrams is created on the basis of the data in this table. To set the variables displayed on the diagram, choose Properties from the node's context menu.

In the properties window, set the variable which determines categories in the field Categories and the values in the field Values. A series of pie charts is created - one for every observation within one category. Each chart shows the share of the n-th observation's value for a category in the sum of the n-th values across all categories.

Line diagram

A node requires one predecessor - a Database Table node. A line diagram is created on the basis of the data in this table. To set the variables displayed on the diagram, choose Properties from the node's context menu.

In the properties window, in the fields X Axis and Y Axis, give the names of the variables containing the coordinates of the points which will be connected by lines in the diagram. If a variable is given in the field Series, observations will be divided according to the value of this variable. If the series variable is not given, selecting new variables in Additional values will display subsequent lines connecting new points. For the new points the Y coordinate is taken as the value of the additional variable.

Point diagram

A node requires one predecessor - a Database Table node. A point diagram is created on the basis of the data in this table. To set the variables displayed on the diagram, choose Properties from the node's context menu.

In the properties window, in the fields X Axis and Y Axis, give the names of the variables containing the coordinates of the points which will be displayed in the diagram. If a variable is indicated in the field Series, observations will be divided (displayed in different colors and shapes) according to the value of this variable. If the series variable is not given, selecting new variables in Additional values will display additional points. For the new points the Y coordinate is taken as the value of the additional variable.

Bar diagram

A node requires one predecessor - a Database Table node. A bar diagram is created on the basis of the data in this table. To set the variables displayed on the diagram, choose Properties from the node's context menu.

A node generates a diagram analogous to the column diagram. Instead of columns horizontal lines are displayed.

Area Diagram

A node requires one predecessor - a Database Table node. An area diagram is created on the basis of the data in this table. To set the variables displayed on the diagram, choose Properties from the node's context menu.

The diagram is analogous to the column diagram, except that points with the same colour are connected to form an area.

Technical Transformations

Filter Table

Use the node to create a table containing selected fields from the source table. The output table contains all the rows from the original table but only the selected columns. To select fields, choose Edit from the context menu of the node. A node requires one preceding node - a database table.

Fields which should appear in the output table should be checked in the column Use Column. After the columns are selected, the output table is created automatically. The name of this table is given in the field Table Name; the name proposed by the system can be changed. The fields Select and Filter can be used to select columns whose names contain a specified string: put the string in one of the fields and press Enter.

Join Table

The node is used to combine data from two tables. One row of the result table contains data from one row of each source table, and the result table contains all fields from both source tables. A node requires two preceding database nodes.

The source tables can be joined in two ways:

  • Use row id – combines data from the same row number; in this case, the field names should be different in the two source tables,
  • Use the Key Columns – combines data by values in the specified field (specified fields); in this case the joining fields should be present in both source tables.
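
The two join modes can be sketched as follows (hypothetical in-memory rows, not AdvancedMiner's implementation):

```python
# Left table holds customer names, right table holds scores.
left  = [{"cust_id": 1, "name": "a"}, {"cust_id": 2, "name": "b"}]
right = [{"score_id": 1, "score": 10}, {"score_id": 2, "score": 20}]

# Use row id: pair rows by position; field names must differ
# between the two tables.
by_row = [{**l, **r} for l, r in zip(left, right)]

# Use the Key Columns: pair rows whose key values match.
index = {r["score_id"]: r for r in right}
by_key = [{**l, **index[l["cust_id"]]}
          for l in left if l["cust_id"] in index]

print(by_row[0])  # {'cust_id': 1, 'name': 'a', 'score_id': 1, 'score': 10}
```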

Table Sample

Node is used to randomly select a certain number of rows from the source table. A node requires one preceding node - a database table.

To set the node's parameters, choose Properties from its context menu. The parameter Size of sample specifies how many records should be drawn. The parameter Grain sets the seed of the random number generator.
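
The effect of the two parameters can be sketched with the standard library's random module (the node's actual generator may differ; Grain is used as the seed here):

```python
import random

rows = list(range(100))          # stand-in for the source-table rows
grain, size_of_sample = 42, 10   # the node's two parameters

rng = random.Random(grain)
sample = rng.sample(rows, size_of_sample)

# The same grain reproduces exactly the same sample.
assert random.Random(grain).sample(rows, size_of_sample) == sample
print(len(sample))  # 10
```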

Custom SQL

A node gives you the ability to create a result table using your own SQL query. A node requires one preceding node - a database table.

To invoke the SQL editor, select Edit from the node's context menu. In the editor window, enter an SQL query based on any table (or tables) in the current database alias.
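
For example, any aggregating query over a table in the alias produces the result table. A sketch with SQLite standing in for the Workflow database (the table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("N", 10.0), ("N", 30.0), ("S", 5.0)])

# A free-form query whose output becomes the node's result table.
result = con.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(result)  # [('N', 40.0), ('S', 5.0)]
```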

Split Table

A node is used to divide a table into several tables with fewer rows. Each row of the source table is put into one of the resulting tables. A node requires one preceding node - a database table.

To define the way the data should be split, choose Edit from the context menu of the node. The following splits are possible:

  1. random distribution - rows are allocated at random according to the proportion given by the user in the field Split. For example, for the values 3 and 7 the first table will contain approximately 30% of the rows of the source table, and the second the remaining 70%. The parameter Grain sets the seed for the random number generator.
  2. distribution of the attribute values - rows are assigned to the result tables according to the value in the field specified by the user. The node creates as many tables as there are distinct values of the specified field. The field used to define the division is not included in the output sets.
  3. Automatically execute graph to resolve distinct attribute values - checking this option results in automatic calculation of the number of resulting datasets if the table exists in the database.
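
A random 3:7 split as described above can be sketched like this (illustrative only; the node's exact drawing scheme is not documented here):

```python
import random

rows = list(range(1000))       # stand-in for the source rows
split, grain = (3, 7), 42      # Split proportion and Grain (seed)

rng = random.Random(grain)
threshold = split[0] / sum(split)
part_a, part_b = [], []
for row in rows:
    # Each row lands in part_a with probability 3/10, else in part_b.
    (part_a if rng.random() < threshold else part_b).append(row)

# part_a holds roughly 30% of the rows, part_b the remaining ~70%.
print(len(part_a), len(part_b))
```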

Additional parameters of the node:

  1. Group - selecting this option generates one table with the number of the group to which each record was assigned.
  2. Generate Incremental Names - output table names will contain suffixes: consecutive numbers starting with 0.
  3. Use the prefix - the name of the result table will begin with the user-specified prefix.

Union Table

Performs a combination of data from two source tables into a single result set. A node requires two preceding database table nodes.

The source tables must have the same structure, which is also a structure of a target table.

Where Condition

A node allows you to define your own condition, which will be used to select data for the output table. A node requires one preceding node - a database table.

To enter the condition, choose Edit from the context menu. The name of the database table with the output data is provided in the Properties window, accessible from the context menu of the node.

Change Column Types

Node allows you to change the column types. A node requires one preceding node - a database table.

To change the column types, select Edit from the node's context menu. For example, if a field contains integers but is of type String, it is possible to convert it to the Integer type (and vice versa). In the resulting table, double-click the value provided in the New Type column.

Analytical Transformations

Replace Missing

Node is used to replace missing values. A node should have one preceding node - a database table.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.

To set the way the missing values should be replaced, select the main node on the left side of the edit window. The relevant parameters appear on the right side of the window; they apply to all columns of the given type. To set parameters for individual fields, expand the customAttributes node on the left side of the window and choose the relevant field. The appropriate parameters appear on the right side of the window.

The following options for replacing missing values are available:

  1. Auto - missing values are replaced with the mean value for numerical variables and with the dominant value for nominal and ordinal variables.
  2. Mean - missing values are replaced with the average value of the variable (numerical variables only).
  3. Median - missing values are replaced with the median value of the variable.
  4. Distribution based - missing values are replaced with random values generated from the distribution of the variable. For numeric attributes you can specify the percentage of observations around the maximum and minimum values that will not be taken into account when generating values; this is done by setting the parameters Cut Lower Percent and Cut Upper Percent.
  5. Modal - missing values are replaced with the dominant value; the option can be used for numeric and categorical variables.
  6. Custom - missing values are replaced with a value given by the user in the field Custom value. This field is available in the settings for individual columns.
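
Three of the strategies can be sketched on a single numeric column (None marks a missing value; an illustration, not the node's code):

```python
from statistics import mean, median, mode

col = [1.0, None, 3.0, 3.0, None, 9.0]
present = [v for v in col if v is not None]

# Replace every missing value with the chosen statistic of the
# non-missing values.
by_mean   = [v if v is not None else mean(present)   for v in col]
by_median = [v if v is not None else median(present) for v in col]
by_modal  = [v if v is not None else mode(present)   for v in col]

print(by_mean[1], by_median[1], by_modal[1])  # 4.0 3.0 3.0
```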

Standardize

Standardization performs a linear transformation of the variable so that its average value is 0 and its standard deviation equals 1. A node should have one preceding node - a database table.

The transformation is performed for numeric variables. To select fields which should be transformed, choose Edit attribute usage from the node's context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.
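
The transformation is the usual z-score. A sketch, assuming the population standard deviation (the node may use the sample one):

```python
from statistics import mean, pstdev

values = [2.0, 4.0, 6.0, 8.0]
m, s = mean(values), pstdev(values)
standardized = [(v - m) / s for v in values]

# After the transformation the mean is 0 and the std deviation is 1.
print(round(mean(standardized), 12), round(pstdev(standardized), 12))
```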

Outlier Transformations

The transformation is available for numeric variables only. It replaces the specified percentage of the lowest (p1) and highest (p2) variable values with the quantiles of order p1 and 1-p2, respectively. A node should have one preceding node - a database table.

To edit the node's parameters, select Edit from its context menu. The parameters LowerPercent and UpperPercent are the orders of the quantiles used in the transformation. The parameter MinValuesCount is the minimum number of distinct values of the variable required to perform the transformation.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.
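
A minimal sketch of the clipping, using one common quantile convention (AdvancedMiner's exact quantile method may differ):

```python
def winsorize(values, lower_percent=0.10, upper_percent=0.10):
    """Clip values below the lower_percent quantile and above
    the (1 - upper_percent) quantile."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_percent * (n - 1))]
    hi = ordered[int((1 - upper_percent) * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

data = [float(v) for v in range(1, 12)]   # 1..11; 1 and 11 are extreme
print(winsorize(data))  # extremes pulled in to 2.0 and 10.0
```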

Discretize

The transformation is available for numeric variables. It creates variables whose values are ranges of the transformed variable. A node should have one preceding node - a database table.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform means the transformation will be performed for the field.

To set the parameters of the node, choose Edit from its context menu. The parameter categoryNumber specifies the number of ranges into which the original variable will be divided.
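
One possible binning scheme is equal-width ranges; a sketch (AdvancedMiner may use a different strategy, e.g. equal-frequency bins):

```python
def discretize(values, category_number):
    """Assign each value to one of category_number equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / category_number
    return [min(int((v - lo) / width), category_number - 1)
            for v in values]

values = [0.0, 2.5, 5.0, 7.5, 10.0]
print(discretize(values, 4))  # [0, 1, 2, 3, 3]
```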

Binarize

Transforms a categorical variable into n binary variables, where n is the number of distinct values of the variable. A node should have one preceding node - a database table.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.

The transformation leaves out missing values. To set parameters, choose Edit from the context menu of the node. If the variable takes a larger number of values than specified by the parameter Max Values Count, binarization of the variable is not performed.
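
The transformation is one-hot encoding; a sketch (encoding a missing value as all zeros is one possible convention, not necessarily the node's):

```python
def binarize(column):
    """One 0/1 column per distinct non-missing value."""
    levels = sorted({v for v in column if v is not None})
    return levels, [
        [1 if v == level else 0 for level in levels] for v in column
    ]

levels, encoded = binarize(["red", "blue", None, "red"])
print(levels)   # ['blue', 'red']
print(encoded)  # [[0, 1], [1, 0], [0, 0], [0, 1]]
```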

Normalize

Node is used to normalize vectors. It can be applied if at least two numeric variables are selected for transformation. A node should have one preceding node - a database table.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.

To set parameters, choose Edit from the context menu of the node. The values of the variables in one record are treated as the coordinates of a vector which begins at the origin of the coordinate system. The values of these variables are divided by the length of the vector, calculated using the measure given by the parameter Norm Type. The following distance measures are possible: Euclidean, Manhattan, Maximum, Minimum, zero.
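
A sketch of the division by the vector norm for three of the listed measures (the Minimum and zero norms are omitted here):

```python
import math

def normalize(record, norm_type="euclidean"):
    """Divide each value in the record by the chosen vector norm."""
    if norm_type == "euclidean":
        norm = math.sqrt(sum(v * v for v in record))
    elif norm_type == "manhattan":
        norm = sum(abs(v) for v in record)
    elif norm_type == "maximum":
        norm = max(abs(v) for v in record)
    else:
        raise ValueError(norm_type)
    return [v / norm for v in record]

print(normalize([3.0, 4.0]))               # [0.6, 0.8]
print(normalize([3.0, 4.0], "manhattan"))  # norm 7 -> [3/7, 4/7]
```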

PCA

A node performs principal component analysis and generates a smaller number of uncorrelated (orthogonal) transformed variables. The transformation is performed for numeric variables only. A node should have one preceding node - a database table.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.

To set parameters, choose Edit from the context menu of the node. The parameter Criterium can take two values: kaiser and number. If kaiser is selected, the algorithm chooses the optimal number of outcome variables. If number is selected, the user can specify the number of output variables as the value of the parameter ComponentsNumber. If ComponentsNumber equals 0, the algorithm generates the maximum number of principal components.

Rescale Transformation

A node performs linear transformation of the variable. The resulting variable takes on values from a specified range. The transformation is performed for numeric variables only. A node should have one preceding node - a database table.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.

To set the parameters of the node, choose Edit from its context menu. The lower and upper limits of the values after transformation are determined by the parameters Min Value and Max Value.
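
The transformation is a standard min-max rescaling; a sketch:

```python
def rescale(values, min_value=0.0, max_value=1.0):
    """Linearly map values onto the range [min_value, max_value]."""
    lo, hi = min(values), max(values)
    scale = (max_value - min_value) / (hi - lo)
    return [min_value + (v - lo) * scale for v in values]

print(rescale([10.0, 15.0, 20.0]))          # [0.0, 0.5, 1.0]
print(rescale([10.0, 15.0, 20.0], -1, 1))   # [-1.0, 0.0, 1.0]
```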

Weight of Evidence

A node determines the value of the weight of evidence. The transformation is available for numerical and categorical variables. A node should have one preceding node - a database table.

To select fields which should be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage. Selecting the value transform or auto means the transformation will be performed for the field.

To set parameters, choose Edit from the context menu of the node. If the variable takes on more values than specified by the parameter Max Values Count, the transformation of the variable is not performed. The parameter Target Attribute specifies the name of the dependent variable (which must be categorical). The parameter Positive Target Value is the target value of the dependent variable.
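
A sketch of the weight of evidence per category, using the standard definition ln(share of positives / share of negatives); the node's exact convention, e.g. its handling of empty cells, may differ:

```python
import math

def woe(categories, target, positive_value):
    """Weight of evidence for each category of a variable."""
    pos_total = sum(1 for t in target if t == positive_value)
    neg_total = len(target) - pos_total
    out = {}
    for level in set(categories):
        pos = sum(1 for c, t in zip(categories, target)
                  if c == level and t == positive_value)
        neg = sum(1 for c, t in zip(categories, target)
                  if c == level and t != positive_value)
        out[level] = math.log((pos / pos_total) / (neg / neg_total))
    return out

cats   = ["a", "a", "a", "b", "b", "b"]
target = [1, 1, 0, 1, 0, 0]
result = woe(cats, target, positive_value=1)
print(result)  # 'a': ln 2 ~ 0.693, 'b': ln 0.5 ~ -0.693
```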

Feature Selection

A node performs selection of variables which significantly explain variability of a target variable being a categorical variable. It requires two predecessors: Database Table and Attribute Usage node.

Edit the node to access the scoreEvaluators object, which calculates statistics for the selected variables. The node performs selection for nominal and continuous variables; continuous variables are usually discretized first.

Instructions for use

To select an algorithm, edit the node (the Edit option in its context menu). Next, click the scoreEvaluators element on the left side of the editor window and choose Add element from its context menu. As a result, a window with the available algorithms is displayed; selecting one adds the algorithm to the current list. To see the current list of algorithms, expand the scoreEvaluators element. Click an algorithm's name to see its current parameters.

The following algorithms are available:

  • InformationGainFeatureEvaluator - selection method based on the Information Gain measure, which assesses the usefulness of a variable X for predicting a variable Y. It is calculated as IG(Y|X) = H(Y) - Σ_k P(X = x_k) · H(Y | X = x_k), where H is an impurity measure: either entropy, H(Y) = -Σ_i p_i log2(p_i), or the Gini index, H(Y) = 1 - Σ_i p_i², with p_i denoting the fraction of observations in the i-th class of Y.
  • Filter1RFeatureEvaluator - selection method calculating the relevance of a nominal variable. It is based on a contingency table describing the distribution of the variable X for each value of a binary target variable Y. If Y takes the two values 1 and 0, the relevance of X is calculated from the contingency-table counts: the number of observations taking the k-th value of X with Y equal to 0, and the number taking the k-th value of X with Y equal to 1. In the case of an uneven distribution of the values 0 and 1, the counts in the contingency table are rescaled to balance the classes. If some values of X are represented by few cases, the corresponding rows of the contingency table are joined to obtain more numerous groups of cases.
  • NullCountFeatureEvaluator - calculates a statistic proportional to the number of cases with missing values. The statistic equals 1 if there are no missing values and 0 if all values are missing.
  • SmartTreeFeatureEvaluator - the algorithm creates random forests to predict the target variable. The algorithm uses two parameters:
    • Impurity decrease importance - determines how much (on average) a variable decreases entropy during the process of building a tree. The parameter indicates whether the statistic should be calculated.
    • Permutation importance - determines how much a random swap of the values of a variable changes the prediction error. The parameter indicates whether the statistic should be calculated.
  • StatisticalTestEvaluator - applies a statistical test to check the connection between the target variable and an independent variable. The following statistical tests are available:
    • Student's T - evaluates whether there is a significant difference between the means of two populations. The test is used for numerical variables only. A detailed description can be found in the documentation in the section Student's t-test.
    • Student's T Paired - evaluates whether there is a significant difference between the means of two populations when every observation in the first sample corresponds to a matching observation in the other sample. The test is used for numerical variables only. A detailed description can be found in the documentation in the section Student's t-test.
    • Pearson - calculates the Pearson correlation coefficient between the target variable and an independent variable. The test is used for numerical variables only.
    • Cramér's V - calculates the value of the normalized chi-square test. The test is used for nominal variables only.
    • F Test - evaluates how close the standard deviations of a variable and the target variable are. The test is used for numerical variables. A detailed description can be found in the documentation in the section F test.
    • Chi squared - tests the independence of a variable and the target variable. Continuous variables are discretized.
  • Tree1DFeatureEvaluator - the algorithm builds decision trees based on a single variable. The statistic reflects the quality of a tree built on the given variable. The algorithm uses the following parameters:
    • Max Tree Depth - maximal depth of a tree,
    • Min Leaf Size - minimal number of observations in a leaf,
    • Min Leaf Size Prc - minimal percentage of observations in a leaf,
    • Null Test Allowed - whether a split which tests for missing values is allowed,
    • Permutation importance - informs how much a random change of a variable influences the value of the error. The parameter defines whether the statistic should be calculated,
    • Trees No. Per Attribute - number of trees per variable.
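
The Information Gain measure used by InformationGainFeatureEvaluator can be sketched with the entropy variant (a simplified illustration, not the product's implementation):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(x, y):
    """IG(Y|X) = H(Y) minus the weighted entropy of Y within
    each value of X."""
    n = len(y)
    cond = sum(
        (sum(1 for xv in x if xv == v) / n)
        * entropy([yv for xv, yv in zip(x, y) if xv == v])
        for v in set(x)
    )
    return entropy(y) - cond

x = ["a", "a", "b", "b"]
y = [1, 1, 0, 0]
print(information_gain(x, y))  # 1.0 - x predicts y perfectly
```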

The above algorithms use common parameters:

  • Recalculate evaluation – parameter defining if the algorithm should be run. Unchecking the parameter means the values of the last run will be used. This allows the user to add results of recently added algorithms to previous results.
  • Selected – enables the user to calculate multiple statistics with several algorithms. Further processing will be performed only for the variables which are selected. If this option is checked for more than one algorithm, the variable evaluations generated by each algorithm are rescaled to the range from 0 to 1; then the arithmetic mean of the rescaled values is calculated for every variable. The algorithm selects the variables with the highest average score.

Modeling

Logistic Regression

A node is used to calculate a logistic regression model as described in the documentation in section Logistic regression. A node requires the following preceding nodes: one or two Database Tables and optionally attribute usage.

If one preceding node is used, the data in this table is the training dataset. If two database tables are used, one of them is a validation dataset. To set the roles of the database tables, edit the values in the property window of the node. If the preceding node is not an attribute usage node, then to set the roles of the variables in the model, edit the node and double-click the value in the attribute usage node on the left side of the window. To change the roles, double-click the value in the column usage.

To set the node's parameters, choose Edit from the node's context menu, then choose the algorithmSettings node from the tree on the left side. A node requires setting the value of Positive target category. The results of the calculation of the regression model are available when you select View Model in the context menu of the node.

Scoring Card

A node for scorecard calculation as described in the documentation section Scoring card. The node requires two preceding nodes: Logistic Regression and a Database Table.

The results of the calculation of the card are available when you select View model in the context menu of the node.

Attribute Usage

A node for setting the roles of variables in the model. The node requires one preceding node - the database table.

The variable's role in the model is set in the field usage. The statistical type of the variable can be set in the field attributeType.

Decision Tree

A node is used for creating the classification tree model as described in the documentation in section Decision Trees. A node requires the following preceding nodes: one or two Database Tables and optionally Attribute Usage.

If one preceding node is used, the data in this table is the training dataset. If two database tables are used, one of them is a validation dataset. To set the roles of the database tables, edit the values in the property window of the node. If the preceding node is not an attribute usage node, then to set the roles of the variables in the model, edit the node and double-click the value in the attributeUsageSet on the left side of the window. To change the roles of the variables, double-click the value in the column usage.

The results of the calculated tree model can be viewed when you select View Model from the context menu of the node.

Kohonen Clustering

A node performs the task of building a Kohonen network as described in the documentation section Kohonen Network. A node requires the following preceding nodes: one or two Database Tables and optionally Attribute Usage.

If one preceding node is used, the data in this table is the training dataset. If two database tables are used, one of them is a validation dataset. To set the roles of the database tables, edit the values in the property window of the node. If the preceding node is not an attribute usage node, then to set the roles of the variables in the model, edit the node and double-click the value in the attributeUsageSet on the left side of the window. To change the roles, double-click the value in the column usage.

The results of the calculated model can be seen when you select View Model from the context menu of the node.

Linear Regression

A node performs the task of building a linear regression model as described in the documentation in section Linear regression. A node requires the following preceding nodes: one Database Table and optionally Attribute Usage.

If the preceding node is not an attribute usage node, then to set the roles of the variables in the model, edit the node and double-click the value in the attribute usage node on the left side of the window. To change the roles of the variables, double-click the value in the column usage.

The results of the calculation of the regression model can be seen when you select View Model from the context menu of the node.

Smart Trees

A node performs the task of calculating a random forest for given data, as described in the documentation section Smart Trees. It requires the preceding nodes Database Table and Attribute Usage. The results are available when you select Display from the context menu.

If the preceding node is not an attribute usage node, then to set the roles of the variables in the model, edit the node and double-click the value in the attribute usage node on the left side of the window. To change the roles, double-click the value in the column usage.

The results of the model calculation can be seen by selecting View Model from the node's context menu.
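A random forest is a bagged ensemble of decision trees that vote on the class. As a rough, generic sketch of the idea (not AdvancedMiner's algorithm), the snippet below uses depth-1 trees (stumps) trained on bootstrap samples; all names are hypothetical:

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Exhaustively find the (feature, threshold) split with the best
    training accuracy; each side predicts its majority class."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ll = Counter(left).most_common(1)[0][0] if left else y[0]
            rl = Counter(right).most_common(1)[0][0] if right else y[0]
            acc = sum((yi == ll) if row[j] <= t else (yi == rl)
                      for row, yi in zip(X, y))
            if best is None or acc > best[0]:
                best = (acc, j, t, ll, rl)
    _, j, t, ll, rl = best
    return lambda row: ll if row[j] <= t else rl

def fit_forest(X, y, n_trees=15, seed=0):
    """Bagging: train each stump on a bootstrap sample of the data and
    let the forest predict by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: Counter(s(row) for s in stumps).most_common(1)[0][0]

X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]
predict = fit_forest(X, y)
```

Real Smart Trees use full decision trees and additional randomization, but the bootstrap-plus-vote structure is the same.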

Regularized Linear Regression

A node performs the task of building a regularized linear regression model, as described in the documentation section Regularized Linear Regression. It requires two preceding nodes: a Database Table and an Attribute Usage node.

If no Attribute Usage node precedes the node, set the roles of the variables in the model by editing the node and double-clicking the attributeUsageSet value on the left side of the window. To change a variable's role, double-click the value in the Usage column.

The results of the model calculation can be seen by selecting View from the node's context menu.
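Regularization adds a penalty that shrinks the coefficients toward zero. For a single predictor with an unpenalized intercept, ridge (L2) regression has a closed form; the sketch below is a generic illustration with a made-up function name, not AdvancedMiner's code:

```python
def fit_ridge(xs, ys, lam=1.0):
    """Ridge regression with one predictor and an unpenalized intercept:
    slope = Sxy / (Sxx + lambda); the penalty shrinks the slope."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx

ols_slope, _ = fit_ridge([0, 1, 2, 3], [1, 3, 5, 7], lam=0.0)   # lambda=0 is plain OLS
ridge_slope, _ = fit_ridge([0, 1, 2, 3], [1, 3, 5, 7], lam=5.0)  # shrunk toward 0
```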

Regularized Logistic Regression

A node performs the task of building a regularized logistic regression model. It requires a Database Table as a preceding node and, optionally, an Attribute Usage node.

If one preceding Database Table node is used, its data is the training dataset. If two database tables are used, one of them serves as the validation dataset. To set the roles of the database tables, edit the values in the node's property window. If no Attribute Usage node precedes the node, set the roles of the variables in the model by editing the node and double-clicking the attributeUsageSet value on the left side of the window. To change a role, double-click the value in the Usage column.

To set the node's parameters, choose Edit from the node's context menu, then select the algorithmSettings node in the tree on the left side. The node requires setting the value of Positive Target Category. The results of the model calculation can be seen by selecting View Model from the node's context menu.
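Conceptually, L2-regularized logistic regression minimizes the log-loss plus a penalty on the weights. The following is a minimal, generic gradient-descent sketch (all names hypothetical, not AdvancedMiner's implementation); the intercept is left unpenalized:

```python
import math

def fit_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """L2-regularized logistic regression via plain gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0                     # intercept is not penalized
        for row, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            for j, xj in enumerate(row):
                gw[j] += (p - yi) * xj / n
            gb += (p - yi) / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return lambda row: 1.0 / (1.0 + math.exp(
        -(b + sum(wj * xj for wj, xj in zip(w, row)))))

# predicted probability of the positive target category
predict_proba = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```

The Positive Target Category setting corresponds to which label plays the role of 1 here.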

Automatic Business Modeler

A node builds a model for a classification task. It requires two preceding nodes: a Database Table and an Attribute Usage node.

The node requires setting the positive value of the target variable. To set it, select Properties from the node's context menu and enter the value in the Positive Target Value field. The node creates a model for which it is not possible to perform scoring or to generate scoring code.

SNA

Build Network

A node is used to construct a graph representing a social network based on input data. It requires one preceding node: a Database Table.

A description of how to build a network is given in the documentation chapter Build a Network. To set the build parameters, select Edit from the node's context menu. On the left side of the resulting window, click the first (main) node of the object tree; a list of parameters appears on the right side of the window.

Network Analysis

A node is used to define and run network algorithms. It requires two predecessors: a Build Network node and a Database Table with additional data used by the algorithms.

In this table, one of the columns should contain the vertex identifiers; indicate this column in the Vertex_id property of the node. The following algorithms are available:

  • Aggregator
  • CommunityAggregator
  • Community Statistics
  • Equivalence
  • HITS
  • Local Equivalence
  • Modularity
  • Page Rank
  • RoleFinder
  • SizeCommunityFinder
  • SpreadingActivation
  • LouvainCommunityFinder
  • Triads.

The results of all algorithms are written to a database table.
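To make one of the listed algorithms concrete, Page Rank can be computed by power iteration over an edge list. The sketch below is a generic illustration, not AdvancedMiner's implementation; all names are made up:

```python
def pagerank(edges, d=0.85, iters=50):
    """Power-iteration PageRank on a directed edge list; dangling nodes
    distribute their rank uniformly over all nodes."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # teleportation term (1 - d) shared equally by every node
        new = {n: (1.0 - d) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = d * rank[n] / len(out[n])
                for m in out[n]:
                    new[m] += share
            else:  # dangling node: spread its rank over all nodes
                for m in nodes:
                    new[m] += d * rank[n] / len(nodes)
        rank = new
    return rank

# a small graph where everyone links to 'c', so 'c' ranks highest
rank = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
```

In the Workflow, the equivalent per-vertex scores would land in the output database table keyed by the Vertex_id column.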

Instructions for Use

To select an algorithm, edit the node. Then click the algorithms element on the left side of the window and choose Add element from its context menu. A window with the list of algorithms appears; choose one of them. To view the list of selected algorithms, expand the algorithms element.

Filter Network

A node is used to filter out certain nodes to obtain a subnetwork. It requires two preceding nodes: a Database Table and a Build Network node.

To set filter conditions, choose Filter from the node's context menu. On the left side of the resulting window select the main (first) node; a list of parameters appears on the right side. One of them is the Filter field. Enter there a condition written in a form similar to SQL syntax. The condition should contain at least one field from the node's parent table and cannot contain the field that is the node identifier.

The node requires indicating which field of the preceding table contains the node identifier. To set this parameter, in the node's edit window expand the attributeUsageSet and attributes nodes, click the field containing the node identifier, and on the right side of the window set its Usage to NODE_ID. A description of the network filtering functionality is given in the documentation chapter Filter Network.
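The effect of such a filter condition can be illustrated generically: nodes whose attributes fail the condition are dropped, and an edge survives only if both of its endpoints survive. The snippet is a hypothetical sketch, not AdvancedMiner's code; the lambda plays the role of the SQL-like expression entered in the Filter field (e.g. "age > 30"):

```python
def filter_network(nodes, edges, condition):
    """Keep only the nodes whose attribute row satisfies the condition;
    keep an edge only if both endpoints were kept."""
    kept = {nid for nid, attrs in nodes.items() if condition(attrs)}
    return kept, [(a, b) for a, b in edges if a in kept and b in kept]

# node attributes keyed by node identifier
nodes = {1: {"age": 25}, 2: {"age": 40}, 3: {"age": 55}}
edges = [(1, 2), (2, 3), (1, 3)]
kept, sub_edges = filter_network(nodes, edges, lambda a: a["age"] > 30)
```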

Network Visualization

A node is used for network visualization. It requires one preceding node of type Filter Network or Network Analysis.

To manage the network visualization, open the Navigator window (Ctrl+7).

To visualize the results of running network algorithms, a user has to add a new table with the results of the calculations. To do so, select Edit from the node's context menu and, on the left side of the window, click attributes. Next, from the context menu of this object, choose Add. In the resulting window select the database alias, the table, and the field containing the node identifier.

Results

Logistic Regression Report

A node is used to export the results of a logistic regression model to an MS Excel file. It requires one preceding node: a Logistic Regression model node. The results of the calculations are available in the node's context menu under Show Report.

To define a name and a path for the file, from the node's context menu choose Edit.

Scoring Code

A node generates scoring code based on a model. It requires one preceding node: a model node.

To set the scoring code language, select Edit from the node's context menu. On the left side of the resulting window click the first (main) node; a set of parameters appears on the right side of the edit window. The Code Language parameter specifies the language in which the scoring code will be generated. To see the code, choose Show Scoring Code from the context menu.

Scoring

A node is used to generate scoring based on the data in a database table. It requires two preceding nodes: a model and a database table.

To score the data it is necessary to define how the scoring should be generated. First, join the node with the database table and model nodes. The output database table will be created automatically. To change the output table name, choose the Properties option from the node's context menu; the resulting database table is given in the Target Data entry. To define the output data, choose the View Mining Apply Task option from the node's context menu.

In the Direct Mapping part a user can choose the fields from the scored dataset (Source Data) that should appear in the output dataset (Target Data), if such a selection is needed. If nothing is entered here, the resulting dataset will contain all the fields from the original database table. To add a field, click the 'Add element' button; as a result, the first field from the scored dataset appears. Clicking this field in the Source Data part opens a list of fields from which the appropriate one can be chosen. In the Target Data part, select the name of the field for the output data. If the field should keep the same name in the output table, the same value should appear in both Source Data and Target Data.

In the New Columns part a user can define the type of scoring data to generate. Click the 'Add Element' button, then select the category of the scoring data. The available options are described below. For classification models:

  • ClassificationCategoryItem - for generating probability of target value
  • ClassificationRankItem - for generating the most probable value
  • LogisticRegressionOutputItem - specific output for logistic regression model

For approximation models:

  • ApproximationOutputItem
  • LinearRegressionOutputItem

For Kohonen Grouping models:

  • ClusterIdItem
  • ClusteringRankItem

In the next step a user should define the field in the output table that will hold the scoring data. In the 'Choose a list of arguments' section, select values other than the defaults. Depending on the type of model and the type of scoring, the following parameters should be set:

  • destinationName - name of the field in the output table
  • outputType - type of scoring, e.g. nodeId (tree node id) or predictedCategory (predicted categorical value),
  • category - a value related to the type of scoring, e.g. if the Output Type is set to probability, the scoring will contain the probability of the value given in this field,
  • topN'thIndex - if all categories are ordered from the most to the least probable, the value in this field selects the n-th category; a value of 0 means the most probable category,
  • clusterId - number of cluster for Kohonen Clustering algorithm
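How such output columns relate to a model's class probabilities can be illustrated generically. The sketch below (all names hypothetical, not AdvancedMiner's API) derives the predicted category, its probability, and the category at a given rank from a category-to-probability map:

```python
def scoring_columns(probabilities, top_n_index=0):
    """Derive typical scoring outputs: the most probable category, its
    probability, and the category at rank top_n_index (0 = most probable)."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "predictedCategory": ranked[0][0],
        "probability": ranked[0][1],
        "topNthCategory": ranked[top_n_index][0],
    }

# index 1 selects the second most probable category
row = scoring_columns({"yes": 0.7, "no": 0.2, "maybe": 0.1}, top_n_index=1)
```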

Approximation Test Results

A node is used to test an approximation model and present the results. It requires two preceding nodes: an approximation model and a Database Table containing the data to test the model.

A node requires indicating the target variable in the test data. To do so, select Edit from the node's context menu. On the left side of the window click the first (main) node; a list of parameters appears on the right side. In the Target field, enter the name of the target variable in the test dataset.

To see the results of testing, choose View from the node's context menu.
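Typical quantities reported when testing an approximation (regression) model are error statistics such as the mean absolute error and the root mean squared error. A generic sketch with a made-up function name, not AdvancedMiner's implementation:

```python
def approximation_metrics(actual, predicted):
    """Mean absolute error and root mean squared error of predictions."""
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5
    return mae, rmse

mae, rmse = approximation_metrics([1.0, 2.0, 3.0], [1.0, 2.5, 2.5])
```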

Data Test Results

Use the node to generate reports on the prediction quality of classification models, based on data containing both the predicted probability of the target and the actual target values. It requires one preceding node: a database table.

The input table should contain certain variables that allow assessing the prediction quality. To set the roles of the variables, select Edit from the node's context menu. Then, on the left side of the window, select the first (main) node; a set of parameters appears on the right side. Provide the Positive Target Value, then indicate which variables contain the following types of data:

  • the target value generated from scoring (parameter Predicted Target),
  • the probability of the target value (parameter Score),
  • the actual values of the target variable - the response data (parameter Target),
  • the weight of observations (optional) (parameter Weight).

The results are available in the context menu when you select Display.
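One standard quality measure computable from predicted probabilities and actual targets is the area under the ROC curve. The sketch below uses the rank formulation (the fraction of positive/negative pairs ordered correctly, with ties counting one half); it is a generic illustration, not AdvancedMiner's report:

```python
def auc(scores, targets):
    """AUC via the pairwise-rank formulation over positive (target 1)
    and negative (target 0) observations."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])      # perfect ranking
random_like = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])  # uninformative scores
```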

Compare Approximation Results

A node is used to compare the results of two approximation models. It requires two preceding nodes of type Approximation Test Results.

The results of the calculation are available in the context menu when you select Compare Histograms.

Classification Test Results

A node is used to test a data mining classification model. It requires two preceding nodes: a classification model and a database table containing the test data set.

A node requires setting a target variable and a positive target value. To set these values, select Edit from the node's context menu. Then, on the left side of the edit window, click the first (main) node; a list of parameters appears on the right side. Provide a value in the Positive Target Value field and select the Target variable. Results are available in the node's context menu after selecting View.

Compare Classification Results

A node is used to compare the quality of classification models. It requires at least two preceding nodes of type Classification Test Results.

A node generates results in the form of graphs and an MS Excel spreadsheet. To give a path and a name for the file, select Edit from the context menu of the node. To view charts comparing the models, select Compare Charts from the node's context menu.

Gython

Other

A node is used to execute any Gython script in AdvancedMiner.

To write the script code, choose Edit from the node's context menu.

Other

User Note

A node gives you the ability to add user comments to the workflow schema.

To add a comment to a schema, select Edit from the node's context menu. The comment text will appear under the node. If you check the option Add Note to Script, the comment will also appear in the generated script.