Defines a CSV file to be imported into AdvancedMiner, or a file to which data should be exported.
To set the file path and name, choose from the node's context menu. With this node the user can define flat files in which the data are separated by a specific delimiter character. The first row of the file can contain headers. When data are imported from the file, AdvancedMiner can automatically detect column types for further processing.
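The following minimal Python sketch illustrates what automatic column type detection involves. It reads delimited text whose first row holds the headers and guesses int, float or str for each column. The function names, the semicolon default delimiter and the "narrowest type that parses every value" rule are assumptions for illustration, not AdvancedMiner's actual detection logic.

```python
import csv
import io


def guess_type(values):
    """Return the narrowest type name that can parse every non-empty value."""
    if not values:
        return "str"  # assumption: a column with no data defaults to text
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
        except ValueError:
            continue  # this type does not fit; try a wider one
        return name
    return "str"


def detect_column_types(text, delimiter=";"):
    """Guess column types of delimited text whose first row holds the headers."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    return {
        name: guess_type([row[i] for row in data if i < len(row) and row[i] != ""])
        for i, name in enumerate(header)
    }


sample = "id;amount;city\n1;10.5;Warsaw\n2;7;Krakow\n"
print(detect_column_types(sample))
```

As in the application, the suggested types are only a starting point and would normally be reviewed by the user.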
Defines an MS Excel file to be imported into AdvancedMiner, or a file to which data should be exported.
To set the file name and define how the file should be loaded, from the node's context menu choose . After the file path is given, the system automatically detects all existing sheets and asks the user to indicate the sheet containing the data. The first row can contain column headers. AdvancedMiner automatically detects column types for further processing; the types suggested by the application can be modified by the user. To change column types, click Specify Column in the edit window.
This node represents a database table, which can contain both source data and the results of processing. The node is created automatically by other nodes when the results of processing are written to a database table. To see the data in the table, select View Data from the node's context menu. Besides the data, the user can also open an SQL editor in which any SQL statement can be executed within the Workflow alias. To define a table, choose Edit from the node's context menu.
This node defines a source of processing: a database table used to import or export data from aliases other than the one in which the Workflow runs.
The user can indicate any table from a database alias defined in the application. To process data from such a source further, the data must first be transferred to the default database alias of the Workflow, so the next node should be a Database Table node. To define a table, from the node's context menu select .
This node generates histograms and a cross table for two selected variables. The data are presented in both tabular and graphical form. The node requires one predecessor - a Database Table node.
To open the Analyzer window, from the node's context menu choose . For numerical variables with a large number of values, ranges are created. Indicate the analyzed variables in the X variable and Y variable fields. Optionally, you can indicate a target variable and its target value. After selecting the analyzed variables, run the execution (F6).
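To make the cross table concrete, here is a minimal Python sketch that counts observations for every pair of X and Y values and, for a numerical X, first maps each value to a range. The fixed range edges passed by the caller and the helper names are hypothetical; how Analyzer actually chooses its ranges is not described here.

```python
from bisect import bisect_right
from collections import Counter


def to_range(value, edges):
    """Label the range [edges[i], edges[i+1]) that contains value.

    Values outside the edges are clamped to the first or last range.
    """
    i = bisect_right(edges, value) - 1
    i = min(max(i, 0), len(edges) - 2)
    return f"[{edges[i]}, {edges[i + 1]})"


def cross_table(rows, x, y, x_edges=None):
    """Count observations for each (X, Y) pair; X is grouped into ranges if edges are given."""
    counts = Counter()
    for row in rows:
        x_value = row[x] if x_edges is None else to_range(row[x], x_edges)
        counts[(x_value, row[y])] += 1
    return counts


rows = [{"age": 15, "sex": "F"}, {"age": 25, "sex": "M"}, {"age": 27, "sex": "M"}]
print(cross_table(rows, "age", "sex", x_edges=[10, 20, 30]))
```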
Click the squares in the left part of the screen to see the following statistics:
If a target value is indicated, the user can see:
This node presents the analyzed table graphically. Each value in the table corresponds to a rectangle whose colour reflects the value of the observation. The node requires one predecessor - a Database Table node.
To open the explorer, from the node's context menu choose . Hover the mouse pointer over a rectangle to display its value. Clicking a value highlights all cells with the same value. The Where condition field under the chart lets the user limit the data presented in the chart; enter an SQL-like condition there. The Filter field lets you select the columns presented graphically (at least two columns must be selected). Separate the column names with spaces.
This node analyses the distributions of all variables. It produces both histograms and descriptive statistics of the variables. The node requires one predecessor - a Database Table node.
To analyse data using Freq, from its context menu select . A window with a list of all variables appears. To get the results, click the selected variable and run processing (F6). To generate results for all variables, select all of them (Ctrl+A). To write the data to a database table, from the context menu of this node select and in the resulting window check the Generate Output Table option. The proposed output table name can be modified. Freq functionality is described in the chapter on Freq.
Apart from data exploration, the Freq node lets the user create new variables based on the original ones by adding an SQL attribute. Click a variable in the Freq window and then select Add SQL Attribute from the context menu. Next, give the name of the new variable and enter its definition using SQL syntax. Finally, close the node, click the Database Table node created from the Freq node, and run processing. The new variable appears in the resulting table.
The node requires one predecessor - a Database Table node. A column diagram is created from the data in this table. To set the variables displayed in the diagram, from the node's context menu choose .
In the properties window, set the variable that determines the categories (or the timestamp for time series) in the Categories field, and the values in the Values field. Every observation (one row) makes up one column in the diagram. Observations belonging to one category are grouped together. Additional variables can be displayed in the diagram by entering their names in the Additional Values fields; in this case the categories should contain timestamps. If the values in the category field are not unique, the diagram presents the last observation for every category.
The node requires one predecessor - a Database Table node. A series of pie diagrams is created from the data in this table. To set the variables displayed in the diagram, from the node's context menu choose .
In the properties window, set the variable that determines the categories in the Categories field, and the values in the Values field. A series of pie charts is created, one for each observation position within a category: the n-th chart shows the share of each category's n-th value in the sum of the n-th values over all categories.
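In the n-th chart, the share of category c is its n-th value divided by the sum of the n-th values of all categories. The minimal Python sketch below computes these shares. It assumes, hypothetically, that the observations of all categories are aligned by position, and it truncates to the shortest category otherwise.

```python
def pie_shares(values_by_category):
    """Return one dict per pie chart: the share of each category's n-th value."""
    # Assumption: categories are aligned by position; extra values are ignored.
    n_charts = min(len(v) for v in values_by_category.values())
    charts = []
    for n in range(n_charts):
        total = sum(v[n] for v in values_by_category.values())
        charts.append({
            category: (v[n] / total if total else 0.0)  # avoid division by zero
            for category, v in values_by_category.items()
        })
    return charts


print(pie_shares({"A": [1, 3], "B": [3, 1]}))
```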
The node requires one predecessor - a Database Table node. A line diagram is created from the data in this table. To set the variables displayed in the diagram, from the node's context menu choose .
In the properties window, in the X Axis and Y Axis fields, give the names of the variables containing the coordinates of the points to be connected by lines in the diagram. If a variable is given in the Series field, observations are divided according to the values of this variable. If no series variable is given, selecting new variables in the Additional values field displays further lines connecting the new points; for these points, the Y coordinate is taken from the additional variable.
The node requires one predecessor - a Database Table node. A point diagram is created from the data in this table. To set the variables displayed in the diagram, from the node's context menu choose .
In the properties window, in the X Axis and Y Axis fields, give the names of the variables containing the coordinates of the points to be displayed in the diagram. If a variable is given in the Series field, observations are divided (displayed in different colours and shapes) according to the values of this variable. If no series variable is given, selecting new variables in the Additional values field displays additional points; for these points, the Y coordinate is taken from the additional variable.
The node requires one predecessor - a Database Table node. A bar diagram is created from the data in this table. To set the variables displayed in the diagram, from the node's context menu choose .
This node generates a diagram analogous to the column diagram, but with horizontal bars instead of columns.
The node requires one predecessor - a Database Table node. An area diagram is created from the data in this table. To set the variables displayed in the diagram, from the node's context menu choose .
The area diagram is analogous to the column diagram, except that points of the same colour are joined into a single area.
Use this node to create a table containing selected columns of the source table. The output table contains all rows of the original table and only the selected columns. To select the columns, from the node's context menu choose . The node requires one preceding node - a Database Table.
Check the columns that should appear in the output table in the Use Column column. After the columns are selected, the output table is created automatically. Its name is given in the Table Name field; the name proposed by the system can be changed. The Select and Filter fields can be used to select columns whose names contain a specified string: enter the string in one of the fields and press Enter.
This node combines data from two tables. One row of the result table contains data from one row of each source table, and the result set contains all columns from both source tables. The node requires two preceding Database Table nodes.
Source tables can be joined in two ways:
This node randomly selects a certain number of rows from the source table. The node requires one preceding node - a Database Table.
To set the node's parameters, from its context menu choose . The Size of sample parameter specifies how many records are drawn. The Grain parameter sets the seed (grain) of the random number generator.
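The role of the Grain parameter can be illustrated with a minimal Python sketch: seeding the generator makes the draw reproducible. The sketch assumes sampling without replacement and uses Python's random module as a stand-in; the node's own generator and sampling scheme are not documented here, and draw_sample is a hypothetical name.

```python
import random


def draw_sample(rows, size, grain):
    """Draw `size` distinct rows at random; the same grain gives the same sample."""
    rng = random.Random(grain)   # independent generator seeded with grain
    size = min(size, len(rows))  # assumption: never ask for more rows than exist
    return rng.sample(rows, size)  # sampling without replacement


rows = list(range(1, 101))
print(draw_sample(rows, 5, grain=42))
```

Running the same call twice with the same grain returns the same rows, which is what makes a sampled workflow repeatable.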
This node lets you create and generate a result table using your own SQL query. The node requires one preceding node - a Database Table.
To open the SQL editor, from the node's context menu select . In the Editor window, enter an SQL query based on any table (or tables) in the current database alias.
This node divides a table into several tables with fewer rows. Each row of the source table is placed in one of the resulting tables. The node requires one preceding node - a Database Table.
To define how the data should be split, from the node's context menu choose . The following splits are possible:
Additional parameters of the node:
This node combines data from two source tables into a single result set. The node requires two preceding Database Table nodes.
The source tables must have the same structure, which is also the structure of the target table.
This node lets you define your own condition used to select data for the output table. The node requires one preceding node - a Database Table.
To enter the condition, from the context menu choose . The name of the database table with the output data is given in the Properties window, accessible from the node's context menu.
This node changes column types. The node requires one preceding node - a Database Table.
To change column types, from the node's context menu select . For example, if a column of type String contains integers, it can be converted to type Integer (and vice versa). In the resulting table, double-click the value in the New Type column to change it.
This node replaces missing values. The node should have one preceding node - a Database Table.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
To set how missing values should be replaced, select the main node on the left side of the edit window; the relevant parameters appear on the right side. These parameters apply to all columns of the given type. To set parameters for individual fields, expand the customAttributes node on the left side of the window and choose the relevant field; its parameters appear on the right side of the window.
The following options for replacing missing values are available:
Standardization performs a linear transformation of a variable so that its mean is 0 and its standard deviation is 1. The node should have one preceding node - a Database Table.
The transformation is performed for numeric variables. To select the fields to be transformed, from the node's context menu choose Edit attribute usage. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
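Standardization maps each value x to z = (x - mean) / standard deviation. A minimal Python sketch follows. It uses the population standard deviation and returns zeros for a constant variable; both choices are assumptions, since the text does not say which estimator the node uses or how it treats constant columns.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values, mu)  # population standard deviation around mu
    if sigma == 0:
        return [0.0 for _ in values]  # assumption: a constant variable maps to 0
    return [(v - mu) / sigma for v in values]


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```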
This transformation is available for numeric variables only. It replaces the specified fraction of the lowest (p1) and highest (p2) values of the variable with the values of the quantiles of order p1 and 1-p2, respectively. The node should have one preceding node - a Database Table.
To edit the node's parameters, from its context menu select . The LowerPercent and UpperPercent parameters are the orders of the quantiles used in the transformation. The MinValuesCount parameter is the minimum number of distinct values a variable must have for the transformation to be performed.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
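The transformation described above is essentially winsorizing. The minimal Python sketch below replaces values below the p1-quantile and above the (1 - p2)-quantile with those quantiles, and skips variables with fewer than MinValuesCount distinct values. It assumes that p1 and p2 are given as fractions (0.05 rather than 5) and uses linear-interpolation quantiles; the node's quantile definition may differ, and the function names are hypothetical.

```python
import math


def quantile(sorted_values, p):
    """Quantile of order p by linear interpolation between order statistics."""
    position = p * (len(sorted_values) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    weight = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * weight


def clip_tails(values, lower_p, upper_p, min_values_count):
    """Replace the lowest lower_p and highest upper_p fractions with the boundary quantiles."""
    if len(set(values)) < min_values_count:
        return list(values)  # too few distinct values: leave the variable unchanged
    ordered = sorted(values)
    low = quantile(ordered, lower_p)
    high = quantile(ordered, 1 - upper_p)
    return [min(max(v, low), high) for v in values]


print(clip_tails(list(range(1, 12)), 0.1, 0.1, min_values_count=5))
```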
This transformation is available for numeric variables. It creates variables whose values are ranges of the transformed variable. The node should have one preceding node - a Database Table.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform means that the transformation is performed for the field.
To set the node's parameters, from its context menu choose . The categoryNumber parameter specifies the number of ranges into which the original variable is divided.
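A minimal Python sketch of dividing a numeric variable into categoryNumber ranges follows. It assumes equal-width ranges between the minimum and maximum of the variable; the node may use another rule (for example equal-frequency ranges), and to_ranges is a hypothetical name.

```python
def to_ranges(values, category_number):
    """Map each value to the (start, end) of one of category_number equal-width ranges."""
    low, high = min(values), max(values)
    width = (high - low) / category_number or 1.0  # constant variable: avoid division by zero
    result = []
    for v in values:
        k = min(int((v - low) / width), category_number - 1)  # the maximum goes to the last range
        result.append((low + k * width, low + (k + 1) * width))
    return result


print(to_ranges([0, 5, 10, 20], 4))
```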
This node transforms a categorical variable into n binary variables, where n is the number of distinct values of the variable. The node should have one preceding node - a Database Table.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
Missing values are left out of the transformation. To set parameters, choose from the node's context menu. If the variable takes more distinct values than specified by the Max Values Count parameter, the variable is not binarized.
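A minimal Python sketch of binarization (one-hot encoding) follows. It creates one 0/1 variable per distinct non-missing value, so a missing value (None here) yields 0 in every new variable, and it returns None when the variable has more distinct values than Max Values Count. The representation of missing values and the function name are assumptions.

```python
def binarize(values, max_values_count):
    """Return {category: 0/1 list}, or None if there are too many distinct values."""
    categories = sorted({v for v in values if v is not None})  # missing values are left out
    if len(categories) > max_values_count:
        return None  # the variable is not binarized
    return {c: [1 if v == c else 0 for v in values] for c in categories}


print(binarize(["a", "b", None, "a"], max_values_count=5))
```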
This node normalizes vectors. It can be applied if at least two numeric variables are selected for transformation. The node should have one preceding node - a Database Table.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
To set parameters, choose from the node's context menu. The values of the variables in one record are treated as the coordinates of a vector starting at the origin of the coordinate system. The values of these variables are divided by the length of the vector, calculated with the measure given by the Norm Type parameter. The following measures are available: Euclidean, Manhattan, Maximum, Minimum, zero.
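A minimal Python sketch of this normalization follows for three of the listed measures, read here as the Euclidean (L2), Manhattan (L1) and Maximum (L-infinity) norms. The text does not define the Minimum and zero options, so the sketch leaves them out; the function name and the handling of zero-length vectors are assumptions.

```python
def normalize_row(row, norm_type="euclidean"):
    """Divide every coordinate of a record by the length of the vector."""
    if norm_type == "euclidean":
        length = sum(x * x for x in row) ** 0.5
    elif norm_type == "manhattan":
        length = sum(abs(x) for x in row)
    elif norm_type == "maximum":
        length = max(abs(x) for x in row)
    else:
        raise ValueError(f"unsupported norm type: {norm_type}")
    if length == 0:
        return list(row)  # assumption: a zero vector is left unchanged
    return [x / length for x in row]


print(normalize_row([3, 4]))               # Euclidean length 5
print(normalize_row([1, 3], "manhattan"))  # Manhattan length 4
```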
This node performs principal component analysis and generates a smaller number of uncorrelated (orthogonal) transformed variables. The transformation is performed for numeric variables only. The node should have one preceding node - a Database Table.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
To set parameters, choose from the node's context menu. The Criterium parameter can take two values: kaiser and number. If kaiser is selected, the algorithm chooses the optimal number of output variables. If number is selected, the user specifies the number of output variables in the ComponentsNumber parameter. If ComponentsNumber equals 0, the algorithm generates the maximum number of principal components.
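The choice of the number of components can be sketched in Python as below. For the kaiser option the sketch applies the usual Kaiser rule (keep the components whose correlation-matrix eigenvalue exceeds 1); that the node uses exactly this rule is an assumption. For number, a ComponentsNumber of 0 gives the maximum, one component per input variable. For two variables with correlation r, the eigenvalues of the correlation matrix are 1 + |r| and 1 - |r|, which gives a small worked example. Both helper names are hypothetical.

```python
def correlation_eigenvalues_2d(r):
    """Eigenvalues of the correlation matrix [[1, r], [r, 1]] of two variables."""
    return [1 + abs(r), 1 - abs(r)]


def components_to_keep(eigenvalues, criterium="kaiser", components_number=0):
    """Number of principal components to generate."""
    if criterium == "kaiser":
        return sum(1 for ev in eigenvalues if ev > 1)  # Kaiser rule
    if criterium == "number":
        if components_number == 0:
            return len(eigenvalues)  # maximum number of components
        return min(components_number, len(eigenvalues))
    raise ValueError(f"unknown criterium: {criterium}")


print(components_to_keep(correlation_eigenvalues_2d(0.6)))  # eigenvalues 1.6 and 0.4 -> 1
```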
This node performs a linear transformation of a variable so that the resulting variable takes values from a specified range. The transformation is performed for numeric variables only. The node should have one preceding node - a Database Table.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
To set the node's parameters, choose from its context menu. The lower and upper limits of the values after the transformation are set by the Min Value and Max Value parameters, respectively.
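The linear transformation onto [Min Value, Max Value] is x' = Min Value + (x - min(x)) * (Max Value - Min Value) / (max(x) - min(x)). A minimal Python sketch follows. Mapping every value of a constant variable to Min Value is an assumption, as is the function name.

```python
def rescale(values, min_value=0.0, max_value=1.0):
    """Linearly map values onto [min_value, max_value]."""
    low, high = min(values), max(values)
    if high == low:
        return [min_value for _ in values]  # assumption: constant variable
    scale = (max_value - min_value) / (high - low)
    return [min_value + (v - low) * scale for v in values]


print(rescale([10, 15, 20], min_value=-1.0, max_value=1.0))
```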
This node computes the weight of evidence. The transformation is available for numerical and categorical variables. The node should have one preceding node - a Database Table.
To select the fields to be transformed, choose Edit attribute usage from the context menu. In the resulting table, double-click the field's usage value. Selecting transform or auto means that the transformation is performed for the field.
To set parameters, choose from the node's context menu. If the variable takes more values than specified by the Max Values Count parameter, the variable is not transformed. The Target Attribute parameter specifies the name of the dependent variable (which must be categorical). The PositiveTarget Value parameter is the target value of the dependent variable.
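For a category c, the weight of evidence is commonly defined as ln(share of positive cases in c / share of negative cases in c), where each share is relative to the total number of positive or negative cases. The minimal Python sketch below follows that convention for a categorical variable. Some tools use the inverse ratio, which flips the sign; numerical variables would first need to be grouped into ranges; and returning None for categories with a zero count is an assumption.

```python
import math
from collections import Counter


def weight_of_evidence(values, target, positive_value, max_values_count):
    """Return {category: WoE}, or None if the variable has too many values."""
    categories = set(values)
    if len(categories) > max_values_count:
        return None  # the variable is not transformed
    positives = Counter(v for v, t in zip(values, target) if t == positive_value)
    negatives = Counter(v for v, t in zip(values, target) if t != positive_value)
    total_pos, total_neg = sum(positives.values()), sum(negatives.values())
    woe = {}
    for c in categories:
        if positives[c] == 0 or negatives[c] == 0:
            woe[c] = None  # the logarithm of zero is undefined
            continue
        woe[c] = math.log((positives[c] / total_pos) / (negatives[c] / total_neg))
    return woe


x = ["a", "a", "a", "b", "b", "b"]
y = [1, 1, 0, 1, 0, 0]
print(weight_of_evidence(x, y, positive_value=1, max_values_count=10))
```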
This node selects variables that significantly explain the variability of a categorical target variable. It requires two predecessors: a Database Table and an Attribute Usage node.
Edit the node to access the scoreEvaluators object, which calculates statistics for the selected variables. The node performs selection for nominal and continuous variables; continuous variables are usually discretized first.
The following algorithms are available:
The algorithms use the following values from the contingency table: the number of observations for which variable X takes its k-th value and variable Y takes the value 0, and the number of observations for which variable X takes its k-th value and variable Y takes the value 1.
If the values 0 and 1 are unevenly distributed, these counts in the contingency table are replaced with modified values that depend on the number of cases for which variable X takes its i-th value.
If some values of variable X are represented by only a few cases, the corresponding rows of the contingency table are merged to obtain larger groups of cases.
The algorithms above share the following parameters:
This node calculates a logistic regression model as described in the documentation in section Logistic regression. The node requires the following preceding nodes: one or two Database Tables and, optionally, an Attribute Usage node.
If one preceding Database Table is used, its data form the training dataset. If two Database Tables are used, one of them is the validation dataset. To set the roles of the database tables, edit the values in the node's property window. If the preceding node is not an Attribute Usage node, then to set the roles of the variables in the model, edit the node and double-click the attribute usage node on the left side of the window. To change a variable's role, double-click its value in the usage column.
To set the node's parameters, choose Edit from the node's context menu and then select the algorithmSettings node in the tree on the left. The Positive target category value must be set. The results of the regression model calculation are available by selecting View Model from the node's context menu.
This node calculates a scorecard as described in the documentation in section Scoring card. The node requires two preceding nodes: Logistic Regression and Database Table.
The results of the scorecard calculation are available by selecting View model from the node's context menu.
This node sets the roles of variables in the model. The node requires one preceding node - a Database Table.
The role of a variable in the model is set in the usage field, and its statistical type in the attributeType field.
This node creates a classification tree model as described in the documentation in section Decision Trees. The node requires the following preceding nodes: one or two Database Tables and, optionally, an Attribute Usage node.
If one preceding Database Table is used, its data form the training dataset. If two Database Tables are used, one of them is the validation dataset. To set the roles of the database tables, edit the values in the node's property window. If the preceding node is not an Attribute Usage node, then to set the roles of the variables in the model, edit the node and double-click attributeUsageSet on the left side of the window. To change a variable's role, double-click its value in the usage column.
The resulting model can be viewed by selecting View Model from the node's context menu.
This node builds a Kohonen network as described in the documentation in section Kohonen Network. The node requires the following preceding nodes: one or two Database Tables and, optionally, an Attribute Usage node.
If one preceding Database Table is used, its data form the training dataset. If two Database Tables are used, one of them is the validation dataset. To set the roles of the database tables, edit the values in the node's property window. If the preceding node is not an Attribute Usage node, then to set the roles of the variables in the model, edit the node and double-click attributeUsageSet on the left side of the window. To change a variable's role, double-click its value in the usage column.
The resulting model can be viewed by selecting View Model from the node's context menu.
This node builds a linear regression model as described in the documentation in section Linear regression. The node requires the following preceding nodes: one Database Table and, optionally, an Attribute Usage node.
If the preceding node is not an Attribute Usage node, then to set the roles of the variables in the model, edit the node and double-click the attribute usage node on the left side of the window. To change a variable's role, double-click its value in the usage column.
The results of the regression model calculation can be viewed by selecting View Model from the node's context menu.
This node calculates a random forest for the given data as described in the documentation in section Smart Trees. It requires the following preceding nodes: Database Table and Attribute Usage. The results are available by selecting Display from the context menu.
If the preceding node is not an Attribute Usage node, then to set the roles of the variables in the model, edit the node and double-click the attribute usage node on the left side of the window. To change a variable's role, double-click its value in the usage column.
The resulting model can be viewed by selecting View Model from the node's context menu.
This node builds a regularized linear regression model as described in the documentation in section Regularized Linear Regression. It requires two preceding nodes: Database Table and Attribute Usage.
If the preceding node is not an Attribute Usage node, then to set the roles of the variables in the model, edit the node and double-click attributeUsageSet on the left side of the window. To change a variable's role, double-click its value in the usage column.
The results of the regression model calculation can be viewed by selecting View from the node's context menu.
This node builds a regularized logistic regression model. It requires a preceding Database Table node and, optionally, an Attribute Usage node.
If one preceding Database Table is used, its data form the training dataset. If two Database Tables are used, one of them is the validation dataset. To set the roles of the database tables, edit the values in the node's property window. If the preceding node is not an Attribute Usage node, then to set the roles of the variables in the model, edit the node and double-click the attribute usage node on the left side of the window. To change a variable's role, double-click its value in the usage column.
To set the node's parameters, choose Edit from the node's context menu and then select the algorithmSettings node in the tree on the left. The Positive target category value must be set. The results of the regression model calculation can be viewed by selecting View Model from the node's context menu.
This node builds a model for a classification task. It requires two preceding nodes: a Database Table and an Attribute Usage node.
The positive target value of the target variable must be set. To set it, from the node's context menu select and enter the value in the Positive Target Value field. For the model created by this node, it is not possible to perform scoring or to generate scoring code.
This node constructs a graph representing a social network based on the input data. It requires one preceding node - a Database Table.
Building a network is described in the documentation in the chapter Build a network. To set the parameters for building the network, from the node's context menu select . On the left side of the resulting window, click the first (main) object node; a list of parameters appears on the right side of the window.
The Network Analysis node defines and runs network algorithms. It requires two predecessors: a Build Network node and a Database Table with additional data used by the algorithms.
In this table, one of the columns must contain the vertex identifiers; this column is indicated in the Vertex_id property of the node. The following algorithms are available:
The results of all algorithms are written to a database table.
This node filters out selected nodes to obtain a subnetwork. It requires two preceding nodes: a Database Table and a Build Network node.
To set filter conditions, from the node's context menu choose . On the left side of the resulting window, select the main node (the first item); a list of parameters appears on the right side of the window. One of them is the Filter field. Enter a condition there using syntax similar to SQL. The condition must contain at least one field from the node's parent table and cannot contain the field that holds the node identifier. The node also requires indicating which field of the preceding table contains the node identifier: in the node's edit window, expand the attributeUsageSet and attributes nodes, click the field containing the node identifier, and on the right side of the window set Usage to NODE_ID. Network filtering is described in the documentation in the chapter Filter network.
This node visualizes a network. It requires one preceding node of type Filter Network or Network Analysis.
To manage the network visualization, open the Navigator window (Ctrl+7).
To visualize the results of network algorithms, the user has to add a new table with the results of the calculations. To do so, from the node's context menu select and on the left side of the window click attributes. Next, from the context menu of this object choose . In the resulting window, select the database alias, the table and the field containing the node identifier.
This node exports the results of a logistic regression model to an MS Excel file. It requires one preceding node - a Logistic Regression model. The results of the calculations are available in the node's context menu after selecting .
To define the name and path of the file, from the node's context menu choose .
This node generates scoring code based on a model. It requires one preceding node - a model node.
To set the scoring code language, from the node's context menu select . On the left side of the resulting window, click the first (main) node; a set of parameters appears on the right side of the edit window. The Code Language parameter specifies the language in which the scoring code is generated. To see the code, choose Show Scoring Code from the context menu.
This node scores the data in a database table. It requires two preceding nodes: a model and a Database Table.
To score data, you need to define how the scoring should be generated. First, connect the node to a Database Table node and a model node. The output database table is created automatically. To change the output table name, choose 'Properties' from the node's context menu; the resulting database table is given in the Target Data entry. To define the output data, choose from the node's context menu.
In the Direct Mapping part, the user can choose which fields of the scored dataset (Source Data) should appear in the output dataset (Target Data), if a selection is needed. If nothing is entered here, the resulting dataset contains all fields of the original database table. To add a field, click the 'Add element' button; the first field of the scored dataset appears. Clicking this field in the Source Data part opens a list of fields from which the appropriate one can be chosen. In the Target Data part, select the name of the field in the output data. To keep the same field name in the output table, use the same value in Source Data and Target Data.
In the New Columns part, the user can choose which types of scoring data to generate. Click the 'Add Element' button and then select the category of scoring data. The available options are described below. For classification models:
For approximation models:
For Kohonen Grouping models:
Next, the user defines the field in the output table that holds the scoring data. In 'Choose a list of arguments', select values other than the default ones. Depending on the type of model and the type of scoring, the following parameters should be set:
This node presents the results of approximation tasks. It requires two preceding nodes: an approximation model and a Database Table containing the data used to test the model.
The node requires the target variable in the test data to be indicated. To do so, from the node's context menu select . On the left side of the window, click the first (main) node; a list of parameters appears on the right side of the window. In the Target field, enter the name of the target variable in the test dataset.
To see the results of testing, from the node's context menu choose .
Use this node to generate reports on the prediction quality of classification models, based on data containing both the predicted probability of the target and the actual target values. It requires one preceding node - a Database Table.
The input table should contain variables that allow the prediction quality to be assessed. To set the roles of these variables, from the node's context menu select . Then select the first (main) node on the left side of the window; a set of parameters appears on the right side. Provide the Positive Target Value, and then indicate the variables that contain the following types of data:
The results are available in the context menu when you select Display.
This node compares the results of two approximation models. It requires two preceding nodes of type Test Results Approximation.
The results of the calculation are available in the context menu when you select .
This node tests a data mining classification model. It requires two preceding nodes: a classification model and a Database Table containing the test dataset.
The node requires a target variable and a positive target value to be set. To set them, from the node's context menu select . In the edit window, click the first (main) object node on the left side; a list of parameters appears on the right side of the window. Enter the value in the Positive Target Value field and select the Target variable. The results are available in the node's context menu after selecting .
This node compares the quality of classification models. It requires at least two preceding nodes of type Classification Test Results.
The node generates results in the form of charts and an MS Excel spreadsheet. To give the path and name of the file, select from the node's context menu. To view charts comparing the models, from the node's context menu select .