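Each row in the data table used for building the network represents an edge in the network graph. Two integer-valued columns are required: one containing the ids of predecessor (source) nodes and one containing the ids of successor (target) nodes.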
Optionally, a third column Weight may be used for constructing a network with weighted edges. This column should contain positive values only; edges with zero or negative weights will not be included in the network. Additionally, some network analysis algorithms require the weight column to be normalized to the interval [0, 1].
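The weight rules above can be illustrated with a minimal Python sketch. It assumes edges are given as `(source, target, weight)` tuples and brings weights into [0, 1] by dividing by the largest weight; the function name `prepare_edges` and this scaling choice are illustrative assumptions, and the product's own normalize transformation may behave differently.

```python
def prepare_edges(rows):
    """Drop edges with zero or negative weight, then scale weights into (0, 1].

    `rows` is an iterable of (source, target, weight) tuples.
    Dividing by the maximum weight is an assumed normalization, not
    necessarily the one applied by the Normalize weights option.
    """
    kept = [(s, t, w) for s, t, w in rows if w > 0]
    if not kept:
        return []
    largest = max(w for _, _, w in kept)
    return [(s, t, w / largest) for s, t, w in kept]


edges = prepare_edges([(1, 2, 2.0), (2, 3, 4.0), (3, 1, 0.0), (1, 3, -1.0)])
```

Edges `(3, 1)` and `(1, 3)` are dropped because their weights are not positive.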
The SNABuildSettings object requires a logical data object pointing to a database table which specifies the network's vertices.
Table 30.1. SNABuildSettings settings
Option | Description | Possible Values | Default Value |
---|---|---|---|
Force undirected | If TRUE, an undirected graph (network) will be built | TRUE / FALSE | FALSE |
Load sorted | If TRUE, the data table containing the network edges is assumed to be sorted by the column specified by the Source parameter, i.e. according to predecessor node ids. This reduces the memory footprint of the network building algorithm. An error is raised if the input table is not sorted by the Source column. | TRUE / FALSE | FALSE |
Normalize weights | If TRUE, the WEIGHT column will be subjected to the normalize data transformation with minValue set to 0 and maxValue set to 1. | TRUE / FALSE | TRUE |
Source | The name of the column with ids of predecessor nodes; this parameter must be set. | names of columns in the input table | |
Target | The name of the column with ids of successor nodes; this parameter must be set. | names of columns in the input table | |
Weight | The name of the column with edge weights. Do not set this parameter to build an unweighted network. | names of columns in the input table | |
Table 30.2. SNASettings settings
Option | Description | Possible values | Default value |
---|---|---|---|
merge results | If TRUE, the result columns from all algorithms will be merged into one output table. Otherwise some algorithms will create their own result tables. | TRUE / FALSE | FALSE |
thread count | Number of threads to use for computation | positive integers | 1 |
Vertex ID | Name of the column with ids of network nodes | names of columns in the input table | |
Some settings are common to multiple SNA algorithms.
Table 30.3. SNA Algorithms - common settings
Option | Description | Possible values | Default value |
---|---|---|---|
Community | Name of the column which specifies the community of the predecessor node | names of columns in the input table | |
Null handling | How to treat missing values | None, Zero, Avg, Min, Max | None |
Prefix | String used to prefix names of columns created by the given algorithm in the output table. | string | |
Use weights | Whether to use weights in the computations performed by the algorithm | TRUE / FALSE | TRUE |
The LouvainCommunityFinder creates a new LOUVAIN_COMMUNITY column in the output data table.
Table 30.4. LouvainCommunityFinder settings
Option | Description | Possible values | Default value |
---|---|---|---|
Initial communities | Name of the column with initial communities | names of columns in the input data table | |
Max iterations | Maximum number of iterations the algorithm will go through. -1 denotes no limit. | integer | -1 |
Max community size | Maximum intermediate community size to indicate as the best one in an additional variable. If set to 0, intermediate communities will not be indicated. The best communities will be stored in the output data table in the column BEST_LOUVAIN_COMMUNITY. | non-negative integers | 0 |
Modularity normalizer | | None, Sum, Sqrt, Min | None |
Precision | Determines the minimal change of modularity between iterations. A modularity change below this value ends the algorithm even if fewer iterations than Max iterations have been carried out. | floating point numbers | 1.0e-5 |
Randomized | Determines whether to randomize the algorithm. | TRUE / FALSE | FALSE |
Resolution | Modularity resolution | floating point numbers from the interval [0.0, 1.0] | 1.0 |
Save intermediate communities | If TRUE, an additional column for each intermediate community (obtained in each iteration) will be created in the output data table. These columns will be named LOUVAIN_COMMUNITY_N, where N is the iteration number. | TRUE / FALSE | FALSE |
Each iteration of the algorithm transforms the partition into communities obtained in the previous iteration (or the initial partition) into a new partition. By default, the initial partition is obtained by assigning each vertex to its own community.
To use a different initial partition, set the Initial communities option to the name of a column in the input data table which contains the initial partition.
Each iteration of the community finder algorithm divides the network into a number of intermediate communities. When Max community size is set to a value greater than 0, an additional variable is created in the output table, indicating the intermediate community whose size is nearest to the value of Max community size.
Lower resolution values yield smaller communities and lower modularity.
The Size Constrained Community Finder algorithm creates a new column SC_COMMUNITY in the output data table.
Table 30.5. SizeConstrainedCommunityFinder settings
Option | Description | Possible values | Default value |
---|---|---|---|
Max iterations | Maximum number of iterations the algorithm will go through. | positive integer numbers | 10 |
Max community size | Upper bound for the size of identified communities | positive integer numbers | 40 |
Randomized | Determines whether to use randomization | TRUE / FALSE | FALSE |
Table 30.6. Aggregator settings
Option | Description | Possible values | Default value |
---|---|---|---|
Max. neighbourhood size | Neighbourhood radius, 0 means no maximum neighbourhood size is set. | non-negative integers | 2 |
Nominal aggregates | Used to specify which nominal variables will be aggregated and which aggregates to calculate for each value. | | |
Numerical aggregates | Used to specify which numerical variables will be aggregated and which aggregates to calculate for each value. | | |
For very large networks, a neighbourhood size greater than 3 may lead to impractical computation times.
If Max neighbourhood size is equal to 1, all aggregates are computed in a weighted and an unweighted variant. If Max neighbourhood size is greater than 1, the statistics for a given node are calculated in the following weighted variants:
Aggregation results are written to columns with names formatted in the following way: _A__COLNAME__AGGNAME__(WMODE), where COLNAME is the name of the aggregated column, AGGNAME is the type of aggregation, and WMODE is one of the 5 available weighting modes. For instance: _A__INCOME__VARIANCE__(AVG).
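The naming scheme for aggregate columns can be handled with a small Python sketch such as the one below. The helper names `aggregate_column_name` and `parse_aggregate_column` are hypothetical, and the parser assumes that aggregate names contain no underscores, which the description above does not guarantee.

```python
import re

# Pattern for names of the form _A__COLNAME__AGGNAME__(WMODE).
# Assumption: AGGNAME contains no underscores; COLNAME may contain them.
_AGGREGATE_NAME = re.compile(
    r"^_A__(?P<column>.+?)__(?P<aggregate>[^_]+)__\((?P<mode>[^)]+)\)$"
)


def aggregate_column_name(column, aggregate, mode):
    """Build an output column name from its three components."""
    return f"_A__{column}__{aggregate}__({mode})"


def parse_aggregate_column(name):
    """Split an output column name into (column, aggregate, mode), or return None."""
    match = _AGGREGATE_NAME.match(name)
    if match is None:
        return None
    return match.group("column"), match.group("aggregate"), match.group("mode")


name = aggregate_column_name("INCOME", "VARIANCE", "AVG")
parts = parse_aggregate_column(name)
```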
Community aggregator has the same settings as Aggregator with the exception of Max neighbourhood size.
Table 30.7. RoleFinder settings
Option | Description | Possible values | Default value |
---|---|---|---|
Min community size | Minimal size of community for which role finding should be performed. 0 means no restriction. | Non-negative integers | 0 |
Leader threshold | Z-score cutoff level above which the role of leader is assigned to a node. Lower values lead to more leader nodes. | float | 1.25 |
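The effect of the Leader threshold can be illustrated with a minimal Python sketch. The settings above do not state which per-node score is standardized, so the sketch takes arbitrary scores for the nodes of one community; the function name `find_leaders` and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def find_leaders(scores, leader_threshold=1.25):
    """Return the nodes whose z-score exceeds the threshold.

    `scores` maps node id -> score within a single community. Which score
    the product actually uses is not documented here; this is an assumption.
    """
    values = list(scores.values())
    if not values:
        return set()
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return set()  # all scores equal, so no node stands out
    return {node for node, value in scores.items()
            if (value - centre) / spread > leader_threshold}


leaders = find_leaders({"a": 1, "b": 1, "c": 1, "d": 1, "e": 10})
```

Lowering `leader_threshold` can only enlarge the returned set, which matches the note that lower values lead to more leader nodes.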
Table 30.8. Triads settings
Option | Description | Possible values | Default value |
---|---|---|---|
Flags | Name of the table column with flag indicators. If not selected, triad statistics are not split with respect to flags. | Column names in input data table | |
There are three types of triads: FULL, PARTIAL_1 and PARTIAL_2. If the Flags option is empty, three columns corresponding to the different types of triads will be created. Each of these columns will contain the count of triads of the given type to which a given vertex belongs.
If the Flags option is set, six columns are created for each type of triad instead, each corresponding to a different distribution of flags on the vertices in the triad.
Each of these columns contains the count of triads of the given subtype to which a given (i.e. checked) vertex belongs.
Table 30.10. Page Rank Settings
Option | Description | Possible values | Default value |
---|---|---|---|
Dampening costs | Name of column with the values of the dampening cost coefficient (optional) | Names of columns in the input data table | |
Dampening factor | The value of the dampening factor for all nodes. Used if the Dampening factors column name is not set. | Floating point values between 0 and 1. | 0.85 |
Dampening factors | Name of column with the values of dampening factors. | Names of columns in the input data table | |
Epsilon | If the values of the pagerank coefficient change by less than Epsilon in a new iteration, the algorithm is stopped. | | |
Initial pagerank | Name of column with initial values of the pagerank coefficient | Names of columns in the input data table | |
Max iterations | The maximum number of iterations to perform. -1 means no limit. | integers | -1 |
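How the Dampening factor, Epsilon and Max iterations settings interact can be sketched with a textbook PageRank power iteration in Python. This is not the product's implementation: the function name `pagerank`, the uniform starting values, the handling of nodes without outgoing edges, and the Epsilon value of 1e-6 (the table gives no default) are all assumptions.

```python
def pagerank(edges, dampening=0.85, epsilon=1e-6, max_iterations=-1):
    """Textbook PageRank on a list of (source, target) edges.

    Stops when no coefficient changes by more than `epsilon`, or after
    `max_iterations` iterations (-1 means no limit).
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    iteration = 0
    while max_iterations < 0 or iteration < max_iterations:
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not successors[n])
        base = (1.0 - dampening) / count + dampening * dangling / count
        updated = {node: base for node in nodes}
        for node in nodes:
            if successors[node]:
                share = dampening * rank[node] / len(successors[node])
                for target in successors[node]:
                    updated[target] += share
        change = max(abs(updated[n] - rank[n]) for n in nodes)
        rank = updated
        iteration += 1
        if change < epsilon:
            break
    return rank


ranks = pagerank([(1, 2), (2, 3), (3, 1)])
```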
Table 30.11. HITS settings
Option | Description | Possible values | Default value |
---|---|---|---|
Epsilon | If the values of the hub and authority coefficients change by less than Epsilon in a new iteration, the algorithm is stopped. | Positive floating point numbers | 1.0E-6 |
Initial authorities | Name of column with initial values of the authorities coefficient | Column names in the input data table | |
Initial hubs | Name of column with initial values of the hubs coefficient | Column names in the input data table | |
Max iterations | The maximum number of iterations to perform. -1 means no limit. | Integer | |
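The role of Epsilon and Max iterations in HITS can be illustrated with a standard hub and authority iteration in Python. The function name `hits`, the starting value of 1.0 for every node, and the Euclidean normalization in each iteration are assumptions, not details confirmed for this product.

```python
import math


def hits(edges, epsilon=1e-6, max_iterations=-1):
    """Standard HITS iteration on a list of (source, target) edges.

    Returns (hubs, authorities). Stops when no coefficient changes by more
    than `epsilon`, or after `max_iterations` iterations (-1 means no limit).
    """
    nodes = sorted({node for edge in edges for node in edge})
    hubs = {node: 1.0 for node in nodes}
    authorities = {node: 1.0 for node in nodes}
    iteration = 0
    while nodes and (max_iterations < 0 or iteration < max_iterations):
        new_auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_auth[target] += hubs[source]
        new_hubs = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_hubs[source] += new_auth[target]
        for scores in (new_auth, new_hubs):
            # Rescale to unit Euclidean length so values stay bounded.
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
        change = max(
            max(abs(new_auth[n] - authorities[n]), abs(new_hubs[n] - hubs[n]))
            for n in nodes
        )
        hubs, authorities = new_hubs, new_auth
        iteration += 1
        if change < epsilon:
            break
    return hubs, authorities


hub_scores, authority_scores = hits([(1, 3), (2, 3)])
```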
Table 30.12. Spreading Activation Settings
Option | Description | Possible values | Default value |
---|---|---|---|
Activation threshold | Minimal energy for node activation | Floating point numbers | 0.0 |
Epsilon | Epsilon value for iteration stop condition | Floating point numbers | 1.0E-6 |
Initial energy | Name of column in the input table with initial energy values for each vertex | Names of columns in the input table. | |
Max iterations | The maximum number of iterations to perform. -1 means no limit. | integers | -1 |
Multiple activations | Specifies whether each vertex should activate multiple times | TRUE / FALSE | FALSE |
Spreading factor | Uniform energy spreading factor; specifies the degree to which the energy of the given vertex spreads to its neighbours. Used if the Spreading factors column name is not set. | Floating point numbers | 0.85 |
Spreading factors | Name of column in the input table with the spreading factor value for each vertex | Names of columns in the input table. | |
Weight normalizer | Which method to use for the normalization of weights of outgoing edges when calculating energy spread from the given node. | None / Sum / Sqrt | Sum |
This setting determines the algorithm used to compute energy spreading to neighbouring nodes.
Let $E$ denote the energy of the given node, $w$ the weight of the outgoing edge to a neighbouring node, and $S$ the sum of weights of all edges outgoing from the given node. Then the energy transferred to the neighbouring node is calculated in the following way, depending on the value of the Weight normalizer setting:
Sum: $E \cdot w / S$,
Sqrt: $E \cdot w / \sqrt{S}$,
None: $E \cdot w$.
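The three Weight normalizer modes can be sketched in Python as follows. The sketch assumes that the transferred energy is the node's energy times the edge weight, divided by the sum of outgoing weights (Sum), by the square root of that sum (Sqrt), or left undivided (None). These formulas and the function name `transferred_energy` are assumptions based on the setting names, and any spreading factor is left out.

```python
import math


def transferred_energy(energy, weight, out_weight_sum, normalizer="Sum"):
    """Energy passed along one outgoing edge under an assumed normalizer.

    energy         -- energy of the spreading node
    weight         -- weight of the outgoing edge to the neighbour
    out_weight_sum -- sum of weights of all edges leaving the node
    """
    if normalizer == "Sum":
        return energy * weight / out_weight_sum
    if normalizer == "Sqrt":
        return energy * weight / math.sqrt(out_weight_sum)
    if normalizer == "None":
        return energy * weight
    raise ValueError(f"unknown weight normalizer: {normalizer!r}")


by_sum = transferred_energy(10.0, 2.0, 4.0, "Sum")
by_sqrt = transferred_energy(10.0, 2.0, 4.0, "Sqrt")
unnormalized = transferred_energy(10.0, 2.0, 4.0, "None")
```

Under these assumptions only the Sum mode keeps the total energy leaving a node equal to the energy it held.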
The Modularity algorithm calculates the value of the network's modularity coefficient. The result is stored in the output table in a column called MODULARITY. This column contains the same value for every vertex of the network.
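For orientation, the commonly used modularity coefficient of an undirected, weighted network can be computed with the Python sketch below. Whether the product uses exactly this definition (for example for directed networks or with a modularity normalizer) is not stated here, so the formula, the `resolution` parameter and the function name `modularity` are assumptions.

```python
def modularity(edges, community, resolution=1.0):
    """Newman modularity of a partition of an undirected weighted graph.

    edges     -- iterable of (u, v, weight), each undirected edge listed once
    community -- dict mapping node id -> community label
    """
    total_weight = 0.0
    degree = {}
    internal = {}  # weight of edges with both ends in the same community
    for u, v, w in edges:
        total_weight += w
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
        if community[u] == community[v]:
            label = community[u]
            internal[label] = internal.get(label, 0.0) + w
    if total_weight == 0:
        return 0.0
    community_degree = {}
    for node, value in degree.items():
        label = community[node]
        community_degree[label] = community_degree.get(label, 0.0) + value
    return sum(
        internal.get(label, 0.0) / total_weight
        - resolution * (value / (2.0 * total_weight)) ** 2
        for label, value in community_degree.items()
    )


# Two triangles joined by a single edge, split into their natural communities.
triangles = [(1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
             (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0), (3, 4, 1.0)]
labels = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(triangles, labels)
```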
The input data table's rows correspond to vertices in the input network. A column with node id is required.
Table 30.14. SNAFilterSettings settings
Option | Description | Possible values | Default value |
---|---|---|---|
Filter | Filtering expression similar to the WHERE clause in an SQL SELECT statement | | |
The network is filtered according to the expression provided as the value of the Filter option. Only nodes which match the filter are included in the resulting network. The filtering expression may include attribute names from the input data table.
The filtering expression may include the following elements:
Attribute comparisons. It is possible to compare the values of two attributes of the same category.
Table 30.15.
Attribute category | Operators | Example |
---|---|---|
NUMERICAL | =, !=, <, <=, >, >= | age = 20; income > tax |
NOMINAL, INTEGER | =, != | type = 'risky'; community1 != community2 |
FLAG | =, !=, and, or | closed = true; closed or open |
Node id checking. To check whether the node's id is in a specified subset use the id(id1, id2, ...) function. For example
id(1, 4, 7, 15)
will evaluate to TRUE for nodes with ids 1, 4, 7, 15 and FALSE for all other nodes.
Neighbourhood membership. The neigh(expression, radius) function can be used to check whether the node is in the neighbourhood of given radius of a node for which the provided expression evaluates to TRUE. For example
neigh(degree > 10, 1)
will evaluate to TRUE for all nodes for which the attribute degree has a value greater than 10 and for their immediate neighbours.
Negation. The not(expression) function can be used to negate an expression.
Set membership. The expression attribute in [value1, value2, ...] will evaluate to TRUE if the value of attribute is equal to one of value1, value2, ...
Combining expressions. The OR and AND operators can be used to combine simpler expressions. The AND operator has precedence over OR.
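The meaning of neigh(expression, radius) described above can be illustrated with a breadth-first search in Python. The sketch is not the filter engine itself: the function name `neigh`, the adjacency dictionary, and the choice to follow edges without regard to direction are assumptions.

```python
from collections import deque


def neigh(adjacency, matches, radius):
    """Return nodes within `radius` steps of any node for which `matches` is true.

    adjacency -- dict mapping node id -> iterable of neighbouring node ids
    matches   -- function taking a node id and returning True or False
    """
    seeds = [node for node in adjacency if matches(node)]
    selected = set(seeds)
    queue = deque((node, 0) for node in seeds)
    while queue:
        node, distance = queue.popleft()
        if distance == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in selected:
                selected.add(neighbour)
                queue.append((neighbour, distance + 1))
    return selected


# Path graph 1 - 2 - 3 - 4; select node 1 and everything within one step of it.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
within_one = neigh(path, lambda node: node == 1, 1)
```

A radius of 0 returns only the nodes that match the expression themselves.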
Network visualization can combine node data from multiple tables. Each table must contain an attribute identifying the node.
Table 30.16.
Option | Description | Possible values | Default value |
---|---|---|---|
Max edges count | The maximum number of edges to include in the visualization | Positive integer numbers | 100000 |
Timeout | How long (in seconds) the task should stay active after last usage. Some visualization operations (such as filtering) may complete faster when the visualization task is still active. | Positive integer numbers | 1200 |