Usage

Network building

Data requirements

Each row in the data table used for building the network represents an edge in the network graph. Two integer-valued columns are required:

  • Source: numeric id of the predecessor node for the given edge

  • Target: numeric id of the successor node for the given edge

Optionally, a third column Weight may be used for constructing a network with weighted edges. This column should contain positive values only; edges with zero or negative weights will not be included in the network. Additionally, some network analysis algorithms require the weight column to be normalized to the interval [0, 1].

Algorithm settings

The SNABuildSettings object requires a logical data object pointing to a database table which specifies the network's edges.

Table 30.1. SNABuildSettings settings

Option | Description | Possible values | Default value
Force undirected | If TRUE, an undirected graph (network) is built. | TRUE / FALSE | FALSE
Load sorted | If TRUE, the data table containing the network edges is assumed to be sorted by the column specified by the Source parameter, i.e. by predecessor node ids. This reduces the memory footprint of the network building algorithm. An error is raised if the input table is not sorted by the Source column. | TRUE / FALSE | FALSE
Normalize weights | If TRUE, the weight column is subjected to the normalize data transformation with minValue set to 0 and maxValue set to 1. | TRUE / FALSE | TRUE
Source | The name of the column with ids of predecessor nodes; this parameter must be set. | Names of columns in the input table | -
Target | The name of the column with ids of successor nodes; this parameter must be set. | Names of columns in the input table | -
Weight | The name of the column with edge weights. Leave this parameter unset to build an unweighted network. | Names of columns in the input table | -
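The steps behind these settings can be illustrated with a minimal Python sketch. It assumes the edge table arrives as a list of dicts, one per row; the function name build_edges is hypothetical. Dropping non-positive weights and rescaling weights to [0, 1] follow the text above, while the order of these two steps, the handling of equal weights, and the treatment of Force undirected (adding a reverse copy of every edge) are assumptions rather than documented behaviour.

```python
def build_edges(rows, source, target, weight=None, force_undirected=False,
                load_sorted=False, normalize_weights=True):
    """Turn edge-table rows (dicts) into (source, target, weight) tuples."""
    edges = []
    previous_source = None
    for row in rows:
        s, t = int(row[source]), int(row[target])
        # Load sorted: rows must arrive in non-decreasing Source order.
        if load_sorted and previous_source is not None and s < previous_source:
            raise ValueError("input table is not sorted by the Source column")
        previous_source = s
        w = 1.0  # unweighted network: every edge gets weight 1
        if weight is not None:
            w = float(row[weight])
            if w <= 0:  # zero or negative weights are not included
                continue
        edges.append((s, t, w))
    # Normalize weights: min-max rescaling to [0, 1] (assumed to run after filtering).
    if weight is not None and normalize_weights and edges:
        lo = min(w for _, _, w in edges)
        hi = max(w for _, _, w in edges)
        span = hi - lo
        # Assumption: if all weights are equal, they are mapped to 1.0.
        edges = [(s, t, (w - lo) / span if span else 1.0) for s, t, w in edges]
    # Force undirected: assumed here to mean adding the reverse of every edge.
    if force_undirected:
        edges += [(t, s, w) for s, t, w in edges if s != t]
    return edges
```

For a table with weights 2.0, 0.0 and 4.0, build_edges(rows, "Source", "Target", "W") keeps two edges and rescales their weights to 0.0 and 1.0.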

Network analysis

Table 30.2. SNASettings settings

Option | Description | Possible values | Default value
Merge results | If TRUE, the result columns from all algorithms are merged into one output table. Otherwise, some algorithms create their own result tables. | TRUE / FALSE | FALSE
Thread count | Number of threads to use for the computation | Positive integers | 1
Vertex ID | Name of the column with ids of network nodes | Names of columns in the input table | -

Common settings

Some settings are common to multiple SNA algorithms.

Table 30.3. SNA Algorithms - common settings

Option | Description | Possible values | Default value
Community | Name of the column which specifies the community of the predecessor node | Names of columns in the input table | -
Null handling | How to treat missing values | None, Zero, Avg, Min, Max | None
Prefix | String used to prefix the names of columns created by the given algorithm in the output table | Strings | -
Use weights | Whether to use weights in the computations performed by the algorithm | TRUE / FALSE | TRUE

Louvain Community Finder

The LouvainCommunityFinder creates a new LOUVAIN_COMMUNITY column in the output data table.

Table 30.4. LouvainCommunityFinder settings

Option | Description | Possible values | Default value
Initial communities | Name of the column with initial communities | Names of columns in the input data table | -
Max iterations | Maximum number of iterations the algorithm will go through; -1 denotes no limit. | Integers | -1
Max community size | Maximum size of an intermediate community to be indicated as the best one in an additional output column. If set to 0, intermediate communities are not indicated. The best communities are stored in the output data table in the column BEST_LOUVAIN_COMMUNITY. | Non-negative integers | 0
Modularity normalizer | - | None, Sum, Sqrt, Min | None
Precision | Minimal change of modularity between iterations. A modularity change below this value ends the algorithm even if fewer than Max iterations iterations have been carried out. | Floating point numbers | 1.0e-5
Randomized | Determines whether to randomize the algorithm | TRUE / FALSE | FALSE
Resolution | Modularity resolution | Floating point numbers from the interval [0.0, 1.0] | 1.0
Save intermediate communities | If TRUE, an additional column for each intermediate community (obtained in each iteration) is created in the output data table. These columns are named LOUVAIN_COMMUNITY_N, where N is the iteration number. | TRUE / FALSE | FALSE

Initial communities

Each iteration of the algorithm transforms the partition into communities obtained in the previous iteration (or the initial partition) into a new one. By default, the initial partition is obtained by assigning each vertex to its own community.

To use a different initial partition, set the Initial communities option to the name of a column in the input data table which contains the initial partition.

Max community size

Each iteration of the community finder algorithm divides the network into a number of intermediate communities. When Max community size is set to a value greater than 0, an additional column, BEST_LOUVAIN_COMMUNITY, is created in the output table; it indicates the intermediate community whose size is nearest to the value of Max community size.

Resolution

The lower the resolution, the smaller the resulting communities and the lower the resulting modularity.
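To see why the resolution has this effect, consider one common parameterization of modularity with a resolution parameter $t$, in which $t$ scales only the reward for edges inside communities. This is an assumption for illustration; the exact formula used by LouvainCommunityFinder is not given here. With $L_c$ the total weight of edges inside community $c$, $d_c$ the sum of the weighted degrees of its vertices and $m$ the total edge weight:

$$Q_t = \sum_c \left[ t \, \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$$

For $t = 1$ this is the standard modularity. Lowering $t$ shrinks the first term while the penalty term stays the same, so merging vertices into larger communities pays off less, and the attainable value of $Q_t$ drops as well.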

Size Constrained Community Finder

The Size Constrained Community Finder algorithm creates a new column SC_COMMUNITY in the output data table.

Table 30.5. SizeConstrainedCommunityFinder settings

Option | Description | Possible values | Default value
Max iterations | Maximum number of iterations the algorithm will go through | Positive integers | 10
Max community size | Upper bound for the size of identified communities | Positive integers | 40
Randomized | Determines whether to use randomization | TRUE / FALSE | FALSE

Aggregator

Table 30.6. Aggregator settings

Option | Description | Possible values | Default value
Max. neighbourhood size | Neighbourhood radius; 0 means that no maximum neighbourhood size is set. | Non-negative integers | 2
Nominal aggregates | Specifies which nominal variables are aggregated and which aggregates to calculate for each value. | - | -
Numerical aggregates | Specifies which numerical variables are aggregated and which aggregates to calculate for each value. | - | -

Max neighbourhood size

For very large networks, a neighbourhood size greater than 3 may lead to impractically long computation times.

Nominal aggregates, Numerical aggregates

If Max neighbourhood size is equal to 1, all aggregates are computed in both a weighted and an unweighted variant. If Max neighbourhood size is greater than 1, the statistics for a given node are calculated in the following weighted variants:

  • SUM - the weight is the sum of the weights of all paths to the final node,
  • AVG - the weight is the average of the weights of all paths to the final node,
  • MAX - the weight is the maximum of the weights of all paths to the final node,
  • UNIQUE - a constant weight of 1 is used for each node in the neighbourhood,
  • MULTI - the number of paths to the final node is used as the weight.

Aggregation results are written to columns with names of the form _A__COLNAME__AGGNAME__(WMODE), where COLNAME is the name of the aggregated column, AGGNAME is the type of aggregation, and WMODE is one of the five available weighting modes. For instance: _A__INCOME__VARIANCE__(AVG).
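The weighting modes and the column naming scheme can be illustrated with a Python sketch that enumerates the paths from a node to the nodes in its neighbourhood and collapses the path weights per final node. It assumes that paths follow edge direction, that they are simple (no vertex repeated), and that a path's weight is the product of its edge weights; none of this is stated in the text, and all function names are hypothetical.

```python
from collections import defaultdict

def path_weights(adjacency, start, max_radius):
    """Map each reachable node to the weights of all simple paths leading to it.

    adjacency: node -> {successor: edge_weight}. A path's weight is assumed to
    be the product of its edge weights. max_radius=0 means no limit.
    """
    paths = defaultdict(list)

    def extend(node, weight, visited, depth):
        if max_radius and depth >= max_radius:
            return
        for neighbour, w in adjacency.get(node, {}).items():
            if neighbour in visited:  # simple paths only (assumption)
                continue
            path_weight = weight * w
            paths[neighbour].append(path_weight)
            extend(neighbour, path_weight, visited | {neighbour}, depth + 1)

    extend(start, 1.0, {start}, 0)
    return paths

def node_weights(paths, mode):
    """Collapse per-path weights into one weight per node for a weighting mode."""
    collapse = {
        "SUM": sum,
        "AVG": lambda ws: sum(ws) / len(ws),
        "MAX": max,
        "UNIQUE": lambda ws: 1.0,
        "MULTI": lambda ws: float(len(ws)),
    }[mode]
    return {node: collapse(ws) for node, ws in paths.items()}

def weighted_mean(values, weights):
    """Example aggregate: weighted mean of a numerical attribute."""
    total = sum(weights.values())
    return sum(values[n] * w for n, w in weights.items()) / total

def aggregate_column_name(colname, aggname, wmode):
    """Build an output column name of the form _A__COLNAME__AGGNAME__(WMODE)."""
    return f"_A__{colname}__{aggname}__({wmode})"
```

With edges a→b (0.5), a→c (1.0) and c→b (0.4) and a radius of 2, node b is reached from a by two paths with weights 0.5 and 0.4, so its SUM weight is 0.9 and its MULTI weight is 2.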

Community Aggregator

The Community Aggregator has the same settings as the Aggregator, with the exception of Max neighbourhood size.

Role Finder

Table 30.7. RoleFinder settings

Option | Description | Possible values | Default value
Min community size | Minimal size of a community for which role finding should be performed; 0 means no restriction. | Non-negative integers | 0
Leader threshold | Z-score cutoff level above which the role of leader is assigned to a node. Lower values lead to more leader nodes. | Floating point numbers | 1.25
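The text does not say which quantity the z-score is computed from. The Python sketch below assumes, for illustration only, the within-community degree of each node (the number of edges to other members of its community), standardized within each community; find_leaders is a hypothetical name, and edges are treated as undirected.

```python
from collections import defaultdict
from statistics import mean, pstdev

def find_leaders(edges, community, leader_threshold=1.25, min_community_size=0):
    """Return nodes whose within-community degree z-score exceeds the threshold."""
    # Within-community degree (assumed basis of the z-score).
    degree = defaultdict(int)
    for s, t in edges:
        if community[s] == community[t]:
            degree[s] += 1
            degree[t] += 1
    members = defaultdict(list)
    for node, c in community.items():
        members[c].append(node)
    leaders = set()
    for nodes in members.values():
        # Min community size: 0 means no restriction.
        if min_community_size > 0 and len(nodes) < min_community_size:
            continue
        values = [degree[v] for v in nodes]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:  # all members equal: nobody stands out
            continue
        for v in nodes:
            if (degree[v] - mu) / sigma > leader_threshold:
                leaders.add(v)
    return leaders
```

In a five-node star, the centre has a z-score of 2.0 and the leaves -0.5, so with the default threshold of 1.25 only the centre would be marked as a leader.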

Triads

Table 30.8. Triads settings

Option | Description | Possible values | Default value
Flags | Name of the table column with flag indicators. If not selected, triad statistics are not split with respect to flags. | Column names in the input data table | -

Flags

There are three types of triads: FULL, PARTIAL_1 and PARTIAL_2. If the Flags option is empty, three columns corresponding to the different types of triads will be created. Each of these columns will contain the count of triads of the given type to which a given vertex belongs.

If the Flags option is set, six columns are created instead for each type of triad, each corresponding to a different distribution of flags on the vertices in the triad:

  • 0_0 - none of the vertices is flagged
  • 0_1 - no flag on the checked vertex, exactly one of the other vertices is flagged
  • 0_2 - no flag on the checked vertex, both other vertices are flagged
  • 1_0 - flag only on the checked vertex
  • 1_1 - flag on the checked vertex and on exactly one of the other vertices
  • 1_2 - flags on all vertices in the triad

Each of these columns contains the count of triads of the given subtype to which a given (i.e. checked) vertex belongs.
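The subtype naming scheme can be reproduced with a short Python sketch. Because the text does not define how FULL, PARTIAL_1 and PARTIAL_2 triads are detected, the triads are passed in ready-made together with their type; the function names and keys such as FULL_1_0 are illustrative and not necessarily the column names produced by the module.

```python
from collections import Counter, defaultdict

def flag_subtype(checked_flag, other_flags):
    """Return '<flag of checked vertex>_<number of flagged other vertices>'."""
    return f"{int(bool(checked_flag))}_{sum(bool(f) for f in other_flags)}"

def count_triad_subtypes(triads, flags):
    """triads: iterable of (triad_type, (v1, v2, v3)); flags: vertex -> bool.

    Returns vertex -> Counter keyed by '<triad_type>_<subtype>'.
    """
    counts = defaultdict(Counter)
    for triad_type, vertices in triads:
        for v in vertices:  # each vertex of the triad is "checked" in turn
            others = [flags[u] for u in vertices if u != v]
            counts[v][f"{triad_type}_{flag_subtype(flags[v], others)}"] += 1
    return counts
```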

Community Statistics

Table 30.9. Community Statistics Settings

Option | Description | Possible values | Default value
Communities | Name of the column with community ids | Names of columns in the input table | -

Page Rank

Table 30.10. Page Rank Settings

Option | Description | Possible values | Default value
Dampening costs | Name of the column with the values of the dampening cost coefficient (optional) | Names of columns in the input data table | -
Dampening factor | The value of the dampening factor for all nodes. Used if the Dampening factors column name is not set. | Floating point values between 0 and 1 | 0.85
Dampening factors | Name of the column with the values of dampening factors | Names of columns in the input data table | -
Epsilon | If the values of the pagerank coefficient change by less than Epsilon in a new iteration, the algorithm is stopped. | - | -
Initial pagerank | Name of the column with initial values of the pagerank coefficient | Names of columns in the input data table | -
Max iterations | The maximum number of iterations to perform; -1 means no limit. | Integers | -1
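A minimal power-iteration sketch of PageRank in Python, covering the Dampening factor, Dampening factors, Initial pagerank, Epsilon and Max iterations settings. It is not the module's implementation: it assumes that a per-node damping factor applies to the node whose score is being passed on and that nodes without outgoing edges spread their whole score uniformly, and it ignores Dampening costs, whose semantics are not described here.

```python
def pagerank(nodes, edges, damping=0.85, damping_factors=None, initial=None,
             epsilon=1e-6, max_iterations=-1):
    """Power-iteration PageRank. edges: iterable of (source, target)."""
    nodes = list(nodes)
    n = len(nodes)
    successors = {v: [] for v in nodes}
    for source, target in edges:
        successors[source].append(target)
    # Initial pagerank: given values, otherwise a uniform start.
    rank = {v: initial[v] for v in nodes} if initial else {v: 1.0 / n for v in nodes}
    iteration = 0
    while max_iterations < 0 or iteration < max_iterations:
        new_rank = dict.fromkeys(nodes, 0.0)
        teleport = 0.0  # score spread uniformly over all nodes
        for u in nodes:
            # Per-node factor if available, else the uniform Dampening factor.
            d = damping_factors.get(u, damping) if damping_factors else damping
            targets = successors[u]
            if not targets:  # dangling node: its whole score is spread uniformly
                teleport += rank[u]
                continue
            share = d * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
            teleport += (1.0 - d) * rank[u]
        for v in nodes:
            new_rank[v] += teleport / n
        change = max(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        iteration += 1
        if change < epsilon:  # Epsilon stop condition
            break
    return rank
```

Each node passes on exactly its current score (split between its successors and the uniform teleport share), so the scores keep summing to 1 when the initial values do.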

HITS (Hubs and Authorities)

Table 30.11. HITS settings

Option | Description | Possible values | Default value
Epsilon | If the values of the hub and authority coefficients change by less than Epsilon in a new iteration, the algorithm is stopped. | Positive floating point numbers | 1.0E-6
Initial authorities | Name of the column with initial values of the authorities coefficient | Column names in the input data table | -
Initial hubs | Name of the column with initial values of the hubs coefficient | Column names in the input data table | -
Max iterations | The maximum number of iterations to perform; -1 means no limit. | Integers | -
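The HITS iteration can be sketched in Python as follows. Each authority score is the sum of the hub scores of the nodes pointing to it, each hub score is the sum of the authority scores of the nodes it points to, and both vectors are rescaled after every step. The Euclidean normalization and the default initial value of 1.0 are assumptions; the module may normalize differently.

```python
import math

def _normalize(scores):
    """Scale scores to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in scores.values()))
    if norm > 0:
        for v in scores:
            scores[v] /= norm

def hits(nodes, edges, initial_hubs=None, initial_authorities=None,
         epsilon=1e-6, max_iterations=-1):
    """edges: iterable of (source, target). Returns (hubs, authorities)."""
    nodes = list(nodes)
    edges = list(edges)
    hubs = {v: initial_hubs[v] if initial_hubs else 1.0 for v in nodes}
    auths = {v: initial_authorities[v] if initial_authorities else 1.0 for v in nodes}
    iteration = 0
    while max_iterations < 0 or iteration < max_iterations:
        # Authority of v: sum of hub scores of nodes pointing to v.
        new_auths = dict.fromkeys(nodes, 0.0)
        for s, t in edges:
            new_auths[t] += hubs[s]
        # Hub of u: sum of authority scores of nodes u points to.
        new_hubs = dict.fromkeys(nodes, 0.0)
        for s, t in edges:
            new_hubs[s] += new_auths[t]
        _normalize(new_auths)
        _normalize(new_hubs)
        change = max(max(abs(new_hubs[v] - hubs[v]), abs(new_auths[v] - auths[v]))
                     for v in nodes)
        hubs, auths = new_hubs, new_auths
        iteration += 1
        if change < epsilon:  # Epsilon stop condition
            break
    return hubs, auths
```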

SPA (Spreading Activation)

Table 30.12. Spreading Activation Settings

Option | Description | Possible values | Default value
Activation threshold | Minimal energy for node activation | Floating point numbers | 0.0
Epsilon | Epsilon value for the iteration stop condition | Floating point numbers | 1.0E-6
Initial energy | Name of the column in the input table with initial energy values for each vertex | Names of columns in the input table | -
Max iterations | The maximum number of iterations to perform; -1 means no limit. | Integers | -1
Multiple activations | Specifies whether each vertex should activate multiple times | TRUE / FALSE | FALSE
Spreading factor | Uniform energy spreading factor; specifies the degree to which the energy of the given vertex spreads to its neighbours. Used if the Spreading factors column name is not set. | Floating point numbers | 0.85
Spreading factors | Name of the column in the input table with the spreading factor value for each vertex | Names of columns in the input table | -
Weight normalizer | Which method to use for normalizing the weights of outgoing edges when calculating the energy spread from the given node | None / Sum / Sqrt | Sum

Weight normalizer

This setting determines the algorithm used to compute energy spreading to neighbouring nodes.

Let $E$ denote the energy of the given node, $w_i$ the weight of the outgoing edge to neighbouring node $i$, and $W = \sum_j w_j$ the sum of the weights of all edges outgoing from the given node. Then the energy transferred to neighbouring node $i$ is calculated in the following way, depending on the value of the Weight normalizer setting:

  • Sum: $E \cdot w_i / W$,

  • Sqrt: $E \cdot w_i / \sqrt{W}$,

  • None: $E \cdot w_i$.
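The three normalizers can be written as a small Python function giving the energy one node passes to each of its successors. It assumes that Sum divides each edge weight by the total outgoing weight, Sqrt divides it by the square root of that total, and None leaves it unchanged, and additionally that the Spreading factor multiplies the result. The activation threshold, repeated activations and the stopping rule are not modelled, and energy_transfers is a hypothetical name.

```python
import math

def energy_transfers(energy, out_weights, normalizer="Sum", spreading_factor=0.85):
    """Energy sent from one node to each successor.

    out_weights: successor -> weight of the outgoing edge.
    Assumption: the spreading factor multiplies the normalized transfer.
    """
    total = sum(out_weights.values())
    if total <= 0:  # no usable outgoing edges
        return {}
    if normalizer == "Sum":
        scale = 1.0 / total
    elif normalizer == "Sqrt":
        scale = 1.0 / math.sqrt(total)
    elif normalizer == "None":
        scale = 1.0
    else:
        raise ValueError(f"unknown weight normalizer: {normalizer}")
    return {n: spreading_factor * energy * w * scale for n, w in out_weights.items()}
```

Only Sum guarantees that a node never passes on more than spreading_factor times its own energy; with Sqrt or None, a node with heavy outgoing edges can emit more energy than it holds.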

Modularity

The Modularity algorithm calculates the value of the network's modularity coefficient. The result is stored in the output table in a column called MODULARITY, which contains the same value for every vertex of the network.

Table 30.13. Modularity settings

Option | Description | Possible values | Default value
Community | Name of the column in the input table with community ids | Names of columns in the input table | -
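For reference, a minimal Python sketch of the standard (Newman) modularity of an undirected, weighted graph, $Q = \sum_c [L_c/m - (d_c/2m)^2]$, where $L_c$ is the total weight of edges inside community $c$, $d_c$ the sum of the weighted degrees of its vertices and $m$ the total edge weight. Whether the module uses exactly this definition (for example, how it treats edge direction or self-loops) is an assumption.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of an undirected weighted graph.

    edges: iterable of (u, v, weight); community: node -> community id.
    """
    m = 0.0
    internal = defaultdict(float)  # L_c: total weight of edges inside c
    degree = defaultdict(float)    # d_c: total weighted degree of c's nodes
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Because the coefficient describes the whole partition, the single returned number is what would be repeated in the MODULARITY column for every vertex.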

Network filtering

Data requirements

The input data table's rows correspond to vertices in the input network. A column with node id is required.

Algorithm settings

Table 30.14. SNAFilterSettings settings

Option | Description | Possible values | Default value
Filter | Filtering expression, similar to the WHERE clause of an SQL SELECT statement | - | -

Filter

The network is filtered according to the expression given as the value of the Filter option. Only nodes which match the filter are included in the resulting network. The filtering expression may refer to attribute names from the input data table.

The filtering expression may include the following elements:

  • Attribute comparisons.  An attribute can be compared with a constant or with another attribute of the same category.

    Table 30.15. 

    Attribute category | Operators | Examples
    NUMERICAL | =, !=, <, <=, >, >= | age = 20; income > tax
    NOMINAL, INTEGER | =, != | type = 'risky'; community1 != community2
    FLAG | =, !=, and, or | closed = true; closed or open

  • Node id checking.  To check whether the node's id is in a specified subset, use the id(id1, id2, ...) function. For example

    id(1, 4, 7, 15)

    will evaluate to TRUE for nodes with ids 1, 4, 7, 15 and FALSE for all other nodes.

  • Neighbourhood membership.  The neigh(expression, radius) function can be used to check whether the node lies within the given radius of some node for which the provided expression evaluates to TRUE. For example

    neigh(degree > 10, 1)

    will evaluate to TRUE for all nodes for which the attribute degree has a value greater than 10, and for their immediate neighbours.

  • Negation.  The not(expression) function can be used to negate an expression.

  • Set membership.  The expression attribute in [value1, value2, ...] will evaluate to TRUE if the value of attribute is equal to one of value1, value2, ...

  • Combining expressions.  The OR and AND operators can be used to combine simpler expressions. The AND operator has precedence over OR.
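The semantics of neigh() amount to a multi-source breadth-first search. The Python sketch below assumes that edge direction is ignored when measuring the radius; neigh and id_in are hypothetical names that mirror the filter functions, with Python predicates standing in for filter expressions.

```python
from collections import deque

def neigh(adjacency, matches, radius):
    """Nodes within `radius` hops of at least one node for which matches(node) is True.

    adjacency: node -> set of neighbouring nodes (edge direction ignored; assumption).
    """
    seeds = [v for v in adjacency if matches(v)]
    distance = {v: 0 for v in seeds}
    queue = deque(seeds)
    while queue:  # multi-source breadth-first search
        v = queue.popleft()
        if distance[v] == radius:
            continue
        for u in adjacency[v]:
            if u not in distance:
                distance[u] = distance[v] + 1
                queue.append(u)
    return set(distance)

def id_in(*ids):
    """Analogue of id(id1, id2, ...): a predicate true for the listed ids."""
    wanted = set(ids)
    return lambda node: node in wanted
```

Here neigh(degree > 10, 1) would correspond to neigh(adj, lambda v: degree[v] > 10, 1). The not() function maps to Python's not, and, as in the filter language, Python's and binds more tightly than or.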

Network visualization

Data requirements

Network visualization can combine node data from multiple tables. Each table must contain an attribute identifying the node.

Algorithm settings

Table 30.16. 

Option | Description | Possible values | Default value
Max edges count | The maximum number of edges to include in the visualization | Positive integers | 100000
Timeout | How long (in seconds) the task should stay active after its last use. Some visualization operations (such as filtering) may complete faster while the visualization task is still active. | Positive integers | 1200