Usage

Chapter 30. Social Network Analysis Module

Network building

Data requirements

Each row in the data table used for building the network represents an edge in the network graph. Two integer-valued columns are required:

Source: numeric id of the predecessor node for the given edge

Target: numeric id of the successor node for the given edge

Optionally, a third column Weight may be used for constructing a network with weighted edges. This column should contain positive values only; edges with zero or negative weights will not be included in the network. Additionally, some network analysis algorithms require the weight column to be normalized to the interval [0, 1].
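
The data requirements above can be illustrated with a minimal Python sketch that pre-checks an edge list before it is loaded. The function names and the sample rows are hypothetical and are not part of the module; the sketch only mirrors the stated rules (edges with non-positive weights are excluded, and some algorithms expect weights in [0, 1]).

```python
from typing import List, Tuple

# One row of the edge table: (Source, Target, Weight)
Edge = Tuple[int, int, float]


def keep_positive_edges(edges: List[Edge]) -> List[Edge]:
    """Drop edges whose weight is zero or negative."""
    return [(s, t, w) for (s, t, w) in edges if w > 0]


def weights_in_unit_interval(edges: List[Edge]) -> bool:
    """Check whether every weight already lies in [0, 1]."""
    return all(0.0 <= w <= 1.0 for (_, _, w) in edges)


# Hypothetical sample data
edges = [(1, 2, 0.5), (2, 3, 0.0), (3, 1, -1.0), (1, 3, 2.0)]
kept = keep_positive_edges(edges)
needs_normalization = not weights_in_unit_interval(kept)
```

Here two of the four sample edges survive the filter, and the remaining weight of 2.0 indicates that normalization would still be needed for algorithms that expect the [0, 1] interval.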

Algorithm settings

The SNABuildSettings object requires a logical data object pointing to a database table which specifies the network's edges.

Table 30.1. SNABuildSettings settings

Option	Description	Possible Values	Default Value
Force undirected	If TRUE, an undirected graph (network) will be built	TRUE / FALSE	FALSE
Load sorted	If TRUE, the data table containing the network edges is assumed to be sorted by the column specified by the Source parameter, i.e. according to predecessor node id-s. This will reduce the memory footprint of the network building algorithm. An error will be raised if the input table is not sorted by the Source column.	TRUE / FALSE	FALSE
Normalize weights	If TRUE, the WEIGHT column will be subjected to the normalize data transformation with minValue set to 0 and maxValue set to 1.	TRUE / FALSE	TRUE
Source	The name of the column with id-s of predecessor nodes; this parameter must be set.	names of columns in the input table	
Target	The name of the column with id-s of successor nodes; this parameter must be set.	names of columns in the input table	
Weight	The name of the column with edge weights. Do not set this parameter to build an unweighted network.	names of columns in the input table	
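
The effect of the Load sorted and Normalize weights options can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, and it assumes that the normalize transformation is a plain min-max rescaling to [minValue, maxValue], which this section does not spell out.

```python
def is_sorted_by_source(edges):
    """True if (source, target, weight) rows are ordered by source id."""
    return all(a[0] <= b[0] for a, b in zip(edges, edges[1:]))


def min_max_normalize(weights, min_value=0.0, max_value=1.0):
    """Rescale weights linearly to [min_value, max_value] (assumed behaviour)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All weights equal: the target value is an arbitrary choice here.
        return [max_value for _ in weights]
    scale = (max_value - min_value) / (hi - lo)
    return [min_value + (w - lo) * scale for w in weights]


# Hypothetical sample data
rows = [(1, 2, 2.0), (1, 3, 4.0), (2, 3, 6.0)]
sorted_ok = is_sorted_by_source(rows)
normalized = min_max_normalize([w for (_, _, w) in rows])
```

A check like `is_sorted_by_source` can be run before enabling Load sorted, since the option raises an error on unsorted input.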

Network analysis

Table 30.2. SNASettings settings

Option	Description	Possible values	Default value
Merge results	If TRUE, the result columns from all algorithms will be merged into one output table. Otherwise some algorithms will create their own result tables.	TRUE / FALSE	FALSE
Thread count	Number of threads to use for computation	positive integers	1
Vertex ID	Name of the column with ids of network nodes	names of columns in the input table

Common settings

Some settings are common to multiple SNA algorithms.

Table 30.3. SNA Algorithms - common settings

Option	Description	Possible values	Default value
Community	Name of the column which specifies the community of the predecessor node	names of columns in the input table	
Null handling	How to treat missing values	None, Zero, Avg, Min, Max	None
Prefix	String used to prefix names of columns created by the given algorithm in the output table.	string	
Use weights	Whether to use weights in the computations performed by the algorithm	TRUE / FALSE	TRUE

Louvain Community Finder

The LouvainCommunityFinder creates a new LOUVAIN_COMMUNITY column in the output data table.

Table 30.4. LouvainCommunityFinder settings

Option	Description	Possible values	Default value
Initial communities	Name of the column with initial communities	names of columns in the input data table
Max iterations	Maximum number of iterations the algorithm will go through. -1 denotes no limit.	integer	-1
Max community size	Maximum intermediate community size to indicate as the best one in an additional variable. If set to 0, intermediate communities will not be indicated. The best communities will be stored in the output data table in the column `BEST_LOUVAIN_COMMUNITY`.	non-negative integers	0
Modularity normalizer		None, Sum, Sqrt, Min	None
Precision	Determines the minimal change of modularity between iterations. A modularity change below this value terminates the algorithm even if fewer than Max iterations iterations have been carried out.	floating point numbers	1.0e-5
Randomized	Determines whether to randomize the algorithm.	TRUE / FALSE	FALSE
Resolution	Modularity resolution	floating point numbers from the interval [0.0, 1.0]	1.0
Save intermediate communities	If TRUE, an additional column for each intermediate community (obtained in each iteration) will be created in the output data table. These columns will be named `LOUVAIN_COMMUNITY_N`, where `N` is the iteration number.	TRUE / FALSE	FALSE

Initial communities

Each iteration of the algorithm takes the partition into communities produced by the previous iteration (or the initial partition) and transforms it into a new one. By default, the initial partition is obtained by assigning each vertex to its own community.

To use a different initial partition, set Initial communities to the name of a column in the input data table which contains the initial partition.

Max community size

Each iteration of the community finder algorithms divides the network into a number of intermediate communities. When Max community size is set to a value greater than 0, an additional variable will be created in the output table, indicating an intermediate community, the size of which is nearest to the value of Max community size.
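
One possible reading of this rule is sketched below in Python. The function and its input layout (one vertex-to-community mapping per iteration) are hypothetical; the sketch assumes that "nearest" means the smallest absolute difference in size and that ties go to the earlier iteration, neither of which is confirmed by this section.

```python
from collections import Counter


def best_intermediate_community(assignments, vertex, max_size):
    """Return (iteration, community) whose size is nearest to max_size.

    assignments: list of dicts, one per iteration, mapping vertex -> community id.
    """
    best, best_gap = None, None
    for iteration, assignment in enumerate(assignments, start=1):
        sizes = Counter(assignment.values())
        community = assignment[vertex]
        gap = abs(sizes[community] - max_size)
        if best_gap is None or gap < best_gap:
            best, best_gap = (iteration, community), gap
    return best


# Hypothetical history of three iterations on four vertices
history = [
    {1: "a", 2: "b", 3: "c", 4: "d"},
    {1: "a", 2: "a", 3: "c", 4: "c"},
    {1: "a", 2: "a", 3: "a", 4: "a"},
]
choice = best_intermediate_community(history, vertex=1, max_size=2)
```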

Resolution

The lower the resolution, the smaller the resulting communities and the lower the resulting modularity.

Size Constrained Community Finder

The Size Constrained Community Finder algorithm creates a new column SC_COMMUNITY in the output data table.

Table 30.5. SizeConstrainedCommunityFinder settings

Option	Description	Possible values	Default value
Max iterations	Maximum number of iterations the algorithm will go through.	positive integer numbers	10
Max community size	Upper bound for the size of identified communities	positive integer numbers	40
Randomized	Determines whether to use randomization	TRUE / FALSE	FALSE

Aggregator

Table 30.6. Aggregator settings

Option	Description	Possible values	Default value
Max. neighbourhood size	Neighbourhood radius, 0 means no maximum neighbourhood size is set.	non-negative integers	2
Nominal aggregates	Used to specify which nominal variables will be aggregated and which aggregates to calculate for each value.
Numerical aggregates	Used to specify which numerical variables will be aggregated and which aggregates to calculate for each value.

Max neighbourhood size

For very large networks, a neighbourhood size greater than 3 may lead to impractically long computation times.

Nominal aggregates, Numerical aggregates

If Max neighbourhood size is equal to 1, all aggregates are computed in a weighted and an unweighted variant. If Max neighbourhood size is greater than 1, the statistics for a given node are calculated in the following weighted variants:

SUM - the weights are equal to the sum of weights of all paths to the final node,
AVG - the weights are equal to the average of weights of all paths to the final node,
MAX - the weights are equal to the maximum of weights of all paths to the final node,
UNIQUE - a constant weight of 1 is used for each node in the neighbourhood,
MULTI - the number of paths to the final node is used as the weight.

Aggregation results are written to columns with names formatted in the following way: _A__COLNAME__AGGNAME__(WMODE), where COLNAME is the name of the aggregated column, AGGNAME is the type of aggregation, and WMODE is one of the 5 available weighting modes. For instance: _A__INCOME__VARIANCE__(AVG).
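
When post-processing the output table, it can help to build these column names programmatically. The helper below is a hypothetical convenience function, not part of the module; it simply reproduces the naming pattern stated above.

```python
# The five weighting modes listed in this section
WEIGHT_MODES = ("SUM", "AVG", "MAX", "UNIQUE", "MULTI")


def aggregate_column_name(colname: str, aggname: str, wmode: str) -> str:
    """Build an output column name following _A__COLNAME__AGGNAME__(WMODE)."""
    if wmode not in WEIGHT_MODES:
        raise ValueError(f"unknown weighting mode: {wmode}")
    return f"_A__{colname}__{aggname}__({wmode})"


name = aggregate_column_name("INCOME", "VARIANCE", "AVG")
```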

Community Aggregator

Community aggregator has the same settings as Aggregator with the exception of Max neighbourhood size.

Role Finder

Table 30.7. RoleFinder settings

Option	Description	Possible values	Default value
Min community size	Minimal size of community for which role finding should be performed. 0 means no restriction.	Non-negative integers	0
Leader threshold	Z-score cutoff level above which the role of leader is assigned to a node. Lower values lead to more leader nodes.	float	1.25
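
The effect of Leader threshold can be illustrated with a small Python sketch. It assumes that the z-score is computed from a per-node score within one community (here, a hypothetical within-community degree); this section does not state which quantity the module actually uses, so the sketch only shows how a cutoff of this kind behaves.

```python
from statistics import mean, pstdev


def leaders(score_by_node, threshold=1.25):
    """Return nodes whose z-score within the community exceeds the threshold."""
    values = list(score_by_node.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()  # all scores equal: no node stands out
    return {v for v, s in score_by_node.items() if (s - mu) / sigma > threshold}


# Hypothetical within-community degrees for a five-node community
degrees = {1: 10, 2: 1, 3: 1, 4: 1, 5: 1}
default_leaders = leaders(degrees)
```

Lowering the threshold admits more nodes as leaders, which matches the description of the option.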

Triads

Table 30.8. Triads settings

Option	Description	Possible values	Default value
Flags	Name of table column with flag indicators. If not selected, triad statistics are not split with respect to flags	Column names in input data table

Flags

There are three types of triads: FULL, PARTIAL_1 and PARTIAL_2. If the Flags option is empty, three columns corresponding to the different types of triads will be created. Each of these columns will contain the count of triads of the given type to which a given vertex belongs.

If the Flags option is set, for each type of triad six columns are created instead, each corresponding to different distribution of flags on the vertices in the triad.

0_0 - no flags on any vertex
0_1 - no flag on the checked vertex, exactly one of the other vertices is flagged
0_2 - no flag on the checked vertex, flags on each of the other vertices of the triad
1_0 - flag only on the checked vertex
1_1 - flag on the checked vertex and on exactly one of the other vertices
1_2 - flags on all vertices in the triad

Each of these columns contains the count of triads of the given subtype to which a given (i.e. checked) vertex belongs.
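
The six subtype labels follow a simple pattern: the first digit is the flag of the checked vertex and the second digit is the number of flagged vertices among the other two. A hypothetical Python helper (not part of the module) that derives the label looks like this:

```python
def triad_flag_subtype(checked_flag, other_flags):
    """Return the subtype label, e.g. '0_1', for one triad and one checked vertex.

    checked_flag: flag of the checked vertex
    other_flags:  flags of the two remaining vertices of the triad
    """
    first, second = other_flags
    flagged_others = int(bool(first)) + int(bool(second))
    return f"{int(bool(checked_flag))}_{flagged_others}"


label = triad_flag_subtype(False, (True, False))
```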

Community Statistics

Table 30.9. Community Statistics Settings

Option	Description	Possible values	Default value
Communities	Name of column with community Id	Names of columns in the input table

Page Rank

Table 30.10. Page Rank Settings

Option	Description	Possible values	Default value
Dampening costs	Name of column with the values of the dampening cost coefficient (optional)	Names of columns in the input data table
Dampening factor	The value of dampening factor for all nodes. Used if the `Dampening factors` column name is not set.	Floating point values between 0 and 1.	0.85
Dampening factors	Name of column with the values of dampening factors.	Names of columns in the input data table
Epsilon	If the values of the pagerank coefficient change by less than `Epsilon` in a new iteration, the algorithm is stopped.
Initial pagerank	Name of column with initial values of the `pagerank` coefficient	Names of columns in the input data table
Max iterations	The maximum number of iterations to perform. -1 means no limit.	integers	-1
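
To show how Dampening factor, Epsilon and Max iterations interact, here is a minimal Python sketch of the textbook PageRank power iteration. It is not the module's implementation: the function signature, the uniform starting vector, the handling of nodes without outgoing edges and the epsilon value of 1e-6 are all assumptions made for the example.

```python
def pagerank(out_edges, damping=0.85, epsilon=1e-6, max_iterations=-1):
    """Textbook PageRank on an unweighted graph given as node -> list of targets."""
    nodes = sorted(set(out_edges) | {t for ts in out_edges.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    iteration = 0
    # max_iterations == -1 means no limit, as in the settings table
    while max_iterations < 0 or iteration < max_iterations:
        # Rank held by nodes without outgoing edges is spread uniformly
        dangling = sum(rank[v] for v in nodes if not out_edges.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_edges.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        change = max(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        iteration += 1
        if change < epsilon:
            break  # converged: values changed by less than epsilon
    return rank


# Hypothetical three-node graph in which nodes 1 and 2 both point to node 3
ranks = pagerank({1: [3], 2: [3], 3: []})
```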

HITS (Hubs and Authorities)

Table 30.11. HITS settings

Option	Description	Possible values	Default value
Epsilon	If the values of hub and authority coefficients change by less than `Epsilon` in a new iteration, the algorithm is stopped.	Positive floating point numbers	1.0E-6
Initial authorities	Name of column with initial values of the `authorities` coefficient	Column names in the input data table
Initial hubs	Name of column with initial values of the `hubs` coefficient	Column names in the input data table
Max iterations	The maximum number of iterations to perform. -1 means no limit.	Integer

SPA (Spreading Activation)

Table 30.12. Spreading Activation Settings

Option	Description	Possible values	Default value
Activation threshold	Minimal energy for node activation	Floating point numbers	0.0
Epsilon	Epsilon value for iteration stop condition	Floating point numbers	1.0E-6
Initial energy	Name of column in the input table with initial energy values for each vertex	Names of columns in the input table.
Max iterations	The maximum number of iterations to perform. -1 means no limit.	integers	-1
Multiple activations	Specifies whether each vertex should activate multiple times	TRUE / FALSE	FALSE
Spreading factor	Uniform energy spreading factor; specifies the degree to which the energy of the given vertex spreads to its neighbours. Used if the `Spreading factors` column name is not set.	Floating point numbers	0.85
Spreading factors	Name of column in the input table with the spreading factor value for each vertex	Names of columns in the input table.
Weight normalizer	Which method to use for the normalization of weights of outgoing edges when calculating energy spread from the given node.	None / Sum / Sqrt	Sum

Weight normalizer

This setting determines the algorithm used to compute energy spreading to neighbouring nodes.

Let $E$ denote the energy of the given node, $w$ the weight of the outgoing edge to a neighbouring node, and $W$ the sum of weights of all edges outgoing from the given node. Then the energy transferred to the neighbouring node is calculated in the following way, depending on the value of the Weight normalizer setting:

Sum: $E \cdot w / W$,
Sqrt: $E \cdot w / \sqrt{W}$,
None: $E \cdot w$.
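
The three modes can be sketched in Python as follows. The sketch assumes the most direct reading of the option names: the node's energy times the edge weight is divided by the sum of outgoing weights (Sum), by the square root of that sum (Sqrt), or not divided at all (None). The function name and signature are hypothetical, and any additional factor the module may apply, such as the spreading factor, is left out.

```python
import math


def transferred_energy(energy, weight, total_out_weight, normalizer="Sum"):
    """Energy passed along one outgoing edge under the assumed formulas."""
    if normalizer == "Sum":
        return energy * weight / total_out_weight
    if normalizer == "Sqrt":
        return energy * weight / math.sqrt(total_out_weight)
    if normalizer == "None":
        return energy * weight
    raise ValueError(f"unknown weight normalizer: {normalizer}")


# A node with energy 1.0 and total outgoing weight 4.0; this edge has weight 1.0
by_sum = transferred_energy(1.0, 1.0, 4.0, "Sum")
by_sqrt = transferred_energy(1.0, 1.0, 4.0, "Sqrt")
unnormalized = transferred_energy(1.0, 1.0, 4.0, "None")
```

Under these assumptions only the Sum mode guarantees that the energy sent over all outgoing edges adds up to the node's own energy.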

Modularity

The Modularity algorithm calculates the value of the network's modularity coefficient. The result is stored in the output table in a column called MODULARITY. This column will contain the same value for every vertex of the network.

Table 30.13. Modularity settings

Option	Description	Possible values	Default value
Community	Name of column in the input table with community id	Names of columns in the input table
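
For orientation, the following Python sketch computes the standard (Newman) modularity of an undirected weighted graph for a given community assignment. It is a textbook formulation offered as an assumption about what the coefficient measures; the module's exact definition (for example for directed networks) is not given in this section, and the function name and data layout are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity: sum over communities of L_c/m - (d_c/(2m))^2.

    edges:     list of (u, v, weight) rows, each undirected edge listed once
    community: dict mapping node -> community id
    """
    m = 0.0                       # total edge weight
    internal = defaultdict(float) # L_c: weight of edges inside community c
    degree = defaultdict(float)   # d_c: summed weighted degree of community c
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree[c] / (2.0 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge (hypothetical sample network)
edges = [(1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
         (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0), (3, 4, 1.0)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q_split = modularity(edges, split)
```

Placing every node in a single community gives a modularity of 0 under this definition, while the two-triangle split scores higher.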

Network filtering

Data requirements

The input data table's rows correspond to vertices in the input network. A column with node id is required.

Algorithm settings

Table 30.14. SNAFilterSettings settings

Option	Description	Possible values	Default value
Filter	Filtering expression similar to WHERE clause in SQL SELECT statement

Filter

The network is filtered according to the expression provided as the value of the Filter option. Only nodes which match the filter are included in the resulting network. The filtering expression may include attribute names from the input data table.

The filtering expression may include the following elements:

Attribute comparisons. It is possible to compare the values of two attributes of the same category.
Table 30.15.
Attribute category	Operators	Example
NUMERICAL	=, !=, <, <=, >, >=	age = 20; income > tax
NOMINAL, INTEGER	=, !=	type = 'risky'; community1 != community2
FLAG	=, !=, and, or	closed = true; closed or open
Node id checking. To check whether the node's id is in a specified subset use the id(id1, id2, ...) function. For example
```
id(1, 4, 7, 15)
```
will evaluate to TRUE for nodes with ids 1, 4, 7, 15 and FALSE for all other nodes.
Neighbourhood membership. The neigh(expression, radius) function can be used to check whether the node is in the neighbourhood of given radius of a node for which the provided expression evaluates to TRUE. For example
```
neigh(degree > 10, 1)
```
will evaluate to TRUE for all nodes for which the attribute degree has a value greater than 10 and for their immediate neighbours.
Negation. The not(expression) function can be used to negate an expression.
Set membership. The expression attribute in [value1, value2, ...] will evaluate to TRUE if the value of attribute is equal to one of value1, value2, ...
Combining expressions. The OR and AND operators can be used to combine simpler expressions. The AND operator has precedence over OR.
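
The semantics of neigh(expression, radius) described in the list above can be sketched in Python with a breadth-first search. This is only an illustration of the described behaviour, not the module's implementation: the function name, the adjacency-dictionary input and the use of an undirected neighbourhood are assumptions.

```python
from collections import deque


def neigh(adjacency, predicate, radius):
    """Return nodes within `radius` steps of any node satisfying `predicate`.

    adjacency: dict mapping node -> iterable of neighbouring nodes
    predicate: function node -> bool, standing in for the filter expression
    """
    seeds = {v for v in adjacency if predicate(v)}
    reached = set(seeds)
    frontier = deque((v, 0) for v in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for other in adjacency.get(node, ()):
            if other not in reached:
                reached.add(other)
                frontier.append((other, dist + 1))
    return reached


# Hypothetical path network 1 - 2 - 3 - 4 with a degree attribute per node
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
degree = {1: 11, 2: 2, 3: 2, 4: 1}
selected = neigh(adjacency, lambda v: degree[v] > 10, 1)
```

In this sample only node 1 satisfies the expression, so a radius of 1 selects node 1 and its immediate neighbour, node 2.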


Network visualization

Data requirements

Network visualization can combine node data from multiple tables. Each table must contain an attribute identifying the node.

Algorithm settings

Table 30.16.

Option	Description	Possible values	Default value
Max edges count	The maximum number of edges to include in the visualization	Positive integer numbers	100000
Timeout	How long (in seconds) the task should stay active after last usage. Some visualization operations (such as filtering) may complete faster when the visualization task is still active.	Positive integer numbers	1200
