AdvancedMiner provides a scripting language called Gython, which offers a range of tools for data processing. Sampling a table and splitting a table are two of the dedicated functions available in Gython scripts. The example below illustrates the usage and effect of both procedures, and also shows how to define a function in Gython.
Example 1.1. Data processing examples
    if not tableExists('german_credit'):
        raise "Table 'german_credit' does not exist. Please run german_credit.py script from data directory first"

    # FUNCTION DEFINITION Example in Gython
    # includes illustration of SQL parametrization - "tableName" parameter
    def rowCount(tableName):
        sql result:
            SELECT count(*) FROM $tableName
        return result[0][0]

    # SPLITTING DATA Example
    # splitting data into data_1 and data_2 sets
    in_data = 'german_credit'
    split_data_1 = 'german_credit_1'
    split_data_2 = 'german_credit_2'
    print "Rows count for whole dataset: ", rowCount(in_data)
    tableSplit(in_data, split = [7,3], seed = 1234, output=[split_data_1, split_data_2])
    print "Table after split 1: ", rowCount(split_data_1)
    print "Table after split 2: ", rowCount(split_data_2)

    # SAMPLING DATA Example
    in_data = 'german_credit'
    sample_size = 100
    sampled_data = 'german_credit_sample'
    print "Rows count for whole dataset: ", rowCount(in_data)
    sample(in_data, sampled_data, sample_size)
    print "Sampled data size: ", rowCount(sampled_data)
Output:
    Rows count for whole dataset: 1000
    Table after split 1: 709
    Table after split 2: 291
    Rows count for whole dataset: 1000
    Sampled data size: 100
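Note that the split sizes in the output (709 and 291) are close to, but not exactly, the 7:3 proportion of 1000 rows. One plausible explanation, which is an assumption and is not confirmed by this tutorial, is that each row is assigned to an output table at random with probability proportional to the given weights. The sketch below illustrates that idea, together with fixed-size sampling, in plain Python 3 (not Gython). The helpers split_rows and sample_rows are hypothetical and are not AdvancedMiner functions; they only mimic the observable behaviour of tableSplit and sample on an in-memory list.

```python
import random

def split_rows(rows, weights, seed):
    """Hypothetical helper: assign each row independently to one part,
    with probability proportional to that part's weight."""
    rng = random.Random(seed)          # fixed seed makes the split repeatable
    parts = [[] for _ in weights]
    part_indices = list(range(len(weights)))
    for row in rows:
        k = rng.choices(part_indices, weights=weights)[0]
        parts[k].append(row)
    return parts

def sample_rows(rows, size, seed):
    """Hypothetical helper: draw exactly `size` distinct rows at random."""
    rng = random.Random(seed)
    return rng.sample(rows, size)

# Stand-in for a 1000-row table such as german_credit
rows = list(range(1000))

part_1, part_2 = split_rows(rows, weights=[7, 3], seed=1234)
sampled = sample_rows(rows, size=100, seed=1234)

print("Rows after split:", len(part_1), len(part_2))
print("Sampled rows:", len(sampled))
```

Under this assumption the two parts always add up to the original row count, but their individual sizes only approximate the requested proportions, whereas the sample always has exactly the requested size.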
Working with scripts is supported by an advanced editor with code completion, on-the-fly error localization, syntax highlighting, etc.
Gython provides access to:
Besides the methods presented here, there are other data processing options that are not covered in this tutorial, for example Transformations (e.g. binarize, normalize, outliers, PCA, replace missings, standardize, WoE). For more information on using Transformations, see the Transformations chapter in the Technical Documentation.