Chapter 4. Data Access and Data Processing

Table of Contents

Database Access
Aliases
Database explorer
Using SQL statements
Importing and exporting data and other database operations
Importing MS Excel spreadsheets
Importing CSV files
Exporting data to MS Excel spreadsheets
Exporting data to CSV files
Getting column list for a database table
Deleting a database table
Checking for a database table existence.
Creating tables in Gython
Creating a table with manually specified data
Creating a table with data copied from a list
Creating a table with data obtained from an sql query
Using lists to define column names and formats
Importing data from external sources
The Trans procedure
Basic transformations
The where keyword
The keep in and drop in keywords
The keep out and drop out keywords
The format keyword
Indexes
Flow control
Appending tables
The rename keyword
Joining tables
Notes
Data transformation functions
Ranking data (the rank procedure)
Expansion of data (the interpolate procedure)
Sampling data (the sample command)
Splitting tables (the tableSplit procedure)
Transposing tables (the transpose procedure)
Comparing two tables (the tablesCompare procedure)
Predefined transformations for data Mining models
Introduction
Transformation Types
Usage
Examples
Important notes
References
-

Database Access

AdvancedMiner allows full control over databases via SQL or the built-in database explorer.

Aliases

A database alias is a definition of a database connection. At least one database is required for the user to be able to execute data transformations. Databases store input and/or output data. It is possible to execute a transformation that takes the input data from several different databases and writes the output data to table in yet another database. In order to do this a connection to the database must be provided. To set up a connection the database URL, user name and password as well as the database driver must be specified. These settings are kept in a database alias. After the alias is registered the user needs only to specify the alias name to access a particular database. It is also possible to create a default alias which is used for accessing a database when no other alias is specified. In the case of working with a single database it is convenient to register only the default alias, and than work without using explicitly any alias at all.

Aliases are created in script using the dbAlias keyword.

Syntax:

# default named alias
dbAlias([driver_name, database_URL, user_name, user_password])

# custom named alias
dbAlias([driver_name, database_URL, user_name, user_password], alias_name)

# setting alias as default
dbAlias([driver_name, database_URL, user_name, user_password], alias_name, default = 1)

# use coded password string (use "/AdvancedMiner/Server/password_coder" script)
dbAlias([driver_name, database_URL, user_name, user_password], passwordCoded)

# using table catalog and schema
dbAlias([driver_name, database_URL, user_name, user_password, table_catalog, table_schema])
            

Example:

dbAlias(['com.mysql.jdbc.Driver', 'jdbc:mysql://127.0.0.1/test', '', ''],'myDB')
            

Please note that aliases can be defined using the Services component which is described in the Components chapter. Since the alias function requires a password parameter it is not secure to use it in scripts. We strongly recommend to use the Services component to define aliases.

For more advanced work with database aliases, there is an AliasRegistry object. Please refer to MetaModel part of the javadoc documentation for the list of available functions.

Syntax:

dbAliasRegistry() # returns an AliasRegistry object
            

Examples:

# get the alias for a given name.
dbAliasRegistry().getAlias('myDB')

# set the default alias
dbAliasRegistry().setDefaultAlias('server')

# get the default alias name
dbAliasRegistry().getDefaultAliasName()
            

Database explorer

The user can access the data directly using the Database Explorer tool (the Alias node in the Services component). With this tool the user can

Using SQL statements

Using scripts it is possible to carry out automated queries. With the SQL procedure the user can create parametrized queries using SQL commands. See the GDBase Command Reference for a description of the SQL commands supported by the AdvancedMiner's built in database.

Syntax:

sql result, maxNumOfRows in alias use connectionNo:
sqlQuery
            

  • result - the variable (table) where the query result will be stored. Default: None.

  • maxNumOfRows - the maximum number of rows returned by the query. Optional. By default there is no limit on the number of returned rows.

  • alias - the database alias. Optional. If not specified the default connection is used

  • connectionNo - use the specified connection number (positive integer value). Optional.

Example 4.1. An SQL statement with a single connection

tableDelete("x")
tableDelete("y")
try:
    sql use 1:
        create table x (a VARCHAR(10))
    sql use 1:
        begin
    sql use 1:
        alter table x rename to y
    sql use 1:
        create table x (b VARCHAR(10))
    sql use 1:
        commit
finally:
    sql_close 1
print "finished..."
    

Output:

finished...
    

The result variable is not only a two dimensional list, but can also be used to obtain information about the columns. Using the command:

print result.names

the user can print the column names of the result. To obtain more information about the result use the command:

print result.header

Example 4.2. Sql: names and header:

tableName = 'sqlStatementTest'
tableDelete(tableName)

# create new table
sql: CREATE TABLE $tableName (id INTEGER, value INTEGER) 

# insert values
for i in range(5):  
    v = i * 10;
    sql:
        INSERT INTO $tableName (id, value) VALUES ($i, $v)
        
# select from...
sql result:
    SELECT * FROM $tableName
    
print result.names
print result.header
    

Output:

['id', 'value']
[{'columnTypeName': 'INTEGER', 'precision': None, 'displaySize': 2147483647, 'columnName': 'id', 'columnType': 4, 'isNullable': 0, 'scale': 0}, {'columnTypeName': 'INTEGER', 'precision': None, 'displaySize': 2147483647, 'columnName': 'value', 'columnType': 4, 'isNullable': 0, 'scale': 0}]
    

Finally it is possible to print with simple formating the whole result table with the command:

print result

Example 4.3. sql statement:

tableName = 'sqlStatementTest'
tableDelete(tableName)

# create a new table
sql: CREATE TABLE $tableName (id INTEGER, value INTEGER) 

# insert values
for i in range(5):  
    v = i * 10;
    sql:
        INSERT INTO $tableName (id, value) VALUES ($i, $v)
        
# select from...
sql result:
    SELECT * FROM $tableName

strFormat = '%-10s%-10s'
print strFormat % ('id','value')

for row in result:
    print strFormat % (row[0],row[1])

# set constraints involved with returned records, default 2000 resords
sql result, 2:
    SELECT * FROM $tableName
    
print 
print strFormat % ('id','value')
for row in result:
    print strFormat % (row[0],row[1])
    

Output:

id        value     
0         0         
1         10        
2         20        
3         30        
4         40        

id        value     
0         0         
1         10        
    

Note

In the examples above the special character '$' was used. It is used always if the user wants to refer to a global variable.

There is one more situation when '$' is necessary: if the user needs to use the '#' character. Because it is interpreted by script as a comment it must be preceeded by the '$' character.

Example 4.4. Using the '#' character

# an error will occur
sql x:
    select *, '#' || Class from $tab_name
print x

# the correct version
sql x, 5:
    select *, '$#' || Class from $tab_name
print x