Importing and exporting data and other database operations

Importing MS Excel spreadsheets

The xlsImport command copies the data from a Microsoft Excel spreadsheet (XLS file) to a table in a database.

Syntax:

xlsImport(file, [alias.]table[, hasHeader = 0 | 1][, sheetNumber = N])
  • file - the name of the XLS file

  • alias - an alias to the database

  • table - the name of thr table to be created

  • hasHeader - 1 if the first row in the CSV file contains column names, 0 otherwise. Default : 1

  • sheetNumber - the number of the sheet from which the data will be read. Default : 0 (the first sheet).

Importing CSV files

The csvImport command copies the data from a CSV file to a table in database.

Syntax:

csvImport(file, alias.table[, hasHeader = 0 | 1][, rowsToAnalyze = n]
[,textSeparator = c][, separator = c][, breakOnError = 0 | 1][, deleteTable = 0 | 1]
columnNames = [s1, s2, ... ], dateFormat = s | { n1 : s1, n2 : s2, ... },
[,decimalSeparator = c][, columnType = {n1 : t1, n2 : t2, ... }]
[,language = aa | ab | af | am | ... ][, quotedString = 0 | 1]
[,encoding = "UTF-8" | "UTF-16" | ... ][, skipErroneous = 0 | 1]
[,newLinesAllowed = 0 | 1][, lineConverter = converterObject.converterMethod] )
  • file - the name of the CSV file

  • alias - an alias to the database

  • table - the name of thr table to be created

  • hasHeader - 1 if the first row in the CSV file contains column names, 0 otherwise. Default : 1

  • rowsToAnalyze - the number of rows to analyze for determining column type. If all rows are analyzed, loading the data may take up to twice as long as in the case when only the first row is analyzed. Default : all rows.

  • textSeparator - the character that separates text entries. Default : '"' (double quote), textSeperator=None is equivalent to textSeparator="".

  • separator - the character that separates consecutive columns. Default : ',' (comma)

  • breakOnError - if 1, import stops when an error occurs. Default: 1.

  • deleteTable - if 1, any existing table with the same name will be overwritten. The value 0 throws an exception if the output table already exists. Default : 0

  • prefix - prefix for automatically generated column names. Default : "column".

  • columnNames - the names of all columns in the file.

  • dateFormat - format of a date entries in the imported table. A single format can be specified for all date entries in the table, or every column may have a different format - in this case n1, n2, ... is a 1-based column index. The syntax of date format can be found at http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html. By default, the following formats are tried: "yyyy-MM-dd", "yyyy.MM.dd", "yyyy/MM/dd", "dd-MM-yyyy", "dd.MM.yyyy", "dd/MM/yyyy". The string in the imported table must match the declared (or default, if declared was not specified) format perfectly to be interpreted as a date entry.

  • language - Define language to interprate date fields (for instance in date format like "ddMMMyyy"). Two-letter lower-case symbols are used (ISO 639). This parameter can be set for each column in form of string (i.e. "aa | ab | af | am") or as a dictionary (i.e. {1:'aa', 2:'af' }). For more information about language codes see the Language Codes appendix.

  • decimalSeparator - the character used as the decimal separator. Default: '.'

  • columnType - the type of a column - n1, n2, ... is a 1-based column index, t1, t2, ... can be: LONGVARCHAR, DOUBLE or DATE. By default column types are determined automatically.

  • quotedAsString - if 1, numerical values in quotes are interpreted as strings. Default: 0.

  • encoding - Input file encoding. Default : operating system default encoding. The complete list of encodings is available here: http://download.oracle.com/javase/1.3/docs/guide/intl/encoding.doc.html

  • skipErroneous - if 1, rows with errors (such as: invalid data type, invalid number of values in columns) are skipped. Default: 0.

  • newLinesAllowed - if 1, new line between column separators is allowed. Default: 0.

  • maxWarnings - the maximum number of displayed warnings. Default: 100.

  • lineConverter - enables line transformations (including multiplying or deleting) prior to import. Line converter is defined as a Java object converterObject belonging to the class converterClass. This class must define a converterMethod method, which does the actual transformations. The converterClass and converterObject must defined in the script prior to calling the csvImport command. See the examples below.

Example 4.5. Importing a CSV file

csvImport('c:\\some_data.csv', 'some_data', separator = ';', columnNames = ['first', 'second'], dateFormat = {1: 'MM-dd-yyyy'}, columnType = {1: DATE, 2: LONGVARCHAR})
            

Example 4.6. Importing a CSV file, but skipping the first 10 lines

# Example 1 - skiping the first 10 lines
class MyConverter:
def convert(self, line, linenum):
    # print '*'*5, linenum, repr(line)
    if linenum < 10:
        return '' # delete the line
    else:
        return line
t = MyConverter()
csvImport(r'C:\test.csv', 'cup664', lineConverter=t.convert)
            

Exporting data to MS Excel spreadsheets

The xlsExport copies the data from a database table to a Microsoft Excel (XLS) spreadsheet.

Syntax:

xlsExport([alias.]table, file[, hasHeader = 0 | 1][, sheetName = name]
[, xlsDateFormat=format_string][, use1904windowing = 0 | 1]) 
  • alias - an alias for the database

  • table - the name of the table

  • file - the name of the XLS file to be created, including path

  • hasHeader - 1 if the first row in the CSV file should contain column names, 0 otherwise. Default: 1

  • sheetName - the name of the sheet to create in the output table. If omitted or null, the name of the table will be used.

  • xlsDateFormat - a string specifying the format of dates in the output file: y - year, m - month, d - day. The default value is 'yyyy.m.d'.

  • use1904windowing - if 1, all dates will be saved using the 1904 date system in Excel (i.e. dates are stored as the number of days from January 1, 1904. Default: 0, i.e. dates are stored as the number of days from January 1, 1900.

Exporting data to CSV files

The csvExport command copies the data from a table in database to a CSV file.

Syntax:

csvExport(alias.table, file[, hasHeader = 0 | 1][, deleteFile = 0 | 1]
[,separator = c][, dateFormat = s][, decimalSeparator = c][, encoding = "UTF-8"|"UTF-16"| ... ]
[,language = aa | ab | af | am | ... ][, textSeparator = c][, forceTextSeparator = 0 | 1 ])
  • alias - an alias for the database

  • table - the name of table

  • file - the name of the CSV file to be created, including path

  • hasHeader - 1 if the first row in the CSV file should contain column names, 0 otherwise. Default: 1

  • deleteFile - if 1, any existing file with the same name will be overwritten. Default: 0

  • separator - the character that separates consecutive columns. Default: ',' (comma)

  • dateFormat - format of date entries. The syntax for date format can be found at http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html. Default: "yyyy-MM-dd".

  • decimalSeparator - the character used the decimal separator. Default: '.'

  • encoding - The encoding of the output file. Default: operating system default. The complete list of encodings is available here: http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html

  • language - a two-letter lower-case symbol (ISO 639) which defines the language used to interprate date fields (for instance in the date format "ddMMMyyy"). For more information about language codes see the Language Codes appendix.

  • textSeparator - the character that separates text. Default : '"' (quote), textSeperator=None is equivalent to no textSeparator.

  • forceTextSeparator - if set to 0, textSeparator is ommited unless the exported file will cause problems while importing. Defaul: 1 - textSeparator will be used.

Example 4.7. Exporting data to a CSV file

csvExport('some_data', 'c:\\some_data.csv', deleteFile = 1, hasHeader = 1, dateFormat = 'MM-dd-yyyy')
            

Getting column list for a database table

The tableColumns command returns a list object with column names in the selected database table. If the specified table does not exist, an exception is thrown. The returned list is mutable.

Syntax:

tableColumns(alias.table)
  • table - the name of the table.

  • alias - the database alias

Example 4.8. Getting column list for a database table:

print tableColumns('iris')
>>> ['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'Class']
            

Deleting a database table

The tableDelete command deletes a database table. It returns 1 if the table existed, and 0 if not.

Syntax:

tableDelete(alias.table)
  • table - the name of the table.

  • alias - the database alias

Example 4.9. Deleting a database table:

tableDelete('some_table')
            

Checking for a database table existence.

The tableExists command checks whether a specified table exists and returns a boolean result.

Syntax:

tableExists(alias.table)
  • table - the name of the table.

  • alias - the database alias

Example 4.10. Checking the existence of a database table:

tableExists('some_table')