It is impossible to foresee in what kind of environment the scoring code will be used. Thus, the code should be as flexible as possible. The scoring code is independent from data sources, but it requires the programmer to pass the data in the correct form.
The information about the form and order of the input data is stored in the InputSignature. This class was designed to enable access to the information about the structure of the input data.
Sample code for reading InputSignature is presented below. Please note the assumptions concerning this code:
inputSignature is an object of the class Class containing the definition of the class InputSignature
the Java reflection classes used in the example (java.lang.reflect.Field and java.lang.reflect.Array) must be imported
Note the two important parts of the example, Loop A and Loop B, which are described after the listing.
Example 10.2. Reading InputSignature
// Obtain field information from the inputSignature class
Field[] properties = inputSignature.getFields();

// Check if the name of the class is correct
if (inputSignature.getName().equals("ScoringCode$InputSignature")) {
    try {
        //****** LOOP A ********
        // Iterate through all fields (properties) from inputSignature.
        // There is only one field in inputSignature by default, this
        // loop is here only for the situation when the user adds
        // some custom fields to the signature
        for (int i = 0; i < properties.length; i++) {
            // If field name is equal to INPUT_ATTRIBUTES get the name and type of
            // attributes
            Field signatureAttribute = properties[i];
            if (signatureAttribute.getName().equals("INPUT_ATTRIBUTES")) {
                // Treat this field (INPUT_ATTRIBUTES) as an array
                Object attributeArray = signatureAttribute.get(null);

                //****** LOOP B ********
                // Each row in the array contains information about name and type.
                // Iterate through array rows to gather that information
                for (int k = 0; k < Array.getLength(attributeArray); k++) {
                    // get k-th row from the array
                    Object attrib[] = (Object[]) Array.get(attributeArray, k);
                    // attrib[0] - contains attribute name
                    // attrib[1] - contains type of the attribute
                }
            }
            ...
            // The rest of code responsible for the interpretation of
            // the gathered information
            ...
        }
    } catch (Exception e) {
        throw new GSException("Error occurred when retrieving Input and Output " +
                              "signature information");
    }
}
Loop A: This loop protects against a situation in which the user modifies the input signature. Usually the signature contains only one field (property): a public static two-dimensional array named INPUT_ATTRIBUTES.
Loop B: This loop reads the names and types of the attributes accepted by the model. The example does not show what can be done with these attributes. The user can store the information about the attributes for further processing or use it at this stage, for example, to create an SQL query or another form of data access from a database or file.
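As a minimal sketch of how the gathered information might be used, the self-contained class below reads INPUT_ATTRIBUTES by reflection (Loop A and Loop B) and builds a SELECT statement that lists the columns in signature order. The DemoSignature class, the table name patients and the helper methods are hypothetical stand-ins chosen for illustration; they are not part of the generated scoring code.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class SignatureQueryDemo {

    /** Hypothetical stand-in for the generated ScoringCode.InputSignature class. */
    public static class DemoSignature {
        public static Object[][] INPUT_ATTRIBUTES = new Object[][] {
            {"preg", double.class},
            {"plas", double.class},
            {"age", double.class}
        };
    }

    /** Returns the attribute names in the order given by INPUT_ATTRIBUTES. */
    static List<String> readAttributeNames(Class<?> signature) {
        List<String> names = new ArrayList<String>();
        try {
            // Loop A: look for the INPUT_ATTRIBUTES field among all public fields
            for (Field field : signature.getFields()) {
                if (!field.getName().equals("INPUT_ATTRIBUTES")) {
                    continue;
                }
                // The field is static, so no instance is needed
                Object attributeArray = field.get(null);
                // Loop B: each row holds {name, type}
                for (int k = 0; k < Array.getLength(attributeArray); k++) {
                    Object[] attrib = (Object[]) Array.get(attributeArray, k);
                    names.add((String) attrib[0]);
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read the input signature", e);
        }
        return names;
    }

    /** Builds a simple query; the table name is supplied by the caller. */
    static String buildSelect(List<String> columns, String table) {
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> names = readAttributeNames(DemoSignature.class);
        System.out.println(buildSelect(names, "patients"));
    }
}
```

Because the column list is taken directly from the signature, rows returned by such a query already arrive in the order the model expects.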
The order of the attributes expected by the model to work properly is the same as the order of the attributes in InputSignature.
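Since the order matters, it can be worth checking a data source against the signature before scoring. The sketch below assumes a hypothetical three-attribute copy of INPUT_ATTRIBUTES and a hypothetical helper matchesSignatureOrder; with real scoring code the array would be ScoringCode.InputSignature.INPUT_ATTRIBUTES.

```java
class SignatureOrderCheck {

    /** Hypothetical signature used only for this illustration. */
    static final Object[][] INPUT_ATTRIBUTES = new Object[][] {
        {"preg", double.class},
        {"plas", double.class},
        {"age", double.class}
    };

    /** Returns true only if the columns have the same names, in the same order, as the signature. */
    static boolean matchesSignatureOrder(String[] columns) {
        if (columns.length != INPUT_ATTRIBUTES.length) {
            return false;
        }
        for (int i = 0; i < columns.length; i++) {
            if (!INPUT_ATTRIBUTES[i][0].equals(columns[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesSignatureOrder(new String[] {"preg", "plas", "age"}));
        System.out.println(matchesSignatureOrder(new String[] {"plas", "preg", "age"}));
    }
}
```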
A simple application that uses the scoring code is presented below. The scoring code is based on a model using the decision trees algorithm. The input data set is a CSV-style text file (the example reads fields separated by a single space), and the output data set is written as comma separated values.
Example 10.3. Scoring code in an external application
This is the code of a sample external scoring application.
package test;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;

public class ScoringCodeTest {

    private ScoringCode generatedScoringCode;
    private Object input[];
    private ScoringCode.OutputStructure output;
    private FileReader inputFileReader;
    private FileWriter outputFileWriter;
    private PrintWriter outputPrintWriter;

    public void score() {
        // One slot for each attribute listed in the input signature
        input = new Object[ScoringCode.InputSignature.INPUT_ATTRIBUTES.length];
        generatedScoringCode = new ScoringCode();
        output = new ScoringCode.OutputStructure();

        // LinkedHashMap is used because it returns the elements in
        // the same order as they had been stored.
        LinkedHashMap inputSignature = new LinkedHashMap();

        // List is used to store attribute names
        ArrayList inputAttributeName = new ArrayList();

        // Write attributes from input signature into map.
        // This will be easier to access and to provide attributes in
        // correct order.
        for (int i = 0; i < ScoringCode.InputSignature.INPUT_ATTRIBUTES.length; i++) {
            inputSignature.put(ScoringCode.InputSignature.INPUT_ATTRIBUTES[i][0],
                               ScoringCode.InputSignature.INPUT_ATTRIBUTES[i][1]);
        }

        // Read a line from the csv file, convert it into numerical
        // values, and score it. After scoring, write the output to
        // a csv file.
        try {
            String line = "";
            String separator = " ";
            inputFileReader = new FileReader("test/input.csv");
            outputFileWriter = new FileWriter("test/output.csv");
            BufferedReader bfr = new BufferedReader(inputFileReader);
            outputPrintWriter = new PrintWriter(outputFileWriter);
            boolean headerLine = true;

            while ((line = bfr.readLine()) != null) {
                // Split the line into particular values using the
                // separator character
                String[] values = line.split(separator);

                // The first line contains the names of attributes.
                if (headerLine) {
                    for (int i = 0; i < values.length; i++)
                        inputAttributeName.add(values[i]);
                    headerLine = false;
                } else {
                    // The next lines contain the values for scoring
                    for (int i = 0; i < values.length; i++) {
                        // Gets attributeType
                        Object attributeType = inputSignature.get(inputAttributeName.get(i));
                        //System.out.println(attributeType);

                        // This loop is used to obtain the correct
                        // attribute index
                        int attributeIndex = -1;
                        Iterator iter = inputSignature.keySet().iterator();
                        while (iter.hasNext()) {
                            attributeIndex++;
                            Object key = iter.next();
                            if (key.equals(inputAttributeName.get(i)))
                                break;
                        }

                        // Prepare the value with the proper type
                        // obtained from the signature
                        if (values[i] != null && values[i].length() != 0) {
                            if (attributeType.equals(String.class))
                                input[attributeIndex] = values[i];
                            else
                                input[attributeIndex] = new Double(values[i]);
                        } else
                            input[attributeIndex] = null;
                    }

                    // Score the prepared data
                    generatedScoringCode.scoreData(input, output);

                    // Write clientID
                    outputPrintWriter.print(((Number) input[2]).intValue() + ", ");

                    // Write the predicted target value and end the output record
                    outputPrintWriter.println(output.positiveCategoryProbability);
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            try {
                // Either stream may be null if opening the files failed
                if (inputFileReader != null)
                    inputFileReader.close();
                if (outputPrintWriter != null)
                    outputPrintWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String args[]) {
        ScoringCodeTest scoringCodeExecutionTest = new ScoringCodeTest();
        scoringCodeExecutionTest.score();
    }
}
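The application above walks the signature key set for every value in order to find the attribute index. One possible alternative, shown here only as a sketch, is to compute a name-to-position map once and reuse it for every line. The three-attribute INPUT_ATTRIBUTES array and the names HeaderIndexDemo, signatureIndex and toRow are hypothetical and serve only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class HeaderIndexDemo {

    /** Hypothetical signature used only for this illustration. */
    static final Object[][] INPUT_ATTRIBUTES = new Object[][] {
        {"preg", double.class},
        {"plas", double.class},
        {"age", double.class}
    };

    /** Maps each attribute name to its position in the signature. */
    static Map<String, Integer> signatureIndex() {
        Map<String, Integer> index = new HashMap<String, Integer>();
        for (int i = 0; i < INPUT_ATTRIBUTES.length; i++) {
            index.put((String) INPUT_ATTRIBUTES[i][0], i);
        }
        return index;
    }

    /** Converts one data line into an input row ordered as the signature requires. */
    static Object[] toRow(String[] header, String line, String separator) {
        Map<String, Integer> index = signatureIndex();
        Object[] row = new Object[INPUT_ATTRIBUTES.length];
        String[] values = line.split(separator);
        for (int i = 0; i < values.length && i < header.length; i++) {
            Integer position = index.get(header[i]);
            if (position == null) {
                continue; // column is not used by the model
            }
            Object type = INPUT_ATTRIBUTES[position][1];
            if (values[i].length() == 0) {
                row[position] = null;                      // missing value
            } else if (String.class.equals(type)) {
                row[position] = values[i];                 // categorical attribute
            } else {
                row[position] = Double.valueOf(values[i]); // numerical attribute
            }
        }
        return row;
    }

    public static void main(String[] args) {
        String[] header = {"age", "preg", "plas"};
        System.out.println(Arrays.toString(toRow(header, "50.0 6.0 148.0", " ")));
    }
}
```

With this approach the columns of the input file may appear in any order, because each value is placed at the position dictated by the signature rather than by the file.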
Here is a sample CSV data file (save it as input.csv in the test package, so that the application can open it as test/input.csv):
preg plas pres skin insu mass pedi age
6.0 148.0 72.0 35.0 0.0 33.6 0.627 50.0
1.0 85.0 66.0 29.0 0.0 26.6 0.351 31.0
8.0 183.0 64.0 0.0 0.0 23.3 0.672 32.0
1.0 89.0 66.0 23.0 94.0 28.1 0.167 21.0
0.0 137.0 40.0 35.0 168.0 43.1 2.288 33.0
5.0 116.0 74.0 0.0 0.0 25.6 0.201 30.0
Here is the sample scoring code generated by AdvancedMiner:
package test;

import java.util.HashMap;

public class ScoringCode {

    /** Temporary map used by prepareData method */
    private HashMap tempInput = new HashMap();

    /** Temporary map used by prepareData method */
    private HashMap tempOutput = null;

    /** scoreData is the main scoring method. Input data have to be provided in correct order.
     * Please refer to InputSignature to see which data types are supported and how to provide data in correct order.
     * Transformation (if applicable) is done by prepareData method. PrepareData method also deals with categorical data encoding
     * @param input - input data to score
     * @param output - OutputStructure - will contain scoring output
     */
    public void scoreData(Object [] input, OutputStructure output) {
        processRow(prepareData(input), output);
    }

    /** getCategoricalValueCode - gets the coded double value for the corresponding categorical value */
    public double getCategoricalValueCode(HashMap data, String attrName, int attrIndex) {
        double output = Double.NaN;
        for (int i = 0; i < Signature.INPUT_ATTRIBUTES_VALUE_SET[attrIndex].length; i++) {
            if (data.get(attrName).equals(Signature.INPUT_ATTRIBUTES_VALUE_SET[attrIndex][i])) {
                output = i;
                break;
            }
        }
        // NaN never compares equal to itself, so Double.isNaN is used for the check
        if (Double.isNaN(output))
            throw new RuntimeException("Value " + tempOutput.get(attrName).toString() + " is not in input attributes value set");
        return output;
    }

    /** prepareData - prepares data for processRow method.
     * It does all necessary mapping, transformation and categorical
     * attributes encoding.
     * @param inputData - data for scoring, provided in correct order according to InputSignature
     * @return - array of double consisting of ready to score data
     */
    public double [] prepareData(Object [] inputData) {
        tempInput.clear();

        //Input attributes mapping
        tempInput.put("preg", inputData[0]);
        tempInput.put("plas", inputData[1]);
        tempInput.put("pres", inputData[2]);
        tempInput.put("skin", inputData[3]);
        tempInput.put("insu", inputData[4]);
        tempInput.put("mass", inputData[5]);
        tempInput.put("pedi", inputData[6]);
        tempInput.put("age", inputData[7]);

        //There are no transformations, so input is copied to output
        tempOutput = tempInput;

        //Output attributes mapping
        double [] output = new double[8];

        //Initialize output array with Double.NaN's
        java.util.Arrays.fill(output, Double.NaN);

        if (tempOutput.get("preg") != null)
            if (tempOutput.get("preg") instanceof Number)
                output[0] = ((Number)tempOutput.get("preg")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");
        if (tempOutput.get("plas") != null)
            if (tempOutput.get("plas") instanceof Number)
                output[1] = ((Number)tempOutput.get("plas")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");
        if (tempOutput.get("pres") != null)
            if (tempOutput.get("pres") instanceof Number)
                output[2] = ((Number)tempOutput.get("pres")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");
        if (tempOutput.get("skin") != null)
            if (tempOutput.get("skin") instanceof Number)
                output[3] = ((Number)tempOutput.get("skin")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");
        if (tempOutput.get("insu") != null)
            if (tempOutput.get("insu") instanceof Number)
                output[4] = ((Number)tempOutput.get("insu")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");
        if (tempOutput.get("mass") != null)
            if (tempOutput.get("mass") instanceof Number)
                output[5] = ((Number)tempOutput.get("mass")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");
        if (tempOutput.get("pedi") != null)
            if (tempOutput.get("pedi") instanceof Number)
                output[6] = ((Number)tempOutput.get("pedi")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");
        if (tempOutput.get("age") != null)
            if (tempOutput.get("age") instanceof Number)
                output[7] = ((Number)tempOutput.get("age")).doubleValue();
            else
                throw new RuntimeException("Data type doesn't match to data type specified in inputSignature. Can't continue. ");

        return output;
    }

    /** decodeTarget is used to decode integer value to its original representation
     * @param codedValue coded value of target
     * @return decoded value of target
     */
    public String decodeTarget(int codedValue) {
        return (String)Signature.OUTPUT_TARGET_VALUE_SET[codedValue];
    }

    /** InputSignature class contains data signature, which has to be passed to scoreData method.
     * This structure shows attributes in name-type pairs and their correct order in input data
     */
    public static class InputSignature {
        /** INPUT_ATTRIBUTES - this array contains name-type pairs which describe input attributes
         * The number in the comment represents the valid position of the particular attribute in input data
         */
        public static Object [][] INPUT_ATTRIBUTES = new Object [][] {
            {"preg", double.class }, // inputRow[0]
            {"plas", double.class }, // inputRow[1]
            {"pres", double.class }, // inputRow[2]
            {"skin", double.class }, // inputRow[3]
            {"insu", double.class }, // inputRow[4]
            {"mass", double.class }, // inputRow[5]
            {"pedi", double.class }, // inputRow[6]
            {"age", double.class }   // inputRow[7]
        };
    }

    /** Signature - this class describes how data should be passed to method processRow
     * by Transformation. If there is no transformation this is the same as InputSignature.
     * This structure shows attributes in name-type pairs and their correct order in input data
     * It also presents the value sets of categorical attributes.
     */
    public static class Signature {
        /** INPUT_ATTRIBUTES - this array contains name-type pairs which describe input attributes
         * The number in the comment represents the valid position of the particular attribute in input data
         */
        public static Object [][] INPUT_ATTRIBUTES = new Object [][] {
            {"preg", double.class }, // inputRow[0]
            {"plas", double.class }, // inputRow[1]
            {"pres", double.class }, // inputRow[2]
            {"skin", double.class }, // inputRow[3]
            {"insu", double.class }, // inputRow[4]
            {"mass", double.class }, // inputRow[5]
            {"pedi", double.class }, // inputRow[6]
            {"age", double.class }   // inputRow[7]
        };

        /** INPUT_ATTRIBUTES_VALUE_SET contains information about the encoding of categorical attributes
         * Codes for values are assigned according to the order of values in the array. The first value has code equal to 0,
         * the second value has code equal to 1, and so on.
         */
        public static Object [][] INPUT_ATTRIBUTES_VALUE_SET = new Object [][] {
            { null }, // preg value set -> inputRow [0]
            { null }, // plas value set -> inputRow [1]
            { null }, // pres value set -> inputRow [2]
            { null }, // skin value set -> inputRow [3]
            { null }, // insu value set -> inputRow [4]
            { null }, // mass value set -> inputRow [5]
            { null }, // pedi value set -> inputRow [6]
            { null }  // age value set -> inputRow [7]
        };

        /** OUTPUT_TARGET_VALUE_SET contains information about coded target values */
        public static Object [] OUTPUT_TARGET_VALUE_SET = new Object [] {
            "tested_negative", // Target value code = 0
            "tested_positive"  // Target value code = 1
        };
    }

    /** OutputStructure encloses information about names and types which can be obtained from processRow method
     * This structure can contain the following types: double, int, String, double [], int [].
     * OutputStructure is used by Górnik System to determine the output.
     */
    public static class OutputStructure {
        /** positive modelled category probability (predicted from the model) */
        public double positiveCategoryProbability;
    }

    /////////////////////////// DATA TAKEN FROM AdvancedMiner MODEL //////////////////////////////////////
    //Values assigned to variables listed below are taken from model
    public final static int LOGIT_MODEL = 1;
    public final static int PROBIT_MODEL = 2;
    public final static int CUMULATIVE_LOGIT_MODEL = 3;
    public final static int CUMULATIVE_PROBIT_MODEL = 4;
    public final static int MULTINOMIAL_LOGIT_MODEL = 5;
    public final static int MULTINOMIAL_PROBIT_MODEL = 6;

    public final int modelType = LOGIT_MODEL;

    /**Index of the positive target category ("event" category) in Signature.OUTPUT_TARGET_VALUE_SET table.*/
    public final int positiveTargetCategoryIndex = 0;
    public final int varNo = 8;
    public final int intercept = 1; // 0 for model without intercept

    /** vector of estimated parameters from the model */
    protected double[] beta = new double[varNo+intercept];

    ///////////////////////////////////// SCORING CODE ////////////////////////////////////////////
    /** computes predicted linear score: y'=sum_i(beta[i]*input[i])
     * @param input scoring data row
     */
    protected double getLinearScore(double[] input) {
        double linearScore = (intercept > 0 ? beta[0] : 0);
        for (int k = 0; k < varNo; k++) {
            if (Double.isNaN(input[k]))
                throw new RuntimeException("Missing values are not supported.");
            linearScore += beta[k+intercept]*input[k];
        }
        return linearScore;
    }

    /** computes logit transformation on the basis of linear score:
     * y = 1 / (1 + e^(-linearScore))
     * @param input scoring data row
     */
    protected double score(double[] input) {
        double linearScore = getLinearScore(input);
        return 1.0 / (1.0 + Math.exp(-linearScore));
    }

    //Input data row processing
    /** processRow - works with a single data row.
     * Data returned by processRow method is placed in OutputStructure output
     * <b>Remarks:</b>
     * Do not use this function directly, it uses mapped and coded values in a double array. Use the scoreData method instead.
     * @param inputRow - array of double data
     * @param output - this structure will be the scoring output.
     */
    public void processRow(double[] inputRow, OutputStructure output) {
        //Input data checking
        if (inputRow.length != varNo)
            throw new RuntimeException("Input array size doesn't match the number of variables in the model [" + varNo + "]");
        output.positiveCategoryProbability = score(inputRow);
    }
    /////////////////////////////////// END OF SCORING CODE //////////////////////////////////////////

    public ScoringCode() {
        beta[0] = 8.40469636691381;
        beta[1] = -0.12318229835243809;
        beta[2] = -0.035163714606857285;
        beta[3] = 0.013295546904304663;
        beta[4] = -6.18964364875373E-4;
        beta[5] = 0.0011916989841621699;
        beta[6] = -0.08970097003093419;
        beta[7] = -0.945179740621142;
        beta[8] = -0.014869004744466634;
    }
}
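The numerical core of the generated code is small: getLinearScore adds the intercept to the weighted sum of the inputs, and score passes the result through the logistic function. The stand-alone sketch below repeats that calculation outside the generated class, using the beta values from the listing and the first row of the sample data. The class LogitScoreDemo and its method names are hypothetical, and the sketch assumes a model with an intercept stored in beta[0].

```java
class LogitScoreDemo {

    /** Linear score: beta[0] (intercept) plus the sum of beta[k + 1] * input[k]. */
    static double linearScore(double[] beta, double[] input) {
        double score = beta[0];
        for (int k = 0; k < input.length; k++) {
            score += beta[k + 1] * input[k];
        }
        return score;
    }

    /** Logistic transformation of the linear score: 1 / (1 + e^(-score)). */
    static double probability(double[] beta, double[] input) {
        return 1.0 / (1.0 + Math.exp(-linearScore(beta, input)));
    }

    public static void main(String[] args) {
        // Coefficients copied from the ScoringCode constructor in the listing
        double[] beta = {
            8.40469636691381, -0.12318229835243809, -0.035163714606857285,
            0.013295546904304663, -6.18964364875373E-4, 0.0011916989841621699,
            -0.08970097003093419, -0.945179740621142, -0.014869004744466634
        };
        // First data row of the sample input file:
        // preg, plas, pres, skin, insu, mass, pedi, age
        double[] row = {6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0};
        System.out.println(probability(beta, row));
    }
}
```

A linear score of zero always gives a probability of 0.5, which is a convenient sanity check when porting the calculation to another environment.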