A confusion matrix is an N × N matrix in which the rows correspond to the actual (correct) classes and the columns to the decisions made by the classifier. The number $n_{i,j}$ at the intersection of the i-th row and the j-th column equals the number of cases from the i-th class that have been classified as belonging to the j-th class.
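The definition above can be turned into a short counting routine. This is a minimal sketch in plain Python. The function name `confusion_matrix` and the toy labels are illustrative assumptions, not part of the original tutorial. Each (actual, predicted) pair increments cell $n_{i,j}$, with rows indexed by the actual class and columns by the predicted class.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Build an N x N matrix: rows = actual class, columns = predicted class."""
    index = {c: k for k, c in enumerate(classes)}  # class label -> row/column index
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for actual, predicted in zip(y_true, y_pred):
        # n[i][j] counts cases of actual class i that were classified as class j
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical toy data for a two-class problem
y_true = ["good", "good", "bad", "bad", "good", "bad"]
y_pred = ["good", "bad", "bad", "good", "good", "bad"]
cm = confusion_matrix(y_true, y_pred, classes=["good", "bad"])
for row in cm:
    print(row)
```

The diagonal holds the correct classifications and the off-diagonal cells hold the errors. scikit-learn's `sklearn.metrics.confusion_matrix` follows the same orientation (rows are true labels, columns are predictions).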
In the previous parts of the tutorial (part 1, part 2) we introduced quantitative indicators of classification model quality. In the next two parts we will take a closer look at a couple of graphical indicators. The first one is called the Confusion Matrix (the name “Contingency Table” is also used).
Different forms of the confusion matrix make it easier to observe particular characteristics of a classification (e.g., the cost incurred by incorrect classifications).
A confusion matrix in the gains-and-losses form contains, in each cell, the total cost (or gain) resulting from the corresponding classification decisions.
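One way to obtain the gains-and-losses form, assuming a fixed cost per case in each cell, is to multiply every count by its unit cost. The unit costs below are hypothetical values chosen only for illustration, with the convention (also an assumption) that positive numbers are losses and negative numbers are gains.

```python
def gains_losses(counts, unit_cost):
    """Cell-wise product: total cost (or gain) for each (actual, predicted) pair."""
    return [
        [n * c for n, c in zip(count_row, cost_row)]
        for count_row, cost_row in zip(counts, unit_cost)
    ]


# Counts: rows = actual class, columns = predicted class (order: good, bad)
counts = [[2, 1],
          [1, 2]]
# Hypothetical unit costs: positive = loss, negative = gain
unit_cost = [[-100, 50],   # good applicant: accepted -> profit, rejected -> lost profit
             [500, 0]]     # bad applicant: accepted -> unpaid loan, rejected -> no cost

table = gains_losses(counts, unit_cost)
total = sum(sum(row) for row in table)
print(table, total)
```

Summing all cells gives a single figure for the whole classification, which makes it straightforward to compare classifiers or cut-off points by their overall cost.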
A cut-off point is a threshold value used to decide whether an observation belongs to a particular class. In a two-class problem the decision rule is:
if P(class(x) = 1) ≥ α, then assign x to class 1; otherwise assign it to the other class
where:
α – the cut-off point
P(class(x) = 1) – the probability that the given element x belongs to the class denoted by 1
For example: if the probability (calculated by our classification model) that a given loan applicant will fail to repay the loan is greater than or equal to 60%, assign the applicant to the class of bad debtors; otherwise, assign them to the class of good debtors. Here the class of bad debtors plays the role of class 1 and α = 0.6.
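The cut-off rule and the loan example can be sketched as follows. The function name `classify` and the probabilities are hypothetical; in practice the probabilities would come from the trained classification model.

```python
def classify(p_class1, alpha):
    """Assign to class 1 if P(class(x) = 1) >= alpha, otherwise to class 0."""
    return 1 if p_class1 >= alpha else 0


# Hypothetical model outputs: probability that each applicant will NOT repay
# (class 1 = bad debtor, class 0 = good debtor)
p_bad = [0.15, 0.60, 0.82, 0.45, 0.59]
labels = ["bad" if classify(p, alpha=0.6) == 1 else "good" for p in p_bad]
print(labels)
```

Note that the comparison is `>=`, so an applicant whose probability is exactly 0.6 is assigned to the bad debtors, as the rule requires.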
Different cut-off points can be considered for the same problem (here, assessing creditworthiness), and each of them leads to a different confusion matrix. By analyzing these matrices, we can select the optimal cut-off point.
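A minimal sketch of such an analysis: build the confusion matrix for several candidate cut-off points and keep the one with the lowest total cost. The data, the unit costs, the candidate values, and the choice of "lowest total misclassification cost" as the optimality criterion are all assumptions made for illustration; other criteria (e.g., maximizing accuracy or profit) could be used instead.

```python
# Hypothetical data: actual class (1 = bad debtor, 0 = good) and model P(class 1)
y_true = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0]
p_bad = [0.10, 0.35, 0.55, 0.62, 0.20, 0.90, 0.48, 0.70, 0.75, 0.30]

# Hypothetical unit costs: COST[i][j] for actual class i predicted as class j
COST = [[0, 1],   # good applicant wrongly rejected: lost profit
        [5, 0]]   # bad applicant wrongly accepted: unpaid loan


def confusion_at(alpha):
    """Confusion matrix (rows = actual, columns = predicted) for cut-off alpha."""
    m = [[0, 0], [0, 0]]
    for actual, p in zip(y_true, p_bad):
        predicted = 1 if p >= alpha else 0
        m[actual][predicted] += 1
    return m


def total_cost(m):
    """Sum of count * unit cost over all cells."""
    return sum(m[i][j] * COST[i][j] for i in range(2) for j in range(2))


candidates = [0.3, 0.4, 0.5, 0.6, 0.7]
results = {a: total_cost(confusion_at(a)) for a in candidates}
best_alpha = min(results, key=results.get)  # 0.4 for this toy data

for a in candidates:
    print(a, confusion_at(a), results[a])
print("best cut-off:", best_alpha)
```

Because a wrongly accepted bad debtor is assumed to cost five times more than a wrongly rejected good applicant, the selected cut-off is lower than 0.5: the sketch prefers rejecting some good applicants over accepting bad ones.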