The classification process helps with the categorization of the dataset into different classes. A machine learning model enables you to:
 Frame the problem,
 Collect the data,
 Add the variables,
 Train the model,
 Measure the performance,
 Improve the model with the help of cost function.
But how can we measure the performance of a model? By comparing the predicted and actual model? However, this will not solve the classification problem. A confusion matrix can help you to analyze the data and solve the problem. Let’s understand how this technique helps the machine learning model.
Confusion Matrix
The confusion matric technique helps with performance measurement for machine learning classification. With this type of model, you can distinguish and classify the model with the known true values on the set of test data. The term confusion matrix is straightforward yet confusing. This article will simplify the concept so you can easily understand and create a confusion matrix on your own.
Calculating the Confusion Matrix
Follow these simple steps to calculate the confusion matrix for data mining:
Step 1
Estimate the outcome values of the dataset.
Step 2
Test the dataset with the help of expected output.
Step 3
Predict the rows in your test dataset.
Step 4
Calculate the expected outcomes and predictions. You need to consider the:
 Total correct predictions of the class
 Total incorrect predictions of the class
After performing these steps, you need to organize the numbers in the below methods:
 Link each row of the matrix with predicted class
 Correspond each column of the matrix with the actual class
 Enter the correct and incorrect classification of the model into the table
 Include the total of the correct predictions into the predicted column. Also, add the class value in the expected row.
 Include the total of the incorrect predictions into the expected row and class value in the predicted column.
Understanding the Outcome in a Confusion Matrix

True Positive
The actual and predicted values are the same. The predicted value of the model is positive, along with an actual positive value.

True Negative
The actual and predicted values are the same. The predicted value of the model is negative, along with an actual negative value.

False Positive (Type 1 Error)
The actual and predicted values are not the same. The predicted value of the model is positive and falsely predicted. However, the actual value is negative. You can refer to this as a Type 1 error.

False Negative (Type 2 Error)
The actual and predicted values are not the same. The predicted value of the model is negative and falsely predicted. However, the actual value is positive. You can refer to this error as a Type 2 error.
Importance of Confusion Matrix
Before answering the question, we should understand the hypothetical classification problem. Suppose you are predicting the number of people infected with the virus before showing symptoms. This way, you can easily isolate them and ensure a healthy population. We can choose two variables to define the target population: Infected and noninfected.
Now you might think, why use a confusion matrix when the variables are too straightforward. Well, this technique helps with the accuracy of the classification. The data in this example is the imbalanced dataset. Let’s suppose we have 947 negative data points and three positive data points. Now we will calculate the accuracy with this formula:
With the help of the following table, you can check the accuracy:
The total output values will be:
TP = 30, TN = 930, FP = 30, FN = 10
So you can calculate the accuracy of the model as:
96% accuracy for a model is incredible. But you can only generate the wrong idea from the outcome. According to this model, you can predict the infected people 96% of the time. However, the calculation predicts that 96% of the population will not become infected. However, sick people are still spreading the virus.
Does this model look like a perfect solution for the problem, or should we measure the positive cases and isolate them to stop the spread of the virus. Therefore we use a confusion matrix to solve these types of problems. Here are some benefits of the confusion matrix:
 The matrix helps with the classification of the model while making the predictions
 This technique signifies the type and the insight of the errors so you can understand the case easily
 You can overcome the restriction with the accurate classification of the data
 The columns of the confusion matrix will represent the instances of the predicted class
 Each row will indicate the instances of the actual class
 The confusion matrix will highlight the errors that the classifier
Confusion Matrix in Python
Now, as you know the concept of the confusion matrix, you can practice using the following code in Python with the help of the Scikitlearn library.
# confusion matrix in sklearn  
fromsklearn.metricsimportconfusion_matrix  
fromsklearn.metricsimportclassification_report  
# actual values  
actual = [1,0,0,1,0,0,1,0,0,1]  
# predicted values  
predicted = [1,0,0,1,0,0,0,1,0,0]  
# confusion matrix  
matrix =confusion_matrix(actual,predicted, labels=[1,0])  
print(‘Confusion matrix : \n’,matrix)  
# outcome values order in sklearn  
tp, fn, fp, tn=confusion_matrix(actual,predicted,labels=[1,0]).reshape(1)  
print(‘Outcome values : \n’, tp, fn, fp, tn)  
# classification report for precision, recall f1score and accuracy  
matrix =classification_report(actual,predicted,labels=[1,0])  
print(‘Classification report : \n’,matrix) 
Conclusion
The confusion matrix helps with the restriction in the accuracy of the classification method. Also, it highlights important details about different classes. Furthermore, it analyzes the variables and the data so you can compare actual data with the prediction.