
In machine learning, classification is a type of predictive modeling problem in which you must predict a class label for each observation. The input variables can be categorical or continuous, but the output is always a categorical variable. Consider the following example. Suppose we want to predict the weather in a neighborhood. We use the time of year and weather information as input features, where the weather information includes temperature, wind speed, humidity, and whether it is cloudy or sunny. The output is whether or not it will rain. In another example, we can predict whether an email is spam by using the sender’s information and the email’s content as input.

Understanding Log Loss

Log loss is an essential classification metric for predictions expressed as probabilities. Raw log-loss values are tricky to interpret. Even so, log loss is an effective way to compare one machine learning model with another, because a lower log-loss value indicates better predictions. Log loss is also called cross-entropy loss or logistic loss.

Log loss is the loss function used in multinomial models such as logistic regression and its extensions, which include neural networks and other types of models. It is defined as the negative log-likelihood of a logistic model that returns predicted probabilities (y_pred) for its training data with true labels (y_true).

Log loss is only defined for two or more labels. For a single sample with true label y ∈ {0, 1} and probability estimate p = Pr(y = 1), the log loss is:

$$L_{\log}(y, p) = -\left(y \log(p) + (1 - y)\log(1 - p)\right)$$

Examples of Log Loss

Suppose the model predicts that three houses will sell, with probabilities [0.8, 0.4, 0.1]. Only the last of these houses was not sold, so the actual outcomes are represented numerically as [1, 1, 0].
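As a quick check, the arithmetic for this example can be done in a few lines of Python. The sketch applies the per-sample log loss −(y·log(p) + (1 − y)·log(1 − p)) to each house and averages the three results. It assumes natural logarithms, and the variable names are illustrative only.

```python
from math import log

# Predicted probabilities that each house sells, and the actual outcomes (1 = sold)
predicted = [0.8, 0.4, 0.1]
actual = [1, 1, 0]

# Per-house loss: -log(p) for a sold house, -log(1 - p) for an unsold one
losses = [-log(p) if y == 1 else -log(1 - p) for y, p in zip(actual, predicted)]
result = sum(losses) / len(losses)

print([round(x, 3) for x in losses])  # [0.223, 0.916, 0.105]
print(round(result, 4))               # 0.4149
```

The hesitant 0.4 for a house that did sell contributes most of the total. The confident, correct predictions contribute comparatively little.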

Log Loss and Python

Below, we discuss several concrete loss functions and calculate each of them in Python:

  • Mean Squared Error Loss

Mean Squared Error (MSE) loss is a regression loss function. MSE is calculated as the average of the squared differences between the predicted and actual values. Because the differences are squared, the result is never negative, regardless of the signs of the predicted and actual values. The perfect value is 0.0. The loss is minimized during optimization, although you can negate the score and use it in a maximization process instead. The following Python function calculates the mean squared error for a list of actual and a list of predicted real-valued quantities.

# calculate mean squared error
def mean_squared_error(actual, predicted):
    sum_square_error = 0.0
    for i in range(len(actual)):
        # square each difference so positive and negative errors both increase the loss
        sum_square_error += (actual[i] - predicted[i]) ** 2.0
    # average the squared errors over all examples
    mean_square_error = 1.0 / len(actual) * sum_square_error
    return mean_square_error

To compute this loss efficiently in practice, you can use the mean_squared_error() function from scikit-learn (sklearn.metrics).
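An equivalent, more compact formulation of MSE uses a generator expression. The sketch below applies it to a small made-up example in which one prediction is too high and another too low. Because each difference is squared, both errors raise the loss instead of cancelling out. The name `mse` and the sample values are illustrative assumptions.

```python
def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]  # errors of -0.5, 0.0 and +1.0
# squared errors: 0.25, 0.0 and 1.0, so the mean is 1.25 / 3
print(mse(actual, predicted))  # approximately 0.4167
```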

  • Cross-Entropy Loss (or Log Loss)

Cross-entropy loss is also called logarithmic loss, log loss, or logistic loss. Each predicted probability is compared with the actual class output value (0 or 1). A score is then calculated that penalizes the probability based on its distance from the expected value. The penalty is logarithmic: small differences (such as 0.1 or 0.2) receive a small score, while large differences (such as 0.9 or 1.0) receive a very large score.
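The sketch below shows how quickly the logarithmic penalty grows. It computes the loss for a single example with true label 1 when the predicted probability is off by a difference d, so p = 1 − d. The helper name `penalty` is hypothetical. The clipping constant 1e-15, taken from the offset used in this article's cross-entropy functions, is an assumption that keeps the d = 1.0 case finite.

```python
from math import log

EPS = 1e-15  # keeps log() away from exactly zero

def penalty(difference):
    """Log loss for a true label of 1 when the predicted probability is 1 - difference."""
    p = max(1.0 - difference, EPS)
    return -log(p)

for d in [0.1, 0.2, 0.9, 1.0]:
    print(f"difference {d:.1f}: penalty {penalty(d):.3f}")
```

The penalty rises from about 0.1 for a difference of 0.1 to about 2.3 for a difference of 0.9. It would be infinite at 1.0 without the clipping.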

A model with perfect probabilities has a log loss (cross-entropy) of 0.0. Cross-entropy is minimized, so a model with a smaller value is better than one with a larger value. For two-class (binary) prediction problems, the cross-entropy is calculated as the average cross-entropy across all examples.
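The sketch below compares three hypothetical sets of predictions for the same labels: a perfect model, a reasonably good one, and a poor one. It averages the per-example binary cross-entropy. Probabilities are clipped to [1e-15, 1 − 1e-15], an assumed choice, so the perfect model's loss comes out as essentially 0.0 rather than undefined.

```python
from math import log

EPS = 1e-15

def mean_binary_cross_entropy(actual, predicted):
    """Average binary cross-entropy over all examples, with clipping to avoid log(0)."""
    total = 0.0
    for y, p in zip(actual, predicted):
        p = min(max(p, EPS), 1.0 - EPS)  # clip into the open interval (0, 1)
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(actual)

actual = [1, 0, 1, 0]
perfect = [1.0, 0.0, 1.0, 0.0]
good = [0.9, 0.2, 0.8, 0.1]
poor = [0.4, 0.7, 0.3, 0.6]

for name, preds in [("perfect", perfect), ("good", good), ("poor", poor)]:
    print(name, round(mean_binary_cross_entropy(actual, preds), 4))
```

The perfect predictions score essentially 0.0, the good ones about 0.16, and the poor ones about 1.06.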

The following Python function calculates log loss from a list of actual values (0 or 1) and a list of predicted probabilities for class 1:

from math import log

# calculate binary cross entropy
def binary_cross_entropy(actual, predicted):
    sum_score = 0.0
    for i in range(len(actual)):
        # 1e-15 avoids log(0.0); the second term lets examples with label 0 contribute
        sum_score += actual[i] * log(1e-15 + predicted[i]) + (1 - actual[i]) * log(1e-15 + 1 - predicted[i])
    mean_sum_score = 1.0 / len(actual) * sum_score
    return -mean_sum_score

Calculating the log of 0.0 raises a math error, so we add a tiny value (1e-15) to the predicted probabilities. This means the best possible loss is a value very close to zero, but not exactly zero. You can also calculate cross-entropy for multi-class classification. In that case, the classes are one-hot encoded, so there is one binary feature for each class, and the predictions must contain a predicted probability for each class. The cross-entropy is then summed across the binary features and averaged across all examples in the dataset.
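A few lines of Python illustrate both ideas in this paragraph. First, `math.log` raises a `ValueError` for 0.0, which is why 1e-15 is added. Second, one-hot encoding turns each class label into one binary feature per class. The helper `one_hot` is a hypothetical name used only for illustration.

```python
from math import log

# Without the offset, a predicted probability of exactly 0.0 breaks the calculation
try:
    log(0.0)
except ValueError as err:
    print("log(0.0) fails:", err)

print(log(1e-15 + 0.0))  # about -34.54: a large negative number instead of an error

def one_hot(labels, num_classes):
    """Turn integer class labels into one binary feature per class."""
    return [[1 if j == label else 0 for j in range(num_classes)] for label in labels]

print(one_hot([0, 2, 1], 3))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```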

The following Python function calculates the cross-entropy for a list of one-hot encoded actual values by comparing them with the predicted probabilities for each class:

from math import log

# calculate categorical cross entropy
def categorical_cross_entropy(actual, predicted):
    sum_score = 0.0
    for i in range(len(actual)):
        # sum over the one-hot encoded classes of example i; 1e-15 avoids log(0.0)
        for j in range(len(actual[i])):
            sum_score += actual[i][j] * log(1e-15 + predicted[i][j])
    # average over all examples
    mean_sum_score = 1.0 / len(actual) * sum_score
    return -mean_sum_score

To compute cross-entropy efficiently in practice, you can use the log_loss() function from scikit-learn (sklearn.metrics).
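With one-hot encoded targets, every term except the one for the true class is multiplied by 0. Each example's cross-entropy therefore reduces to −log of the probability assigned to its true class. The sketch below uses that shortcut with integer labels on two made-up examples. The function name and the data are illustrative assumptions.

```python
from math import log

def cross_entropy_from_labels(labels, predicted, eps=1e-15):
    """Mean categorical cross-entropy given integer class labels.

    Equivalent to summing actual[i][j] * log(predicted[i][j]) over one-hot
    targets, because only the true class j has actual[i][j] == 1.
    """
    total = 0.0
    for label, probs in zip(labels, predicted):
        total += -log(eps + probs[label])
    return total / len(labels)

labels = [0, 1]  # true classes
predicted = [
    [0.7, 0.2, 0.1],  # 0.7 on the true class 0
    [0.1, 0.8, 0.1],  # 0.8 on the true class 1
]
print(round(cross_entropy_from_labels(labels, predicted), 4))  # 0.2899
```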


Interpret a model's skill based on log loss with care. On an imbalanced dataset, a low log-loss value does not by itself indicate a skillful model. A statistical model should beat the baseline log-loss score for the given dataset. If it does not, the trained model is not skillful and not helpful, and you should look for a better model to predict the probabilities.
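This sketch assumes one common choice of baseline: a naive model that predicts the observed frequency of class 1 for every example. It shows why imbalance matters. With 10% positives, this naive model already scores about 0.325, well below the roughly 0.693 (ln 2) it scores on balanced data. The function name and the label lists are hypothetical.

```python
from math import log

def baseline_log_loss(labels):
    """Log loss of a naive model that always predicts the positive-class frequency.

    Assumes labels are 0/1 and contain both classes, so 0 < p < 1.
    """
    p = sum(labels) / len(labels)  # predicted probability of class 1 for every example
    total = 0.0
    for y in labels:
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(labels)

imbalanced = [1] * 10 + [0] * 90  # 10% positives
balanced = [1] * 50 + [0] * 50    # 50% positives

print(round(baseline_log_loss(imbalanced), 3))  # 0.325
print(round(baseline_log_loss(balanced), 3))    # 0.693
```

A trained model on the imbalanced data would need to score clearly below about 0.325 to show any skill beyond simply predicting the class frequency.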

