Cross-Entropy 

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.

[Figure: cross-entropy loss as a function of the predicted probability when the true label is 1 (isDog = 1)]

The graph above shows the range of possible loss values given a true observation (isDog = 1). As the predicted probability approaches 1, log loss slowly decreases. As the predicted probability decreases, however, the log loss increases rapidly. Log loss penalizes both types of errors, but especially those predictions that are confident and wrong!

Cross-entropy and log loss are slightly different depending on context, but in machine learning, when calculating error rates between 0 and 1, they resolve to the same thing.

Code

import numpy as np

def CrossEntropy(yHat, y):
    """Binary cross-entropy for a single prediction yHat and true label y (0 or 1)."""
    if y == 1:
        return -np.log(yHat)
    else:
        return -np.log(1 - yHat)
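
Plugging in the values from the scenario above (a quick numeric check; the numbers are rounded):

# Confident and wrong: the true label is 1 but the model predicts 0.012
print(CrossEntropy(0.012, 1))   # ≈ 4.42, a large loss
# Confident and right: the true label is 1 and the model predicts 0.98
print(CrossEntropy(0.98, 1))    # ≈ 0.02, a small loss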

Math

In binary classification, where the number of classes M equals 2, cross-entropy can be calculated as:

$-(y\log(p) + (1 - y)\log(1 - p))$

If M > 2 (i.e. multiclass classification), we calculate a separate loss for each class label per observation and sum the result (a vectorized sketch follows the notation note below).

$-\sum_{c=1}^{M} y_{o,c}\log(p_{o,c})$

Note

M – number of classes (dog, cat, fish)

log – the natural log

y – binary indicator (0 or 1) if class label c is the correct classification for observation o

p – predicted probability observation o is of class c
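
A vectorized sketch of the multiclass formula above, assuming yHat holds the predicted class probabilities and y is a one-hot label vector (both names are illustrative):

def CrossEntropyMulticlass(yHat, y):
    """Multiclass cross-entropy for one observation.

    yHat -- predicted probability for each of the M classes (sums to 1)
    y    -- one-hot indicator vector marking the correct class
    """
    return -np.sum(y * np.log(yHat))

# Example with M = 3 classes (dog, cat, fish); the correct class is dog
yHat = np.array([0.7, 0.2, 0.1])
y = np.array([1, 0, 0])
print(CrossEntropyMulticlass(yHat, y))   # ≈ 0.357, i.e. -log(0.7)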

Hinge

Used for classification.

Code

def Hinge(yHat, y):
    """Hinge loss, assuming the true label y is encoded as -1 or +1."""
    return np.maximum(0, 1 - yHat * y)
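
A few example values (labels assumed to be encoded as -1/+1):

# Correct and outside the margin: no loss
print(Hinge(2.3, 1))     # 0.0
# Correct but inside the margin: small loss
print(Hinge(0.4, 1))     # 0.6
# Wrong prediction: loss grows linearly with the error
print(Hinge(-1.5, 1))    # 2.5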

Huber

Typically used for regression. It’s less sensitive to outliers than the MSE because it squares the error only inside an interval around zero (|y − ŷ| < δ) and grows linearly outside it.

$L_\delta = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| < \delta \\ \delta\,(|y - \hat{y}| - \frac{1}{2}\delta) & \text{otherwise} \end{cases}$

Code

def Huber(yHat, y, delta=1.):
    """Huber loss: quadratic for residuals smaller than delta, linear otherwise."""
    return np.where(np.abs(y - yHat) < delta,
                    0.5 * (y - yHat)**2,
                    delta * (np.abs(y - yHat) - 0.5 * delta))
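
A quick check of the two regimes with the default delta = 1:

# Small residual (|error| < delta): quadratic penalty, 0.5 * 0.5**2 = 0.125
print(Huber(0.5, 0.0))   # 0.125
# Large residual (|error| >= delta): linear penalty, 1.0 * (3.0 - 0.5) = 2.5
print(Huber(3.0, 0.0))   # 2.5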

Further information can be found at Huber Loss on Wikipedia.

Kullback-Leibler

Code

def KLDivergence(yHat, y):
    """
    Kullback-Leibler divergence between two discrete probability distributions.

    :param yHat: predicted probability distribution
    :param y: reference probability distribution
    :return: KLDiv(yHat || y)
    """
    return np.sum(yHat * np.log(yHat / y))
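
KL divergence measures how much one probability distribution diverges from a reference distribution; it is zero only when the two are identical. A small usage sketch:

p = np.array([0.4, 0.6])    # predicted distribution
q = np.array([0.5, 0.5])    # reference distribution
print(KLDivergence(p, q))   # ≈ 0.020, the distributions are close
print(KLDivergence(p, p))   # 0.0, identical distributions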

MAE (L1)

Mean Absolute Error, or L1 loss. Excellent overview below [6] and [10].

Code

def L1(yHat, y):
    """Mean absolute error: the average of the absolute differences."""
    return np.sum(np.absolute(yHat - y)) / y.size
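
For example, with numpy arrays of equal length:

yHat = np.array([2.0, 3.0, 5.0])
y = np.array([1.0, 3.0, 7.0])
print(L1(yHat, y))   # (1 + 0 + 2) / 3 = 1.0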

MSE (L2)

Mean Squared Error, or L2 loss. Excellent overview below [6] and [10].

Code

def MSE(yHat, y):
    """Mean squared error: the average of the squared differences."""
    return np.sum((yHat - y)**2) / y.size

def MSE_prime(yHat, y):
    """Gradient of MSE with respect to yHat; the constant factor 2/n is dropped,
    since it is typically absorbed into the learning rate."""
    return yHat - y
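
A short usage sketch showing the loss and its gradient, e.g. for one gradient step applied directly to the predictions (the learning rate lr is illustrative):

yHat = np.array([2.5, 0.0, 2.0])
y = np.array([3.0, -0.5, 2.0])
print(MSE(yHat, y))              # (0.25 + 0.25 + 0) / 3 ≈ 0.167

lr = 0.1                         # illustrative learning rate
yHat = yHat - lr * MSE_prime(yHat, y)
print(MSE(yHat, y))              # ≈ 0.135, the loss decreases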