
The Expectation-Maximization (EM) algorithm is a way to find maximum-likelihood estimates for model parameters when your data is incomplete, has missing data points, or has unobserved (hidden) latent variables. It is an iterative way to approximate the maximum of the likelihood function. While maximum likelihood estimation can find the "best fit" model for a set of data, it doesn't work particularly well for incomplete data sets. The more complex EM algorithm can find model parameters even if you have missing data. It works by choosing random values for the missing data points and using those guesses to estimate a second set of data. The new values are used to create a better guess for the first set, and the process continues until the algorithm converges on a fixed point.
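To make that alternating loop concrete, here is a minimal Python sketch (the function name and the toy data are illustrative, not from the article): it estimates the mean and variance of a single Gaussian when some entries are missing, filling each gap with its expected value under the current guess and then re-estimating.

```python
def em_gaussian_missing(data, n_iter=50):
    """EM for the mean and variance of a Gaussian when some entries are None.

    Guess step: each missing value is replaced by its expectation under the
    current parameters (the mean), carrying an extra `var` of uncertainty.
    Re-estimate step: mean and variance are recomputed from the completed data.
    """
    observed = [x for x in data if x is not None]
    n, m = len(data), len(data) - len(observed)   # total / missing counts
    mu = sum(observed) / len(observed)            # initial guesses
    var = 1.0
    for _ in range(n_iter):
        # "E-step": missing points contribute E[x] = mu and E[x^2] = mu^2 + var
        total = sum(observed) + m * mu
        total_sq = sum(x * x for x in observed) + m * (mu * mu + var)
        # "M-step": maximum-likelihood updates on the completed data
        mu = total / n
        var = total_sq / n - mu * mu
    return mu, var

data = [2.1, 1.9, None, 2.3, None, 1.7]
mu, var = em_gaussian_missing(data)
```

Each pass shrinks the influence of the initial `var = 1.0` guess, and the estimates settle on the observed-data maximum-likelihood values.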

See also: EM Algorithm explained in one picture.

MLE vs. EM

Although Maximum Likelihood Estimation (MLE) and EM can both find "best-fit" parameters, how they find the models is very different. MLE accumulates all of the data first and then uses that data to construct the most likely model. EM takes a guess at the parameters first (accounting for the missing data), then tweaks the model to fit the guesses and the observed data. The basic steps for the algorithm are:

1. An initial guess is made for the model's parameters and a probability distribution is created. This is sometimes called the "E-step" for the "expected" distribution.

2. Newly observed data is fed into the model.

3. The probability distribution from the E-step is tweaked to include the new data. This is sometimes called the "M-step."

4. Steps 2 and 3 are repeated until stability (i.e., a distribution that doesn't change from the E-step to the M-step) is reached.
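The steps above can be sketched for a simple case, a mixture of two one-dimensional Gaussians, in plain Python (the function name and toy data are illustrative assumptions, not the article's own code):

```python
import math

def em_two_gaussians(xs, n_iter=100):
    # Step 1: initial guesses for the parameters (means at the data extremes,
    # unit variances, equal mixing weight) define the starting distribution.
    mu1, mu2, var1, var2, w = min(xs), max(xs), 1.0, 1.0, 0.5
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point under the
        # current parameters (the shared 1/sqrt(2*pi) factor cancels).
        r = []
        for x in xs:
            p1 = w * math.exp(-(x - mu1) ** 2 / (2 * var1)) / math.sqrt(var1)
            p2 = (1 - w) * math.exp(-(x - mu2) ** 2 / (2 * var2)) / math.sqrt(var2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate each parameter from its weighted share of the data.
        n1 = sum(r)
        n2 = len(xs) - n1
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        var1 = sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1 or 1e-6
        var2 = sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2 or 1e-6
        w = n1 / len(xs)
    return (mu1, var1), (mu2, var2), w

xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
(mu1, var1), (mu2, var2), w = em_two_gaussians(xs)
```

On this well-separated toy data the loop quickly stabilizes, with the two means settling near the two cluster centers and the weight near 0.5.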

The EM algorithm always improves a parameter's estimate through this multi-step process. However, it sometimes needs a few random starts to find the best model, because the algorithm can home in on a local maximum that isn't very close to the (optimal) global maximum. In other words, it can perform better if you force it to restart and take that "initial guess" from Step 1 over again. From all of the resulting parameters, you can then choose the set with the greatest maximum likelihood.
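A minimal sketch of that restart strategy, assuming a simplified EM that estimates only the two means of a unit-variance, equal-weight mixture (all names and data here are illustrative): run EM from several random starting points and keep the fit with the greatest log-likelihood.

```python
import math
import random

def em_two_means(xs, mu1, mu2, n_iter=50):
    """Simplified EM: mixture of two unit-variance Gaussians with equal
    weights, estimating only the means from a given starting guess."""
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        r = [1 / (1 + math.exp(((x - mu1) ** 2 - (x - mu2) ** 2) / 2)) for x in xs]
        # M-step: re-estimate each mean from its weighted share of the data.
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / sum(r)
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - sum(r))
    # Log-likelihood of the fit (constant 1/sqrt(2*pi) factors are dropped;
    # they shift every candidate equally, so comparisons are unaffected).
    ll = sum(math.log(0.5 * math.exp(-(x - mu1) ** 2 / 2)
                      + 0.5 * math.exp(-(x - mu2) ** 2 / 2)) for x in xs)
    return (mu1, mu2), ll

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(40)] + [random.gauss(6, 1) for _ in range(40)]

# Several random initial guesses; keep the parameters with the best likelihood.
best_params, best_ll = None, -math.inf
for _ in range(10):
    start1 = random.uniform(min(xs), max(xs))
    start2 = random.uniform(min(xs), max(xs))
    params, ll = em_two_means(xs, start1, start2)
    if ll > best_ll:
        best_params, best_ll = params, ll
```

A single unlucky start can leave both means stuck describing the same cluster; comparing log-likelihoods across restarts is what lets you discard those local optima.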

In reality, the steps involve some fairly heavy calculus (integration) and conditional probabilities, which are beyond the scope of this article. If you would like a more technical (i.e., calculus-based) breakdown of the process, I highly recommend you read Gupta and Chen's 2010 paper.


The EM algorithm has many applications, including:

Disentangling superimposed signals,

Estimating Gaussian mixture models (GMMs),

Estimating hidden Markov models (HMMs),

Estimating parameters for compound Dirichlet distributions,

Finding optimal mixtures of fixed models.

