Surprisingly few software engineers and scientists seem to understand it, which makes me sad because it’s such a general and powerful tool for combining information within the presence of uncertainty. sometimes its ability to extract accurate information seems almost magical— and if it seems like I’m talking this up an excessive amount of, then take a glance at this previously posted video where I demonstrate a Kalman filter deciding the orientation of a free-floating body by watching its velocity. Totally neat!

What is a Kalman filter?

You can use a Kalman filter in anywhere where you’ve got uncertain information about some dynamic system, and you’ll make an informed guess about what the system goes to try to to next. albeit messy reality comes along and interferes with the clean motion you guessed about, the Kalman filter will often do a really good job of deciding what actually happened. And it can cash in of correlations between crazy phenomena that you simply maybe wouldn’t have thought to exploit!

Kalman filters are ideal for systems which are continuously changing. they need the advantage that they’re light on memory (they don’t got to keep any history aside from the previous state), and that they are in no time , making them compatible for real time problems and embedded systems.

The math for implementing the Kalman filter appears pretty scary and opaque in most places you discover on Google. That’s a nasty state of affairs, because the Kalman filter is really super simple and straightforward to know if you check out it within the right way. Thus it makes an excellent article topic, and that i will plan to illuminate it with many clear, pretty pictures and colours . The prerequisites are simple; all you would like may be a basic understanding of probability and matrices.

I’ll start with a loose example of the type of thing a Kalman filter can solve, but if you would like to urge right to the shiny pictures and math, be happy to leap ahead.

What can we do with a Kalman filter?

Let’s make a toy example: You’ve built a touch robot which will wander around within the woods, and therefore the robot must know exactly where it’s in order that it can navigate.

Your little robot

We’ll say our robot features a state xk→, which is simply an edge and a velocity:

xk→=(p⃗ ,v⃗ )

Note that the state is simply an inventory of numbers about the underlying configuration of your system; it might be anything. In our example it’s position and velocity, but it might be data about the quantity of fluid during a tank, the temperature of a car engine, the position of a user’s finger on a touchpad, or any number of belongings you got to keep track of.

Our robot also features a GPS sensor, which is accurate to about 10 meters, which is sweet , but it must know its location more precisely than 10 meters. There are many gullies and cliffs in these woods, and if the robot is wrong by quite a couple of feet, it could fall off a cliff. So GPS by itself isn’t ok .

Oh no.

We might also know something about how the robot moves: It knows the commands sent to the wheel motors, and its knows that if it’s headed in one direction and zip interferes, at subsequent instant it’ll likely be further along that very same direction. But in fact it doesn’t know everything about its motion: it’d be buffeted by the wind, the wheels might slip a touch bit, or roll over bumpy terrain; therefore the amount the wheels have turned won’t exactly represent how far the robot has actually traveled, and therefore the prediction won’t be perfect.

The GPS sensor tells us something about the state, but only indirectly, and with some uncertainty or inaccuracy. Our prediction tells us something about how the robot is moving, but only indirectly, and with some uncertainty or inaccuracy.

But if we use all the knowledge available to us, can we get a far better answer than either estimate would give us by itself? in fact the solution is yes, and that’s what a Kalman filter is for.

How a Kalman filter sees your problem

Let’s check out the landscape we’re trying to interpret. We’ll continue with an easy state having only position and velocity.

x⃗ =[pv]

We don’t know what the particular position and velocity are; there are an entire range of possible combinations of position and velocity which may be true, but a number of them are more likely than others:

The Kalman filter assumes that both variables (postion and velocity, in our case) are random and Gaussian distributed. Each variable features a mean μ, which is that the center of the random distribution (and its presumably state), and a variance σ2, which is that the uncertainty:


In the above picture, position and velocity are uncorrelated, which suggests that the state of 1 variable tells you nothing about what the opposite could be .

The example below shows something more interesting: Position and velocity are correlated. The likelihood of observing a specific position depends on what velocity you have:

gauss_3This quite situation might arise if, for instance , we are estimating a replacement position supported an old one. If our velocity was high, we probably moved farther, so our position are going to be more distant. If we’re moving slowly, we didn’t get as far.

This kind of relationship is basically important to stay track of, because it gives us more information: One measurement tells us something about what the others might be . And that’s the goal of the Kalman filter, we would like to squeeze the maximum amount information from our uncertain measurements as we possibly can!

This correlation is captured by something called a covariance matrix. In short, each element of the matrix Σij is that the degree of correlation between the ith state variable and therefore the jth state variable. (You could be ready to guess that the covariance matrix is symmetric, which suggests that it doesn’t matter if you swap i and j). Covariance matrices are often labelled “Σ”, so we call their elements “Σij”.


Describing the matter with matrices

We’re modeling our knowledge about the state as a Gaussian blob, so we’d like two pieces of data at time k: We’ll call our greatest estimate x^k (the mean, elsewhere named μ ), and its covariance matrix Pk.


(Of course we are using only position and velocity here, but it’s useful to recollect that the state can contain any number of variables, and represent anything you want).

Next, we’d like how to seem at the present state (at time k-1) and predict subsequent state at time k. Remember, we don’t know which state is that the “real” one, but our prediction function doesn’t care. It just works on all of them, and provides us a replacement distribution:


gauss_8It takes every point in our original estimate and moves it to a replacement predicted location, which is where the system would move if that original estimate was the proper one.


Let’s apply this. How would we use a matrix to predict the position and velocity at subsequent moment within the future? We’ll use a very basic kinematic formula:


In other words:


We now have a prediction matrix which provides us our next state, but we still don’t skills to update the covariance matrix.

This is where we’d like another formula. If we multiply every point during a distribution by a matrix A, then what happens to its covariance matrix Σ?

Well, it’s easy. I’ll just offer you the identity:


So combining (4) with equation (3):


External influence

We haven’t captured everything, though. There might be some changes that aren’t associated with the state itself— the surface world could be affecting the system.

For example, if the state models the motion of a train, the train operator might press on the throttle, causing the train to accelerate. Similarly, in our robot example, the navigation software might issue a command to show the wheels or stop. If we all know this extra information about what’s happening within the world, we could stuff it into a vector called uk→, do something with it, and add it to our prediction as a correction.

Let’s say we all know the expected acceleration a thanks to the throttle setting or control commands. From basic kinematics we get:


In matrix form:


Bk is named the control matrix and uk→ the control vector. (For very simple systems with no external influence, you’ll omit these).

Let’s add another detail. What happens if our prediction isn’t a 100% accurate model of what’s actually going on?

External uncertainty

Everything is ok if the state evolves supported its own properties. Everything remains fine if the state evolves supported external forces, goodbye as we all know what those external forces are.

But what about forces that we don’t know about? If we’re tracking a quadcopter, for instance , it might be buffeted around by wind. If we’re tracking a wheeled robot, the wheels could slip, or bumps on the bottom could slow it down. We can’t keep track of those things, and if any of this happens, our prediction might be off because we didn’t account for those extra forces.

We can model the uncertainty related to the “world” (i.e. things we aren’t keeping track of) by adding some new uncertainty after every prediction step:


Every state in our original estimate could have moved to a variety of states. Because we like Gaussian blobs such a lot , we’ll say that every point in x^k−1 is moved to somewhere inside a Gaussian blob with covariance Qk. differently to mention this is often that we are treating the untracked influences as noise with covariance Qk.

gauss_10aThis produces a replacement Gaussian blob, with a special covariance (but an equivalent mean):


We get the expanded covariance by simply adding Qk, giving our complete expression for the prediction step:


In other words, the new best estimate may be a prediction made up of previous best estimate, plus a correction for known external influences.

And the new uncertainty is predicted from the old uncertainty, with some additional uncertainty from the environment.


All right, so that’s easy enough. we’ve a fuzzy estimate of where our system could be , given by x^k and Pk. What happens once we get some data from our sensors?

Refining the estimate with measurements

We might have several sensors which give us information about the state of our system. For the nonce it doesn’t matter what they measure; perhaps one reads position and therefore the other reads velocity. Each sensor tells us something indirect about the state— in other words, the sensors operate a state and produce a group of readings.


gauss_12Notice that the units and scale of the reading won’t be an equivalent because the units and scale of the state we’re keeping track of. you would possibly be ready to guess where this is often going: We’ll model the sensors with a matrix, Hk.

We can find out the distribution of sensor readings we’d expect to ascertain within the usual way:

μ⃗ expectedΣexpected=Hkx^k=HkPkHTk(8)

One thing that Kalman filters are great for is handling sensor noise. In other words, our sensors are a minimum of somewhat unreliable, and each state in our original estimate might end in a variety of sensor readings.gauss_12


From each reading we observe, we’d guess that our system was during a particular state. But because there’s uncertainty, some states are more likely than others to possess have produced the reading we saw:gauss_11


We’ll call the covariance of this uncertainty (i.e. of the sensor noise) Rk. The distribution features a mean adequate to the reading we observed, which we’ll call zk→.

So now we’ve two Gaussian blobs: One surrounding the mean of our transformed prediction, and one surrounding the particular sensor reading we got.


We must attempt to reconcile our guess about the readings we’d see supported the anticipated state (pink) with a special guess supported our sensor readings (green) that we actually observed.

So what’s our new presumably state? For any possible reading (z1,z2), we’ve two associated probabilities: (1) The probability that our sensor reading zk→ may be a (mis-)measurement of (z1,z2), and (2) the probability that our previous estimate thinks (z1,z2) is that the reading we should always see.

If we’ve two probabilities and that we want to understand the prospect that both are true, we just multiply them together. So, we take the 2 Gaussian blobs and multiply them:


What we’re left with is that the overlap, the region where both blobs are bright/likely. And it’s tons more precise than either of our previous estimates. The mean of this distribution is that the configuration that both estimates are presumably , and is therefore the simplest guess of truth configuration given all the knowledge we’ve .

Hmm. This seems like another Gaussian blob.


As it seems , once you multiply two Gaussian blobs with separate means and covariance matrices, you get a replacement Gaussian blob with its own mean and covariance matrix! Maybe you’ll see where this is often going: There’s need to be a formula to urge those new parameters from the old ones!