
Tests are not the event. We have a test for cancer, which is separate from the event of actually having cancer. We have a test for spam, which is separate from the event of a message actually being spam.

Tests are flawed. Tests detect things that don’t exist (false positives) and miss things that do exist (false negatives). People often use test results without adjusting for test errors.

False positives skew results. Suppose you are searching for something really rare (1 in a million). Even with a good test, a positive result is likely to be a false positive on one of the other 999,999.
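To see how rarity swamps accuracy, here is a minimal sketch in Python. The 1-in-a-million rate comes from the text; the 99% detection rate and 1% false positive rate are assumed purely for illustration, since no error rates are given for this example.

```python
# Rare-event sketch: the 1-in-a-million base rate is from the text;
# the two test error rates below are assumed for illustration only.
population = 1_000_000
base_rate = 1 / 1_000_000       # fraction who truly have the condition
detection_rate = 0.99           # assumed: chance the test catches a real case
false_positive_rate = 0.01      # assumed: chance the test flags a healthy person

affected = population * base_rate
unaffected = population - affected

true_positives = affected * detection_rate
false_positives = unaffected * false_positive_rate

# Share of all positive results that point at a real case.
share_real = true_positives / (true_positives + false_positives)

print(f"expected true positives:  {true_positives:.2f}")
print(f"expected false positives: {false_positives:.2f}")
print(f"share of positives that are real: {share_real:.4%}")
```

Under these assumed rates, about one positive result is real for every ten thousand false alarms, even though the test is right 99% of the time.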

People prefer natural numbers. Saying “100 out of 10,000” rather than “1%” helps people work through the numbers with fewer errors, especially when several percentages are involved (“Of those 100, 80 will test positive” rather than “80% of the 1% will test positive”).

Even science is a test. On a philosophical level, scientific experiments are “potentially flawed tests” and must be treated accordingly. There is a test for a chemical substance or a phenomenon, and there is the event of the phenomenon itself. Our tests and measuring equipment have an error rate that must be taken into account.

Bayes’ theorem converts the results of your test into the real probability of the event. For example, you can:

Correct for measurement errors. If you know the real probabilities and the chance of a false positive and a false negative, you can correct for measurement errors.

Relate the actual probability to the measured test probability. Given mammogram test results and known error rates, you can predict the actual chance of having cancer given a positive test. In technical terms, you can find Pr(H|E), the chance that a hypothesis H is true given evidence E, starting from Pr(E|H), the chance that the evidence appears when the hypothesis is true.

Anatomy of a test

This article describes a cancer testing scenario:

1% have breast cancer (and therefore 99% do not).

80% of mammograms detect breast cancer when it is present (and therefore 20% miss it).

9.6% of mammograms reveal breast cancer when it is not present (and therefore 90.4% correctly return a negative result).

Put in a table, the probabilities look like this:

                 Cancer (1%)    No cancer (99%)
Test positive    80%            9.6%
Test negative    20%            90.4%

So how do we read it?

1% of people have cancer

If you already have cancer, you’re in the first column. There’s an 80% chance you will test positive and a 20% chance you will test negative.

If you don’t have cancer, you’re in the second column. There is a 9.6% chance that the test is positive and a 90.4% chance that the test is negative.

How accurate is the test?

Now suppose the test result is positive. What are the chances of you having cancer? 80%? 99%? 1%?

Here’s how I think about it:

Okay, we got a positive result. That means we’re somewhere in the top row of our table. Let’s not assume anything: it could be a true positive or a false positive.

The chance of a true positive = chance of having cancer * chance the test caught it = 1% * 80% = 0.008

The chance of a false positive = chance of not having cancer * chance the test came back positive anyway = 99% * 9.6% = 0.09504

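The table now looks like this:

                 Cancer (1%)                          No cancer (99%)
Test positive    True positive: 1% * 80% = 0.008      False positive: 99% * 9.6% = 0.09504
Test negative    False negative: 1% * 20% = 0.002     True negative: 99% * 90.4% = 0.89496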
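The four cells of the computed table can be reproduced with a few multiplications. The sketch below is one way to do it in Python; the variable names are illustrative, and the three input rates are the ones from the scenario.

```python
# Joint probabilities for the four cells of the mammogram table.
# All three input rates come from the scenario in the text.
p_cancer = 0.01              # 1% have breast cancer
p_pos_given_cancer = 0.80    # 80% of mammograms detect cancer when present
p_pos_given_healthy = 0.096  # 9.6% report cancer when it is absent

p_healthy = 1 - p_cancer

table = {
    ("positive", "cancer"): p_cancer * p_pos_given_cancer,             # true positive
    ("positive", "no cancer"): p_healthy * p_pos_given_healthy,        # false positive
    ("negative", "cancer"): p_cancer * (1 - p_pos_given_cancer),       # false negative
    ("negative", "no cancer"): p_healthy * (1 - p_pos_given_healthy),  # true negative
}

for (result, truth), prob in table.items():
    print(f"test {result:8} | {truth:9} | {prob:.5f}")

# The four cells cover every case, so they should add up to 1.
print(f"total: {sum(table.values()):.5f}")
```

Checking that the cells sum to 1 is a cheap way to catch a mistyped rate.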

And what was the question? Oh yes: what is the chance that we really have cancer if we get a positive result? The chance of an event is the number of ways it could happen divided by all the possible outcomes:

\displaystyle{ \text{Probability} = \frac{\text{desired event}}{\text{all possibilities}} }

The chance of getting a true positive result is 0.008. The chance of getting any kind of positive result is the chance of a true positive plus the chance of a false positive (0.008 + 0.09504 = 0.10304).

So, our probability of cancer is .008/.10304 = 0.0776, or about 7.8%.

Interesting: a positive mammogram means you have only a 7.8% chance of cancer, rather than 80% (the supposed accuracy of the test). It may seem strange at first, but it makes sense: the test gives a false positive 9.6% of the time (quite high), so there will be many false positives in a given population. For a rare disease, most positive test results will be wrong.

Let’s test our intuition by drawing a conclusion from the table alone. Take 100 people: only 1 will have cancer (1%), and that person will most likely test positive (80% chance). Of the remaining 99 people, about 10% will test positive, giving about 10 false positives. Of all the positive tests, only 1 in 11 is correct, so there is a 1/11 (about 9%) chance of having cancer given a positive test. The actual number is 7.8% (closer to 1/13, calculated above), but we found a reasonable estimate without a calculator.
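The same reasoning can be sketched with head counts instead of percentages. The population of 10,000 is an arbitrary choice for illustration; the rates are the ones from the scenario.

```python
# Natural-frequency version of the mammogram scenario.
# The population size is an arbitrary choice; the rates come from the text.
population = 10_000

with_cancer = population * 0.01            # 100 people
without_cancer = population - with_cancer  # 9,900 people

true_positives = with_cancer * 0.80        # 80 people
false_positives = without_cancer * 0.096   # about 950 people

all_positives = true_positives + false_positives
chance_of_cancer = true_positives / all_positives

print(f"{true_positives:.0f} of {all_positives:.0f} positive tests are real")
print(f"chance of cancer given a positive test: {chance_of_cancer:.1%}")
```

Working with counts gives the same 7.8% as the probability calculation, without the rounding used in the 100-person estimate.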

Bayes’ Theorem

We can turn the process above into an equation, which is Bayes’ theorem. It lets us take the test results and correct for the skew introduced by false positives, giving the real chance of the event. Here is the equation:

\displaystyle{\Pr(\mathrm{H}|\mathrm{E}) = \frac{\Pr(\mathrm{E}|\mathrm{H})\Pr(\mathrm{H})}{\Pr(\mathrm{E}|\mathrm{H})\Pr(\mathrm{H}) + \Pr(\mathrm{E}|\text{not } \mathrm{H})\Pr(\text{not } \mathrm{H})}}

And here’s the decoding key to read it:

Pr(H|E) = Chance of having cancer (H) given a positive test (E). This is what we want to know: how likely is cancer given a positive result? In our case it was 7.8%.

Pr(E|H) = Chance of a positive test (E) given that you have cancer (H). This is the chance of a true positive, 80% in our case.

Pr(H) = Chance of having cancer (1%).

Pr(not H) = Chance of not having cancer (99%).

Pr(E|not H) = Chance of a positive test (E) given that you do not have cancer (not H). This is a false positive, 9.6% in our case.

It all boils down to the chance of a true positive divided by the chance of any positive. We can simplify the equation to:

\displaystyle{\Pr(\mathrm{H}|\mathrm{E}) = \frac{\Pr(\mathrm{E}|\mathrm{H})\Pr(\mathrm{H})}{\Pr(\mathrm{E})}}

Pr(E) tells us the chance of getting any positive result, whether a true positive in the cancer population (1%) or a false positive in the non-cancer population (99%). It acts as a weighting factor (the normalizing constant), adjusting the probabilities toward the more likely outcome.
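As a minimal sketch, the simplified formula fits in a short Python function. The name `posterior` and its parameter names are illustrative choices, not part of any library.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return Pr(H|E) from Pr(H), Pr(E|H), and Pr(E|not H)."""
    # Pr(E): chance of any positive result, true or false.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


# Mammogram numbers from the text: 1% prior, 80% true positive rate,
# 9.6% false positive rate.
p_cancer_given_positive = posterior(0.01, 0.80, 0.096)
print(f"Pr(H|E) = {p_cancer_given_positive:.4f}")
```

Changing the first argument shows how strongly the result depends on how rare the condition is.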

Forgetting to account for false positives is what makes the low 7.8% chance of cancer (given a positive test) seem counterintuitive. Thank you, normalizing constant, for setting us straight!

Intuitive Understanding: Shine a Light

The article mentions an intuitive understanding based on shining a light through your real population to get a test population. The analogy makes sense, but it takes a few thousand words to get there :).

Think of a real population. You run tests that “shine a light” through that real population and create some test results. If the light is completely accurate, the test probabilities and the real probabilities match up. Everyone who tests positive is actually “positive”. Everyone who tests negative is actually “negative”.

But this is the real world. Tests go wrong. Sometimes people who have cancer don’t show up in the test results, and the other way around.

Bayes’ theorem lets us look at the skewed test results and correct for errors, recreating the original population and finding the real chance of a true positive result.
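One way to make the “shine a light” picture concrete is to simulate a population, run the flawed test on everyone, and look only at the people who test positive. This is a rough sketch: the function name `simulate`, the population size, and the fixed seed are illustrative choices, while the three rates are those of the mammogram scenario.

```python
import random


def simulate(n_people, seed=0):
    """Simulate the mammogram scenario and return the share of positive
    tests that belong to people who really have cancer."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    true_pos = 0
    false_pos = 0
    for _ in range(n_people):
        has_cancer = rng.random() < 0.01      # 1% really have cancer
        if has_cancer:
            positive = rng.random() < 0.80    # test catches 80% of cases
        else:
            positive = rng.random() < 0.096   # 9.6% false alarms
        if positive and has_cancer:
            true_pos += 1
        elif positive:
            false_pos += 1
    return true_pos / (true_pos + false_pos)


estimate = simulate(200_000)
print(f"simulated Pr(cancer | positive): {estimate:.3f}")
```

The estimate varies from run to run with the seed, but it should land close to the 7.8% that Bayes’ theorem gives directly.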

Bayesian spam filtering

A clever application of Bayes’ theorem is spam filtering. We have:

Event A: The message is spam.

Test X: Message contains some words (X)

Plugged into a more readable formula (from Wikipedia):

\displaystyle{\Pr(\mathrm{spam}|\mathrm{words}) = \frac{\Pr(\mathrm{words}|\mathrm{spam})\Pr(\mathrm{spam})}{\Pr(\mathrm{words})}}

Bayesian filtering lets us predict the chance that a message is really spam, given the “test results” (the presence of certain words). Clearly, words like “viagra” are more likely to appear in spam messages than in normal ones.

Spam filtering based on a blacklist is flawed: it is too restrictive and produces too many false positives. Bayesian filtering offers a middle ground by using probabilities. By analyzing the words in a message, we can compute the chance that it is spam (rather than making a yes/no decision). When a message has a 99.9% chance of being spam, it probably is. As the filter is trained on more and more messages, it updates the probabilities that certain words lead to spam. Advanced Bayesian filters can examine multiple words in a row as another data point.
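The idea can be sketched as a tiny word-based filter. This is a simplified illustration, not a production design: the training messages are made up, the function names are hypothetical, and the code assumes words are independent of each other (the “naive” Bayes assumption) with add-one smoothing so unseen word and class combinations don’t produce a zero probability.

```python
import math
from collections import Counter

# Tiny made-up training set, purely for illustration.
spam_messages = ["cheap viagra now", "viagra offer now", "win cash now"]
ham_messages = ["lunch at noon", "meeting notes attached", "see you at lunch"]


def word_counts(messages):
    """Count how often each word appears across a list of messages."""
    return Counter(word for msg in messages for word in msg.split())


spam_counts = word_counts(spam_messages)
ham_counts = word_counts(ham_messages)
vocabulary = set(spam_counts) | set(ham_counts)


def spam_probability(message):
    """Estimate Pr(spam | words), treating words as independent
    and using add-one smoothing."""
    total = len(spam_messages) + len(ham_messages)
    log_spam = math.log(len(spam_messages) / total)  # log Pr(spam)
    log_ham = math.log(len(ham_messages) / total)    # log Pr(not spam)
    spam_total = sum(spam_counts.values()) + len(vocabulary)
    ham_total = sum(ham_counts.values()) + len(vocabulary)
    for word in message.split():
        if word not in vocabulary:
            continue  # ignore words never seen in training
        log_spam += math.log((spam_counts[word] + 1) / spam_total)  # Pr(word | spam)
        log_ham += math.log((ham_counts[word] + 1) / ham_total)     # Pr(word | not spam)
    # Normalize: spam evidence divided by all evidence, i.e. Pr(words).
    spam_score = math.exp(log_spam)
    ham_score = math.exp(log_ham)
    return spam_score / (spam_score + ham_score)


print(f"{spam_probability('viagra now'):.3f}")
print(f"{spam_probability('lunch meeting'):.3f}")
```

The final division plays the same role as Pr(words) in the formula above: it rescales the spam evidence by the chance of seeing those words in any message. The output is a probability rather than a yes/no verdict, so the cutoff for treating a message as spam is a separate choice.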