Introduction to Probability - Conditional Probability and Bayes' Theorem
Overview
Machine learning constantly deals with predicting a certain outcome given a set of inputs, or features. The essence of this task comes from conditional probability, which can be used to calculate the probability of events that depend on others.
In this post, we will go over the essentials of conditional probability, its fundamental rules, and Bayes' theorem.
Conditional Probability
In real-world situations, events are rarely, if ever, completely independent of each other. In fact, most machine learning problems start from the premise that a certain outcome can be predicted based on the occurrence of other events. If $X$ and $Y$ are two events, the conditional probability of $X$ given $Y$ is defined as

$$P(X|Y) = \frac{P(X, Y)}{P(Y)}$$

where $P(X, Y)$ is the joint probability of $X$ and $Y$. If $X$ and $Y$ are independent, knowing $Y$ tells us nothing about $X$, and $P(X|Y) = P(X)$. On the other hand, if the events are dependent, observing $Y$ changes the probability of $X$, so that $P(X|Y) \neq P(X)$.
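To make the definition concrete, the sketch below (an illustrative example of mine, not part of the original post) estimates a conditional probability by simulation: when rolling a fair die, what is the probability that the result is at least 5, given that it is even? Analytically, $P(X|Y) = P(X, Y)/P(Y) = (1/6)/(1/2) = 1/3$.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)  # one million fair die rolls

even = (rolls % 2 == 0)    # event Y: the roll is even
at_least_5 = (rolls >= 5)  # event X: the roll is 5 or 6

# P(X|Y) = P(X, Y) / P(Y), estimated from relative frequencies
p_joint = np.mean(even & at_least_5)
p_y = np.mean(even)
print(p_joint / p_y)  # ~0.333, matching the analytical 1/3
```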
Product Rule
This rule can be used to calculate the joint probability of $X$ and $Y$, that is, the probability that both $X$ and $Y$ will take place. It can be stated as follows:

$$P(X, Y) = P(X|Y)P(Y)$$

Where:

- $P(Y)$ is the probability that $Y$ happens;
- $P(X|Y)$ is the conditional probability of $X$ with respect to $Y$.

This rule leads us to the following conclusions:

- If $X$ is a subset of $Y$ ($X \subseteq Y$), then $P(X, Y) = P(X)$, so $P(X|Y) = P(X)/P(Y)$.
- If $Y$ is a subset of $X$, then $P(X, Y) = P(Y)$, so $P(X|Y) = 1$.
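As a quick sanity check of the product rule (with arbitrary, illustrative numbers), we can specify a small joint distribution by hand and verify that $P(X, Y) = P(X|Y)P(Y)$:

```python
# Joint distribution of two binary events X and Y (illustrative values)
p_joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

p_y1 = sum(p for (x, y), p in p_joint.items() if y == 1)  # P(Y=1) = 0.5
p_x1_given_y1 = p_joint[(1, 1)] / p_y1                    # P(X=1|Y=1) = 0.8

# Product rule: P(X=1, Y=1) = P(X=1|Y=1) * P(Y=1)
print(p_joint[(1, 1)], p_x1_given_y1 * p_y1)  # both ~0.4
```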
Chain Rule
The chain rule is a generalization of the product rule. It describes the joint probability of $n$ events $X_1, \dots, X_n$ as a product of conditional probabilities:

$$P(X_1, X_2, \dots, X_n) = P(X_1)\,P(X_2|X_1)\,P(X_3|X_1, X_2) \cdots P(X_n|X_1, \dots, X_{n-1})$$
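A classic textbook illustration of the chain rule (not from the original post): the probability of drawing three aces in a row from a shuffled 52-card deck, without replacement.

```python
from fractions import Fraction

# P(A1, A2, A3) = P(A1) * P(A2|A1) * P(A3|A1, A2)
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p, float(p))  # 1/5525, roughly 0.000181
```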
Bayes' Theorem
Bayes' theorem, also called Bayes' rule, combines conditional probability with the product and sum rules. It is stated as follows:

$$P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}$$

The sum rule is employed to calculate the denominator $P(X)$, by marginalizing the joint probability over all possible values of $Y$:

$$P(X) = \sum_{Y} P(X|Y)P(Y)$$
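The sketch below applies both formulas to a standard textbook scenario (the numbers are hypothetical, chosen only for illustration): a disease affects 1% of a population, and a test detects it 95% of the time, with a 5% false-positive rate. What is the probability of having the disease given a positive test?

```python
# Hypothetical numbers for illustration
p_y = 0.01              # P(Y): prior probability of the disease
p_x_given_y = 0.95      # P(X|Y): positive test given disease
p_x_given_not_y = 0.05  # P(X|not Y): false-positive rate

# Sum rule for the evidence: P(X) = sum over Y of P(X|Y) P(Y)
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

# Bayes' theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
print(p_x_given_y * p_y / p_x)  # ~0.161: only ~16% despite a positive test
```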
A Simple Exercise (adapted from HackerEarth)
John will have an important event at work tomorrow and he can't be late for it. On regular days he goes to work by car, and the probability that he will arrive on time, with no problems in the car, is $0.3$. The probability that he will have a problem with the car is $0.2$; in that case he has to take a train, and only $2$ of the $10$ trains that stop at his station go to his workplace in time.

What are the chances that he arrives at work on time tomorrow?

Assume that:

- The probability of a car problem is $0.2$;
- With the car working, John arrives on time with probability $0.3$;
- If the car breaks down, each of the $10$ trains is equally likely to be the next to arrive, and only $2$ of them get him to work on time.
Analytical Solution
Let’s call:

- $C$: the event that John has a problem with the car;
- $T$: the event that John arrives on time.

From the problem description, we have that

$$P(C) = 0.2, \quad P(T|C) = \frac{2}{10} = 0.2, \quad P(T|\bar{C}) = 0.3$$

Where $\bar{C}$ is the complement of $C$ (the car has no problem), so that $P(\bar{C}) = 1 - P(C) = 0.8$.

From the sum rule, we have that

$$P(T) = P(T, C) + P(T, \bar{C})$$

Replacing each term according to the product rule, we have

$$P(T) = P(T|C)P(C) + P(T|\bar{C})P(\bar{C})$$

Substituting the values we have:

$$P(T) = 0.2 \times 0.2 + 0.3 \times 0.8 = 0.04 + 0.24 = 0.28$$

Therefore, there is a 28% chance that John will arrive on time.
To illustrate, if he could take any of the $10$ trains and still arrive on time, we would have $P(T|C) = 1$, and therefore

$$P(T) = 1 \times 0.2 + 0.3 \times 0.8 = 0.44$$

Surely he would have much better chances of arriving on time.
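These substitutions are easy to double-check in a few lines of Python (the variable names mirror the symbols defined above):

```python
p_c = 0.2              # P(C): probability of a car problem
p_t_given_c = 0.2      # P(T|C): on time by train (2 of the 10 trains)
p_t_given_not_c = 0.3  # P(T|not C): on time with the car working

# Sum rule + product rule: P(T) = P(T|C)P(C) + P(T|not C)P(not C)
p_t = p_t_given_c * p_c + p_t_given_not_c * (1 - p_c)
print(p_t)  # ~0.28
```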
Solution in Python
We can try to estimate the probability found in the previous section by running an experiment in Python. The experiment consists of the following steps:

- Initialize the counter `ontime`, which indicates the number of times that John would arrive on time.
- Generate a random variable `car_problem`, uniformly distributed in $[0, 1]$, indicating whether John had a car problem. If `car_problem < 0.2`, he had a car problem; otherwise he didn't.
- If he had a car problem, randomly generate a value `train` between $1$ and $10$, indicating which train arrived. If `train` is $1$ or $2$, the right train arrived and he will arrive on time: add 1 to `ontime`.
- If he didn't have a car problem, generate a random variable `ontime_noproblem`, uniformly distributed in $[0, 1]$, and add 1 to `ontime` if it is less than $0.3$.
- Repeat the above steps $10^6$ times. At the end, estimate $P(T)$ as `ontime/i`.
The following script performs the experiment described above, and then plots the convergence of the estimated value towards $0.28$.
```python
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams.update({'font.size': 22, 'figure.autolayout': True, 'figure.figsize': (15, 5)})
np.random.seed(999)

ontime = 0       # times John arrives on time
prob = []        # running estimates of P(T)
nr_points = []   # number of drawings at each recorded estimate

for i in range(1, int(1e6)):
    # Will he have a car problem? P(C) = 0.2
    car_problem = np.random.random()
    if car_problem < 0.2:
        # Car problem: one of 10 trains arrives; only trains 1 and 2 are on time
        train = np.random.choice(np.arange(1, 11))
        if train <= 2:
            ontime += 1
    else:
        # No car problem: he arrives on time with probability 0.3
        ontime_noproblem = np.random.random()
        if ontime_noproblem < 0.3:
            ontime += 1
    # Record the running estimate every 10 drawings
    if i % 10 == 0:
        nr_points.append(i)
        prob.append(ontime / i)

plt.semilogx(nr_points, prob)
plt.ylim([0, 0.3])
plt.grid(True, which='both')
plt.xlabel('# of Drawings')
plt.ylabel('P(T)')
plt.savefig('experiment.png')
```

Notice how the estimated probability nicely approaches the theoretical value previously calculated ($0.28$).
Summary
Through this post we went over the basics of conditional probability and Bayes' theorem.
Conditional probability is used to calculate the probability of an outcome $X$ given that another event $Y$ has already happened. The product rule and chain rule can be used to obtain conditional probabilities from joint ones.
Bayes' theorem employs the concepts inherited from conditional probability, together with the product and sum rules, to calculate the conditional probability of an outcome $Y$ given a priori knowledge of another event $X$.
Further Reading
Bayes’ rules, Conditional probability, Chain rule - HackerEarth