Introduction to Probability Theory

Overview
Probability is one of the foundations of machine learning, together with linear algebra and calculus. In this post I give an intuitive description of what probability means, followed by some essential concepts in probability theory.
Probability deals with events that are not certain: they may happen or they may not. In some situations it is linked to terms such as luck, chance or risk. However, as a field of mathematics, probability is not a magical or obscure concept. On the contrary, it provides the necessary tools to measure the uncertainty of events and to reason about them.
A simple example: Coin toss
Let’s start with a simple example. Suppose you toss a coin. When the coin lands, will it show heads or tails? We cannot know for sure. What we do know about the coin is that the number of possible outcomes is 2: heads or tails.
When one tosses the coin, what happens is 1 event out of 2 possible events. Say that one tosses the same coin 10 times: out of those 10 tosses, we would expect heads about 5 times and tails about 5 times.
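The ten-toss experiment can be tried out with a short simulation. The sketch below is only an illustration: the function name `toss_coin` and the seed value are assumptions made here, and a single run of 10 tosses will often not give exactly 5 heads.

```python
import random


def toss_coin(n_tosses, seed=None):
    """Simulate n_tosses of a fair coin and return the list of outcomes."""
    # A local generator keeps the seed from affecting other code.
    rng = random.Random(seed)
    return [rng.choice(["heads", "tails"]) for _ in range(n_tosses)]


tosses = toss_coin(10, seed=0)  # the seed is an arbitrary choice
n_heads = tosses.count("heads")
n_tails = tosses.count("tails")
print(f"heads: {n_heads}, tails: {n_tails}")
```

Changing the seed gives a different sequence, which shows how much a small number of tosses can vary around the expected half-and-half split.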
Probability of an event
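From the example described above, we can see that the probability of an event is the number of outcomes in which the event occurs divided by the total number of possible outcomes:
$$P(E) = \frac{\text{number of outcomes in which } E \text{ occurs}}{\text{total number of possible outcomes}}$$
where $E$ is the event of interest and all possible outcomes are assumed to be equally likely.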
As another example, consider the roll of a fair die. What is the probability that we get a 2? To find that, we can write down the event $E = \{2\}$ and the set of all possible outcomes $\{1, 2, 3, 4, 5, 6\}$.
Therefore, the value 2 is one event out of 6 possible events (including 2 itself). Applying that in the calculation of $P(E)$ gives $P(E) = \frac{1}{6} \approx 0.167$.
The same probability (about 0.167, or 16.7%) applies to obtaining any other number from the roll of a fair die, since each number is 1 event out of 6.
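The die calculation can be reproduced by counting outcomes directly. This is a minimal sketch that assumes all outcomes are equally likely; the helper name `probability` is chosen here for illustration, and `Fraction` is used so the result stays exact.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Classical probability: favorable outcomes / all equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))


die = [1, 2, 3, 4, 5, 6]
p_two = probability({2}, die)
print(p_two)         # 1/6
print(float(p_two))  # the same value as a decimal
```

The same helper works for events with more than one outcome, for example `probability({2, 4, 6}, die)` for rolling an even number.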
As you may have already noted, probability is often expressed as a fraction, as a value between 0 and 1, or as a percentage (between 0% and 100%).
The probability of an event not happening is the probability of its complement, and it equals 1 minus the probability of the event. For example, the probability of not obtaining a value of 2 in the roll of a die is:
$$P(\text{not } 2) = 1 - P(2) = 1 - \frac{1}{6} = \frac{5}{6} \approx 0.833$$
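The complement rule is easy to check numerically. The sketch below uses a hypothetical helper named `complement` and exact fractions; it assumes the input is a valid probability.

```python
from fractions import Fraction


def complement(p):
    """Probability that an event with probability p does NOT happen."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p


p_two = Fraction(1, 6)
p_not_two = complement(p_two)
print(p_not_two)  # 5/6
```

An event and its complement always add up to 1, which is a quick sanity check for this kind of calculation.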
In certain cases, the probability of an event is expressed in terms of the odds of the event. Though odds refer to the same idea, the way they are expressed differs a bit. Odds are expressed as a ratio of wins (occurrences) to losses (non-occurrences). In that case, the odds of obtaining a value of 2 in the roll of a die can be written as 1:5, for 1 win and 5 losses.
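Converting between a probability and odds is a small calculation. The following sketch assumes odds are written as wins:losses, as in the die example; the function names are illustrative only.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Return the odds in favor of an event as a (wins, losses) pair."""
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("odds are only defined here for 0 < p < 1")
    ratio = p / (1 - p)  # wins per loss, reduced to lowest terms
    return ratio.numerator, ratio.denominator


def odds_to_probability(wins, losses):
    """Return the probability that corresponds to odds of wins:losses."""
    return Fraction(wins, wins + losses)


print(probability_to_odds(Fraction(1, 6)))  # (1, 5)
print(odds_to_probability(1, 5))            # 1/6
```

Note that odds of 1:5 correspond to a probability of 1/6, not 1/5, because the losses are counted separately from the wins.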
Frequentist View vs. Bayesian View
To introduce the concept of probability in this article, we have viewed it from a frequentist approach. In this view, probability indicates how frequently an event occurs over many repeated trials. As an example, if a coin is tossed many times, it is expected to land tails about half of the time.
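The frequentist reading can be illustrated by simulating more and more tosses and watching the share of tails. This is a sketch under the assumption of a fair coin; the function name and the seed are arbitrary choices made here.

```python
import random


def relative_frequency_of_tails(n_tosses, seed=0):
    """Fraction of simulated fair-coin tosses that land tails."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so "< 0.5" happens half of the time.
    n_tails = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return n_tails / n_tosses


for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_tails(n))
```

With few tosses the share of tails can be far from one half; as the number of tosses grows it tends to settle close to 0.5.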
A different but less intuitive view of probability is the Bayesian one. Here, probability is understood as a measure of the uncertainty about an event, and is thus related to information rather than to repeated trials. From this standpoint, saying that a fair coin is equally likely to land heads or tails describes our state of knowledge about a single toss.
The Bayesian view is more subjective, relying sometimes on information obtained previously but also on certain assumptions and beliefs over a probability space. Still, it shows a clear advantage over the frequentist approach, since one can assign probabilities to very rare events that may not have been observed before, something which cannot be done using the frequentist approach.
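One way to see this difference in practice is to compare an observed frequency with an estimate that starts from a prior belief. The sketch below is an assumption made for illustration, not something described in this post: it uses a uniform Beta(1, 1) prior over the unknown probability and reports the posterior mean after some trials.

```python
from fractions import Fraction


def observed_frequency(successes, trials):
    """Frequentist estimate: share of trials in which the event occurred."""
    return Fraction(successes, trials)


def posterior_mean(successes, trials, prior_a=1, prior_b=1):
    """Posterior mean under a Beta(prior_a, prior_b) prior (an assumed prior)."""
    return Fraction(successes + prior_a, trials + prior_a + prior_b)


# A rare event that has not been seen in 100 trials.
print(observed_frequency(0, 100))  # 0
print(posterior_mean(0, 100))      # 1/102
```

The observed frequency is exactly zero, while the estimate with a prior stays small but positive. The result depends on the prior chosen, which is the subjective part of the approach.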
Summary
Probability is a mathematical concept used to quantify uncertainty. It provides the formal rules and tools to determine the likelihood of an event based on certain propositions.
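An event is one result among all possible outcomes. When the outcomes are equally likely, its probability is the number of outcomes in which it occurs divided by the total number of possible outcomes.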
Though in this article we talked mostly about equally likely events, that’s not always the case. In fact, this is just one specific case among all possible probability functions. But we will see more about that later.
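As a small illustration of outcomes that are not equally likely, consider a loaded die. The weights below are made up for this sketch, and the helper name `probability` is hypothetical; the only requirement is that the probabilities of all outcomes add up to 1.

```python
from fractions import Fraction

# Hypothetical loaded die: a 6 comes up half of the time.
loaded_die = {
    1: Fraction(1, 10),
    2: Fraction(1, 10),
    3: Fraction(1, 10),
    4: Fraction(1, 10),
    5: Fraction(1, 10),
    6: Fraction(1, 2),
}


def probability(event, pmf):
    """Sum the probabilities of the outcomes that belong to the event."""
    return sum((p for outcome, p in pmf.items() if outcome in event), Fraction(0))


assert sum(loaded_die.values()) == 1  # a valid probability function sums to 1
print(probability({2}, loaded_die))        # 1/10
print(probability({2, 4, 6}, loaded_die))  # 7/10
```

Here counting outcomes is no longer enough: the probability of an event is the sum of the probabilities of its outcomes, and the equally likely case is recovered when every outcome has the same weight.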
Further reading
The following literature provides information to dive deeper into this topic.
Kevin P. Murphy (2012) Machine Learning: A Probabilistic Perspective. MIT Press
Alex Smola and S.V.N. Vishwanathan (2008) Introduction to Machine Learning. Cambridge University Press