12 Introduction to Bayes Theorem
Dr. Harmanpreet Singh Kapoor
Learning Objectives
- Introduction
- Some Preliminary Results
- Bayes Theorem
- Summary
- Suggested Readings
1. Learning Objectives
The objective of this module is to describe how to find the probability of an event when additional information related to that event is already available; for this to be possible, the events must be related to each other. The concept of conditional probability is very helpful for solving this type of problem. The most commonly used method available in the literature for such problems is Bayes theorem. In this module, the concept of Bayes theorem and its method of application will be discussed in detail with examples.
2. Introduction
Let us consider two bags, 1 and 2. Bag 1 contains 3 blue marbles and 4 green marbles, and bag 2 contains 5 white marbles and 2 red marbles. One bag is chosen at random, the probability of selecting either bag being ½, and one marble is drawn at random from it. The probability of drawing a blue marble from bag 1 is 3/7, the probability of drawing a green marble from bag 1 is 4/7, the probability of drawing a white marble from bag 2 is 5/7 and the probability of drawing a red marble from bag 2 is 2/7. Thus the probability of drawing a red marble from the second bag is available. Suppose now that we are interested in the probability that the marble was drawn from bag 2 given that the colour of the marble drawn is red.
From the above example, one can observe that the problem has some basic information given in terms of probabilities; this basic information is used to find conditional probabilities as new information, and this new information is then used to express the basic information in a new form.
In this example, the basic information is given in terms of the random selection of a bag and of the marbles, the new information is the probability that the drawn marble is red given that the selected bag is bag 2, and the new form of the basic information is the probability that the marble comes from bag 2 given that the marble is red.
This type of problem-solving method was given by the famous mathematician Thomas Bayes and published in 1763. The formula developed by Bayes for the solution of such reverse probability problems is known as Bayes theorem.
Bayes theorem is used to find the reverse probability of events with respect to additional information.
Bayes theorem has a wide range of application areas. Its most important application lies in the backbone of statistics, i.e. statistical inference, in particular Bayesian inference.
In this module, our main purpose is to give a brief introduction to this topic through a number of examples. The idea is to keep the module as simple as possible so that one can understand the topic easily.
3. Some Preliminary Results
In this section, some preliminary results will be discussed. These results are very helpful to understand Bayes theorem as well as its application part.
Partition of sample space:
Sample space is a set of all possible outcomes of a random experiment. Sample space is mostly represented by S symbolically.
Suppose a random experiment is conducted for selecting a bag out of two bags denoted as 1 and 2 respectively. These bags contain marbles of different colours. Then the sample space is:
There are two outcomes E1 and E2, where E1 represents bag 1 and E2 represents bag 2:
S = {1, 2};
E1 = {1}; E2 = {2}.
This sample space S can be divided into the two partition events E1 and E2. E1 and E2 jointly represent the whole sample space because they are mutually exclusive, exhaustive and have non-negative probability.
Similarly, a sample space S can be partitioned into n outcomes; these outcomes are events. These events are the partition events of the sample space, and jointly represent the whole sample space, if they are exhaustive, mutually exclusive and non-negative.
The Mathematical representation of the partition of sample space:
Suppose E1, E2, …, En are n events of a random experiment.
The sample space is given as:
S = {E1, E2, …, En}.
The events form a partition of the sample space if the following three conditions hold (a small computational check is sketched after this list):
i) Exhaustive:
E1 ∪ E2 ∪ … ∪ En = S.
ii) Mutually exclusive:
Ei ∩ Ej = ∅ ∀ i ≠ j, i, j = 1, 2, …, n.
iii) Non-negativity:
P(Ei) ≥ 0 ∀ i = 1, 2, …, n.
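The exhaustive and mutually exclusive conditions can be checked mechanically once the outcomes are listed explicitly. Below is a minimal Python sketch, not part of the original module, for the two-bag example with S = {1, 2}, E1 = {1} and E2 = {2}; the non-negativity condition concerns the probabilities assigned to the events and holds for any valid assignment.

```python
# A minimal sketch (not from the module): checking the partition conditions
# for the two-bag example, with S = {1, 2}, E1 = {1} and E2 = {2}.
S = {1, 2}
events = [{1}, {2}]

# i) Exhaustive: the union of all the events equals the sample space S.
exhaustive = set().union(*events) == S

# ii) Mutually exclusive: every pair of distinct events has an empty intersection.
mutually_exclusive = all(
    events[i].isdisjoint(events[j])
    for i in range(len(events))
    for j in range(i + 1, len(events))
)

# iii) Non-negativity holds automatically for any valid probability assignment.
print(exhaustive, mutually_exclusive)  # True True
```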
Venn diagram Representation of the Partition of Sample Space:
Figure 1
From Figure 1, we can see that the event A is represented through the mutually exclusive and exhaustive events Ei, i = 2, 3, 4. These events are part of the partition events of the sample space S. One can also observe that the events Ei, i = 1, 2, …, 5 jointly represent the whole sample space. Hence any event can be represented through the partition events because of their mutually exclusive and exhaustive properties as well as their non-negativity.
Theorem of Total probability:
If E1, E2, …, En are the partition events of the sample space, each with non-zero probability, then the probability of an event A is expressed in terms of the Ei, i = 1, 2, …, n, as:
P(A) = P(E1)P(A|E1) + P(E2)P(A|E2) + ⋯ + P(En)P(A|En).
This is known as theorem of total probability.
Proof
As E1, E2, …, En are mutually exclusive and exhaustive events, any event belonging to the sample space can be represented in terms of these events.
Hence A = (E1 ∩ A) ∪ (E2 ∩ A) ∪ … ∪ (En ∩ A). (1)
If the event A has no outcome in common with some Ek, then that intersection is empty, i.e. A ∩ Ek = ∅ for that k. Since the events Ei, i = 1, 2, …, n, form a partition, there must be some events that have a non-empty intersection with A, i.e. A ∩ Ei ≠ ∅ for some i ≠ k.
Now, taking probabilities on both sides of (1), we get
P(A) = P(E1 ∩ A) + P(E2 ∩ A) + ⋯ + P(En ∩ A). (2)
As the Ei are mutually exclusive, the events Ei ∩ A are also mutually exclusive, so the probability of their union is the sum of their probabilities, and the terms corresponding to empty intersections vanish in the above formula.
Now, from the definition of conditional probability, we know that
P(Ei ∩ A) = P(Ei)P(A|Ei), i = 1, 2, …, n.
Putting this conditional form into equation (2), we get:
P(A) = P(E1)P(A|E1) + P(E2)P(A|E2) + ⋯ + P(En)P(A|En).
Hence proved.
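As a quick illustration of the theorem just proved, here is a minimal Python sketch (the helper name total_probability and the variable names are illustrative, not from the module), applied to the two-bag example from the introduction, where a red marble can only come from bag 2:

```python
# A minimal sketch, not from the module: the theorem of total probability as a
# reusable helper. `priors` holds P(E1), ..., P(En) and `likelihoods` holds
# P(A|E1), ..., P(A|En).
def total_probability(priors, likelihoods):
    if len(priors) != len(likelihoods):
        raise ValueError("need one conditional probability per partition event")
    return sum(p * c for p, c in zip(priors, likelihoods))

# Two-bag example from the introduction: each bag is chosen with probability 1/2,
# P(red | bag 1) = 0 and P(red | bag 2) = 2/7, so P(red) = 1/7.
print(total_probability([1/2, 1/2], [0, 2/7]))  # ≈ 0.142857
```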
The total probability theorem plays a vital role in the application of Bayes theorem, so it is essential to understand it before moving on to Bayes theorem. Some questions with answers regarding it are discussed in the remainder of this section.
Question 1
The probabilities that A, B and C become data scientists are 0.4, 0.37 and 0.87 respectively. The probabilities that a bonus scheme will be introduced if A, B and C become data scientists are 0.33, 0.7 and 0.57 respectively. What is the probability that the bonus scheme will be introduced?
Answer
Let E, F and G denote the events that A, B and C become data scientists respectively, and let X denote the event that the bonus scheme is introduced. E, F and G are treated as partition events because they are exhaustive, mutually exclusive and non-negative. In other words, whether A becomes a data scientist has no bearing on whether B or C becomes one; similarly, B becoming a data scientist has no bearing on A or C, and C has no bearing on A or B.
The probabilities are given as:
P(E) = 0.4;
P(F) = 0.37;
P(G) = 0.87.
As the bonus scheme will only be introduced if the person becomes a data scientist, being a data scientist is a condition for the bonus scheme. Hence the probabilities of introducing the bonus scheme given that A, B and C become data scientists are:
P(X|E) = 0.33;
P(X|F) = 0.7;
P(X|G) = 0.57.
The probability of introducing the bonus scheme, by the total probability theorem, is given by:
P(X) = P(E)P(X|E) + P(F)P(X|F) + P(G)P(X|G);
P(X) = 0.4 ∗ 0.33 + 0.37 ∗ 0.7 + 0.87 ∗ 0.57 = 0.8869.
The probability that the bonus scheme will be introduced is 0.8869.
Hence we can see that this probability is very high: if all three, i.e. A, B and C, become data scientists, then the chances of introducing a bonus scheme are very high.
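The same calculation can be reproduced in a few lines of Python (illustrative only; the variable names are not from the module):

```python
# Question 1 re-computed numerically; the figures are exactly those given above.
priors = [0.4, 0.37, 0.87]        # P(E), P(F), P(G)
likelihoods = [0.33, 0.7, 0.57]   # P(X|E), P(X|F), P(X|G)
p_bonus = sum(p * c for p, c in zip(priors, likelihoods))
print(round(p_bonus, 4))  # 0.8869
```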
Question 2
A letter comes either from PRAGUE or from KERALA, each being equally likely. On the envelope, just two consecutive letters of the city name are visible. What is the probability that the two consecutive letters visible are RA?
Answer
Suppose E and F are two events: E denotes the event that the letter comes from PRAGUE and F denotes the event that the letter comes from KERALA. Let X denote the event that the two consecutive letters visible on the envelope are RA.
We find the probability of X by using the theorem of total probability. E and F are exhaustive, mutually exclusive and non-negative, so they are partition events of the sample space.
The probabilities are given as:
P(E) = 1/2; P(F) = 1/2.
The word PRAGUE has 5 pairs of consecutive letters (PR, RA, AG, GU, UE), of which exactly one is RA, so P(X|E) = 1/5. Similarly, KERALA has 5 pairs of consecutive letters (KE, ER, RA, AL, LA), of which exactly one is RA, so P(X|F) = 1/5.
By the theorem of total probability:
P(X) = P(E)P(X|E) + P(F)P(X|F) = 1/2 ∗ 1/5 + 1/2 ∗ 1/5 = 1/5 = 0.2.
Hence the probability that the two consecutive letters RA are visible on the envelope is 0.2.
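The pair-counting argument above can also be checked directly in Python (an illustrative sketch, assuming equal prior probabilities of 1/2 for the two cities):

```python
# Question 2 checked by counting the consecutive letter pairs directly.
def p_pair(word, pair="RA"):
    pairs = [word[i:i + 2] for i in range(len(word) - 1)]
    return pairs.count(pair) / len(pairs)

p_visible_ra = 0.5 * p_pair("PRAGUE") + 0.5 * p_pair("KERALA")
print(p_visible_ra)  # 0.2
```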
Question 3
Consider three computer disk manufacturers: Western Digital, Seagate Technology and G-Technology. Suppose Western Digital produces 60% of the disks, of which 0.25% are defective; Seagate Technology produces 40% of the disks, of which 0.06% are defective; and G-Technology produces 25% of the disks, of which 0.5% are defective. Find the probability that a randomly selected disk is defective.
Answer
Let us consider three events E, F and G: E denotes that the disk is produced by Western Digital, F that it is produced by Seagate Technology and G that it is produced by G-Technology. X is the event that a produced disk is defective. As all the manufacturers produce some defective disks, to find the probability that a disk is defective one has to apply the total probability theorem. E, F and G are partition events because they are exhaustive, mutually exclusive and non-negative. The probabilities are given as:
P(E) = 0.60; P(F) = 0.40; P(G) = 0.25.
P(X|E) = 0.0025; P(X|F) = 0.0006; P(X|G) = 0.005.
By the theorem of total probability:
P(X) = P(E)P(X|E) + P(F)P(X|F) + P(G)P(X|G) = 0.60 ∗ 0.0025 + 0.40 ∗ 0.0006 + 0.25 ∗ 0.005 = 0.00299.
The probability of producing a defective computer disk is 0.00299, i.e. 0.299%.
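A short Python check of this calculation (illustrative only, using the figures exactly as stated in the problem):

```python
# Question 3 with the figures stated in the problem; note that the market shares
# given in the text do not sum to 1, and they are used here exactly as stated.
shares = [0.60, 0.40, 0.25]             # Western Digital, Seagate, G-Technology
defect_rates = [0.0025, 0.0006, 0.005]  # 0.25%, 0.06%, 0.5%
p_defective = sum(s * d for s, d in zip(shares, defect_rates))
print(round(p_defective, 5))  # 0.00299
```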
In this section, we have covered the total probability theorem with examples. In the next section, we will discuss Bayes theorem and the steps that one has to follow to apply it.
4. Bayes Theorem
Bayes theorem is a method used to update initially known information in the light of new information. The probabilities related to the initial information are known as prior probabilities. The probabilities of the initially known information after updating are known as posterior probabilities. Bayes theorem has wide application in every field of study, such as animal science, zoology, environmental science, the biosciences, etc.
Mathematical formulation of Bayes theorem:
Mathematical formulation of Bayes theorem is based on the basic partition of sample space and theorem of total probability. Conditional probability is a very useful concept in Bayes theorem.
If E1, E2, …, En are mutually exclusive, exhaustive and non-negative events, then for an arbitrary event A the reverse probability of the initial event Ei given A is:
P(Ei|A) = P(Ei)P(A|Ei) / [P(E1)P(A|E1) + P(E2)P(A|E2) + ⋯ + P(En)P(A|En)], i = 1, 2, …, n.
Note that:
P(E1), P(E2), …, P(En) are the prior probabilities, i.e. the probabilities that are available at the initial stage, before the new information is taken into account.
P(Ei|A), i = 1, 2, …, n, are the posterior probabilities, i.e. the initial probabilities updated after the new information is added (these are the probabilities we need to determine).
Steps:
The steps for updating the prior probabilities into posterior probabilities using new information are:
Step 1: Identify the basic (prior) information available.
Step 2: Determine the new information.
Step 3: Apply Bayes theorem.
Step 4: Determine the updated basic information (the posterior probabilities).
These four steps are applied to solve any reverse probability problem for any given probabilities.
According to these steps, the basic information is already available in the data as the probabilities of the partition events of the experiment. Second, one identifies the new information in relation to the prior information. These two pieces of information are then used in the Bayes formula to determine the posterior probabilities. The posterior probabilities are the updated version of the basic information under the given circumstances of the new information.
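The four steps translate directly into a small computation. The following Python sketch is illustrative only (the function name bayes_posteriors and the variable names are not from the module); it applies the steps to the two-bag example from the introduction, asking which bag was chosen given that the drawn marble is red:

```python
# A minimal sketch (names are illustrative, not from the module) following the
# four steps: start from the prior probabilities, bring in the new information
# as conditional probabilities, apply Bayes theorem, and read off the updated
# (posterior) probabilities.
def bayes_posteriors(priors, likelihoods):
    # P(A) by the theorem of total probability
    evidence = sum(p * c for p, c in zip(priors, likelihoods))
    # P(Ei|A) = P(Ei) P(A|Ei) / P(A)
    return [p * c / evidence for p, c in zip(priors, likelihoods)]

# Two-bag example: P(red | bag 1) = 0 and P(red | bag 2) = 2/7, so the
# posterior puts all the probability on bag 2.
print(bayes_posteriors([1/2, 1/2], [0, 2/7]))  # [0.0, 1.0]
```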
One can easily understand and solve such problems by keeping these steps in mind. Some examples are discussed below for better understanding.
Question 4
A person speaks the truth 3 out of 6 times. A die is tossed and he reports that the number five appeared. What is the chance that a five actually appeared?
Answer
Let us consider the following events:
E1: The person speaks the truth
E2: The person tells a lie
X: He reports a five
The probabilities of the simple events and the conditional events are given as:
P(E1) = 3/6 = 1/2; P(E2) = 1/2.
P(X|E1) = 1/6 (a five appears and he truthfully reports it); P(X|E2) = 5/6 ∗ 1/5 = 1/6 (a five does not appear and he falsely reports a five).
By Bayes theorem:
P(E1|X) = P(E1)P(X|E1) / [P(E1)P(X|E1) + P(E2)P(X|E2)] = (1/2 ∗ 1/6) / (1/2 ∗ 1/6 + 1/2 ∗ 1/6) = 0.5.
The probability that a five actually appeared on the die is 0.5.
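A numerical check of the reconstruction above (an illustrative Python sketch, not from the module):

```python
# Question 4, using the conditional probabilities reconstructed above:
# P(reports five | truth) = 1/6 and P(reports five | lie) = 5/6 * 1/5 = 1/6.
priors = [1/2, 1/2]        # P(E1) = P(speaks truth), P(E2) = P(lies)
likelihoods = [1/6, 1/6]   # P(X|E1), P(X|E2)
evidence = sum(p * c for p, c in zip(priors, likelihoods))
posterior_truth = priors[0] * likelihoods[0] / evidence
print(posterior_truth)  # 0.5
```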
Question 5
The punctuality of buses has been investigated by considering a number of bus journeys. In the random sample, 40% of the buses have a destination of Chandigarh, 60% have a destination of Patiala and 20% have a destination of Jalandhar. The probabilities of arriving at the fixed time in Chandigarh, Patiala and Jalandhar are 30%, 50% and 70% respectively.
If a bus that arrived at the fixed time is selected at random from those under consideration, what is the probability that its destination is Jalandhar?
Answer
Let us consider the following events:
E1: The bus has destination Chandigarh
E2: The bus has destination Patiala
E3: The bus has destination Jalandhar
X: The bus arrives at the fixed time
Probabilities of events are given as:
Prior probabilities:
P(E1) = 0.4; P(E2) = 0.6; P(E3) = 0.2.
Probabilities as new information:
P(X|E1) = 0.3;
P(X|E2) = 0.5;
P(X|E3) = 0.7.
The probability that a bus arrives at the fixed time (the new-information probability) is given by:
P(X) = P(E1)P(X|E1) + P(E2)P(X|E2) + P(E3)P(X|E3);
P(X) = 0.4 ∗ 0.3 + 0.6 ∗ 0.5 + 0.2 ∗ 0.7 = 0.56.
The probability that the destination of the bus is Jalandhar given that it arrived at the fixed time is, by Bayes theorem:
P(E3|X) = P(E3)P(X|E3) / P(X) = (0.2 ∗ 0.7) / 0.56 = 0.25.
Hence the posterior probability that the destination of the bus is Jalandhar, given that it arrived at the fixed time, is 0.25.
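The calculation can be verified with a short Python sketch (illustrative only):

```python
# Question 5 recomputed numerically with the figures given in the problem.
priors = [0.4, 0.6, 0.2]    # P(E1), P(E2), P(E3): Chandigarh, Patiala, Jalandhar
on_time = [0.3, 0.5, 0.7]   # P(X|E1), P(X|E2), P(X|E3)
p_on_time = sum(p * c for p, c in zip(priors, on_time))   # P(X) = 0.56
posterior_jalandhar = priors[2] * on_time[2] / p_on_time
print(round(posterior_jalandhar, 2))  # 0.25
```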
Question 6
A student answering a multiple-choice question in a competitive exam either knows the answer or guesses it. Let ½ be the probability that the student knows the answer and ½ be the probability that the student guesses. A guessed answer is correct with probability ¼, since there are 4 choices. What is the probability that the student knows the answer to a question given that he answered it correctly?
Answer
Let us consider the following events:
E1: The student knows the answer
E2: The student guesses the answer
X: The student gives the correct answer
The probabilities of the simple events and the conditional events are given as:
Prior probabilities:
P(E1) = 1/2; P(E2) = 1/2.
New information: P(X|E1) = 1 (if the student knows the answer, it is certainly correct); P(X|E2) = 1/4.
The probability that the student gives the correct answer, by the theorem of total probability, is:
P(X) = P(E1)P(X|E1) + P(E2)P(X|E2) = 1/2 ∗ 1 + 1/2 ∗ 1/4 = 5/8.
The probability that the student knows the answer given that he gave the correct answer, by Bayes theorem, is:
P(E1|X) = P(E1)P(X|E1) / P(X) = (1/2 ∗ 1) / (5/8) = 4/5 = 0.8.
The probability that the student knows the answer given that he gave the correct answer is 0.8; this is the posterior probability obtained using Bayes theorem.
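Again, a short illustrative Python check of the arithmetic:

```python
# Question 6 recomputed numerically.
p_knows, p_guesses = 1/2, 1/2
p_correct_given_knows, p_correct_given_guesses = 1.0, 1/4
p_correct = p_knows * p_correct_given_knows + p_guesses * p_correct_given_guesses  # 5/8
posterior_knows = p_knows * p_correct_given_knows / p_correct
print(posterior_knows)  # 0.8
```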
Question 7
A manufacturing company produces a certain type of product on four machines. The respective daily production figures are:
From past experience, machine A produces 2% defective products, machine B 5%, machine C 1.7% and machine D 3.5%. A product is selected at random from the day's production and is found to be defective.
What is the probability that the defective product comes from machine A, machine B, machine C or machine D?
Answer
Let us define the following events:
E1: Product produced by machine A
E2: Product produced by machine B
E3: Product produced by machine C
E4: Product produced by machine D
X: The product is defective
The probabilities of simple events and conditional events are given as:
Prior probabilities.
The probability of the defective product comes from machine A given that product is defective:
The probability of the defective product comes from machine B given that product is defective:
Tabular form of all answer in the form of prior and posterior probabilities:
In this section, we discussed how to decide which method or technique should be used to find the solution. We also discussed how, when some prior information is available, it can be used to find the posterior probabilities.
5. Summary
In this module, we discussed two important theorems of probability theory, namely the total probability theorem and Bayes theorem, which can be used to solve probability problems. The definitions as well as the key concepts, such as the partition of a sample space and conditional probability, were also discussed.
6. Suggested Readings
Agresti, A. and B. Finlay, Statistical Methods for the Social Sciences, 3rd Edition, Prentice Hall, 1997.
Daniel, W. W. and C. L. Cross, Biostatistics: A Foundation for Analysis in the Health Sciences, 10th Edition, John Wiley & Sons, 2013.
Hogg, R. V., J. Mckean and A. Craig, Introduction to Mathematical Statistics, Macmillan Pub. Co. Inc., 1978.
Meyer, P. L., Introductory Probability and Statistical Applications, Oxford & IBH Pub, 1975.
Stephens, L. J., Schaum’s Series Outline: Beginning Statistics, 2nd Edition, McGraw Hill, 2006.
Triola, M. F., Elementary Statistics, 13th Edition, Pearson, 2017.
Weiss, N. A., Introductory Statistics, 10th Edition, Pearson, 2017.
One can refer to the following links for further understanding of the statistics terms.
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ClinStat/glossary.pdf
http://www.stats.gla.ac.uk/steps/glossary/alphabet.html
http://www.reading.ac.uk/ssc/resources/Docs/Statistical_Glossary.pdf
https://stats.oecd.org/glossary/
http://www.statsoft.com/Textbook/Statistics-Glossary
https://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm
https://stats.oecd.org/glossary/alpha.asp?Let=A