Total Probability and Bayes’ Theorem
🎓 Class 12
Mathematics
CBSE
Theory
Ch 13 — Probability
13.5 Bayes' Theorem and Theorem of Total Probability
13.5.1 Partition of a sample space
Partition
A collection of events \(E_1, E_2, \ldots, E_n\) is a partition of the sample space \(S\) if:
- \(E_i\cap E_j=\varnothing\) for all \(i\ne j\) (pairwise mutually exclusive),
- \(E_1\cup E_2\cup\cdots\cup E_n=S\) (exhaustive),
- \(P(E_i)>0\) for every \(i\).
Examples: in a coin toss, \(\{H, T\}\) partitions \(S\); in a 3-line factory, \(\{A, B, C\}\) (the three lines) partition the sample space of items produced. The chosen partition usually represents the "causes" or "categories" we want to reason over.
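The three conditions can be checked mechanically on a finite sample space. The following is a minimal sketch, not part of the textbook material: the function name `is_partition` and the fair-die sample space are assumptions chosen for illustration.

```python
from fractions import Fraction

def is_partition(events, prob):
    """Check the three partition conditions on a finite sample space.

    `prob` maps each outcome to its probability (hypothetical input format);
    `events` is a list of sets of outcomes.
    """
    sample_space = set(prob)
    # Condition 1: pairwise mutually exclusive.
    disjoint = all(
        events[i].isdisjoint(events[j])
        for i in range(len(events))
        for j in range(i + 1, len(events))
    )
    # Condition 2: exhaustive, i.e. the union is the whole sample space.
    exhaustive = set().union(*events) == sample_space
    # Condition 3: every event has positive probability.
    positive = all(sum(prob.get(o, 0) for o in e) > 0 for e in events)
    return disjoint and exhaustive and positive

# A fair die, used only as an illustration.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(is_partition([{1, 3, 5}, {2, 4, 6}], die))     # True: odd / even
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], die))  # False: overlap at 3
print(is_partition([{1, 2}, {3, 4}], die))           # False: misses 5 and 6
```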
13.5.2 Theorem of Total Probability
Total Probability Theorem
Let \(E_1, E_2, \ldots, E_n\) form a partition of the sample space \(S\). For any event \(A\subseteq S\):
\[\boxed{\;P(A)=\sum_{i=1}^{n}P(E_i)\,P(A|E_i)=P(E_1)P(A|E_1)+P(E_2)P(A|E_2)+\cdots+P(E_n)P(A|E_n)\;}\]
The unconditional probability of \(A\) is a weighted average of the conditional probabilities \(P(A|E_i)\), with weights \(P(E_i)\) that sum to 1.
Why does this work?
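Decompose \(A\) using the partition: \(A=A\cap S=A\cap(E_1\cup\cdots\cup E_n)=(A\cap E_1)\cup\cdots\cup(A\cap E_n)\). The pieces \(A\cap E_i\) are pairwise disjoint (since the \(E_i\) are), so by additivity \(P(A)=\sum_{i=1}^{n} P(A\cap E_i)=\sum_{i=1}^{n} P(E_i)P(A|E_i)\), where the last step applies the multiplication rule \(P(A\cap E_i)=P(E_i)P(A|E_i)\) to each term (valid because \(P(E_i)>0\)).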
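As a quick numerical illustration of the formula, here is a small sketch using exact fractions. The helper name `total_probability` and the three-event numbers are made up for this example and do not come from the text.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum of P(E_i) * P(A | E_i) over a partition E_1..E_n."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per partition event")
    # Exact check; this relies on Fraction inputs (floats may not sum to exactly 1).
    if sum(priors) != 1:
        raise ValueError("priors of a partition must sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))

# Hypothetical partition with made-up numbers.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

p_a = total_probability(priors, likelihoods)
print(p_a)  # 1/5
```

Because \(P(A)\) is a weighted average, the result (1/5) lies between the smallest likelihood (1/10) and the largest (1/2).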
Bayes' Theorem
Bayes' theorem
Let \(E_1, E_2, \ldots, E_n\) be a partition of \(S\) and \(A\) an event with \(P(A)>0\). Then for any \(i\in\{1,\ldots,n\}\):
\[\boxed{\;P(E_i|A)=\dfrac{P(E_i)\,P(A|E_i)}{\sum_{j=1}^{n}P(E_j)\,P(A|E_j)}\;}\]
The denominator is just \(P(A)\) by total probability.
Names:
- \(P(E_i)\) — prior probability of \(E_i\) (before seeing \(A\)).
- \(P(A|E_i)\) — likelihood of evidence \(A\) under cause \(E_i\).
- \(P(E_i|A)\) — posterior probability of \(E_i\) (after seeing \(A\)).
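The prior, likelihood and posterior roles map directly onto a few lines of code. This is a sketch under stated assumptions: the function name `posteriors` and the numbers are illustrative only, not taken from the text.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Return P(E_i | A) for every event E_i of a partition."""
    # Numerators of Bayes' theorem: prior times likelihood.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(A), by the theorem of total probability.
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("Bayes' theorem requires P(A) > 0")
    return [j / evidence for j in joint]

# Hypothetical priors and likelihoods for a three-event partition.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

post = posteriors(priors, likelihoods)
print([str(q) for q in post])  # ['1/4', '1/3', '5/12']
```

In this made-up case the event with the smallest prior (1/6) ends up with the largest posterior (5/12), because the evidence is much more likely under it. The posteriors always sum to 1.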
Bayesian intuition
Bayes' theorem is the rule for learning from evidence: start with a prior, observe data, update to a posterior, then treat that posterior as the prior for the next observation. This single formula underlies spam filters, medical diagnosis, GPS positioning, and much of probabilistic machine learning.
Worked Examples
Example 8 (Total Probability). A bag B₁ has 6 red and 4 black balls; bag B₂ has 4 red and 6 black. A bag is chosen at random and a ball is drawn. Find P(red).
P(B₁) = P(B₂) = 1/2 (random choice). P(R|B₁) = 6/10 = 3/5. P(R|B₂) = 4/10 = 2/5. By total probability:
\(P(R)=P(B_1)P(R|B_1)+P(B_2)P(R|B_2)=\dfrac{1}{2}\cdot\dfrac{3}{5}+\dfrac{1}{2}\cdot\dfrac{2}{5}=\dfrac{3+2}{10}=\dfrac{1}{2}\).
Example 9 (Bayes). In Example 8, given that a red ball was drawn, find the probability it came from bag B₁.
\(P(B_1|R)=\dfrac{P(B_1)P(R|B_1)}{P(R)}=\dfrac{(1/2)(3/5)}{1/2}=\dfrac{3}{5}\). Knowing the ball was red updates the probability of bag 1 from 1/2 (prior) to 3/5 (posterior). Reasonable: bag 1 is "redder" than bag 2.
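Examples 8 and 9 can also be checked empirically by simulating the two-bag experiment many times. The sketch below is an illustration only: the function name, the label "K" for a black ball, the trial count and the seed are all assumptions.

```python
import random

def simulate_two_bags(trials, seed=0):
    """Estimate P(red) and P(B1 | red) by repeating the experiment."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    bags = {
        "B1": ["R"] * 6 + ["K"] * 4,  # 6 red, 4 black
        "B2": ["R"] * 4 + ["K"] * 6,  # 4 red, 6 black
    }
    red = 0
    red_from_b1 = 0
    for _ in range(trials):
        bag = rng.choice(["B1", "B2"])  # each bag with probability 1/2
        ball = rng.choice(bags[bag])    # each ball in the bag equally likely
        if ball == "R":
            red += 1
            if bag == "B1":
                red_from_b1 += 1
    # Relative frequencies approximate the probabilities.
    return red / trials, red_from_b1 / red

p_red, p_b1_given_red = simulate_two_bags(200_000)
print(f"P(red) is about {p_red:.3f}; P(B1 | red) is about {p_b1_given_red:.3f}")
```

With this many trials the two estimates should land close to the exact answers 1/2 and 3/5.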
Example 10 (Disease test). A test for a disease has sensitivity 99% (P(+|D)=0.99) and specificity 95% (P(−|D')=0.95, equivalently P(+|D')=0.05). Disease prevalence P(D) = 0.001 (1 in 1000). A person tests positive. Find P(D|+).
By Bayes:
\[P(D|+)=\dfrac{P(D)P(+|D)}{P(D)P(+|D)+P(D')P(+|D')}=\dfrac{0.001\cdot 0.99}{0.001\cdot 0.99+0.999\cdot 0.05}=\dfrac{0.00099}{0.00099+0.04995}\approx 0.0194.\]
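Despite the test being 99% sensitive, only about 1.9% of people who test positive actually have the disease. Expecting a figure near 99% because the low prevalence was ignored is the well-known "base-rate fallacy": for a rare disease, false positives among the many healthy people far outnumber true positives among the few who are ill.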
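One way to make the result in Example 10 concrete is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 100,000. The function name and the population size are illustrative choices, not part of the example.

```python
def natural_frequencies(population, prevalence, sensitivity, specificity):
    """Expected counts of true and false positives, and the resulting P(D | +)."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity        # ill and correctly flagged
    false_pos = healthy * (1 - specificity)  # healthy but wrongly flagged
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

# Figures from Example 10, applied to an assumed population of 100,000.
tp, fp, ppv = natural_frequencies(100_000, 0.001, 0.99, 0.95)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, P(D|+): {ppv:.4f}")
```

Of the 100 people expected to have the disease, about 99 test positive. Of the 99,900 healthy people, about 4,995 also test positive. So only about 99 of roughly 5,094 positives are genuine.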
Example 11. A person speaks the truth 4/5 of the time. He throws a die and reports it as a 6. Find the probability that it actually was a 6.
Let \(E_1\) = "die showed 6", \(E_2\) = "did not". \(P(E_1)=1/6\), \(P(E_2)=5/6\). \(A\) = "reported 6". \(P(A|E_1)=4/5\) (he tells the truth). \(P(A|E_2)=1/5\) (he lies; as a simplifying assumption, whenever he lies about a non-6 roll he reports a 6).
\(P(E_1|A)=\dfrac{P(E_1)P(A|E_1)}{P(E_1)P(A|E_1)+P(E_2)P(A|E_2)}=\dfrac{(1/6)(4/5)}{(1/6)(4/5)+(5/6)(1/5)}=\dfrac{4/30}{4/30+5/30}=\dfrac{4}{9}\).
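The answer in Example 11 depends on the assumption about how the person lies. The sketch below recomputes the posterior under the assumption used above, and under an alternative assumption (not from the text) that a liar picks one of the five wrong faces uniformly at random. The function and parameter names are hypothetical.

```python
from fractions import Fraction

def prob_six_given_report(p_truth, p_report_six_when_lying):
    """P(die showed 6 | he reports 6) under a given model of lying."""
    p_six = Fraction(1, 6)
    p_not_six = 1 - p_six
    like_six = p_truth  # he truthfully reports a real 6
    # He must lie AND happen to pick "6" as the false report.
    like_not_six = (1 - p_truth) * p_report_six_when_lying
    numerator = p_six * like_six
    return numerator / (numerator + p_not_six * like_not_six)

truth = Fraction(4, 5)
simple = prob_six_given_report(truth, 1)                # a lie is always "6"
uniform = prob_six_given_report(truth, Fraction(1, 5))  # a lie is a random wrong face
print(simple, uniform)  # 4/9 4/5
```

Under the alternative model a false report of "6" is much rarer, so the same report is stronger evidence that the die really showed 6.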
Example 12. Three urns A, B, C contain different mixes: A has 2 white & 1 black; B has 1 white & 2 black; C has 2 white & 2 black. An urn is chosen at random and a white ball drawn. Find P(it came from C).
Priors: \(P(A)=P(B)=P(C)=1/3\). Likelihoods of white: \(P(W|A)=2/3,\ P(W|B)=1/3,\ P(W|C)=1/2\).
By Bayes: \(P(C|W)=\dfrac{P(C)P(W|C)}{P(A)P(W|A)+P(B)P(W|B)+P(C)P(W|C)}=\dfrac{(1/3)(1/2)}{(1/3)(2/3+1/3+1/2)}=\dfrac{1/6}{(1/3)\cdot(3/2)}=\dfrac{1/6}{1/2}=\dfrac{1}{3}\).
Activity: Spam Filter Reasoning
L4 Analyse
Materials: Pen, paper.
Predict: 30% of emails are spam. P(word "free" | spam) = 0.6; P("free" | not spam) = 0.05. An email contains "free". What is the probability that it is spam?
- Priors: P(S) = 0.3; P(not S) = 0.7.
- Likelihood: P(F|S) = 0.6; P(F|not S) = 0.05.
- Total: P(F) = 0.3·0.6 + 0.7·0.05 = 0.18 + 0.035 = 0.215.
- Bayes: P(S|F) = 0.18/0.215 ≈ 0.837. So a "free"-containing email has ~84% chance of being spam.
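- Now combine with a second word W₂, with P(W₂|S) = 0.4 and P(W₂|not S) = 0.02. Assuming the two words are conditionally independent given the class, P(S|F, W₂) = (0.3·0.6·0.4)/(0.3·0.6·0.4 + 0.7·0.05·0.02) = 0.072/0.0727 ≈ 0.990. Each additional spam-typical word pushes the posterior higher.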
Modern spam filters use thousands of word features. The "naive Bayes classifier" assumes conditional independence of features given the class, a strong assumption that nevertheless works well in practice for text classification. Bayes' theorem is the foundation of many machine-learning methods.
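The word-by-word update in the activity can be sketched as a repeated application of Bayes' theorem, where each posterior becomes the prior for the next word. This is a minimal illustration under the conditional-independence assumption stated above. The function name `update` is hypothetical, and real filters are considerably more elaborate.

```python
def update(prior, like_spam, like_not_spam):
    """One Bayes update of P(spam) after observing a single word."""
    numerator = prior * like_spam
    return numerator / (numerator + (1 - prior) * like_not_spam)

p_spam = 0.3                                  # prior from the activity
after_free = update(p_spam, 0.6, 0.05)        # the email contains "free"
after_both = update(after_free, 0.4, 0.02)    # it also contains the second word

print(f"after 'free': {after_free:.3f}")      # 0.837
print(f"after both words: {after_both:.3f}")  # 0.990
```

Updating one word at a time gives the same result as a single update with the product of the likelihoods, which is exactly what conditional independence means.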
Competency-Based Questions
Scenario: Three machines A, B, C produce 50%, 30%, 20% of a factory's output. Their defect rates are 1%, 2%, 3% respectively.
Q1. Find P(defective).
L3 Apply
Answer: Total probability: 0.5·0.01 + 0.3·0.02 + 0.2·0.03 = 0.005 + 0.006 + 0.006 = 0.017.
Q2. A defective item is found. What is the probability it came from machine C?
L4 Analyse
Answer: Bayes: P(C|D) = 0.006/0.017 ≈ 0.353 (35.3%). Despite producing only 20% of the output, machine C accounts for about 35% of the defects because of its higher defect rate.
Q3. (T/F) "If P(A) = P(A|B), then A and B are independent." Justify.
L5 Evaluate
Answer: True, provided P(B) > 0 so that P(A|B) is defined. P(A|B) = P(A∩B)/P(B). If P(A|B) = P(A), then P(A∩B) = P(A)P(B), which is the definition of independence.
Q4. A biased coin has P(H) = p and is tossed twice. For what value of p are all four outcomes HH, HT, TH, TT equally likely?
L4 Analyse
Answer: All four outcomes equally likely (1/4 each) requires p² = (1−p)² and p(1−p) = 1/4. The first equation gives p = 1/2, which also satisfies the second. (Fair coin.)
Q5. Design: a weather forecaster says "70% chance of rain" for tomorrow. You then observe a low-pressure system (a forecasting cue). P(low-pressure | rain) = 0.8; P(low-pressure | no rain) = 0.2. Find the posterior P(rain | low-pressure).
L6 Create
Solution: Prior: P(R) = 0.7; P(R') = 0.3. Likelihoods: P(LP|R) = 0.8, P(LP|R') = 0.2. P(LP) = 0.7·0.8 + 0.3·0.2 = 0.56 + 0.06 = 0.62. P(R|LP) = 0.56/0.62 ≈ 0.903. The cue raises confidence in rain from 70% to about 90%.
Assertion–Reason Questions
Assertion (A): The sum \(P(E_1)+P(E_2)+\cdots+P(E_n)=1\) for any partition.
Reason (R): A partition is exhaustive and pairwise disjoint, so the events' probabilities sum to P(S) = 1.
Answer: (a). R states the defining properties of a partition, and A follows from them by additivity.
Assertion (A): Bayes' theorem updates a prior P(E) into a posterior P(E|A) using the likelihood P(A|E).
Reason (R): Posterior is proportional to (prior × likelihood), normalised by P(A).
Answer: (a). "posterior ∝ prior × likelihood" is the famous Bayesian summary of A.
Assertion (A): A test with 99% sensitivity and 99% specificity for a 1-in-10000 disease produces 99% reliable positive results.
Reason (R): Sensitivity = P(+|D), specificity = P(−|D'). High values mean reliable.
Answer: (d). A is false: Bayes' theorem gives P(D|+) = (0.0001·0.99)/(0.0001·0.99 + 0.9999·0.01) ≈ 0.0098, so only about 1% of positive results are true positives. R states the definitions correctly, but high sensitivity and specificity alone do not capture the base-rate effect. This is the classic counter-intuitive Bayes example.
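To see how strongly the reliability of a positive result depends on prevalence, the same test can be evaluated at several base rates. In this sketch only the 1-in-10000 prevalence comes from the question. The other prevalences and the function name are assumptions added for comparison.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(D | +) by Bayes' theorem for a two-event partition {D, D'}."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Same 99% / 99% test at different (partly hypothetical) prevalences.
results = {}
for prevalence in (0.0001, 0.001, 0.01, 0.1):
    results[prevalence] = positive_predictive_value(prevalence, 0.99, 0.99)
    print(f"prevalence {prevalence:>6}: P(D|+) = {results[prevalence]:.3f}")
```

The posterior rises from about 1% at a prevalence of 1 in 10000 to about 50% at 1 in 100, so the same test is far more informative when the condition is common.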
Frequently Asked Questions — Total Probability and Bayes' Theorem
What is the law of total probability?
For a partition E₁,…,Eₙ: P(A) = ΣP(Eᵢ)·P(A|Eᵢ).
What is Bayes' theorem?
P(Eᵢ|A) = P(Eᵢ)·P(A|Eᵢ) / Σ P(Eⱼ)·P(A|Eⱼ). Updates prior to posterior using evidence.
What is a partition of a sample space?
A collection of events, each with positive probability, that are pairwise disjoint and exhaustive, so that together they cover S without overlap.
What is the difference between prior and posterior probability?
Prior = before evidence. Posterior = after evidence. Bayes' theorem connects them.
Why is Bayes' theorem important?
It is the rule for learning from evidence — used in spam filters, medical diagnostics, machine learning.
How do you use total probability for unconditional probabilities?
Choose a partition for which conditional probabilities are easy; sum P(Eᵢ)·P(A|Eᵢ).