What is 13.5 Bayes' Theorem and Theorem of Total Probability in NCERT Class 12 Mathematics Chapter 13?

13.5 Bayes' Theorem and Theorem of Total Probability is a section of Chapter 13 (Probability) in Part 2 of NCERT Class 12 Mathematics. This lesson builds intuition through examples, visuals, and graded practice.

How do you compute P(A) using total probability when conditional probabilities are easier to find?

Choose a partition E₁,…,Eₙ for which the conditional probabilities P(A|Eᵢ) are simple to compute. Then P(A) = Σ P(Eᵢ)·P(A|Eᵢ). The partition often corresponds to 'cases' in the problem (e.g. which urn, which production line, which day).

Total Probability and Bayes’ Theorem

Q: What is the law of total probability?

If E₁, E₂, …, Eₙ form a partition of the sample space (mutually exclusive and exhaustive, each with positive probability), then for any event A: P(A) = Σ P(Eᵢ)·P(A|Eᵢ). Compute the unconditional P(A) by averaging conditional probabilities, weighted by the partition's prior probabilities.

Q: What is Bayes' theorem?

For a partition E₁,…,Eₙ and an event A with P(A) > 0: P(Eᵢ|A) = P(Eᵢ)·P(A|Eᵢ) / Σ P(Eⱼ)·P(A|Eⱼ). It updates the prior probability P(Eᵢ) into the posterior P(Eᵢ|A) given the evidence A.

Q: What is a partition of a sample space?

A collection of events E₁, E₂, …, Eₙ that are pairwise disjoint (Eᵢ∩Eⱼ=∅ for i≠j) and exhaustive (E₁∪E₂∪…∪Eₙ=S). Together they cover the sample space without overlap.

Q: What is the difference between prior and posterior probability?

Prior P(Eᵢ): probability of cause Eᵢ before observing evidence. Posterior P(Eᵢ|A): probability of Eᵢ updated after observing evidence A. Bayes' theorem is the rule for converting prior to posterior.

Q: Why is Bayes' theorem important?

It formalises learning from evidence. Modern applications: spam filters (P(spam | words)), medical diagnostics (P(disease | symptoms)), forensic identification, machine learning (Bayesian inference), legal reasoning.

🎓 Class 12 Mathematics CBSE Theory Ch 13 — Probability ⏱ ~15 min

13.5 Bayes' Theorem and Theorem of Total Probability

13.5.1 Partition of a sample space

Partition

A collection of events \(E_1, E_2, \ldots, E_n\) is a partition of the sample space \(S\) if:

\(E_i\cap E_j=\varnothing\) for all \(i\ne j\) (pairwise mutually exclusive),
\(E_1\cup E_2\cup\cdots\cup E_n=S\) (exhaustive),
\(P(E_i)>0\) for every \(i\).

Examples: in a coin toss, \(\{H, T\}\) partitions \(S\); in a 3-line factory, \(\{A, B, C\}\) (the three lines) partition the sample space of items produced. The chosen partition usually represents the "causes" or "categories" we want to reason over.
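The three conditions can be checked mechanically for a finite sample space. The sketch below is only an illustration: the function name `is_partition` and the fair-die sample space are assumptions made for this example, not part of the textbook.

```python
from fractions import Fraction

def is_partition(events, sample_space, prob):
    """Return True if `events` partition `sample_space` under `prob`.

    events: list of sets; sample_space: set; prob: dict outcome -> probability.
    """
    # Pairwise disjoint: no outcome appears in two different events.
    disjoint = all(
        events[i].isdisjoint(events[j])
        for i in range(len(events))
        for j in range(i + 1, len(events))
    )
    # Exhaustive: the union of all events is the whole sample space.
    exhaustive = set().union(*events) == sample_space
    # Each event must have positive probability.
    positive = all(sum(prob[o] for o in e) > 0 for e in events)
    return disjoint and exhaustive and positive

# Hypothetical illustration: one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}
fair = {outcome: Fraction(1, 6) for outcome in S}

print(is_partition([{1, 3, 5}, {2, 4, 6}], S, fair))     # odd / even: True
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], S, fair))  # 3 is in both: False
print(is_partition([{1, 2}, {3, 4}], S, fair))           # 5 and 6 missing: False
```

The second and third calls fail for different reasons (overlap and incomplete coverage), which mirrors the first two conditions in the definition.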

13.5.2 Theorem of Total Probability

Total Probability Theorem

Let \(E_1, E_2, \ldots, E_n\) form a partition of the sample space \(S\). For any event \(A\subseteq S\): \[\boxed{\;P(A)=\sum_{i=1}^{n}P(E_i)\,P(A|E_i)=P(E_1)P(A|E_1)+P(E_2)P(A|E_2)+\cdots+P(E_n)P(A|E_n)\;}\] The unconditional probability of \(A\) is a weighted average of conditional probabilities, with weights given by the partition.

Why does this work?

Decompose \(A\) using the partition: \(A=A\cap S=A\cap(E_1\cup\cdots\cup E_n)=(A\cap E_1)\cup\cdots\cup(A\cap E_n)\). The pieces are pairwise disjoint (since \(E_i\) are), so by additivity \(P(A)=\sum P(A\cap E_i)=\sum P(E_i)P(A|E_i)\) using the multiplication rule.
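A useful special case, written out here as an illustration: take \(n=2\) with the partition \(\{E, E'\}\) (an event and its complement, assuming \(0<P(E)<1\)). The same decomposition then gives

\[P(A)=P(A\cap E)+P(A\cap E')=P(E)\,P(A|E)+P(E')\,P(A|E').\]

This two-term form is the one that appears in "test positive / test negative" and "truth / lie" problems.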

Bayes' Theorem

Bayes' theorem

Let \(E_1, E_2, \ldots, E_n\) be a partition of \(S\) and \(A\) an event with \(P(A)>0\). Then for any \(i\in\{1,\ldots,n\}\): \[\boxed{\;P(E_i|A)=\dfrac{P(E_i)\,P(A|E_i)}{\sum_{j=1}^{n}P(E_j)\,P(A|E_j)}\;}\] The denominator is just \(P(A)\) by total probability.

Names:

\(P(E_i)\) — prior probability of \(E_i\) (before seeing \(A\)).
\(P(A|E_i)\) — likelihood of evidence \(A\) under cause \(E_i\).
\(P(E_i|A)\) — posterior probability of \(E_i\) (after seeing \(A\)).
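The formula follows in three short steps from results already stated. The chain below is a sketch of that derivation, using the definition of conditional probability, then the multiplication rule, then the theorem of total probability:

\[P(E_i|A)=\dfrac{P(E_i\cap A)}{P(A)}=\dfrac{P(E_i)\,P(A|E_i)}{P(A)}=\dfrac{P(E_i)\,P(A|E_i)}{\sum_{j=1}^{n}P(E_j)\,P(A|E_j)}.\]

The condition \(P(A)>0\) is what makes the first fraction well defined.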

Bayesian intuition

Bayes' theorem is the rule for learning from evidence: start with a prior, observe data, and update to a posterior, which then serves as the prior for the next observation. The same formula underlies spam filters, medical diagnosis, GPS positioning, and Bayesian methods in machine learning.

Worked Examples

Example 8 (Total Probability). A bag B₁ has 6 red and 4 black balls; bag B₂ has 4 red and 6 black. A bag is chosen at random and a ball is drawn. Find P(red).

P(B₁) = P(B₂) = 1/2 (random choice). P(R|B₁) = 6/10 = 3/5. P(R|B₂) = 4/10 = 2/5. By total probability:
\(P(R)=P(B_1)P(R|B_1)+P(B_2)P(R|B_2)=\dfrac{1}{2}\cdot\dfrac{3}{5}+\dfrac{1}{2}\cdot\dfrac{2}{5}=\dfrac{3+2}{10}=\dfrac{1}{2}\).

Example 9 (Bayes). In Example 8, given that a red ball was drawn, find the probability it came from bag B₁.

\(P(B_1|R)=\dfrac{P(B_1)P(R|B_1)}{P(R)}=\dfrac{(1/2)(3/5)}{1/2}=\dfrac{3}{5}\). Knowing the ball was red updates the probability of bag 1 from 1/2 (prior) to 3/5 (posterior). Reasonable: bag 1 is "redder" than bag 2.
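Examples 8 and 9 can be reproduced with exact fractions. This is a minimal sketch; the helper names `total_probability` and `posteriors` are chosen for this illustration and are not from the textbook.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum of P(E_i) * P(A|E_i) over a partition E_1..E_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def posteriors(priors, likelihoods):
    """Return [P(E_1|A), ..., P(E_n|A)] by Bayes' theorem."""
    p_a = total_probability(priors, likelihoods)
    if p_a == 0:
        raise ValueError("P(A) = 0, so the posterior is undefined")
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

# Bags B1 and B2 from Examples 8 and 9.
priors = [Fraction(1, 2), Fraction(1, 2)]           # P(B1), P(B2)
red_given_bag = [Fraction(6, 10), Fraction(4, 10)]  # P(R|B1), P(R|B2)

p_red = total_probability(priors, red_given_bag)
post = posteriors(priors, red_given_bag)
print(p_red)  # 1/2
print(post)   # [Fraction(3, 5), Fraction(2, 5)]
```

The two posteriors sum to 1, as they must, since B₁ and B₂ still partition the sample space after the red ball is observed.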

Example 10 (Disease test). A test for a disease has sensitivity 99% (P(+|D)=0.99) and specificity 95% (P(−|D')=0.95, equivalently P(+|D')=0.05). Disease prevalence P(D) = 0.001 (1 in 1000). A person tests positive. Find P(D|+).

By Bayes: \[P(D|+)=\dfrac{P(D)P(+|D)}{P(D)P(+|D)+P(D')P(+|D')}=\dfrac{0.001\cdot 0.99}{0.001\cdot 0.99+0.999\cdot 0.05}=\dfrac{0.00099}{0.00099+0.04995}\approx 0.0194.\] Despite the test's 99% sensitivity, only about 1.9% of those who test positive actually have the disease. Expecting a value near 99% is the well-known "base-rate fallacy": when a disease is rare, false positives among the many healthy people far outnumber true positives among the few who are ill.
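Counting people instead of multiplying probabilities often makes this result easier to accept. The sketch below assumes a hypothetical population of 100,000 (a size chosen only so that every count is a whole number) and applies the rates from Example 10.

```python
# Natural-frequency view of Example 10, using a hypothetical population
# of 100,000 people (the size is an assumption chosen to give whole numbers).
population = 100_000
diseased = population // 1000          # prevalence 0.001 -> 100 people
healthy = population - diseased        # 99,900 people

true_positives = diseased * 99 // 100  # sensitivity 99% -> 99 people
false_positives = healthy * 5 // 100   # P(+|D') = 5% -> 4,995 people

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(true_positives, false_positives)      # 99 4995
print(round(p_disease_given_positive, 4))   # 0.0194
```

Of the 5,094 people who test positive in this hypothetical population, only 99 are ill, which is the same 1.9% obtained from the formula.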

Example 11. A person speaks the truth 4/5 of the time. He throws a die and reports it as a 6. Find the probability that it actually was a 6.

Let \(E_1\) = "die showed 6", \(E_2\) = "die did not show 6". \(P(E_1)=1/6\), \(P(E_2)=5/6\). \(A\) = "he reports a 6". \(P(A|E_1)=4/5\) (he tells the truth). \(P(A|E_2)=1/5\) (he lies; as a simplifying assumption, every lie is taken to be a report of "6").
\(P(E_1|A)=\dfrac{P(E_1)P(A|E_1)}{P(E_1)P(A|E_1)+P(E_2)P(A|E_2)}=\dfrac{(1/6)(4/5)}{(1/6)(4/5)+(5/6)(1/5)}=\dfrac{4/30}{4/30+5/30}=\dfrac{4}{9}\).

Example 12. Three urns A, B, C contain different mixes: A has 2 white & 1 black; B has 1 white & 2 black; C has 2 white & 2 black. An urn is chosen at random and a white ball drawn. Find P(it came from C).

Priors: \(P(A)=P(B)=P(C)=1/3\). Likelihoods of white: \(P(W|A)=2/3,\ P(W|B)=1/3,\ P(W|C)=1/2\).
By Bayes: \(P(C|W)=\dfrac{P(C)P(W|C)}{P(A)P(W|A)+P(B)P(W|B)+P(C)P(W|C)}=\dfrac{(1/3)(1/2)}{(1/3)(2/3+1/3+1/2)}=\dfrac{1/6}{(1/3)\cdot(3/2)}=\dfrac{1/6}{1/2}=\dfrac{1}{3}\).
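The same calculation gives the posterior for every urn at once, which is a handy check that the three values sum to 1. A minimal sketch with exact fractions (the dictionary layout is an assumption made for this illustration):

```python
from fractions import Fraction

# Urn contents from Example 12 as (white, black) counts.
urns = {"A": (2, 1), "B": (1, 2), "C": (2, 2)}
prior = Fraction(1, len(urns))  # each urn equally likely

# Joint probabilities P(urn and white) = P(urn) * P(W|urn).
joint = {name: prior * Fraction(w, w + b) for name, (w, b) in urns.items()}
p_white = sum(joint.values())   # total probability of white

posterior = {name: j / p_white for name, j in joint.items()}
for name, p in posterior.items():
    print(name, p)
```

This prints 4/9 for A, 2/9 for B and 1/3 for C. Urn C's probability is unchanged from its prior because its share of white balls (1/2) equals the overall P(W) = 1/2.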

Activity: Spam Filter Reasoning

L4 Analyse

Materials: Pen, paper.

Predict: 30% of emails are spam. P(word "free" | spam) = 0.6; P("free" | not spam) = 0.05. An email contains "free" — probability it is spam?

Priors: P(S) = 0.3; P(not S) = 0.7.
Likelihood: P(F|S) = 0.6; P(F|not S) = 0.05.
Total: P(F) = 0.3·0.6 + 0.7·0.05 = 0.18 + 0.035 = 0.215.
Bayes: P(S|F) = 0.18/0.215 ≈ 0.837. So a "free"-containing email has ~84% chance of being spam.
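Now combine with another word W₂, where P(W₂|S) = 0.4 and P(W₂|not S) = 0.02. Assuming the two words are conditionally independent given the class, the likelihoods multiply: P(S|F, W₂) = 0.3·0.6·0.4 / (0.3·0.6·0.4 + 0.7·0.05·0.02) = 0.072/0.0727 ≈ 0.990. Each additional piece of evidence compounds the update.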

Modern spam filters use thousands of word features. The "naive Bayes classifier" assumes conditional independence of features given the class, a strong assumption that nevertheless often works well in practice. Bayes' theorem is the foundation of many machine-learning methods.
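The activity's numbers can be processed one word at a time, with each posterior becoming the prior for the next word. The sketch below assumes the words are conditionally independent given the class, exactly as the activity does; the function name `update` is chosen for this illustration.

```python
def update(prior_spam, p_word_given_spam, p_word_given_ham):
    """One Bayes update of P(spam) after seeing a single word."""
    numerator = prior_spam * p_word_given_spam
    evidence = numerator + (1 - prior_spam) * p_word_given_ham
    return numerator / evidence

p = 0.3                    # prior P(S)
p = update(p, 0.6, 0.05)   # after seeing "free"
after_free = p
p = update(p, 0.4, 0.02)   # after also seeing the second word W2
after_both = p

print(round(after_free, 3), round(after_both, 3))
```

The first value is about 0.837 and the second about 0.990. Under the independence assumption, updating word by word gives the same answer as multiplying all the likelihoods into a single Bayes formula.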

Competency-Based Questions

Scenario: Three machines A, B, C produce 50%, 30%, 20% of a factory's output. Their defect rates are 1%, 2%, 3% respectively.

Q1. Find P(defective).

L3 Apply

Answer: Total probability: 0.5·0.01 + 0.3·0.02 + 0.2·0.03 = 0.005 + 0.006 + 0.006 = 0.017.

Q2. A defective item is found. What is the probability it came from machine C?

L4 Analyse

Answer: Bayes: P(C|D) = 0.006/0.017 ≈ 0.353. (35.3%.) Despite producing only 20% of output, machine C accounts for 35% of defects because of its higher defect rate.
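Q1 and Q2 can be checked together, and extending Q2 to all three machines shows how the defects are shared out. A minimal sketch with exact fractions (the variable names are assumptions made for this illustration):

```python
from fractions import Fraction

# Output shares and defect rates from the scenario, as exact fractions.
share = {"A": Fraction(50, 100), "B": Fraction(30, 100), "C": Fraction(20, 100)}
defect_rate = {"A": Fraction(1, 100), "B": Fraction(2, 100), "C": Fraction(3, 100)}

# Q1: total probability of a defective item.
p_defective = sum(share[m] * defect_rate[m] for m in share)

# Q2, for every machine: P(machine | defective).
blame = {m: share[m] * defect_rate[m] / p_defective for m in share}

print(p_defective)
for m, p in blame.items():
    print(m, p, round(float(p), 3))
```

The result is P(defective) = 17/1000, with posteriors 5/17, 6/17 and 6/17 for A, B and C. Machines B and C are equally likely sources of a defect even though B produces more items, because C's higher defect rate exactly offsets its smaller share.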

Q3. (T/F) "If P(A) = P(A|B), then A and B are independent." Justify.

L5 Evaluate

Answer: True, provided P(B) > 0 so that P(A|B) is defined. P(A|B) = P(A∩B)/P(B), so P(A|B) = P(A) gives P(A∩B) = P(A)P(B), which is the definition of independence.

Q4. A coin is biased with P(H) = p and is tossed twice independently. For what value of p are all four outcomes (HH, HT, TH, TT) equally likely?

L4 Analyse

Answer: All four outcomes equally likely (1/4 each) requires p² = (1−p)² and p(1−p) = 1/4. The first equation gives p = 1−p, so p = 1/2, which also satisfies the second: (1/2)(1/2) = 1/4. Only a fair coin works.

Q5. Design: A weather forecaster says "70% chance of rain" for tomorrow. You then observe a low-pressure system (a forecasting cue). P(low-pressure | rain) = 0.8; P(low-pressure | no rain) = 0.2. Compute the posterior P(rain | low-pressure).

L6 Create

Solution: Prior: P(R) = 0.7; P(R') = 0.3. Likelihoods: P(LP|R) = 0.8, P(LP|R') = 0.2. P(LP) = 0.7·0.8 + 0.3·0.2 = 0.56 + 0.06 = 0.62. P(R|LP) = 0.56/0.62 ≈ 0.903. The cue raises confidence in rain from 70% to about 90%.
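As a cross-check, the same answer can be reached through odds. This form is not used in the lesson itself; it is offered only as an alternative way to organise the arithmetic, obtained by dividing Bayes' formula for \(R\) by the one for \(R'\):

\[\dfrac{P(R|LP)}{P(R'|LP)}=\dfrac{P(R)}{P(R')}\cdot\dfrac{P(LP|R)}{P(LP|R')}=\dfrac{0.7}{0.3}\cdot\dfrac{0.8}{0.2}=\dfrac{7}{3}\cdot 4=\dfrac{28}{3}.\]

Odds of 28 to 3 correspond to \(P(R|LP)=\dfrac{28}{28+3}=\dfrac{28}{31}\approx 0.903\), in agreement with the direct calculation.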

Assertion–Reason Questions

Assertion (A): The sum \(P(E_1)+P(E_2)+\cdots+P(E_n)=1\) for any partition.
Reason (R): A partition is exhaustive and pairwise disjoint, so the events' probabilities sum to P(S) = 1.

(a) Both true, R explains A.

(b) Both true, R doesn't explain A.

(c) A true, R false.

(d) A false, R true.

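Answer: (a). Because the events are pairwise disjoint, additivity gives P(E₁)+…+P(Eₙ) = P(E₁∪…∪Eₙ), and because they are exhaustive this union is S, with P(S) = 1.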

Assertion (A): Bayes' theorem updates a prior P(E) into a posterior P(E|A) using the likelihood P(A|E).
Reason (R): Posterior is proportional to (prior × likelihood), normalised by P(A).

(a) Both true, R explains A.

(b) Both true, R doesn't explain A.

(c) A true, R false.

(d) A false, R true.

Answer: (a). "posterior ∝ prior × likelihood" is the famous Bayesian summary of A.

Assertion (A): A test with 99% sensitivity and 99% specificity for a 1-in-10000 disease produces 99% reliable positive results.
Reason (R): Sensitivity = P(+|D), specificity = P(−|D'). High values mean reliable.

(a) Both true, R explains A.

(b) Both true, R doesn't explain A.

(c) A true, R false.

(d) A false, R true.

Answer: (d). A is false: Bayes' theorem gives P(D|+) = (0.0001·0.99)/(0.0001·0.99 + 0.9999·0.01) ≈ 0.0098, so only about 1% of positive results are true positives. R states the definitions correctly but ignores the base-rate effect. This is the classic counter-intuitive Bayes example.
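To see how strongly the answer depends on prevalence, the calculation can be wrapped in a small function. This is a sketch only: the function name is an assumption, and the prevalences 0.01 and 0.1 are hypothetical comparison values that do not appear in the question.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(D|+) from Bayes' theorem with the partition {D, D'}."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Test from the assertion: 99% sensitivity, 99% specificity.
# Prevalence 0.0001 is from the question; 0.01 and 0.1 are hypothetical.
results = {
    prevalence: positive_predictive_value(0.99, 0.99, prevalence)
    for prevalence in (0.0001, 0.01, 0.1)
}
for prevalence, ppv in results.items():
    print(prevalence, round(ppv, 4))
```

With the same test, P(D|+) is about 0.0098 at a prevalence of 1 in 10,000, about 0.5 at 1 in 100, and about 0.917 at 1 in 10. The test's reliability for an individual cannot be read off from sensitivity and specificity alone.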

Frequently Asked Questions — Total Probability and Bayes' Theorem

What is the law of total probability?

For a partition E₁,…,Eₙ: P(A) = ΣP(Eᵢ)·P(A|Eᵢ).

What is Bayes' theorem?

P(Eᵢ|A) = P(Eᵢ)·P(A|Eᵢ) / Σ P(Eⱼ)·P(A|Eⱼ). Updates prior to posterior using evidence.

What is a partition of a sample space?

A collection of events that are pairwise disjoint and exhaustive — covering S without overlap.

What is the difference between prior and posterior probability?

Prior = before evidence. Posterior = after evidence. Bayes' theorem connects them.

Why is Bayes' theorem important?

It is the rule for learning from evidence — used in spam filters, medical diagnostics, machine learning.

How do you use total probability for unconditional probabilities?

Choose a partition for which conditional probabilities are easy; sum P(Eᵢ)·P(A|Eᵢ).
