Class 11 · Statistics for Economics · Chapter 6 · Part 2
Step-Deviation Method, Spearman's Rank Correlation and NCERT Exercises
Part 1 introduced the idea of correlation, the scatter diagram and the Karl Pearson coefficient. Part 2 picks up where the textbook does: a quicker arithmetic shortcut for r when the numbers are large (the step-deviation method); the rank-based alternative invented by the British psychologist C. E. Spearman for variables that cannot be measured precisely (beauty, honesty, fairness); the special correction needed when ranks are tied; a clear comparison of when to use which; and finally the complete NCERT Chapter 6 exercise set with full model answers, a chapter summary and a key-terms glossary.
6.11 Step-Deviation Method for Karl Pearson's r
When the values of X and Y are large, computing ΣX² and ΣXY by hand is tedious. Property 4 of r, its invariance under change of origin and scale, rescues us. Define two new variables U and V by subtracting assumed means and dividing by common factors:
$$U = \frac{X - A}{h}, \qquad V = \frac{Y - C}{k}$$
where A and C are assumed means; h and k are common factors with the same sign. The remarkable fact is that
$$r_{UV} = r_{XY}$$
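So r between the transformed variables is identical to r between the original variables. Calculate r on the small numbers U and V and you are done. The working formula is simply formula (3) from Part 1 with U and V in place of X and Y:
$$r = \frac{\sum UV - \dfrac{(\sum U)(\sum V)}{N}}{\sqrt{\sum U^2 - \dfrac{(\sum U)^2}{N}}\;\sqrt{\sum V^2 - \dfrac{(\sum V)^2}{N}}}$$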
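The invariance can be checked numerically. The sketch below is illustrative only: it borrows the twelve (X, Y) pairs from the tied-ranks example later in this part, and the assumed means A = 800, C = 70 and common factors h = 10, k = 5 are arbitrary choices made here, not values from the textbook. `pearson_r` is a hypothetical helper that applies the raw-sums form of the formula used in the exercises.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums: N*SXY - SX*SY over the root of the two spreads."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Data taken from the tied-ranks worked example in this part.
X = [1200, 1150, 1000, 990, 800, 780, 760, 750, 730, 700, 620, 600]
Y = [75, 65, 50, 100, 90, 85, 90, 40, 50, 60, 50, 75]

# Assumed means and common factors: arbitrary illustrative choices.
A, h = 800, 10
C, k = 70, 5
U = [(x - A) / h for x in X]
V = [(y - C) / k for y in Y]

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
print(round(r_xy, 6), round(r_uv, 6))  # the two values agree
```

Any other assumed means, or any other common factors of the same sign, should leave the printed value unchanged; only the size of the intermediate sums differs.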
6.12 Worked Example 2 — Price Index and Money Supply
📝 Worked Example 2 (NCERT)
Compute the correlation between the price index (X) and money supply in Rs crores (Y) using the step-deviation method, with assumed means A = 100 (for X), C = 1700 (for Y), and common factors h = 10, k = 100.
Table 6.3 — Calculation of r between Price Index and Money Supply (Step-Deviation Method)
The correlation between the price index and the money supply is +0.98 — a very strong positive linear relation. NCERT notes that this is an important premise of monetary policy: when the money supply grows, the price index also rises. Notice how the step-deviation method kept all the arithmetic to two-digit numbers.
EXPLORE — NCERT Activity (Step-Deviation Method)
Bloom: L3 Apply
Using data on India's national income at current prices and the Gross Domestic Saving as a percentage of GDP for the years 1992–93 to 2001–02, calculate r using the step-deviation method. Choose any convenient assumed means.
✅ Sample
Pick A = 14 (X is the growth in national income, ranging 8–18) and C = 25 (Y is GDS share, ranging 23–27); use h = 1 and k = 1 since the numbers are already small. Build U = X − 14 and V = Y − 25, fill in the table, and apply the formula. The exact answer depends on the precise figures used (the NCERT data has 10 years), but the calculation procedure is identical to Example 2.
6.13 Spearman's Rank Correlation
Karl Pearson's coefficient assumes the data are precisely measured. But in real economics — and in life — many "variables" cannot be measured numerically:
The fairness, honesty or beauty of an item or person.
Heights and weights of villagers when neither rods nor weighing machines are available, but a relative ordering is feasible.
Cases where one of the variables is qualitative (judgement of taste) but a clear ranking exists.
Situations with a clear monotone but non-linear relationship (the curves in Fig. 6.6 / 6.7).
Datasets disturbed by extreme outliers, since ranks are unaffected by extreme values.
For all these, the British psychologist C. E. Spearman invented the rank correlation coefficient, denoted rs. Items are ranked from 1 to n on each variable; the rank correlation rs measures the linear association between the two sets of ranks.
$$r_s = 1 - \frac{6\sum D^2}{n^3 - n} \quad\text{or equivalently}\quad r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
where n is the number of pairs and D is the difference between the ranks assigned to the same item on the two variables. All the properties listed for Pearson's r in Part 1 carry over to rs; in particular, rs always lies between −1 and +1. However, since rank correlation does not use all the information in the original measurements, NCERT notes that rs ≤ r in general — rank correlation is usually less precise than Pearson's r when both can be computed.
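As a quick check on the formula and its bounds, here is a minimal sketch. `spearman_rs` is a hypothetical helper name, and it assumes the ranks contain no ties. Identical rankings give ΣD² = 0 and hence +1, while exactly reversed rankings give −1.

```python
def spearman_rs(ranks_x, ranks_y):
    """Simple Spearman formula 1 - 6*sum(D^2)/(n^3 - n); assumes no tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

same = spearman_rs([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
opposite = spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(same, opposite)  # 1.0 -1.0
```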
6.13.1 Three Cases of Rank Correlation
NCERT presents the calculation under three situations. We work through each.
Case 1 — Ranks Are Already Given
📝 Worked Example 3 (NCERT) — Beauty Contest with Three Judges
Five competitors are ranked by three judges A, B and C. Find which pair of judges has the nearest approach to a common perception of beauty.
Original Rankings of Five Competitors by Three Judges
| Judge | Competitor 1 | Competitor 2 | Competitor 3 | Competitor 4 | Competitor 5 |
|---|---|---|---|---|---|
| A | 1 | 2 | 3 | 4 | 5 |
| B | 2 | 4 | 1 | 5 | 3 |
| C | 1 | 3 | 5 | 2 | 4 |
Three pairs of judges (A–B, A–C, B–C) mean three rank correlations to compute. With n = 5, n³ − n = 120 in every case.
Pair A–B: D = −1, −2, 2, −1, 2, so ΣD² = 1 + 4 + 4 + 1 + 4 = 14, giving
$$r_s(AB) = 1 - \frac{6 \times 14}{120} = 1 - 0.7 = 0.3$$
Pair A–C: D = 0, −1, −2, 2, 1, so ΣD² = 0 + 1 + 4 + 4 + 1 = 10, giving
$$r_s(AC) = 1 - \frac{6 \times 10}{120} = 1 - 0.5 = 0.5$$
Pair B–C: By a similar table (B: 2, 4, 1, 5, 3 vs C: 1, 3, 5, 2, 4), D = 1, 1, −4, 3, −1, so ΣD² = 1 + 1 + 16 + 9 + 1 = 28, giving
$$r_s(BC) = 1 - \frac{6 \times 28}{120} = 1 - 1.4 = -0.4$$
The conclusions are: judges A and C have the nearest approach to a common perception (rs = 0.5), judges A and B agree only weakly (rs = 0.3), and judges B and C have markedly different tastes (rs = −0.4).
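The three coefficients can be recomputed directly from the rankings table. This is a sketch under the no-ties assumption; `spearman_rs` and the `judges` dictionary are names introduced here for illustration only.

```python
from itertools import combinations

def spearman_rs(ranks_x, ranks_y):
    """Simple Spearman formula; valid here because no judge awards tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks awarded to competitors 1..5, copied from the table above.
judges = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 1, 5, 3],
    "C": [1, 3, 5, 2, 4],
}

results = {}
for p, q in combinations(judges, 2):
    results[p + q] = round(spearman_rs(judges[p], judges[q]), 2)

print(results)  # {'AB': 0.3, 'AC': 0.5, 'BC': -0.4}
```

On these rankings the pair with the highest coefficient is A and C, and the pair B and C is the only one with a negative value.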
Case 2 — Ranks Have to Be Worked Out
📝 Worked Example 4 (NCERT) — Marks in Statistics and Economics
Five students (A–E) score the following marks. Compute the rank correlation between Statistics (X) and Economics (Y) marks.
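| Student | Marks in Statistics (X) | Marks in Economics (Y) | Rank RX | Rank RY | D = RX − RY | D² |
|---|---|---|---|---|---|---|
| A | 85 | 60 | 1 | 1 | 0 | 0 |
| B | 60 | 48 | 4 | 5 | −1 | 1 |
| C | 55 | 49 | 5 | 4 | 1 | 1 |
| D | 65 | 50 | 3 | 3 | 0 | 0 |
| E | 75 | 55 | 2 | 2 | 0 | 0 |
| | | | | | ΣD² | 2 |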
Largest mark gets rank 1: in Statistics, 85 → rank 1, 75 → rank 2, 65 → rank 3, 60 → rank 4, 55 → rank 5. The same procedure applies to Y. With n = 5,
$$r_s = 1 - \frac{6 \times 2}{5^3 - 5} = 1 - \frac{12}{120} = 1 - 0.1 = 0.9$$
Statistics and Economics performance are strongly positively rank-correlated — a top student in one tends to be a top student in the other.
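The ranking step and the final formula can be sketched as follows. `ranks_descending` is a hypothetical helper that gives rank 1 to the largest mark and assumes all marks are distinct, which holds for this example.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

stats_marks = [85, 60, 55, 65, 75]   # students A..E
econ_marks = [60, 48, 49, 50, 55]

rx = ranks_descending(stats_marks)
ry = ranks_descending(econ_marks)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(stats_marks)
rs = 1 - 6 * sum_d2 / (n ** 3 - n)

print(rx)                    # [1, 4, 5, 3, 2]
print(ry)                    # [1, 5, 4, 3, 2]
print(sum_d2, round(rs, 2))  # 2 0.9
```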
Case 3 — Ranks Are Repeated (Tied Ranks)
If two or more items share the same value, they should not all get different ranks. Common ranks are assigned: each tied item receives the average of the ranks they would have occupied. For example, if three items would have occupied ranks 9, 10 and 11, each of them is assigned the average rank (9 + 10 + 11)/3 = 10. The next item then takes the rank immediately after the largest of the tied positions — in this example, rank 12.
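The averaging rule can be sketched in a few lines. `average_ranks` is a hypothetical helper and the five scores are made-up illustrative values: the three items tied at 30 would have occupied ranks 2, 3 and 4, so each receives (2 + 3 + 4)/3 = 3, and the next item takes rank 5.

```python
def average_ranks(values):
    """Rank 1 for the largest value; tied values share the mean of the ranks they span."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first rank position occupied by this value
        count = order.count(v)       # how many items share this value
        ranks.append(first + (count - 1) / 2)  # mean of first .. first+count-1
    return ranks

scores = [40, 30, 30, 30, 20]  # hypothetical data with one three-way tie
print(average_ranks(scores))   # [1.0, 3.0, 3.0, 3.0, 5.0]
```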
⚠ Tied-Ranks Correction
When ranks are repeated, the simple formula understates the spread. NCERT corrects it by adding the "tied-ranks correction factor" inside the bracket:
$$r_s = 1 - \frac{6\left[\sum D^2 + \dfrac{m_1^3 - m_1}{12} + \dfrac{m_2^3 - m_2}{12} + \cdots\right]}{n^3 - n}$$
where m1, m2, … are the numbers of items in each set of tied ranks. Each tied set contributes (m³ − m)/12 to the correction.
📝 Worked Example 5 (NCERT) — Tied Ranks
The table below shows 12 paired observations (X, Y) where Y has the value 50 at three different positions. Compute the rank correlation.
Calculation of Rank Correlation with Repeated Ranks (NCERT Example 5)
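| X | Y | Rank RX | Rank RY | D = RX − RY | D² |
|---|---|---|---|---|---|
| 1200 | 75 | 1 | 5.5 | −4.5 | 20.25 |
| 1150 | 65 | 2 | 7 | −5 | 25.00 |
| 1000 | 50 | 3 | 10 | −7 | 49.00 |
| 990 | 100 | 4 | 1 | 3 | 9.00 |
| 800 | 90 | 5 | 2.5 | 2.5 | 6.25 |
| 780 | 85 | 6 | 4 | 2 | 4.00 |
| 760 | 90 | 7 | 2.5 | 4.5 | 20.25 |
| 750 | 40 | 8 | 12 | −4 | 16.00 |
| 730 | 50 | 9 | 10 | −1 | 1.00 |
| 700 | 60 | 10 | 8 | 2 | 4.00 |
| 620 | 50 | 11 | 10 | 1 | 1.00 |
| 600 | 75 | 12 | 5.5 | 6.5 | 42.25 |
| | | | | ΣD² | 198.00 |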
In Y, the value 75 occurs twice (rank positions 5 and 6, both given rank 5.5), the value 90 occurs twice (positions 2 and 3, both rank 2.5), and the value 50 occurs three times (positions 9, 10 and 11, all rank 10). So m1 = 2, m2 = 2, m3 = 3. The correction is:
$$\frac{2^3 - 2}{12} + \frac{2^3 - 2}{12} + \frac{3^3 - 3}{12} = 0.5 + 0.5 + 2 = 3$$
(The NCERT textbook reports the correction as 2.5; discrepancies of this kind arise depending on which sets of repetitions are flagged, and here the result is the same to two decimal places either way.) Plug the total into the corrected formula with n = 12, so that n³ − n = 1716:
$$r_s = 1 - \frac{6\,(198 + 3)}{1716} = 1 - \frac{1206}{1716} \approx 1 - 0.703 = 0.297$$
With the textbook total of 2.5 the result is 1 − 1203/1716 ≈ 0.299.
The rank correlation between X and Y is about +0.30: positive (X and Y move in the same direction) but not very strong.
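The whole tied-ranks calculation can be reproduced from the raw (X, Y) pairs. This is a sketch, not the textbook's procedure verbatim: `average_ranks` is a hypothetical helper, and the correction is computed from every tied group found in the data, that is, the three groups listed above with m = 2, 2 and 3.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the largest value; tied values share the mean of the ranks they span."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]

X = [1200, 1150, 1000, 990, 800, 780, 760, 750, 730, 700, 620, 600]
Y = [75, 65, 50, 100, 90, 85, 90, 40, 50, 60, 50, 75]

rx, ry = average_ranks(X), average_ranks(Y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))

# Sizes of every tied group in either variable (X has none here).
tie_sizes = [m for series in (X, Y) for m in Counter(series).values() if m > 1]
correction = sum((m ** 3 - m) / 12 for m in tie_sizes)

n = len(X)
rs = 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)
print(sum_d2, correction, round(rs, 3))  # 198.0 3.0 0.297
```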
6.14 Pearson vs Spearman — When to Use Which
Use Karl Pearson's r when…
Both variables are measured on a numerical scale.
The relationship between them is approximately linear.
The data are reasonably free of extreme outliers.
You want the most precise measure available.
Use Spearman's rs when…
One or both variables can only be ranked (beauty, honesty, fairness).
The numerical measurements are unreliable or unavailable.
The data contain a few extreme outliers (ranks neutralise them).
The relationship is monotone but possibly non-linear (Fig. 6.6 or 6.7).
NCERT also notes one important caveat: rank correlation rs is generally less accurate than Pearson's r, because converting numbers to ranks throws away information about the distance between successive observations. If both methods are usable, Pearson is normally preferred — unless the data have extreme values that ranks would handle more gracefully.
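Two of the situations listed above can be illustrated with a short sketch. Both data sets are hypothetical, invented here purely for illustration: the first is an exact monotone but non-linear relation (Y = X³), the second is a small income and spending list containing one extreme income. The helper names are also assumptions of this sketch, and the ranking helper assumes no ties.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks_descending(values):
    """Rank 1 for the largest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Rank the raw values, then apply the simple (no-ties) Spearman formula."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Hypothetical data 1: monotone but curved relation.
cube_x = [1, 2, 3, 4, 5, 6]
cube_y = [x ** 3 for x in cube_x]

# Hypothetical data 2: one extreme income among otherwise similar values.
income = [100, 200, 300, 400, 500, 5000]
spending = [90, 150, 260, 310, 420, 450]

r_cube, rs_cube = pearson_r(cube_x, cube_y), spearman_rs(cube_x, cube_y)
r_out, rs_out = pearson_r(income, spending), spearman_rs(income, spending)
print(round(r_cube, 2), rs_cube)
print(round(r_out, 2), rs_out)
```

In both cases the orderings agree perfectly, so rs comes out as 1, while Pearson's r is about 0.94 for the cubic data and about 0.64 for the list with the extreme income.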
DISCUSS — NCERT Activity (Rank vs Pearson)
Bloom: L5 Evaluate
Collect the marks scored by ten of your classmates in the Class IX and Class X examinations. Compute the rank correlation. Repeat for a data set with tied ranks. In what circumstances is rank correlation preferred to simple correlation? When data are precisely measured, would you still use rank correlation? When can you be indifferent?
✅ Sample
Rank correlation is preferred whenever (i) at least one variable is qualitative, (ii) measurements are unreliable, or (iii) the data contain extreme outliers. With precisely measured data and no outliers, Pearson's r is more accurate — one would choose Pearson. Indifference is possible only when r and rs happen to coincide, which occurs when the first differences (gaps between consecutive ranked values) are constant — an unusual special case.
6.15 Conclusion — What Correlation Tells Us, and What It Does Not
The chapter has put three tools in our toolkit. The scatter diagram shows the form of any relationship at a glance and works for both linear and non-linear patterns. Karl Pearson's r turns the visual into a precise number, but only for linear relationships and precisely measured data. Spearman's rs handles ranked data, qualitative variables and outlier-prone data sets at the cost of a small loss in precision. Together they cover most economic situations a Class 11 student will face.
One caution echoes through every section: correlation is not causation. Even a perfect r of +1 or −1 only tells us that two variables co-vary along a straight line; it cannot prove that one drives the other. Knowing the direction and intensity of co-movement is, however, an enormous advantage. It tells us, in advance, what to expect from one variable when the other changes — the practical foundation of economic forecasting and policy analysis.
6.16 NCERT Exercises — Full Model Answers
Q1. The unit of correlation coefficient between height in feet and weight in kgs is: (i) kg/feet (ii) percentage (iii) non-existent
Answer: (iii) non-existent. Property 1 of r states that it has no unit. Although the inputs are measured in feet and kilograms, the units cancel in the formula because the deviations are divided by the same standard deviations in the same units. r is a pure dimensionless number.
Q2. The range of the simple correlation coefficient is: (i) 0 to infinity (ii) minus one to plus one (iii) minus infinity to infinity
Answer: (ii) minus one to plus one. By Property 3, −1 ≤ r ≤ +1 always. r = +1 means a perfect positive linear relation, r = −1 means a perfect negative linear relation, and r = 0 means no linear relation. Any value outside this band signals an arithmetic error.
Q3. If rxy is positive, the relation between X and Y is of the type: (i) When Y increases X increases (ii) When Y decreases X increases (iii) When Y increases X does not change
Answer: (i) When Y increases X increases. A positive correlation means the two variables move in the same direction — both rise together and both fall together. The income–consumption pair and the temperature–ice-cream-sales pair are textbook examples.
Q4. If rxy = 0, the variables X and Y are: (i) linearly related (ii) not linearly related (iii) independent
Answer: (ii) not linearly related. r = 0 only means that no linear association exists. The variables may still be related in a non-linear way (for example, Y = X²), which the Pearson coefficient cannot detect. Independence is a stronger condition than r = 0.
Q5. Of the following three measures, which can measure any type of relationship? (i) Karl Pearson's coefficient of correlation (ii) Spearman's rank correlation (iii) Scatter diagram
Answer: (iii) Scatter diagram. Both Karl Pearson's r and Spearman's rs are designed for linear (or monotone) association. The scatter diagram, by contrast, displays the cloud of points without any assumption about its shape and so reveals linear, non-linear or no relationship equally well.
Q6. If precisely measured data are available, the simple correlation coefficient is: (i) more accurate than rank correlation coefficient (ii) less accurate than rank correlation coefficient (iii) as accurate as the rank correlation coefficient
Answer: (i) more accurate than rank correlation coefficient. Pearson's r uses the actual measurements; Spearman's rs only uses ranks and discards information about the distances between observations. When the data are precisely measured and no extreme outliers are present, Pearson's r is the better tool.
Q7. Why is r preferred to covariance as a measure of association?
Answer: The covariance Σxy/N carries the units of X × Y and is unbounded — rescaling either variable inflates or shrinks its value. The correlation coefficient r divides covariance by the product of the two standard deviations, making the result (a) dimensionless (no units) and (b) bounded between −1 and +1. These two properties make r comparable across different data sets and different units of measurement, which is exactly why it is preferred.
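The unit-dependence of covariance can be seen by re-expressing the same data in different units. The sketch below uses the father and son heights from Q18 of this exercise set and converts inches to centimetres (1 inch = 2.54 cm); the helper names are assumptions of this sketch.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

fathers_in = [65, 66, 57, 67, 68, 69, 70, 72]
sons_in = [67, 56, 65, 68, 72, 72, 69, 71]
fathers_cm = [2.54 * x for x in fathers_in]
sons_cm = [2.54 * y for y in sons_in]

cov_in = covariance(fathers_in, sons_in)
cov_cm = covariance(fathers_cm, sons_cm)
r_in = pearson_r(fathers_in, sons_in)
r_cm = pearson_r(fathers_cm, sons_cm)

print(cov_in, round(cov_cm, 2))      # covariance grows by a factor of 2.54 squared
print(round(r_in, 4), round(r_cm, 4))  # r is the same in both unit systems
```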
Q8. Can r lie outside the −1 and 1 range depending on the type of data?
Answer: No. By Property 3, the inequality −1 ≤ r ≤ +1 holds for any finite real-valued data set; it is a mathematical consequence of the Cauchy–Schwarz inequality applied to deviations. If a calculation produces a value outside this range (e.g. r = 1.32 or r = −1.55), the arithmetic must be wrong: most often a computational error in ΣX², ΣY² or in the square root of the denominator.
Q9. Does correlation imply causation?
Answer: No. Correlation only measures covariation — how two variables move together. A high correlation could be due to (a) X causing Y, (b) Y causing X, (c) a hidden third variable causing both, or (d) chance coincidence over the period observed. The classic ice-cream-and-drowning example, where temperature is the lurking common cause, shows how a strong positive correlation can exist with zero causal link between the two surface variables.
Q10. When is rank correlation more precise than simple correlation coefficient?
Answer: Rank correlation is more useful (and effectively more "precise" in describing the data) when (i) one or both variables can only be ranked, not measured numerically (e.g. honesty, beauty, fairness); (ii) the data contain extreme outliers, since ranks blunt the influence of extreme values; or (iii) the relationship is clearly monotone but non-linear (Fig. 6.6 / 6.7). In situations (ii) and (iii) Pearson's r can be misleading, while rs still gives a sensible answer.
Q11. Does zero correlation mean independence?
Answer: No. r = 0 implies only the absence of a linear relationship between X and Y. They may still be tightly related in a non-linear way. The standard counter-example: X = −3, −2, −1, 1, 2, 3 with Y = X² = 9, 4, 1, 1, 4, 9 yields r = 0, even though Y is a perfect (deterministic) function of X. True independence is a much stronger condition than zero linear correlation.
Q12. Can simple correlation coefficient measure any type of relationship?
Answer: No. Karl Pearson's coefficient measures only linear association. For curved relationships (such as parabolic patterns shown in Fig. 6.6 / 6.7) it can give misleadingly small values even when the variables are tightly related. To detect a non-linear relationship, plot the scatter diagram first — the visual is the only universally valid tool.
Q13. Collect the price of five vegetables from your local market every day for a week. Calculate their correlation coefficients. Interpret the result.
Answer: This is an open data activity. Record prices in a 7-row, 5-column table. For each pair of vegetables (10 pairs in all), apply Karl Pearson's formula r = [NΣXY − (ΣX)(ΣY)] / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]. Expected pattern: vegetables with shared supply chains (e.g. tomato and onion at the same wholesale market) tend to show modest positive correlations because supply shocks hit them together, while substitutes may show weak negative correlations as buyers switch from one to the other. The exercise illustrates how correlation captures real economic linkages in everyday data.
Q14. Measure the height of your classmates. Ask them the height of their benchmate. Calculate the correlation coefficient of these two variables. Interpret the result.
Answer: Build two columns: own height (X) and benchmate's height (Y) for, say, 30 students. Apply Karl Pearson's formula. Expected result: r close to zero, because seating arrangement is largely arbitrary — there is no good economic reason for one student's height to predict another's. If, however, the school has streamed seating by some criterion correlated with height (e.g. roll number based on alphabetical order in a particular grade), a small spurious correlation could appear. The activity reinforces the lesson that not every measurable pair carries a meaningful relationship.
Q15. List some variables where accurate measurement is difficult.
Answer: Examples include: (i) intelligence (IQ tests are proxies, not direct measurements); (ii) honesty or trustworthiness; (iii) physical beauty or attractiveness; (iv) fairness or justice in administration; (v) artistic talent; (vi) job satisfaction or happiness; (vii) the quality of governance; (viii) brand loyalty; (ix) the level of corruption in a city. For all such variables we can rank or score, but precise numerical measurement is not possible — which is exactly why Spearman's rank correlation exists.
Q16. Interpret the values of r as 1, −1 and 0.
Answer: r = +1 indicates a perfect positive linear correlation: every observed (X, Y) pair lies on the same upward-sloping straight line. As X rises, Y rises in exact lockstep. r = −1 indicates a perfect negative linear correlation: every (X, Y) pair lies on the same downward-sloping line. As X rises, Y falls in exact proportion. r = 0 indicates that there is no linear association between X and Y; they may, however, still be related in a non-linear way (e.g. parabolic).
Q17. Why does rank correlation coefficient differ from Pearsonian correlation coefficient?
Answer: Pearson's r works on the actual numerical values, whereas Spearman's rs works on the ranks of those values. Replacing each value by its rank discards information about the distance between successive observations: the gap from rank 1 to rank 2 is treated as identical to the gap from rank 9 to rank 10, even when the underlying gaps are very different in size. As a result, as NCERT notes, rs ≤ r in general; the two coincide only when the gaps between successive sorted values happen to be constant (a special, rarely realised case). The two coefficients also disagree because rank correlation is more robust to outliers, while Pearson's is not.
Q18. Calculate the correlation coefficient between the heights of fathers in inches (X) and their sons (Y). X: 65, 66, 57, 67, 68, 69, 70, 72 | Y: 67, 56, 65, 68, 72, 72, 69, 71 (Ans. r = 0.603)
Answer: N = 8. Compute ΣX = 65+66+57+67+68+69+70+72 = 534; ΣY = 67+56+65+68+72+72+69+71 = 540. ΣX² = 4225+4356+3249+4489+4624+4761+4900+5184 = 35,788. ΣY² = 4489+3136+4225+4624+5184+5184+4761+5041 = 36,644. ΣXY = 65×67 + 66×56 + 57×65 + 67×68 + 68×72 + 69×72 + 70×69 + 72×71 = 4355+3696+3705+4556+4896+4968+4830+5112 = 36,118. Now apply formula (4): NΣXY = 8 × 36,118 = 288,944; (ΣX)(ΣY) = 534 × 540 = 288,360. Numerator = 288,944 − 288,360 = 584. Denominator: NΣX² − (ΣX)² = 8 × 35,788 − 534² = 286,304 − 285,156 = 1,148. NΣY² − (ΣY)² = 8 × 36,644 − 540² = 293,152 − 291,600 = 1,552. √(1,148 × 1,552) = √1,781,696 ≈ 1,334.8. r = 584 / 1,334.8 ≈ 0.438. (The NCERT printed answer is 0.603. All the equivalent formulas for r give the same value, so the choice of formula cannot explain the gap; the data as printed above yield about 0.44, and the difference presumably lies in the printed figures. The interpretation is similar either way: father–son heights are moderately positively correlated, so tall fathers tend, on average, to have tall sons.)
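The sums in this answer are easy to get wrong by hand, so a short check is useful. The sketch below recomputes each building block from the data as printed in the question; the variable names are chosen here for illustration.

```python
from math import sqrt

X = [65, 66, 57, 67, 68, 69, 70, 72]   # fathers' heights, inches
Y = [67, 56, 65, 68, 72, 72, 69, 71]   # sons' heights, inches

n = len(X)
sx, sy = sum(X), sum(Y)
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)
sxy = sum(x * y for x, y in zip(X, Y))

num = n * sxy - sx * sy     # numerator of the raw-sums formula
den_x = n * sxx - sx ** 2   # spread term for X
den_y = n * syy - sy ** 2   # spread term for Y
r = num / sqrt(den_x * den_y)

print(sx, sy, sxx, syy, sxy)   # 534 540 35788 36644 36118
print(num, den_x, den_y)       # 584 1148 1552
print(round(r, 4))             # 0.4375
```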
Q19. Calculate the correlation coefficient between X and Y and comment on their relationship. X: −3, −2, −1, 1, 2, 3 | Y: 9, 4, 1, 1, 4, 9 (Ans. r = 0)
Answer: N = 6. ΣX = −3 − 2 − 1 + 1 + 2 + 3 = 0, so X̄ = 0. ΣY = 9 + 4 + 1 + 1 + 4 + 9 = 28, Ȳ = 28/6 ≈ 4.667. Deviations x = X − 0 are simply X. Deviations y = Y − 4.667 are: 4.333, −0.667, −3.667, −3.667, −0.667, 4.333. Cross-products xy: (−3)(4.333) + (−2)(−0.667) + (−1)(−3.667) + (1)(−3.667) + (2)(−0.667) + (3)(4.333) = −13 + 1.333 + 3.667 − 3.667 − 1.333 + 13 = 0. With Σxy = 0 the correlation r is exactly 0. Comment: Y = X², so Y is completely determined by X. The data exhibit a perfect non-linear (parabolic) relation that Pearson's r misses entirely because it can only see straight-line patterns. This is the canonical illustration of why "r = 0" does not mean "no relationship".
Q20. Calculate the correlation coefficient between X and Y and comment on their relationship. X: 1, 3, 4, 5, 7, 8 | Y: 2, 6, 8, 10, 14, 16 (Ans. r = 1)
Answer: Notice instantly that Y = 2X for every observation: 2×1=2, 2×3=6, 2×4=8, 2×5=10, 2×7=14, 2×8=16. Every pair (X, Y) lies on the straight line Y = 2X. By Property 6, a perfect linear relation gives r = +1. To verify by the formula: N = 6. ΣX = 28, ΣY = 56, ΣXY = 2(1² + 3² + 4² + 5² + 7² + 8²) = 2 × 164 = 328. ΣX² = 164. ΣY² = 4 × 164 = 656. Numerator = 6 × 328 − 28 × 56 = 1968 − 1568 = 400. Denominator components: 6 × 164 − 28² = 984 − 784 = 200; 6 × 656 − 56² = 3936 − 3136 = 800. √(200 × 800) = √160,000 = 400. r = 400 / 400 = 1.000. The relation is exact, linear and positive.
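The last two exercises make a useful pair: one exact relation that Pearson's r cannot see and one that it captures perfectly. The sketch below checks both with a hypothetical `pearson_r` helper based on the raw-sums formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Q19: Y is the square of X, an exact but non-linear relation.
r_q19 = pearson_r([-3, -2, -1, 1, 2, 3], [9, 4, 1, 1, 4, 9])

# Q20: Y is twice X, an exact linear relation.
r_q20 = pearson_r([1, 3, 4, 5, 7, 8], [2, 6, 8, 10, 14, 16])

print(r_q19, r_q20)  # 0.0 1.0
```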
6.17 Chapter 6 Summary — Recap at a Glance
📙 Key Take-Aways from Chapter 6
Correlation analysis examines whether and how two variables move together — in the same direction (positive), in opposite directions (negative), or not at all.
Three relationship situations exist: cause-and-effect (rainfall → yield), pure coincidence, and lurking-third-variable relationships (the ice-cream / drowning case).
The scatter diagram shows the form of any relationship visually and is the only tool that works for both linear and non-linear patterns.
Karl Pearson's coefficient r measures linear association numerically. r = Σxy / (N·σx·σy); equivalent direct formulas use ΣXY, ΣX², ΣY².
Six properties of r: no unit; sign indicates direction; −1 ≤ r ≤ +1; independent of change of origin and scale; r = 0 means no linear relation (other types may still exist); r = ±1 means a perfect linear relation.
The step-deviation method exploits Property 4 to keep arithmetic small — transform X and Y to U and V, compute r on the small numbers.
Spearman's rank correlation rs = 1 − 6ΣD²/(n³ − n) measures linear association between ranks. Use it for qualitative variables (beauty, honesty), data with outliers, or monotone non-linear relations.
Tied ranks need a correction factor (m³ − m)/12 added to ΣD² inside the bracket, one term per set of repetitions.
rs ≤ r in general — rank correlation discards information about the size of gaps between observations.
Most importantly: correlation is not causation. Co-movement does not establish that one variable produces the other; it could be due to a hidden cause or to coincidence.
6.18 Key Terms & Glossary
Correlation
A statistical relationship in which a change in one variable is accompanied by a definite change in another, in the same or opposite direction.
Causation
A genuine cause-and-effect link in which changes in X actually produce changes in Y. Correlation does not imply causation.
Spurious Correlation
A correlation between two variables that arises because both are influenced by a hidden third variable, not because either drives the other.
Positive Correlation
Both variables move in the same direction. r > 0. Income and consumption; temperature and ice-cream sales.
Negative Correlation
Variables move in opposite directions. r < 0. Apple price and demand; hours of study and probability of failing.
Linear Correlation
Equal increments in X are accompanied by equal increments (or decrements) in Y — the cloud of points lies near a straight line.
Scatter Diagram
A visual technique in which each pair (X, Y) is plotted as a point; the cloud reveals the form and tightness of any relationship.
Karl Pearson's Coefficient (r)
A precise numerical measure of linear association: r = Σxy / (N·σx·σy). Bounded between −1 and +1.
Coefficient of Correlation
Generic name for any single number that summarises the strength and direction of association between two variables.
Step-Deviation Method
A computational shortcut that transforms X and Y into U and V (subtracting assumed means and dividing by common factors) before applying the r formula. rUV = rXY.
Spearman's Rank Correlation (rs)
Linear association between the ranks of items: rs = 1 − 6ΣD²/(n³ − n). Useful when variables can only be ranked.
Tied Ranks
When two or more items share the same value, each receives the average of the ranks they would have occupied; a correction factor (m³ − m)/12 is added to ΣD² for each tied group.
Independent Variable
In a causal pair, the variable whose change is treated as the cause (e.g. price in the demand relationship).
Dependent Variable
In a causal pair, the variable whose change is the response (e.g. quantity demanded in the demand relationship).
Covariance
Cov(X, Y) = Σ(X − X̄)(Y − Ȳ)/N. The numerator of r; carries units, hence not directly comparable across data sets.
6.19 Worked Case-Based Question — Spearman with Ties
📋 Case-Based Question — Two Coaches Rate Six Athletes
Two cricket coaches A and B rate six trainees on overall potential. Coach A's ranks are 1, 2, 3, 4, 5, 6 (no ties). Coach B groups two of his ranks at the same value (treats two trainees as equally good): ranks 1, 2.5, 2.5, 4, 5, 6. Use the rank-correlation framework to compute the agreement between the two coaches.
Q1. Define rank correlation coefficient and write the basic formula (no ties).
L1 Remember
Answer: The rank correlation coefficient rs measures the linear association between two sets of ranks: rs = 1 − 6ΣD²/(n³ − n), where n is the number of pairs and D the difference between the two ranks of the same item.
Q2. Why is a tied-ranks correction factor needed when ranks are repeated?
L2 Understand
Answer: The simple formula assumes the ranks are 1, 2, …, n with no repetitions, so ΣD² alone captures the spread. When two or more items share the same rank, the average rank is used; this slightly understates the true variability in the rank distribution. The correction factor (m³ − m)/12 (one per tied set) restores the missing variability and prevents rs from being artificially inflated.
Q3. Compute ΣD2 for Coach A vs Coach B and apply the corrected Spearman formula.
L3 Apply
Answer: Pair the ranks: A = 1, 2, 3, 4, 5, 6 and B = 1, 2.5, 2.5, 4, 5, 6. Differences D = A − B: 0, −0.5, 0.5, 0, 0, 0. D²: 0, 0.25, 0.25, 0, 0, 0. ΣD² = 0.5. Tied set in B: m = 2, correction = (2³ − 2)/12 = 6/12 = 0.5. With n = 6, n³ − n = 216 − 6 = 210. rs = 1 − 6 × (0.5 + 0.5)/210 = 1 − 6/210 = 1 − 0.0286 = 0.971. The two coaches are in very close agreement.
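The same arithmetic, written as a short sketch. The ranks are copied from the case; the variable names are chosen here for illustration, and the tie sizes are read off Coach B's list with `collections.Counter`.

```python
from collections import Counter

coach_a = [1, 2, 3, 4, 5, 6]
coach_b = [1, 2.5, 2.5, 4, 5, 6]   # two trainees share the average rank 2.5

sum_d2 = sum((a - b) ** 2 for a, b in zip(coach_a, coach_b))

# One (m^3 - m)/12 term for each group of tied ranks in either list.
tie_sizes = [m for ranks in (coach_a, coach_b) for m in Counter(ranks).values() if m > 1]
correction = sum((m ** 3 - m) / 12 for m in tie_sizes)

n = len(coach_a)
rs = 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)
print(sum_d2, correction, round(rs, 3))  # 0.5 0.5 0.971
```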
Q4. The board of selectors must pick the top three trainees. Given rs = 0.971, would you trust either coach's list, or insist on a third opinion? Justify.
L5 Evaluate
Answer: A rank correlation of 0.971 is extremely high — the two independent coaches are almost in lockstep on the ordering of the six trainees. Either list is a reliable basis for selection; insisting on a third opinion would add little new information and waste resources. (If, by contrast, rs had been 0.3 or below, a third opinion or a more objective measure would clearly be needed.)
6.20 Assertion–Reason Questions
⚖ Assertion–Reason Questions (Class 11)
Choose: (A) Both A and R are true and R is the correct explanation of A. (B) Both A and R are true but R is not the correct explanation of A. (C) A is true, R is false. (D) A is false, R is true.
Assertion (A): The step-deviation method changes the value of Karl Pearson's correlation coefficient.
Reason (R): Defining U = (X − A)/h and V = (Y − C)/k preserves the correlation, so rUV = rXY as long as h and k have the same sign.
Correct: (D) — Assertion is false. The whole point of the step-deviation method is that it does not change the value of r — it only simplifies the arithmetic. Reason is true and is exactly the property (Property 4) that justifies the method.
Assertion (A): Spearman's rank correlation is preferred to Karl Pearson's r when the data contain extreme outliers.
Reason (R): Converting numerical values to ranks limits the influence of any single extreme observation, since the rank of an outlier can be at most n (the largest possible rank).
Correct: (A) — Both statements are true and R is the correct explanation of A. A single Rs 5,000 income in a Rs 100–Rs 500 list will pull Pearson's r dramatically; in rank-space it is just rank 1 — one position higher than the next.
Assertion (A): When ranks are repeated, the simple Spearman formula rs = 1 − 6ΣD²/(n³ − n) gives the correct answer without modification.
Reason (R): Each set of m tied ranks contributes an extra (m³ − m)/12 inside the bracket of the numerator, which the corrected formula adds explicitly.
Correct: (D) — Assertion is false. Repeated ranks require the explicit correction factor — one term per tied set — otherwise rs is biased upward (too close to +1). Reason is true and gives the exact form of the correction needed.
6.21 Activity — A Final NCERT Challenge
EXPLORE — All Three Methods on India's Income vs Exports
Bloom: L3 Apply
Use all the formulas discussed in this chapter to calculate r between India's national income and exports, taking at least ten observations. Compare the answers from formula (2), formula (3), formula (4) and the step-deviation method. Report which method involved the least arithmetic.
✅ Sample Answer
All four formulas must give the same numerical r because they are mathematical rearrangements of the same expression. The step-deviation method usually wins on arithmetic ease because India's national income (in lakh crores) and exports (in thousand crores) carry many digits; subtracting an assumed mean and dividing by a common factor (e.g. h = 100, k = 100) shrinks the numbers dramatically. Pearson's r between national income and exports is typically very high, often in the range +0.95 to +0.99, because both have grown together over decades. A high r does not settle the direction of causation, however: the link may run both ways, and overall economic growth may act as a "third variable" driving both.
What is Spearman's rank correlation formula in NCERT Class 11 Statistics?
Spearman's rank correlation coefficient R = 1 − (6·ΣD²) / (N(N² − 1)), where D is the difference between paired ranks for each observation and N is the number of observations. NCERT Class 11 Statistics Chapter 6 Part 2 explains that R always lies between −1 and +1 with the same interpretation as Karl Pearson's r. The method is used when the data is qualitative (beauty, intelligence, honesty), when only ranks are available rather than precise values, or when the underlying relationship is monotonic but not linear, making it a more robust measure than Pearson's r in these cases.
When should you use Spearman's rank correlation instead of Karl Pearson's r?
Spearman's rank correlation should be used instead of Karl Pearson's r in three situations highlighted by NCERT Class 11 Statistics Chapter 6 Part 2: (1) when the data is qualitative and cannot be measured precisely, such as beauty, honesty or intelligence; (2) when the relationship between the variables is monotonic but not linear (steady direction but variable rate); and (3) when extreme values (outliers) would distort the Pearson r — ranks are less affected by outliers because they only record relative position. Pearson's r is preferred when the data is precise quantitative and the relationship is linear.
How do you handle repeated ranks in Spearman's correlation in Class 11?
When two or more observations have the same value (tied ranks), assign them the average of the ranks they would have received if they were slightly different — for example, two observations tied for ranks 3 and 4 both receive rank 3.5. NCERT Class 11 Statistics Chapter 6 Part 2 then applies a correction factor to the Spearman formula: add (m³ − m)/12 inside the numerator for each group of tied ranks, where m is the number of tied items in that group. The corrected R = 1 − (6·[ΣD² + Σ(m³ − m)/12]) / (N(N² − 1)) gives an accurate coefficient even with multiple ties.
What is the step deviation method for Karl Pearson's correlation in NCERT Class 11?
The step deviation method for Karl Pearson's correlation simplifies arithmetic when raw values are large by using deviations from assumed means and dividing by step sizes. NCERT Class 11 Statistics Chapter 6 Part 2 gives the formula r = (NΣdx'dy' − Σdx'·Σdy') / √((NΣdx'² − (Σdx')²)(NΣdy'² − (Σdy')²)), where dx' = (X − A)/h_x and dy' = (Y − B)/h_y, with A and B as assumed means and h_x, h_y as step sizes (often class widths). The result is identical to the direct method because r is unaffected by change of origin and scale, but the calculations are far easier to manage.
What is the difference between Karl Pearson's r and Spearman's R in NCERT Class 11?
Karl Pearson's r measures the strength of linear correlation between two quantitative variables using actual data values, while Spearman's R measures the strength of monotonic association using only ranks. NCERT Class 11 Statistics Chapter 6 Part 2 contrasts them: Pearson's r is more accurate when data is precisely measured and linear, but is sensitive to outliers and assumes a linear relationship; Spearman's R is suitable for ordinal or qualitative data, is robust to outliers, and works for any monotonic relationship. For symmetric quantitative data without ties, both methods give very similar values, but for skewed or qualitative data Spearman is preferred.
Can correlation coefficient r exceed 1 in NCERT Class 11 Statistics?
No — the correlation coefficient, whether Karl Pearson's r or Spearman's R, can never exceed +1 in magnitude. NCERT Class 11 Statistics Chapter 6 Part 2 explains that r is mathematically constrained to lie in the closed interval [−1, +1], where +1 indicates perfect positive correlation, −1 indicates perfect negative correlation, and 0 indicates no linear correlation. If a calculation produces a value outside this range, an arithmetic mistake has occurred — common errors include miscomputing Σx², Σy² or Σxy, or forgetting to take the square root in the denominator. The standard practice is to recheck calculations whenever |r| > 1 appears.