Correlation Concepts, Scatter Diagram & Karl Pearson

🎓 Class 11 Social Science CBSE Theory Ch 6 — Correlation ⏱ ~28 min

Class 11 · Statistics for Economics · Chapter 6 · Part 1

Correlation — Types, Scatter Diagram and Karl Pearson's Coefficient

So far in this textbook, every chapter has worked with one variable at a time — heights, marks, incomes, prices. Real economics, however, is rarely about one variable in isolation. The price of tomatoes moves with their supply at the local mandi. The number of ice-creams sold moves with the day's temperature. Hours of study move with the marks scored in the next test. Statisticians call such co-movement correlation. Part 1 builds the idea from scratch: what counts as a relationship, the difference between cause-and-effect and a chance coincidence, the visual shortcut of the scatter diagram, and finally the most famous numerical measure of all — Karl Pearson's coefficient of correlation, r.

6.1 From One Variable to Two — Why Correlation Matters

Earlier chapters showed how to summarise a single mass of data: a mean, a median, a standard deviation. Now imagine two columns of figures sitting side by side. As summer heat rises, hill stations fill with visitors and the queues outside ice-cream parlours grow. The day's temperature and the day's ice-cream sales are clearly not independent — they tend to rise together. As fresh tomatoes flood the local mandi, the price drops from Rs 40 a kilo to Rs 4 a kilo. Supply and price are clearly not independent either — they tend to move in opposite directions.

The branch of statistics that studies such co-movement systematically is called correlation analysis. NCERT lists the three questions it sets out to answer:

  • Is there a relationship? Do the two variables co-move at all, or do they wander independently of each other?
  • What is the direction of movement? If one variable rises, does the other rise too (same direction) or fall (opposite direction)?
  • How strong is it? Is the link tight and predictable or loose and noisy? Statistics gives us a number for this.
📖 Definition — Correlation
Correlation studies and measures the direction and intensity of the relationship between two variables. The presence of correlation between X and Y simply means that when the value of one variable changes in one direction, the value of the other also changes in a definite way — either in the same direction (positive change) or the opposite direction (negative change). Correlation measures covariation, not causation.

6.2 Types of Relationship

6.2.1 Cause-and-Effect, Coincidence and a Lurking Third Variable

Not every observed co-movement carries the same meaning. NCERT carefully distinguishes three situations:

  • Cause-and-effect relationship. The movement in quantity demanded with the price of a commodity is the centrepiece of demand theory (which you will meet in Class 12). Low rainfall in a season tends to depress agricultural productivity. Here one variable genuinely produces the change in the other.
  • Coincidence. The arrival of migratory birds in a sanctuary may correlate with the local birth rate, and shoe size may correlate with the money in your pocket. The relationships are real in the data — but they are mere coincidence and have no causal interpretation.
  • A lurking third variable. In the famous textbook example, brisk ice-cream sales correlate with deaths by drowning. Eating ice-cream does not cause drowning. The unseen third variable, rising summer temperature, drives both: heat boosts ice-cream sales, and heat drives more people to swimming pools, where some unfortunately drown. This kind of false correlation through a hidden common cause is called a spurious correlation.
⚠ Crucial Caution — Correlation Is Not Causation
Correlation should never be interpreted as implying a cause-and-effect relation. If X and Y move together it could be (a) X causes Y, (b) Y causes X, (c) a third variable Z causes both, or (d) the two simply happen to co-move by chance over the period observed. Correlation alone cannot tell which.
NCERT Caution An epidemic spreads in some villages and the government sends a team of doctors to the affected areas. The correlation between the number of doctors sent and the number of deaths reported turns out to be positive. Does this mean doctors cause deaths? Not at all. Other explanations are far more plausible: many of the reported deaths may have been terminal cases the doctors could do little about, the data may cover too short a period for the doctors' work to show up as recoveries, and some deaths may have had nothing to do with the epidemic (a tsunami striking during the same window, for instance). Statistics needs to be read alongside common sense.
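The lurking-variable case can be illustrated with a small simulation. This is a made-up model, not real data: the coefficients, noise levels, seed and the helper name `pearson_r` are all assumptions chosen for illustration. Temperature drives both series, and neither series appears in the formula for the other, yet r between them should still come out strongly positive.

```python
import math
import random

def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical model: daily temperature (deg C) is the hidden common cause.
temperature = [random.uniform(20, 40) for _ in range(200)]
# Ice-cream sales depend on temperature plus noise (invented coefficients).
ice_cream = [20 * t + random.gauss(0, 30) for t in temperature]
# Drownings also depend on temperature plus noise, never on ice-cream sales.
drownings = [0.3 * t + random.gauss(0, 1) for t in temperature]

r_spurious = pearson_r(ice_cream, drownings)
print(f"r(ice-cream, drownings) = {r_spurious:.2f}")
```

A large r here says nothing about ice-cream causing drowning; it only reflects the common driver built into the model.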

6.2.2 Independent vs Dependent Variables

Even when correlation is causal, statisticians give the two variables different names. The independent variable is the one whose change is treated as the cause; the dependent variable changes in response. In the demand example, price is normally the independent variable and quantity demanded the dependent one. In the rainfall example, rainfall (the natural input) is independent and agricultural yield (the response) is dependent. Correlation itself is symmetric — r between X and Y is the same number as r between Y and X — but cause-and-effect language distinguishes the two roles.

6.2.3 Positive and Negative Correlation

Correlation is positive when the two variables move together in the same direction. Income and consumption: when income rises consumption rises, when income falls consumption falls. Sale of ice-cream and temperature also move in the same direction. Correlation is negative when the variables move in opposite directions. When the price of apples falls, demand for apples increases. When you spend more time studying, your chance of failing falls; when you study less, the chance of low marks rises. Both are negative correlations.

[Figure: two scatter panels. Left: positive correlation, X (e.g. temperature) and Y rise together, r is between 0 and +1. Right: negative correlation, X (e.g. apple price) rises while Y falls, r is between −1 and 0.]
A side-by-side picture of the two basic shapes of correlation. The upward-sloping cloud on the left signals positive co-movement; the downward-sloping cloud on the right signals negative co-movement.

6.2.4 Linear and Non-Linear Correlation

For simplicity, NCERT assumes that the correlation we measure is linear correlation. A relationship is said to be linear if it can be represented by a straight line on graph paper: equal increments in X are accompanied by equal increments (or decrements) in Y. When the underlying co-movement is curved (the points lie on a parabola or some other curve), the relationship is non-linear, and Karl Pearson's coefficient is not the right tool for the job. Visually inspecting a scatter diagram is the easiest way to tell linear from non-linear before plunging into formulae.

6.3 Three Tools for Measuring Correlation

NCERT names three classroom tools for studying correlation:

  1. Scatter diagram — a visual technique. The values of the two variables are plotted as points on a graph; the closeness and overall direction of the cloud tell us, at a glance, the form and strength of any relationship. It does not produce a number.
  2. Karl Pearson's coefficient of correlation (r) — a precise numerical measure of linear association, valid only when the relation between X and Y is roughly straight-line.
  3. Spearman's rank correlation — a measure of linear association between the ranks assigned to individual items, useful when variables (like beauty or honesty) cannot be measured numerically. We meet this in Part 2.

6.4 The Scatter Diagram — A Visual Window

A scatter diagram is the simplest way to see whether two variables are related. Each pair of values (X, Y) is plotted as a single point on graph paper. The cloud of points immediately tells us two things: (i) the direction of any relationship (is the cloud sloping up, down or going nowhere?) and (ii) the tightness of the relationship (do the points hug a line closely or scatter widely around it?). When all the points lie exactly on a line, the correlation is perfect; when the points are widely dispersed, the correlation is weak. The relationship is called linear if the cloud lies near a straight line.
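For readers who want to try plotting without graph paper, here is a rough text-only sketch. The function name `ascii_scatter` and the grid size are assumptions made for illustration, and the data are the tutor's six students from the case-based question in Section 6.9. It assumes the X values are not all equal and the Y values are not all equal.

```python
def ascii_scatter(xs, ys, width=25, height=8):
    """Return a rough text scatter diagram: one '*' per (x, y) pair."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

hours = [2, 4, 6, 8, 10, 12]        # X: hours of coaching
score = [18, 22, 30, 34, 40, 44]    # Y: test score out of 50

plot = ascii_scatter(hours, score)
print(plot)
```

The stars climb from the bottom-left corner to the top-right corner, which is the visual signature of positive correlation.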

📊 Six Patterns Identified by NCERT
Figures 6.1 to 6.7 in the textbook walk through the standard scatter shapes. The six patterns most commonly tested are listed below.
Fig. 6.4 — Perfect Positive Correlation (r = +1). Every point sits on the same upward line.
Fig. 6.5 — Perfect Negative Correlation (r = −1). Every point sits on the same downward line.
Fig. 6.1 — Strong Positive Correlation (r close to +1). Points scatter around an upward line.
Fig. 6.2 — Strong Negative Correlation (r close to −1). Points scatter around a downward line.
Fig. 6.3 — No Correlation (r near 0). Points are scattered with no rising or falling trend.
Fig. 6.6 / 6.7 — Non-Linear Relation. The cloud follows a clear curve, not a straight line. Karl Pearson's r should not be used here.
⚠ Why the Scatter Comes First
Karl Pearson's coefficient is designed to detect linear association. If the true pattern is curved (like Fig. 6.6 / 6.7) the coefficient can give a misleadingly small or even zero value, even though X and Y are tightly linked. NCERT therefore advises plotting a scatter diagram before calculating Karl Pearson's coefficient — to make sure a straight line is at least a reasonable description of the data.
EXPLORE — NCERT Activity (Scatter Diagram)
Bloom: L3 Apply

Collect data on the height, weight and marks scored in any two subjects of all the students in your class. Draw three scatter diagrams: height vs weight, height vs marks, and marks in subject 1 vs marks in subject 2. What kind of relationship does each show?

✅ Sample Observation
You will usually find a moderate positive correlation between height and weight (taller students tend to weigh more), weak or no correlation between height and marks (no good reason for one to drive the other), and moderate to strong positive correlation between marks in two subjects (good students tend to do well in both, weak students tend to struggle in both). These intuitions are exactly what scatter diagrams reveal at a glance, before any number is calculated.

6.5 Karl Pearson's Coefficient of Correlation

The scatter diagram gives a feel; what we now need is a single number that captures the same information precisely. Karl Pearson's coefficient of correlation, also called the product-moment correlation coefficient, gives a numerical value to the degree of linear association between two variables X and Y. NCERT writes the underlying formulae step by step, starting from familiar quantities.

6.5.1 Building Blocks — Means, Variances, Covariance

Let X₁, X₂, …, X_N be N values of X with corresponding values Y₁, Y₂, …, Y_N of Y. The arithmetic means are the familiar quantities

$$ \bar{X} = \frac{\sum X}{N} \qquad \bar{Y} = \frac{\sum Y}{N} $$

The variances measure the spread of X and Y around their own means:

$$ \sigma_x^2 = \frac{\sum (X - \bar{X})^2}{N} \qquad \sigma_y^2 = \frac{\sum (Y - \bar{Y})^2}{N} $$

The standard deviations σx and σy are the positive square roots of the variances. The truly new quantity is the covariance, which measures how X and Y vary together:

$$ \mathrm{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\sum xy}{N} $$

where x = X − X̄ and y = Y − Ȳ are the deviations from the respective means. If both deviations are positive on the same observation (X above average and Y above average), or both negative (both below average), the product xy is positive. If they have opposite signs, xy is negative. The sign of the covariance therefore tells us the direction of the co-movement; the sign of r will follow.
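The building blocks can be checked with a few lines of Python. This is a minimal sketch using the seven farmers' figures from Worked Example 1 (Section 6.6); the variable names are illustrative choices, not NCERT notation.

```python
# Seven farmers from Worked Example 1: years of schooling and yield (Rs '000).
X = [0, 2, 4, 6, 8, 10, 12]
Y = [4, 4, 6, 10, 10, 8, 7]
N = len(X)

mean_x = sum(X) / N                      # X-bar
mean_y = sum(Y) / N                      # Y-bar
x_dev = [xi - mean_x for xi in X]        # x = X - X-bar
y_dev = [yi - mean_y for yi in Y]        # y = Y - Y-bar

var_x = sum(d ** 2 for d in x_dev) / N   # variance of X
var_y = sum(d ** 2 for d in y_dev) / N   # variance of Y
cov_xy = sum(a * b for a, b in zip(x_dev, y_dev)) / N  # Cov(X, Y)

print(mean_x, mean_y)   # 6.0 7.0
print(var_x, cov_xy)    # 16.0 6.0
```

The covariance is positive, so the sign of r for these data will be positive too.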

6.5.2 Four Equivalent Formulas for r

The product-moment correlation r is defined as the covariance divided by the product of the two standard deviations:

$$ r = \frac{\sum xy}{N \, \sigma_x \, \sigma_y} \tag{1} $$

By substituting the definitions of σx and σy the same number can be written in the more familiar deviation form:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} \tag{2} $$
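The substitution that turns formula (1) into formula (2) can be written out in one line. The steps below are a sketch that uses only the definitions of σx and σy given earlier, with x = X − X̄ and y = Y − Ȳ; the factors of N cancel because N · √(1/N) · √(1/N) = 1.

```latex
r = \frac{\sum xy}{N \, \sigma_x \, \sigma_y}
  = \frac{\sum xy}{N \sqrt{\dfrac{\sum x^2}{N}} \sqrt{\dfrac{\sum y^2}{N}}}
  = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}
```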

And, because squaring out the deviations gives expressions in raw values only, NCERT also presents the direct formula that avoids computing X̄ and Ȳ explicitly:

$$ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left( \sum X^2 - \dfrac{(\sum X)^2}{N} \right) \left( \sum Y^2 - \dfrac{(\sum Y)^2}{N} \right)}} \tag{3} $$
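The "squaring out" step behind formula (3) is worth seeing once. The sketch below expands the numerator of formula (2) and uses X̄ = ΣX / N and Ȳ = ΣY / N; the same expansion with Y replaced by X gives the terms under the square root.

```latex
\sum (X - \bar{X})(Y - \bar{Y})
  = \sum XY - \bar{X} \sum Y - \bar{Y} \sum X + N \bar{X} \bar{Y}
  = \sum XY - \frac{(\sum X)(\sum Y)}{N}

\sum (X - \bar{X})^2
  = \sum X^2 - 2 \bar{X} \sum X + N \bar{X}^2
  = \sum X^2 - \frac{(\sum X)^2}{N}
```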

An algebraically equivalent rearrangement multiplies numerator and denominator by N:

$$ r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \cdot \sqrt{N \sum Y^2 - (\sum Y)^2}} \tag{4} $$

All four formulas give the same number for the same data set. Choose whichever has the simplest arithmetic for the figures in front of you.
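The claim that all four formulas agree is easy to test numerically. The sketch below codes each formula separately (the function names `r_formula_1` to `r_formula_4` are illustrative, not NCERT notation) and applies them to the farmers' data of Worked Example 1 in Section 6.6.

```python
import math

def r_formula_1(X, Y):
    """(1): sum of xy divided by N * sigma_x * sigma_y."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sigma_x = math.sqrt(sum((a - mx) ** 2 for a in X) / n)
    sigma_y = math.sqrt(sum((b - my) ** 2 for b in Y) / n)
    return sxy / (n * sigma_x * sigma_y)

def r_formula_2(X, Y):
    """(2): deviation form."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)

def raw_sums(X, Y):
    """Raw totals used by formulas (3) and (4)."""
    return (len(X), sum(X), sum(Y),
            sum(a * b for a, b in zip(X, Y)),
            sum(a * a for a in X), sum(b * b for b in Y))

def r_formula_3(X, Y):
    """(3): direct formula in raw values."""
    n, sx, sy, sxy, sxx, syy = raw_sums(X, Y)
    num = sxy - sx * sy / n
    return num / math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))

def r_formula_4(X, Y):
    """(4): formula (3) with numerator and denominator multiplied by N."""
    n, sx, sy, sxy, sxx, syy = raw_sums(X, Y)
    num = n * sxy - sx * sy
    return num / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

X = [0, 2, 4, 6, 8, 10, 12]   # years of schooling
Y = [4, 4, 6, 10, 10, 8, 7]   # annual yield per acre, Rs '000

results = [f(X, Y) for f in (r_formula_1, r_formula_2, r_formula_3, r_formula_4)]
print([round(r, 3) for r in results])   # [0.644, 0.644, 0.644, 0.644]
```

Tiny differences in the last decimal places can appear because of floating-point rounding, which is why the values are rounded before printing.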

6.6 Worked Example 1 — Years of Schooling and Annual Yield per Acre

📝 Worked Example 1 (NCERT)
Seven farmers report their years of schooling and the annual yield per acre (Rs '000). Compute the Karl Pearson coefficient of correlation r.
Table 6.1 — Calculation of r between years of schooling of farmers and annual yield
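| X (Years) | X − X̄ | (X − X̄)² | Y (Yield, Rs '000) | Y − Ȳ | (Y − Ȳ)² | (X − X̄)(Y − Ȳ) |
|---|---|---|---|---|---|---|
| 0 | −6 | 36 | 4 | −3 | 9 | 18 |
| 2 | −4 | 16 | 4 | −3 | 9 | 12 |
| 4 | −2 | 4 | 6 | −1 | 1 | 2 |
| 6 | 0 | 0 | 10 | 3 | 9 | 0 |
| 8 | 2 | 4 | 10 | 3 | 9 | 6 |
| 10 | 4 | 16 | 8 | 1 | 1 | 4 |
| 12 | 6 | 36 | 7 | 0 | 0 | 0 |
| ΣX = 42 | | Σ(X − X̄)² = 112 | ΣY = 49 | | Σ(Y − Ȳ)² = 38 | Σxy = 42 |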

Means: X̄ = 42 / 7 = 6 years, Ȳ = 49 / 7 = Rs 7 thousand. Standard deviations:

σx = √(112/7) = √16 = 4     σy = √(38/7)

Plug into formula (1):

$$ r = \frac{42}{7 \times \sqrt{112/7} \times \sqrt{38/7}} = \frac{42}{\sqrt{112} \cdot \sqrt{38}} \approx 0.644 $$

Or equivalently, using formula (2) directly:

$$ r = \frac{42}{\sqrt{112 \times 38}} = \frac{42}{\sqrt{4256}} \approx 0.644 $$

Both routes give r = 0.644. The number is positive, so years of schooling and annual yield per acre move in the same direction. The value is also fairly large — closer to +1 than to 0 — so the linear association is reasonably strong. This underlines the importance of farmers' education: more schooling tends to be linked with higher yield per acre.

Scatter of Example 1: years of schooling (X) vs annual yield per acre (Y, Rs '000). The cloud rises from left to right — visual evidence of the positive correlation r = 0.644 we just calculated.

6.7 Properties of the Correlation Coefficient

NCERT lists six core properties of r. Memorise them — they are common one-mark and two-mark questions.

1. r has no unit
It is a pure number. The units of measurement cancel out in the formula. r between height in feet and weight in kg could be 0.7 — not 0.7 kg/feet.
2. Sign tells direction
A negative r means the two variables move in opposite directions (apple price up, demand down). A positive r means they move together (income up, consumption up).
3. r lies between −1 and +1
The bounds −1 ≤ r ≤ +1 are mathematically guaranteed. If a calculation produces a value outside this range, the arithmetic is wrong.
4. Independent of origin and scale
Defining U = (X − A)/B and V = (Y − C)/D leaves r unchanged: rUV = rXY as long as B and D have the same sign. This is the basis of the step-deviation method.
5. r = 0 means no linear relation
When r = 0 the variables are linearly uncorrelated — but a non-linear relation between them may still exist.
6. r = ±1 is perfect linear
If r = +1 or r = −1 the relation is perfect and exactly linear — every data point lies on the same straight line. Values close to ±1 indicate strong linear association; values close to 0 indicate weak.
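Property 4 can be checked on the farmers' data of Worked Example 1. In this sketch the constants A = 6, B = 2, C = 7, D = 1 are arbitrary choices made for illustration. The last case uses scale factors of opposite signs, where the size of r is kept but its sign flips, which is why the property carries the same-sign condition.

```python
import math

def pearson_r(X, Y):
    """Karl Pearson's r in deviation form."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)

X = [0, 2, 4, 6, 8, 10, 12]   # years of schooling
Y = [4, 4, 6, 10, 10, 8, 7]   # annual yield per acre, Rs '000

# Change of origin and scale with assumed constants A = 6, B = 2, C = 7, D = 1.
U = [(x - 6) / 2 for x in X]
V = [(y - 7) / 1 for y in Y]

# Same shift, but B is now negative while D stays positive.
U_neg = [(x - 6) / -2 for x in X]

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
r_flipped = pearson_r(U_neg, V)
print(round(r_xy, 3), round(r_uv, 3), round(r_flipped, 3))   # 0.644 0.644 -0.644
```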

6.7.1 The Strength Scale

For quick interpretation, statisticians use a colour-coded scale of |r|. Values close to +1 or −1 are "strong"; values near zero are "weak"; the middle band is "moderate".

Scale of r: −1.0 (perfect negative), −0.7 (strong negative), −0.3 (weak negative), 0 (zero), +0.3 (weak positive), +0.7 (strong positive), +1.0 (perfect positive).
NCERT Reading Suppose the correlation between marks in English and Statistics is 0.1. The two are positively correlated, but the link is weak: a top scorer in English may score modestly in Statistics. Had r been 0.9, an English topper would very likely be near the top in Statistics too. For negative correlation, r = −0.9 means a flood of vegetables in the mandi is reliably accompanied by a fall in price, whereas r = −0.1 means the price falls only loosely and unreliably as supply rises. The absolute value |r| measures how tight the linear link is; the sign tells the direction.

6.7.2 Why r and Not Just Covariance?

The covariance Σxy / N is also positive when X and Y move together. Why bother dividing by σx · σy? The answer is units. Covariance carries the units of X × Y (e.g. rupees × kilograms), so its size changes whenever the scale of measurement changes: recording weight in grams instead of kilograms multiplies it by 1,000. Dividing by the two standard deviations cancels the units and bounds the result between −1 and +1, making r a clean dimensionless yardstick that can be compared across studies.
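The unit problem shows up clearly in numbers. The sketch below takes the farmers' data of Worked Example 1 and records yield once in thousands of rupees and once in rupees; the helper names `covariance` and `pearson_r` are illustrative.

```python
import math

def covariance(X, Y):
    """Cov(X, Y): mean of the products of deviations."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    return sum((a - mx) * (b - my) for a, b in zip(X, Y)) / n

def pearson_r(X, Y):
    """r = Cov(X, Y) / (sigma_x * sigma_y)."""
    sigma_x = math.sqrt(covariance(X, X))   # Cov(X, X) is the variance of X
    sigma_y = math.sqrt(covariance(Y, Y))
    return covariance(X, Y) / (sigma_x * sigma_y)

years = [0, 2, 4, 6, 8, 10, 12]
yield_thousands = [4, 4, 6, 10, 10, 8, 7]             # Rs '000 per acre
yield_rupees = [1000 * y for y in yield_thousands]    # same yields in Rs

cov_thousands = covariance(years, yield_thousands)
cov_rupees = covariance(years, yield_rupees)
r_thousands = pearson_r(years, yield_thousands)
r_rupees = pearson_r(years, yield_rupees)

print(cov_thousands, cov_rupees)                  # 6.0 6000.0
print(round(r_thousands, 3), round(r_rupees, 3))  # 0.644 0.644
```

The covariance jumps by a factor of 1,000 with the change of unit, while r does not move.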

6.8 Reading the Coefficient — Interpretation Practice

What r tells you

  • Sign: direction of linear association (+, − or 0).
  • Magnitude: tightness of the linear relationship.
  • Bounded: |r| ≤ 1, easy to compare across data sets.
  • Unit-free: survives a change of measurement units.

What r does not tell you

  • Whether X causes Y (or vice versa).
  • Whether a non-linear relation exists when r is near 0.
  • Whether outliers are inflating or deflating the value.
  • How steep the line of best fit is (that is regression, not correlation).
THINK — Sign Inversion Puzzle
Bloom: L4 Analyse

For two variables X and Y, you find r = −0.85. A friend computes r between Y and X and reports r = +0.85. Who is correct — or could both be? Explain in your own words using the formula.

✅ Sample
They cannot both be right. The Pearson formula r = Σ(X − X̄)(Y − Ȳ) / √[…] is symmetric in X and Y: swapping the variables changes neither the numerator nor the denominator, so rXY = rYX. If one of you got −0.85 and the other +0.85, somebody made an arithmetic error. Correlation does not have a "first" or "second" variable.
DISCUSS — Could r Lie Outside ±1?
Bloom: L5 Evaluate

A student reports r = 1.32 for the correlation between two variables. Without seeing his data, what would you tell him? Justify using a property of r.

✅ Sample
Property 3 of r guarantees that −1 ≤ r ≤ +1 for any data set. A value of 1.32 is mathematically impossible. The student has made an error, most likely in computing ΣX² or ΣY², or in losing a square root in the denominator. Recompute carefully; the correct answer must lie inside [−1, +1].
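For readers curious about why the bound holds, one standard route is the Cauchy-Schwarz inequality. That inequality is not part of this chapter, so treat the lines below as a supplementary sketch; x and y are the deviations X − X̄ and Y − Ȳ, as in formula (2).

```latex
\Bigl( \sum xy \Bigr)^2 \le \Bigl( \sum x^2 \Bigr) \Bigl( \sum y^2 \Bigr)
\quad \Longrightarrow \quad
r^2 = \frac{\bigl( \sum xy \bigr)^2}{\sum x^2 \cdot \sum y^2} \le 1
\quad \Longrightarrow \quad
-1 \le r \le +1
```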

6.9 A Worked Case-Based Question

📋 Case-Based Question — Hours of Coaching and Test Score

An economics tutor records the number of hours of coaching (X) and the test score out of 50 (Y) for six of her students.
X (hours): 2, 4, 6, 8, 10, 12    Y (score): 18, 22, 30, 34, 40, 44. The tutor wants to know whether and how strongly coaching hours and scores are related.
Q1. Define correlation in your own words and state whether the relationship in the data above is positive or negative just by inspection.
L1 Remember
Answer: Correlation is the statistical measure of the direction and intensity of the relationship between two variables. As X (hours) rises from 2 to 12, Y (score) rises from 18 to 44 in step. The relationship is clearly positive.
Q2. Compute X̄, Ȳ, Σ(X − X̄)², Σ(Y − Ȳ)² and Σ(X − X̄)(Y − Ȳ), then use formula (2) to find r.
L3 Apply
Answer: X̄ = 42/6 = 7. Ȳ = 188/6 = 31.33. Deviations x: −5, −3, −1, 1, 3, 5; x²: 25, 9, 1, 1, 9, 25 (sum = 70). Deviations y: −13.33, −9.33, −1.33, 2.67, 8.67, 12.67; y²: 177.78, 87.11, 1.78, 7.11, 75.11, 160.44 (sum ≈ 509.33). Cross-products xy: 66.67, 28.0, 1.33, 2.67, 26.0, 63.33 (sum ≈ 188.0). r = 188.0 / √(70 × 509.33) = 188.0 / √35,653 ≈ 188.0 / 188.8 ≈ 0.996.
Q3. The tutor concludes from r = 0.996 that "more coaching causes higher scores". Critically evaluate this claim using one property of correlation.
L4 Analyse
Answer: The conclusion mixes correlation with causation. r = 0.996 only tells us that hours and scores are very strongly linearly related. It does not prove that coaching is the cause. A third variable — for example, student motivation — could drive both how many hours a student attends and how well she performs. The tutor needs a controlled comparison (e.g. randomly assigning students to different hours) before claiming causation.
Q4. Suppose the tutor changes units — she records hours in minutes (multiply X by 60) and score as a percentage out of 100 (multiply Y by 2). What happens to r? Justify using the relevant property.
L5 Evaluate
Answer: r remains unchanged at 0.996. By Property 4, the correlation coefficient is independent of the change of origin and scale: defining U = X/(1/60) = 60X and V = 2Y leaves rUV = rXY because both scale factors are positive. This is exactly why r is so useful — it does not depend on the units of measurement.
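The arithmetic in Q2 is long enough that a quick machine check is reassuring. The sketch below recomputes the three sums from the tutor's data and then applies formula (2); the function name `deviation_sums` is an illustrative choice.

```python
import math

hours = [2, 4, 6, 8, 10, 12]        # X: hours of coaching
score = [18, 22, 30, 34, 40, 44]    # Y: test score out of 50

def deviation_sums(X, Y):
    """Return the sums of x squared, y squared and xy about the means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    return sxx, syy, sxy

sxx, syy, sxy = deviation_sums(hours, score)
r = sxy / math.sqrt(sxx * syy)      # formula (2)

print(round(sxx, 2), round(syy, 2), round(sxy, 2))
print(round(r, 3))
```

The sums should agree with the hand calculation up to rounding in the second decimal place.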

6.10 Assertion–Reason Questions

⚖ Assertion–Reason Questions (Class 11)

Choose: (A) Both A and R are true and R is the correct explanation of A. (B) Both A and R are true but R is not the correct explanation of A. (C) A is true, R is false. (D) A is false, R is true.

Assertion (A): A high positive correlation between ice-cream sales and deaths by drowning proves that eating ice-cream causes drowning.
Reason (R): Correlation measures covariation, not causation; a hidden third variable like temperature can drive both quantities.
Correct: (D) — Assertion is false. A high correlation in this case is a textbook example of spurious correlation: the lurking variable, summer temperature, raises both ice-cream sales and the number of swimmers (some of whom drown). Reason is true and is exactly the property of correlation that exposes the false claim.
Assertion (A): Karl Pearson's coefficient of correlation is unaffected by a change of origin and a change of scale (with same-sign scale factors).
Reason (R): Defining U = (X − A)/B and V = (Y − C)/D yields rUV = rXY, which underlies the step-deviation method of computation.
Correct: (A) — Both statements are true and R is the correct explanation of A. This invariance is exactly why we are free to subtract assumed means and divide by common factors during calculation without altering the answer.
Assertion (A): If Karl Pearson's r = 0, the variables X and Y must be entirely independent of each other.
Reason (R): Karl Pearson's coefficient measures only linear association; a strong non-linear relation can still exist when r equals zero.
Correct: (D). Assertion is false. r = 0 only rules out a linear relationship; the variables may still follow a curved (e.g. parabolic) relation. Reason is true and explains the limit of Pearson's coefficient. The data X = −3, −2, −1, 1, 2, 3 and Y = 9, 4, 1, 1, 4, 9 (Y = X²) gives r = 0 even though Y is fully determined by X.
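The parabola example in the last answer can be verified directly. This is a minimal sketch with an illustrative helper `pearson_r`; because of floating-point rounding the computed value may differ from zero by a negligible amount, so it is rounded before printing.

```python
import math

def pearson_r(X, Y):
    """Karl Pearson's r in deviation form."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)

X = [-3, -2, -1, 1, 2, 3]
Y = [x ** 2 for x in X]     # Y = X squared: 9, 4, 1, 1, 4, 9

r_parabola = pearson_r(X, Y)
print(abs(round(r_parabola, 6)))   # 0.0
```

The positive and negative halves of the parabola cancel each other in Σxy, so r sees no linear trend even though Y depends entirely on X.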

Continue to Part 2 — Step-Deviation Method, Spearman's Rank Correlation, NCERT Exercises and Summary.

Frequently Asked Questions — Correlation — Types, Scatter Diagram and Karl Pearson's Coefficient

What is correlation in NCERT Class 11 Statistics Chapter 6?

Correlation is a statistical measure that describes the direction and strength of the relationship between two variables — for example, height and weight, or price and quantity demanded. NCERT Class 11 Statistics Chapter 6 explains that correlation is positive when the variables move in the same direction, negative when they move in opposite directions, and zero when there is no systematic relationship. Correlation does not imply causation: two variables may move together because of a common third factor or by coincidence, so further analysis is needed before claiming that one variable causes the other.

What are the types of correlation in NCERT Class 11 Statistics?

NCERT Class 11 Statistics Chapter 6 Part 1 lists three pairs of correlation types. Direction: positive correlation (both variables move together) versus negative correlation (variables move in opposite directions). Linearity: linear correlation (constant rate of change, points lie near a straight line) versus non-linear correlation (variable rate, curved scatter). Number of variables: simple correlation (two variables) versus multiple correlation (three or more variables, beyond Class 11 syllabus). Strength can also be classified as perfect (r = ±1), high (|r| close to 1), moderate or low. The scatter diagram and Karl Pearson's r both reveal these types.

What is a scatter diagram in NCERT Class 11 Statistics Chapter 6?

A scatter diagram is a graphical representation of paired observations of two variables on a coordinate plane, with one variable on the X-axis and the other on the Y-axis. NCERT Class 11 Statistics Chapter 6 Part 1 explains that the pattern formed by the dots reveals the direction, form and strength of correlation. Points clustering tightly around an upward-sloping line indicate strong positive correlation, a downward-sloping cluster indicates negative correlation, a random scatter indicates no correlation, and a curved cluster indicates non-linear correlation. The scatter diagram is the first step before calculating any correlation coefficient.

What is the formula for Karl Pearson's coefficient of correlation in Class 11?

Karl Pearson's coefficient of correlation r is calculated as r = Σ(x − x̄)(y − ȳ) / (n · σx · σy) for individual values, or equivalently r = Σdx·dy / √(Σdx² · Σdy²) using deviations from the mean. NCERT Class 11 Statistics Chapter 6 Part 1 also gives the formula using actual values: r = (NΣxy − ΣxΣy) / √((NΣx² − (Σx)²)(NΣy² − (Σy)²)). The coefficient r always lies between −1 and +1, has no units, is unaffected by change of origin or scale, and measures only linear association between two quantitative variables.

What does it mean if Karl Pearson's r is 0.85 in Class 11 Statistics?

A Karl Pearson coefficient of correlation r of 0.85 indicates a strong positive linear correlation between the two variables — they move together in the same direction, and a straight line fits the scatter pattern very well. NCERT Class 11 Statistics Chapter 6 Part 1 gives a rough interpretation guide: |r| above 0.75 is high correlation, between 0.50 and 0.75 is moderate, between 0.25 and 0.50 is low, and below 0.25 is negligible. However, r = 0.85 only describes the linear pattern; the underlying causation must be argued separately, and outliers can inflate the apparent correlation, so a scatter diagram should always be checked.
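The rough guide quoted in the answer above can be turned into a small helper. This is only a sketch: the function name `describe_r` is hypothetical, and the way values sitting exactly on a boundary (0.25, 0.50, 0.75) are assigned is an assumption, since the guide does not say.

```python
def describe_r(r):
    """Label r using the rough bands: high, moderate, low, negligible."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size > 0.75:
        strength = "high"
    elif size >= 0.50:      # assumption: 0.50 and 0.75 count as moderate
        strength = "moderate"
    elif size >= 0.25:      # assumption: 0.25 counts as low
        strength = "low"
    else:
        strength = "negligible"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_r(0.85))    # ('high', 'positive')
print(describe_r(0.644))   # ('moderate', 'positive')
print(describe_r(-0.1))    # ('negligible', 'negative')
```

A label like this is a reading aid only; it says nothing about causation and does not replace a look at the scatter diagram.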

What are the main properties of correlation coefficient r in NCERT Class 11?

Four properties of Karl Pearson's correlation coefficient r matter most in NCERT Class 11 Statistics Chapter 6 Part 1. First, r is a pure number with no units of measurement. Second, r always lies between −1 and +1, where r = +1 means perfect positive correlation and r = −1 means perfect negative correlation. Third, r is unaffected by change of origin (adding/subtracting a constant) and change of scale (multiplying/dividing by a positive constant). Fourth, r is symmetric: the correlation between X and Y equals the correlation between Y and X. These properties make r a robust, comparable measure across different data sets.
