Correlation — Types, Scatter Diagram and Karl Pearson's Coefficient
So far in this textbook, every chapter has worked with one variable at a time — heights, marks, incomes, prices. Real economics, however, is rarely about one variable in isolation. The price of tomatoes moves with their supply at the local mandi. The number of ice-creams sold moves with the day's temperature. Hours of study move with the marks scored in the next test. Statisticians call such co-movement correlation. Part 1 builds the idea from scratch: what counts as a relationship, the difference between cause-and-effect and a chance coincidence, the visual shortcut of the scatter diagram, and finally the most famous numerical measure of all — Karl Pearson's coefficient of correlation, r.
6.1 From One Variable to Two — Why Correlation Matters
Earlier chapters showed how to summarise a single mass of data: a mean, a median, a standard deviation. Now imagine two columns of figures sitting side by side. As summer heat rises, hill stations fill with visitors and the queues outside ice-cream parlours grow. The day's temperature and the day's ice-cream sales are clearly not independent — they tend to rise together. As fresh tomatoes flood the local mandi, the price drops from Rs 40 a kilo to Rs 4 a kilo. Supply and price are clearly not independent either — they tend to move in opposite directions.
The branch of statistics that studies such co-movement systematically is called correlation analysis. NCERT lists the three questions it sets out to answer:
6.2 Types of Relationship
6.2.1 Cause-and-Effect, Coincidence and a Lurking Third Variable
Not every observed co-movement carries the same meaning. NCERT carefully distinguishes three situations:
- Cause-and-effect relationship. The movement in quantity demanded with the price of a commodity is the centrepiece of demand theory (which you will meet in Class 12). Low rainfall in a season tends to depress agricultural productivity. Here one variable genuinely produces the change in the other.
- Coincidence. The arrival of migratory birds in a sanctuary may correlate with the local birth rate, and shoe size may correlate with the money in your pocket. The relationships are real in the data — but they are mere coincidence and have no causal interpretation.
- A lurking third variable. In the famous textbook example, brisk ice-cream sales correlate with deaths by drowning. Eating ice-cream does not cause drowning. The unseen third variable, rising summer temperature, drives both: heat boosts ice-cream sales, and heat drives more people to swimming pools, where some unfortunately drown. This kind of false correlation through a hidden common cause is called a spurious correlation.
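The lurking-variable case can be sketched numerically. The figures below are invented purely for illustration (they are not NCERT data): six days of temperature, ice-cream sales and drowning incidents, where both series are built to rise with the heat. `pearson_r` is a hypothetical helper that applies the standard deviation-form formula for Karl Pearson's r, which Section 6.5 develops.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r in deviation form: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Invented illustrative figures, not real observations.
temperature = [20, 24, 28, 32, 36, 40]            # degrees Celsius
ice_cream_sales = [110, 135, 170, 190, 230, 245]  # units sold
drownings = [1, 2, 2, 4, 5, 6]                    # incidents

r_ice_drown = pearson_r(ice_cream_sales, drownings)
r_temp_ice = pearson_r(temperature, ice_cream_sales)
r_temp_drown = pearson_r(temperature, drownings)

print(f"ice-cream vs drownings  : {r_ice_drown:.2f}")
print(f"temperature vs ice-cream: {r_temp_ice:.2f}")
print(f"temperature vs drownings: {r_temp_drown:.2f}")
```

With these made-up numbers all three coefficients come out above 0.9. The high value for ice-cream against drownings says nothing about one causing the other; it only reflects that both follow temperature.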
6.2.2 Independent vs Dependent Variables
Even when correlation is causal, statisticians give the two variables different names. The independent variable is the one whose change is treated as the cause; the dependent variable changes in response. In the demand example, price is normally the independent variable and quantity demanded the dependent one. In the rainfall example, rainfall (the natural input) is independent and agricultural yield (the response) is dependent. Correlation itself is symmetric — r between X and Y is the same number as r between Y and X — but cause-and-effect language distinguishes the two roles.
6.2.3 Positive and Negative Correlation
Correlation is positive when the two variables move together in the same direction. Income and consumption: when income rises consumption rises, when income falls consumption falls. Sale of ice-cream and temperature also move in the same direction. Correlation is negative when the variables move in opposite directions. When the price of apples falls, demand for apples increases. When you spend more time studying, your chance of failing falls; when you study less, the chance of low marks rises. Both are negative correlations.
6.2.4 Linear and Non-Linear Correlation
For simplicity, NCERT assumes that the correlation we measure is linear correlation. A relationship is said to be linear if it can be represented by a straight line on graph paper: equal increments in X are accompanied by equal increments (or decrements) in Y. When the underlying co-movement is curved (the points lie on a parabola or some other curve), the relationship is non-linear, and Karl Pearson's coefficient is not the right tool for the job. Visually inspecting a scatter diagram is the easiest way to tell linear from non-linear before plunging into formulae.
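A small sketch of why Karl Pearson's coefficient is the wrong tool for a curved relationship: below, Y is exactly X squared, so Y is completely determined by X, yet r comes out as zero. The data and the helper name `pearson_r` are illustrative assumptions, not taken from NCERT.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r in deviation form: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# A perfect parabola, symmetric about X = 0 (illustrative values).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_parabola = pearson_r(xs, ys)
print(f"r for Y = X^2: {r_parabola:.3f}")
```

The positive and negative products of deviations cancel exactly here, which is why a value of r near zero only rules out a linear relationship, not a relationship of any kind.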
6.3 Three Tools for Measuring Correlation
NCERT names three classroom tools for studying correlation:
- Scatter diagram — a visual technique. The values of the two variables are plotted as points on a graph; the closeness and overall direction of the cloud tell us, at a glance, the form and strength of any relationship. It does not produce a number.
- Karl Pearson's coefficient of correlation (r) — a precise numerical measure of linear association, valid only when the relation between X and Y is roughly straight-line.
- Spearman's rank correlation — a measure of linear association between the ranks assigned to individual items, useful when variables (like beauty or honesty) cannot be measured numerically. We meet this in Part 2.
6.4 The Scatter Diagram — A Visual Window
A scatter diagram is the simplest way to see whether two variables are related. Each pair of values (X, Y) is plotted as a single point on graph paper. The cloud of points immediately tells us two things: (i) the direction of any relationship (is the cloud sloping up, down or going nowhere?) and (ii) the tightness of the relationship (do the points hug a line closely or scatter widely around it?). When all the points lie exactly on a line, the correlation is perfect; when the points are widely dispersed, the correlation is weak. The relationship is called linear if the cloud lies near a straight line.
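For readers who want to try this without graph paper, here is a rough text-only sketch of a scatter diagram. The function name `ascii_scatter` and the grid size are arbitrary choices for illustration, and the data are the schooling and yield figures from Worked Example 1 in Section 6.6. The sketch assumes that neither variable is constant, otherwise the scaling step would divide by zero.

```python
def ascii_scatter(xs, ys, width=25, height=10):
    """Return a rough text scatter plot with '*' marking each (x, y) pair."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)

years = [0, 2, 4, 6, 8, 10, 12]   # X: years of schooling
yields = [4, 4, 6, 10, 10, 8, 7]  # Y: yield per acre, Rs '000

plot = ascii_scatter(years, yields)
print(plot)
```

The printed cloud drifts upward from left to right but does not sit on a single line, which matches a positive, less-than-perfect correlation.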
Collect data on the height, weight and marks scored in any two subjects of all the students in your class. Draw three scatter diagrams: height vs weight, height vs marks, and marks in subject 1 vs marks in subject 2. What kind of relationship does each show?
6.5 Karl Pearson's Coefficient of Correlation
The scatter diagram gives a feel; what we now need is a single number that captures the same information precisely. Karl Pearson's coefficient of correlation, also called the product-moment correlation coefficient, gives a numerical value to the degree of linear association between two variables X and Y. NCERT writes the underlying formulae step by step, starting from familiar quantities.
6.5.1 Building Blocks — Means, Variances, Covariance
Let X₁, X₂, …, X_N be N values of X with corresponding values Y₁, Y₂, …, Y_N of Y. The arithmetic means are the familiar quantities X̄ = ΣX / N and Ȳ = ΣY / N.
The variances measure the spread of X and Y around their own means: σx² = Σ(X − X̄)² / N and σy² = Σ(Y − Ȳ)² / N.
The standard deviations σx and σy are the positive square roots of the variances. The truly new quantity is the covariance, which measures how X and Y vary together: Cov(X, Y) = Σ(X − X̄)(Y − Ȳ) / N = Σxy / N,
where x = X − X̄ and y = Y − Ȳ are the deviations from the respective means. If both deviations are positive on the same observation (X above average and Y above average), or both negative (both below average), the product xy is positive. If they have opposite signs, xy is negative. The sign of the covariance therefore tells us the direction of the co-movement; the sign of r will follow.
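The sign argument above can be checked in a few lines. This is a minimal sketch using the schooling and yield figures from Worked Example 1 in Section 6.6; the function name `covariance` is an illustrative choice, and it divides by N as in this chapter's definition.

```python
def covariance(xs, ys):
    """Population covariance: sum of products of deviations, divided by N."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / n

years = [0, 2, 4, 6, 8, 10, 12]   # X: years of schooling
yields = [4, 4, 6, 10, 10, 8, 7]  # Y: yield per acre, Rs '000

cov = covariance(years, yields)
direction = "same direction" if cov > 0 else "opposite directions" if cov < 0 else "no linear co-movement"
print(cov, direction)
```

Here the sum of products is 42 and N is 7, so the covariance is 6 and positive: above-average schooling tends to go with above-average yield.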
6.5.2 Three Equivalent Formulas for r
The product-moment correlation r is defined as the covariance divided by the product of the two standard deviations:
r = Cov(X, Y) / (σx σy) = Σxy / (N σx σy)   (1)
By substituting the definitions of σx and σy the same number can be written in the more familiar deviation form:
r = Σ(X − X̄)(Y − Ȳ) / [√Σ(X − X̄)² · √Σ(Y − Ȳ)²]   (2)
And, because squaring out the deviations gives expressions in raw values only, NCERT also presents the direct formula that avoids computing X̄ and Ȳ explicitly:
r = [ΣXY − (ΣX)(ΣY)/N] / [√(ΣX² − (ΣX)²/N) · √(ΣY² − (ΣY)²/N)]   (3)
An algebraically equivalent rearrangement multiplies numerator and denominator by N:
r = [N ΣXY − (ΣX)(ΣY)] / [√(N ΣX² − (ΣX)²) · √(N ΣY² − (ΣY)²)]   (4)
All four formulas give the same number for the same data set. Choose whichever has the simplest arithmetic for the figures in front of you.
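The move from the deviation form to the raw-value form rests on one expansion. A short derivation is sketched below, using only the definitions of the means, so that ΣX = N X̄ and ΣY = N Ȳ.

```latex
\begin{aligned}
\sum (X-\bar{X})(Y-\bar{Y})
  &= \sum XY - \bar{Y}\sum X - \bar{X}\sum Y + N\bar{X}\bar{Y} \\
  &= \sum XY - \frac{(\sum X)(\sum Y)}{N}, \\[4pt]
\sum (X-\bar{X})^2 &= \sum X^2 - \frac{(\sum X)^2}{N}, \qquad
\sum (Y-\bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{N}.
\end{aligned}
```

Substituting these three identities into the deviation form gives the direct formula in raw values. Multiplying its numerator by N and each square-root term by √N (so the denominator is also multiplied by N) leaves the ratio unchanged and gives the last rearrangement.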
6.6 Worked Example 1 — Years of Schooling and Annual Yield per Acre
| X (Years) | X − X̄ | (X − X̄)² | Y (Yield, Rs '000) | Y − Ȳ | (Y − Ȳ)² | (X − X̄)(Y − Ȳ) |
|---|---|---|---|---|---|---|
| 0 | −6 | 36 | 4 | −3 | 9 | 18 |
| 2 | −4 | 16 | 4 | −3 | 9 | 12 |
| 4 | −2 | 4 | 6 | −1 | 1 | 2 |
| 6 | 0 | 0 | 10 | 3 | 9 | 0 |
| 8 | 2 | 4 | 10 | 3 | 9 | 6 |
| 10 | 4 | 16 | 8 | 1 | 1 | 4 |
| 12 | 6 | 36 | 7 | 0 | 0 | 0 |
| ΣX = 42 | — | Σ(X−X̄)² = 112 | ΣY = 49 | — | Σ(Y−Ȳ)² = 38 | Σxy = 42 |
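Means: X̄ = 42 / 7 = 6 years, Ȳ = 49 / 7 = Rs 7 thousand. Standard deviations: σx = √(112 / 7) = √16 = 4 and σy = √(38 / 7) ≈ √5.43 ≈ 2.33.
Plug into formula (1): r = Σxy / (N σx σy) = 42 / (7 × 4 × 2.33) ≈ 42 / 65.24 ≈ 0.644.
Or equivalently, using formula (2) directly: r = 42 / (√112 × √38) = 42 / √4256 ≈ 42 / 65.24 ≈ 0.644.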
Both routes give r ≈ 0.644. The number is positive, so years of schooling and annual yield per acre move in the same direction. The value is also fairly large, closer to +1 than to 0, so the linear association is moderately strong. This underlines the importance of farmers' education: more schooling tends to be linked with higher yield per acre.
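The arithmetic of this worked example can be cross-checked with a short script. The sketch below recomputes the column totals from the table and then evaluates r four ways: covariance over the product of standard deviations, the deviation form, the raw-value form, and the raw-value form with numerator and denominator multiplied by N. The variable names are illustrative choices.

```python
from math import sqrt

X = [0, 2, 4, 6, 8, 10, 12]   # years of schooling
Y = [4, 4, 6, 10, 10, 8, 7]   # yield per acre, Rs '000
N = len(X)

# Deviations from the means and their column totals.
mean_x, mean_y = sum(X) / N, sum(Y) / N
x = [xi - mean_x for xi in X]
y = [yi - mean_y for yi in Y]
sum_xx = sum(a * a for a in x)              # 112
sum_yy = sum(b * b for b in y)              # 38
sum_xy = sum(a * b for a, b in zip(x, y))   # 42

# Form 1: covariance divided by the product of standard deviations.
sd_x, sd_y = sqrt(sum_xx / N), sqrt(sum_yy / N)
r1 = sum_xy / (N * sd_x * sd_y)

# Form 2: deviation form.
r2 = sum_xy / (sqrt(sum_xx) * sqrt(sum_yy))

# Forms 3 and 4: raw values only.
SX, SY = sum(X), sum(Y)
SXY = sum(a * b for a, b in zip(X, Y))
SXX = sum(a * a for a in X)
SYY = sum(b * b for b in Y)
r3 = (SXY - SX * SY / N) / (sqrt(SXX - SX**2 / N) * sqrt(SYY - SY**2 / N))
r4 = (N * SXY - SX * SY) / (sqrt(N * SXX - SX**2) * sqrt(N * SYY - SY**2))

for name, value in [("r1", r1), ("r2", r2), ("r3", r3), ("r4", r4)]:
    print(name, round(value, 3))
```

All four values round to 0.644. The choice between the forms is purely a matter of which arithmetic is least tedious by hand.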
6.7 Properties of the Correlation Coefficient
NCERT lists six core properties of r. Memorise them — they are common one-mark and two-mark questions.
6.7.1 The Strength Scale
For quick interpretation, statisticians read r against a rough scale of |r|: values of r close to +1 or −1 are "strong", values near zero are "weak", and the middle band is "moderate".
6.7.2 Why r and Not Just Covariance?
The covariance Σxy / N is also positive when X and Y move together. Why bother dividing by σx · σy? The answer is units. Covariance carries the units of X × Y (e.g. rupees × kilograms), so its size changes whenever we change the scale of measurement. Dividing by the two standard deviations cancels the units and bounds the result between −1 and +1, making r a clean dimensionless yardstick that can be compared across studies.
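A quick sketch of the units argument, using the Worked Example 1 figures: the same yields are expressed once in thousands of rupees and once in rupees. The helper names `covariance` and `pearson_r` are illustrative.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: sum of products of deviations, divided by N."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(covariance(xs, xs))  # the covariance of X with itself is its variance
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

years = [0, 2, 4, 6, 8, 10, 12]
yield_thousands = [4, 4, 6, 10, 10, 8, 7]            # Rs '000
yield_rupees = [v * 1000 for v in yield_thousands]   # same data in Rs

cov_thousands = covariance(years, yield_thousands)
cov_rupees = covariance(years, yield_rupees)
r_thousands = pearson_r(years, yield_thousands)
r_rupees = pearson_r(years, yield_rupees)

print(cov_thousands, cov_rupees)
print(round(r_thousands, 3), round(r_rupees, 3))
```

The covariance is multiplied by 1000 when the unit changes, while r stays at about 0.644 in both cases.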
6.8 Reading the Coefficient — Interpretation Practice
What r tells you
- Sign: direction of linear association (+, − or 0).
- Magnitude: tightness of the linear relationship.
- Bounded: |r| ≤ 1, easy to compare across data sets.
- Unit-free: survives a change of measurement units.
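The bound |r| ≤ 1 is not a convention but a consequence of the Cauchy-Schwarz inequality applied to the deviations x = X − X̄ and y = Y − Ȳ. A brief sketch, assuming neither variable is constant (so both sums of squares are positive):

```latex
\left(\sum xy\right)^{2} \le \left(\sum x^{2}\right)\left(\sum y^{2}\right)
\;\Longrightarrow\;
r^{2} = \frac{\left(\sum xy\right)^{2}}{\sum x^{2}\,\sum y^{2}} \le 1
\;\Longrightarrow\;
-1 \le r \le 1 .
```

Equality holds only when every y is the same constant multiple of the corresponding x, that is, when the points lie exactly on a straight line. Any reported value outside this range therefore signals an arithmetic slip.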
What r does not tell you
- Whether X causes Y (or vice versa).
- Whether a non-linear relation exists when r is near 0.
- Whether outliers are inflating or deflating the value.
- Anything about the slope of the line of best fit (that is regression, not correlation).
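Two of the blind spots listed above can be seen with small invented data sets (illustrative values only, not NCERT data). The first pair of series has very different slopes but the same r; the second shows a single outlier lifting r from zero to a high value. `pearson_r` is a hypothetical helper using the deviation form.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r in deviation form: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# 1. Slope: Y = 2X and Y = 10X are both perfectly linear.
xs = [1, 2, 3, 4, 5]
r_gentle = pearson_r(xs, [2 * x for x in xs])
r_steep = pearson_r(xs, [10 * x for x in xs])
print(round(r_gentle, 3), round(r_steep, 3))

# 2. Outlier: five points with no linear pattern, then one extreme point added.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 1, 3, 1, 2]
r_base = pearson_r(base_x, base_y)
r_with_outlier = pearson_r(base_x + [20], base_y + [20])
print(round(r_base, 3), round(r_with_outlier, 3))
```

Both slopes give r equal to 1, so r cannot distinguish a gentle line from a steep one. In the second case r moves from about 0 to roughly 0.97 because of one point, which is why the scatter diagram should be inspected before trusting the number.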
For two variables X and Y, you find r = −0.85. A friend computes r between Y and X and reports r = +0.85. Who is correct — or could both be? Explain in your own words using the formula.
A student reports r = 1.32 for the correlation between two variables. Without seeing his data, what would you tell him? Justify using a property of r.
6.9 A Worked Case-Based Question
📋 Case-Based Question — Hours of Coaching and Test Score
X (hours): 2, 4, 6, 8, 10, 12
Y (score): 18, 22, 30, 34, 40, 44
The tutor wants to know whether and how strongly coaching hours and scores are related.
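One possible way to approach the tutor's question is to compute r with the deviation form. The sketch below uses the six pairs given in the case; the variable names are illustrative.

```python
from math import sqrt

hours = [2, 4, 6, 8, 10, 12]
scores = [18, 22, 30, 34, 40, 44]
n = len(hours)

mean_h = sum(hours) / n    # 7 hours
mean_s = sum(scores) / n   # about 31.33 marks

dh = [h - mean_h for h in hours]
ds = [s - mean_s for s in scores]

sum_hs = sum(a * b for a, b in zip(dh, ds))  # sum of products of deviations
sum_hh = sum(a * a for a in dh)              # sum of squared deviations of hours
sum_ss = sum(b * b for b in ds)              # sum of squared deviations of scores

r = sum_hs / sqrt(sum_hh * sum_ss)
print(round(r, 3))
```

The sums work out to Σxy = 188, Σx² = 70 and Σy² ≈ 509.33, giving r ≈ 0.996: a positive and very strong linear association. As the section on interpretation stresses, this alone does not prove that coaching causes the higher scores.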
6.10 Assertion–Reason Questions
Choose: (A) Both A and R are true and R is the correct explanation of A. (B) Both A and R are true but R is not the correct explanation of A. (C) A is true, R is false. (D) A is false, R is true.
Continue to Part 2 — Step-Deviation Method, Spearman's Rank Correlation, NCERT Exercises and Summary.
Frequently Asked Questions — Correlation — Types, Scatter Diagram and Karl Pearson's Coefficient
What is correlation in NCERT Class 11 Statistics Chapter 6?
Correlation is a statistical measure that describes the direction and strength of the relationship between two variables — for example, height and weight, or price and quantity demanded. NCERT Class 11 Statistics Chapter 6 explains that correlation is positive when the variables move in the same direction, negative when they move in opposite directions, and zero when there is no systematic relationship. Correlation does not imply causation: two variables may move together because of a common third factor or by coincidence, so further analysis is needed before claiming that one variable causes the other.
What are the types of correlation in NCERT Class 11 Statistics?
NCERT Class 11 Statistics Chapter 6 Part 1 lists three pairs of correlation types. Direction: positive correlation (both variables move together) versus negative correlation (variables move opposite). Linearity: linear correlation (constant rate of change, points lie near a straight line) versus non-linear correlation (variable rate, curved scatter). Number of variables: simple correlation (two variables) versus multiple correlation (three or more variables, beyond Class 11 syllabus). Strength can also be classified as perfect (r = ±1), high (|r| close to 1), moderate or low. The scatter diagram and Karl Pearson's r both reveal these types.
What is a scatter diagram in NCERT Class 11 Statistics Chapter 6?
A scatter diagram is a graphical representation of paired observations of two variables on a coordinate plane, with one variable on the X-axis and the other on the Y-axis. NCERT Class 11 Statistics Chapter 6 Part 1 explains that the pattern formed by the dots reveals the direction, form and strength of correlation. Points clustering tightly around an upward-sloping line indicate strong positive correlation, downward-sloping cluster indicates negative correlation, a random scatter indicates no correlation, and a curved cluster indicates non-linear correlation. The scatter diagram is the first step before calculating any correlation coefficient.
What is the formula for Karl Pearson's coefficient of correlation in Class 11?
Karl Pearson's coefficient of correlation r is calculated as r = Σ(X − X̄)(Y − Ȳ) / (N · σx · σy) for individual values, or equivalently r = Σxy / √(Σx² · Σy²), where x = X − X̄ and y = Y − Ȳ are the deviations from the means. NCERT Class 11 Statistics Chapter 6 Part 1 also gives the formula using actual values: r = (NΣXY − ΣXΣY) / √((NΣX² − (ΣX)²)(NΣY² − (ΣY)²)). The coefficient r always lies between −1 and +1, has no units, is unaffected by change of origin or scale, and measures only linear association between two quantitative variables.
What does it mean if Karl Pearson's r is 0.85 in Class 11 Statistics?
A Karl Pearson coefficient of correlation r of 0.85 indicates a strong positive linear correlation between the two variables — they move together in the same direction, and a straight line fits the scatter pattern very well. NCERT Class 11 Statistics Chapter 6 Part 1 gives a rough interpretation guide: |r| above 0.75 is high correlation, between 0.50 and 0.75 is moderate, between 0.25 and 0.50 is low, and below 0.25 is negligible. However, r = 0.85 only describes the linear pattern; the underlying causation must be argued separately, and outliers can inflate the apparent correlation, so a scatter diagram should always be checked.
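The rough guide in this answer can be written as a small lookup. The function name `describe_r` is hypothetical, and the treatment of values that fall exactly on a boundary (0.25, 0.50, 0.75) is an assumption, since the guide does not say which band they belong to.

```python
def describe_r(r):
    """Label r using the rough bands quoted above (boundary handling is assumed)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1; recheck the arithmetic")
    size = abs(r)
    if size > 0.75:
        strength = "high"
    elif size >= 0.50:
        strength = "moderate"
    elif size >= 0.25:
        strength = "low"
    else:
        strength = "negligible"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "zero"
    return f"{strength} {direction}"

print(describe_r(0.85))
print(describe_r(-0.3))
print(describe_r(0.1))
```

The three calls print "high positive", "low negative" and "negligible positive". A value such as 1.32 raises an error, because no valid data set can produce it.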
What are the main properties of correlation coefficient r in NCERT Class 11?
NCERT Class 11 Statistics Chapter 6 Part 1 lists four main properties of Karl Pearson's correlation coefficient r. First, r is a pure number with no units of measurement. Second, r always lies between −1 and +1, where r = +1 means perfect positive correlation and r = −1 means perfect negative correlation. Third, r is unaffected by change of origin (adding/subtracting a constant) and change of scale (multiplying/dividing by a positive constant). Fourth, r is symmetric — the correlation between X and Y equals the correlation between Y and X. These properties make r a robust, comparable measure across different data sets.
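The third and fourth properties can be checked on the Worked Example 1 figures. In this sketch the shifts (6 and 4) and the divisor (2) are arbitrary illustrative choices, and `pearson_r` is a hypothetical helper using the deviation form.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r in deviation form: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

X = [0, 2, 4, 6, 8, 10, 12]
Y = [4, 4, 6, 10, 10, 8, 7]

# Change of origin and scale: subtract a constant, divide by a positive constant.
U = [(x - 6) / 2 for x in X]
V = [(y - 4) / 2 for y in Y]

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)   # same as r_xy
r_yx = pearson_r(Y, X)   # symmetry: same as r_xy

# A negative multiplier reverses the direction, so the sign of r flips.
r_flipped = pearson_r(X, [-y for y in Y])

print(round(r_xy, 3), round(r_uv, 3), round(r_yx, 3), round(r_flipped, 3))
```

The first three values agree at about 0.644, while the last is about −0.644, which is why the scale property is stated for a positive constant.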