Introduction to Statistics

Learn the fundamentals of statistics, including its importance, types of data, descriptive vs. inferential statistics, data collection methods, sampling techniques, and common biases in research.

What is Statistics?

Statistics is like a superpower for understanding numbers. It’s all about collecting, organizing, analyzing, and interpreting data. When you hear the word "data," think of it as information. It could be numbers, words, or even pictures.

Statistics helps us answer questions like, “What’s typical?”, “What’s unusual?”, and “What happens next?”

Quick Links:
Descriptive and Inferential Statistics
Types of Data- Categorical, Numerical, Discrete and Continuous
Collecting data – Population vs Sample
Sampling – How to Pick the Right People
Bias in Data Collection

Descriptive and Inferential Statistics

There are two main branches of statistics, and they each do something different.

Descriptive Statistics
This is all about summarizing the data you have. Imagine you just took a survey about everyone’s favorite pizza toppings in your school. Descriptive stats will give you numbers like averages, totals, or tables to show what people said.

Example:

300 people surveyed.
150 chose pepperoni.
50% of people LOVE pepperoni.

It’s like holding up a snapshot of what happened.

Inferential Statistics
This is where stats gets a little more like magic. Inferential stats lets us take a small sample of data and make predictions about a bigger group.

Example:

You survey 200 students out of 5,000 in your college.
Inferential statistics helps you guess what all 5,000 students probably think based on the smaller group!

It’s like using a trailer to figure out if you’ll like the whole movie (but with math).

Types of Data

There are two big categories of data you’ll run into –

Categorical or Qualitative Data and
Numerical or Quantitative Data

Categorical Data

This type of data describes qualities or characteristics. It’s anything that can’t be counted with numbers.

Example: Colors of cars (red, blue, black).
Another example: Types of pets (dog, cat, hamster)

Categorical data is further divided into:

1. Nominal Data – “Name It” Data

Think: Categories with no ranking
Nominal = “Name only.” These are just labels with no order—like different flavors of ice cream. You can’t say one is "greater" or "less than" another (even if you have a favorite!).

Examples:

Eye color: Brown, Blue, Green
Types of pets: Dog, Cat, Parrot
Car brands: Toyota, Ford, BMW

Key rule? You can’t logically sort them from best to worst. They're just different, not ranked!

2. Ordinal Data – “Order Matters” Data

Think: Categories WITH a ranking
Ordinal sounds like "order," so there’s a meaningful sequence, but the gap between ranks isn’t always equal.

Examples:

T-shirt sizes: Small, Medium, Large, XL (XL is bigger than Large, but by how much?)
Education levels: High School, Bachelor’s, Master’s, PhD
Customer reviews: Bad, Okay, Good, Excellent

Key rule? There’s a clear ranking, but you don’t know exact differences between levels.

Numerical Data

This data is all about quantities (how much or how many).

Example 1: Your shoe size.
Example 2: The number of fries in your fast food order.

Easy, right? If it’s words, it’s qualitative. If it’s numbers, it’s quantitative.

Just like Categorical data, Numerical data also has two types:

1. Interval Data – “Meaningful Gaps, But No True Zero”

Think: Numbers where differences make sense, but zero is just a placeholder.
With interval data, you can add and subtract values, and the gaps between numbers are equal. But here’s the catch: zero doesn’t mean “nothing.”

Examples:

Temperature in Celsius or Fahrenheit (0°C doesn’t mean “no temperature,” it’s just another point on the scale.)
Years on a timeline (The year 0 doesn’t mean “nothing happened before,” it’s just a starting point.)

Key rule? You can’t multiply or divide meaningfully. Saying “20°C is twice as hot as 10°C” makes no sense because 0°C isn’t the true “start” of heat.

2. Ratio Data – “The Real Deal with a True Zero”

Think: Numbers where zero means “nothing” and all math works!
Ratio data is like interval data, but with a true zero—meaning zero means none of what you’re measuring. This lets you do all math operations: add, subtract, multiply, and divide.

Examples:

Height and weight (0 kg means no weight, and 100 kg is truly twice as heavy as 50 kg.)
Income ($0 means no money, and $100K is twice as much as $50K.)
Distance traveled (0 km means you didn’t move.)

Key rule? Zero means nothing (it’s not just a placeholder), so saying “twice as much” actually makes sense.

Discrete and Continuous Data

Discrete and continuous data aren’t separate types of data. They describe how numbers behave within interval and ratio data (because those are numerical).

1. Discrete Data – “Countable and No Decimals”

Think: Whole numbers, things you can count.
Discrete data consists of specific, separate values—you can’t have fractions or decimals in between.

Examples:

Number of students in a class (You can have 25 students, but not 25.3 students, though that would be funny to think about.)
Number of pets someone owns (You can have 2 dogs, but not 2.5 dogs.)
Shoe sizes (Even though they have decimal points, they come in fixed steps like 7, 7.5, 8, etc.)

Key rule? If you can count it in whole numbers, it’s discrete!

2. Continuous Data – “Infinite Possibilities”

Think: Measurable things that can have decimals.
Continuous data can take any value within a range—it’s measured, not counted.

Examples:

Height and weight (You can be 5.6 feet tall or 72.3 kg.)
Temperature (It can be 25.3°C or 98.76°F.)
Time taken to run a race (Could be exactly 12.483 seconds.)

Key rule? If you can measure it and it can have any decimal value, it’s continuous!

Here’s a cheat sheet to sum it up:

A screenshot of a whTable summarizing nominal, ordinal, interval, and ratio data types.ite sheet with black textAI-generated content may be incorrect.

If it’s just a name, it’s nominal.
If it’s ordered but uneven, it’s ordinal.
If zero is just a point, it’s interval.
If zero means none, it’s ratio.
If you count it, it’s discrete.
If you measure it, it’s continuous.

Collecting Data

Alright, so you know about different types of data. But where does all this data come from? How do you collect it in a way that actually makes sense?

Populations vs. Samples – Do You Need Everybody or Just Somebody?

Imagine you want to find out the average height of people in your city.

You could measure every single person (that’s your population).

OR you could measure a smaller group and use that to estimate the whole city (that’s your sample).

Population = The entire group you care about
Sample = A smaller group taken from the population

Real-Life Example:

If a company wants to know how all customers feel about their product, their population = all customers.
Since asking everyone is too expensive and time-consuming, they survey a few thousand customers instead. That’s their sample.

The Goal? Make sure your sample represents the population accurately—otherwise, your results will be garbage.

Sampling Methods – Picking the Right People

There are different ways to choose a sample, and some are way better than others.

Simple Random Sampling

Aka the “lottery” method.

Everyone has an equal chance of being picked.
Imagine putting everyone’s name in a hat and picking a few at random.
Example: A university randomly selects 200 students from a list of all students.

Pros: Super fair, easy to understand.

Cons: Not always practical (like if your population is spread across the world)

Stratified Sampling

Aka the “Organised Random” method

You split people into groups (strata) and then randomly pick from each group.
Example: If a school wants to survey students, but they want to make sure they have students from each grade level, they divide students by grade and then pick randomly within each grade.

Pros: More accurate because it ensures all groups are included.
Cons: Takes more effort to organize.

Cluster Sampling

Aka the “Random Groups” method

Instead of picking individuals, you randomly select entire groups (clusters).
Example: Instead of surveying individual people across a city, you randomly pick 5 neighbourhoods and survey everyone in them.

Pros: Super convenient for large populations.
Cons: Risky if the chosen clusters aren’t diverse.

Systemic Sampling

Aka the “Every Nth Person” Method

Pick every nth person from a list.
Example: If you have 1,000 people and need 100, you pick every 10th person.

Pros: Quick and simple.
Cons: If there’s a pattern in the list, it might accidentally create bias.

Convenience Sampling (Avoid This!)

You pick whoever is easy to reach—like your friends, coworkers, or people walking by.
Example: A student surveys only their classmates because it’s easier.

Pros: Fast and cheap.
Cons: Usually very biased and doesn’t represent the whole population.

Danger! If your sample is biased, your results will be useless.

Bias in Data Collection – How to Accidentally Ruin Your Study

Even if you pick a good sample, you can still mess things up. Here’s how:

Selection Bias

When you pick the “wrong people.”

Problem: Your sample doesn’t represent the whole population.

Example: A political poll only surveys rich neighbourhoods, so it misses the opinions of lower-income voters.

Non-Response Bias

When people ignore you.

Problem: The people who answer surveys aren’t random—they might have strong opinions.

Example: A restaurant emails surveys to customers, but only angry people respond.

Leading Questions

When you “force” an answer.

Problem: The way you ask a question pushes people toward a certain answer.

Example:
Bad question: “Don’t you agree that our product is amazing?”

Better question: “How would you rate our product?”

Social Desirability Bias

When people lie to look good.

Problem: People don’t always tell the truth if the truth makes them look bad.

Example: Asking people how much they exercise—they might say “5 times a week” even if they haven’t hit the gym in months.

The Golden Rules of Collecting Data

Use a sampling method that makes sense (random > convenience).
Make sure your sample represents the whole population.
Avoid bias in both who you pick and how you ask questions.
Check if people are answering honestly (or just trying to look good).

Get this right, and your data will be solid gold. Get it wrong, and your study is worthless.

So, you’ve done all the hard work of collecting data (hopefully without too much bias or messing up your sample). But right now, your data is probably just a messy list of numbers or responses.

Imagine this: You surveyed 100 people about their favourite ice cream Flavors. Now you have a big pile of answers like:

Chocolate, Vanilla, Chocolate, Strawberry, Mint, Chocolate, Vanilla, Mint, Chocolate…

If you just stare at this list, it doesn’t mean much. How do you make sense of it?

That’s where organizing data comes in! In the next chapter, we’ll turn raw data into easy-to-read tables, charts, and graphs so you can actually see patterns and trends—instead of drowning in numbers.

Up Next: We’ll learn how to count and sort data using tables, and interpret the data using measures.

Introduction to Statistics

What is Statistics?

Descriptive and Inferential Statistics