VCE General Maths Unit 1 AOS 1 | Free VCE Resources

The Basics: Types of Data

In statistics, we work with different types of data. Understanding the type of data you have is the first step in any analysis because it determines the types of graphs and summary statistics you can use. This section breaks down the fundamental classifications.

Categorical Data

Represents characteristics or qualities. It can be divided into groups or categories.

Nominal: Categories without a natural order.

Think of brands of cars (Ford, Toyota, Hyundai) or types of pets (Dog, Cat, Fish). You can’t logically rank them.

Ordinal: Categories with a natural order or ranking.

Examples include ratings (Poor, Average, Good, Excellent) or clothing sizes (Small, Medium, Large). The categories can be ranked, but the gaps between them are not necessarily equal or measurable.

Numerical Data

Represents quantities that can be counted or measured. The values are numbers.

Discrete: Data that can only take specific, separate values (usually whole numbers).

This data is often counted. For example, the number of students in a class, or the number of cars in a car park. You can’t have 2.5 students.

Continuous: Data that can take any value within a given range.

This data is often measured. For example, a person’s height, the weight of a package, or the time it takes to run a race.

Investigating Data Distributions

Once we have data, we want to understand its story. We do this by looking at its distribution – how the values are spread out. We use graphs to see the shape, and numbers to summarise the centre and spread. This section explores how to analyse a single numerical variable.
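As a rough illustration of summarising centre and spread with numbers, the sketch below uses Python's built-in `statistics` module. The `scores` list is made up for this example and does not come from any chart on this page.

```python
import statistics

# Hypothetical data set (made up for illustration): test scores out of 20.
scores = [12, 15, 11, 18, 14, 15, 13, 16, 15]

# Measures of centre.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Measures of spread.
data_range = max(scores) - min(scores)
sample_sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

print(f"mean = {mean_score:.2f}, median = {median_score}")
print(f"range = {data_range}, standard deviation = {sample_sd:.2f}")
```

The median and range are less work to find by hand, while the mean and standard deviation are usually left to a calculator or software.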

Shape, Centre & Spread


The Normal Distribution & Z-Scores

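The 68-95-99.7% rule gives us a quick way to understand the spread of data in a normal distribution: approximately 68% of values lie within one standard deviation of the mean, 95% within two, and 99.7% within three. A z-score tells us how many standard deviations a value is above (positive) or below (negative) the mean.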

Calculate a Z-Score

Formula: $$ z = \frac{x - \bar{x}}{s} $$ where $x$ is the value, $\bar{x}$ is the mean, and $s$ is the standard deviation.
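The formula above can be applied in a few lines of Python. This is a minimal sketch: the function name `z_score` and the numbers used (a score of 70, a mean of 60, a standard deviation of 5) are assumptions chosen for illustration.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical values: a score of 70 in a class with mean 60 and sd 5.
z = z_score(70, 60, 5)
print(z)  # 2.0
```

A z-score of 2 means the value sits two standard deviations above the mean. If the data is approximately normal, the 68-95-99.7% rule says about 95% of values fall between z = -2 and z = 2, so this score is right at the upper edge of that middle 95%.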

Investigating Relationships Between Two Variables

Often, we want to know if two variables are connected. For example, is there a relationship between hours studied and exam score? This section covers how to visualise, measure, and model the linear relationship between two numerical variables.

Scatterplots & Regression

A scatterplot of Ice Cream Sales vs. Temperature. The explanatory variable (EV) is on the x-axis (Temperature) and the response variable (RV) is on the y-axis (Sales).


Correlation Coefficient (r)

Measures the strength and direction of a linear relationship. Ranges from -1 to +1.

r = 0.96 (Strong, Positive)

Coefficient of Determination (r²)

The proportion of the variation in the RV that is explained by the variation in the EV, usually expressed as a percentage. It is found by squaring r.

r² = 0.92
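To show where values like these come from, here is a sketch that computes r, r² and a least squares regression line by hand. The temperature and sales figures are made up for illustration and are not the data behind the scatterplot above, so the results will differ from r = 0.96.

```python
import math

# Hypothetical data (made up for illustration, not the data behind the chart).
temperature = [18, 21, 24, 27, 30, 33]   # explanatory variable (EV)
sales = [110, 135, 160, 170, 205, 220]   # response variable (RV)

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in temperature)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, sales))

r = sxy / math.sqrt(sxx * syy)   # Pearson's correlation coefficient
r_squared = r ** 2               # coefficient of determination

# Least squares regression line: RV = intercept + slope * EV
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
print(f"sales = {intercept:.1f} + {slope:.2f} x temperature")
```

The slope estimates how much the RV changes, on average, for each one-unit increase in the EV. In class you would normally get these values from a CAS calculator rather than computing them step by step.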

Investigating & Modelling Time Series Data

Some data is collected over time, like monthly sales figures. This is called time series data. We analyse it to find patterns like trends and seasonality, which helps us to forecast future values. This section introduces the basic techniques for time series analysis.

Quarterly Sales Data

The raw data shows fluctuations. We can smooth the data to see the underlying trend more clearly.
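One common way to smooth a time series is to replace each value with the mean of itself and its neighbours (a moving mean). The page does not say which smoothing method its chart uses, so treat the sketch below as one possible approach. The function name `moving_mean` and the sales figures are assumptions made for illustration.

```python
def moving_mean(values, window=3):
    """Return centred moving means for an odd window size."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    # The first and last `half` points have no full window, so they are skipped.
    for i in range(half, len(values) - half):
        chunk = values[i - half : i + half + 1]
        smoothed.append(sum(chunk) / window)
    return smoothed

# Hypothetical quarterly sales over two years (made up for illustration).
sales = [20, 35, 50, 27, 23, 38, 56, 30]
smoothed_sales = moving_mean(sales, window=3)
print([round(value, 1) for value in smoothed_sales])
```

The smoothed list is shorter than the original because the end points cannot be averaged. A three-point window keeps the example short; with an even number of points, such as four quarters, an extra centring step is needed.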

Time Series Patterns

Trend: The long-term direction (upward, downward, or stable).
Seasonality: Regular, repeating patterns within a year.
Cycles: Longer-term patterns not tied to a calendar year.
Irregular Fluctuations: Random, unpredictable ‘noise’.

Deseasonalising Data

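We use seasonal indices to remove the seasonal pattern from the data, which makes the underlying trend easier to analyse. A seasonal index above 1 marks a season that is typically above average, and an index below 1 marks one that is typically below average.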

Formula: $$ \text{Deseasonalised} = \frac{\text{Actual}}{\text{Index}} $$
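The formula above can be sketched in Python as follows. The seasonal indices and sales figures are made up for illustration. The indices were chosen so that they average to 1, as seasonal indices should.

```python
def deseasonalise(actual, seasonal_index):
    """Divide an actual value by its seasonal index."""
    if seasonal_index <= 0:
        raise ValueError("seasonal index must be positive")
    return actual / seasonal_index

# Hypothetical seasonal indices and actual sales (made up for illustration).
seasonal_indices = {"Q1": 1.25, "Q2": 0.75, "Q3": 0.5, "Q4": 1.5}
actual_sales = {"Q1": 125, "Q2": 75, "Q3": 50, "Q4": 150}

deseasonalised = {
    quarter: deseasonalise(actual_sales[quarter], seasonal_indices[quarter])
    for quarter in actual_sales
}
print(deseasonalised)
```

In this made-up example every quarter deseasonalises to 100, which shows that all of the variation in the actual figures was seasonal. With real data the deseasonalised values would still vary, and that remaining variation reflects the trend and irregular fluctuations.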