Classifying Data
The first step in any statistical investigation is to correctly classify the data. The type of data determines which graphical displays and summary statistics are appropriate. Misclassifying data leads to invalid conclusions.
Categorical Data
Represents qualities or labels; the values can be words, or numbers acting as labels.
Nominal: Categories with no intrinsic order. (e.g., Hair Colour, Suburb)
Ordinal: Categories with a logical order or ranking. (e.g., T-Shirt Size: S, M, L)
Numerical Data
Represents quantities that are counts or measurements.
Discrete: Can be counted and takes exact values. (e.g., Number of Pets)
Continuous: Can be measured and can take any value within a range. (e.g., Height in cm)
Displaying Distributions
Histograms and Box Plots both show the distribution of numerical data, but they reveal different features. Use the buttons below to toggle the view for a sample dataset of student test scores and see how they compare.
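The contrast can be sketched in code. This is a minimal illustration with hypothetical test scores (the sample dataset in the interactive chart isn't reproduced here): the histogram view counts scores in class intervals, while the box-plot view reduces the same data to a five-number summary.

```python
import statistics

# Hypothetical test scores standing in for the sample dataset.
scores = [52, 58, 61, 64, 66, 68, 70, 71, 73, 75, 77, 80, 84, 91, 95]

# Histogram view: frequency of scores in 10-mark class intervals.
freq = {}
for s in scores:
    interval = (s // 10) * 10          # 52 falls in the 50-59 interval, etc.
    freq[interval] = freq.get(interval, 0) + 1

# Box-plot view: the five-number summary (min, Q1, median, Q3, max).
q1, median, q3 = statistics.quantiles(scores, n=4)
summary = (min(scores), q1, median, q3, max(scores))
```

The histogram keeps the shape of the distribution (where the data piles up); the box plot makes the centre, spread, and potential outliers easier to read at a glance.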
The Normal Distribution
Many variables follow a symmetrical, bell-shaped pattern. For these distributions, we can use the 68-95-99.7% rule to make quick estimations, and z-scores to compare values from different contexts.
The 68-95-99.7% Rule
Click the buttons to see the percentage of data that falls within 1, 2, or 3 standard deviations of the mean in a normal distribution.
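The three percentages aren't arbitrary: they come from the normal cumulative distribution function, where the proportion within k standard deviations of the mean is erf(k/√2). A short sketch confirming the rule:

```python
import math

# Proportion of a normal distribution within k standard deviations
# of the mean: P(|Z| <= k) = erf(k / sqrt(2)).
def within(k):
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {within(k):.1%}")
# within 1 sd: 68.3%, within 2 sd: 95.5%, within 3 sd: 99.7%
```

So "68-95-99.7" is a rounded statement of these exact proportions.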
Z-Score Calculator
A z-score measures how many standard deviations a value is from the mean. Use it to compare scores from different distributions.
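The calculation behind the widget is a one-liner, z = (value − mean) ÷ standard deviation. A sketch with hypothetical scores:

```python
# z-score: how many standard deviations a value sits from the mean.
def z_score(value, mean, sd):
    return (value - mean) / sd

# Hypothetical example: a Maths score of 80 (class mean 70, sd 5)
# versus an English score of 75 (class mean 65, sd 10).
maths = z_score(80, 70, 5)      # 2.0 -> two sd above the mean
english = z_score(75, 65, 10)   # 1.0 -> one sd above the mean
```

Although 80 and 75 are on different scales, the z-scores show the Maths result is relatively stronger: it sits further above its own class mean.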
Investigating Associations
Here we explore the relationship between two numerical variables using a scatterplot. Pay attention to the direction, form, and strength of the association, and remember the crucial difference between correlation and causation.
Visualising Linear Associations
A scatterplot helps us see the relationship between two numerical variables. Pearson’s Correlation Coefficient ($r$) gives us a number to describe the strength and direction of that relationship. Drag the slider to see how the scatterplot and its description change.
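Behind the slider, $r$ is computed from the paired deviations about the two means. A minimal sketch of the standard formula (the data points are illustrative, not the plotted dataset):

```python
import math

# Pearson's correlation coefficient r for paired numerical data:
# r = S_xy / sqrt(S_xx * S_yy), using deviations from the means.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # 1.0: perfect positive
pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # -1.0: perfect negative
```

Values near ±1 indicate a strong linear association; values near 0 indicate little or no linear association.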
Correlation is NOT Causation
A strong correlation ($r$ value) between two variables doesn’t mean one causes the other. For example, ice cream sales and drownings are strongly correlated.
This is not because ice cream causes drowning! A third variable, hot weather (a confounding variable), causes an increase in both. Always consider potential confounding variables before claiming causation.
Modelling with Least Squares Regression
Once we see a linear trend, we can model it with a “line of best fit”. This line allows us to make predictions. This section uses data comparing car age and price. Use the tabs to explore the model, make predictions, and check the residuals.
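The model, prediction, and residual steps can be sketched in a few lines. The car data below is hypothetical (the tabs use their own dataset), but the formulas are the standard least squares ones: slope b = S_xy / S_xx and intercept a = ȳ − b·x̄.

```python
# Least squares line y = a + b*x.
def least_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical car data: age (years) vs price ($'000s).
age   = [1, 2, 3, 4, 5, 6]
price = [26, 24, 21, 19, 17, 13]

a, b = least_squares(age, price)
predict = lambda x: a + b * x                # prediction from the model
residuals = [y - predict(x) for x, y in zip(age, price)]
# Residuals (actual - predicted) always sum to zero for a least
# squares line with an intercept; a pattern in them suggests the
# linear model is not appropriate.
```

Here the negative slope reflects depreciation: each extra year of age lowers the predicted price by about $2,500.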
Time Series Analysis
Time series data is collected at regular intervals over time. We look for trends, seasonal patterns, and other features. This chart shows quarterly ice cream sales. The forecasting wizard then breaks down how to predict future sales.
Analysing a Time Series Plot
Forecasting with Seasonal Data
Step 1: Deseasonalise the Data
Remove the seasonal pattern using seasonal indices to reveal the underlying trend. (i.e., Deseasonalised = Actual ÷ Seasonal Index)
Step 2: Fit a Trend Line
Fit a least squares regression line to the deseasonalised data to get a trend equation.
Step 3: Predict the Deseasonalised Value
Use the trend equation to predict the value for a future time period.
Step 4: Reseasonalise the Forecast
Put the seasonality back in by multiplying the prediction by the correct seasonal index. (i.e., Predicted Deseasonalised × Seasonal Index)
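The four steps above can be sketched end to end. The quarterly sales figures and seasonal indices below are hypothetical (not the chart's data); note the four indices average to 1.

```python
# Hypothetical seasonal indices for quarters 1-4 (they average to 1).
seasonal_index = {1: 1.30, 2: 0.70, 3: 0.80, 4: 1.20}

# Hypothetical quarterly sales over two years: (quarter number t, sales).
sales = [(1, 136.5), (2, 77), (3, 92), (4, 144),
         (5, 162.5), (6, 91), (7, 108), (8, 168)]

quarter = lambda t: (t - 1) % 4 + 1          # map t = 1..8 to quarter 1..4

# Step 1: deseasonalise (actual / seasonal index).
deseason = [(t, y / seasonal_index[quarter(t)]) for t, y in sales]

# Step 2: fit a least squares trend line to the deseasonalised data.
n = len(deseason)
mt = sum(t for t, _ in deseason) / n
my = sum(y for _, y in deseason) / n
b = sum((t - mt) * (y - my) for t, y in deseason) / \
    sum((t - mt) ** 2 for t, _ in deseason)
a = my - b * mt

# Step 3: predict the deseasonalised value for a future quarter.
t_future = 9                                 # quarter 1 of the next year
trend_value = a + b * t_future

# Step 4: reseasonalise (predicted deseasonalised * seasonal index).
forecast = trend_value * seasonal_index[quarter(t_future)]
```

With these numbers the deseasonalised series is exactly linear (trend ≈ 100 + 5t), so the forecast for quarter 9 is the trend value 145 scaled back up by quarter 1's index of 1.30, giving about 188.5.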