Excel · Power Query · Power BI · DAX

Student Mental Health
Risk Analysis

A cohort-based severity analysis of 101 university students — moving beyond prevalence counting into structured risk segmentation and intervention prioritisation.

dataset_summary.txt
// project metadata
students: 101
age_range: 18 – 24
years_of_study: 1 – 4
conditions: 3
 
// tools
cleaning: Excel + Power Query
modelling: DAX (Power BI)
viz: Power BI Dashboard
 
// source
dataset: Kaggle
101 Total Students
64 63.4% of cohort Any Condition
28 27.7% of cohort Multiple Cases
10 9.9% of cohort Complex Cases
6 sought help Specialist Care
01 objective

Not "how many?" — but "how severe, and when?"

Most mental health analyses stop at prevalence: counting how many students report a condition. This project was structured around a different question — does psychological burden intensify as students advance academically?

The analysis was built on three principles: cohort-based comparison using Year of Study as the primary lens, severity stratification separating single from overlapping conditions, and proportional normalisation to prevent enrollment imbalance from distorting results. The goal was risk identification and intervention prioritisation, not descriptive reporting.

02 data preparation

Cleaning, removing noise, and engineering severity

The raw dataset had 11 columns and 101 rows collected via a survey form. Several structural problems needed resolving before any analysis could begin.

STEP_01
Removed Low-Signal Columns

Dropped Timestamp — narrow collection window made it analytically meaningless. Dropped Course — excessive categorical fragmentation with low signal value would have added noise, not insight.

STEP_02
Standardised Categorical Values

Column headers were renamed from verbose survey labels (e.g. "Do you have Anxiety?") to clean field names. CGPA was grouped into standardised ranges: 0–1.99, 2.00–2.49, 2.50–2.99, 3.00–3.49, 3.50–4.00. Year labels normalised to Year 1–4.

STEP_03
Removed Incomplete Records

Blank and incomplete rows were removed to ensure analytical reliability. The cleaned dataset retained all 101 valid student records across the four year cohorts.

STEP_04
Severity Engineering — 3 New Columns

Three derived classification columns were created to measure escalation intensity rather than simple presence. This is the core feature engineering step that powers the entire severity analysis.

03 column changes

What was removed, renamed, and built from scratch

Column Status Notes
Timestamp REMOVED Narrow survey window made time analysis meaningless
What is your course? REMOVED Too many fragmented categories, low analytical signal
Choose your gender → Gender RENAMED Shortened from verbose survey phrasing
Your current year of Study → Year Of Study RENAMED Standardised to clean label
What is your CGPA? → CGPA range RENAMED Grouped into 5 standardised ranges for pivot analysis
Do you have Depression? → Depression? RENAMED Shortened; same for Anxiety and Panic Attack
S/N NEW Serial number column added for row identification
At least 1 — Any Condition NEW TRUE if student reports at least one condition. Baseline prevalence flag.
At least 2 — Multiple Cases NEW TRUE if two or more conditions present concurrently. Elevated burden flag.
All three — Complex Cases NEW TRUE if all three conditions (Anxiety + Depression + Panic Attack) reported together. Highest severity tier.
04 formulas & dax logic

The calculations that drive the severity model

Excel — Severity Classification Columns

These three columns were calculated directly in Excel using OR/AND logic on the three condition columns before the data was loaded into Power BI.

At least 1 — Any Condition EXCEL
// TRUE if any one condition is reported
=OR(G2="Yes", H2="Yes", I2="Yes")
// G2=Depression, H2=Anxiety, I2=Panic Attack
At least 2 — Multiple Cases EXCEL
// TRUE if two or more conditions are present
=(COUNTIF(G2:I2,"Yes")>=2)
// Counts "Yes" values across the three condition columns
// Returns TRUE when count is 2 or more
All three — Complex Cases EXCEL
// TRUE only when all three conditions are simultaneously present
=AND(G2="Yes", H2="Yes", I2="Yes")
// Highest severity tier — flags compounded psychological burden

DAX — Proportional Prevalence (Power BI)

Raw counts alone would have been misleading — Year 1 has 43 students while Year 4 has only 8. DAX measures were written to calculate within-cohort prevalence, making year-on-year comparison valid.

Prevalence % Within Year of Study DAX
// % of students in the current year filter with any condition
Prevalence % =
  DIVIDE(
    CALCULATE(COUNTROWS(Students),
      Students[At least 1] = TRUE),
    COUNTROWS(Students),
    0
  )
// Context-aware: responds to Year of Study slicer
// DIVIDE with 0 fallback prevents division errors on empty filters
Complex Cases Count (DAX) DAX
// Conditional count — only rows where all three = TRUE
Complex Cases =
  CALCULATE(
    COUNTROWS(Students),
    Students[All three] = TRUE
  )
05 key insights

Year 3 is the highest-risk cohort — but raw counts hid it

⚠ CRITICAL FINDING

Year 1 had the largest raw case count (25 students). Without proportional normalisation, this looks like the most at-risk cohort. Proportional analysis tells the opposite story — Year 3 carries the highest burden at 75% prevalence.

Cohort-by-Cohort Breakdown

Cohort n Any Condition Multiple Cases Complex Cases
Year 1 43
58.1%
30.2%
9.3%
Year 2 26
73.1%
26.9%
7.7%
Year 3 24
75.0%
29.2%
12.5%
Year 4 8 ⚠
25.0%
12.5%
12.5%

⚠ Year 4 n=8. Low sample size limits generalisability — interpret with caution.

75%
Year 3 any-condition prevalence — the highest risk point in the academic journey
63.4%
Overall cohort prevalence — nearly 2 in 3 students report at least one condition
6 / 64
Only 6 of 64 affected students sought specialist care — a significant treatment gap

Academic Performance Is Not the Filter

CGPA segmentation shows that psychological burden is not isolated to lower-performing students. The highest concentration of students sits in the 3.00–4.00 CGPA range — and conditions are present across all tiers. Stress exposure is cross-performance, which means targeting only low-CGPA students for support would miss the majority of cases.

06 recommendations

Five evidence-based interventions

07 conclusion

This is a risk segmentation model, not a chart

The core contribution of this project is methodological. By combining Excel based severity engineering with DAX-driven proportional analysis in Power BI, the work moves beyond descriptive dashboarding into structured risk identification.

The finding that Year 3 represents the peak risk cohort, not Year 1 which had the highest raw count, would have been invisible without normalisation. That distinction is the difference between resource allocation that works and resource allocation that misses the problem entirely.

project files

Access the full project

// download source files to explore the dataset, dashboard, and full report