Criterion-referenced assessment measures a person against a fixed standard - did they meet the defined criteria, yes or no. Norm-referenced assessment measures a person against everyone else - where do they rank in the group. A driving test is criterion-referenced: you pass if you meet the standard, and it does not matter how everyone else drove. A test graded "on a curve", where the top 10% get an A regardless of their raw score, is norm-referenced. The difference decides what a result actually means - "competent" versus "better than most of the people who sat this".
Criterion-referenced: measured against a standard
Here the question is whether the learner meets explicit, pre-defined criteria. The standard exists before anyone is assessed and does not move based on who turns up. Everyone could pass; everyone could fail. What matters is the work against the bar, not the work against each other.
- Result means something concrete. "Met the standard" tells you what the person can do, not just how they compare.
- Not zero-sum. One person passing does not cost another a pass. There is no fixed quota.
- Depends on a good standard. The whole thing rests on criteria that are clear and observable. Vague criteria make the judgement arbitrary.
Vocational and competency-based training is criterion-referenced by design: a learner is judged competent or not yet competent against units of competency, never ranked against classmates. Certification, licensing, and most rubric-graded coursework work the same way.
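A minimal sketch of that kind of judgement, assuming three made-up driving criteria and an "all criteria must be met" rule (both are illustrative assumptions, not taken from any real unit of competency). The point it shows is that the function only ever looks at one learner's evidence:

```python
# Hypothetical criteria for illustration only; a real standard defines its own.
CRITERIA = ("checks_mirrors", "signals_before_turning", "stops_at_stop_signs")

def judge(evidence: dict[str, bool]) -> str:
    """Return a judgement that depends only on this learner's evidence."""
    unmet = [c for c in CRITERIA if not evidence.get(c, False)]
    return "competent" if not unmet else "not yet competent"

# Two learners who both meet every criterion: both pass, no quota applies.
cohort = [
    {"checks_mirrors": True, "signals_before_turning": True, "stops_at_stop_signs": True},
    {"checks_mirrors": True, "signals_before_turning": True, "stops_at_stop_signs": True},
]
results = [judge(evidence) for evidence in cohort]
```

Nothing in `judge` takes the rest of the cohort as input, so everyone can pass or everyone can fail.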
Norm-referenced: measured against the group
Here the question is where a person sits relative to others. The result is a rank or percentile. The classic example is a standardised admissions test reported as "92nd percentile" - that number only exists because of how everyone else scored.
- Built to discriminate and rank. Useful when you need to sort or select a limited number from many - admissions, scholarships, shortlisting.
- Zero-sum. Moving up the rank means someone else moves down. The cohort defines the scale.
- Tells you position, not capability. A high percentile in a weak cohort and a low percentile in a strong cohort can represent the same actual ability.
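To make the last point concrete, here is a small sketch using one simple definition of percentile rank (the percentage of the cohort scoring strictly below a given score; other definitions exist). The two cohorts are invented numbers, not real test data:

```python
def percentile_rank(score: float, cohort_scores: list[float]) -> float:
    """Percentage of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort_scores if s < score)
    return 100.0 * below / len(cohort_scores)

# Invented cohorts; each one includes the same learner's score of 72.
weak_cohort = [40, 45, 50, 55, 60, 62, 65, 68, 70, 72]
strong_cohort = [70, 72, 78, 80, 82, 85, 88, 90, 92, 95]

rank_in_weak = percentile_rank(72, weak_cohort)      # 90.0
rank_in_strong = percentile_rank(72, strong_cohort)  # 10.0
```

The score is 72 both times. Only the cohort changed, and the reported result moved from the 90th percentile to the 10th.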
When each is the right tool
Use criterion-referenced assessment when you need to know whether someone can do the thing - training, certification, competency, qualifications, any case where "met the standard" is the meaningful outcome and where it would be unfair to fail a competent person just because their cohort happened to be strong. Use norm-referenced assessment when you genuinely need to rank or select from a pool and places are limited - competitive admissions, grading curves, top-N selection.
A frequent mistake is sliding from one into the other without noticing. Marking "on a curve" turns what looks like a standards-based grade into a ranking, which means a learner's result now depends on their classmates rather than their work. In a training or certification setting that is usually the wrong design - it makes the result indefensible, because two identical submissions could get different grades depending on who else was in the cohort.
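The "identical submissions, different grades" problem can be sketched side by side. The 85-mark threshold, the top-10% rule, and both cohorts below are assumptions chosen for illustration:

```python
def grade_on_curve(score: float, cohort_scores: list[float], top_fraction: float = 0.10) -> str:
    """Norm-referenced: 'A' only if the score falls in the top fraction of the cohort."""
    ranked = sorted(cohort_scores, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))  # at least one place
    cutoff = ranked[n_top - 1]
    return "A" if score >= cutoff else "not A"

def grade_against_standard(score: float, a_threshold: float = 85) -> str:
    """Criterion-referenced: 'A' if the score meets a fixed, pre-defined threshold."""
    return "A" if score >= a_threshold else "not A"

# The same submission, scored 86, placed in two invented cohorts.
cohort_one = [55, 60, 62, 65, 68, 70, 72, 75, 78, 86]
cohort_two = [70, 75, 80, 86, 88, 90, 91, 93, 95, 97]

curve_one = grade_on_curve(86, cohort_one)   # "A"
curve_two = grade_on_curve(86, cohort_two)   # "not A"
standard = grade_against_standard(86)        # "A" whichever cohort it sat in
```

Under the curve, the grade is a function of the cohort list. Under the standard, the cohort is not even an argument.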
How this affects AI marking
AI rubric marking is inherently criterion-referenced, and that is a feature. The model compares each response against the performance levels you defined, not against the other submissions - so the standard is the same for the first learner and the last, and the result does not depend on cohort luck. The model also cites the evidence for each judgement, which is what makes a criterion-referenced result defensible: you can show exactly why a submission did or did not meet the bar. If you do need a ranking, you can still order learners by their criterion-referenced results afterwards, but the judgement itself stays anchored to the standard.
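As a rough sketch of "judge first, rank afterwards": the record shape below (`CriterionResult`, a 0-3 level scale, the evidence strings) is a hypothetical structure invented for this example, not the output format of any particular marking tool:

```python
from dataclasses import dataclass

@dataclass
class CriterionResult:
    criterion: str   # name of the rubric criterion
    level: int       # performance level awarded (hypothetical 0-3 scale)
    evidence: str    # quote from the submission that supports the level

def total_level(results: list[CriterionResult]) -> int:
    """Sum of levels awarded; each level was judged against the rubric alone."""
    return sum(r.level for r in results)

# Invented results. Each learner was marked independently of the others.
marked = {
    "learner_a": [CriterionResult("accuracy", 3, "states the correct torque setting"),
                  CriterionResult("safety", 2, "mentions isolation but not lock-out")],
    "learner_b": [CriterionResult("accuracy", 2, "setting correct, units missing"),
                  CriterionResult("safety", 2, "mentions isolation but not lock-out")],
    "learner_c": [CriterionResult("accuracy", 3, "states the correct torque setting"),
                  CriterionResult("safety", 3, "describes isolation and lock-out")],
}

# Ranking is a separate, later step that never changes the judgements themselves.
ranking = sorted(marked, key=lambda name: total_level(marked[name]), reverse=True)
```

Removing or adding a learner changes the order of `ranking`, but no one's levels or evidence, which is the property that keeps each result defensible.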
This is how Scorafy works - every response is marked against your rubric, not graded on a curve, with cited evidence and a human sign-off. If you are defining the standard, start with how to write an assessment rubric. For VET and RTO teams, criterion-referenced judgement against units of competency is the default; see how that fits at assessment for VET trainers or try the demo.