AI rubric marking works like this: you define the criteria and performance levels, the model reads a person's actual response, maps it to the level it best matches on each criterion, quotes the evidence that justifies the call, and a qualified person reviews and signs off. It is not the model guessing a grade. It is the model comparing a specific answer against a specific standard you set, and showing its working.
Step 1: You define the rubric
A rubric is a set of criteria, each with performance levels. "Risk identification" might run from "names no risks" up to "names the key risks and explains a mitigation for each". The model marks against exactly these levels, so the quality of the marking is set by the quality of the rubric. Observable levels beat fuzzy adjectives: "names the key risks" can be checked against an answer, while "shows good awareness of risk" cannot.
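One way to picture a rubric is as plain data: named criteria, each with an ordered list of observable level descriptors. The sketch below is a hypothetical schema, not Scorafy's format. The class names `Level` and `Criterion` are made up for illustration, and the two middle levels of "Risk identification" are invented to fill out the scale between the two descriptors given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    score: int        # position on the scale; higher means a stronger response
    descriptor: str   # an observable behaviour, not an adjective


@dataclass(frozen=True)
class Criterion:
    name: str
    levels: tuple[Level, ...]  # lowest level first

    def __post_init__(self) -> None:
        scores = [level.score for level in self.levels]
        # Scores must be unique and ascending so "a higher level" means one thing.
        if scores != sorted(set(scores)):
            raise ValueError(f"{self.name}: level scores must be unique and ascending")

    def descriptor(self, score: int) -> str:
        for level in self.levels:
            if level.score == score:
                return level.descriptor
        raise KeyError(f"{self.name} has no level {score}")


# Hypothetical example criterion; levels 1 and 2 are illustrative only.
risk_identification = Criterion(
    name="Risk identification",
    levels=(
        Level(0, "Names no risks"),
        Level(1, "Names at least one relevant risk"),
        Level(2, "Names the key risks"),
        Level(3, "Names the key risks and explains a mitigation for each"),
    ),
)

print(risk_identification.descriptor(3))
```

Validating the ordering when the rubric is built means a badly specified scale fails early, before any response is marked against it.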
Step 2: The model reads the actual response
The respondent answers in their own words, records a response, or uploads work. The model reads that real submission - not a key, not a checklist of keywords. This is the difference between marking competence and scoring recall.
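To see why reading the real submission matters, here is a naive keyword scorer of the kind this step rules out. It is a hypothetical illustration: the answer key `KEYWORDS` and both sample responses are invented. A bare list of terms gets full marks, while an answer that actually identifies a risk and a mitigation gets nothing, because it uses different words.

```python
# Hypothetical answer key: the terms a keyword scorer looks for.
KEYWORDS = {"budget", "schedule", "supplier"}


def keyword_score(response: str) -> float:
    """Fraction of key terms present, ignoring case and trailing punctuation."""
    words = {word.strip(".,;:!?").lower() for word in response.split()}
    return len(KEYWORDS & words) / len(KEYWORDS)


recall_only = "Budget, schedule, supplier."
competent = (
    "Our sole vendor could fail to deliver on time, so we would "
    "line up a second source before signing the contract."
)

print(keyword_score(recall_only))  # 1.0: every term listed, nothing explained
print(keyword_score(competent))    # 0.0: a real risk and mitigation, no key terms
```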
Step 3: It maps the response to the levels
For each criterion, the model decides which performance level the response best matches and why. The same rubric is applied to every response, so the standard stays consistent from the first submission to the last - no fatigue, no drift.
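The consistency comes from two mechanical choices around the model call: every response is marked with the same instructions and rubric text, and any reply that does not land on a defined level is rejected. A minimal sketch follows, assuming the model is asked to reply in JSON. The call to the model itself is left out because it depends on the provider, and `build_prompt`, `parse_judgements` and the reply format are hypothetical, not a documented Scorafy interface.

```python
import json
from dataclasses import dataclass

# Hypothetical rubric: criterion -> {level: descriptor}. Middle levels are illustrative.
RUBRIC = {
    "Risk identification": {
        0: "Names no risks",
        1: "Names at least one relevant risk",
        2: "Names the key risks",
        3: "Names the key risks and explains a mitigation for each",
    },
}


@dataclass(frozen=True)
class Judgement:
    criterion: str
    level: int
    reason: str


def build_prompt(rubric: dict[str, dict[int, str]], response: str) -> str:
    """Render the same instructions and rubric text for every respondent."""
    lines = [
        "For each criterion, choose the level the response best matches.",
        'Reply with JSON: [{"criterion": str, "level": int, "reason": str}]',
    ]
    for name, levels in rubric.items():
        lines.append(f"Criterion: {name}")
        for level, descriptor in sorted(levels.items()):
            lines.append(f"  Level {level}: {descriptor}")
    lines.append("Response:")
    lines.append(response)
    return "\n".join(lines)


def parse_judgements(rubric: dict[str, dict[int, str]], reply: str) -> list[Judgement]:
    """Accept only marks that land on a defined level of a defined criterion."""
    judgements = []
    for item in json.loads(reply):
        name, level = item["criterion"], int(item["level"])
        if name not in rubric or level not in rubric[name]:
            raise ValueError(f"off-rubric mark: {name!r} level {level}")
        judgements.append(Judgement(name, level, item["reason"]))
    missing = set(rubric) - {j.criterion for j in judgements}
    if missing:
        raise ValueError(f"no judgement for: {sorted(missing)}")
    return judgements


# In practice the reply would come from the model; this one is written by hand.
reply = '[{"criterion": "Risk identification", "level": 3, "reason": "Names vendor failure and a fallback."}]'
print(parse_judgements(RUBRIC, reply))
```

Rejecting off-rubric or missing judgements, rather than quietly repairing them, keeps every accepted mark tied to a level someone actually wrote.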
Step 4: It cites the evidence
Each judgement points back to the part of the response that justified it, quoting the respondent's own words. This is what makes the mark auditable and what keeps the model honest: if it cannot cite evidence for a strength, it should not award it. A short, candid report on a weak answer is the correct output, not invented praise. More on this in grounded AI feedback.
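The rule "no evidence, no award" can be enforced mechanically: check that each quoted piece of evidence really occurs in the response, and refuse any level above the lowest one that arrives without a quote. The sketch below is one assumed way to do that, not Scorafy's implementation. The names `check_grounded` and `floor` are hypothetical, and the match ignores case and whitespace so a quote that spans a line break in the submission still counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    criterion: str
    level: int
    evidence: tuple[str, ...] = ()  # quotes taken from the respondent's answer


def _normalise(text: str) -> str:
    # Compare ignoring case and runs of whitespace, so line wrapping
    # in the submission does not make an honest quote look invented.
    return " ".join(text.split()).lower()


def ungrounded_quotes(judgement: Judgement, response: str) -> list[str]:
    """Quotes that do not appear in the response (case and whitespace aside)."""
    haystack = _normalise(response)
    return [q for q in judgement.evidence if _normalise(q) not in haystack]


def check_grounded(judgement: Judgement, response: str, floor: int = 0) -> None:
    """Reject a level above the floor with no quote, or any quote not in the answer."""
    if judgement.level > floor and not judgement.evidence:
        raise ValueError(f"{judgement.criterion}: level {judgement.level} has no evidence")
    missing = ungrounded_quotes(judgement, response)
    if missing:
        raise ValueError(f"{judgement.criterion}: quotes not in response: {missing}")


response = "Our sole vendor could fail to deliver on time, so we would\nline up a second source."
honest = Judgement("Risk identification", 3, ("we would line up a second source",))
invented = Judgement("Risk identification", 3, ("also flags budget overruns",))

check_grounded(honest, response)  # passes: the quote spans a line break but is present
try:
    check_grounded(invented, response)
except ValueError as err:
    print(err)
```

The lowest level ("names no risks") is allowed through without a quote, which is what lets a weak answer receive a short, honest report instead of manufactured evidence.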
Step 5: A person reviews and signs off
The model proposes the marks and the evidence. A qualified reviewer checks them, overrides anything that needs human judgement, and finalises. For any consequential result this human step is the design, not an afterthought - see why a person still signs off.
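A review record can keep the model's proposal untouched, log each override with the reviewer's name and reason, and lock everything once signed off. The sketch below is a hypothetical design for that workflow, not Scorafy's data model. `Review`, `Mark` and the second criterion "Communication" are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Mark:
    criterion: str
    level: int
    decided_by: str       # "model" or the reviewer's name
    note: str = ""


class Review:
    """Holds the model's proposal, any reviewer overrides, and the sign-off."""

    def __init__(self, proposed: list[Mark]) -> None:
        self.proposed = {m.criterion: m for m in proposed}  # kept unchanged for audit
        self.current = dict(self.proposed)
        self.signed_by: Optional[str] = None
        self.signed_at: Optional[datetime] = None

    def override(self, criterion: str, level: int, reviewer: str, reason: str) -> None:
        if self.signed_by is not None:
            raise RuntimeError("marks are final once signed off")
        if criterion not in self.current:
            raise KeyError(criterion)
        if not reason.strip():
            raise ValueError("an override must record a reason")
        self.current[criterion] = Mark(criterion, level, reviewer, reason)

    def sign_off(self, reviewer: str) -> dict[str, Mark]:
        if self.signed_by is not None:
            raise RuntimeError("already signed off")
        self.signed_by = reviewer
        self.signed_at = datetime.now(timezone.utc)
        return dict(self.current)


review = Review([
    Mark("Risk identification", 3, "model"),
    Mark("Communication", 2, "model"),  # hypothetical second criterion
])
review.override("Communication", 1, "A. Reviewer", "Key recommendation is buried in the last paragraph.")
final = review.sign_off("A. Reviewer")
for mark in final.values():
    print(mark.criterion, mark.level, mark.decided_by)
```

Keeping `proposed` separate from `current` means an auditor can see exactly where human judgement changed the model's call, and why.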
What you get out
A per-respondent report: the level reached on each criterion, the evidence behind each judgement, and feedback the respondent can act on. Consistent across the whole cohort, mapped to your standard, and defensible because every judgement is grounded in the answer.
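The report itself is a straightforward rendering of what the earlier steps produce. Below is a minimal plain-text sketch, assuming a hypothetical `CriterionResult` record. The field names, layout and sample content are illustrative and do not reflect Scorafy's actual report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionResult:
    criterion: str
    level: int
    max_level: int
    descriptor: str            # the rubric text for the level reached
    evidence: tuple[str, ...]  # quotes from the respondent's answer
    feedback: str              # what would move the answer up a level


def render_report(respondent: str, results: list[CriterionResult]) -> str:
    """Plain-text report: level, evidence and next step for each criterion."""
    lines = [f"Report for {respondent}"]
    for r in results:
        lines.append("")
        lines.append(f"{r.criterion}: level {r.level} of {r.max_level} ({r.descriptor})")
        for quote in r.evidence:
            lines.append(f'  Evidence: "{quote}"')
        if not r.evidence:
            lines.append("  Evidence: none found in the response")
        lines.append(f"  Next step: {r.feedback}")
    return "\n".join(lines)


print(render_report("Respondent 17", [
    CriterionResult(
        "Risk identification", 1, 3, "Names at least one relevant risk",
        ("our sole vendor could fail to deliver on time",),
        "Name the other key risks and give a mitigation for each.",
    ),
]))
```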
This is exactly how Scorafy marks. Define your rubric, share a link, review the evidence-based reports, and sign off. See it on a real submission or read how to write a rubric AI can mark against.