A rubric is only as good as it is specific. Vague criteria produce inconsistent grades whether a human or an AI is doing the marking - the difference is that a human hides the inconsistency behind experience, while an AI exposes it. If you want AI to apply your rubric consistently, and you want a human to be able to defend the result, you have to write the rubric properly. Here is how.

Start with what you are actually measuring

Before you write a single criterion, name the competence. Not the topic, the competence. "Knows about care plans" is a topic. "Can write a care plan that addresses the client's stated needs with appropriate interventions" is a competence. The second tells you what to look for in the work; the first tells you nothing.

Write your criteria as observable things the work either does or does not do. If you cannot point to where in a submission a criterion is met or missed, the criterion is too vague to grade.

Use criteria, not adjectives

The most common rubric failure is the floating adjective. "Demonstrates good understanding." "Shows excellent analysis." "Adequate structure." Good, excellent, and adequate mean nothing on their own - every marker fills them in differently, and an AI guesses silently.

Replace the adjective with the observable behaviour:

Instead of "demonstrates good understanding of the concept", write "correctly defines the concept and applies it to the scenario given".
Instead of "well-structured argument", write "states a position, supports it with at least two pieces of evidence, and addresses one counter-argument".
Instead of "appropriate use of sources", write "cites at least three relevant sources and uses them to support specific claims".

The test: could two different people read your criterion and agree on whether a given submission meets it? If not, rewrite it.

Define performance levels concretely

Most rubrics use levels - not yet competent / competent, or a band scale. The levels only work if each one describes what the work looks like at that level, not just a label.

For each criterion, write what distinguishes one level from the next in terms of the work itself. "Competent: identifies all three required risks and proposes a mitigation for each. Not yet competent: misses one or more required risks, or proposes mitigations that do not address the risk identified." Now the boundary is a fact about the submission, not a feeling.
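
To see why that boundary is now a fact rather than a feeling, here is a minimal Python sketch of the same rule. Everything named in it is an assumption made for illustration: the three risks are placeholders, and `findings` stands for whatever a marker (human or AI) has recorded about the submission, mapping each risk the learner identified to whether the proposed mitigation actually addresses it.

```python
# Hypothetical required risks, for illustration only.
REQUIRED_RISKS = ("data loss", "unauthorised access", "service outage")


def risk_criterion_level(findings: dict[str, bool]) -> str:
    """Apply the level boundary for the risk criterion.

    `findings` maps each risk the submission identified to True if the
    proposed mitigation addresses that risk, False if it does not.
    A required risk that is absent from `findings` counts as missed.
    """
    for risk in REQUIRED_RISKS:
        if not findings.get(risk, False):
            return "not yet competent"
    return "competent"


complete = {risk: True for risk in REQUIRED_RISKS}
one_missing = {"data loss": True, "unauthorised access": True}
weak_mitigation = {**complete, "service outage": False}

print(risk_criterion_level(complete))
print(risk_criterion_level(one_missing))
print(risk_criterion_level(weak_mitigation))
```

Deciding whether a mitigation addresses its risk still takes judgement. What the concrete level description does is confine that judgement to one narrow, checkable question per risk.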

Make it gradeable: structure for the machine and the human

A rubric an AI can apply well has a few structural features:

One idea per criterion. If a criterion bundles two things ("clear and well-evidenced"), split it. A submission can be clear and unevidenced, and a bundled criterion forces a bad single score.
Independent criteria. Avoid criteria that overlap so much that the same flaw gets scored three times. Each should measure something distinct.
Evidence-anchored language. Write criteria in terms of what should appear in the submission. This lets the AI cite the specific lines that justify a score, which is what makes the result auditable. Cited evidence is also what stops your human reviewer from rubber-stamping - covered in human-in-the-loop AI assessment.
Explicit "not present" handling. Say what happens when a required element is simply missing, so the absence is scored deliberately rather than guessed.
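
One way to picture these four features is as a small data structure plus an audit step. The Python below is a sketch under stated assumptions, not a description of how any particular grading tool stores rubrics: `Criterion`, `Judgement`, and `audit` are hypothetical names, and the rule that an evidence-free score is only acceptable when it equals the criterion's "not present" level is one possible policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str        # one idea per criterion
    competent: str   # what the work looks like at "competent"
    not_yet: str     # what the work looks like at "not yet competent"
    if_missing: str  # level awarded when the element is simply absent


@dataclass(frozen=True)
class Judgement:
    criterion: str
    level: str
    evidence: tuple[str, ...] = ()  # quoted lines from the submission


def audit(criteria: list[Criterion], judgements: list[Judgement]) -> list[str]:
    """Return a list of problems; an empty list means every score is defensible."""
    by_name = {c.name: c for c in criteria}
    scored: set[str] = set()
    problems: list[str] = []
    for j in judgements:
        if j.criterion not in by_name:
            problems.append(f"{j.criterion}: not in rubric")
            continue
        if j.criterion in scored:
            problems.append(f"{j.criterion}: scored more than once")
        scored.add(j.criterion)
        # Assumed policy: no evidence is only acceptable for the "not present" level.
        if not j.evidence and j.level != by_name[j.criterion].if_missing:
            problems.append(f"{j.criterion}: no evidence cited")
    for name in by_name:
        if name not in scored:
            problems.append(f"{name}: not scored")
    return problems


rubric = [
    Criterion("clarity", "each paragraph makes one point",
              "paragraphs mix several points", "not yet competent"),
    Criterion("evidence", "each claim cites a source",
              "one or more claims are unsupported", "not yet competent"),
]
result = [
    Judgement("clarity", "competent", ("Para 2: 'The main risk is...'",)),
    Judgement("evidence", "competent"),  # a score with nothing to back it up
]
print(audit(rubric, result))
```

Splitting "clear and well-evidenced" into two criteria is what lets the audit flag the second score without touching the first.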

Worked example

Take a coding bootcamp task: "Build a function that validates an email address." A weak rubric criterion: "Code quality is good." A gradeable version:

Correctness: Competent - the function returns true for the listed valid addresses and false for the listed invalid cases. Not yet competent - fails one or more listed cases.
Edge case handling: Competent - returns false for empty input and for input with no @, without raising an error. Not yet competent - raises an error or returns the wrong result on either.
Readability: Competent - uses descriptive names and the logic can be followed without comments. Not yet competent - single-letter names or logic that requires guessing intent.

Each criterion is one idea, observable in the code, and an AI can point to the exact lines that justify each score. A human reviewing it can confirm or override in seconds.
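
For illustration, here is what a submission that meets all three criteria might look like, sketched in Python. The task does not specify a language or a list of cases, so `is_valid_email` and `LISTED_CASES` are assumptions, and the check is deliberately simple rather than a full implementation of the email address standard.

```python
def is_valid_email(address: str) -> bool:
    """Return True if `address` looks like name@domain.tld, else False.

    A deliberately simple check for a bootcamp exercise; it never raises.
    """
    # Edge cases: non-string or empty input is invalid, not an error.
    if not isinstance(address, str) or not address:
        return False
    if " " in address:
        return False
    local_part, separator, domain = address.partition("@")
    # A missing @ leaves `separator` empty; both sides must be non-empty.
    if not separator or not local_part or not domain:
        return False
    # A second @ would end up in the domain part.
    if "@" in domain:
        return False
    # The domain needs at least two non-empty, dot-separated labels.
    domain_labels = domain.split(".")
    return len(domain_labels) >= 2 and all(domain_labels)


# Hypothetical "listed cases" that the rubric's Correctness criterion refers to.
LISTED_CASES = {
    "ada@example.com": True,
    "": False,
    "no-at-sign.example.com": False,
    "two@@example.com": False,
    "ada@example": False,
}

failures = [
    address
    for address, expected in LISTED_CASES.items()
    if is_valid_email(address) is not expected
]
print(failures)
```

Because the rubric names its listed cases, Correctness and Edge case handling reduce to checks like the `failures` list, and the reviewer's attention can go to Readability, which still needs a human eye.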

Test your rubric before you trust it

Run it against a handful of real submissions you have already marked. If the rubric-driven scores disagree with your expert judgement, the rubric is usually the problem, not the judgement - the criterion that produced the disagreement is too vague. Tighten it and run again. A rubric that survives this on your hardest borderline cases is one an AI can apply and a human can defend.
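
A quick way to find the criterion behind a disagreement is to compare the two sets of marks criterion by criterion. The sketch below assumes you can export both as a mapping from (submission, criterion) to level; the function name and the sample data are hypothetical. It reports a plain agreement rate, not a chance-corrected statistic.

```python
from collections import defaultdict


def agreement_by_criterion(
    expert: dict[tuple[str, str], str],
    rubric_driven: dict[tuple[str, str], str],
) -> dict[str, float]:
    """Fraction of submissions where both marks agree, per criterion.

    Keys are (submission_id, criterion); values are the level awarded.
    Pairs present in only one of the two mappings are ignored.
    """
    agreed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for key, expert_level in expert.items():
        if key not in rubric_driven:
            continue
        _, criterion = key
        total[criterion] += 1
        if expert_level == rubric_driven[key]:
            agreed[criterion] += 1
    return {criterion: agreed[criterion] / total[criterion] for criterion in total}


# Made-up marks for two already-marked submissions.
expert_marks = {
    ("sub-1", "correctness"): "competent",
    ("sub-1", "readability"): "competent",
    ("sub-2", "correctness"): "not yet competent",
    ("sub-2", "readability"): "not yet competent",
}
rubric_marks = {
    ("sub-1", "correctness"): "competent",
    ("sub-1", "readability"): "competent",
    ("sub-2", "correctness"): "not yet competent",
    ("sub-2", "readability"): "competent",
}

rates = agreement_by_criterion(expert_marks, rubric_marks)
print(min(rates, key=rates.get))
```

The criterion with the lowest rate is the first candidate for tightening. With only a handful of submissions the numbers are rough, so read them as a pointer to where to look, not as a measurement.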

Where Scorafy helps

Scorafy grades open-ended answers against your own rubric and cites the evidence from the submission for each score, then routes every result to a qualified assessor to review and sign off. A well-written rubric is what makes that work - the clearer your criteria, the more accurate the scoring and the cited evidence. You keep ownership of the rubric and the final decision; Scorafy applies it consistently and shows its working.

If you have a rubric you want to put to the test, book a demo and bring it. Coding bootcamps can see the bootcamps page; certification bodies, the professional certification page.

How to Write an Assessment Rubric AI Can Grade Against