Left to its defaults, an AI marker flatters. Ask it to give feedback on a thin answer and it will often find strengths that are not in the response: feedback that is generous, encouraging, and wrong. The fix is grounding: instruct the model to quote the respondent's actual words as evidence for every judgement, and never to credit a strength the answer does not support. If it cannot point to the evidence, the strength does not exist, and a short, candid report is the correct result.
Why models invent strengths
Language models are trained to be helpful and agreeable. Faced with a weak answer and a request for feedback, the path of least resistance is to soften, encourage, and pad. That produces feedback that reads well and means nothing. For assessment, an invented strength is worse than useless: it tells a learner they have demonstrated something they have not, and it makes the grade indefensible the moment anyone checks the answer against the comment.
What grounding means in practice
- Evidence before judgement. Every strength or weakness must be tied to a specific part of the response, ideally a direct quote. No quote, no claim.
- No credit for absence. The model is explicitly instructed not to award a strength the answer does not contain. Silence in the answer is not a pass.
- Candour over comfort. A weak answer earns a short, honest report. The model is told that a brief, critical report is the correct output for a weak submission, not a failure to be helpful.
- Map to the rubric, not to vibes. Judgements are anchored to the performance levels you defined, so "good" has a fixed meaning instead of a generous one.
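The four rules above can be sketched as a set of marker instructions plus a structure for each judgement the marker returns. This is a minimal illustration, not a real product's prompt: the rubric levels, the rule wording, and the names `build_marker_instructions`, `Judgement`, and `is_well_formed` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical rubric levels; replace with the performance levels you defined.
RUBRIC_LEVELS = ("not shown", "developing", "secure")

# Illustrative wording of the four grounding rules (an assumption, not a fixed prompt).
GROUNDING_RULES = (
    "Quote the respondent's exact words as evidence for every judgement.",
    "Do not credit a strength the answer does not contain.",
    "For a weak answer, a short, critical report is the correct output.",
    "Assign each judgement one of the rubric levels listed below.",
)


def build_marker_instructions(levels=RUBRIC_LEVELS):
    """Assemble the grounding rules and rubric levels into one instruction string."""
    lines = ["Marking rules:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(GROUNDING_RULES, start=1)]
    lines.append("Rubric levels: " + ", ".join(levels))
    return "\n".join(lines)


@dataclass(frozen=True)
class Judgement:
    kind: str     # "strength" or "weakness"
    level: str    # must be one of the rubric levels
    quote: str    # verbatim text taken from the answer
    comment: str  # the marker's remark about the quoted text


def is_well_formed(judgement, levels=RUBRIC_LEVELS):
    """No quote, no claim; and the level must come from the rubric."""
    return (
        judgement.kind in ("strength", "weakness")
        and judgement.level in levels
        and bool(judgement.quote.strip())
    )
```

A judgement with an empty quote, or with a level such as "great" that is not in the rubric, fails `is_well_formed` and can be dropped before the report is shown to anyone.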
How to check it is working
Read a marked weak answer. Every positive comment should be traceable to something the respondent actually wrote. If you find praise with no corresponding evidence in the response, the marking is not grounded. A grounded marker, faced with an empty answer, returns a report with no strengths in it, and that is the point.
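The same check can be roughed out in code as a first pass before reading by hand. The sketch below assumes each comment carries the quote it relies on; the dictionary keys (`kind`, `quote`, `text`) and the function name `ungrounded_praise` are hypothetical. It only tests that the quoted words appear in the answer, so it cannot tell whether the praise attached to a real quote is deserved.

```python
import re


def _normalise(text):
    """Collapse whitespace and lowercase so trivial formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_praise(answer, comments):
    """Return the positive comments whose quoted evidence is missing from the answer.

    `comments` is a list of dicts with hypothetical keys:
    "kind" ("strength" or "weakness"), "quote", and "text".
    """
    haystack = _normalise(answer)
    flagged = []
    for comment in comments:
        if comment["kind"] != "strength":
            continue  # only praise is audited here
        quote = _normalise(comment.get("quote", ""))
        # Praise with no quote, or a quote the respondent never wrote, is not grounded.
        if not quote or quote not in haystack:
            flagged.append(comment)
    return flagged
```

An empty list means every strength points at words the respondent wrote. Anything returned is praise to remove or investigate, and for an empty answer every strength is flagged.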
How Scorafy does it
This is built into how Scorafy marks. The marker is instructed to quote the respondent's actual answer text as evidence and never to invent strengths the response does not support. A candid, short report is treated as the correct outcome for a weak answer, not a shortfall. That keeps the feedback honest and the grade defensible, which is the whole reason to use AI marking instead of a flattering chatbot. See how rubric marking works or try it on a real submission.