
How to Write Better Assessment Questions (And Why It Matters for AI Analysis)

Scorafy Team · 22 February 2026 · 7 min read

The quality of an AI-generated coaching report depends almost entirely on the quality of the questions that produced it. Give the AI vague, ambiguous, or poorly structured questions, and the analysis will be vague in return. Give it specific, well-designed questions with clear intent, and the reports become noticeably sharper.

This is the part of assessment design that most guides skip. They cover the mechanics - how to add questions, how to set up scoring - but not the craft. Here is a practical guide to writing questions that actually produce useful analysis, whether you are building from scratch or refining an existing assessment. For the full context on how AI-powered assessments work, start with How to Create AI-Powered Coaching Assessments.

Choosing the Right Question Type

Not all question types are equally useful for AI analysis. Understanding when to use each one will improve your assessment design significantly.

Likert Scales (1-5 or 1-10)

Likert scales - "On a scale of 1 to 5, how often do you..." - are the workhorse of assessment design. They produce numerical scores that aggregate cleanly into dimension totals, they are easy for respondents to answer quickly, and they give the AI a structured data set to compare across questions.

The key is keeping the scale consistent throughout your assessment. Mixing a 1-5 scale in section one with a 1-10 scale in section two creates cognitive friction for the respondent and makes scoring harder to interpret. Choose one scale and use it for all rated questions.

For coaching assessments, a 1-5 scale typically works well for frequency questions ("How often do you...") and a 1-10 scale for intensity or confidence questions ("How confident are you..."). Either works - consistency matters more than which one you choose.
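If your draft assessment lives in a structured format, the consistency rule is easy to check automatically. Here is a minimal sketch, assuming a hypothetical question shape (not Scorafy's actual data model):

```ts
// Hypothetical shape for a rated question - illustration only.
type RatedQuestion = { id: string; text: string; scaleMax: 5 | 10 };

// A consistent assessment uses exactly one scale across all rated questions.
function scaleMaxima(questions: RatedQuestion[]): number[] {
  return [...new Set(questions.map((q) => q.scaleMax))];
}

const questions: RatedQuestion[] = [
  { id: "q1", text: "How often do you delegate routine tasks?", scaleMax: 5 },
  { id: "q2", text: "How confident are you leading meetings?", scaleMax: 10 },
];

const maxima = scaleMaxima(questions);
if (maxima.length > 1) {
  console.warn(`Mixed scales (${maxima.join(", ")}) - pick one and use it throughout.`);
}
```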

Open-Text Questions

Open-text questions are where the richest AI analysis comes from. A well-written open-text question produces a response that the AI can analyse for specificity, emotional tone, self-awareness, and narrative patterns that a numerical rating simply cannot capture.

The trade-off is that open-text questions take longer to answer and some respondents will give minimal replies. Use two to four open-text questions in a typical assessment - enough to give the AI meaningful material without fatiguing respondents.

The best open-text questions invite reflection rather than factual recall. "Describe a recent situation where you found delegation difficult and what you did" is stronger than "What are your challenges with delegation?" The first grounds the response in a real experience. The second invites a generic answer.

Multiple Choice

Multiple choice is useful for demographic context and situational branching, but less useful for analytical depth. "Do you manage a team?" (yes/no) is a great gating question for branching logic. "What best describes your leadership style?" as a substantive assessment question gives the AI less to work with than a rated scale or an open-text response.

Use multiple choice for context-setting questions at the start of your assessment, and for conditional branching decisions. Avoid using it as the primary question type for dimensions you want analysed in depth.
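As a sketch of how that gating might be represented - the field names here are hypothetical, not Scorafy's actual configuration format - a context question can drive which later questions a respondent sees:

```ts
// Hypothetical branching configuration - illustration only.
type Question = {
  id: string;
  text: string;
  type: "multiple_choice" | "likert" | "open_text";
  showIf?: { questionId: string; equals: string }; // show only if a prior answer matches
};

const assessment: Question[] = [
  { id: "manages_team", text: "Do you manage a team?", type: "multiple_choice" },
  {
    id: "delegation",
    text: "How often do you delegate routine tasks?",
    type: "likert",
    showIf: { questionId: "manages_team", equals: "yes" },
  },
];

// Resolve which questions a given respondent should actually see.
function visibleQuestions(all: Question[], answers: Record<string, string>): Question[] {
  return all.filter((q) => !q.showIf || answers[q.showIf.questionId] === q.showIf.equals);
}

console.log(visibleQuestions(assessment, { manages_team: "no" }).map((q) => q.id));
// -> ["manages_team"]: the delegation question is skipped for non-managers
```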

Avoiding Double-Barreled Questions

A double-barreled question asks about two things at once: "Do you communicate clearly and receive feedback well?" If someone communicates clearly but struggles to receive feedback, what honest answer can they give? Whatever they choose becomes meaningless - and the AI cannot analyse a meaningless answer usefully.

Double-barreled questions are surprisingly common in assessment design. Watch for these patterns:

  • "Do you set clear goals and follow through on them?" - two separate skills
  • "Are you comfortable giving and receiving feedback?" - two directions of one skill
  • "Do you manage your time and prioritise effectively?" - related but distinct behaviours
  • "Are you confident and visible as a leader?" - confidence and visibility are different things

The fix is always to split the question. "Do you set clear goals?" becomes question 4. "Do you follow through on the goals you set?" becomes question 5. The AI now has two data points rather than one blurred one, and can identify the interesting case where someone is great at goal-setting but poor on follow-through.
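If you want a quick screen for this while drafting, a crude text heuristic catches many cases. This is a sketch, not a real linter - it will flag false positives, so treat hits as prompts for review rather than errors:

```ts
// Crude heuristic: a question that joins clauses with "and"/"or"
// is often double-barreled. Expect false positives - review, don't reject.
function looksDoubleBarreled(questionText: string): boolean {
  return /\b(and|or)\b/i.test(questionText);
}

const drafts = [
  "Do you set clear goals and follow through on them?",
  "Do you set clear goals?",
  "Do you follow through on the goals you set?",
];

for (const q of drafts) {
  console.log(`${looksDoubleBarreled(q) ? "REVIEW" : "ok"}\t${q}`);
}
// Only the first draft is flagged for splitting.
```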

Writing for AI Analysis

When a human reads assessment responses, they bring intuition and context. They read between the lines. An AI works differently - it can only work with the information that is actually in the text. This means specific questions produce significantly better analysis than vague ones.

Compare these two versions of a similar question:

Vague: "How do you feel about your communication skills?"

Specific: "How comfortable are you initiating difficult conversations with direct reports, on a scale of 1-5?"

The specific version tells the AI which communication context (difficult conversations), which direction (initiating, not receiving), and which relationship (direct reports). The analysis can now say something like "you rated yourself low on initiating difficult conversations but high on presenting to senior stakeholders - this suggests your confidence is context-dependent rather than a general communication weakness." That insight is impossible from the vague question.

The same principle applies to open-text questions. "Tell me about your leadership" gives the AI nothing to anchor. "Describe a decision you made in the last six months that you are proud of and one you would approach differently" gives it a narrative, a time frame, and a self-evaluation embedded in real experience.

Question Order Effects

The order of your questions affects how respondents answer them. This is not a flaw in assessment design - it is a feature, if you understand it.

Easier questions first. Starting with straightforward, lower-stakes questions builds momentum and helps respondents settle into the assessment. Opening with "Describe the most challenging leadership situation you have ever faced" will produce shorter, more defensive answers than asking the same question after the respondent has warmed up.

Group by dimension. When questions from the same dimension appear together, respondents tend to be more consistent within that dimension. Randomly mixing questions from different dimensions can produce noisier data, though it does reduce social desirability bias (respondents cannot "game" a pattern they cannot see).

Context-setting questions come first. Questions that establish role context ("Do you currently manage a team?", "How long have you been in your current role?") should appear at the beginning. They inform conditional branching and give the AI useful context for interpreting everything that follows.

Sensitive questions near the end. If your assessment includes questions about wellbeing, interpersonal conflict, or areas of vulnerability, place them toward the end. Once respondents are engaged with the assessment and have built some familiarity with the question style, they tend to answer more honestly.
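Taken together, these rules amount to a simple sort: context first, sensitive last, everything in between grouped by dimension. A minimal sketch, with hypothetical ordering tags on each question:

```ts
// Hypothetical ordering tags - illustration only. Within each band,
// a stable sort preserves the author's chosen difficulty order.
type Q = { id: string; dimension?: string; isContext?: boolean; isSensitive?: boolean };

function orderQuestions(questions: Q[]): Q[] {
  const band = (q: Q) => (q.isContext ? 0 : q.isSensitive ? 2 : 1);
  return [...questions].sort(
    (a, b) => band(a) - band(b) || (a.dimension ?? "").localeCompare(b.dimension ?? "")
  );
}

const ordered = orderQuestions([
  { id: "conflict", dimension: "Wellbeing", isSensitive: true },
  { id: "role", isContext: true },
  { id: "goals_1", dimension: "Goal-setting" },
  { id: "delegate_1", dimension: "Delegation" },
]);
console.log(ordered.map((q) => q.id)); // ["role", "delegate_1", "goals_1", "conflict"]
```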

How Many Questions Is Optimal?

The research on survey completion rates and the practical experience of assessment designers converge on a similar answer: 10 to 15 questions is the sweet spot for most coaching assessments.

Below 8 questions, the AI does not have enough data points to produce nuanced analysis. The report becomes thin - it can identify what the scores show but cannot find the patterns and cross-dimension connections that make reports genuinely useful. In the Scorafy builder, you will notice the AI-generated preview becomes noticeably richer once you have at least 10 questions across your dimensions.

Above 20 questions, completion rates drop and response quality deteriorates. Respondents become fatigued, open-text answers get shorter and more perfunctory, and the care put into the first half of the assessment is not sustained through the second half. You end up with more data points, but the later ones are less reliable.

The 10 to 15 question range allows for:

  • 2 to 3 context-setting questions at the start
  • 6 to 9 rated dimension questions across your scoring dimensions
  • 2 to 3 open-text questions for depth
  • Conditional branching to skip irrelevant questions (so the actual number any individual answers is often 8 to 12)
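A quick sanity check over that composition is easy to automate. The thresholds below are the ones discussed above; the shape itself is hypothetical:

```ts
// Hypothetical composition summary for a draft assessment.
type Composition = { context: number; rated: number; openText: number };

function checkComposition({ context, rated, openText }: Composition): string[] {
  const total = context + rated + openText;
  const warnings: string[] = [];
  if (total < 10) warnings.push(`${total} questions - analysis tends to be thin below ~10.`);
  if (total > 20) warnings.push(`${total} questions - expect fatigue and drop-off above ~20.`);
  if (openText < 2) warnings.push("Fewer than 2 open-text questions limits narrative depth.");
  return warnings;
}

console.log(checkComposition({ context: 2, rated: 8, openText: 3 }));
// -> [] : 13 questions, inside the 10-15 sweet spot
```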

Mapping Questions to Dimensions

Every rated question in your assessment should map to exactly one scoring dimension. Questions without a dimension assignment do not contribute to your scoring analysis and can confuse the AI's interpretation of what the score represents.

Aim for at least two questions per dimension - ideally three to four. A dimension with a single question produces a score that is just that one answer restated, with no internal variation for the AI to work with. A dimension with four questions produces a score across a wider range, and the AI can identify nuance within that dimension.
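In code terms, the mapping is a one-to-one assignment from rated answers to dimensions, with per-dimension averages as the output. A minimal sketch with hypothetical shapes, which also flags single-question dimensions:

```ts
// Hypothetical shapes: each rated answer belongs to exactly one dimension.
type Answer = { dimension: string; score: number }; // e.g. a 1-5 rating

function dimensionScores(answers: Answer[]): Map<string, { avg: number; n: number }> {
  const totals = new Map<string, { sum: number; n: number }>();
  for (const { dimension, score } of answers) {
    const t = totals.get(dimension) ?? { sum: 0, n: 0 };
    totals.set(dimension, { sum: t.sum + score, n: t.n + 1 });
  }
  return new Map([...totals].map(([dim, { sum, n }]) => [dim, { avg: sum / n, n }]));
}

const scores = dimensionScores([
  { dimension: "Delegation", score: 2 },
  { dimension: "Delegation", score: 3 },
  { dimension: "Goal-setting", score: 5 },
]);

for (const [dim, { avg, n }] of scores) {
  const note = n < 2 ? " (single question - add more for meaningful variation)" : "";
  console.log(`${dim}: ${avg.toFixed(1)} from ${n} question(s)${note}`);
}
```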

The Leadership 360 Review template is a useful reference for how to map questions to dimensions in practice. It uses four dimensions with three to four questions each, plus two open-text questions that inform the overall narrative without being assigned to a specific dimension.

When naming your dimensions, be specific about what they measure. "Communication" is vague. "Clarity of written communication" and "Confidence in verbal communication" are distinct dimensions that the AI can analyse separately and connect to each other.

Testing Before You Send

Before sharing your assessment with real respondents, complete it yourself - at least twice, with deliberately different answer patterns.

First run: answer as someone who is strong overall. Read the report. Does it reflect what a genuinely high-performing person in this area would hear from their coach?

Second run: answer as someone who is mixed - strong in some dimensions, weak in others. This is the more revealing test. Does the AI identify the cross-dimension patterns? Does it flag the tension between high scores in one area and low scores in another?

If the reports from your two test runs read very similarly, your questions are probably too vague or your dimensions are too correlated. Sharp, well-designed assessments produce noticeably different reports for different answer patterns - because different answer patterns genuinely do reveal different people.
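One rough way to quantify "read very similarly" is a word-overlap score between the two test reports. This is a sketch with placeholder report text, not a Scorafy feature:

```ts
// Jaccard similarity over word sets - crude, but enough to spot two
// test reports that are near-identical despite different answers.
function jaccard(a: string, b: string): number {
  const words = (s: string) => new Set(s.toLowerCase().match(/[a-z']+/g) ?? []);
  const A = words(a);
  const B = words(b);
  const overlap = [...A].filter((w) => B.has(w)).length;
  const union = A.size + B.size - overlap;
  return union === 0 ? 0 : overlap / union;
}

// Placeholder snippets standing in for the two AI-generated test reports.
const strongReport = "You set goals confidently and follow through consistently.";
const mixedReport = "You set goals well but rarely delegate routine work.";

if (jaccard(strongReport, mixedReport) > 0.8) {
  console.warn("Test reports are near-identical - your questions may be too vague.");
}
```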

A Final Note on Tone

The way you write your questions shapes the tone of the responses you receive, which in turn shapes the tone of the AI analysis.

Questions written in a judgemental tone ("How often do you fail to delegate?") produce defensive responses. Questions written in a curious, developmental tone ("How often do you find yourself completing tasks that could have been handled by someone in your team?") produce more honest, reflective answers. The assessment experience should feel like a coaching conversation, not a performance review.

If your questions feel clinical or evaluative when you read them back, reframe them. The best assessment questions make respondents think, "Yes, that is exactly what I want to understand about myself" - not "I am being measured."

For a practical starting point, the Scorafy demo shows how well-designed questions produce a rich AI report in real time. Completing the demo as a respondent is one of the best ways to calibrate what good question design feels like from the other side of the form.
