The Problem with Generic Training
Most L&D programs start with a spreadsheet of skills and a roster of people. Someone decides "the team needs RAG training" and everyone gets the same course. The engineer who has been building retrieval pipelines for a year sits through "what is an embedding." The one who actually needs help with reranking and evaluation pipelines never gets focused attention on those topics.
Generic training wastes time on what people already know and underinvests in what they actually need. The result: low engagement, slow skill growth, and no way to prove the program worked.
The missing piece is not more content. It is a reliable way to measure what each person knows before building their learning path, and then to use that measurement to generate a plan that focuses on exactly the right gaps.
Assess, Score, Learn: The Loop
Unfold now supports a complete assess-to-learn loop through the MCP server and REST API. The flow has three steps, and they chain together without manual intervention.
The Assess-to-Learn Pipeline
- Assess: AI creates MCQs tailored to a specific skill and proficiency level, anchored to the learner's actual work context. Questions are validated (structural + semantic) before delivery.
- Score: The learner answers in your app. Unfold scores against a signed token, maps to a proficiency band, and identifies the exact sub-skills (facets) where the learner fell short.
- Learn: When there is a gap, Unfold generates a learning path that prioritizes weak areas, skips strong ones, and anchors exercises to the work item the learner is preparing for.
How It Works (Technical)
Step 1: Generate the Assessment
Your system calls the generate_skill_assessment MCP tool (or the equivalent REST endpoint). You specify the skill, the target proficiency band, the number of questions, and the work item context.
generate_skill_assessment({
  work_item_context: {
    title: "Build the customer support knowledge base agent",
    description: "RAG pipeline over 50k support articles with hybrid search"
  },
  skill: "RAG & Retrieval Systems",
  target_proficiency: "medium",
  num_questions: 8,
  request_id: "assess_team_rag_q2"
})
Unfold's AI generates 8 multiple-choice questions anchored to the work item. Each question goes through two validation passes. A structural validator checks that there is exactly one correct answer, no duplicate options, and the difficulty distribution matches. A semantic validator (a separate AI judge) confirms the marked answer is actually correct, no distractor is also defensibly correct, and the question actually tests the named skill at the right difficulty.
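As a rough illustration of what the structural pass checks, here is a minimal Python sketch. The question layout (`options`, `text`, `is_correct`) is an assumption made for this example, not Unfold's actual schema, and the difficulty-distribution check is left out.

```python
from collections import Counter


def structural_errors(question: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the question passes.

    The question layout used here is hypothetical, not Unfold's real schema.
    """
    errors = []
    options = question.get("options", [])

    # Exactly one option may be marked correct.
    correct = [o for o in options if o.get("is_correct")]
    if len(correct) != 1:
        errors.append(f"expected exactly 1 correct option, found {len(correct)}")

    # No two options may carry the same text (case and surrounding whitespace ignored).
    texts = Counter(o["text"].strip().lower() for o in options)
    duplicates = sorted(t for t, n in texts.items() if n > 1)
    if duplicates:
        errors.append(f"duplicate options: {duplicates}")

    return errors
```

Checks like these are cheap and deterministic, which is why they can run before the more expensive semantic judge.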
Every question is validated before it reaches the learner. If validation fails, the system regenerates with feedback from the validator. If it still fails, the call returns an error rather than a bad question. The bar is deliberately strict: a single wrong question can shape the learner's entire impression of the platform.
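The regenerate-with-feedback behavior can be sketched as a small loop. The `generate` and `validate` callables, the `ValidationFailed` exception, and the attempt budget of two are all hypothetical stand-ins; the article does not describe Unfold's internals at this level.

```python
from typing import Callable, Optional


class ValidationFailed(Exception):
    """Raised when no candidate passes validation within the attempt budget."""


def generate_validated(
    generate: Callable[[Optional[str]], dict],
    validate: Callable[[dict], Optional[str]],
    max_attempts: int = 2,
) -> dict:
    """Generate a question, feeding validator feedback into each retry.

    `generate` receives the previous feedback (None on the first try).
    `validate` returns None on success or a feedback string on failure.
    """
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = validate(candidate)
        if feedback is None:
            return candidate
    # Fail loudly instead of delivering a question that did not validate.
    raise ValidationFailed(feedback)
```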
The response includes the questions (without answers) and a signed assessment_token. The token is HMAC-signed, which makes it tamper-evident. It contains the answer key, the proficiency band thresholds, and a time-to-live. Your app never sees the answer key directly.
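For readers unfamiliar with HMAC-signed tokens, the following Python sketch shows the general technique of signing a payload with an expiry and verifying it later. The token layout (base64 body, a dot, base64 signature) and the `exp` field are assumptions for illustration, not Unfold's format. Note that a signature only proves integrity; how Unfold keeps the answer key opaque to the client is not covered here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def sign_token(payload: dict, secret: bytes, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Serialize the payload with an expiry and append an HMAC-SHA256 signature."""
    issued = time.time() if now is None else now
    body = json.dumps({**payload, "exp": issued + ttl_seconds}, sort_keys=True).encode()
    sig = hmac.new(secret, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(body).decode() + "." + base64.urlsafe_b64encode(sig).decode()


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the payload if the signature matches and the token has not expired."""
    body_b64, sig_b64 = token.split(".")
    body = base64.urlsafe_b64decode(body_b64)
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        raise ValueError("signature mismatch")
    payload = json.loads(body)
    if (time.time() if now is None else now) > payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Because the signature covers the whole body, changing the answer key, the thresholds, or the expiry invalidates the token.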
Step 2: Score the Assessment
After the learner answers in your UI, you call score_skill_assessment with the token and their answers.
score_skill_assessment({
  assessment_token: "...",
  answers: [
    { question_id: "q_1", selected_option_id: "b" },
    { question_id: "q_2", selected_option_id: "a" },
    ...
  ],
  request_id: "score_team_rag_q2"
})
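Scoring is pure computation, with no LLM call in the hot path. The response tells you exactly where the learner stands: the score, the proficiency band it maps to, and the facets where the learner fell short.
Suppose a learner, Sarah, hits the target. No learning path is needed. But not everyone does.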
If a learner scores 28%, landing in the "low" band when the target was "medium," the response includes a suggested_goal_seed with a title, summary, and the weak sub-skills identified from the questions they missed.
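To make "pure computation" concrete, here is a minimal Python sketch of scoring against an answer key and mapping the result to a band. The field names, the facet lookup, and the threshold values passed in are illustrative assumptions; Unfold's real thresholds travel inside the token and are not given in this article.

```python
def score_assessment(
    answer_key: dict[str, str],
    facet_by_question: dict[str, str],
    answers: list[dict],
    band_thresholds: list[tuple[float, str]],
) -> dict:
    """Score answers against a key with plain arithmetic (no model call).

    `band_thresholds` lists (minimum_percent, band) pairs in ascending order.
    """
    selected = {a["question_id"]: a["selected_option_id"] for a in answers}
    # Unanswered questions count as missed.
    missed = [q for q, correct in answer_key.items() if selected.get(q) != correct]
    percent = 100.0 * (len(answer_key) - len(missed)) / len(answer_key)

    # Pick the highest band whose minimum the learner reached.
    band = band_thresholds[0][1]
    for minimum, name in band_thresholds:
        if percent >= minimum:
            band = name

    weak_facets = sorted({facet_by_question[q] for q in missed})
    return {"percent": percent, "band": band, "weak_facets": weak_facets}
```

The weak facets fall straight out of the missed questions, which is the kind of data a goal seed can carry forward.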
Step 3: Create the Learning Path
When there is a gap, your system calls create_goal with the assessment results in the additional_context field. Unfold's plan synthesis prompt is tuned to use this data:
- Weak facets (the sub-skills the learner missed) become the primary focus of the plan.
- Strong facets (what they got right) are skipped or compressed. No time wasted on basics they already know.
- Work item context anchors the learning examples. Instead of generic "learn about chunking," the plan says "design a chunking strategy for the 50k support article corpus with metadata-aware splits."
- Target band becomes the success criterion baked into the goal description.
The result is a learning path that is personalized not from a template, but from actual evidence of what this specific person needs to learn for this specific work.
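One way to carry the assessment evidence into the goal request is sketched below in Python. Only `create_goal`, `additional_context`, and the goal seed's title, summary, and weak sub-skills come from this article; every other key name, and the choice to send the context as JSON text, is an assumption.

```python
import json


def build_goal_request(goal_seed: dict, strong_facets: list[str], work_item: dict, target_band: str) -> dict:
    """Assemble hypothetical arguments for a create_goal call."""
    context = {
        "weak_facets": goal_seed["weak_facets"],  # primary focus of the plan
        "strong_facets": strong_facets,           # to be skipped or compressed
        "work_item_context": work_item,           # anchors the exercises
        "target_proficiency": target_band,        # success criterion
    }
    return {
        "title": goal_seed["title"],
        "description": f"{goal_seed['summary']} Success criterion: reach the '{target_band}' band.",
        "additional_context": json.dumps(context),
    }
```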
Real Results: 20 Engineers, 3 AI Skills
We ran the full pipeline for a team of 20 engineers being upskilled across three AI skills before a product pivot to agent-based architecture.
Proficiency Distribution by Skill
Not every skill had the same gap profile. The assessments revealed where the team was strong and where they needed the most help.
Prompt engineering was a relative strength -- 75% of the team already met the bar. RAG had the widest spread: some engineers were strong on basic retrieval but weak on reranking and evaluation. Agent architecture was the biggest gap, which made sense since the team had not built agents before the pivot.
Where RAG Knowledge Drops Off
The most actionable insight from RAG assessments was the facet-level breakdown. We could see exactly which sub-skills tripped people up.
The steep drop between "chunking strategies" and "hybrid search" told us exactly where most engineers' practical experience ended. Everyone understood retrieval conceptually, but the applied skills -- combining dense and sparse search, building reranking stages, setting up evaluation -- were where the gaps lived.
This is the kind of signal that generic "take this RAG course" training misses entirely.
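If you store per-question results on your side, a facet-level breakdown like this takes only a few lines to compute. The sketch below assumes a hypothetical result shape (a list of learners, each with `items` carrying `facet` and `correct`); it is not a format returned by Unfold.

```python
from collections import defaultdict


def facet_pass_rates(results: list[dict]) -> dict[str, float]:
    """Fraction of answered questions passed per facet, across all learners."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for learner in results:
        for item in learner["items"]:
            total[item["facet"]] += 1
            passed[item["facet"]] += int(item["correct"])
    return {facet: passed[facet] / total[facet] for facet in total}
```

Sorting the output by pass rate shows where practical experience starts to thin out across the team.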
Targeted Plans vs Generic: The Difference
The pipeline generated 38 learning paths, each one focused on the exact sub-skills that engineer was missing. For RAG, an engineer who struggled with hybrid search and reranking got a plan that started there, anchored to the actual knowledge base agent they were about to build. An engineer who only missed evaluation frameworks got a focused 3-step path instead of a full 12-step course.
What Makes This Different
Assessment Quality as a Product Surface
Most assessment tools treat question generation as a side feature. Unfold treats it as the front door to the learning vertical. Every question goes through structural and semantic validation. The generator retries with feedback if validation fails. Quality metrics (pass rates, latency, accuracy) are tracked per-skill in a nightly eval suite.
Stateless by Default, Stateful When You Need It
The assessment tools are stateless in the current release. Your system stores the assessment. Unfold generates, validates, and scores. No data persists on the Unfold side except a 24-hour idempotency cache (so retried calls return the same questions, not different ones).
This keeps data residency concerns to a minimum. Apart from that short-lived cache, your learner data stays in your system. Unfold is a compute layer.
The Loop Closes
The critical difference between "we assessed the learner" and "we moved the learner from low to medium on RAG" is the loop. Assess, create a targeted plan, learner completes the plan, re-assess. Unfold owns all three steps (assessment generation, plan creation, progress tracking), so the loop is coherent. No hand-off between disconnected tools.
Getting Started
For MCP Users
If your AI agent already uses the Unfold MCP server, the new tools are available immediately after updating to v0.4.0:
npx @unfoldit/mcp-server@0.4.0
Your API key needs the assessment:generate, assessment:score, and assessment:read_capabilities scopes. Ask your org admin to enable them in the API Key settings.
For REST API Users
Three new endpoints:
- POST /api/v1/ext/assessments/generate -- generate MCQs
- POST /api/v1/ext/assessments/score -- score answers
- GET /api/v1/ext/assessments/capabilities -- check supported parameters
Authentication uses the same org API key (Bearer unfold_sk_...).
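As a starting point, the Python sketch below builds (but does not send) a request to the generate endpoint using only the standard library. The host in `BASE_URL` is a placeholder, and the assumption that the JSON body mirrors the MCP tool's arguments is not confirmed by this article; check the API reference for the actual request shape.

```python
import json
import urllib.request

BASE_URL = "https://unfold.example.com"  # placeholder host, not from the article


def build_generate_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST to the generate endpoint with Bearer authentication."""
    return urllib.request.Request(
        url=BASE_URL + "/api/v1/ext/assessments/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it once the real host and key are in place.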
What to Try First
Start with get_assessment_capabilities to see the defaults. Then generate a small assessment (3-5 questions) for a skill your team works with. Score it manually. If the gap detection and goal seed look right, wire it into your onboarding or upskilling pipeline.
Every call takes a request_id. If your system retries (network blip, timeout, client crash), the same request_id returns the exact same assessment or score. You never generate two different assessments for the same request.
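The idempotency guarantee can be pictured as a cache keyed by request_id. This Python sketch is a simplified in-memory model of that idea, using the 24-hour window mentioned earlier; the class and method names are hypothetical and say nothing about how Unfold actually implements it.

```python
import time
from typing import Callable, Optional


class IdempotencyCache:
    """Return the stored result when a request_id is seen again within the TTL."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60):
        self.ttl_seconds = ttl_seconds
        self._entries: dict[str, tuple[float, object]] = {}

    def get_or_compute(self, request_id: str, compute: Callable[[], object], now: Optional[float] = None):
        current = time.time() if now is None else now
        entry = self._entries.get(request_id)
        if entry is not None and current - entry[0] < self.ttl_seconds:
            return entry[1]  # retry: replay the original result
        result = compute()
        self._entries[request_id] = (current, result)
        return result
```

On the client side, the practical rule is to create the request_id once per logical operation and reuse it on every retry.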
What Comes Next
The current release is stateless. Upcoming releases add stored assessment history (per-learner trend tracking), anti-cheat safeguards (one-shot delivery, server-side timers), and a re-assessment scheduler that prompts learners to retake after completing their goal. The data from re-assessments produces "band lift" metrics: proof that the learning path actually moved the learner from low to medium on the skill that mattered.
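Until that ships, a band lift figure can be approximated from two scored assessments you have stored yourself. The Python sketch below assumes an ordered set of bands; "low" and "medium" appear in this article, while "high" is an assumed third band.

```python
BAND_ORDER = ["low", "medium", "high"]  # "high" is an assumption for illustration


def band_lift(before: str, after: str) -> int:
    """Number of bands gained between two assessments (negative if the learner dropped)."""
    return BAND_ORDER.index(after) - BAND_ORDER.index(before)
```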
The assessment tools are available now in @unfoldit/mcp-server@0.4.0 and via the REST API.
Build This With Unfold
Integrate Unfold into your platform using the MCP server or REST API. Create goals, assign them via claim links, and track progress programmatically.
Related
If you want to see how the full pipeline works end to end -- from creating an organization to distributing personalized learning paths at scale -- read How an Education Portal Built Personalized Training Plans for Every Student Using Unfold. That post covers the goal creation, claim link distribution, and progress tracking pieces that come after assessment.