The Problem with Generic Training

Most L&D programs start with a spreadsheet of skills and a roster of people. Someone decides "the team needs RAG training" and everyone gets the same course. The engineer who has been building retrieval pipelines for a year sits through "what is an embedding." The one who actually needs help with reranking and evaluation pipelines never gets focused attention on those topics.

The real cost

Generic training wastes time on what people already know and underinvests in what they actually need. The result: low engagement, slow skill growth, and no way to prove the program worked.

The missing piece is not more content. It is a reliable way to measure what each person knows before building their learning path, and then to use that measurement to generate a plan that focuses on exactly the right gaps.

Assess, Score, Learn: The Loop

Unfold now supports a complete assess-to-learn loop through the MCP server and REST API. The flow has three steps, and they chain together without manual intervention.

The Assess-to-Learn Pipeline

Generate Assessment

AI creates multiple-choice questions (MCQs) tailored to a specific skill and proficiency level, anchored to the learner's actual work context. Questions are validated (structural + semantic) before delivery.

Score and Identify Gaps

The learner answers in your app. Unfold scores against a signed token, maps to a proficiency band, and identifies the exact sub-skills (facets) where the learner fell short.

Create Targeted Plan

When there is a gap, Unfold generates a learning path that prioritizes weak areas, skips strong ones, and anchors exercises to the work item the learner is preparing for.

How It Works (Technical)

Step 1: Generate the Assessment

Your system calls the generate_skill_assessment MCP tool (or the equivalent REST endpoint). You specify the skill, the target proficiency band, the number of questions, and the work item context.

generate_skill_assessment({
  work_item_context: {
    title: "Build the customer support knowledge base agent",
    description: "RAG pipeline over 50k support articles with hybrid search"
  },
  skill: "RAG & Retrieval Systems",
  target_proficiency: "medium",
  num_questions: 8,
  request_id: "assess_team_rag_q2"
})

Unfold's AI generates 8 multiple-choice questions anchored to the work item. Each question goes through two validation passes. A structural validator checks that there is exactly one correct answer, no duplicate options, and the difficulty distribution matches. A semantic validator (a separate AI judge) confirms the marked answer is actually correct, no distractor is also defensibly correct, and the question actually tests the named skill at the right difficulty.
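To make the structural pass concrete, here is a minimal sketch of two of the checks described above (exactly one correct answer, no duplicate options). The `Question` and `Option` types and the `structural_errors` function are hypothetical names for illustration, not Unfold's internal implementation, and the difficulty-distribution check is left out.

```python
from dataclasses import dataclass


@dataclass
class Option:
    option_id: str
    text: str
    is_correct: bool


@dataclass
class Question:
    question_id: str
    prompt: str
    options: list[Option]


def structural_errors(question: Question) -> list[str]:
    """Return structural problems; an empty list means the question passes."""
    errors = []
    correct = [o for o in question.options if o.is_correct]
    if len(correct) != 1:
        errors.append(f"expected exactly one correct option, found {len(correct)}")
    # Normalize text so "BM25" and "bm25 " are treated as the same option.
    normalized = [o.text.strip().lower() for o in question.options]
    if len(set(normalized)) != len(normalized):
        errors.append("duplicate option text")
    return errors
```

A semantic check cannot be written this way, which is why the post describes it as a separate AI judge.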

Quality is not optional

Every question is validated before it reaches the learner. If validation fails, the system regenerates with feedback from the validator. If it still fails, the call returns an error rather than a bad question. One wrong question shapes the learner's entire impression of the platform.
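The regenerate-then-fail behaviour can be sketched as a small loop. This is an illustration under assumptions: `generate` and `validate` are hypothetical callables standing in for the generator and the validators, and the limit of two attempts reflects the "regenerate once, then error" description rather than a documented setting.

```python
from typing import Callable


class AssessmentGenerationError(Exception):
    """Raised when no candidate passes validation (hypothetical error type)."""


def generate_with_validation(
    generate: Callable[[list[str]], dict],
    validate: Callable[[dict], list[str]],
    max_attempts: int = 2,
) -> dict:
    feedback: list[str] = []
    for _ in range(max_attempts):
        # The previous attempt's validator feedback is passed to the generator.
        candidate = generate(feedback)
        feedback = validate(candidate)
        if not feedback:
            return candidate
    # Fail loudly instead of returning a question that did not pass.
    raise AssessmentGenerationError("; ".join(feedback))
```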

The response includes the questions (without answers) and a signed assessment_token. The token is HMAC-signed, so any tampering is detected at scoring time. It carries the answer key, the proficiency band thresholds, and a time-to-live. Your app never sees the answer key directly.
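For readers unfamiliar with HMAC-signed tokens, the sketch below shows the general technique using Python's standard library. The token layout, the `SECRET` value, and the `sign_token` / `verify_token` names are assumptions for illustration; Unfold's actual token format is not documented here. Note that this sketch only signs the payload. Base64 is an encoding, not encryption, so keeping an embedded answer key hidden would need encryption on top, which is omitted.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical server-side key, never sent to clients


def sign_token(payload: dict, ttl_seconds: int, now: Optional[float] = None) -> str:
    issued = time.time() if now is None else now
    body = dict(payload, exp=issued + ttl_seconds)
    raw = base64.urlsafe_b64encode(json.dumps(body, sort_keys=True).encode())
    sig = hmac.new(SECRET, raw, hashlib.sha256).hexdigest()
    return f"{raw.decode()}.{sig}"


def verify_token(token: str, now: Optional[float] = None) -> dict:
    raw, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, raw.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature mismatch")
    body = json.loads(base64.urlsafe_b64decode(raw))
    current = time.time() if now is None else now
    if body["exp"] < current:
        raise ValueError("token expired")
    return body
```

Any change to the payload or the signature makes `verify_token` raise, and the `exp` field enforces the time-to-live.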

Step 2: Score the Assessment

After the learner answers in your UI, you call score_skill_assessment with the token and their answers.

score_skill_assessment({
  assessment_token: "...",
  answers: [
    { question_id: "q_1", selected_option_id: "b" },
    { question_id: "q_2", selected_option_id: "a" },
    ...
  ],
  request_id: "score_team_rag_q2"
})

Scoring is pure computation. No LLM call in the hot path. The response tells you exactly where the learner stands:

Assessment Result - Sarah (RAG & Retrieval)

Raw Score: 5 / 8
Percentage: 62.5%
Achieved Band: medium
Target Band: medium
Gap (bands): 0
Action Needed: none

Sarah hits the target. No learning path needed. But not everyone does.

If a learner scores 28%, landing in the "low" band when the target was "medium," the response includes a suggested_goal_seed with a title, summary, and the weak sub-skills identified from the questions they missed.
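Because scoring is pure computation, it can be sketched in a few lines. Everything specific here is an assumption: the band names beyond "low" and "medium", the threshold values, the result field names, and the question-to-facet mapping are hypothetical. In the real flow the answer key and thresholds come from the signed token.

```python
BANDS = ["low", "medium", "high"]  # "high" is assumed; the post names low and medium
# Hypothetical minimum percentage required to reach each band.
THRESHOLDS = {"low": 0.0, "medium": 40.0, "high": 75.0}


def score(answer_key: dict, facets: dict, answers: list, target_band: str) -> dict:
    selected = {a["question_id"]: a["selected_option_id"] for a in answers}
    # Unanswered questions count as incorrect.
    missed = [qid for qid, correct in answer_key.items() if selected.get(qid) != correct]
    raw = len(answer_key) - len(missed)
    percentage = 100.0 * raw / len(answer_key)

    achieved = BANDS[0]
    for band in BANDS:
        if percentage >= THRESHOLDS[band]:
            achieved = band

    gap = max(0, BANDS.index(target_band) - BANDS.index(achieved))
    return {
        "raw_score": raw,
        "percentage": percentage,
        "achieved_band": achieved,
        "target_band": target_band,
        "gap_bands": gap,
        "action_needed": "create_learning_path" if gap else "none",
        # Weak facets are the sub-skills behind the missed questions.
        "weak_facets": sorted({facets[qid] for qid in missed}),
    }
```

With these assumed thresholds, 5 of 8 correct lands in "medium" with no gap, while 2 of 8 lands in "low" with a one-band gap.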

Step 3: Create the Learning Path

When there is a gap, your system calls create_goal with the assessment results in the additional_context field. Unfold's plan synthesis prompt is tuned to use this data:

Weak facets (the sub-skills the learner missed) become the primary focus of the plan.
Strong facets (what they got right) are skipped or compressed. No time wasted on basics they already know.
Work item context anchors the learning examples. Instead of generic "learn about chunking," the plan says "design a chunking strategy for the 50k support article corpus with metadata-aware splits."
Target band becomes the success criterion baked into the goal description.

The result is a learning path built not from a template but from evidence of what this specific person needs to learn for this specific work.
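As a rough illustration of how the four inputs above could be packed for create_goal, here is a sketch that builds the request arguments from a scoring result. Only create_goal, additional_context, and suggested_goal_seed (with a title, summary, and weak sub-skills) come from this post. The other field names, and the choice to serialize the context as a JSON string, are assumptions.

```python
import json


def build_goal_request(score_result: dict, work_item: dict, skill: str) -> dict:
    seed = score_result["suggested_goal_seed"]
    context = {
        "skill": skill,
        "achieved_band": score_result["achieved_band"],
        "target_band": score_result["target_band"],      # becomes the success criterion
        "weak_facets": seed["weak_facets"],               # primary focus of the plan
        "strong_facets": score_result.get("strong_facets", []),  # skipped or compressed
        "work_item_context": work_item,                   # anchors the examples
    }
    return {
        "title": seed["title"],
        "description": seed["summary"],
        "additional_context": json.dumps(context),
    }
```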

Real Results: 20 Engineers, 3 AI Skills

We ran the full pipeline for a team of 20 engineers being upskilled across three AI skills before a product pivot to agent-based architecture.

Assessment Pipeline - Q2 2026 AI Upskilling

Assessments Run: 60
Avg Questions Each: 8
Avg Generation Time: 5.2s
Scoring Latency: < 200ms
Gaps Found: 38
Already at Target: 22

Proficiency Distribution by Skill

Not every skill had the same gap profile. The assessments revealed where the team was strong and where they needed the most help.

Team Proficiency - At or Above Target

Prompt Engineering: 75%
RAG & Retrieval: 45%
AI Agent Architecture: 30%

Prompt engineering was a relative strength: 75% of the team already met the bar. RAG had the widest spread, with some engineers strong on basic retrieval but weak on reranking and evaluation. Agent architecture was the biggest gap, which made sense since the team had not built agents before the pivot.

Where RAG Knowledge Drops Off

The most actionable insight from RAG assessments was the facet-level breakdown. We could see exactly which sub-skills tripped people up.

RAG & Retrieval - Facet Accuracy (20 engineers)

1. Basic retrieval concepts: 92%
2. Embedding model selection: 71%
3. Chunking strategies: 58%
4. Hybrid search (dense + sparse): 42%
5. Reranking pipelines: 35%
6. Hallucination mitigation: 31%
7. Evaluation frameworks: 24%

The 16-point drop between "chunking strategies" (58%) and "hybrid search" (42%) told us where most engineers' practical experience ended. Nearly everyone understood retrieval conceptually (92% accuracy on basic concepts), but the gaps lived in the applied skills: combining dense and sparse search, building reranking stages, and setting up evaluation.

This is the kind of signal that generic "take this RAG course" training misses entirely.
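A facet-level breakdown like the one above can be computed from per-question results once each question is tagged with a facet. The sketch below assumes a flat list of (facet, is_correct) pairs collected across learners. That input shape and the function names are hypothetical, not the format Unfold returns.

```python
from collections import defaultdict


def facet_accuracy(results: list) -> dict:
    """Map each facet to its percentage of correct answers."""
    totals = defaultdict(lambda: [0, 0])  # facet -> [correct, answered]
    for facet, is_correct in results:
        totals[facet][0] += int(is_correct)
        totals[facet][1] += 1
    return {facet: 100.0 * c / n for facet, (c, n) in totals.items()}


def weakest_first(accuracy: dict) -> list:
    """Facet names ordered from lowest to highest accuracy."""
    return sorted(accuracy, key=accuracy.get)
```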

Targeted Plans vs Generic: The Difference

We generated 38 learning paths, each focused on the exact sub-skills that engineer was missing. For RAG, an engineer who struggled with hybrid search and reranking got a plan that started there, anchored to the actual knowledge base agent they were about to build. An engineer who only missed evaluation frameworks got a focused 3-step path instead of a full 12-step course.

Personalized paths created in under 10 min: 38
Avg steps (generic vs targeted): 12 -> 4
Time-to-competency vs prior cohort: 2.1x faster
Learner satisfaction ("relevant to my work"): 89%

What Makes This Different

Assessment Quality as a Product Surface

Most assessment tools treat question generation as a side feature. Unfold treats it as the front door to the learning vertical. Every question goes through structural and semantic validation. The generator retries with feedback if validation fails. Quality metrics (pass rates, latency, accuracy) are tracked per-skill in a nightly eval suite.

Stateless by Default, Stateful When You Need It

The assessment tools are stateless in the current release. Your system stores the assessment. Unfold generates, validates, and scores. No data persists on the Unfold side except a 24-hour idempotency cache (so retried calls return the same questions, not different ones).

This keeps data residency concerns to a minimum. Apart from that short-lived cache, your learner data stays in your system. Unfold is a compute layer.

The Loop Closes

The critical difference between "we assessed the learner" and "we moved the learner from low to medium on RAG" is the loop. Assess, create a targeted plan, learner completes the plan, re-assess. Unfold owns all three steps (assessment generation, plan creation, progress tracking), so the loop is coherent. No hand-off between disconnected tools.

Getting Started

For MCP Users

If your AI agent already uses the Unfold MCP server, the new tools are available immediately after updating to v0.4.0:

npx @unfoldit/mcp-server@0.4.0

Your API key needs the assessment:generate, assessment:score, and assessment:read_capabilities scopes. Ask your org admin to enable them in the API Key settings.

For REST API Users

Three new endpoints:

POST /api/v1/ext/assessments/generate -- generate MCQs
POST /api/v1/ext/assessments/score -- score answers
GET /api/v1/ext/assessments/capabilities -- check supported parameters

Authentication uses the same org API key (Bearer unfold_sk_...).
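As a starting point, the generate call might be built like this with Python's standard library. The base URL is a placeholder, and the assumption that the REST body mirrors the MCP tool parameters is not confirmed here, so check the developer docs for the real host and schema.

```python
import json
import urllib.request

BASE_URL = "https://unfold.example.com"  # placeholder host, replace with the real one


def build_generate_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the assessment generation endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/ext/assessments/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request.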

What to Try First

Start with get_assessment_capabilities to see the defaults. Then generate a small assessment (3-5 questions) for a skill your team works with. Score it manually. If the gap detection and goal seed look right, wire it into your onboarding or upskilling pipeline.

Idempotency built in

Every call takes a request_id. If your system retries (network blip, timeout, client crash), the same request_id returns the exact same assessment or score. You never generate two different assessments for the same request.
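The idea behind request_id can be illustrated with a small in-memory cache that replays the first response for a repeated ID until it expires. This is a sketch of the general pattern, not Unfold's implementation. The class name is hypothetical, and the 24-hour default mirrors the cache lifetime mentioned earlier in the post.

```python
import time
from typing import Any, Callable, Optional


class IdempotencyCache:
    def __init__(self, ttl_seconds: float = 24 * 60 * 60):
        self._ttl = ttl_seconds
        self._entries: dict = {}  # request_id -> (expires_at, response)

    def get_or_compute(
        self, request_id: str, compute: Callable[[], Any], now: Optional[float] = None
    ) -> Any:
        current = time.time() if now is None else now
        entry = self._entries.get(request_id)
        if entry is not None and entry[0] > current:
            # A retry inside the window gets the stored response back.
            return entry[1]
        response = compute()
        self._entries[request_id] = (current + self._ttl, response)
        return response
```

On the client side, the practical rule is to reuse the same request_id for every retry of one logical request and to pick a new one only when you want a new assessment.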

What Comes Next

The current release is stateless. Upcoming releases add stored assessment history (per-learner trend tracking), anti-cheat safeguards (one-shot delivery, server-side timers), and a re-assessment scheduler that prompts learners to retake after completing their goal. The data from re-assessments produces "band lift" metrics: proof that the learning path actually moved the learner from low to medium on the skill that mattered.

The assessment tools are available now in @unfoldit/mcp-server@0.4.0 and via the REST API.

Build This With Unfold

Integrate Unfold into your platform using the MCP server or REST API. Create goals, assign them via claim links, and track progress programmatically.


If you want to see how the full pipeline works end to end, from creating an organization to distributing personalized learning paths at scale, read How an Education Portal Built Personalized Training Plans for Every Student Using Unfold. That post covers the goal creation, claim link distribution, and progress tracking pieces that come after assessment.

From Assessment to Action: How Skill Assessments Create Targeted Learning Paths
