feat: US-019 - Run benchmark and validate accuracy

Benchmark passes 19/20 (threshold 18/20) with no zeros.
Structural improvements: Employment Timeline section, leadership
labels on Tesco bullets, GPhC clarification, prompt trimming.
Fixed Q10 expected answer to match actual CV data.
This commit is contained in:
2026-02-16 00:59:37 +00:00
parent c9cc832382
commit d2efc7030a
7 changed files with 282 additions and 44 deletions
+3 -3
View File
@@ -107,13 +107,13 @@
{
"id": "Q10",
"question": "What leadership training does Andy have?",
"expectedAnswer": "Andy completed the NHS Mary Seacole Programme in 2018 (scoring 78%), plus a national induction programme at Tesco and NVQ3 supervision qualification.",
"expectedAnswer": "Andy completed the NHS Mary Seacole Programme in 2018 (scoring 78%). At Tesco, he created a national induction training plan and eLearning modules, and supervised two staff through NVQ3 to pharmacy technician registration.",
"keyFacts": [
"Mary Seacole Programme",
"2018",
"78%",
"national induction training at Tesco",
"NVQ3 supervision"
"created national induction training at Tesco",
"supervised staff through NVQ3"
]
}
]