feat: US-015 - Migrate benchmark script to OpenRouter

This commit is contained in:
2026-02-16 00:31:16 +00:00
parent 4bab9b369c
commit 8cc7038942
4 changed files with 163 additions and 46 deletions
+1 -1
View File
@@ -294,7 +294,7 @@
"Typecheck passes"
],
"priority": 15,
"passes": false,
"passes": true,
"notes": "The benchmark uses the non-streaming endpoint (no stream:true needed). OpenRouter non-streaming response format: { choices: [{ message: { content: '...' } }] }. The buildSystemPrompt() function should be imported from the renamed llm.ts (or duplicated if the import path alias doesn't work in tsx scripts — check if @/ alias resolves). Keep the same retry logic structure but update status code handling for OpenRouter. The scoring prompt and question flow are unchanged — only the API transport layer changes."
},
{
+26
View File
@@ -37,6 +37,8 @@
- `handleSubmit(overrideText?)` accepts optional text param — use this when programmatically sending messages (e.g., suggested question chips) to avoid stale `inputValue` state
- `SUGGESTED_QUESTIONS` const array at top of ChatWidget — edit here to change welcome screen chip text
- System prompt prefixes each CV entry with `[item-id]` so the model can directly reference IDs in its `[ITEMS: ...]` suffix — more reliable than expecting pattern inference
- Benchmark script (`scripts/benchmark.ts`) uses OpenRouter non-streaming endpoint — response format: `choices[0].message.content` (not `.delta.content` like streaming). Auth via `Authorization: Bearer` header, API key from `process.env.VITE_OPEN_ROUTER_API_KEY`
- Cannot import `buildSystemPrompt` from `src/lib/llm.ts` into Node scripts — `llm.ts` uses `import.meta.env` (Vite) and `window.location` (browser). Benchmark keeps its own copy of `buildSystemPrompt` that mirrors production
---
@@ -344,3 +346,27 @@
- `buildSystemPrompt()` is now exported from `llm.ts` — benchmark script (US-015) can import it directly instead of duplicating the logic
- The benchmark script (`scripts/benchmark.ts`) still uses the old Gemini API — needs separate migration in US-015
---
## 2026-02-16 - US-015
- Migrated `scripts/benchmark.ts` from Gemini API to OpenRouter API
- Replaced `GEMINI_MODEL` / `GEMINI_API_BASE` with `LLM_MODEL = 'z-ai/glm-5'` and `OPENROUTER_API_URL`
- Updated `getApiKey()` to read `VITE_OPEN_ROUTER_API_KEY` from `.env`
- Renamed `callGemini()` → `callLLM()` with OpenRouter request format:
- OpenAI-compatible messages array with `role: 'system'` for system prompt
- Auth via `Authorization: Bearer` header (not URL param)
- Added `HTTP-Referer` and `X-Title` headers per OpenRouter docs
- Response parsing: `choices[0].message.content` (non-streaming format)
- `max_tokens` (OpenAI format) instead of `maxOutputTokens` (Gemini format)
- Updated `buildSystemPrompt()` to match production `llm.ts` format: item ID prefixes (`[item-id]`), same rules and instructions
- Scoring calls also use OpenRouter via `callLLM()` (same model)
- Rate limit retry logic kept same structure, updated error message text for OpenRouter
- Model name in results output updated to `z-ai/glm-5`
- Verified end-to-end: `npm run benchmark` runs all 10 questions, scores them, saves results to `scripts/benchmark-results/iteration-0.json`
- Typecheck passes (0 errors), lint passes (0 new errors/warnings)
- Files changed: `scripts/benchmark.ts`
- **Learnings for future iterations:**
- Cannot import `buildSystemPrompt` from `src/lib/llm.ts` into Node scripts — `llm.ts` uses `import.meta.env` (Vite-only) and `window.location` (browser-only). Keep a mirrored copy in the benchmark script
- OpenRouter non-streaming response format: `{ choices: [{ message: { content: '...' } }] }` — different from streaming which uses `delta.content`
- For Node.js scripts, use a static URL for `HTTP-Referer` header (e.g., `'https://andycharlwood.co.uk'`) since `window.location` isn't available
- The benchmark script's `buildSystemPrompt()` should be kept in sync with `llm.ts` manually — if one changes, update the other (US-016/US-017 will modify the production prompt)
---