feat: US-013 - Self-host ONNX embedding model

Download all-MiniLM-L6-v2 model files to public/models/ and configure @xenova/transformers to load from local path instead of Hugging Face CDN. Eliminates external dependency for semantic search embedding model.
2026-02-15 20:59:03 +00:00
parent 9e9dd1ae4b
commit 667e5b249c
8 changed files with 30785 additions and 4 deletions
@@ -232,7 +232,7 @@
        "Verify in browser using dev-browser skill"
      ],
      "priority": 12,
-      "passes": false,
+      "passes": true,
      "notes": "Replace the current empty-state text ('Ask me anything about Andy's experience, skills, or projects.') with the new welcome bubble + chips. The chips should call handleSubmit (or equivalent) with the chip text pre-filled — simplest approach is setInputValue(chipText) then immediately trigger submit. Check that the welcome state reappears if the user hasn't sent a message (messages.length === 0). The suggested questions could live in a const array at the top of ChatWidget for easy future editing."
    },
    {
@@ -250,7 +250,7 @@
        "Typecheck passes"
      ],
      "priority": 13,
-      "passes": false,
+      "passes": true,
      "notes": "Transformers.js uses env.localModelPath or env.remoteHost to control where models are fetched from. Setting env.localModelPath = '/models/' should make it look for files at /models/Xenova/all-MiniLM-L6-v2/onnx/model_quantized.onnx etc. The Vite public/ directory serves files at the root — so public/models/ becomes /models/ at runtime. For the build script (Node.js), use a file:// path or the local filesystem path instead. Download model files from https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main — the quantized ONNX model is ~23MB. Check what files the pipeline actually requests by watching network tab before making this change."
    },
    {
@@ -12,6 +12,7 @@
 - `src/data/embeddings.json` is an array of `{ id: string, embedding: number[] }` — 42 items, 384-d vectors, IDs match PaletteItem IDs. Vite imports JSON natively.
 - `src/lib/embedding-model.ts` exports `initModel()`, `embedQuery(text)`, `isModelReady()` — check `isModelReady()` before calling `embedQuery()`
 - `initModel()` is called fire-and-forget in `App.tsx` on mount — model loads during boot/ECG/login phases
+- ONNX model files are self-hosted in `public/models/Xenova/all-MiniLM-L6-v2/`; setting `env.localModelPath = '/models/'`, `env.allowRemoteModels = false`, and `env.useBrowserCache = false` together removes the HF CDN dependency
 - `src/lib/semantic-search.ts` exports `semanticSearch(queryEmbedding, embeddings, threshold?)` and `loadEmbeddings()`; embeddings are normalized, so the general cosine formula dot(a,b)/(mag(a)*mag(b)) reduces to dot(a,b) when the query vector is also unit-length
 - CommandPalette uses `semanticResults` state + debounced `useEffect` for async semantic search, falling back to Fuse.js when `isModelReady()` returns false or on any error
 - `loadEmbeddings()` and `paletteMap` (Map<id, PaletteItem>) are precomputed via `useMemo` — no re-computation on each search
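
The note on normalized embeddings can be illustrated with a minimal sketch. The helper names below (`dot`, `mag`, `normalize`, `cosineSimilarity`) are hypothetical and are not taken from `src/lib/semantic-search.ts`; they only show that for unit-length vectors the cosine formula gives the same value as the plain dot product.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Euclidean magnitude (L2 norm).
function mag(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

// General cosine similarity; valid for vectors of any non-zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (mag(a) * mag(b));
}

// Scale a non-zero vector to unit length.
function normalize(a: number[]): number[] {
  const m = mag(a);
  if (m === 0) throw new Error("cannot normalize a zero vector");
  return a.map((x) => x / m);
}

// For unit vectors, mag(u) * mag(v) is 1, so the division is redundant.
const u = normalize([3, 4]);
const v = normalize([4, 3]);
const viaDotOnly = dot(u, v);
const viaFullFormula = cosineSimilarity([3, 4], [4, 3]);
console.log(Math.abs(viaDotOnly - viaFullFormula) < 1e-12);
```

If either side is not guaranteed to be unit-length, the full formula is the safer choice; it costs two extra magnitude computations per comparison.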
@@ -31,6 +32,8 @@
 - TopBar is `z-index: 100` (fixed), nav is `z-index: 99` (sticky) — mobile full-screen overlays need `z-index > 100` to appear above them
 - Inline `style={{ display: 'flex' }}` overrides Tailwind's `hidden` class — use `!important` modifier (`max-md:!hidden`) or move display to Tailwind classes to allow responsive hiding
 - ChatWidget mobile breakpoint is `md` (768px) — below this, panel is full-screen; above, it's 380px anchored bottom-right
+- `handleSubmit(overrideText?)` accepts optional text param — use this when programmatically sending messages (e.g., suggested question chips) to avoid stale `inputValue` state
+- `SUGGESTED_QUESTIONS` const array at top of ChatWidget — edit here to change welcome screen chip text

 ---

@@ -250,3 +253,46 @@
  - `100dvh` (dynamic viewport height) is essential for mobile full-screen panels — it accounts for browser chrome (address bar, toolbar) unlike `100vh`
  - The `[data-chat-panel]` CSS selector in the `<style>` block is the right place for responsive size rules since Tailwind can't conditionally set max-height based on viewport width
 ---
+
+## 2026-02-15 - US-012
+- Replaced empty-state centered text with welcome bubble + suggested question chips
+- Welcome bubble styled as assistant message (left-aligned, `var(--bg-dashboard)` bg, `var(--border-light)` border)
+- Added `SUGGESTED_QUESTIONS` const array at module top for easy future editing
+- Three chips: "What's his NHS experience?", "Tell me about his data skills", "What projects has he built?"
+- Chips styled: rounded-full, teal accent border, teal hover tint, `font-ui` 12.5px
+- Clicking a chip calls `handleSubmit(questionText)` — same codepath as typing + Enter
+- Refactored `handleSubmit` to accept optional `overrideText` parameter (avoids stale state issue with `setInputValue` + immediate submit)
+- Wrapped send button `onClick` in arrow function to prevent passing MouseEvent as text argument
+- Welcome/chips visible when `messages.length === 0`, replaced by conversation once any message is sent
+- Typecheck passes (0 errors), lint passes (0 new errors/warnings)
+- Browser verified: welcome bubble displays correctly, chips render, clicking chip sends message and replaces welcome state
+- Files changed: `src/components/ChatWidget.tsx`
+- **Learnings for future iterations:**
+  - When refactoring a callback to accept optional parameters, wrap `onClick={handler}` as `onClick={() => handler()}` to prevent React from passing the SyntheticEvent as the first argument
+  - `SUGGESTED_QUESTIONS` as a module-level const is the simplest approach — easily editable, no data file needed for 3 items
+  - The `handleSubmit(overrideText?)` pattern avoids the stale-state problem: `setInputValue(text)` followed by an immediate `handleSubmit()` would read the old `inputValue`, because the state update only takes effect on the next render and the current closure still holds the previous value
+---
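
The `overrideText` pattern from the US-012 learnings can be sketched without React. This is an illustration under stated assumptions, not the code in `ChatWidget.tsx`: `makeHandleSubmit` and `send` are hypothetical names, and the factory argument stands in for the `inputValue` snapshot that a single render closes over.

```typescript
type Send = (text: string) => void;

// Builds a submit handler over a fixed snapshot of inputValue, mimicking how
// a handler created during one render only sees that render's state.
function makeHandleSubmit(inputValue: string, send: Send) {
  return (overrideText?: string): void => {
    // Prefer the explicit argument; fall back to the snapshot when omitted.
    const text = (overrideText ?? inputValue).trim();
    if (text.length === 0) return; // nothing to send
    send(text);
  };
}

// Usage: the input is still empty in this "render", yet a chip click sends
// its own text because it does not depend on a pending state update.
const sent: string[] = [];
const handleSubmit = makeHandleSubmit("", (t) => {
  sent.push(t);
});
handleSubmit("What projects has he built?");
handleSubmit(); // empty snapshot, so nothing is sent
console.log(sent);
```

In a component, the send button would call the handler through `onClick={() => handleSubmit()}` so that the click event is never received as `overrideText`.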
+
+## 2026-02-15 - US-013
+- Downloaded all-MiniLM-L6-v2 model files to `public/models/Xenova/all-MiniLM-L6-v2/`:
+  - `config.json`, `tokenizer.json`, `tokenizer_config.json`, `onnx/model_quantized.onnx` (~22MB)
+- Updated `src/lib/embedding-model.ts`:
+  - `env.localModelPath = '/models/'` — Vite serves `public/` at root
+  - `env.allowRemoteModels = false` — prevents any HF CDN fallback
+  - `env.useBrowserCache = false` — prevents stale Cache API entries from interfering
+- Updated `scripts/generate-embeddings.ts`:
+  - `env.localModelPath = resolve(import.meta.dirname, '..', 'public', 'models')` — absolute path for Node.js
+  - `env.allowRemoteModels = false`
+- Model files committed as static assets (not in .gitignore)
+- Browser verified: all 4 model files fetched from `localhost:5173/models/` with 200 OK, zero `huggingface.co` requests
+- Semantic search verified working: "data analysis" returns multi-category results (Core Skills, Active Projects, Achievements)
+- Build script (`npm run generate-embeddings`) still works with local model files
+- Typecheck passes (0 errors), lint passes (0 new errors/warnings)
+- Files changed: `src/lib/embedding-model.ts`, `scripts/generate-embeddings.ts`, `public/models/Xenova/all-MiniLM-L6-v2/` (new directory with 4 files)
+- **Learnings for future iterations:**
+  - `@xenova/transformers` env configuration: `env.localModelPath` sets the base path, `env.allowRemoteModels = false` prevents CDN fallback, `env.useBrowserCache = false` bypasses the browser Cache API
+  - The library constructs paths as `{localModelPath}/{modelId}/{filename}`, e.g. `/models/` + `Xenova/all-MiniLM-L6-v2` + `/onnx/model_quantized.onnx`
+  - The browser Cache API can retain stale entries from previous HF CDN loads; setting `useBrowserCache = false` forces fresh fetches from the configured local path
+  - For Node.js scripts, use an absolute filesystem path for `localModelPath` (not a URL)
+  - The quantized ONNX model (`model_quantized.onnx`) is ~22MB — acceptable for a static asset since it's cached after first load
+---
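
The `{localModelPath}/{modelId}/{filename}` shape noted in the US-013 learnings can be sketched as a small helper. `resolveModelFileUrl` and `MODEL_FILES` are hypothetical names introduced for illustration; this mirrors the path shape described in the log rather than the library's internal implementation, and could serve as a pre-flight check that the four expected files are served.

```typescript
// Joins the three path parts with single slashes, keeping a leading slash
// when the base path has one (as with '/models/' or an absolute Node path).
function resolveModelFileUrl(
  localModelPath: string,
  modelId: string,
  filename: string
): string {
  const joined = [localModelPath, modelId, filename]
    .map((part) => part.replace(/^\/+|\/+$/g, "")) // trim edge slashes
    .filter((part) => part.length > 0)
    .join("/");
  return localModelPath.startsWith("/") ? `/${joined}` : joined;
}

// The four files listed in the US-013 entry.
const MODEL_FILES = [
  "config.json",
  "tokenizer.json",
  "tokenizer_config.json",
  "onnx/model_quantized.onnx",
];

const urls = MODEL_FILES.map((file) =>
  resolveModelFileUrl("/models/", "Xenova/all-MiniLM-L6-v2", file)
);
console.log(urls);
```

Each resulting URL could then be requested in the browser (or checked on disk in the Node.js build script) before the pipeline is initialized, so a missing file is reported early rather than surfacing as a failed model load with remote fallback disabled.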