From 667e5b249c3f22f8ab456625ee9dc0708f65fa2f Mon Sep 17 00:00:00 2001
From: Andy Charlwood
Date: Sun, 15 Feb 2026 20:59:03 +0000
Subject: [PATCH] feat: US-013 - Self-host ONNX embedding model

Download all-MiniLM-L6-v2 model files to public/models/ and configure
@xenova/transformers to load from local path instead of Hugging Face CDN.
Eliminates external dependency for semantic search embedding model.
---
 Ralph/prd.json                                |     4 +-
 Ralph/progress.txt                            |    46 +
 .../Xenova/all-MiniLM-L6-v2/config.json       |    25 +
 .../onnx/model_quantized.onnx                 | Bin 0 -> 22972370 bytes
 .../Xenova/all-MiniLM-L6-v2/tokenizer.json    | 30686 ++++++++++++++++
 .../all-MiniLM-L6-v2/tokenizer_config.json    |    15 +
 scripts/generate-embeddings.ts                |     6 +-
 src/lib/embedding-model.ts                    |     7 +-
 8 files changed, 30785 insertions(+), 4 deletions(-)
 create mode 100644 public/models/Xenova/all-MiniLM-L6-v2/config.json
 create mode 100644 public/models/Xenova/all-MiniLM-L6-v2/onnx/model_quantized.onnx
 create mode 100644 public/models/Xenova/all-MiniLM-L6-v2/tokenizer.json
 create mode 100644 public/models/Xenova/all-MiniLM-L6-v2/tokenizer_config.json

diff --git a/Ralph/prd.json b/Ralph/prd.json
index 6775888..c9db4fe 100644
--- a/Ralph/prd.json
+++ b/Ralph/prd.json
@@ -232,7 +232,7 @@
  "Verify in browser using dev-browser skill"
  ],
  "priority": 12,
- "passes": false,
+ "passes": true,
  "notes": "Replace the current empty-state text ('Ask me anything about Andy's experience, skills, or projects.') with the new welcome bubble + chips. The chips should call handleSubmit (or equivalent) with the chip text pre-filled — simplest approach is setInputValue(chipText) then immediately trigger submit. Check that the welcome state reappears if the user hasn't sent a message (messages.length === 0). The suggested questions could live in a const array at the top of ChatWidget for easy future editing."
  },
  {
@@ -250,7 +250,7 @@
  "Typecheck passes"
  ],
  "priority": 13,
- "passes": false,
+ "passes": true,
  "notes": "Transformers.js uses env.localModelPath or env.remoteHost to control where models are fetched from. Setting env.localModelPath = '/models/' should make it look for files at /models/Xenova/all-MiniLM-L6-v2/onnx/model_quantized.onnx etc. The Vite public/ directory serves files at the root — so public/models/ becomes /models/ at runtime. For the build script (Node.js), use a file:// path or the local filesystem path instead. Download model files from https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main — the quantized ONNX model is ~23MB. Check what files the pipeline actually requests by watching network tab before making this change."
  },
  {
diff --git a/Ralph/progress.txt b/Ralph/progress.txt
index 1d107a6..9796f23 100644
--- a/Ralph/progress.txt
+++ b/Ralph/progress.txt
@@ -12,6 +12,7 @@
 - `src/data/embeddings.json` is an array of `{ id: string, embedding: number[] }` — 42 items, 384-d vectors, IDs match PaletteItem IDs. Vite imports JSON natively.
 - `src/lib/embedding-model.ts` exports `initModel()`, `embedQuery(text)`, `isModelReady()` — check `isModelReady()` before calling `embedQuery()`
 - `initModel()` is called fire-and-forget in `App.tsx` on mount — model loads during boot/ECG/login phases
+- ONNX model files self-hosted in `public/models/Xenova/all-MiniLM-L6-v2/` — `env.localModelPath = '/models/'`, `env.allowRemoteModels = false`, `env.useBrowserCache = false` eliminates HF CDN dependency
 - `src/lib/semantic-search.ts` exports `semanticSearch(queryEmbedding, embeddings, threshold?)` and `loadEmbeddings()` — embeddings are normalized so cosine similarity is dot(a,b)/(mag(a)*mag(b))
 - CommandPalette uses `semanticResults` state + debounced `useEffect` for async semantic search, falling back to Fuse.js when `isModelReady()` returns false or on any error
 - `loadEmbeddings()` and `paletteMap` (Map) are precomputed via `useMemo` — no re-computation on each search
@@ -31,6 +32,8 @@
 - TopBar is `z-index: 100` (fixed), nav is `z-index: 99` (sticky) — mobile full-screen overlays need `z-index > 100` to appear above them
 - Inline `style={{ display: 'flex' }}` overrides Tailwind's `hidden` class — use `!important` modifier (`max-md:!hidden`) or move display to Tailwind classes to allow responsive hiding
 - ChatWidget mobile breakpoint is `md` (768px) — below this, panel is full-screen; above, it's 380px anchored bottom-right
+- `handleSubmit(overrideText?)` accepts optional text param — use this when programmatically sending messages (e.g., suggested question chips) to avoid stale `inputValue` state
+- `SUGGESTED_QUESTIONS` const array at top of ChatWidget — edit here to change welcome screen chip text
 
 ---
 
@@ -250,3 +253,46 @@
 - `100dvh` (dynamic viewport height) is essential for mobile full-screen panels — it accounts for browser chrome (address bar, toolbar) unlike `100vh`
 - The `[data-chat-panel]` CSS selector in the `