# PRD: Chat Widget Polish & Model Updates ## Introduction The semantic search and AI chat features are functionally complete (US-001 through US-010). This PRD covers four polish items: mobile full-screen chat experience, a welcome message with suggested questions, self-hosting the ONNX embedding model, and updating from Gemini 2.0 Flash to Gemini 3 Flash Preview. ## Goals - Full-screen chat on mobile (<768px) for a better small-screen experience - Welcome message with suggested question chips to reduce blank-state friction - Self-host the ONNX model (`all-MiniLM-L6-v2`) to eliminate dependency on Hugging Face CDN - Update Gemini model to `gemini-3-flash-preview` and show which model powers the chat - Refresh system prompt while updating the model ## User Stories ### US-011: Mobile full-screen chat panel **Description:** As a mobile visitor, I want the chat panel to be a full-screen overlay so it's easy to use on small screens. **Acceptance Criteria:** - [ ] Below `md` breakpoint (768px), chat panel renders as full-screen overlay (100vw x 100vh, or using `dvh` for mobile browser chrome) - [ ] Full-screen mode has a visible header with close button - [ ] Floating chat button is hidden while panel is open on mobile - [ ] Above 768px, existing panel behavior unchanged (380px wide, anchored bottom-right) - [ ] Smooth transition between open/closed states respects `prefers-reduced-motion` - [ ] Typecheck passes - [ ] Verify in browser using dev-browser skill ### US-012: Welcome message with suggested questions **Description:** As a visitor opening the chat for the first time, I see a friendly welcome and clickable suggested questions so I know what to ask. **Acceptance Criteria:** - [ ] When chat panel opens and conversation is empty, display welcome message: "Hey! I'm here to help you learn more about Andy. What would you like to know?" - [ ] Below the welcome message, show 2-3 clickable pill/chip buttons with suggested questions (e.g., "What's his NHS experience?", "Tell me about his data skills", "What projects has he built?") - [ ] Clicking a suggested question sends it as a user message (same as typing and pressing Enter) - [ ] Welcome message and chips are always visible when conversation is empty (persist across open/close if no messages sent) - [ ] Once a message is sent, the welcome/chips area is replaced by the conversation - [ ] Chips use design system tokens (teal accent border, hover state) - [ ] Typecheck passes - [ ] Verify in browser using dev-browser skill ### US-013: Self-host ONNX embedding model **Description:** As a developer, I want the ONNX model files served from the same host as the site, so there's no runtime dependency on Hugging Face CDN. **Acceptance Criteria:** - [ ] Model files for `all-MiniLM-L6-v2` downloaded and placed in `public/models/all-MiniLM-L6-v2/` (or `public/models/onnx/` — whichever is cleaner) - [ ] Files include at minimum: `onnx/model_quantized.onnx`, `tokenizer.json`, `tokenizer_config.json`, `config.json` - [ ] `src/lib/embedding-model.ts` updated to load from local path instead of Hugging Face CDN - [ ] Build-time embedding script (`scripts/generate-embeddings.ts`) also uses local model path - [ ] `.gitignore` does NOT ignore the model files — they are committed as static assets - [ ] Verify model loads correctly in browser (semantic search still works in command palette) - [ ] Typecheck passes ### US-014: Update to Gemini 3 Flash Preview + model indicator **Description:** As a developer, I want to use the latest free Gemini model, and as a visitor, I want to see what model powers the chat. **Acceptance Criteria:** - [ ] `GEMINI_API_BASE` in `src/lib/gemini.ts` updated from `gemini-2.0-flash` to `gemini-3-flash-preview` - [ ] Review and update the system prompt for clarity (ensure it's well-structured for the new model) - [ ] Review and update the response format instructions (the `[ITEMS: ...]` suffix pattern) - [ ] Small text indicator in chat panel header or footer showing the model name (e.g., "Gemini 3 Flash" in `font-geist`, 11px, tertiary color) - [ ] If the model string needs to change in future, it should be a single constant — not hardcoded in multiple places - [ ] Typecheck passes - [ ] Verify in browser using dev-browser skill ## Functional Requirements - FR-1: Chat panel below 768px uses full-screen overlay layout (`position: fixed; inset: 0`) - FR-2: Chat button hidden when full-screen panel is open on mobile - FR-3: Welcome message and suggested question chips shown when conversation is empty - FR-4: Clicking a suggested question chip triggers the same flow as manually typing and sending - FR-5: ONNX model files served from `public/models/` as static assets - FR-6: `embedding-model.ts` configures Transformers.js to use local model path - FR-7: Gemini API calls use `gemini-3-flash-preview` model - FR-8: Chat UI displays model name indicator ## Non-Goals - No changes to the command palette UI or semantic search ranking logic - No persistent chat history across page loads - No rate limiting or abuse prevention - No changes to the boot/ECG/login flow - No model fine-tuning or custom training ## Design Considerations ### Mobile Full-Screen Chat - Full viewport with safe area insets (`env(safe-area-inset-*)`) for notched devices - Header matches existing panel header style but full-width - Input pinned to bottom, messages scroll above ### Welcome Message & Chips - Welcome text styled as an AI message bubble (left-aligned, light background) - Chips: small rounded pills with teal border, teal text on hover, `font-ui` 12-13px - 2-3 chips arranged in a flex-wrap row below the welcome bubble - Example questions: "What's his NHS experience?", "Tell me about his data skills", "What projects has he built?" ### Model Indicator - Placed in the chat panel header, right-aligned or below the "Ask about Andy" title - `font-geist`, 11px, `var(--text-tertiary)` color - Format: "Powered by Gemini 3 Flash" or just "Gemini 3 Flash" ## Technical Considerations ### Self-Hosting ONNX Model - Transformers.js supports a `localURL` or custom `env.localModelPath` configuration to redirect model loading from HF CDN to a local path - The quantized model (`model_quantized.onnx`) is ~23MB — acceptable for a static deploy - Files must be served with correct MIME types (`.onnx` as `application/octet-stream`) - The build-time script and browser runtime must both point to the same model files ### Gemini Model Update - `gemini-3-flash-preview` may have a different API path structure — verify against the Generative Language API docs - The streaming SSE format should be identical across Flash models, but verify the response shape ## Success Metrics - Mobile chat is comfortable to use on a phone-sized viewport (no overflow, no cropping) - Suggested questions reduce "blank screen" hesitation — visitors engage faster - ONNX model loads successfully from local path (no HF CDN requests in network tab) - Chat responses come through on the new Gemini model with correct item references ## Open Questions - Should the suggested question chips be configurable from a data file, or hardcoded in the component? - Does `gemini-3-flash-preview` require a different API version path (`v1beta` vs `v1`)?