portfolio/tasks/prd-chat-polish.md
2026-02-15 23:20:24 +00:00


PRD: Chat Widget Polish & Model Updates

Introduction

The semantic search and AI chat features are functionally complete (US-001 through US-010). This PRD covers four polish items: a full-screen chat experience on mobile, a welcome message with suggested questions, self-hosting the ONNX embedding model, and moving from Gemini 2.0 Flash to Gemini 3 Flash Preview (with a system prompt refresh as part of that update).

Goals

  • Full-screen chat on mobile (<768px) for a better small-screen experience
  • Welcome message with suggested question chips to reduce blank-state friction
  • Self-host the ONNX model (all-MiniLM-L6-v2) to eliminate dependency on Hugging Face CDN
  • Update Gemini model to gemini-3-flash-preview and show which model powers the chat
  • Refresh system prompt while updating the model

User Stories

US-011: Mobile full-screen chat panel

Description: As a mobile visitor, I want the chat panel to be a full-screen overlay so it's easy to use on small screens.

Acceptance Criteria:

  • Below the md breakpoint (768px), the chat panel renders as a full-screen overlay (100vw x 100vh, or 100dvh so the height accounts for mobile browser chrome)
  • Full-screen mode has a visible header with close button
  • Floating chat button is hidden while panel is open on mobile
  • Above 768px, existing panel behavior unchanged (380px wide, anchored bottom-right)
  • Smooth transition between open/closed states respects prefers-reduced-motion
  • Typecheck passes
  • Verify in browser using dev-browser skill
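
The breakpoint rules above can be sketched as a small pure function, which keeps the layout decision easy to typecheck and test apart from the component. This is a minimal sketch, not the existing implementation: resolveChatLayout, ChatLayout, and the field names are hypothetical, and treating exactly 768px as desktop is an assumption based on md being a min-width breakpoint.

```typescript
// Hypothetical helper; names and return shape are illustrative, not from the codebase.
const MOBILE_BREAKPOINT_PX = 768; // md breakpoint from the acceptance criteria

type ChatLayout = {
  mode: "fullscreen" | "docked";
  // True only when the mobile rule requires hiding the floating chat button.
  forceHideFloatingButton: boolean;
  // False when the visitor has asked for reduced motion.
  animate: boolean;
};

function resolveChatLayout(
  viewportWidth: number,
  panelOpen: boolean,
  prefersReducedMotion: boolean,
): ChatLayout {
  // Assumption: widths strictly below 768px count as mobile.
  const isMobile = viewportWidth < MOBILE_BREAKPOINT_PX;
  return {
    mode: isMobile ? "fullscreen" : "docked",
    // On desktop the existing button behavior is left alone.
    forceHideFloatingButton: isMobile && panelOpen,
    animate: !prefersReducedMotion,
  };
}
```

In the browser, the inputs could come from window.innerWidth (or a matchMedia listener for the breakpoint) and window.matchMedia("(prefers-reduced-motion: reduce)").matches.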

US-012: Welcome message with suggested questions

Description: As a visitor opening the chat for the first time, I want to see a friendly welcome and clickable suggested questions so I know what to ask.

Acceptance Criteria:

  • When chat panel opens and conversation is empty, display welcome message: "Hey! I'm here to help you learn more about Andy. What would you like to know?"
  • Below the welcome message, show 2-3 clickable pill/chip buttons with suggested questions (e.g., "What's his NHS experience?", "Tell me about his data skills", "What projects has he built?")
  • Clicking a suggested question sends it as a user message (same as typing and pressing Enter)
  • Welcome message and chips are always visible when conversation is empty (persist across open/close if no messages sent)
  • Once a message is sent, the welcome/chips area is replaced by the conversation
  • Chips use design system tokens (teal accent border, hover state)
  • Typecheck passes
  • Verify in browser using dev-browser skill
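
The empty-state rules above reduce to one derived condition plus a click handler that reuses the normal send path. The sketch below assumes a simple message shape; ChatMessage, shouldShowWelcome, and handleChipClick are hypothetical names, and sendMessage stands in for whatever function the input box already calls on Enter.

```typescript
// Assumed message shape; the real chat state may differ.
type ChatMessage = { role: "user" | "assistant"; text: string };

const WELCOME_TEXT =
  "Hey! I'm here to help you learn more about Andy. What would you like to know?";

// Example questions taken from the acceptance criteria.
const SUGGESTED_QUESTIONS = [
  "What's his NHS experience?",
  "Tell me about his data skills",
  "What projects has he built?",
] as const;

// Derived from the conversation rather than stored, so the welcome area
// persists across open/close for as long as no message has been sent.
function shouldShowWelcome(messages: readonly ChatMessage[]): boolean {
  return messages.length === 0;
}

// A chip click goes through the same send function as typed input.
function handleChipClick(
  question: string,
  sendMessage: (text: string) => void,
): void {
  sendMessage(question);
}
```

Deriving visibility from messages.length avoids a separate "welcome dismissed" flag that could drift out of sync with the conversation.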

US-013: Self-host ONNX embedding model

Description: As a developer, I want the ONNX model files served from the same host as the site, so there's no runtime dependency on Hugging Face CDN.

Acceptance Criteria:

  • Model files for all-MiniLM-L6-v2 downloaded and placed in public/models/all-MiniLM-L6-v2/ (or public/models/onnx/ — whichever is cleaner)
  • Files include at minimum: onnx/model_quantized.onnx, tokenizer.json, tokenizer_config.json, config.json
  • src/lib/embedding-model.ts updated to load from local path instead of Hugging Face CDN
  • Build-time embedding script (scripts/generate-embeddings.ts) also uses local model path
  • .gitignore does NOT ignore the model files — they are committed as static assets
  • Verify model loads correctly in browser (semantic search still works in command palette)
  • Typecheck passes

US-014: Update to Gemini 3 Flash Preview + model indicator

Description: As a developer, I want to use the latest free Gemini model, and as a visitor, I want to see what model powers the chat.

Acceptance Criteria:

  • GEMINI_API_BASE in src/lib/gemini.ts updated from gemini-2.0-flash to gemini-3-flash-preview
  • Review and update the system prompt for clarity (ensure it's well-structured for the new model)
  • Review and update the response format instructions (the [ITEMS: ...] suffix pattern)
  • Small text indicator in chat panel header or footer showing the model name (e.g., "Gemini 3 Flash" in font-geist, 11px, tertiary color)
  • If the model string needs to change in future, it should be a single constant — not hardcoded in multiple places
  • Typecheck passes
  • Verify in browser using dev-browser skill
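
One way to meet the single-constant criterion is to derive the UI label from the model ID, so the indicator cannot drift from the model actually in use. This is a sketch under assumptions: GEMINI_MODEL and modelDisplayName are hypothetical names, and dropping the "preview" suffix from the label is a choice made here to match the "Gemini 3 Flash" example.

```typescript
// Single source of truth for the model ID (hypothetical constant name).
const GEMINI_MODEL = "gemini-3-flash-preview";

// Turns a hyphenated model ID into a short label for the chat header,
// capitalizing each part and omitting the "preview" suffix.
function modelDisplayName(modelId: string): string {
  return modelId
    .split("-")
    .filter((part) => part !== "preview")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join(" ");
}

const MODEL_LABEL = modelDisplayName(GEMINI_MODEL);
```

If a future model ID does not map cleanly to a readable label, a hand-written label constant defined next to GEMINI_MODEL would serve the same purpose.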

Functional Requirements

  • FR-1: Chat panel below 768px uses full-screen overlay layout (position: fixed; inset: 0)
  • FR-2: Chat button hidden when full-screen panel is open on mobile
  • FR-3: Welcome message and suggested question chips shown when conversation is empty
  • FR-4: Clicking a suggested question chip triggers the same flow as manually typing and sending
  • FR-5: ONNX model files served from public/models/ as static assets
  • FR-6: embedding-model.ts configures Transformers.js to use local model path
  • FR-7: Gemini API calls use gemini-3-flash-preview model
  • FR-8: Chat UI displays model name indicator

Non-Goals

  • No changes to the command palette UI or semantic search ranking logic
  • No persistent chat history across page loads
  • No rate limiting or abuse prevention
  • No changes to the boot/ECG/login flow
  • No model fine-tuning or custom training

Design Considerations

Mobile Full-Screen Chat

  • Full viewport with safe area insets (env(safe-area-inset-*)) for notched devices
  • Header matches existing panel header style but full-width
  • Input pinned to bottom, messages scroll above

Welcome Message & Chips

  • Welcome text styled as an AI message bubble (left-aligned, light background)
  • Chips: small rounded pills with teal border, teal text on hover, font-ui 12-13px
  • 2-3 chips arranged in a flex-wrap row below the welcome bubble
  • Example questions: "What's his NHS experience?", "Tell me about his data skills", "What projects has he built?"

Model Indicator

  • Placed in the chat panel header, right-aligned or below the "Ask about Andy" title
  • font-geist, 11px, var(--text-tertiary) color
  • Format: "Powered by Gemini 3 Flash" or just "Gemini 3 Flash"

Technical Considerations

Self-Hosting ONNX Model

  • Transformers.js exposes env.localModelPath, together with env.allowLocalModels and env.allowRemoteModels, to redirect model loading from the HF CDN to a local path
  • The quantized model (model_quantized.onnx) is ~23MB — acceptable for a static deploy
  • Files must be served with correct MIME types (.onnx as application/octet-stream)
  • The build-time script and browser runtime must both point to the same model files
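
The configuration described above might look like the sketch below. It takes the env object as a parameter so the helper can be shared by the browser code and the build script; in real use, env would be the object exported by the Transformers.js package the project already depends on. ModelEnv and configureLocalModels are hypothetical names, and the "/models/" default is an assumption based on the public/models/ location in FR-5.

```typescript
// Minimal structural type covering only the Transformers.js `env` fields set here.
interface ModelEnv {
  allowLocalModels: boolean;
  allowRemoteModels: boolean;
  localModelPath: string;
}

// Points model loading at self-hosted files and turns off the remote fallback,
// so a missing local file fails loudly instead of silently hitting the CDN.
function configureLocalModels(
  env: ModelEnv,
  localModelPath: string = "/models/", // assumed URL for public/models/
): void {
  env.allowLocalModels = true;
  env.allowRemoteModels = false;
  env.localModelPath = localModelPath;
}
```

The build-time script runs outside the browser, so it would likely pass a filesystem path (for example "./public/models/") instead of a URL. The model ID passed to the pipeline also has to match the directory name under that path, which is worth verifying when choosing between the two candidate locations.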

Gemini Model Update

  • gemini-3-flash-preview may have a different API path structure — verify against the Generative Language API docs
  • The streaming SSE format should be identical across Flash models, but verify the response shape
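
Both points can be isolated behind two small functions, so a path or response-shape difference only needs fixing in one place. The sketch assumes the Generative Language API's streamGenerateContent endpoint with alt=sse and the candidates/content/parts response shape used by earlier Flash models; both should be checked against the docs for the new model. geminiStreamUrl and extractTextFromSseLine are hypothetical names.

```typescript
type ApiVersion = "v1beta" | "v1";

// Builds the streaming endpoint URL; the API version is a parameter because
// which one the new model requires is still an open question.
function geminiStreamUrl(model: string, apiVersion: ApiVersion = "v1beta"): string {
  return `https://generativelanguage.googleapis.com/${apiVersion}/models/${model}:streamGenerateContent?alt=sse`;
}

// Assumed chunk shape, based on earlier Gemini Flash responses.
interface GeminiChunk {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Returns the text carried by one SSE line, or null if the line is not a
// data line or does not match the assumed shape.
function extractTextFromSseLine(line: string): string | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "") return null;
  try {
    const chunk = JSON.parse(payload) as GeminiChunk;
    const parts = chunk.candidates?.[0]?.content?.parts;
    if (!Array.isArray(parts)) return null;
    return parts.map((part) => part.text ?? "").join("");
  } catch {
    // Malformed JSON or an unexpected payload: treat as no text.
    return null;
  }
}
```

Returning null for unrecognized lines lets the caller skip them and log once, which makes a changed response shape visible during verification instead of producing empty replies.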

Success Metrics

  • Mobile chat is comfortable to use on a phone-sized viewport (no overflow, no cropping)
  • Suggested questions reduce "blank screen" hesitation: visitors send a first message sooner after opening the chat
  • ONNX model loads successfully from local path (no HF CDN requests in network tab)
  • Chat responses come through on the new Gemini model with correct item references

Open Questions

  • Should the suggested question chips be configurable from a data file, or hardcoded in the component?
  • Does gemini-3-flash-preview require a different API version path (v1beta vs v1)?