portfolio/Ralph/progress.txt

# Progress Log — Semantic Search & AI Chat
# Branch: ralph/semantic-search
# Started: 2026-02-15

## Codebase Patterns
- `@xenova/transformers` pipeline with `pooling: 'mean'` and `normalize: true` returns a Tensor; use `Array.from(output.data as Float32Array)` to extract the 384-d vector
- Scripts live in `scripts/` and run via `npx tsx` (tsx is not a project dep, npx fetches it)
- tsconfig `include` only covers `src/` — scripts are type-checked by tsx at runtime, not by `tsc --noEmit`
- Project uses `"type": "module"` in package.json

---

## 2026-02-15 - US-001
- Installed `@xenova/transformers` (^2.17.2)
- Created `scripts/generate-embeddings.ts` with main() that loads `Xenova/all-MiniLM-L6-v2` and embeds a test string
- Added `"generate-embeddings"` npm script
- Verified: outputs vector length 384 and exits cleanly
- Typecheck passes
- Files changed: `package.json`, `package-lock.json`, `scripts/generate-embeddings.ts`
- **Learnings for future iterations:**
  - `pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')` auto-downloads and caches the ONNX model (~23MB)
  - First run takes a few seconds for model download; subsequent runs are near-instant from cache
  - The pipeline's `pooling: 'mean'` and `normalize: true` options handle mean-pooling and L2 normalization in one step — no manual tensor manipulation needed
  - `output.data` is a `Float32Array`; wrap in `Array.from()` for a plain number array
---