docs: update all documentation for Dash migration (Phase 6)

Rewrote README.md, USER_GUIDE.md, and DEPLOYMENT.md to reflect the Dash application. Updated RALPH_PROMPT.md, guardrails.md, and DESIGN_SYSTEM.md to remove Reflex references. All non-archive documentation now reflects the current Dash + DMC architecture.
2026-02-06 14:54:12 +00:00
parent 4cb5641c2d
commit 54b4a0f743
8 changed files with 635 additions and 956 deletions
@@ -256,6 +256,14 @@ Drawer selection → update_drug_selection → app-state store → load_pathway_
 - [x] Verify: No Reflex imports anywhere in `dash_app/`
 - **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated

+
+
+## Phase 6: Update all documentation
+- [x] Remove `reflex` references from all documentation
+- [x] Verify: No mentions of Reflex in any md files (archive/ excluded as historical)
+- [x] Add instructions to README.md on how to run the Dash app
+- [x] Update all claude.md files (CLAUDE.md was updated in Task 5.4)
+- **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated
 ---

 ## Completion Criteria
@@ -1,128 +1,124 @@
-# Ralph Wiggum Loop - Drug-Aware Indication Matching
+# Ralph Wiggum Loop — Dash Application Maintenance

-You are operating inside an automated loop extending a pathway analysis application with drug-aware indication matching. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem.
+You are operating inside an automated loop maintaining an NHS patient pathway analysis tool built with Dash (Plotly) + Dash Mantine Components. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem.

-**Current Focus**: Update indication charts so that patient indications are matched **per drug**, not just per patient. Each drug must be validated against the patient's GP diagnoses AND the drug-to-indication mapping from DimSearchTerm.csv.
+**Current Focus**: Maintain and enhance the Dash application in `dash_app/`. The backend (`src/`) provides shared data access and visualization functions. The design target is `01_nhs_classic.html`.

 ## First Actions Every Iteration

 Read these files in this order before doing anything else:

-1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next. The most recent entry is most important.
-2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, project overview, and completion criteria.
+1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next.
+2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, architecture overview, and completion criteria.
 3. `guardrails.md` — Known failure patterns to avoid. You MUST read and follow these.
-4. `CLAUDE.md` — Project architecture and code patterns.
+4. `CLAUDE.md` — Project architecture and backend code patterns.

 Then run `git log --oneline -5` to see recent commits.

+## Reading the Design Reference
+
+**When building ANY UI component**, read `01_nhs_classic.html` first:
+- It contains the exact CSS classes, HTML structure, and visual layout you must replicate
+- CSS lives in the `<style>` block (lines 8-314) — this becomes `dash_app/assets/nhs.css`
+- HTML structure (lines 316-480+) shows the component hierarchy and class usage
+- Match the design as closely as possible — `className` in Dash = `class` in HTML
+
+**When building data loading or chart callbacks**, reference the shared functions in `src/`:
+- `src/data_processing/pathway_queries.py`: `load_initial_data()` and `load_pathway_nodes()` — shared query functions
+- `src/visualization/plotly_generator.py`: `create_icicle_from_nodes()` — icicle chart from list-of-dicts
+- `dash_app/data/queries.py`: Thin wrapper calling shared functions with correct DB path
+- The original logic is archived in `archive/pathways_app/pathways_app.py` for reference.
+
 ## Narration

 Narrate your work as you go. Your output is the only visibility the operator has into what's happening. For every significant action, explain what you're doing and why:

- **Reading files**: "Reading progress.txt to check what the last iteration accomplished..."
- **Creating code**: "Adding assign_drug_indications() function to diagnosis_lookup.py..."
- **Debugging**: "Drug matching returned 0 results for ADALIMUMAB. Checking DimSearchTerm lookup..."
- **Testing**: "Running import check to verify the new function is accessible..."
- **Making decisions**: "The guardrails say to use substring matching for drug fragments."
- **Committing**: "Committing drug-indication matching logic."
+- **Reading files**: "Reading 01_nhs_classic.html to get CSS classes for the header component..."
+- **Creating code**: "Creating dash_app/components/header.py with make_header() function..."
+- **Debugging**: "Import error for dmc.Drawer — checking dash-mantine-components version..."
+- **Testing**: "Running python run_dash.py to verify the app starts..."
+- **Making decisions**: "The guardrails say to use className from nhs.css, not inline styles."
+- **Committing**: "Committing header and sidebar components."

-Do NOT just output a summary at the end. Narrate throughout. Think of this as a live log of your reasoning.
+Do NOT just output a summary at the end. Narrate throughout.

 ## Task Selection

-You have flexibility to choose which task to work on. Use your judgement, but document your reasoning.
-
 1. Read ALL tasks in IMPLEMENTATION_PLAN.md — understand the full picture
 2. Skip any marked `[x]` (complete) or `[B]` (blocked)
 3. Check progress.txt for guidance — the previous iteration may have recommendations
 4. **Choose a task** based on:
-   - Dependencies (some tasks require others to be done first)
-   - Logical flow (query changes before matching logic, matching before pipeline integration)
-   - Your assessment of what would be most valuable to tackle next
-   - Previous iteration's recommendations (consider but don't blindly follow)
-5. **Document your reasoning**: Before starting work, briefly explain WHY you chose this task over others
+   - Dependencies (scaffolding before components, components before callbacks)
+   - Logical flow (Phase 0 → 1 → 2 → 3 → 4 → 5)
+   - Previous iteration's recommendations
+5. **Document your reasoning**: Before starting, explain WHY you chose this task
 6. Mark your chosen task `[~]` (in progress) in IMPLEMENTATION_PLAN.md

-If your chosen task turns out to be blocked during work:
- Mark it `[B]` with a reason in IMPLEMENTATION_PLAN.md
+If your chosen task is blocked:
+- Mark it `[B]` with a reason
 - Document the blocker in progress.txt
- Move to a different ready task within this same iteration
+- Move to a different ready task

 ## Development

 Work on ONE task per iteration. Build incrementally and verify as you go.

-### Key Concepts
+### Key Technologies

-**Drug-Indication Matching Flow:**
-1. Get patient's GP-matched Search_Terms from Snowflake (ALL matches, not just most recent, with code_frequency)
-   - Only count GP codes from MIN(Intervention Date) onwards (the HCD data window)
-2. Load DimSearchTerm.csv to get which drugs belong to which Search_Terms
-3. For each patient-drug pair: intersection of (Search_Terms listing this drug) AND (patient's GP matches)
-   - If multiple matches: pick highest code_frequency (most GP coding = most likely indication)
-4. Modify UPID to include matched indication: `{UPID}|{search_term}`
-5. Drugs sharing the same indication for the same patient → same modified UPID → same pathway
-6. Drugs under different indications → different modified UPIDs → separate pathways
+- **Dash 2.x**: `from dash import Dash, html, dcc, Input, Output, State, callback_context, ALL`
+- **Dash Mantine Components 0.14.x**: `import dash_mantine_components as dmc` — needs `dmc.MantineProvider` wrapping the layout
+- **Plotly**: `import plotly.graph_objects as go` — for the icicle chart
+- **SQLite**: `import sqlite3` — read-only access to `data/pathways.db`
+- **CSS**: All in `dash_app/assets/nhs.css` — auto-served by Dash

-**DimSearchTerm.csv:**
- `Search_Term`: Clinical condition (e.g., "rheumatoid arthritis")
- `CleanedDrugName`: Pipe-separated drug fragments (e.g., "ADALIMUMAB|GOLIMUMAB|...")
- `PrimaryDirectorate`: The directorate for this condition
- Drug matching: check if any fragment is a substring of the HCD drug name (case-insensitive)
+### Dash Component Patterns

-**Modified UPID Format:**
- Original: `RMV12345` (Provider Code[:3] + PersonKey)
- Modified: `RMV12345|rheumatoid arthritis`
- Fallback: `RMV12345|RHEUMATOLOGY (no GP dx)`
- The existing pathway analyzer treats UPID as an opaque identifier — this works transparently
-
-### Code Patterns
-
- **Snowflake queries**: Use parameterized queries, embed the cluster CTE from CLUSTER_MAPPING_SQL
- **GP record matching**: Return ALL matches per patient (not just most recent)
- **Drug mapping**: Load from `data/DimSearchTerm.csv`, match drug name fragments
- **Pathway pipeline**: Use existing functions — modified UPIDs flow through naturally
- **Reflex state**: No changes expected — indication charts already work, just with better matching
-
-### Key Data Structures
-
-**GP Matches (from Snowflake) — updated to return ALL matches with frequency:**
 ```python
-# Multiple rows per patient (one per matched Search_Term)
-# code_frequency = COUNT of matching SNOMED codes (used as tiebreaker)
-# Only counts codes from MIN(Intervention Date) onwards
-DataFrame with: PatientPseudonym, Search_Term, code_frequency
+# HTML elements use dash.html
+from dash import html
+html.Div(className="top-header", children=[...])
+
+# Mantine components for rich UI
+import dash_mantine_components as dmc
+dmc.Drawer(id="drug-drawer", position="right", size="480px", children=[...])
+dmc.Accordion(children=[dmc.AccordionItem(...)])
+
+# State management
+dcc.Store(id="app-state", storage_type="session", data={})
+
+# Callbacks
+@app.callback(
+    Output("chart-data", "data"),
+    Input("app-state", "data"),
+)
+def load_pathway_data(app_state):
+    ...
 ```

-**Drug-to-Indication Mapping (from DimSearchTerm.csv):**
-```python
-# search_term → list of drug fragments
-{"rheumatoid arthritis": ["ABATACEPT", "ADALIMUMAB", "ANAKINRA", ...]}
-```
+### Database Access Pattern

-**Modified HCD Data:**
 ```python
-# Original UPID replaced with indication-aware UPID
-df["UPID"] = "RMV12345|rheumatoid arthritis"  # for matched drugs
-df["UPID"] = "RMV12345|RHEUMATOLOGY (no GP dx)"  # for unmatched drugs
-```
+from pathlib import Path
+import sqlite3

-**Indication DataFrame:**
-```python
-# Maps modified UPID → Search_Term (for pathway hierarchy level 2)
-indication_df = pd.DataFrame({
-    'Directory': ['rheumatoid arthritis', 'asthma', 'CARDIOLOGY (no GP dx)']
-}, index=['RMV12345|rheumatoid arthritis', 'RMV12345|asthma', 'RMV67890|CARDIOLOGY (no GP dx)'])
+DB_PATH = Path(__file__).resolve().parents[2] / "data" / "pathways.db"
+
+def load_pathway_data(filter_id, chart_type, selected_drugs=None, selected_directorates=None):
+    conn = sqlite3.connect(str(DB_PATH))
+    try:
+        conn.row_factory = sqlite3.Row
+        # ... query with parameterized WHERE, build result_dict ...
+    finally:
+        conn.close()  # release the handle even if the query raises
+    return result_dict
 ```
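
Review note: the database access pattern above leaves the query itself elided. The following is a minimal sketch of one way the parameterized, read-only access it describes could look. The table name `pathway_nodes`, its columns, and the helper `fetch_rows` are assumptions made for illustration and are not taken from the repository.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def fetch_rows(db_path, chart_type, drugs=None):
    """Return matching rows as plain dicts (hypothetical helper, assumed schema)."""
    sql = "SELECT * FROM pathway_nodes WHERE chart_type = ?"
    params = [chart_type]
    if drugs:
        # One placeholder per drug keeps user input out of the SQL text.
        placeholders = ", ".join("?" for _ in drugs)
        sql += f" AND drug IN ({placeholders})"
        params.extend(drugs)

    # mode=ro opens the file read-only, so a stray write raises an error
    # instead of changing the pre-computed data.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        conn.row_factory = sqlite3.Row
        return [dict(row) for row in conn.execute(sql, params)]
```

Returning plain dicts rather than `sqlite3.Row` objects keeps the result JSON-serializable, which matters if it is later placed in a `dcc.Store`.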

 ### Verification Steps

 After writing code, ALWAYS verify:

-1. **Syntax check**: `python -m py_compile <file.py>`
-2. **Import check**: `python -c "from module import function"`
-3. **For database changes**: Test with query against pathways.db
-4. **For Reflex changes**: `python -m reflex compile`
+1. **Import check**: `python -c "from dash_app.app import app"` (or specific module)
+2. **App starts**: `python run_dash.py` — must start without errors
+3. **Visual check** (when building UI): describe what you expect to see at localhost:8050
+4. **For callbacks**: verify the callback chain fires correctly (add temporary `print()` statements if needed)

 If any step fails, fix the issue before proceeding.

@@ -133,24 +129,23 @@ Every task MUST pass validation before being marked complete:
 ### Tier 1: Code Validation (MANDATORY)
 - Code compiles without Python syntax errors
 - Imports work without errors
- No TypeErrors, ImportErrors, or AttributeErrors
+- `python run_dash.py` starts without exceptions

-### Tier 2: Data Validation (for data/pipeline tasks)
- Queries return expected row counts
- Data structures have correct columns/types
- Drug-indication matching produces valid results
- Modified UPIDs have correct format
+### Tier 2: Layout Validation (for UI component tasks)
+- Component renders in the browser
+- CSS classes match 01_nhs_classic.html
+- Layout structure matches the HTML concept

-### Tier 3: Functional Validation (for UI/integration tasks)
- Reflex compiles the app without errors
- State changes trigger expected behavior
- Both chart types render correctly
+### Tier 3: Functional Validation (for callback tasks)
+- Callbacks fire when inputs change
+- Data flows correctly through dcc.Store chain
+- Chart renders with real data from SQLite

 ### Validation Failure

 If any tier fails:
 - DO NOT mark the task complete
- Document the failure details in progress.txt
+- Document the failure in progress.txt
 - Fix the issue within this iteration if possible
 - If you cannot fix it, mark the task `[B]` with details

@@ -159,34 +154,33 @@ If any tier fails:
 Before marking ANY task `[x]`, ALL of these must be true:

 1. Code is saved to the appropriate file(s)
-2. Tier 1 code validation passed
+2. Tier 1 validation passed (imports + app starts)
 3. Tier 2/3 validation passed (as applicable)
 4. All changes committed to git with a descriptive message

-These are non-negotiable. A task that "feels done" but hasn't passed all gates is NOT done.
+These are non-negotiable.

 ## Update Progress

-After completing your work (whether the task succeeded, failed, or was blocked), append to progress.txt using this format:
+After completing your work, append to progress.txt using this format:

 ```
 ## Iteration [N] — [YYYY-MM-DD]
 ### Task: [which task you worked on]
 ### Why this task:
 - [Brief explanation of why you chose this task over others]
- [What dependencies or logical flow led to this choice]
 ### Status: COMPLETE | BLOCKED | IN PROGRESS
 ### What was done:
 - [Specific actions taken]
 ### Validation results:
- Tier 1 (Code): [syntax check, import check]
- Tier 2 (Data): [query results, row counts]
- Tier 3 (Functional): [reflex compile, UI check]
+- Tier 1 (Code): [import check, app starts]
+- Tier 2 (Layout): [renders correctly, CSS matches]
+- Tier 3 (Functional): [callbacks fire, data flows]
 ### Files changed:
 - [list of files created/modified]
 ### Committed: [git hash] "[commit message]"
 ### Patterns discovered:
- [Any reusable learnings — query patterns, matching logic quirks]
+- [Any reusable learnings — Dash patterns, DMC quirks, CSS gotchas]
 ### Next iteration should:
 - [Explicit guidance for what the next fresh instance should do first]
 - [Note any context that would be lost without writing it here]
@@ -194,20 +188,20 @@ After completing your work (whether the task succeeded, failed, or was blocked),
 - [Any tasks that are blocked and why]
 ```

-If you discover a failure pattern that future iterations should avoid, add it to `guardrails.md`.
+If you discover a failure pattern, add it to `guardrails.md`.

 ## Commit Changes

 1. Stage changed files
-2. Use a descriptive commit message referencing the task (e.g., "feat: add drug-indication matching function (Task 2.1)")
-3. Commit after your task is validated and complete — one commit per logical unit of work
+2. Use a descriptive commit message referencing the task (e.g., "feat: create dash_app skeleton with nhs.css (Task 0.1 + 0.2)")
+3. Commit after your task is validated and complete
 4. If you updated progress.txt with a blocked status, commit that too

 ## Completion Check

 If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:

-1. Run `reflex compile` to verify app compiles
+1. Run `python run_dash.py` to verify app starts cleanly
 2. Verify all completion criteria at the bottom of IMPLEMENTATION_PLAN.md are satisfied
 3. Only then output the completion signal on its own line:

@@ -217,20 +211,19 @@ If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:

 DO NOT output this string under any other circumstances.
 DO NOT output it if any task is still `[ ]` or `[B]` or `[~]`.
-DO NOT paraphrase, vary, or conditionally output this string.

 ## Rules

 - Complete ONE task per iteration, then update progress and stop
 - ALWAYS read progress.txt, guardrails.md before starting work
- **Match drugs to indications** — not just patients to indications
- **Use DimSearchTerm.csv** for drug-to-Search_Term mapping
- **Return ALL GP matches** — not just most recent (remove QUALIFY ROW_NUMBER = 1)
- **Modified UPID format**: `{UPID}|{search_term}` — pipe delimiter is safe
- **Use PseudoNHSNoLinked** — NOT PersonKey for GP record matching
- **Substring matching** for drug fragments from DimSearchTerm.csv
+- **Read 01_nhs_classic.html** when building ANY visual component
+- **Read src/data_processing/pathway_queries.py and src/visualization/plotly_generator.py** when building data logic or chart callbacks
+- **DO NOT modify pipeline/analysis logic** in src/ (pathway_pipeline, transforms, diagnosis_lookup, pathway_analyzer, refresh_pathways)
+- **DO add shared utilities** to src/ (visualization/plotly_generator.py, data_processing/database.py) rather than duplicating logic in dash_app/
+- **Use className from nhs.css** — not inline styles
+- **dcc.Store for state** — no server-side globals
+- **Unidirectional callbacks** — app-state → chart-data → UI
+- **Port icicle_figure exactly** — same customdata, colorscale, templates
 - Keep commits atomic and well-described
- If stuck on the same issue for more than 2 attempts within one iteration, document it in progress.txt and move to the next ready task
- When in doubt, check existing code for patterns that work
- **Pipeline before UI** — processing logic before Reflex changes
- **Don't change directory charts** — only indication chart matching changes
+- If stuck for 2+ attempts, document in progress.txt and move on
+- `python run_dash.py` must work after every task
@@ -5,142 +5,144 @@ A web-based application for analyzing secondary care patient treatment pathways.
 ## Features

 - **Interactive Visualization**: Plotly icicle charts showing patient treatment hierarchies with cost and frequency statistics
- **Multi-Source Data Loading**: CSV/Parquet files, SQLite database, or direct Snowflake integration
- **GP Diagnosis Validation**: Validate patient indications against GP SNOMED codes via NHS Snowflake
- **Modern Web Interface**: Browser-based UI using Reflex framework with NHS branding
+- **Dual Chart Types**: Directory-based (Trust → Directorate → Drug → Pathway) and Indication-based (Trust → GP Diagnosis → Drug → Pathway) views
+- **Pre-computed Pathways**: Treatment pathways pre-processed and stored in SQLite for sub-50ms filter response times
+- **GP Diagnosis Matching**: Patient indications matched from GP records using SNOMED cluster codes (~93% match rate)
+- **Modern Web Interface**: Browser-based UI using Dash (Plotly) + Dash Mantine Components with NHS branding
+- **Drug Browser**: Drawer-based card browser organized by clinical directorate for drug/indication selection
 - **Flexible Filtering**: Filter by date range, NHS trusts, drugs, and medical directories
- **Export Options**: Export charts as interactive HTML or data as CSV

 ## Requirements

 - Python 3.10 or higher
- pip or uv package manager
+- uv package manager (recommended)

-### Optional (for Snowflake integration)
- `snowflake-connector-python` package
+### Optional (for data refresh)
 - Access to NHS Snowflake data warehouse with SSO authentication

 ## Installation

-### Using pip
-
 ```bash
 # Clone the repository
 git clone <repository-url>
 cd patient-pathway-analysis

 # Install dependencies
-pip install -r requirements.txt
-```
-
-### Using uv (recommended)
-
-```bash
-# Install uv if not already installed
-pip install uv
-
-# Sync dependencies
 uv sync
-```

-### Install with test dependencies
-
-```bash
-pip install -e ".[test]"
+# One-time dev setup: adds src/ to Python path via .pth file
+uv run python setup_dev.py
 ```
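
Review note: `setup_dev.py` is not shown in this diff, so the following is only a sketch of the general `.pth` mechanism the comment refers to, not the script's actual contents. The helper `write_pth` and the file name `dev_src.pth` are hypothetical, and temporary directories stand in for the real site-packages and `src/`.

```python
import site
import sys
import tempfile
from pathlib import Path


def write_pth(site_dir: Path, src_dir: Path, name: str = "dev_src.pth") -> Path:
    """Write a one-line .pth file that points at src_dir (hypothetical helper)."""
    pth_file = site_dir / name
    pth_file.write_text(f"{src_dir}\n", encoding="utf-8")
    return pth_file


with tempfile.TemporaryDirectory() as tmp:
    fake_site = Path(tmp) / "site-packages"  # stand-in for the real site-packages
    fake_src = Path(tmp) / "src"             # stand-in for the repository's src/
    fake_site.mkdir()
    fake_src.mkdir()

    write_pth(fake_site, fake_src)
    # addsitedir() processes every .pth file in the directory, which is what
    # the interpreter does for site-packages at startup.
    site.addsitedir(str(fake_site))
    src_on_path = str(fake_src) in sys.path
```

Once a directory is listed this way, modules inside it can be imported by their top-level name, which is consistent with the `python -m data_processing.migrate` style commands used elsewhere in the README.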

 ## Quick Start

-### 1. Run the Web Application (Recommended)
+### Run the Web Application

 ```bash
-reflex run
+python run_dash.py
 ```

-Open http://localhost:3000 in your browser.
+Open http://localhost:8050 in your browser.
+
+The application loads pre-computed pathway data from SQLite on startup. No additional configuration is needed for viewing existing data.
+
+### Refresh Pathway Data (requires Snowflake)
+
+```bash
+# Initialize/migrate the database
+python -m data_processing.migrate
+
+# Full refresh — both chart types, all date filters
+python -m cli.refresh_pathways --chart-type all
+
+# Directory charts only (faster, ~5 minutes)
+python -m cli.refresh_pathways --chart-type directory
+
+# Indication charts only (~12 minutes, includes GP lookup)
+python -m cli.refresh_pathways --chart-type indication
+
+# Dry run (test without database changes)
+python -m cli.refresh_pathways --chart-type all --dry-run -v
+```

 ## Usage

-### Web Interface (Reflex)
+### Interface Overview

-1. **Load Data**: On the home page, select your data source:
-   - **SQLite Database**: Uses pre-loaded data from `data/pathways.db`
-   - **File Upload**: Drag and drop a CSV or Parquet file
-   - **Snowflake**: Fetch data directly from NHS Snowflake (requires configuration)
+The application has a single-page layout with:

-2. **Configure Filters**:
-   - Set date range (Start Date, End Date, Last Seen After)
-   - Navigate to Drug/Trust/Directory selection pages using the sidebar
-   - Use search boxes to find and select items
-   - Set minimum patient threshold to filter small groups
+| Component | Purpose |
+|-----------|---------|
+| **Header** | NHS branding, data freshness indicator (patient count + relative time) |
+| **Sidebar** | Navigation items with drawer triggers for Drug Selection, Trust Selection, Indications |
+| **KPI Row** | 4 cards: Unique Patients, Drug Types, Total Cost, Indication Match Rate |
+| **Filter Bar** | Chart type toggle (By Directory / By Indication) + date filter dropdowns |
+| **Chart Card** | Interactive Plotly icicle chart with loading spinner |
+| **Drawer** | Right-side panel with drug chips, trust chips, and directorate card browser |

-3. **Run Analysis**: Click "Run Analysis" to generate the icicle chart
+### Filtering Data

-4. **Export Results**:
-   - **Export HTML**: Save the interactive chart as a standalone HTML file
-   - **Export CSV**: Export the filtered data as a CSV file
+1. **Chart Type**: Toggle between "By Directory" and "By Indication" views
+2. **Date Filters**: Select treatment initiation period and last-seen window
+3. **Drug Selection**: Open the drawer to select specific drugs via chips
+4. **Trust Selection**: Open the drawer to filter by NHS trusts
+5. **Directorate Browser**: Navigate directorates → indications → drug fragments in the drawer
+6. **Clear Filters**: Reset all selections to show full dataset

-### Data Migration
+### Understanding the Pathway Chart

-To populate the SQLite database from CSV files:
+The icicle chart displays hierarchical treatment pathways:

-```bash
-# Initialize database schema
-python -m data_processing.migrate
-
-# Load reference data from CSV files
-python -m data_processing.migrate --reference-data --verify
-
-# Load patient data from a CSV/Parquet file
-python -m data_processing.migrate --load-patient-data path/to/data.csv
+```
+Root (Regional Total)
+  └─ Trust Name (e.g., "Norfolk and Norwich University Hospitals")
+      └─ Directory/Indication (e.g., "Rheumatology" or "rheumatoid arthritis")
+          └─ Drug Name (e.g., "ADALIMUMAB")
+              └─ Treatment Pathway (e.g., "ADALIMUMAB → INFLIXIMAB")
 ```

-### Snowflake Configuration
+- **Width**: Relative patient count
+- **Color intensity**: Proportion of parent group
+- **Hover**: Shows cost, dosing frequency, date range, and per-patient statistics
+- **Click**: Zoom into a specific branch
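
Review note: the hierarchy described above can be expressed as the flat `ids` / `labels` / `parents` / `values` lists that a Plotly icicle trace accepts. This is a minimal sketch under assumed inputs (tuples of level names plus a patient count). It is not the logic of `create_icicle_from_nodes()`, which is not shown in this diff, and `build_icicle_nodes` is a hypothetical name.

```python
SEP = " / "  # assumed separator for building unique node ids


def build_icicle_nodes(paths):
    """Aggregate (levels, patients) pairs into flat icicle lists."""
    totals = {}
    for levels, patients in paths:
        # Every prefix of a path is a node; add the patients to each ancestor.
        for depth in range(1, len(levels) + 1):
            key = tuple(levels[:depth])
            totals[key] = totals.get(key, 0) + patients

    nodes = {"ids": [], "labels": [], "parents": [], "values": []}
    for key, total in totals.items():
        nodes["ids"].append(SEP.join(key))
        nodes["labels"].append(key[-1])
        nodes["parents"].append(SEP.join(key[:-1]))  # "" marks the root
        nodes["values"].append(total)
    return nodes


example = build_icicle_nodes([
    (("Regional Total", "Trust A", "Rheumatology", "ADALIMUMAB",
      "ADALIMUMAB → INFLIXIMAB"), 4),
    (("Regional Total", "Trust A", "Rheumatology", "ADALIMUMAB",
      "ADALIMUMAB"), 6),
])
```

Because each parent's value is the sum of its children, lists shaped like this would pair with `branchvalues="total"` on a `plotly.graph_objects.Icicle` trace. Using the full path as the id keeps a pathway named `ADALIMUMAB` distinct from its parent drug node of the same name.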

-To use Snowflake integration, edit `config/snowflake.toml`:
+### Date Filter Combinations

-```toml
-[connection]
-account = "your-account-identifier"
-warehouse = "your-warehouse"
-database = "DATA_HUB"
-schema = "CDM"
-authenticator = "externalbrowser"  # NHS SSO authentication
-```
+| Initiated | Last Seen | Description |
+|-----------|-----------|-------------|
+| All years | Last 6 months | Default — all patients active recently |
+| All years | Last 12 months | Broader activity window |
+| Last 1 year | Last 6 months | Recently initiated, active |
+| Last 1 year | Last 12 months | Recently initiated, any activity |
+| Last 2 years | Last 6 months | Medium history, active |
+| Last 2 years | Last 12 months | Medium history, any activity |
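
Review note: the six rows of the table are the product of three initiation periods and two last-seen windows. A short sketch of enumerating them follows. The labels are copied from the table, while the `filter_id` slug format and the function names are assumptions for illustration only.

```python
from itertools import product

# Labels copied from the table above; the slug format is an assumption.
INITIATED = ["All years", "Last 1 year", "Last 2 years"]
LAST_SEEN = ["Last 6 months", "Last 12 months"]


def slug(label: str) -> str:
    """Turn a display label into a lowercase identifier."""
    return label.lower().replace(" ", "_")


def date_filter_combinations() -> list[dict]:
    """Return one entry per (initiated, last seen) pair, in table order."""
    return [
        {"filter_id": f"{slug(i)}__{slug(s)}", "initiated": i, "last_seen": s}
        for i, s in product(INITIATED, LAST_SEEN)
    ]
```

Enumerating the combinations in one place makes it easy to check that every pair offered by the dropdowns has corresponding pre-computed data.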

 ## Project Structure

 ```
 .
-├── core/                    # Core configuration and models
-├── data_processing/         # Data layer (SQLite, Snowflake, loaders)
-├── analysis/                # Analysis pipeline (refactored from generate_graph)
-├── visualization/           # Chart generation (Plotly)
-├── pathways_app/            # Reflex web application
-├── tools/                   # Legacy modules (original analysis engine)
-├── config/                  # Configuration files
-├── data/                    # Reference data and SQLite database
-├── docs/                    # Additional documentation
-└── tests/                   # Test suite
+├── src/                         # All application library code
+│   ├── core/                    # Foundation: paths, models, logging
+│   ├── config/                  # Snowflake connection settings
+│   ├── data_processing/         # Data layer (SQLite, Snowflake, transforms)
+│   ├── analysis/                # Analysis pipeline
+│   ├── visualization/           # Plotly chart generation
+│   └── cli/                     # CLI tools (refresh_pathways)
+├── dash_app/                    # Dash web application
+│   ├── app.py                   # App entry point, layout, stores
+│   ├── assets/nhs.css           # NHS design system CSS
+│   ├── data/                    # Query wrappers + card browser data
+│   ├── components/              # UI components (header, sidebar, etc.)
+│   └── callbacks/               # Dash callbacks (filters, chart, KPI, drawer)
+├── run_dash.py                  # Entry point: python run_dash.py
+├── data/                        # Reference data + SQLite DB (pathways.db)
+├── tests/                       # Test suite (113 tests)
+├── docs/                        # Documentation
+└── archive/                     # Historical/deprecated code
 ```

 See `CLAUDE.md` for detailed architecture documentation.

-## Documentation
-
- [docs/USER_GUIDE.md](docs/USER_GUIDE.md) - End-user guide for using the web interface
- [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) - Production deployment guide (Docker, nginx, cloud)
- [CLAUDE.md](CLAUDE.md) - Technical architecture documentation for developers
-
-## Deployment
-
-Quick production start:
-
-```bash
-# Run in production mode
-reflex run --env prod
-```
-
 ## Running Tests

 ```bash
@@ -150,75 +152,56 @@ python -m pytest tests/ -v
 # Run with coverage
 python -m pytest tests/ -v --cov=core --cov=data_processing --cov=analysis

-# Run only fast tests (exclude slow/integration)
+# Run only fast tests
 python -m pytest tests/ -v -m "not slow"
 ```

-## Reference Data Files
+## Configuration

-The `data/` directory contains essential reference files:
+### Snowflake Connection (`src/config/snowflake.toml`)

-| File | Purpose |
-|------|---------|
-| `include.csv` | Drug filter list with default selections |
-| `defaultTrusts.csv` | NHS Trust list for filtering |
-| `directory_list.csv` | Medical specialties/directories |
-| `drugnames.csv` | Drug name standardization mapping |
-| `org_codes.csv` | Provider code to organization name mapping |
-| `drug_directory_list.csv` | Valid drug-to-directory mappings |
-| `drug_indication_clusters.csv` | Drug to SNOMED cluster mappings |
-| `ta-recommendations.xlsx` | NICE TA recommendations |
+```toml
+[snowflake]
+account = "your-account"
+database = "DATA_HUB"
+schema = "CDM"
+warehouse = "your-warehouse"
+authenticator = "externalbrowser"  # Required for NHS SSO
+```

 ## Troubleshooting

-### Reflex compilation errors
-
-If you encounter compilation errors when running `reflex run`:
+### App won't start

 ```bash
-# Clear the build cache and restart
-rm -rf .web
-reflex run
+# Ensure dependencies are installed
+uv sync
+
+# Ensure src/ is on Python path
+uv run python setup_dev.py
+
+# Try running with uv
+uv run python run_dash.py
+```
+
+### Database not found
+
+```bash
+# Initialize/migrate the database if data/pathways.db is missing
+python -m data_processing.migrate
 ```

 ### Snowflake connection issues

-1. Ensure `snowflake-connector-python` is installed:
-   ```bash
-   pip install snowflake-connector-python
-   ```
+1. Ensure `src/config/snowflake.toml` has the correct account identifier
+2. A browser window will open for SSO authentication
+3. Verify your network allows Snowflake connections

-2. Check that `config/snowflake.toml` has the correct account identifier
+## Documentation

-3. For SSO authentication, a browser window will open automatically
-
-### SQLite database not found
-
-If `data/pathways.db` doesn't exist, create it:
-
-```bash
-python -m data_processing.migrate
-python -m data_processing.migrate --reference-data
-```
-
-## Development
-
-### Code Quality
-
-```bash
-# Type checking
-python -m mypy core/ data_processing/ analysis/ --ignore-missing-imports
-
-# Run tests with coverage report
-python -m pytest tests/ -v --cov=core --cov=data_processing --cov-report=html
-```
-
-### Adding New Reference Data
-
-1. Add CSV file to `data/` directory
-2. Define schema in `data_processing/schema.py`
-3. Create migration function in `data_processing/reference_data.py`
-4. Add path to `PathConfig` in `core/config.py`
+- [CLAUDE.md](CLAUDE.md) — Technical architecture documentation
+- [docs/USER_GUIDE.md](docs/USER_GUIDE.md) — End-user guide
+- [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) — Deployment guide

 ## License

@@ -1,10 +1,10 @@
-# Reflex Deployment Guide
+# Deployment Guide

-This guide covers deployment options for the Patient Pathway Analysis web application built with Reflex.
+This guide covers deployment options for the Patient Pathway Analysis web application built with Dash.

 ## Overview

-Reflex applications compile to a FastAPI backend and Next.js frontend. This creates two deployment artifacts that can be deployed together or separately depending on your infrastructure requirements.
+The application is a single-process Python Dash app that serves both the frontend and API from one server. It reads pre-computed data from a local SQLite database.

 ## Development Mode

@@ -12,9 +12,9 @@ For local development:

 ```bash
 # Start development server with hot reload
-reflex run
+python run_dash.py

-# Access the application at http://localhost:3000
+# Access the application at http://localhost:8050
 ```

 ## Production Deployment Options
@@ -24,84 +24,55 @@ reflex run
 The simplest approach for internal deployments:

 ```bash
-# Run in production mode (optimized build)
-reflex run --env prod
-```
+# Run with Gunicorn (Linux/macOS)
+gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4

-This starts:
- FastAPI backend on port 8000
- Next.js frontend on port 3000
+# Or directly with Python
+python run_dash.py
+```

 For background execution:

 ```bash
 # Using nohup (Linux/macOS)
-nohup reflex run --env prod > reflex.log 2>&1 &
+nohup gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4 > dash.log 2>&1 &

 # Using PowerShell (Windows)
-Start-Process -NoNewWindow -FilePath "reflex" -ArgumentList "run --env prod"
+Start-Process -NoNewWindow -FilePath "python" -ArgumentList "run_dash.py"
 ```

-### Option 2: Separate Backend and Frontend
-
-For more control, run backend and frontend separately:
-
-```bash
-# Terminal 1: Start backend only
-reflex run --env prod --backend-only
-
-# Terminal 2: Start frontend only
-reflex run --env prod --frontend-only
-```
-
-### Option 3: Static Export
-
-Export the frontend as static files for deployment on static hosting or CDN:
-
-```bash
-# Export application
-reflex export
-
-# This creates:
-# - frontend.zip (static Next.js build)
-# - backend.zip (Python application source)
-```
-
-Then:
-1. Unzip `frontend.zip` and serve via nginx, Apache, or any static file server
-2. Run the backend separately using uvicorn/gunicorn
-
-### Option 4: Docker Deployment
+### Option 2: Docker Deployment

 Create a `Dockerfile` for containerized deployment:

 ```dockerfile
-# Dockerfile
 FROM python:3.11-slim

 WORKDIR /app

-# Install Node.js for Reflex frontend build
-RUN apt-get update && apt-get install -y curl && \
-    curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && \
-    apt-get install -y nodejs && \
-    rm -rf /var/lib/apt/lists/*
+# Install uv for fast dependency management
+RUN pip install uv

-# Copy requirements and install dependencies
-COPY requirements.txt pyproject.toml ./
-RUN pip install --no-cache-dir -r requirements.txt
+# Copy dependency files
+COPY pyproject.toml uv.lock ./
+
+# Install dependencies
+RUN uv sync --no-dev

 # Copy application code
-COPY . .
+COPY src/ src/
+COPY dash_app/ dash_app/
+COPY data/ data/
+COPY run_dash.py setup_dev.py ./

-# Initialize Reflex (downloads frontend dependencies)
-RUN reflex init --loglevel debug
+# Set up Python path
+RUN uv run python setup_dev.py

-# Expose ports
-EXPOSE 3000 8000
+# Expose port
+EXPOSE 8050

-# Start in production mode
-CMD ["reflex", "run", "--env", "prod"]
+# Start the application
+CMD ["uv", "run", "gunicorn", "dash_app.app:server", "-b", "0.0.0.0:8050", "--workers", "4"]
 ```

 Build and run:
@@ -111,41 +82,24 @@ Build and run:
 docker build -t pathway-analysis .

 # Run the container
-docker run -p 3000:3000 -p 8000:8000 \
+docker run -p 8050:8050 \
  -v $(pwd)/data:/app/data \
-  -v $(pwd)/config:/app/config \
  pathway-analysis
 ```

-### Option 5: Docker Compose (Recommended for Production)
-
-Create `docker-compose.yml` for multi-container deployment:
+### Option 3: Docker Compose

 ```yaml
 version: '3.8'

 services:
-  backend:
+  app:
    build: .
-    command: reflex run --env prod --backend-only
    ports:
-      - "8000:8000"
+      - "8050:8050"
    volumes:
      - ./data:/app/data
-      - ./config:/app/config
-    environment:
-      - REFLEX_ENV=prod
-    restart: unless-stopped
-
-  frontend:
-    build: .
-    command: reflex run --env prod --frontend-only
-    ports:
-      - "3000:3000"
-    depends_on:
-      - backend
-    environment:
-      - REFLEX_ENV=prod
+      - ./src/config:/app/src/config
    restart: unless-stopped
 ```

@@ -162,42 +116,16 @@ docker-compose up -d
 For production deployments behind nginx:

 ```nginx
-# /etc/nginx/sites-available/pathway-analysis
 server {
    listen 80;
    server_name your-server.nhs.uk;

-    # Backend API endpoints
-    location /admin {
-        proxy_pass http://localhost:8000;
-        proxy_set_header Host $host;
-        proxy_set_header X-Real-IP $remote_addr;
-    }
-
-    location /ping {
-        proxy_pass http://localhost:8000;
-    }
-
-    location /upload {
-        proxy_pass http://localhost:8000;
-        client_max_body_size 100M;  # For large data file uploads
-    }
-
-    # WebSocket connections (required for Reflex state sync)
-    location /_event/ {
-        proxy_pass http://localhost:8000;
-        proxy_http_version 1.1;
-        proxy_set_header Upgrade $http_upgrade;
-        proxy_set_header Connection "upgrade";
-        proxy_set_header Host $host;
-        proxy_read_timeout 86400;  # 24 hours for long-running connections
-    }
-
-    # Frontend (all other requests)
    location / {
-        proxy_pass http://localhost:3000;
+        proxy_pass http://localhost:8050;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
    }
 }
 ```
@@ -209,69 +137,21 @@ sudo ln -s /etc/nginx/sites-available/pathway-analysis /etc/nginx/sites-enabled/
 sudo nginx -t && sudo systemctl reload nginx
 ```

-### Caddy (Alternative)
-
-Caddy provides automatic HTTPS:
-
-```caddyfile
-# Caddyfile
-your-server.nhs.uk {
-    # Backend API
-    handle /admin/* {
-        reverse_proxy localhost:8000
-    }
-    handle /ping {
-        reverse_proxy localhost:8000
-    }
-    handle /upload {
-        reverse_proxy localhost:8000
-    }
-    handle /_event/* {
-        reverse_proxy localhost:8000
-    }
-
-    # Frontend
-    handle {
-        reverse_proxy localhost:3000
-    }
-}
-```
-
 ## Process Management

 ### Systemd (Linux)

-Create service files for automatic startup:
-
 ```ini
-# /etc/systemd/system/pathway-backend.service
+# /etc/systemd/system/pathway-analysis.service
 [Unit]
-Description=Pathway Analysis Backend
+Description=Pathway Analysis Dash App
 After=network.target

 [Service]
 Type=simple
 User=www-data
 WorkingDirectory=/opt/pathway-analysis
-ExecStart=/usr/bin/reflex run --env prod --backend-only
-Restart=always
-RestartSec=10
-
-[Install]
-WantedBy=multi-user.target
-```
-
-```ini
-# /etc/systemd/system/pathway-frontend.service
-[Unit]
-Description=Pathway Analysis Frontend
-After=network.target pathway-backend.service
-
-[Service]
-Type=simple
-User=www-data
-WorkingDirectory=/opt/pathway-analysis
-ExecStart=/usr/bin/reflex run --env prod --frontend-only
+ExecStart=/opt/pathway-analysis/.venv/bin/gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4
 Restart=always
 RestartSec=10

@@ -283,8 +163,8 @@ Enable and start:

 ```bash
 sudo systemctl daemon-reload
-sudo systemctl enable pathway-backend pathway-frontend
-sudo systemctl start pathway-backend pathway-frontend
+sudo systemctl enable pathway-analysis
+sudo systemctl start pathway-analysis
 ```

 ### Windows Service
@@ -296,8 +176,8 @@ Use NSSM (Non-Sucking Service Manager) on Windows:
 choco install nssm

 # Create service
-nssm install PathwayAnalysis "C:\Path\To\reflex.exe" "run --env prod"
-nssm set PathwayAnalysis AppDirectory "C:\Path\To\Patient pathway analysis"
+nssm install PathwayAnalysis "C:\Path\To\python.exe" "run_dash.py"
+nssm set PathwayAnalysis AppDirectory "C:\Path\To\pathway-analysis"
 nssm start PathwayAnalysis
 ```

@@ -305,192 +185,112 @@ nssm start PathwayAnalysis

 ### Production Environment Variables

-Set these environment variables for production:
-
 ```bash
-# Reflex configuration
-export REFLEX_ENV=prod
-
-# Database paths (if using custom locations)
+# Database path (if using custom location)
 export PATHWAY_DB_PATH=/var/data/pathways.db
-export PATHWAY_CACHE_DIR=/var/cache/pathway-analysis

-# Snowflake (if using)
+# Snowflake (for data refresh only — not needed for the web app)
 export SNOWFLAKE_ACCOUNT=your-account
 export SNOWFLAKE_WAREHOUSE=your-warehouse
 ```
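
A minimal sketch of how a deployment script might resolve the database location and open it read-only. It assumes that `PATHWAY_DB_PATH` falls back to `data/pathways.db` when unset (the path used in the troubleshooting commands in this guide). The helper names `resolve_db_path` and `open_read_only` are hypothetical and are not taken from `src/`.

```python
import os
import sqlite3
from pathlib import Path

# Assumed default, matching the path used elsewhere in this guide.
DEFAULT_DB_PATH = Path("data/pathways.db")


def resolve_db_path() -> Path:
    """Return PATHWAY_DB_PATH if set and non-empty, else the assumed default."""
    override = os.environ.get("PATHWAY_DB_PATH")
    return Path(override) if override else DEFAULT_DB_PATH


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file read-only using a file: URI."""
    # mode=ro makes SQLite reject writes and refuse to create a missing file.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Opening with `mode=ro` fails fast with `sqlite3.OperationalError` when the file is missing, which can be easier to diagnose than an empty database created by accident.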

 ### Snowflake Configuration

-Ensure `config/snowflake.toml` is properly configured for production:
+Snowflake is only needed for the data refresh CLI command, not for running the web application. Ensure `src/config/snowflake.toml` is configured:

 ```toml
-[connection]
+[snowflake]
 account = "your-production-account"
 warehouse = "ANALYTICS_WH"
 database = "DATA_HUB"
 schema = "CDM"
-authenticator = "externalbrowser"  # or "oauth" for service accounts
-
-[cache]
-enabled = true
-directory = "/var/cache/pathway-analysis"
-ttl_seconds = 86400  # 24 hours
+authenticator = "externalbrowser"
 ```

-## Reflex Cloud
+## Data Refresh

-For managed hosting, consider [Reflex Cloud](https://reflex.dev/cloud/):
+The web application reads pre-computed data from SQLite. To update the data:

 ```bash
-# Deploy to Reflex Cloud
-reflex deploy
+# Full refresh (both chart types, all date filters)
+python -m cli.refresh_pathways --chart-type all
+
+# The app will serve new data immediately — no restart needed
 ```

-Benefits:
- Zero configuration deployment
- Automatic scaling
- Built-in SSL certificates
- Managed state management with Redis
+Schedule this as a cron job or Windows Task Scheduler task for periodic updates.

 ## Security Considerations

 ### Network Security

-1. **Firewall Rules**: Only expose necessary ports (typically just 80/443)
-2. **HTTPS**: Use TLS certificates (Let's Encrypt or organizational certs)
+1. **Firewall Rules**: Expose only port 8050, or only 80/443 when the app runs behind a reverse proxy
+2. **HTTPS**: Use TLS certificates via reverse proxy (nginx, Caddy)
 3. **VPN**: Consider restricting access to NHS network only

 ### Data Security

-1. **Database Access**: Ensure SQLite database permissions are restricted
-2. **File Uploads**: Validate file types and scan for malware
-3. **Snowflake**: Use least-privilege service accounts
-
-### Authentication
-
-For NHS deployments, consider adding authentication:
-
-```python
-# Example: Add basic auth middleware
-import reflex as rx
-from starlette.middleware import Middleware
-from starlette.middleware.authentication import AuthenticationMiddleware
-
-# In rxconfig.py
-config = rx.Config(
-    app_name="pathways_app",
-    # Add authentication middleware
-)
-```
+1. **Database Access**: The app uses read-only SQLite access
+2. **No file uploads**: The Dash app does not accept file uploads
+3. **No authentication built in**: Add authentication via reverse proxy or middleware if needed
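
One way the "middleware" option above could look is a small WSGI wrapper that enforces HTTP Basic authentication. This is a sketch under stated assumptions, not the project's implementation: the class name `BasicAuthMiddleware` is hypothetical, and the wiring shown in the trailing comment assumes `dash_app.app` exposes its Flask instance as `server`, as the Gunicorn target in this guide suggests.

```python
import base64
import hmac


class BasicAuthMiddleware:
    """Wrap a WSGI app and require a single username/password pair."""

    def __init__(self, wsgi_app, username: str, password: str):
        self.wsgi_app = wsgi_app
        self._expected = f"{username}:{password}".encode("utf-8")

    def __call__(self, environ, start_response):
        if self._authorized(environ.get("HTTP_AUTHORIZATION", "")):
            return self.wsgi_app(environ, start_response)
        start_response(
            "401 Unauthorized",
            [
                ("WWW-Authenticate", 'Basic realm="pathway-analysis"'),
                ("Content-Type", "text/plain; charset=utf-8"),
            ],
        )
        return [b"Authentication required\n"]

    def _authorized(self, header: str) -> bool:
        scheme, _, token = header.partition(" ")
        if scheme.lower() != "basic":
            return False
        try:
            supplied = base64.b64decode(token.strip(), validate=True)
        except ValueError:
            # Covers malformed base64 and non-ASCII header values.
            return False
        # Constant-time comparison avoids leaking how much of the value matched.
        return hmac.compare_digest(supplied, self._expected)


# Hypothetical wiring (assumes `server` is the Flask instance):
#
#   from dash_app.app import server
#   server.wsgi_app = BasicAuthMiddleware(server.wsgi_app, "user", "secret")
```

Basic authentication sends credentials with every request, so it only makes sense behind HTTPS, and the username and password would normally come from the environment or a secret store rather than source code.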

 ## Monitoring

 ### Health Checks

-The application provides endpoints for monitoring:
-
- `/ping` - Basic health check
- Backend port 8000 - FastAPI health
+The application serves at `/` — a 200 response indicates the app is running.
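
A monitoring probe for this check could be sketched with the standard library alone. The function name `is_healthy` and the five-second timeout are illustrative assumptions; the default URL matches the port used throughout this guide.

```python
import http.client
import urllib.request


def is_healthy(url: str = "http://localhost:8050/", timeout: float = 5.0) -> bool:
    """Return True when the app answers the URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response
        # (urllib raises HTTPError, an OSError subclass, for error statuses).
        return False
```

A cron job or container health check could call `is_healthy()` and exit non-zero when it returns `False`.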

 ### Logging

-Configure logging for production:
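+The development server prints request logs to the console. Under Gunicorn, the error log goes to stderr and access logging stays off unless `--access-logfile` is set. Configure log aggregation as needed: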

-```python
-# In pathways_app/pathways_app.py
-import logging
-
-logging.basicConfig(
-    level=logging.INFO,
-    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
-    handlers=[
-        logging.FileHandler('/var/log/pathway-analysis/app.log'),
-        logging.StreamHandler()
-    ]
-)
+```bash
+# Redirect logs to file
+gunicorn dash_app.app:server -b 0.0.0.0:8050 \
+  --access-logfile /var/log/pathway-analysis/access.log \
+  --error-logfile /var/log/pathway-analysis/error.log
 ```

 ## Troubleshooting

-### Common Issues
+### Port already in use

-**Port already in use:**
 ```bash
-# Find and kill process using port 3000
-lsof -i :3000
-kill -9 <PID>
+# Find process using port 8050
+lsof -i :8050   # Linux/macOS
+netstat -ano | findstr :8050   # Windows
 ```

-**Build cache issues:**
-```bash
-# Clear Reflex build cache
-rm -rf .web
-reflex run --env prod
-```
+### Database not found

-**Database connection errors:**
 ```bash
-# Verify database exists and has correct permissions
+# Verify database exists
 ls -la data/pathways.db
 sqlite3 data/pathways.db ".tables"
+
+# Recreate if needed
+python -m data_processing.migrate
+python -m cli.refresh_pathways --chart-type all
 ```

-**Snowflake authentication:**
- Ensure browser is available for SSO popup
- Check firewall allows connections to Snowflake endpoints
- Verify account identifier is correct
-
-## Performance Tuning
-
-### Backend (FastAPI/Uvicorn)
-
-For high-traffic deployments:
+### Import errors

 ```bash
-# Run with multiple workers
-uvicorn pathways_app:app --workers 4 --host 0.0.0.0 --port 8000
-```
+# Ensure src/ is on Python path
+uv run python setup_dev.py

-### State Management
-
-For multi-instance deployments, configure Redis for state management:
-
-```python
-# rxconfig.py
-config = rx.Config(
-    app_name="pathways_app",
-    state_manager_mode="redis",
-    redis_url="redis://localhost:6379/0",
-)
-```
-
-### Caching
-
-Enable aggressive caching for Snowflake queries in `config/snowflake.toml`:
-
-```toml
-[cache]
-enabled = true
-ttl_seconds = 86400  # 24 hours for historical data
-ttl_current_data_seconds = 3600  # 1 hour for recent data
-max_size_mb = 1000  # 1GB cache
+# Verify imports
+uv run python -c "from dash_app.app import app; print('OK')"
 ```

 ---

 ## Quick Reference

-| Environment | Command | Ports |
-|-------------|---------|-------|
-| Development | `reflex run` | 3000, 8000 |
-| Production | `reflex run --env prod` | 3000, 8000 |
-| Backend only | `reflex run --backend-only` | 8000 |
-| Frontend only | `reflex run --frontend-only` | 3000 |
-| Export | `reflex export` | Static files |
-| Cloud | `reflex deploy` | Managed |
+| Environment | Command | Port |
+|-------------|---------|------|
+| Development | `python run_dash.py` | 8050 |
+| Production | `gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4` | 8050 |
+| Docker | `docker run -p 8050:8050 pathway-analysis` | 8050 |

 For more information, see:
- [Reflex Documentation](https://reflex.dev/docs/)
- [Reflex Cloud](https://reflex.dev/cloud/)
- [FastAPI Deployment](https://fastapi.tiangolo.com/deployment/)
+- [Dash Documentation](https://dash.plotly.com/)
+- [Gunicorn Deployment](https://docs.gunicorn.org/en/stable/deploy.html)
@@ -187,8 +187,8 @@ All transitions: 150ms ease-out (faster than before)
 }
 ```

-### Reflex Implementation
- Use `height="calc(100vh - 96px)"` for chart container
- Use `width="100%"` with `padding_x="16px"` for full-width
- Use `flex="1"` to let chart grow
- Keep `min_height="500px"` as fallback
+### Dash Implementation
+- Chart container uses `dcc.Loading` wrapper around `dcc.Graph`
+- Full-width layout via CSS class `.chart-card` in `dash_app/assets/nhs.css`
+- Minimum height set via CSS: `min-height: 500px`
+- Margins controlled in `create_icicle_from_nodes()`: `t:40, l:8, r:8, b:24`
@@ -6,15 +6,11 @@ This guide explains how to use the NHS High-Cost Drug Patient Pathway Analysis T

 1. [Getting Started](#getting-started)
 2. [Interface Overview](#interface-overview)
-3. [Selecting Your Data Source](#selecting-your-data-source)
-4. [Configuring Analysis Filters](#configuring-analysis-filters)
-5. [Selecting Drugs, Trusts, and Directories](#selecting-drugs-trusts-and-directories)
-6. [Running the Analysis](#running-the-analysis)
-7. [Understanding the Pathway Chart](#understanding-the-pathway-chart)
-8. [Exporting Results](#exporting-results)
-9. [GP Indication Validation](#gp-indication-validation)
-10. [Keyboard Navigation and Accessibility](#keyboard-navigation-and-accessibility)
-11. [Troubleshooting](#troubleshooting)
+3. [Filtering Data](#filtering-data)
+4. [Using the Drug Browser](#using-the-drug-browser)
+5. [Understanding the Pathway Chart](#understanding-the-pathway-chart)
+6. [GP Indication Matching](#gp-indication-matching)
+7. [Troubleshooting](#troubleshooting)

 ---

@@ -25,371 +21,229 @@ This guide explains how to use the NHS High-Cost Drug Patient Pathway Analysis T
 Start the application by running:

 ```bash
-reflex run
+python run_dash.py
 ```

-Then open your browser to **http://localhost:3000**
+Then open your browser to **http://localhost:8050**

-The application will automatically load reference data (drugs, trusts, directories) when you first access it.
+The application automatically loads pre-computed pathway data from SQLite on startup. No additional setup is needed to view existing data.

-### First-Time Setup
+### Data Freshness

-1. Click **Load Reference Data** on the Home page to populate the filter options
-2. Select your preferred data source (SQLite, File Upload, or Snowflake)
-3. Configure your date range and other filters
-4. Click **Run Analysis** to generate your first pathway chart
+The header bar shows when data was last refreshed:
+- **Patient count**: Total patients in the dataset (e.g., "11,118 patients")
+- **Last updated**: Relative time since the last data refresh (e.g., "2h ago")
+
+To refresh the data, run the CLI command (requires Snowflake access):
+
+```bash
+python -m cli.refresh_pathways --chart-type all
+```

 ---

 ## Interface Overview

-The application has four main pages, accessible from the sidebar navigation:
+The application is a single-page layout with the following components:

-| Page | Purpose |
-|------|---------|
-| **Home** | Main analysis dashboard with data source selection, filters, and chart display |
-| **Drug Selection** | Select which high-cost drugs to include in the analysis |
-| **Trust Selection** | Filter by specific NHS trusts |
-| **Directory Selection** | Filter by medical directories/specialties |
+### Header
+- NHS branding and application title ("HCD Analysis")
+- Green status dot with patient count and last-updated time

-### Navigation
+### Sidebar (Left)
+Navigation items including:
+- **Pathway Overview** — main view (always active)
+- **Drug Selection** — opens the drug browser drawer
+- **Trust Selection** — opens the drawer with trust chips
+- **Indications** — opens the drawer with directorate browser

- **Desktop**: Use the sidebar on the left to switch between pages
- **Mobile**: Use the top navigation bar
- **Keyboard**: Press Tab to navigate, Enter to select
+### KPI Row
+Four summary cards that update dynamically:
+- **Unique Patients** — number of distinct patients matching current filters
+- **Drug Types** — number of distinct drugs in filtered data
+- **Total Cost** — total cost of treatments in the filtered dataset
+- **Indication Match** — GP diagnosis match rate (~93% for indication charts, shown as "—" for directory charts)
+
+### Filter Bar
+- **Chart type toggle**: "By Directory" / "By Indication" pills
+- **Treatment Initiated**: All years, Last 2 years, or Last 1 year
+- **Last Seen**: Last 6 months or Last 12 months
+
+### Chart Card
+- Dynamic subtitle showing the current hierarchy (e.g., "Trust → Directorate → Drug → Pathway")
+- Interactive Plotly icicle chart
+- Loading spinner during data fetch

 ---

-## Selecting Your Data Source
+## Filtering Data

-The application supports three data sources:
+### Chart Type

-### 1. SQLite Database (Recommended)
+Toggle between two views using the pills in the filter bar:

-Pre-loaded patient data stored locally for fast performance.
+| View | Hierarchy | Best For |
+|------|-----------|----------|
+| **By Directory** | Trust → Directorate → Drug → Pathway | Understanding treatment by medical specialty |
+| **By Indication** | Trust → GP Diagnosis → Drug → Pathway | Understanding treatment by patient condition |

-**Advantages:**
- Fastest analysis performance
- Works offline
- No authentication required
+### Date Filters

-**To use:** Click "Use SQLite" in the Data Source section
+Two dropdowns control the time window:

-### 2. File Upload
+| Filter | Options | Effect |
+|--------|---------|--------|
+| **Treatment Initiated** | All years, Last 2 years, Last 1 year | When patients started treatment |
+| **Last Seen** | Last 6 months, Last 12 months | Most recent activity window |

-Upload CSV or Parquet files directly.
+The default is "All years / Last 6 months" — showing all patients who have been active in the last 6 months.

-**Supported formats:**
- CSV files (.csv)
- Apache Parquet files (.parquet, .pq)
+### Drug and Trust Selection

-**To use:**
-1. Drag and drop a file, or click the upload area
-2. Wait for the file to process
-3. Click "Use File" to select it as your data source
+Open the drawer (right panel) by clicking "Drug Selection" or "Trust Selection" in the sidebar:

-### 3. Snowflake
+- **Drug chips**: Click to select/deselect specific drugs. Selected drugs filter the chart.
+- **Trust chips**: Click to select/deselect specific NHS trusts.
+- **Clear All Filters**: Button at the bottom resets all drug and trust selections.

-Query live data from the NHS data warehouse.
-
-**Requirements:**
- Snowflake must be configured (see `config/snowflake.toml`)
- Browser-based NHS SSO authentication
-
-**To use:** Click "Use Snowflake" - you'll be prompted to authenticate via your browser
+**No selections = show everything.** Leaving chips unselected is the same as selecting all.
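
The "empty means everything" rule can be sketched as a small filter. This is an illustration only: the row keys `drug` and `trust`, the function name `apply_selection`, and the choice to combine the two selections with AND are assumptions, not details taken from `dash_app/`.

```python
from typing import Iterable, List, Mapping


def apply_selection(
    rows: Iterable[Mapping[str, str]],
    selected_drugs: Iterable[str] = (),
    selected_trusts: Iterable[str] = (),
) -> List[Mapping[str, str]]:
    """Keep rows matching the selections; an empty selection matches all."""
    drugs = set(selected_drugs)
    trusts = set(selected_trusts)
    return [
        row
        for row in rows
        # `not drugs` short-circuits the membership test when nothing is selected.
        if (not drugs or row["drug"] in drugs)
        and (not trusts or row["trust"] in trusts)
    ]
```

With both selections empty the function returns every row unchanged, which mirrors leaving all chips unselected.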

 ---

-## Configuring Analysis Filters
+## Using the Drug Browser

-The Home page provides several filter options:
+The drawer contains three sections:

-### Date Range
+### All Drugs
+A flat list of all 42 available drugs as selectable chips. Click one or more to filter the chart to those drugs only.

-| Field | Description |
-|-------|-------------|
-| **Start Date** | Include patients initiated from this date onwards |
-| **End Date** | Include patients initiated until this date |
-| **Last Seen After** | Only include patients with activity after this date (excludes patients who haven't been seen recently) |
+### Trusts
+A list of 7 NHS trusts as selectable chips. Click to filter by specific organizations.

-**Tip:** The default range is the last 12 months.
+### By Directorate
+An accordion browser organized by clinical directorate:

-### Minimum Patients
+1. Click a **directorate** (e.g., "CARDIOLOGY") to expand it
+2. Inside, click an **indication** (e.g., "heart failure") to expand further
+3. Each indication shows **drug fragment badges** (e.g., "SACUBITRIL", "IVABRADINE")
+4. Clicking a drug fragment badge selects all full drug names that contain that fragment

-Filter out pathways with fewer patients than the threshold you set.
+For example, clicking the "ADALIMUMAB" badge would select "ADALIMUMAB" in the drug chips above.

- Use the slider for quick adjustment (0-100)
- Or type a specific number in the text field
- Set to 0 to show all pathways regardless of patient count
+### Fragment Matching

-### Custom Title
+Drug fragments are substrings, not exact matches. The fragment "INHALED" would match drugs like "INHALED BECLOMETASONE" and "INHALED FLUTICASONE".

-Override the automatically generated chart title with your own text.
-
- Leave empty to use the default title: "Patients initiated [start date] to [end date]"
- Useful for specific reports or presentations
-
---
-
-## Selecting Drugs, Trusts, and Directories
-
-Each selection page works the same way:
-
-### Navigation
-
-1. Click "Drug Selection", "Trust Selection", or "Directory Selection" in the sidebar
-2. The page shows all available options with checkboxes
-
-### Search
-
-Type in the search box to filter the list. The list updates as you type.
-
-### Selection Actions
-
-| Button | Action |
-|--------|--------|
-| **Select All** | Check all visible items |
-| **Clear All** | Uncheck all items |
-| **Select Defaults** | (Drugs only) Select pre-configured default drugs (Include=1 in include.csv) |
-
-### Selection Behavior
-
- **No items selected** = Include ALL items in analysis
- **Some items selected** = Include ONLY the selected items
-
-This means leaving a filter empty is equivalent to "select all".
-
---
-
-## Running the Analysis
-
-### Steps
-
-1. Ensure your data source is selected and configured
-2. Set your date range and other filters
-3. Select desired drugs, trusts, and directories (or leave empty for all)
-4. Click the green **Run Analysis** button
-
-### During Analysis
-
- The button shows a spinner while analysis is running
- Status messages appear below the button
- The interface remains responsive - you can review settings
-
-### After Analysis
-
- The pathway chart appears in the chart section
- Export buttons become available
- GP indication validation results appear (if Snowflake is connected)
+Clicking a fragment toggles its matching drugs:
+- **First click**: Selects all matching drugs
+- **Second click**: Deselects all matching drugs (if all were already selected)
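
The toggle behaviour described above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behaviour: matching is case-insensitive, and a click when only some matches are selected adds the missing ones.

```python
from typing import Iterable, List


def matching_drugs(fragment: str, all_drugs: Iterable[str]) -> List[str]:
    """Return the drug names that contain the fragment as a substring."""
    needle = fragment.upper()
    return [drug for drug in all_drugs if needle in drug.upper()]


def toggle_fragment(
    fragment: str, all_drugs: Iterable[str], selected: Iterable[str]
) -> List[str]:
    """Return the new selection after clicking a fragment badge."""
    matches = matching_drugs(fragment, all_drugs)
    current = list(selected)
    if matches and all(drug in current for drug in matches):
        # Every match is already selected, so the click deselects them.
        return [drug for drug in current if drug not in matches]
    # Otherwise add whichever matches are missing, preserving order.
    return current + [drug for drug in matches if drug not in current]
```

For instance, with the drugs "INHALED BECLOMETASONE" and "INHALED FLUTICASONE" available, a first click on "INHALED" adds both to the selection and a second click removes both, leaving any other selected drugs untouched.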

 ---

 ## Understanding the Pathway Chart

-The analysis generates an interactive **icicle chart** showing patient treatment pathways.
-
 ### Hierarchy Structure

-The chart displays a hierarchical structure:
+The icicle chart displays a hierarchical breakdown:

+**Directory view:**
 ```
-N&WICS (Regional Total)
-  └─ Trust Name (e.g., "Norfolk and Norwich University Hospitals")
-      └─ Directory (e.g., "Rheumatology", "Gastroenterology")
-          └─ Drug Name (e.g., "ADALIMUMAB", "INFLIXIMAB")
+Root (Regional Total)
+  └─ Trust (e.g., "Norfolk and Norwich University Hospitals")
+      └─ Directorate (e.g., "RHEUMATOLOGY")
+          └─ Drug (e.g., "ADALIMUMAB")
+              └─ Pathway (e.g., "ADALIMUMAB → INFLIXIMAB")
+```
+
+**Indication view:**
+```
+Root (Regional Total)
+  └─ Trust
+      └─ GP Diagnosis (e.g., "rheumatoid arthritis")
+          └─ Drug
+              └─ Pathway
 ```

 ### Reading the Chart

 - **Width** of each section indicates relative patient count
- **Color intensity** indicates proportion of patients at that level
- **Labels** show the category name and patient count
+- **Color intensity** (NHS blue gradient) indicates each section's share of its parent group
+- **Labels** show the name and patient count

 ### Interacting with the Chart

 | Action | Effect |
 |--------|--------|
 | **Click** a section | Zoom in to show details for that branch |
-| **Click** the root | Zoom out to show full hierarchy |
-| **Hover** over a section | See tooltip with patient count |
-| Use the **toolbar** | Reset, download image, pan, zoom |
+| **Click** the parent/root | Zoom back out |
+| **Hover** over a section | See tooltip with patient count, cost, dosing frequency, dates |

-### Plotly Toolbar
+### Hover Tooltip Information

-The chart includes a Plotly toolbar (top right) with:
-
- **Download as PNG** - Save static image
- **Zoom controls** - Zoom in/out
- **Pan** - Click and drag to move
- **Reset** - Return to original view
+When hovering over a chart section, you'll see:
+- Patient count and percentage of parent
+- Total cost and cost per patient
+- First and last seen dates
+- Treatment dosing frequency (for drug nodes)
+- Cost per patient per annum

 ---

-## Exporting Results
+## GP Indication Matching

-Two export options are available after running an analysis:
+When viewing "By Indication" charts, the application uses pre-computed GP diagnosis matches:

-### Export HTML
+### How It Works

-Creates an interactive HTML file that can be opened in any browser.
+1. During data refresh, each patient's NHS pseudonym is queried against GP primary care records
+2. SNOMED cluster codes map clinical conditions to drug indications
+3. The most recent GP diagnosis match is used for each patient
+4. ~93% of patients are matched to a GP diagnosis

- **Output**: `data/exports/pathway_chart_[timestamp].html`
- **Use case**: Sharing interactive charts via email or file share
- **Features**: Full interactivity, no software required to view
+### Unmatched Patients

-### Export CSV
+Patients without a GP diagnosis match appear under their directorate with a "(no GP dx)" suffix (e.g., "RHEUMATOLOGY (no GP dx)").

-Exports the underlying data as a spreadsheet.
-
- **Output**: `data/exports/pathway_data_[timestamp].csv`
- **Use case**: Further analysis in Excel, importing to other tools
- **Includes**: Patient IDs, drugs, dates, costs, directories, indication validation status
-
-### Export Location
-
-All exports are saved to the `data/exports/` directory with timestamped filenames to prevent overwriting.
-
---
-
-## GP Indication Validation
-
-When connected to Snowflake, the application validates whether patients have appropriate GP diagnoses for their prescribed drugs.
-
-### What It Does
-
-1. Looks up the drug's licensed indications (e.g., ADALIMUMAB for rheumatoid arthritis)
-2. Finds corresponding SNOMED codes for those indications
-3. Checks each patient's GP records for matching diagnoses
-4. Reports the match rate per drug
-
-### Understanding Results
-
-After analysis, a table shows:
-
-| Column | Meaning |
-|--------|---------|
-| **Drug Name** | The high-cost drug |
-| **Total Patients** | Number of patients prescribed this drug |
-| **With GP Indication** | Patients with matching GP diagnosis |
-| **Match Rate** | Percentage with valid indication |
-
-### Match Rate Interpretation
-
-| Rate | Meaning | Color |
-|------|---------|-------|
-| **80%+** | Good coverage - most patients have GP diagnoses | Green |
-| **50-79%** | Moderate coverage - investigate missing cases | Orange |
-| **<50%** | Low coverage - may indicate data quality issues or off-label use | Red |
-
-### Why Rates May Be Low
-
-Low match rates don't necessarily indicate problems:
-
- **Cross-provider treatment**: Patient's GP is outside the data coverage
- **Recent diagnoses**: Diagnosis not yet recorded in GP system
- **Specialist-only conditions**: Some conditions are only managed in secondary care
- **Off-label prescribing**: Legitimate use for indications not in the mapping
-
-### Enabling/Disabling
-
-Indication validation is enabled by default when Snowflake is connected. It requires:
- Active Snowflake connection
- Drug-to-cluster mappings in the database
-
---
-
-## Keyboard Navigation and Accessibility
-
-The application is designed to be accessible:
-
-### Skip Link
-
-Press **Tab** when the page loads to reveal a "Skip to main content" link that bypasses navigation.
-
-### Keyboard Navigation
-
-| Key | Action |
-|-----|--------|
-| **Tab** | Move to next interactive element |
-| **Shift+Tab** | Move to previous element |
-| **Enter** | Activate buttons, links, checkboxes |
-| **Space** | Toggle checkboxes |
-| **Arrow keys** | Adjust sliders |
-
-### Screen Reader Support
-
- All buttons and inputs have descriptive labels
- Status messages announce via ARIA live regions
- Charts include figure descriptions
-
-### Theme Toggle
-
-A dark/light mode toggle is available at the bottom of the sidebar for visual preference.
+Reasons for unmatched patients:
+- GP is outside the data coverage area
+- Diagnosis not yet recorded in GP system
+- Condition managed only in secondary care
+- Off-label prescribing
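
The labelling rule from "How It Works" and "Unmatched Patients" can be sketched in a few lines. This is a simplified illustration, not the pipeline's code: the function name `indication_label` and the `(date, diagnosis)` tuple shape are assumptions.

```python
from datetime import date
from typing import Iterable, Tuple


def indication_label(
    gp_matches: Iterable[Tuple[date, str]], directorate: str
) -> str:
    """Pick the most recent GP diagnosis, else fall back to the directorate."""
    matches = list(gp_matches)
    if not matches:
        # Unmatched patients stay under their directorate with a suffix.
        return f"{directorate} (no GP dx)"
    # max() on the date keeps the most recently recorded diagnosis.
    _, diagnosis = max(matches, key=lambda match: match[0])
    return diagnosis
```

A patient with matches dated 2023 and 2024 would be labelled with the 2024 diagnosis, while a patient with no matches in RHEUMATOLOGY would appear as "RHEUMATOLOGY (no GP dx)".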

 ---

 ## Troubleshooting

-### "No data available" Error
+### No data showing

-**Cause**: No data matches your current filter settings
+1. Check the filter bar — are filters too restrictive?
+2. Try clearing all drug/trust selections in the drawer
+3. Widen the date range (e.g., "All years / Last 12 months")

-**Solutions:**
-1. Check your date range - is it too narrow?
-2. Verify your data source has data loaded
-3. Check if selected trusts/drugs have any matching records
-4. Try clearing all selections (to include everything)
+### Chart shows "No matching pathways found"

-### Chart Not Displaying
+The current filter combination matches zero patients. Adjust filters or click "Clear All Filters" in the drawer.

-**Cause**: Analysis completed but no data met the minimum patients threshold
+### App won't start

-**Solutions:**
-1. Lower the minimum patients threshold
-2. Expand your date range
-3. Select more drugs or trusts
+```bash
+# Ensure dependencies are installed
+uv sync

-### Snowflake Connection Failed
+# Ensure src/ is on Python path
+uv run python setup_dev.py

-**Cause**: Unable to connect to Snowflake
+# Run with uv
+uv run python run_dash.py
+```

-**Solutions:**
-1. Check that `config/snowflake.toml` exists and is configured
-2. Complete browser authentication when prompted
-3. Verify your network allows Snowflake connections
-4. Try using SQLite as an alternative data source
+### Stale data

-### File Upload Failed
+Data is as fresh as the last CLI refresh. Check the header's "Last updated" indicator. To refresh:

-**Cause**: File format or content issue
-
-**Solutions:**
-1. Ensure file is CSV or Parquet format
-2. Check file isn't corrupted or empty
-3. Verify file contains required columns
-4. Try a smaller file to test
-
-### Slow Performance
-
-**Cause**: Large data volume or complex filtering
-
-**Solutions:**
-1. Use SQLite instead of file upload for large datasets
-2. Narrow your date range
-3. Select fewer drugs/trusts to analyze
-4. Increase minimum patients threshold to reduce chart complexity
-
-### Reference Data Not Loading
-
-**Cause**: Missing or corrupted reference files
-
-**Solutions:**
-1. Click "Load Reference Data" to retry
-2. Check that `data/` directory contains required CSV files:
-   - `include.csv`
-   - `defaultTrusts.csv`
-   - `directory_list.csv`
-3. Verify files aren't empty or malformed
+```bash
+python -m cli.refresh_pathways --chart-type all
+```

 ---

@@ -397,7 +251,7 @@ A dark/light mode toggle is available at the bottom of the sidebar for visual pr

 If you encounter issues not covered in this guide:

-1. Check the [README](../README.md) for installation and setup information
+1. Check the [README](../README.md) for installation and setup
 2. Review [DEPLOYMENT.md](./DEPLOYMENT.md) for server configuration
 3. Consult [CLAUDE.md](../CLAUDE.md) for technical architecture details
-4. Contact your local support team for NHS-specific questions
+4. Contact the Medicines Intelligence team for NHS-specific questions
@@ -5,129 +5,145 @@ If you discover a new failure pattern during your work, add it to this file.

 ---

-## Drug-Indication Matching Guardrails
+## Backend Isolation

-### Match drugs to indications, not just patients to indications
- **When**: Building the indication mapping for pathway charts
- **Rule**: Each drug must be validated against BOTH the patient's GP diagnoses AND the drug-to-indication mapping from DimSearchTerm.csv. A patient being diagnosed with rheumatoid arthritis does NOT mean all their drugs are for rheumatoid arthritis.
- **Why**: The previous approach assigned ONE indication per patient (most recent GP dx), ignoring which drugs actually treat which conditions. This produced misleading pathways.
+### Do NOT modify pipeline/analysis logic in src/
+- **When**: Building Dash integration
+- **Rule**: Do NOT change the logic in these files — they are the data pipeline and must stay as-is:
+  - `data_processing/pathway_pipeline.py`, `transforms.py`, `diagnosis_lookup.py` (matching/query logic)
+  - `analysis/pathway_analyzer.py`, `statistics.py`
+  - `cli/refresh_pathways.py`
+  - `data_processing/schema.py`, `reference_data.py`, `cache.py`, `data_source.py`
+- **Why**: The pipeline is complete and tested. Changing it risks breaking the data refresh workflow.

-### Use DimSearchTerm.csv for drug-to-Search_Term mapping
- **When**: Determining which Search_Term a drug belongs to
- **Rule**: Load `data/DimSearchTerm.csv`. The `CleanedDrugName` column has pipe-separated drug name fragments. Match HCD drug names against these fragments using substring matching (case-insensitive).
- **Why**: This CSV is the authoritative mapping of which drugs are used for which clinical indications.
+### DO use shared utilities in src/ rather than duplicating
+- **When**: The Dash app needs data loading or figure construction
+- **Rule**: Dash callbacks should CALL INTO `src/`, not duplicate the code. Shared functions:
+  - `data_processing/pathway_queries.py` — `load_initial_data()` and `load_pathway_nodes()` for all SQLite queries
+  - `visualization/plotly_generator.py` — `create_icicle_from_nodes()` for icicle chart from list-of-dicts
+  - `dash_app/data/queries.py` — thin wrapper that resolves DB path and delegates to shared functions
+- **Why**: Duplicating SQL queries and figure logic creates copies that drift apart. Shared code in `src/` is the cleaner architecture.

-### Use substring matching for drug fragments
- **When**: Matching HCD drug names against DimSearchTerm CleanedDrugName fragments
- **Rule**: Check if any fragment from DimSearchTerm is a SUBSTRING of the HCD drug name (case-insensitive). E.g., "PEGYLATED" should match "PEGYLATED LIPOSOMAL DOXORUBICIN".
- **Why**: DimSearchTerm contains both full drug names (ADALIMUMAB) and partial fragments (PEGYLATED, INHALED). Exact match would miss the partial ones.
-
-### Modified UPID uses pipe delimiter
- **When**: Creating indication-aware UPIDs
- **Rule**: Format is `{original_UPID}|{search_term}`. Use pipe `|` as delimiter. Do NOT use ` - ` (hyphen with spaces) as that's used for pathway hierarchy levels in the `ids` column.
- **Why**: The `ids` column uses " - " to separate hierarchy levels (e.g., "N&WICS - NNUH - rheumatoid arthritis - ADALIMUMAB"). Using the same delimiter in UPIDs would break hierarchy parsing.
-
-### Return ALL GP matches per patient, not just most recent
- **When**: Querying Snowflake for patient GP diagnoses
- **Rule**: Remove `QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY EventDateTime DESC) = 1`. Return ALL matching Search_Terms per patient with `GROUP BY + COUNT(*)` for code_frequency.
- **Why**: A patient may have GP diagnoses for both rheumatoid arthritis AND asthma. We need ALL matches to cross-reference with their drugs.
-
-### Restrict GP code lookup to HCD data window
- **When**: Building the WHERE clause for the GP record query
- **Rule**: Add `AND pc."EventDateTime" >= :earliest_hcd_date` where `earliest_hcd_date` is `MIN(Intervention Date)` from the HCD DataFrame. Pass this as a parameter to `get_patient_indication_groups()`.
- **Why**: Old GP codes from years before treatment started add noise. A diagnosis coded 10 years ago may no longer be relevant. Restricting to the HCD window ensures code_frequency reflects recent clinical activity for the conditions being actively treated.
-
-### Tiebreaker: highest GP code frequency when a drug matches multiple indications
- **When**: A single drug maps to multiple Search_Terms AND the patient has GP dx for multiple
- **Rule**: Use `code_frequency` (COUNT of matching SNOMED codes per Search_Term per patient) from the GP query. The Search_Term with the most matching codes in the patient's GP record wins. If tied, use alphabetical Search_Term for determinism.
- **Why**: E.g., ADALIMUMAB is listed under rheumatoid arthritis, crohn's disease, psoriatic arthritis, etc. A patient with 47 RA codes and 2 crohn's codes is almost certainly on ADALIMUMAB for RA. Frequency of GP coding is a much stronger signal of clinical intent than recency — a recent one-off asthma check doesn't mean ADALIMUMAB is for asthma.
-
-### Same patient, different indications = separate modified UPIDs
- **When**: A patient's drugs map to different Search_Terms
- **Rule**: Create separate modified UPIDs for each indication. E.g., `RMV12345|rheumatoid arthritis` and `RMV12345|asthma`. These are treated as separate "patients" by the pathway analyzer.
- **Why**: This is the core design — drugs for different indications should create separate treatment pathways, even for the same physical patient.
-
-### Fallback to directory for unmatched drugs
- **When**: A drug doesn't match any Search_Term OR the patient has no GP dx for any of the drug's Search_Terms
- **Rule**: Use fallback format: `{UPID}|{Directory} (no GP dx)`. The indication_df maps this to `"{Directory} (no GP dx)"`.
- **Why**: Maintains consistent behavior with the previous approach for patients/drugs without GP diagnosis matches.
-
-### Merge asthma Search_Terms but keep urticaria separate
- **When**: Working with asthma-related Search_Terms from CLUSTER_MAPPING_SQL or DimSearchTerm.csv
- **Rule**: Merge "allergic asthma", "asthma", and "severe persistent allergic asthma" into a single "asthma" Search_Term. Keep "urticaria" as a separate Search_Term — do NOT merge it with asthma.
- **Why**: These are clinically the same condition at different severity levels. Splitting them fragments the data. Urticaria is a distinct dermatological condition that happens to share OMALIZUMAB.
-
-### Don't modify directory chart processing
- **When**: Making changes to the indication matching logic
- **Rule**: Only modify the indication chart path (`elif current_chart_type == "indication":`). Directory charts use unmodified UPIDs and directory-based grouping.
- **Why**: Directory charts work correctly and should not be affected by indication matching changes.
+### Do NOT modify pathways.db schema or data
+- **When**: Querying the database from Dash callbacks
+- **Rule**: Read-only access. Use `sqlite3.connect(db_path)` with SELECT queries only. Never INSERT, UPDATE, DELETE, or ALTER.
+- **Why**: pathways.db is populated by `python -m cli.refresh_pathways`. The Dash app is a read-only consumer.

 ---

-## Snowflake Query Guardrails
+## CSS & Design Fidelity

-### Use PseudoNHSNoLinked for GP record matching
- **When**: Querying GP records (PrimaryCareClinicalCoding) for patient diagnoses
- **Rule**: Use `PseudoNHSNoLinked` column from HCD data, NOT `PersonKey` (LocalPatientID)
- **Why**: PersonKey is provider-specific local ID. Only PseudoNHSNoLinked matches PatientPseudonym in GP records.
+### Use className matching 01_nhs_classic.html, not inline styles
+- **When**: Building any Dash HTML component
+- **Rule**: Use `className="css-class-name"` referencing classes from `dash_app/assets/nhs.css`. Do NOT use inline `style={}` dicts for layout/visual styling. Only use inline styles for truly dynamic values (e.g., `style={"flex": patient_count}` for proportional widths).
+- **Why**: CSS fidelity to the HTML concept is a primary goal. Inline styles drift from the design and are harder to maintain.

-### Embed cluster query as CTE in Snowflake
- **When**: Looking up patient indications during data refresh
- **Rule**: Use the `CLUSTER_MAPPING_SQL` content as a WITH clause in the patient lookup query
- **Why**: This ensures we always use the complete cluster mapping and don't need local storage
+### nhs.css is the single source of CSS truth
+- **When**: Adding or modifying styles
+- **Rule**: All styles go in `dash_app/assets/nhs.css`. If the concept HTML doesn't have a class for something, add it to nhs.css with the same naming convention (`.component__element--modifier`).
+- **Why**: Dash auto-serves files from `assets/`. Keeping CSS in one file matches the design source (01_nhs_classic.html) and avoids style fragmentation.

-### Quote mixed-case column aliases in Snowflake SQL
- **When**: Writing SELECT queries that return results to Python code
- **Rule**: Use `AS "ColumnName"` (quoted) for any column alias you'll access by name in Python
- **Why**: Snowflake uppercases unquoted identifiers. `SELECT foo AS Search_Term` returns `SEARCH_TERM`, so `row.get('Search_Term')` returns None. Fix: `SELECT foo AS "Search_Term"`
-
-### Build indication_df from all unique UPIDs, not PseudoNHSNoLinked
- **When**: Creating the indication mapping DataFrame for pathway processing
- **Rule**: Use `df.drop_duplicates(subset=['UPID'])` not `drop_duplicates(subset=['PseudoNHSNoLinked'])`
- **Why**: A patient visiting multiple providers has multiple UPIDs. Using unique PseudoNHSNoLinked only maps one UPID per patient, leaving others as NaN.
+### Read 01_nhs_classic.html when building UI components
+- **When**: Creating any component in `dash_app/components/`
+- **Rule**: Read `01_nhs_classic.html` first to see the exact HTML structure, CSS classes, and element hierarchy for that component. Match it as closely as possible.
+- **Why**: The HTML concept IS the design spec. Deviating creates visual inconsistency.

 ---

-## Data Processing Guardrails
+## Callback Architecture

-### Copy DataFrames in functions that modify columns
- **When**: Writing functions like `prepare_data()` that modify DataFrame columns
- **Rule**: Always `df = df.copy()` at the start of any function that modifies column values on the input DataFrame
- **Why**: `prepare_data()` mapped Provider Code → Name in-place. When called multiple times on the same DataFrame, only the first call worked. The fix: `df.copy()` prevents destructive mutation.
+### No circular callback dependencies
+- **When**: Writing Dash callbacks
+- **Rule**: Callbacks must flow unidirectionally: filter inputs → `app-state` store → `chart-data` store → UI components. Never have a component that is both Input and Output in the same callback chain without an intermediate store.
+- **Why**: Dash reports multi-callback cycles as a circular-dependency error (`DuplicateCallback` is the separate error for two callbacks sharing an output), and both are extremely hard to debug.

-### Include chart_type in UNIQUE constraints for pathway_nodes
- **When**: Creating or modifying the pathway_nodes table schema
- **Rule**: The UNIQUE constraint MUST include `chart_type`: `UNIQUE(date_filter_id, chart_type, ids)`
- **Why**: Without `chart_type`, `INSERT OR REPLACE` silently overwrites directory chart nodes when indication chart nodes are inserted.
+### Use dcc.Store for all state, not server-side globals
+- **When**: Managing application state (selected filters, chart data, reference data)
+- **Rule**: ALL state lives in `dcc.Store` components. Never use module-level globals, class variables, or `flask.g` for state. The 3 stores are: `app-state` (session), `chart-data` (memory), `reference-data` (session).
+- **Why**: Dash is stateless per request. Server-side state breaks with multiple users and causes subtle bugs during development.

-### Handle NaN in Directory when building fallback labels
- **When**: Creating fallback indication labels for patients without GP diagnosis match
- **Rule**: Check `pd.notna(directory)` before concatenating to string. Use `"UNKNOWN (no GP dx)"` for NaN cases.
- **Why**: NaN handling prevents TypeError and ensures meaningful fallback labels.
+### Use callback_context for multi-input callbacks
+- **When**: A callback has multiple Inputs and needs to know which one triggered it
+- **Rule**: Use `dash.callback_context.triggered` (or `ctx.triggered_id` in Dash 2.4+) to determine the triggering input.
+- **Why**: Without this, the callback runs for every input change and you can't distinguish which filter changed.

-### Use parameterized queries for SQLite
- **When**: Building WHERE clauses with user-selected filters
- **Rule**: Use `?` placeholders and pass params tuple — never string interpolation
- **Why**: Prevents SQL injection and handles special characters in drug/directory names
-
-### Use existing pathway_analyzer functions
- **When**: Processing pathway data for the icicle chart
- **Rule**: Reuse functions from `analysis/pathway_analyzer.py` — don't reinvent
- **Why**: The existing code handles edge cases (empty groups, statistics calculation, color mapping)
+### Pattern-matching callbacks for dynamic drug chips
+- **When**: Building the card browser drawer with clickable drug chips
+- **Rule**: Use `{"type": "drug-chip", "index": drug_name}` pattern for chip IDs. Register callbacks with `Input({"type": "drug-chip", "index": ALL}, "n_clicks")`. Access triggered chip via `ctx.triggered_id["index"]`.
+- **Why**: The number of drug chips is dynamic (changes per directorate/indication). Pattern-matching callbacks handle this without hardcoding IDs.

 ---

-## Reflex Guardrails
+## Plotly Figure

-### Use .to() methods for Var operations in rx.foreach
- **When**: Working with items inside `rx.foreach` render functions
- **Rule**: Use `item.to(int)` for numeric comparisons, `item.to_string()` for text operations
- **Why**: Items from rx.foreach are Var objects, not plain Python values.
+### Preserve create_icicle_from_nodes() in src/visualization/plotly_generator.py
+- **When**: Modifying the icicle chart
+- **Rule**: `create_icicle_from_nodes(nodes, title)` in `src/visualization/plotly_generator.py` is the shared icicle chart function. It accepts list-of-dicts from dcc.Store. Key properties:
+  - 10-field customdata structure (value, colour, cost, costpp, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa)
+  - NHS colorscale: `[[0.0, "#003087"], [0.25, "#0066CC"], [0.5, "#1E88E5"], [0.75, "#4FC3F7"], [1.0, "#E3F2FD"]]`
+  - `maxdepth=3`, `branchvalues="total"`, `sort=False`
+  - Layout: transparent background, reduced margins, autosize
+- **Why**: The icicle chart is tested and correct. The Dash callback in `dash_app/callbacks/chart.py` calls this function.

-### Use rx.cond for conditional rendering, not Python if
- **When**: Conditionally showing/hiding components or changing styles based on state
- **Rule**: Use `rx.cond(condition, true_component, false_component)` — not Python `if`
- **Why**: Python `if` evaluates at definition time; `rx.cond` evaluates reactively at render time
+### Chart data is a list of dicts
+- **When**: Passing data between `chart-data` store and chart callback
+- **Rule**: `chart-data` store holds `{"nodes": [...], "unique_patients": int, "total_drugs": int, "total_cost": float}`. Each node is a dict with keys matching the SQLite columns needed for the figure: `parents, ids, labels, value, cost, costpp, colour, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa`.
+- **Why**: `dcc.Store` serializes to JSON. Keep the same dict structure that `pathways_app.py` uses for `chart_data` so the figure callback works identically.
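
Because the store round-trips through JSON, a quick shape check can catch a missing key or a non-serializable value before it reaches the figure callback. The sketch below is illustrative only: `chart_data_problems` and the example values are hypothetical, and the key names are the ones listed in the rule above.

```python
import json

# Keys taken from the rule above.
TOP_LEVEL_KEYS = {"nodes", "unique_patients", "total_drugs", "total_cost"}
NODE_KEYS = {
    "parents", "ids", "labels", "value", "cost", "costpp", "colour",
    "first_seen", "last_seen", "first_seen_parent", "last_seen_parent",
    "average_spacing", "cost_pp_pa",
}


def chart_data_problems(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = [f"missing top-level key: {k}" for k in sorted(TOP_LEVEL_KEYS - set(payload))]
    for i, node in enumerate(payload.get("nodes", [])):
        for key in sorted(NODE_KEYS - set(node)):
            problems.append(f"node {i} missing key: {key}")
    try:
        json.dumps(payload)  # dcc.Store needs JSON-serializable values
    except TypeError as exc:
        problems.append(f"not JSON-serializable: {exc}")
    return problems


# Made-up example values; dates are kept as strings so they survive JSON.
example = {
    "nodes": [{
        "parents": "N&WICS", "ids": "N&WICS - NNUH", "labels": "NNUH",
        "value": 12, "cost": 3400.0, "costpp": 283.33, "colour": 0.4,
        "first_seen": "2023-01-05", "last_seen": "2024-03-01",
        "first_seen_parent": "2022-11-20", "last_seen_parent": "2024-03-01",
        "average_spacing": "28 days", "cost_pp_pa": 1500.0,
    }],
    "unique_patients": 12, "total_drugs": 1, "total_cost": 3400.0,
}
round_tripped = json.loads(json.dumps(example))
```

Keeping dates and intervals as strings is an assumption made for the example; the point is only that whatever goes into the store must come back unchanged from `json.loads(json.dumps(...))`.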
+
+---
+
+## Data Extraction
+
+### Keep data logic in shared src/ functions, not dash_app/ duplicates
+- **When**: Adding or modifying data loading functions
+- **Rule**: SQL queries and data logic live in `src/data_processing/pathway_queries.py`. The `dash_app/data/queries.py` is a thin wrapper that resolves the DB path and delegates. Do not duplicate queries in `dash_app/`.
+- **Why**: Shared code in `src/` prevents query drift and keeps the single source of truth for data access.
+
+### DimSearchTerm.csv fragments are substrings
+- **When**: Building the card browser or matching drugs to indications
+- **Rule**: `CleanedDrugName` values in DimSearchTerm.csv are drug name FRAGMENTS (e.g., "ADALIMUMAB", "PEGYLATED", "INHALED"). They're matched against full drug names with a case-insensitive substring test, e.g. `fragment.upper() in drug_name.upper()` (Python strings have no `.contains()` method). Don't assume exact match.
+- **Why**: Some fragments are partial (INHALED matches "INHALED BECLOMETASONE", "INHALED FLUTICASONE", etc.).
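
The substring rule can be sketched as follows. The function name `matching_fragments` is hypothetical, and the pipe-separated cell is an assumed example of how a `CleanedDrugName` value might look.

```python
def matching_fragments(drug_name: str, fragments: list[str]) -> list[str]:
    """Return the fragments that occur inside drug_name, ignoring case."""
    name = drug_name.upper()
    matches = []
    for fragment in fragments:
        cleaned = fragment.strip().upper()
        # Skip empty fragments: "" is a substring of every string.
        if cleaned and cleaned in name:
            matches.append(cleaned)
    return matches


# Assumed example of a pipe-separated CleanedDrugName cell.
cell = "ADALIMUMAB|PEGYLATED|INHALED"
fragments = cell.split("|")

partial = matching_fragments("Pegylated Liposomal Doxorubicin", fragments)
full = matching_fragments("ADALIMUMAB", fragments)
none = matching_fragments("RITUXIMAB", fragments)
```

Skipping empty fragments matters if a cell has a trailing pipe, since an empty string would otherwise match every drug.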
+
+### Apply SEARCH_TERM_MERGE_MAP when loading DimSearchTerm.csv
+- **When**: Building the directorate tree in `card_browser.py`
+- **Rule**: Import and apply `SEARCH_TERM_MERGE_MAP` from `data_processing.diagnosis_lookup` to normalize "allergic asthma" → "asthma" and "severe persistent allergic asthma" → "asthma". Keep "urticaria" separate.
+- **Why**: The Snowflake query and pathway processing already use merged Search_Terms. The card browser must match.
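
A sketch of what applying the merge looks like. `MERGE_MAP_SKETCH` is a stand-in built from the two mappings named in the rule; the real `SEARCH_TERM_MERGE_MAP` lives in `data_processing.diagnosis_lookup` and may contain more entries or differ in casing.

```python
# Stand-in for SEARCH_TERM_MERGE_MAP (assumption: exact, lower-case keys).
MERGE_MAP_SKETCH = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def normalize_search_term(term: str) -> str:
    """Map merged Search_Terms to their canonical name; leave others unchanged."""
    return MERGE_MAP_SKETCH.get(term, term)


raw_terms = ["allergic asthma", "asthma", "severe persistent allergic asthma", "urticaria"]
# dict.fromkeys de-duplicates while keeping first-seen order.
merged_terms = list(dict.fromkeys(normalize_search_term(t) for t in raw_terms))
```

Terms that are absent from the map, such as "urticaria", pass through untouched, which is how it stays separate.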
+
+---
+
+## SQLite Queries
+
+### Use parameterized queries for all filters
+- **When**: Building WHERE clauses with user-selected values
+- **Rule**: Use `?` placeholders and pass params as a list. Never use f-strings or string interpolation for filter values.
+- **Why**: Prevents SQL injection and handles special characters in drug/directory names (e.g., "CROHN'S DISEASE").
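
A minimal illustration of the placeholder pattern, using an in-memory table. The `demo_nodes` table, its columns, and `build_filter_query` are hypothetical and do not reflect the real `pathway_nodes` schema.

```python
import sqlite3


def build_filter_query(drugs: list[str], directories: list[str]) -> tuple[str, list[str]]:
    """Build a SELECT with one ? per filter value; values never enter the SQL text."""
    clauses: list[str] = []
    params: list[str] = []
    # Column names are fixed in code; only the placeholder count varies.
    for column, values in (("drug", drugs), ("directory", directories)):
        if values:
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
    sql = "SELECT drug, directory, value FROM demo_nodes"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_nodes (drug TEXT, directory TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO demo_nodes VALUES (?, ?, ?)",
    [
        ("ADALIMUMAB", "CROHN'S DISEASE", 5),
        ("ADALIMUMAB", "RHEUMATOLOGY", 7),
        ("OMALIZUMAB", "RESPIRATORY", 3),
    ],
)

sql, params = build_filter_query(["ADALIMUMAB"], ["CROHN'S DISEASE"])
rows = conn.execute(sql, params).fetchall()
conn.close()
```

The f-string here only assembles the `?` placeholders and hard-coded column names; the apostrophe in "CROHN'S DISEASE" travels in `params` and needs no escaping.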
+
+### Database path resolution
+- **When**: Connecting to pathways.db from dash_app/
+- **Rule**: Use `Path(__file__).resolve().parents[2] / "data" / "pathways.db"` from files in `dash_app/data/`. This resolves from `dash_app/data/queries.py` → project root → `data/pathways.db`.
+- **Why**: Relative paths break depending on the working directory. Absolute path resolution is reliable.
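
To see why `parents[2]` lands on the project root, the sketch below applies the same expression to an example file location. The `/srv/project` prefix and the helper `resolve_db_path` are hypothetical.

```python
from pathlib import Path


def resolve_db_path(module_file: str) -> Path:
    """Resolve data/pathways.db relative to a file inside dash_app/data/."""
    # parents[0] = dash_app/data, parents[1] = dash_app, parents[2] = project root
    return Path(module_file).resolve().parents[2] / "data" / "pathways.db"


# Inside queries.py this would be called with __file__.
example = resolve_db_path("/srv/project/dash_app/data/queries.py")
```

A file one level deeper or shallower than `dash_app/data/` would need a different index, so the expression is tied to where the calling module lives.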
+
+---
+
+## Dash Framework
+
+### Wrap layout in dmc.MantineProvider
+- **When**: Setting up the app layout in `app.py`
+- **Rule**: The outermost layout element must be `dmc.MantineProvider(children=[...])`. Without this, DMC components (Drawer, Accordion, Chip, etc.) won't render.
+- **Why**: Dash Mantine Components requires the Provider context to function.
+
+### dcc.Store storage_type matters
+- **When**: Creating the 3 store components
+- **Rule**:
+  - `app-state`: `storage_type="session"` — persists across page refreshes within a tab
+  - `chart-data`: `storage_type="memory"` — cleared on page refresh (reloaded from SQLite)
+  - `reference-data`: `storage_type="session"` — loaded once, persists across refreshes
+- **Why**: Wrong storage type causes stale data bugs (session persists when it shouldn't) or wasted queries (memory clears too often).
+
+### Dash assets directory is auto-served
+- **When**: Placing CSS, JS, or images
+- **Rule**: Put static assets in `dash_app/assets/`. Dash serves them automatically. Reference CSS via `className`, not `<link>` tags.
+- **Why**: Dash's asset pipeline handles caching and serving. Manual `<link>` tags are unnecessary and may not work.

 ---

@@ -148,20 +164,10 @@ If you discover a new failure pattern during your work, add it to this file.
 - **Rule**: The "Next iteration should" section must contain specific, actionable guidance
 - **Why**: The next iteration has zero memory. If you don't write it down, it's lost.

-### Check existing code for patterns
- **When**: Unsure how to implement something
- **Rule**: Look at `pathways_app/pathways_app.py`, `analysis/pathway_analyzer.py`, `cli/refresh_pathways.py`
- **Why**: The existing codebase has solved many quirks already
-
-### Snowflake connection_timeout must be high enough for GP lookup queries
- **When**: GP record queries against PrimaryCareClinicalCoding time out
- **Rule**: Ensure `connection_timeout` in config/snowflake.toml is at least 600 (currently set to 600). This controls the Python client's `network_timeout`, which is how long the client waits for ANY Snowflake response. Do NOT lower this value.
- **Why**: GP lookup queries take ~40s per batch due to CTE compilation overhead. With connection_timeout=30, every batch timed out silently (error 000604/57014).
-
-### Use large batch sizes (5000+) for GP record lookups
- **When**: Calling `get_patient_indication_groups()` with patient batches
- **Rule**: Use batch_size=5000 or larger. The query time is ~40s regardless of batch size (5 patients ≈ 500 patients ≈ 5000 patients). Smaller batches just multiply the fixed overhead.
- **Why**: With batch_size=500, 36K patients needed 74 batches × 40s = ~50 min. With batch_size=5000, only 8 batches × 45s = ~6 min. The bottleneck is CTE compilation, not data volume.
+### Validate with `python run_dash.py`
+- **When**: After completing any task
+- **Rule**: Run `python run_dash.py` (or `python -c "from dash_app.app import app"` for import checks). The app must start without errors after EVERY task.
+- **Why**: Broken imports or circular dependencies compound across tasks. Catch them immediately.

 <!--
 ADD NEW GUARDRAILS BELOW as failures are observed during the loop.
@@ -885,11 +885,46 @@ Migrating the HCD Analysis frontend from Reflex to Dash (Plotly) + Dash Mantine
 ### Blocked items (iter 16):
 - None

-## MIGRATION COMPLETE
-All 20 tasks across 5 phases complete. 16 iterations total.
+## Iteration 17 — 2026-02-06
+### Task: Phase 6 — Update all documentation
+### Why this task:
+- Phases 0-5 complete. Phase 6 (documentation cleanup) was the only remaining work.
+- All non-archive .md files still referenced Reflex (commands, architecture, deployment).
+### Status: COMPLETE
+### What was done:
+- **README.md**: Complete rewrite — replaced `reflex run` with `python run_dash.py`, updated features list (dual charts, pre-computed pathways, drug browser), updated project structure (dash_app/ instead of pathways_app/), updated quick start, usage, troubleshooting
+- **docs/USER_GUIDE.md**: Complete rewrite — new single-page Dash UI guide covering header, sidebar, KPI row, filter bar, chart card, drawer, drug browser, indication matching
+- **docs/DEPLOYMENT.md**: Complete rewrite — Dash/Gunicorn deployment replacing Reflex/FastAPI/Next.js, Docker examples, nginx config, systemd service, single port 8050
+- **docs/DESIGN_SYSTEM.md**: Updated "Reflex Implementation" section → "Dash Implementation" with correct CSS/component references
+- **RALPH_PROMPT.md**: Updated title ("Dash Application Maintenance" not "Reflex → Dash Migration"), updated data reference section to point to src/ shared functions instead of pathways_app/
+- **guardrails.md**: Updated 3 rules to reflect current Dash architecture (shared utilities exist, icicle function exists, data queries exist) instead of migration instructions
+- **IMPLEMENTATION_PLAN.md**: Phase 6 tasks marked [x]
+### Validation results:
+- Tier 1 (Code): `from dash_app.app import app` — OK, 7 callbacks registered
+- Grep for Reflex in non-archive .md files: only CLAUDE.md line 140 (archive description — accurate) and IMPLEMENTATION_PLAN.md (historical migration log — accurate)
+### Files changed:
+- README.md — Rewritten for Dash
+- docs/USER_GUIDE.md — Rewritten for Dash
+- docs/DEPLOYMENT.md — Rewritten for Dash
+- docs/DESIGN_SYSTEM.md — Updated implementation section
+- RALPH_PROMPT.md — Updated title and references
+- guardrails.md — Updated 3 rules
+- IMPLEMENTATION_PLAN.md — Phase 6 marked [x]
+- progress.txt — This entry
+### Patterns discovered:
+- archive/ files (IMPROVEMENT_RECOMMENDATIONS.md) retain Reflex references intentionally — they're historical
+- IMPLEMENTATION_PLAN.md retains Reflex references in completed task descriptions — these are accurate migration history
+### Next iteration should:
+- Phase 6 is complete. All tasks across all phases are now [x].
+### Blocked items:
+- None
+
+## ALL PHASES COMPLETE
+All 24 tasks across 6 phases complete. 17 iterations total.
 - Phase 0: Scaffolding (2 tasks) — iteration 1
 - Phase 1: Data Access (2 tasks) — iterations 2-3
 - Phase 2: Static Layout (3 tasks) — iterations 4-6
 - Phase 3: Core Callbacks (4 tasks) — iterations 7-10
 - Phase 4: Drawer (2 tasks) — iterations 11-12
 - Phase 5: Polish & Cleanup (4 tasks) — iterations 13-16
+- Phase 6: Documentation (4 tasks) — iteration 17