docs: update all documentation for Dash migration (Phase 6)

Rewrote README.md, USER_GUIDE.md, and DEPLOYMENT.md to reflect
the Dash application. Updated RALPH_PROMPT.md, guardrails.md, and
DESIGN_SYSTEM.md to remove Reflex references. All non-archive
documentation now reflects the current Dash + DMC architecture.
Andrew Charlwood
2026-02-06 14:54:12 +00:00
parent 4cb5641c2d
commit 54b4a0f743
8 changed files with 635 additions and 956 deletions
+8
@@ -256,6 +256,14 @@ Drawer selection → update_drug_selection → app-state store → load_pathway_
- [x] Verify: No Reflex imports anywhere in `dash_app/` - [x] Verify: No Reflex imports anywhere in `dash_app/`
- **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated - **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated
## Phase 6: Update all documentation
- [x] Remove `reflex` references from all documentation
- [x] Verify: No mentions of Reflex in any Markdown files (`archive/` excluded as historical)
- [x] Add documentation to the README on how to run the Dash app
- [x] Update all `CLAUDE.md` files (CLAUDE.md was updated in Task 5.4)
- **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated
--- ---
## Completion Criteria ## Completion Criteria
+106 -113
@@ -1,128 +1,124 @@
# Ralph Wiggum Loop - Drug-Aware Indication Matching # Ralph Wiggum Loop - Dash Application Maintenance
You are operating inside an automated loop extending a pathway analysis application with drug-aware indication matching. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem. You are operating inside an automated loop maintaining an NHS patient pathway analysis tool built with Dash (Plotly) + Dash Mantine Components. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem.
**Current Focus**: Update indication charts so that patient indications are matched **per drug**, not just per patient. Each drug must be validated against the patient's GP diagnoses AND the drug-to-indication mapping from DimSearchTerm.csv. **Current Focus**: Maintain and enhance the Dash application in `dash_app/`. The backend (`src/`) provides shared data access and visualization functions. The design target is `01_nhs_classic.html`.
## First Actions Every Iteration ## First Actions Every Iteration
Read these files in this order before doing anything else: Read these files in this order before doing anything else:
1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next. The most recent entry is most important. 1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next.
2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, project overview, and completion criteria. 2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, architecture overview, and completion criteria.
3. `guardrails.md` — Known failure patterns to avoid. You MUST read and follow these. 3. `guardrails.md` — Known failure patterns to avoid. You MUST read and follow these.
4. `CLAUDE.md` — Project architecture and code patterns. 4. `CLAUDE.md` — Project architecture and backend code patterns.
Then run `git log --oneline -5` to see recent commits. Then run `git log --oneline -5` to see recent commits.
## Reading the Design Reference
**When building ANY UI component**, read `01_nhs_classic.html` first:
- It contains the exact CSS classes, HTML structure, and visual layout you must replicate
- CSS lives in the `<style>` block (lines 8-314) — this becomes `dash_app/assets/nhs.css`
- HTML structure (lines 316-480+) shows the component hierarchy and class usage
- Match the design as closely as possible — `className` in Dash = `class` in HTML
**When building data loading or chart callbacks**, reference the shared functions in `src/`:
- `src/data_processing/pathway_queries.py`: `load_initial_data()` and `load_pathway_nodes()` — shared query functions
- `src/visualization/plotly_generator.py`: `create_icicle_from_nodes()` — icicle chart from list-of-dicts
- `dash_app/data/queries.py`: Thin wrapper calling shared functions with correct DB path
- The original logic is archived in `archive/pathways_app/pathways_app.py` for reference.
## Narration ## Narration
Narrate your work as you go. Your output is the only visibility the operator has into what's happening. For every significant action, explain what you're doing and why: Narrate your work as you go. Your output is the only visibility the operator has into what's happening. For every significant action, explain what you're doing and why:
- **Reading files**: "Reading progress.txt to check what the last iteration accomplished..." - **Reading files**: "Reading 01_nhs_classic.html to get CSS classes for the header component..."
- **Creating code**: "Adding assign_drug_indications() function to diagnosis_lookup.py..." - **Creating code**: "Creating dash_app/components/header.py with make_header() function..."
- **Debugging**: "Drug matching returned 0 results for ADALIMUMAB. Checking DimSearchTerm lookup..." - **Debugging**: "Import error for dmc.Drawer — checking dash-mantine-components version..."
- **Testing**: "Running import check to verify the new function is accessible..." - **Testing**: "Running python run_dash.py to verify the app starts..."
- **Making decisions**: "The guardrails say to use substring matching for drug fragments." - **Making decisions**: "The guardrails say to use className from nhs.css, not inline styles."
- **Committing**: "Committing drug-indication matching logic." - **Committing**: "Committing header and sidebar components."
Do NOT just output a summary at the end. Narrate throughout. Think of this as a live log of your reasoning. Do NOT just output a summary at the end. Narrate throughout.
## Task Selection ## Task Selection
You have flexibility to choose which task to work on. Use your judgement, but document your reasoning.
1. Read ALL tasks in IMPLEMENTATION_PLAN.md — understand the full picture 1. Read ALL tasks in IMPLEMENTATION_PLAN.md — understand the full picture
2. Skip any marked `[x]` (complete) or `[B]` (blocked) 2. Skip any marked `[x]` (complete) or `[B]` (blocked)
3. Check progress.txt for guidance — the previous iteration may have recommendations 3. Check progress.txt for guidance — the previous iteration may have recommendations
4. **Choose a task** based on: 4. **Choose a task** based on:
- Dependencies (some tasks require others to be done first) - Dependencies (scaffolding before components, components before callbacks)
- Logical flow (query changes before matching logic, matching before pipeline integration) - Logical flow (Phase 0 → 1 → 2 → 3 → 4 → 5)
- Your assessment of what would be most valuable to tackle next - Previous iteration's recommendations
- Previous iteration's recommendations (consider but don't blindly follow) 5. **Document your reasoning**: Before starting, explain WHY you chose this task
5. **Document your reasoning**: Before starting work, briefly explain WHY you chose this task over others
6. Mark your chosen task `[~]` (in progress) in IMPLEMENTATION_PLAN.md 6. Mark your chosen task `[~]` (in progress) in IMPLEMENTATION_PLAN.md
If your chosen task turns out to be blocked during work: If your chosen task is blocked:
- Mark it `[B]` with a reason in IMPLEMENTATION_PLAN.md - Mark it `[B]` with a reason
- Document the blocker in progress.txt - Document the blocker in progress.txt
- Move to a different ready task within this same iteration - Move to a different ready task
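The selection rules above can be sketched as a small helper. This is a minimal illustration, assuming tasks appear in IMPLEMENTATION_PLAN.md as Markdown checklist lines such as `- [ ] Task text`. The function name `find_ready_tasks` and the sample plan are hypothetical and not part of the repository.

```python
import re

# Matches checklist lines such as "- [x] Build header" and captures marker + text.
# Assumption: one task per line, with a single-character marker inside the brackets.
TASK_LINE = re.compile(r"^\s*-\s*\[(?P<marker>[ xB~])\]\s*(?P<text>.+)$")


def find_ready_tasks(plan_text: str) -> list[str]:
    """Return tasks that are not complete [x], blocked [B], or in progress [~]."""
    ready = []
    for line in plan_text.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group("marker") == " ":
            ready.append(match.group("text").strip())
    return ready


# Hypothetical plan content, used only to exercise the helper.
sample_plan = """\
- [x] Create dash_app skeleton
- [B] Wire Snowflake refresh (blocked: no credentials)
- [~] Build header component
- [ ] Build sidebar component
- [ ] Add chart callback
"""

ready = find_ready_tasks(sample_plan)
print(ready)
```

A helper like this only narrows the candidates; dependencies and the previous iteration's recommendations still decide which ready task to pick.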
## Development ## Development
Work on ONE task per iteration. Build incrementally and verify as you go. Work on ONE task per iteration. Build incrementally and verify as you go.
### Key Concepts ### Key Technologies
**Drug-Indication Matching Flow:** - **Dash 2.x**: `from dash import Dash, html, dcc, Input, Output, State, callback_context, ALL`
1. Get patient's GP-matched Search_Terms from Snowflake (ALL matches, not just most recent, with code_frequency) - **Dash Mantine Components 0.14.x**: `import dash_mantine_components as dmc` — needs `dmc.MantineProvider` wrapping the layout
- Only count GP codes from MIN(Intervention Date) onwards (the HCD data window) - **Plotly**: `import plotly.graph_objects as go` — for the icicle chart
2. Load DimSearchTerm.csv to get which drugs belong to which Search_Terms - **SQLite**: `import sqlite3` — read-only access to `data/pathways.db`
3. For each patient-drug pair: intersection of (Search_Terms listing this drug) AND (patient's GP matches) - **CSS**: All in `dash_app/assets/nhs.css` — auto-served by Dash
- If multiple matches: pick highest code_frequency (most GP coding = most likely indication)
4. Modify UPID to include matched indication: `{UPID}|{search_term}`
5. Drugs sharing the same indication for the same patient → same modified UPID → same pathway
6. Drugs under different indications → different modified UPIDs → separate pathways
**DimSearchTerm.csv:** ### Dash Component Patterns
- `Search_Term`: Clinical condition (e.g., "rheumatoid arthritis")
- `CleanedDrugName`: Pipe-separated drug fragments (e.g., "ADALIMUMAB|GOLIMUMAB|...")
- `PrimaryDirectorate`: The directorate for this condition
- Drug matching: check if any fragment is a substring of the HCD drug name (case-insensitive)
**Modified UPID Format:**
- Original: `RMV12345` (Provider Code[:3] + PersonKey)
- Modified: `RMV12345|rheumatoid arthritis`
- Fallback: `RMV12345|RHEUMATOLOGY (no GP dx)`
- The existing pathway analyzer treats UPID as an opaque identifier — this works transparently
### Code Patterns
- **Snowflake queries**: Use parameterized queries, embed the cluster CTE from CLUSTER_MAPPING_SQL
- **GP record matching**: Return ALL matches per patient (not just most recent)
- **Drug mapping**: Load from `data/DimSearchTerm.csv`, match drug name fragments
- **Pathway pipeline**: Use existing functions — modified UPIDs flow through naturally
- **Reflex state**: No changes expected — indication charts already work, just with better matching
### Key Data Structures
**GP Matches (from Snowflake) — updated to return ALL matches with frequency:**
```python ```python
# Multiple rows per patient (one per matched Search_Term) # HTML elements use dash.html
# code_frequency = COUNT of matching SNOMED codes (used as tiebreaker) from dash import html
# Only counts codes from MIN(Intervention Date) onwards html.Div(className="top-header", children=[...])
DataFrame with: PatientPseudonym, Search_Term, code_frequency
# Mantine components for rich UI
import dash_mantine_components as dmc
dmc.Drawer(id="drug-drawer", position="right", size="480px", children=[...])
dmc.Accordion(children=[dmc.AccordionItem(...)])
# State management
dcc.Store(id="app-state", storage_type="session", data={})
# Callbacks
@app.callback(
Output("chart-data", "data"),
Input("app-state", "data"),
)
def load_pathway_data(app_state):
...
``` ```
**Drug-to-Indication Mapping (from DimSearchTerm.csv):** ### Database Access Pattern
```python
# search_term → list of drug fragments
{"rheumatoid arthritis": ["ABATACEPT", "ADALIMUMAB", "ANAKINRA", ...]}
```
**Modified HCD Data:**
```python ```python
# Original UPID replaced with indication-aware UPID from pathlib import Path
df["UPID"] = "RMV12345|rheumatoid arthritis" # for matched drugs import sqlite3
df["UPID"] = "RMV12345|RHEUMATOLOGY (no GP dx)" # for unmatched drugs
```
**Indication DataFrame:** DB_PATH = Path(__file__).resolve().parents[2] / "data" / "pathways.db"
```python
# Maps modified UPID → Search_Term (for pathway hierarchy level 2) def load_pathway_data(filter_id, chart_type, selected_drugs=None, selected_directorates=None):
indication_df = pd.DataFrame({ conn = sqlite3.connect(str(DB_PATH))
'Directory': ['rheumatoid arthritis', 'asthma', 'CARDIOLOGY (no GP dx)'] conn.row_factory = sqlite3.Row
}, index=['RMV12345|rheumatoid arthritis', 'RMV12345|asthma', 'RMV67890|CARDIOLOGY (no GP dx)']) # ... query with parameterized WHERE ...
conn.close()
return result_dict
``` ```
### Verification Steps ### Verification Steps
After writing code, ALWAYS verify: After writing code, ALWAYS verify:
1. **Syntax check**: `python -m py_compile <file.py>` 1. **Import check**: `python -c "from dash_app.app import app"` (or specific module)
2. **Import check**: `python -c "from module import function"` 2. **App starts**: `python run_dash.py` — must start without errors
3. **For database changes**: Test with query against pathways.db 3. **Visual check** (when building UI): describe what you expect to see at localhost:8050
4. **For Reflex changes**: `python -m reflex compile` 4. **For callbacks**: verify the callback chain fires correctly (add temporary `print()` statements if needed)
If any step fails, fix the issue before proceeding. If any step fails, fix the issue before proceeding.
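The import check can also be scripted. The sketch below is one possible way to automate it and is not part of the repository: the helper name `import_check` is hypothetical, and it is demonstrated against the standard-library module `json` so that it runs anywhere. In the project, the statement would be `from dash_app.app import app`.

```python
import subprocess
import sys


def import_check(statement: str) -> tuple[bool, str]:
    """Run an import statement in a fresh interpreter; return (ok, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", statement],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return proc.returncode == 0, proc.stderr.strip()


# A module that exists, and one that is assumed not to exist.
ok, err = import_check("import json")
bad_ok, bad_err = import_check("import module_that_does_not_exist_xyz")
print(ok, bad_ok)
```

Running the import in a separate interpreter avoids stale module state from the current session, which is closer to what a fresh loop iteration would see.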
@@ -133,24 +129,23 @@ Every task MUST pass validation before being marked complete:
### Tier 1: Code Validation (MANDATORY) ### Tier 1: Code Validation (MANDATORY)
- Code compiles without Python syntax errors - Code compiles without Python syntax errors
- Imports work without errors - Imports work without errors
- No TypeErrors, ImportErrors, or AttributeErrors - `python run_dash.py` starts without exceptions
### Tier 2: Data Validation (for data/pipeline tasks) ### Tier 2: Layout Validation (for UI component tasks)
- Queries return expected row counts - Component renders in the browser
- Data structures have correct columns/types - CSS classes match 01_nhs_classic.html
- Drug-indication matching produces valid results - Layout structure matches the HTML concept
- Modified UPIDs have correct format
### Tier 3: Functional Validation (for UI/integration tasks) ### Tier 3: Functional Validation (for callback tasks)
- Reflex compiles the app without errors - Callbacks fire when inputs change
- State changes trigger expected behavior - Data flows correctly through dcc.Store chain
- Both chart types render correctly - Chart renders with real data from SQLite
### Validation Failure ### Validation Failure
If any tier fails: If any tier fails:
- DO NOT mark the task complete - DO NOT mark the task complete
- Document the failure details in progress.txt - Document the failure in progress.txt
- Fix the issue within this iteration if possible - Fix the issue within this iteration if possible
- If you cannot fix it, mark the task `[B]` with details - If you cannot fix it, mark the task `[B]` with details
@@ -159,34 +154,33 @@ If any tier fails:
Before marking ANY task `[x]`, ALL of these must be true: Before marking ANY task `[x]`, ALL of these must be true:
1. Code is saved to the appropriate file(s) 1. Code is saved to the appropriate file(s)
2. Tier 1 code validation passed 2. Tier 1 validation passed (imports + app starts)
3. Tier 2/3 validation passed (as applicable) 3. Tier 2/3 validation passed (as applicable)
4. All changes committed to git with a descriptive message 4. All changes committed to git with a descriptive message
These are non-negotiable. A task that "feels done" but hasn't passed all gates is NOT done. These are non-negotiable.
## Update Progress ## Update Progress
After completing your work (whether the task succeeded, failed, or was blocked), append to progress.txt using this format: After completing your work, append to progress.txt using this format:
``` ```
## Iteration [N] — [YYYY-MM-DD] ## Iteration [N] — [YYYY-MM-DD]
### Task: [which task you worked on] ### Task: [which task you worked on]
### Why this task: ### Why this task:
- [Brief explanation of why you chose this task over others] - [Brief explanation of why you chose this task over others]
- [What dependencies or logical flow led to this choice]
### Status: COMPLETE | BLOCKED | IN PROGRESS ### Status: COMPLETE | BLOCKED | IN PROGRESS
### What was done: ### What was done:
- [Specific actions taken] - [Specific actions taken]
### Validation results: ### Validation results:
- Tier 1 (Code): [syntax check, import check] - Tier 1 (Code): [import check, app starts]
- Tier 2 (Data): [query results, row counts] - Tier 2 (Layout): [renders correctly, CSS matches]
- Tier 3 (Functional): [reflex compile, UI check] - Tier 3 (Functional): [callbacks fire, data flows]
### Files changed: ### Files changed:
- [list of files created/modified] - [list of files created/modified]
### Committed: [git hash] "[commit message]" ### Committed: [git hash] "[commit message]"
### Patterns discovered: ### Patterns discovered:
- [Any reusable learnings — query patterns, matching logic quirks] - [Any reusable learnings — Dash patterns, DMC quirks, CSS gotchas]
### Next iteration should: ### Next iteration should:
- [Explicit guidance for what the next fresh instance should do first] - [Explicit guidance for what the next fresh instance should do first]
- [Note any context that would be lost without writing it here] - [Note any context that would be lost without writing it here]
@@ -194,20 +188,20 @@ After completing your work (whether the task succeeded, failed, or was blocked),
- [Any tasks that are blocked and why] - [Any tasks that are blocked and why]
``` ```
If you discover a failure pattern that future iterations should avoid, add it to `guardrails.md`. If you discover a failure pattern, add it to `guardrails.md`.
## Commit Changes ## Commit Changes
1. Stage changed files 1. Stage changed files
2. Use a descriptive commit message referencing the task (e.g., "feat: add drug-indication matching function (Task 2.1)") 2. Use a descriptive commit message referencing the task (e.g., "feat: create dash_app skeleton with nhs.css (Task 0.1 + 0.2)")
3. Commit after your task is validated and complete — one commit per logical unit of work 3. Commit after your task is validated and complete
4. If you updated progress.txt with a blocked status, commit that too 4. If you updated progress.txt with a blocked status, commit that too
## Completion Check ## Completion Check
If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`: If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:
1. Run `reflex compile` to verify app compiles 1. Run `python run_dash.py` to verify app starts cleanly
2. Verify all completion criteria at the bottom of IMPLEMENTATION_PLAN.md are satisfied 2. Verify all completion criteria at the bottom of IMPLEMENTATION_PLAN.md are satisfied
3. Only then output the completion signal on its own line: 3. Only then output the completion signal on its own line:
@@ -217,20 +211,19 @@ If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:
DO NOT output this string under any other circumstances. DO NOT output this string under any other circumstances.
DO NOT output it if any task is still `[ ]` or `[B]` or `[~]`. DO NOT output it if any task is still `[ ]` or `[B]` or `[~]`.
DO NOT paraphrase, vary, or conditionally output this string.
## Rules ## Rules
- Complete ONE task per iteration, then update progress and stop - Complete ONE task per iteration, then update progress and stop
- ALWAYS read progress.txt, guardrails.md before starting work - ALWAYS read progress.txt, guardrails.md before starting work
- **Match drugs to indications** — not just patients to indications - **Read 01_nhs_classic.html** when building ANY visual component
- **Use DimSearchTerm.csv** for drug-to-Search_Term mapping - **Read src/data_processing/pathway_queries.py and src/visualization/plotly_generator.py** when building data logic or chart callbacks
- **Return ALL GP matches** — not just most recent (remove QUALIFY ROW_NUMBER = 1) - **DO NOT modify pipeline/analysis logic** in src/ (pathway_pipeline, transforms, diagnosis_lookup, pathway_analyzer, refresh_pathways)
- **Modified UPID format**: `{UPID}|{search_term}` — pipe delimiter is safe - **DO add shared utilities** to src/ (visualization/plotly_generator.py, data_processing/database.py) rather than duplicating logic in dash_app/
- **Use PseudoNHSNoLinked** — NOT PersonKey for GP record matching - **Use className from nhs.css** — not inline styles
- **Substring matching** for drug fragments from DimSearchTerm.csv - **dcc.Store for state** — no server-side globals
- **Unidirectional callbacks** — app-state → chart-data → UI
- **Port icicle_figure exactly** — same customdata, colorscale, templates
- Keep commits atomic and well-described - Keep commits atomic and well-described
- If stuck on the same issue for more than 2 attempts within one iteration, document it in progress.txt and move to the next ready task - If stuck for 2+ attempts, document in progress.txt and move on
- When in doubt, check existing code for patterns that work - `python run_dash.py` must work after every task
- **Pipeline before UI** — processing logic before Reflex changes
- **Don't change directory charts** — only indication chart matching changes
+123 -140
@@ -5,142 +5,144 @@ A web-based application for analyzing secondary care patient treatment pathways.
## Features ## Features
- **Interactive Visualization**: Plotly icicle charts showing patient treatment hierarchies with cost and frequency statistics - **Interactive Visualization**: Plotly icicle charts showing patient treatment hierarchies with cost and frequency statistics
- **Multi-Source Data Loading**: CSV/Parquet files, SQLite database, or direct Snowflake integration - **Dual Chart Types**: Directory-based (Trust → Directorate → Drug → Pathway) and Indication-based (Trust → GP Diagnosis → Drug → Pathway) views
- **GP Diagnosis Validation**: Validate patient indications against GP SNOMED codes via NHS Snowflake - **Pre-computed Pathways**: Treatment pathways pre-processed and stored in SQLite for sub-50ms filter response times
- **Modern Web Interface**: Browser-based UI using Reflex framework with NHS branding - **GP Diagnosis Matching**: Patient indications matched from GP records using SNOMED cluster codes (~93% match rate)
- **Modern Web Interface**: Browser-based UI using Dash (Plotly) + Dash Mantine Components with NHS branding
- **Drug Browser**: Drawer-based card browser organized by clinical directorate for drug/indication selection
- **Flexible Filtering**: Filter by date range, NHS trusts, drugs, and medical directories - **Flexible Filtering**: Filter by date range, NHS trusts, drugs, and medical directories
- **Export Options**: Export charts as interactive HTML or data as CSV
## Requirements ## Requirements
- Python 3.10 or higher - Python 3.10 or higher
- pip or uv package manager - uv package manager (recommended)
### Optional (for Snowflake integration) ### Optional (for data refresh)
- `snowflake-connector-python` package
- Access to NHS Snowflake data warehouse with SSO authentication - Access to NHS Snowflake data warehouse with SSO authentication
## Installation ## Installation
### Using pip
```bash ```bash
# Clone the repository # Clone the repository
git clone <repository-url> git clone <repository-url>
cd patient-pathway-analysis cd patient-pathway-analysis
# Install dependencies # Install dependencies
pip install -r requirements.txt
```
### Using uv (recommended)
```bash
# Install uv if not already installed
pip install uv
# Sync dependencies
uv sync uv sync
```
### Install with test dependencies # One-time dev setup: adds src/ to Python path via .pth file
uv run python setup_dev.py
```bash
pip install -e ".[test]"
``` ```
## Quick Start ## Quick Start
### 1. Run the Web Application (Recommended) ### Run the Web Application
```bash ```bash
reflex run python run_dash.py
``` ```
Open http://localhost:3000 in your browser. Open http://localhost:8050 in your browser.
The application loads pre-computed pathway data from SQLite on startup. No additional configuration is needed for viewing existing data.
### Refresh Pathway Data (requires Snowflake)
```bash
# Initialize/migrate the database
python -m data_processing.migrate
# Full refresh — both chart types, all date filters
python -m cli.refresh_pathways --chart-type all
# Directory charts only (faster, ~5 minutes)
python -m cli.refresh_pathways --chart-type directory
# Indication charts only (~12 minutes, includes GP lookup)
python -m cli.refresh_pathways --chart-type indication
# Dry run (test without database changes)
python -m cli.refresh_pathways --chart-type all --dry-run -v
```
## Usage ## Usage
### Web Interface (Reflex) ### Interface Overview
1. **Load Data**: On the home page, select your data source: The application has a single-page layout with:
- **SQLite Database**: Uses pre-loaded data from `data/pathways.db`
- **File Upload**: Drag and drop a CSV or Parquet file
- **Snowflake**: Fetch data directly from NHS Snowflake (requires configuration)
2. **Configure Filters**: | Component | Purpose |
- Set date range (Start Date, End Date, Last Seen After) |-----------|---------|
- Navigate to Drug/Trust/Directory selection pages using the sidebar | **Header** | NHS branding, data freshness indicator (patient count + relative time) |
- Use search boxes to find and select items | **Sidebar** | Navigation items with drawer triggers for Drug Selection, Trust Selection, Indications |
- Set minimum patient threshold to filter small groups | **KPI Row** | 4 cards: Unique Patients, Drug Types, Total Cost, Indication Match Rate |
| **Filter Bar** | Chart type toggle (By Directory / By Indication) + date filter dropdowns |
| **Chart Card** | Interactive Plotly icicle chart with loading spinner |
| **Drawer** | Right-side panel with drug chips, trust chips, and directorate card browser |
3. **Run Analysis**: Click "Run Analysis" to generate the icicle chart ### Filtering Data
4. **Export Results**: 1. **Chart Type**: Toggle between "By Directory" and "By Indication" views
- **Export HTML**: Save the interactive chart as a standalone HTML file 2. **Date Filters**: Select treatment initiation period and last-seen window
- **Export CSV**: Export the filtered data as a CSV file 3. **Drug Selection**: Open the drawer to select specific drugs via chips
4. **Trust Selection**: Open the drawer to filter by NHS trusts
5. **Directorate Browser**: Navigate directorates → indications → drug fragments in the drawer
6. **Clear Filters**: Reset all selections to show full dataset
### Data Migration ### Understanding the Pathway Chart
To populate the SQLite database from CSV files: The icicle chart displays hierarchical treatment pathways:
```bash ```
# Initialize database schema Root (Regional Total)
python -m data_processing.migrate └─ Trust Name (e.g., "Norfolk and Norwich University Hospitals")
└─ Directory/Indication (e.g., "Rheumatology" or "rheumatoid arthritis")
# Load reference data from CSV files └─ Drug Name (e.g., "ADALIMUMAB")
python -m data_processing.migrate --reference-data --verify └─ Treatment Pathway (e.g., "ADALIMUMAB → INFLIXIMAB")
# Load patient data from a CSV/Parquet file
python -m data_processing.migrate --load-patient-data path/to/data.csv
``` ```
### Snowflake Configuration - **Width**: Relative patient count
- **Color intensity**: Proportion of parent group
- **Hover**: Shows cost, dosing frequency, date range, and per-patient statistics
- **Click**: Zoom into a specific branch
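To illustrate how a hierarchy like this maps onto an icicle chart, the sketch below flattens pathway records into parallel `ids`, `labels`, `parents`, and `values` lists, the shape that Plotly's `go.Icicle` trace accepts. It is an illustration only: the function name `flatten_pathways`, the `" / "` id separator, the trust name, and the patient counts are made up, and the project's own `create_icicle_from_nodes()` may work differently.

```python
def flatten_pathways(records):
    """Turn (path, patients) records into ids/labels/parents/values lists.

    Each path is a tuple from the root down to the pathway leaf. Counts are
    summed into every ancestor, which suits branchvalues="total" in Plotly.
    """
    totals = {}     # node id -> summed patient count
    parent_of = {}  # node id -> parent id ("" for the root)
    label_of = {}   # node id -> display label
    for path, patients in records:
        parent_id = ""
        for depth in range(1, len(path) + 1):
            node_id = " / ".join(path[:depth])  # full path keeps ids unique
            totals[node_id] = totals.get(node_id, 0) + patients
            parent_of[node_id] = parent_id
            label_of[node_id] = path[depth - 1]
            parent_id = node_id
    ids = list(totals)  # dicts preserve insertion order
    return {
        "ids": ids,
        "labels": [label_of[i] for i in ids],
        "parents": [parent_of[i] for i in ids],
        "values": [totals[i] for i in ids],
    }


# Made-up records following Root -> Trust -> Directory -> Drug -> Pathway.
records = [
    (("Regional Total", "Trust A", "Rheumatology", "ADALIMUMAB",
      "ADALIMUMAB → INFLIXIMAB"), 12),
    (("Regional Total", "Trust A", "Rheumatology", "ADALIMUMAB",
      "ADALIMUMAB"), 30),
]

chart = flatten_pathways(records)
print(chart["ids"])
```

Using the full path as the node id avoids collisions when the same drug name appears under different trusts or directorates, while the label stays short for display.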
To use Snowflake integration, edit `config/snowflake.toml`: ### Date Filter Combinations
```toml | Initiated | Last Seen | Description |
[connection] |-----------|-----------|-------------|
account = "your-account-identifier" | All years | Last 6 months | Default — all patients active recently |
warehouse = "your-warehouse" | All years | Last 12 months | Broader activity window |
database = "DATA_HUB" | Last 1 year | Last 6 months | Recently initiated, active |
schema = "CDM" | Last 1 year | Last 12 months | Recently initiated, any activity |
authenticator = "externalbrowser" # NHS SSO authentication | Last 2 years | Last 6 months | Medium history, active |
``` | Last 2 years | Last 12 months | Medium history, any activity |
## Project Structure ## Project Structure
``` ```
. .
├── core/ # Core configuration and models ├── src/ # All application library code
├── data_processing/ # Data layer (SQLite, Snowflake, loaders) │ ├── core/ # Foundation: paths, models, logging
├── analysis/ # Analysis pipeline (refactored from generate_graph) │ ├── config/ # Snowflake connection settings
├── visualization/ # Chart generation (Plotly) │ ├── data_processing/ # Data layer (SQLite, Snowflake, transforms)
├── pathways_app/ # Reflex web application │ ├── analysis/ # Analysis pipeline
├── tools/ # Legacy modules (original analysis engine) │ ├── visualization/ # Plotly chart generation
├── config/ # Configuration files │ └── cli/ # CLI tools (refresh_pathways)
├── data/ # Reference data and SQLite database ├── dash_app/ # Dash web application
├── docs/ # Additional documentation │ ├── app.py # App entry point, layout, stores
└── tests/ # Test suite │ ├── assets/nhs.css # NHS design system CSS
│ ├── data/ # Query wrappers + card browser data
│ ├── components/ # UI components (header, sidebar, etc.)
│ └── callbacks/ # Dash callbacks (filters, chart, KPI, drawer)
├── run_dash.py # Entry point: python run_dash.py
├── data/ # Reference data + SQLite DB (pathways.db)
├── tests/ # Test suite (113 tests)
├── docs/ # Documentation
└── archive/ # Historical/deprecated code
``` ```
See `CLAUDE.md` for detailed architecture documentation. See `CLAUDE.md` for detailed architecture documentation.
## Documentation
- [docs/USER_GUIDE.md](docs/USER_GUIDE.md) - End-user guide for using the web interface
- [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) - Production deployment guide (Docker, nginx, cloud)
- [CLAUDE.md](CLAUDE.md) - Technical architecture documentation for developers
## Deployment
Quick production start:
```bash
# Run in production mode
reflex run --env prod
```
## Running Tests ## Running Tests
```bash ```bash
@@ -150,75 +152,56 @@ python -m pytest tests/ -v
# Run with coverage # Run with coverage
python -m pytest tests/ -v --cov=core --cov=data_processing --cov=analysis python -m pytest tests/ -v --cov=core --cov=data_processing --cov=analysis
# Run only fast tests (exclude slow/integration) # Run only fast tests
python -m pytest tests/ -v -m "not slow" python -m pytest tests/ -v -m "not slow"
``` ```
## Reference Data Files ## Configuration
The `data/` directory contains essential reference files: ### Snowflake Connection (`src/config/snowflake.toml`)
| File | Purpose | ```toml
|------|---------| [snowflake]
| `include.csv` | Drug filter list with default selections | account = "your-account"
| `defaultTrusts.csv` | NHS Trust list for filtering | database = "DATA_HUB"
| `directory_list.csv` | Medical specialties/directories | schema = "CDM"
| `drugnames.csv` | Drug name standardization mapping | warehouse = "your-warehouse"
| `org_codes.csv` | Provider code to organization name mapping | authenticator = "externalbrowser" # Required for NHS SSO
| `drug_directory_list.csv` | Valid drug-to-directory mappings | ```
| `drug_indication_clusters.csv` | Drug to SNOMED cluster mappings |
| `ta-recommendations.xlsx` | NICE TA recommendations |
## Troubleshooting ## Troubleshooting
### Reflex compilation errors ### App won't start
If you encounter compilation errors when running `reflex run`:
```bash ```bash
# Clear the build cache and restart # Ensure dependencies are installed
rm -rf .web uv sync
reflex run
# Ensure src/ is on Python path
uv run python setup_dev.py
# Try running with uv
uv run python run_dash.py
```
### Database not found
```bash
# Create data/pathways.db if it is missing (initializes/migrates the database)
python -m data_processing.migrate
``` ```
### Snowflake connection issues ### Snowflake connection issues
1. Ensure `snowflake-connector-python` is installed: 1. Ensure `src/config/snowflake.toml` has the correct account identifier
```bash 2. A browser window will open for SSO authentication
pip install snowflake-connector-python 3. Verify your network allows Snowflake connections
```
2. Check that `config/snowflake.toml` has the correct account identifier ## Documentation
3. For SSO authentication, a browser window will open automatically - [CLAUDE.md](CLAUDE.md) — Technical architecture documentation
- [docs/USER_GUIDE.md](docs/USER_GUIDE.md) — End-user guide
### SQLite database not found - [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) — Deployment guide
If `data/pathways.db` doesn't exist, create it:
```bash
python -m data_processing.migrate
python -m data_processing.migrate --reference-data
```
## Development
### Code Quality
```bash
# Type checking
python -m mypy core/ data_processing/ analysis/ --ignore-missing-imports
# Run tests with coverage report
python -m pytest tests/ -v --cov=core --cov=data_processing --cov-report=html
```
### Adding New Reference Data
1. Add CSV file to `data/` directory
2. Define schema in `data_processing/schema.py`
3. Create migration function in `data_processing/reference_data.py`
4. Add path to `PathConfig` in `core/config.py`
## License ## License
+89 -289
@@ -1,10 +1,10 @@
# Reflex Deployment Guide # Deployment Guide
This guide covers deployment options for the Patient Pathway Analysis web application built with Reflex. This guide covers deployment options for the Patient Pathway Analysis web application built with Dash.
## Overview ## Overview
Reflex applications compile to a FastAPI backend and Next.js frontend. This creates two deployment artifacts that can be deployed together or separately depending on your infrastructure requirements. The application is a single-process Python Dash app that serves both the frontend and API from one server. It reads pre-computed data from a local SQLite database.
## Development Mode ## Development Mode
@@ -12,9 +12,9 @@ For local development:
```bash ```bash
# Start development server with hot reload # Start development server with hot reload
reflex run python run_dash.py
# Access the application at http://localhost:3000 # Access the application at http://localhost:8050
``` ```
## Production Deployment Options ## Production Deployment Options
@@ -24,84 +24,55 @@ reflex run
The simplest approach for internal deployments: The simplest approach for internal deployments:
```bash ```bash
# Run in production mode (optimized build) # Run with Gunicorn (Linux/macOS)
reflex run --env prod gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4
```
This starts: # Or directly with Python
- FastAPI backend on port 8000 python run_dash.py
- Next.js frontend on port 3000 ```
For background execution: For background execution:
```bash ```bash
# Using nohup (Linux/macOS) # Using nohup (Linux/macOS)
nohup reflex run --env prod > reflex.log 2>&1 & nohup gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4 > dash.log 2>&1 &
# Using PowerShell (Windows) # Using PowerShell (Windows)
Start-Process -NoNewWindow -FilePath "reflex" -ArgumentList "run --env prod" Start-Process -NoNewWindow -FilePath "python" -ArgumentList "run_dash.py"
``` ```
### Option 2: Separate Backend and Frontend ### Option 2: Docker Deployment
For more control, run backend and frontend separately:
```bash
# Terminal 1: Start backend only
reflex run --env prod --backend-only
# Terminal 2: Start frontend only
reflex run --env prod --frontend-only
```
### Option 3: Static Export
Export the frontend as static files for deployment on static hosting or CDN:
```bash
# Export application
reflex export
# This creates:
# - frontend.zip (static Next.js build)
# - backend.zip (Python application source)
```
Then:
1. Unzip `frontend.zip` and serve via nginx, Apache, or any static file server
2. Run the backend separately using uvicorn/gunicorn
### Option 4: Docker Deployment
Create a `Dockerfile` for containerized deployment: Create a `Dockerfile` for containerized deployment:
```dockerfile ```dockerfile
# Dockerfile
FROM python:3.11-slim FROM python:3.11-slim
WORKDIR /app WORKDIR /app
# Install Node.js for Reflex frontend build # Install uv for fast dependency management
RUN apt-get update && apt-get install -y curl && \ RUN pip install uv
curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && \
apt-get install -y nodejs && \
rm -rf /var/lib/apt/lists/*
# Copy requirements and install dependencies # Copy dependency files
COPY requirements.txt pyproject.toml ./ COPY pyproject.toml uv.lock ./
RUN pip install --no-cache-dir -r requirements.txt
# Install dependencies
RUN uv sync --no-dev
# Copy application code # Copy application code
COPY . . COPY src/ src/
COPY dash_app/ dash_app/
COPY data/ data/
COPY run_dash.py setup_dev.py ./
# Initialize Reflex (downloads frontend dependencies) # Set up Python path
RUN reflex init --loglevel debug RUN uv run python setup_dev.py
# Expose ports # Expose port
EXPOSE 3000 8000 EXPOSE 8050
# Start in production mode # Start the application
CMD ["reflex", "run", "--env", "prod"] CMD ["uv", "run", "gunicorn", "dash_app.app:server", "-b", "0.0.0.0:8050", "--workers", "4"]
``` ```
Build and run: Build and run:
@@ -111,41 +82,24 @@ Build and run:
docker build -t pathway-analysis . docker build -t pathway-analysis .
# Run the container # Run the container
docker run -p 3000:3000 -p 8000:8000 \ docker run -p 8050:8050 \
-v $(pwd)/data:/app/data \ -v $(pwd)/data:/app/data \
-v $(pwd)/config:/app/config \
pathway-analysis pathway-analysis
``` ```
### Option 5: Docker Compose (Recommended for Production) ### Option 3: Docker Compose
Create `docker-compose.yml` for multi-container deployment:
```yaml ```yaml
version: '3.8' version: '3.8'
services: services:
backend: app:
build: . build: .
command: reflex run --env prod --backend-only
ports: ports:
- "8000:8000" - "8050:8050"
volumes: volumes:
- ./data:/app/data - ./data:/app/data
- ./config:/app/config - ./src/config:/app/src/config
environment:
- REFLEX_ENV=prod
restart: unless-stopped
frontend:
build: .
command: reflex run --env prod --frontend-only
ports:
- "3000:3000"
depends_on:
- backend
environment:
- REFLEX_ENV=prod
restart: unless-stopped restart: unless-stopped
``` ```
@@ -162,42 +116,16 @@ docker-compose up -d
For production deployments behind nginx: For production deployments behind nginx:
```nginx ```nginx
# /etc/nginx/sites-available/pathway-analysis
server { server {
listen 80; listen 80;
server_name your-server.nhs.uk; server_name your-server.nhs.uk;
# Backend API endpoints
location /admin {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
location /ping {
proxy_pass http://localhost:8000;
}
location /upload {
proxy_pass http://localhost:8000;
client_max_body_size 100M; # For large data file uploads
}
# WebSocket connections (required for Reflex state sync)
location /_event/ {
proxy_pass http://localhost:8000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_read_timeout 86400; # 24 hours for long-running connections
}
# Frontend (all other requests)
location / { location / {
proxy_pass http://localhost:3000; proxy_pass http://localhost:8050;
proxy_set_header Host $host; proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
} }
} }
``` ```
@@ -209,69 +137,21 @@ sudo ln -s /etc/nginx/sites-available/pathway-analysis /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx sudo nginx -t && sudo systemctl reload nginx
``` ```
### Caddy (Alternative)
Caddy provides automatic HTTPS:
```caddyfile
# Caddyfile
your-server.nhs.uk {
# Backend API
handle /admin/* {
reverse_proxy localhost:8000
}
handle /ping {
reverse_proxy localhost:8000
}
handle /upload {
reverse_proxy localhost:8000
}
handle /_event/* {
reverse_proxy localhost:8000
}
# Frontend
handle {
reverse_proxy localhost:3000
}
}
```
## Process Management ## Process Management
### Systemd (Linux) ### Systemd (Linux)
Create service files for automatic startup:
```ini ```ini
# /etc/systemd/system/pathway-backend.service # /etc/systemd/system/pathway-analysis.service
[Unit] [Unit]
Description=Pathway Analysis Backend Description=Pathway Analysis Dash App
After=network.target After=network.target
[Service] [Service]
Type=simple Type=simple
User=www-data User=www-data
WorkingDirectory=/opt/pathway-analysis WorkingDirectory=/opt/pathway-analysis
ExecStart=/usr/bin/reflex run --env prod --backend-only ExecStart=/opt/pathway-analysis/.venv/bin/gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
```ini
# /etc/systemd/system/pathway-frontend.service
[Unit]
Description=Pathway Analysis Frontend
After=network.target pathway-backend.service
[Service]
Type=simple
User=www-data
WorkingDirectory=/opt/pathway-analysis
ExecStart=/usr/bin/reflex run --env prod --frontend-only
Restart=always Restart=always
RestartSec=10 RestartSec=10
@@ -283,8 +163,8 @@ Enable and start:
```bash ```bash
sudo systemctl daemon-reload sudo systemctl daemon-reload
sudo systemctl enable pathway-backend pathway-frontend sudo systemctl enable pathway-analysis
sudo systemctl start pathway-backend pathway-frontend sudo systemctl start pathway-analysis
``` ```
### Windows Service ### Windows Service
@@ -296,8 +176,8 @@ Use NSSM (Non-Sucking Service Manager) on Windows:
choco install nssm choco install nssm
# Create service # Create service
nssm install PathwayAnalysis "C:\Path\To\reflex.exe" "run --env prod" nssm install PathwayAnalysis "C:\Path\To\python.exe" "run_dash.py"
nssm set PathwayAnalysis AppDirectory "C:\Path\To\Patient pathway analysis" nssm set PathwayAnalysis AppDirectory "C:\Path\To\pathway-analysis"
nssm start PathwayAnalysis nssm start PathwayAnalysis
``` ```
@@ -305,192 +185,112 @@ nssm start PathwayAnalysis
### Production Environment Variables ### Production Environment Variables
Set these environment variables for production:
```bash ```bash
# Reflex configuration # Database path (if using custom location)
export REFLEX_ENV=prod
# Database paths (if using custom locations)
export PATHWAY_DB_PATH=/var/data/pathways.db export PATHWAY_DB_PATH=/var/data/pathways.db
export PATHWAY_CACHE_DIR=/var/cache/pathway-analysis
# Snowflake (if using) # Snowflake (for data refresh only — not needed for the web app)
export SNOWFLAKE_ACCOUNT=your-account export SNOWFLAKE_ACCOUNT=your-account
export SNOWFLAKE_WAREHOUSE=your-warehouse export SNOWFLAKE_WAREHOUSE=your-warehouse
``` ```
### Snowflake Configuration ### Snowflake Configuration
Ensure `config/snowflake.toml` is properly configured for production: Snowflake is only needed for the data refresh CLI command, not for running the web application. Ensure `src/config/snowflake.toml` is configured:
```toml ```toml
[connection] [snowflake]
account = "your-production-account" account = "your-production-account"
warehouse = "ANALYTICS_WH" warehouse = "ANALYTICS_WH"
database = "DATA_HUB" database = "DATA_HUB"
schema = "CDM" schema = "CDM"
authenticator = "externalbrowser" # or "oauth" for service accounts authenticator = "externalbrowser"
[cache]
enabled = true
directory = "/var/cache/pathway-analysis"
ttl_seconds = 86400 # 24 hours
``` ```
## Reflex Cloud ## Data Refresh
For managed hosting, consider [Reflex Cloud](https://reflex.dev/cloud/): The web application reads pre-computed data from SQLite. To update the data:
```bash ```bash
# Deploy to Reflex Cloud # Full refresh (both chart types, all date filters)
reflex deploy python -m cli.refresh_pathways --chart-type all
# The app will serve new data immediately — no restart needed
``` ```
Benefits: Schedule this as a cron job or Windows Task Scheduler task for periodic updates.
- Zero configuration deployment
- Automatic scaling
- Built-in SSL certificates
- Managed state management with Redis
## Security Considerations ## Security Considerations
### Network Security ### Network Security
1. **Firewall Rules**: Only expose necessary ports (typically just 80/443) 1. **Firewall Rules**: Only expose port 8050 (or 80/443 behind reverse proxy)
2. **HTTPS**: Use TLS certificates (Let's Encrypt or organizational certs) 2. **HTTPS**: Use TLS certificates via reverse proxy (nginx, Caddy)
3. **VPN**: Consider restricting access to NHS network only 3. **VPN**: Consider restricting access to NHS network only
### Data Security ### Data Security
1. **Database Access**: Ensure SQLite database permissions are restricted 1. **Database Access**: The app uses read-only SQLite access
2. **File Uploads**: Validate file types and scan for malware 2. **No file uploads**: The Dash app does not accept file uploads
3. **Snowflake**: Use least-privilege service accounts 3. **No authentication built in**: Add authentication via reverse proxy or middleware if needed
### Authentication
For NHS deployments, consider adding authentication:
```python
# Example: Add basic auth middleware
import reflex as rx
from starlette.middleware import Middleware
from starlette.middleware.authentication import AuthenticationMiddleware
# In rxconfig.py
config = rx.Config(
app_name="pathways_app",
# Add authentication middleware
)
```
## Monitoring ## Monitoring
### Health Checks ### Health Checks
The application provides endpoints for monitoring: The application serves at `/` — a 200 response indicates the app is running.
- `/ping` - Basic health check
- Backend port 8000 - FastAPI health
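The root-path check described in the new text can be scripted for a monitoring job. The following is a minimal sketch using only the Python standard library; the function name `is_app_healthy` and the default URL (taken from the port 8050 used elsewhere in this guide) are illustrative assumptions, not part of the application.

```python
import urllib.error
import urllib.request


def is_app_healthy(url: str = "http://localhost:8050/", timeout: float = 5.0) -> bool:
    """Return True if a GET request to the app root answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A scheduler or container health probe could call `is_app_healthy()` and raise an alert when it returns `False`.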
### Logging ### Logging
Configure logging for production: Dash outputs request logs to stdout. Configure log aggregation as needed:
```python ```bash
# In pathways_app/pathways_app.py # Redirect logs to file
import logging gunicorn dash_app.app:server -b 0.0.0.0:8050 --access-logfile /var/log/pathway-analysis/access.log --error-logfile /var/log/pathway-analysis/error.log
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('/var/log/pathway-analysis/app.log'),
logging.StreamHandler()
]
)
``` ```
## Troubleshooting ## Troubleshooting
### Common Issues ### Port already in use
**Port already in use:**
```bash ```bash
# Find and kill process using port 3000 # Find process using port 8050
lsof -i :3000 lsof -i :8050 # Linux/macOS
kill -9 <PID> netstat -ano | findstr :8050 # Windows
``` ```
**Build cache issues:** ### Database not found
```bash
# Clear Reflex build cache
rm -rf .web
reflex run --env prod
```
**Database connection errors:**
```bash ```bash
# Verify database exists and has correct permissions # Verify database exists
ls -la data/pathways.db ls -la data/pathways.db
sqlite3 data/pathways.db ".tables" sqlite3 data/pathways.db ".tables"
# Recreate if needed
python -m data_processing.migrate
python -m cli.refresh_pathways --chart-type all
``` ```
**Snowflake authentication:** ### Import errors
- Ensure browser is available for SSO popup
- Check firewall allows connections to Snowflake endpoints
- Verify account identifier is correct
## Performance Tuning
### Backend (FastAPI/Uvicorn)
For high-traffic deployments:
```bash ```bash
# Run with multiple workers # Ensure src/ is on Python path
uvicorn pathways_app:app --workers 4 --host 0.0.0.0 --port 8000 uv run python setup_dev.py
```
### State Management # Verify imports
uv run python -c "from dash_app.app import app; print('OK')"
For multi-instance deployments, configure Redis for state management:
```python
# rxconfig.py
config = rx.Config(
app_name="pathways_app",
state_manager_mode="redis",
redis_url="redis://localhost:6379/0",
)
```
### Caching
Enable aggressive caching for Snowflake queries in `config/snowflake.toml`:
```toml
[cache]
enabled = true
ttl_seconds = 86400 # 24 hours for historical data
ttl_current_data_seconds = 3600 # 1 hour for recent data
max_size_mb = 1000 # 1GB cache
``` ```
--- ---
## Quick Reference ## Quick Reference
| Environment | Command | Ports | | Environment | Command | Port |
|-------------|---------|-------| |-------------|---------|------|
| Development | `reflex run` | 3000, 8000 | | Development | `python run_dash.py` | 8050 |
| Production | `reflex run --env prod` | 3000, 8000 | | Production | `gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4` | 8050 |
| Backend only | `reflex run --backend-only` | 8000 | | Docker | `docker run -p 8050:8050 pathway-analysis` | 8050 |
| Frontend only | `reflex run --frontend-only` | 3000 |
| Export | `reflex export` | Static files |
| Cloud | `reflex deploy` | Managed |
For more information, see: For more information, see:
- [Reflex Documentation](https://reflex.dev/docs/) - [Dash Documentation](https://dash.plotly.com/)
- [Reflex Cloud](https://reflex.dev/cloud/) - [Gunicorn Deployment](https://docs.gunicorn.org/en/stable/deploy.html)
- [FastAPI Deployment](https://fastapi.tiangolo.com/deployment/)
+5 -5
@@ -187,8 +187,8 @@ All transitions: 150ms ease-out (faster than before)
} }
``` ```
### Reflex Implementation ### Dash Implementation
- Use `height="calc(100vh - 96px)"` for chart container - Chart container uses `dcc.Loading` wrapper around `dcc.Graph`
- Use `width="100%"` with `padding_x="16px"` for full-width - Full-width layout via CSS class `.chart-card` in `dash_app/assets/nhs.css`
- Use `flex="1"` to let chart grow - Minimum height set via CSS: `min-height: 500px`
- Keep `min_height="500px"` as fallback - Margins controlled in `create_icicle_from_nodes()`: `t:40, l:8, r:8, b:24`
+145 -291
@@ -6,15 +6,11 @@ This guide explains how to use the NHS High-Cost Drug Patient Pathway Analysis T
1. [Getting Started](#getting-started) 1. [Getting Started](#getting-started)
2. [Interface Overview](#interface-overview) 2. [Interface Overview](#interface-overview)
3. [Selecting Your Data Source](#selecting-your-data-source) 3. [Filtering Data](#filtering-data)
4. [Configuring Analysis Filters](#configuring-analysis-filters) 4. [Using the Drug Browser](#using-the-drug-browser)
5. [Selecting Drugs, Trusts, and Directories](#selecting-drugs-trusts-and-directories) 5. [Understanding the Pathway Chart](#understanding-the-pathway-chart)
6. [Running the Analysis](#running-the-analysis) 6. [GP Indication Matching](#gp-indication-matching)
7. [Understanding the Pathway Chart](#understanding-the-pathway-chart) 7. [Troubleshooting](#troubleshooting)
8. [Exporting Results](#exporting-results)
9. [GP Indication Validation](#gp-indication-validation)
10. [Keyboard Navigation and Accessibility](#keyboard-navigation-and-accessibility)
11. [Troubleshooting](#troubleshooting)
--- ---
@@ -25,371 +21,229 @@ This guide explains how to use the NHS High-Cost Drug Patient Pathway Analysis T
Start the application by running: Start the application by running:
```bash ```bash
reflex run python run_dash.py
``` ```
Then open your browser to **http://localhost:3000** Then open your browser to **http://localhost:8050**
The application will automatically load reference data (drugs, trusts, directories) when you first access it. The application automatically loads pre-computed pathway data from SQLite on startup. No additional setup is needed to view existing data.
### First-Time Setup ### Data Freshness
1. Click **Load Reference Data** on the Home page to populate the filter options The header bar shows when data was last refreshed:
2. Select your preferred data source (SQLite, File Upload, or Snowflake) - **Patient count**: Total patients in the dataset (e.g., "11,118 patients")
3. Configure your date range and other filters - **Last updated**: Relative time since the last data refresh (e.g., "2h ago")
4. Click **Run Analysis** to generate your first pathway chart
To refresh the data, run the CLI command (requires Snowflake access):
```bash
python -m cli.refresh_pathways --chart-type all
```
--- ---
## Interface Overview ## Interface Overview
The application has four main pages, accessible from the sidebar navigation: The application is a single-page layout with the following components:
| Page | Purpose | ### Header
|------|---------| - NHS branding and application title ("HCD Analysis")
| **Home** | Main analysis dashboard with data source selection, filters, and chart display | - Green status dot with patient count and last-updated time
| **Drug Selection** | Select which high-cost drugs to include in the analysis |
| **Trust Selection** | Filter by specific NHS trusts |
| **Directory Selection** | Filter by medical directories/specialties |
### Navigation ### Sidebar (Left)
Navigation items including:
- **Pathway Overview** — main view (always active)
- **Drug Selection** — opens the drug browser drawer
- **Trust Selection** — opens the drawer with trust chips
- **Indications** — opens the drawer with directorate browser
- **Desktop**: Use the sidebar on the left to switch between pages ### KPI Row
- **Mobile**: Use the top navigation bar Four summary cards that update dynamically:
- **Keyboard**: Press Tab to navigate, Enter to select - **Unique Patients** — number of distinct patients matching current filters
- **Drug Types** — number of distinct drugs in filtered data
- **Total Cost** — total cost of treatments in the filtered dataset
- **Indication Match** — GP diagnosis match rate (~93% for indication charts, shown as "—" for directory charts)
### Filter Bar
- **Chart type toggle**: "By Directory" / "By Indication" pills
- **Treatment Initiated**: All years, Last 2 years, or Last 1 year
- **Last Seen**: Last 6 months or Last 12 months
### Chart Card
- Dynamic subtitle showing the current hierarchy (e.g., "Trust → Directorate → Drug → Pathway")
- Interactive Plotly icicle chart
- Loading spinner during data fetch
--- ---
## Selecting Your Data Source ## Filtering Data
The application supports three data sources: ### Chart Type
### 1. SQLite Database (Recommended) Toggle between two views using the pills in the filter bar:
Pre-loaded patient data stored locally for fast performance. | View | Hierarchy | Best For |
|------|-----------|----------|
| **By Directory** | Trust → Directorate → Drug → Pathway | Understanding treatment by medical specialty |
| **By Indication** | Trust → GP Diagnosis → Drug → Pathway | Understanding treatment by patient condition |
**Advantages:** ### Date Filters
- Fastest analysis performance
- Works offline
- No authentication required
**To use:** Click "Use SQLite" in the Data Source section Two dropdowns control the time window:
### 2. File Upload | Filter | Options | Effect |
|--------|---------|--------|
| **Treatment Initiated** | All years, Last 2 years, Last 1 year | When patients started treatment |
| **Last Seen** | Last 6 months, Last 12 months | Most recent activity window |
Upload CSV or Parquet files directly. The default is "All years / Last 6 months" — showing all patients who have been active in the last 6 months.
**Supported formats:** ### Drug and Trust Selection
- CSV files (.csv)
- Apache Parquet files (.parquet, .pq)
**To use:** Open the drawer (right panel) by clicking "Drug Selection" or "Trust Selection" in the sidebar:
1. Drag and drop a file, or click the upload area
2. Wait for the file to process
3. Click "Use File" to select it as your data source
### 3. Snowflake - **Drug chips**: Click to select/deselect specific drugs. Selected drugs filter the chart.
- **Trust chips**: Click to select/deselect specific NHS trusts.
- **Clear All Filters**: Button at the bottom resets all drug and trust selections.
Query live data from the NHS data warehouse. **No selections = show everything.** Leaving chips unselected is the same as selecting all.
**Requirements:**
- Snowflake must be configured (see `config/snowflake.toml`)
- Browser-based NHS SSO authentication
**To use:** Click "Use Snowflake" - you'll be prompted to authenticate via your browser
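The "no selections = show everything" rule can be illustrated with a short sketch. The function `apply_selection` and the row keys `drug` and `trust` are hypothetical names chosen for this example; the application's actual filtering code may be structured differently.

```python
def apply_selection(rows, selected_drugs=(), selected_trusts=()):
    """Filter rows; an empty selection places no restriction on that field."""
    drugs = set(selected_drugs)
    trusts = set(selected_trusts)
    return [
        row
        for row in rows
        # An empty set means "no filter", so every row passes that check.
        if (not drugs or row["drug"] in drugs)
        and (not trusts or row["trust"] in trusts)
    ]
```

Treating an empty selection as "all" means users do not need to tick every chip to see the full dataset.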
--- ---
## Configuring Analysis Filters ## Using the Drug Browser
The Home page provides several filter options: The drawer contains three sections:
### Date Range ### All Drugs
A flat list of all 42 available drugs as selectable chips. Click one or more to filter the chart to those drugs only.
| Field | Description | ### Trusts
|-------|-------------| A list of 7 NHS trusts as selectable chips. Click to filter by specific organizations.
| **Start Date** | Include patients initiated from this date onwards |
| **End Date** | Include patients initiated until this date |
| **Last Seen After** | Only include patients with activity after this date (excludes patients who haven't been seen recently) |
**Tip:** The default range is the last 12 months. ### By Directorate
An accordion browser organized by clinical directorate:
### Minimum Patients 1. Click a **directorate** (e.g., "CARDIOLOGY") to expand it
2. Inside, click an **indication** (e.g., "heart failure") to expand further
3. Each indication shows **drug fragment badges** (e.g., "SACUBITRIL", "IVABRADINE")
4. Clicking a drug fragment badge selects all full drug names that contain that fragment
Filter out pathways with fewer patients than the threshold you set. For example, clicking the "ADALIMUMAB" badge would select "ADALIMUMAB" in the drug chips above.
- Use the slider for quick adjustment (0-100) ### Fragment Matching
- Or type a specific number in the text field
- Set to 0 to show all pathways regardless of patient count
### Custom Title Drug fragments are substrings, not exact matches. The fragment "INHALED" would match drugs like "INHALED BECLOMETASONE" and "INHALED FLUTICASONE".
Override the automatically generated chart title with your own text. Clicking a fragment toggles its matching drugs:
- **First click**: Selects all matching drugs
- Leave empty to use the default title: "Patients initiated [start date] to [end date]" - **Second click**: Deselects all matching drugs (if all were already selected)
- Useful for specific reports or presentations
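The fragment toggle described in the new text (substring match, select on first click, deselect when every match is already selected) might look like the sketch below. The name `toggle_fragment` and the case-insensitive comparison are assumptions made for illustration, not confirmed details of the Dash callback.

```python
def toggle_fragment(fragment, all_drugs, selected):
    """Return the new set of selected drugs after clicking a fragment badge."""
    selected = set(selected)
    # A fragment matches any drug name that contains it as a substring.
    matches = {drug for drug in all_drugs if fragment.upper() in drug.upper()}
    if matches and matches <= selected:
        # Every matching drug is already selected: the click deselects them.
        return selected - matches
    # Otherwise the click selects all matching drugs.
    return selected | matches
```

With this rule, a partially selected group is completed on the next click rather than cleared, which matches the "if all were already selected" condition above.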
---
## Selecting Drugs, Trusts, and Directories
Each selection page works the same way:
### Navigation
1. Click "Drug Selection", "Trust Selection", or "Directory Selection" in the sidebar
2. The page shows all available options with checkboxes
### Search
Type in the search box to filter the list. The list updates as you type.
### Selection Actions
| Button | Action |
|--------|--------|
| **Select All** | Check all visible items |
| **Clear All** | Uncheck all items |
| **Select Defaults** | (Drugs only) Select pre-configured default drugs (Include=1 in include.csv) |
### Selection Behavior
- **No items selected** = Include ALL items in analysis
- **Some items selected** = Include ONLY the selected items
This means leaving a filter empty is equivalent to "select all".
---
## Running the Analysis
### Steps
1. Ensure your data source is selected and configured
2. Set your date range and other filters
3. Select desired drugs, trusts, and directories (or leave empty for all)
4. Click the green **Run Analysis** button
### During Analysis
- The button shows a spinner while analysis is running
- Status messages appear below the button
- The interface remains responsive - you can review settings
### After Analysis
- The pathway chart appears in the chart section
- Export buttons become available
- GP indication validation results appear (if Snowflake is connected)
--- ---
## Understanding the Pathway Chart ## Understanding the Pathway Chart
The analysis generates an interactive **icicle chart** showing patient treatment pathways.
### Hierarchy Structure ### Hierarchy Structure
The chart displays a hierarchical structure: The icicle chart displays a hierarchical breakdown:
**Directory view:**
``` ```
N&WICS (Regional Total) Root (Regional Total)
└─ Trust Name (e.g., "Norfolk and Norwich University Hospitals") └─ Trust (e.g., "Norfolk and Norwich University Hospitals")
└─ Directory (e.g., "Rheumatology", "Gastroenterology") └─ Directorate (e.g., "RHEUMATOLOGY")
└─ Drug Name (e.g., "ADALIMUMAB", "INFLIXIMAB") └─ Drug (e.g., "ADALIMUMAB")
└─ Pathway (e.g., "ADALIMUMAB → INFLIXIMAB")
```
**Indication view:**
```
Root (Regional Total)
└─ Trust
└─ GP Diagnosis (e.g., "rheumatoid arthritis")
└─ Drug
└─ Pathway
``` ```
### Reading the Chart ### Reading the Chart
- **Width** of each section indicates relative patient count - **Width** of each section indicates relative patient count
- **Color intensity** indicates proportion of patients at that level - **Color intensity** (NHS blue gradient) indicates proportion of parent group
- **Labels** show the category name and patient count - **Labels** show the name and patient count
### Interacting with the Chart ### Interacting with the Chart
| Action | Effect | | Action | Effect |
|--------|--------| |--------|--------|
| **Click** a section | Zoom in to show details for that branch | | **Click** a section | Zoom in to show details for that branch |
| **Click** the root | Zoom out to show full hierarchy | | **Click** the parent/root | Zoom back out |
| **Hover** over a section | See tooltip with patient count | | **Hover** over a section | See tooltip with patient count, cost, dosing frequency, dates |
| Use the **toolbar** | Reset, download image, pan, zoom |
### Plotly Toolbar ### Hover Tooltip Information
The chart includes a Plotly toolbar (top right) with: When hovering over a chart section, you'll see:
- Patient count and percentage of parent
- **Download as PNG** - Save static image - Total cost and cost per patient
- **Zoom controls** - Zoom in/out - First and last seen dates
- **Pan** - Click and drag to move - Treatment dosing frequency (for drug nodes)
- **Reset** - Return to original view - Cost per patient per annum
--- ---
## Exporting Results ## GP Indication Matching
Two export options are available after running an analysis: When viewing "By Indication" charts, the application uses pre-computed GP diagnosis matches:
### Export HTML ### How It Works
Creates an interactive HTML file that can be opened in any browser. 1. During data refresh, each patient's NHS pseudonym is queried against GP primary care records
2. SNOMED cluster codes map clinical conditions to drug indications
3. The most recent GP diagnosis match is used for each patient
4. ~93% of patients are matched to a GP diagnosis
- **Output**: `data/exports/pathway_chart_[timestamp].html` ### Unmatched Patients
- **Use case**: Sharing interactive charts via email or file share
- **Features**: Full interactivity, no software required to view
### Export CSV Patients without a GP diagnosis match appear under their directorate with a "(no GP dx)" suffix (e.g., "RHEUMATOLOGY (no GP dx)").
Exports the underlying data as a spreadsheet. Reasons for unmatched patients:
- GP is outside the data coverage area
- **Output**: `data/exports/pathway_data_[timestamp].csv` - Diagnosis not yet recorded in GP system
- **Use case**: Further analysis in Excel, importing to other tools - Condition managed only in secondary care
- **Includes**: Patient IDs, drugs, dates, costs, directories, indication validation status - Off-label prescribing
### Export Location
All exports are saved to the `data/exports/` directory with timestamped filenames to prevent overwriting.
---
## GP Indication Validation
When connected to Snowflake, the application validates whether patients have appropriate GP diagnoses for their prescribed drugs.
### What It Does
1. Looks up the drug's licensed indications (e.g., ADALIMUMAB for rheumatoid arthritis)
2. Finds corresponding SNOMED codes for those indications
3. Checks each patient's GP records for matching diagnoses
4. Reports the match rate per drug
### Understanding Results
After analysis, a table shows:
| Column | Meaning |
|--------|---------|
| **Drug Name** | The high-cost drug |
| **Total Patients** | Number of patients prescribed this drug |
| **With GP Indication** | Patients with matching GP diagnosis |
| **Match Rate** | Percentage with valid indication |
### Match Rate Interpretation
| Rate | Meaning | Color |
|------|---------|-------|
| **80%+** | Good coverage - most patients have GP diagnoses | Green |
| **50-79%** | Moderate coverage - investigate missing cases | Orange |
| **<50%** | Low coverage - may indicate data quality issues or off-label use | Red |
### Why Rates May Be Low
Low match rates don't necessarily indicate problems:
- **Cross-provider treatment**: Patient's GP is outside the data coverage
- **Recent diagnoses**: Diagnosis not yet recorded in GP system
- **Specialist-only conditions**: Some conditions are only managed in secondary care
- **Off-label prescribing**: Legitimate use for indications not in the mapping
### Enabling/Disabling
Indication validation is enabled by default when Snowflake is connected. It requires:
- Active Snowflake connection
- Drug-to-cluster mappings in the database
---
## Keyboard Navigation and Accessibility
The application is designed to be accessible:
### Skip Link
Press **Tab** when the page loads to reveal a "Skip to main content" link that bypasses navigation.
### Keyboard Navigation
| Key | Action |
|-----|--------|
| **Tab** | Move to next interactive element |
| **Shift+Tab** | Move to previous element |
| **Enter** | Activate buttons, links, checkboxes |
| **Space** | Toggle checkboxes |
| **Arrow keys** | Adjust sliders |
### Screen Reader Support
- All buttons and inputs have descriptive labels
- Status messages announce via ARIA live regions
- Charts include figure descriptions
### Theme Toggle
A dark/light mode toggle is available at the bottom of the sidebar for visual preference.
--- ---
## Troubleshooting ## Troubleshooting
### "No data available" Error ### No data showing
**Cause**: No data matches your current filter settings 1. Check the filter bar — are filters too restrictive?
2. Try clearing all drug/trust selections in the drawer
3. Widen the date range (e.g., "All years / Last 12 months")
**Solutions:** ### Chart shows "No matching pathways found"
1. Check your date range - is it too narrow?
2. Verify your data source has data loaded
3. Check if selected trusts/drugs have any matching records
4. Try clearing all selections (to include everything)
### Chart Not Displaying The current filter combination matches zero patients. Adjust filters or click "Clear All Filters" in the drawer.
**Cause**: Analysis completed but no data met the minimum patients threshold ### App won't start
**Solutions:** ```bash
1. Lower the minimum patients threshold # Ensure dependencies are installed
2. Expand your date range uv sync
3. Select more drugs or trusts
### Snowflake Connection Failed # Ensure src/ is on Python path
uv run python setup_dev.py
**Cause**: Unable to connect to Snowflake # Run with uv
uv run python run_dash.py
```
**Solutions:** ### Stale data
1. Check that `config/snowflake.toml` exists and is configured
2. Complete browser authentication when prompted
3. Verify your network allows Snowflake connections
4. Try using SQLite as an alternative data source
### File Upload Failed Data is as fresh as the last CLI refresh. Check the header's "Last updated" indicator. To refresh:
**Cause**: File format or content issue ```bash
python -m cli.refresh_pathways --chart-type all
**Solutions:** ```
1. Ensure file is CSV or Parquet format
2. Check file isn't corrupted or empty
3. Verify file contains required columns
4. Try a smaller file to test
### Slow Performance
**Cause**: Large data volume or complex filtering
**Solutions:**
1. Use SQLite instead of file upload for large datasets
2. Narrow your date range
3. Select fewer drugs/trusts to analyze
4. Increase minimum patients threshold to reduce chart complexity
### Reference Data Not Loading
**Cause**: Missing or corrupted reference files
**Solutions:**
1. Click "Load Reference Data" to retry
2. Check that `data/` directory contains required CSV files:
- `include.csv`
- `defaultTrusts.csv`
- `directory_list.csv`
3. Verify files aren't empty or malformed
--- ---
@@ -397,7 +251,7 @@ A dark/light mode toggle is available at the bottom of the sidebar for visual pr
If you encounter issues not covered in this guide: If you encounter issues not covered in this guide:
1. Check the [README](../README.md) for installation and setup information 1. Check the [README](../README.md) for installation and setup
2. Review [DEPLOYMENT.md](./DEPLOYMENT.md) for server configuration 2. Review [DEPLOYMENT.md](./DEPLOYMENT.md) for server configuration
3. Consult [CLAUDE.md](../CLAUDE.md) for technical architecture details 3. Consult [CLAUDE.md](../CLAUDE.md) for technical architecture details
4. Contact your local support team for NHS-specific questions 4. Contact the Medicines Intelligence team for NHS-specific questions
+122 -116
@@ -5,129 +5,145 @@ If you discover a new failure pattern during your work, add it to this file.
--- ---
## Drug-Indication Matching Guardrails ## Backend Isolation
### Match drugs to indications, not just patients to indications ### Do NOT modify pipeline/analysis logic in src/
- **When**: Building the indication mapping for pathway charts - **When**: Building Dash integration
- **Rule**: Each drug must be validated against BOTH the patient's GP diagnoses AND the drug-to-indication mapping from DimSearchTerm.csv. A patient being diagnosed with rheumatoid arthritis does NOT mean all their drugs are for rheumatoid arthritis. - **Rule**: Do NOT change the logic in these files — they are the data pipeline and must stay as-is:
- **Why**: The previous approach assigned ONE indication per patient (most recent GP dx), ignoring which drugs actually treat which conditions. This produced misleading pathways. - `data_processing/pathway_pipeline.py`, `transforms.py`, `diagnosis_lookup.py` (matching/query logic)
- `analysis/pathway_analyzer.py`, `statistics.py`
- `cli/refresh_pathways.py`
- `data_processing/schema.py`, `reference_data.py`, `cache.py`, `data_source.py`
- **Why**: The pipeline is complete and tested. Changing it risks breaking the data refresh workflow.
### Use DimSearchTerm.csv for drug-to-Search_Term mapping ### DO use shared utilities in src/ rather than duplicating
- **When**: Determining which Search_Term a drug belongs to - **When**: The Dash app needs data loading or figure construction
- **Rule**: Load `data/DimSearchTerm.csv`. The `CleanedDrugName` column has pipe-separated drug name fragments. Match HCD drug names against these fragments using substring matching (case-insensitive). - **Rule**: Dash callbacks should CALL INTO `src/`, not duplicate the code. Shared functions:
- **Why**: This CSV is the authoritative mapping of which drugs are used for which clinical indications. - `data_processing/pathway_queries.py``load_initial_data()` and `load_pathway_nodes()` for all SQLite queries
- `visualization/plotly_generator.py``create_icicle_from_nodes()` for icicle chart from list-of-dicts
- `dash_app/data/queries.py` — thin wrapper that resolves DB path and delegates to shared functions
- **Why**: Duplicating SQL queries and figure logic creates copies that drift apart. Shared code in `src/` is the cleaner architecture.
### Use substring matching for drug fragments ### Do NOT modify pathways.db schema or data
- **When**: Matching HCD drug names against DimSearchTerm CleanedDrugName fragments - **When**: Querying the database from Dash callbacks
- **Rule**: Check if any fragment from DimSearchTerm is a SUBSTRING of the HCD drug name (case-insensitive). E.g., "PEGYLATED" should match "PEGYLATED LIPOSOMAL DOXORUBICIN". - **Rule**: Read-only access. Use `sqlite3.connect(db_path)` with SELECT queries only. Never INSERT, UPDATE, DELETE, or ALTER.
- **Why**: DimSearchTerm contains both full drug names (ADALIMUMAB) and partial fragments (PEGYLATED, INHALED). Exact match would miss the partial ones. - **Why**: pathways.db is populated by `python -m cli.refresh_pathways`. The Dash app is a read-only consumer.
### Modified UPID uses pipe delimiter
- **When**: Creating indication-aware UPIDs
- **Rule**: Format is `{original_UPID}|{search_term}`. Use pipe `|` as delimiter. Do NOT use ` - ` (hyphen with spaces) as that's used for pathway hierarchy levels in the `ids` column.
- **Why**: The `ids` column uses " - " to separate hierarchy levels (e.g., "N&WICS - NNUH - rheumatoid arthritis - ADALIMUMAB"). Using the same delimiter in UPIDs would break hierarchy parsing.
### Return ALL GP matches per patient, not just most recent
- **When**: Querying Snowflake for patient GP diagnoses
- **Rule**: Remove `QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY EventDateTime DESC) = 1`. Return ALL matching Search_Terms per patient with `GROUP BY + COUNT(*)` for code_frequency.
- **Why**: A patient may have GP diagnoses for both rheumatoid arthritis AND asthma. We need ALL matches to cross-reference with their drugs.
### Restrict GP code lookup to HCD data window
- **When**: Building the WHERE clause for the GP record query
- **Rule**: Add `AND pc."EventDateTime" >= :earliest_hcd_date` where `earliest_hcd_date` is `MIN(Intervention Date)` from the HCD DataFrame. Pass this as a parameter to `get_patient_indication_groups()`.
- **Why**: Old GP codes from years before treatment started add noise. A diagnosis coded 10 years ago may no longer be relevant. Restricting to the HCD window ensures code_frequency reflects recent clinical activity for the conditions being actively treated.
### Tiebreaker: highest GP code frequency when a drug matches multiple indications
- **When**: A single drug maps to multiple Search_Terms AND the patient has GP dx for multiple
- **Rule**: Use `code_frequency` (COUNT of matching SNOMED codes per Search_Term per patient) from the GP query. The Search_Term with the most matching codes in the patient's GP record wins. If tied, use alphabetical Search_Term for determinism.
- **Why**: E.g., ADALIMUMAB is listed under rheumatoid arthritis, crohn's disease, psoriatic arthritis, etc. A patient with 47 RA codes and 2 crohn's codes is almost certainly on ADALIMUMAB for RA. Frequency of GP coding is a much stronger signal of clinical intent than recency — a recent one-off asthma check doesn't mean ADALIMUMAB is for asthma.
### Same patient, different indications = separate modified UPIDs
- **When**: A patient's drugs map to different Search_Terms
- **Rule**: Create separate modified UPIDs for each indication. E.g., `RMV12345|rheumatoid arthritis` and `RMV12345|asthma`. These are treated as separate "patients" by the pathway analyzer.
- **Why**: This is the core design — drugs for different indications should create separate treatment pathways, even for the same physical patient.
### Fallback to directory for unmatched drugs
- **When**: A drug doesn't match any Search_Term OR the patient has no GP dx for any of the drug's Search_Terms
- **Rule**: Use fallback format: `{UPID}|{Directory} (no GP dx)`. The indication_df maps this to `"{Directory} (no GP dx)"`.
- **Why**: Maintains consistent behavior with the previous approach for patients/drugs without GP diagnosis matches.
### Merge asthma Search_Terms but keep urticaria separate
- **When**: Working with asthma-related Search_Terms from CLUSTER_MAPPING_SQL or DimSearchTerm.csv
- **Rule**: Merge "allergic asthma", "asthma", and "severe persistent allergic asthma" into a single "asthma" Search_Term. Keep "urticaria" as a separate Search_Term — do NOT merge it with asthma.
- **Why**: These are clinically the same condition at different severity levels. Splitting them fragments the data. Urticaria is a distinct dermatological condition that happens to share OMALIZUMAB.
### Don't modify directory chart processing
- **When**: Making changes to the indication matching logic
- **Rule**: Only modify the indication chart path (`elif current_chart_type == "indication":`). Directory charts use unmodified UPIDs and directory-based grouping.
- **Why**: Directory charts work correctly and should not be affected by indication matching changes.
--- ---
## Snowflake Query Guardrails ## CSS & Design Fidelity
### Use PseudoNHSNoLinked for GP record matching ### Use className matching 01_nhs_classic.html, not inline styles
- **When**: Querying GP records (PrimaryCareClinicalCoding) for patient diagnoses - **When**: Building any Dash HTML component
- **Rule**: Use `PseudoNHSNoLinked` column from HCD data, NOT `PersonKey` (LocalPatientID) - **Rule**: Use `className="css-class-name"` referencing classes from `dash_app/assets/nhs.css`. Do NOT use inline `style={}` dicts for layout/visual styling. Only use inline styles for truly dynamic values (e.g., `style={"flex": patient_count}` for proportional widths).
- **Why**: PersonKey is provider-specific local ID. Only PseudoNHSNoLinked matches PatientPseudonym in GP records. - **Why**: CSS fidelity to the HTML concept is a primary goal. Inline styles drift from the design and are harder to maintain.
### Embed cluster query as CTE in Snowflake ### nhs.css is the single source of CSS truth
- **When**: Looking up patient indications during data refresh - **When**: Adding or modifying styles
- **Rule**: Use the `CLUSTER_MAPPING_SQL` content as a WITH clause in the patient lookup query - **Rule**: All styles go in `dash_app/assets/nhs.css`. If the concept HTML doesn't have a class for something, add it to nhs.css with the same naming convention (`.component__element--modifier`).
- **Why**: This ensures we always use the complete cluster mapping and don't need local storage - **Why**: Dash auto-serves files from `assets/`. Keeping CSS in one file matches the design source (01_nhs_classic.html) and avoids style fragmentation.
### Quote mixed-case column aliases in Snowflake SQL ### Read 01_nhs_classic.html when building UI components
- **When**: Writing SELECT queries that return results to Python code - **When**: Creating any component in `dash_app/components/`
- **Rule**: Use `AS "ColumnName"` (quoted) for any column alias you'll access by name in Python - **Rule**: Read `01_nhs_classic.html` first to see the exact HTML structure, CSS classes, and element hierarchy for that component. Match it as closely as possible.
- **Why**: Snowflake uppercases unquoted identifiers. `SELECT foo AS Search_Term` returns `SEARCH_TERM`, so `row.get('Search_Term')` returns None. Fix: `SELECT foo AS "Search_Term"` - **Why**: The HTML concept IS the design spec. Deviating creates visual inconsistency.
### Build indication_df from all unique UPIDs, not PseudoNHSNoLinked
- **When**: Creating the indication mapping DataFrame for pathway processing
- **Rule**: Use `df.drop_duplicates(subset=['UPID'])` not `drop_duplicates(subset=['PseudoNHSNoLinked'])`
- **Why**: A patient visiting multiple providers has multiple UPIDs. Using unique PseudoNHSNoLinked only maps one UPID per patient, leaving others as NaN.
--- ---
## Data Processing Guardrails ## Callback Architecture
### Copy DataFrames in functions that modify columns ### No circular callback dependencies
- **When**: Writing functions like `prepare_data()` that modify DataFrame columns - **When**: Writing Dash callbacks
- **Rule**: Always `df = df.copy()` at the start of any function that modifies column values on the input DataFrame - **Rule**: Callbacks must flow unidirectionally: filter inputs → `app-state` store → `chart-data` store → UI components. Never have a component that is both Input and Output in the same callback chain without an intermediate store.
- **Why**: `prepare_data()` mapped Provider Code → Name in-place. When called multiple times on the same DataFrame, only the first call worked. The fix: `df.copy()` prevents destructive mutation. - **Why**: Dash reports circular callback dependencies as errors, and they're extremely hard to debug.
### Include chart_type in UNIQUE constraints for pathway_nodes ### Use dcc.Store for all state, not server-side globals
- **When**: Creating or modifying the pathway_nodes table schema - **When**: Managing application state (selected filters, chart data, reference data)
- **Rule**: The UNIQUE constraint MUST include `chart_type`: `UNIQUE(date_filter_id, chart_type, ids)` - **Rule**: ALL state lives in `dcc.Store` components. Never use module-level globals, class variables, or `flask.g` for state. The 3 stores are: `app-state` (session), `chart-data` (memory), `reference-data` (session).
- **Why**: Without `chart_type`, `INSERT OR REPLACE` silently overwrites directory chart nodes when indication chart nodes are inserted. - **Why**: Dash is stateless per request. Server-side state breaks with multiple users and causes subtle bugs during development.
### Handle NaN in Directory when building fallback labels ### Use callback_context for multi-input callbacks
- **When**: Creating fallback indication labels for patients without GP diagnosis match - **When**: A callback has multiple Inputs and needs to know which one triggered it
- **Rule**: Check `pd.notna(directory)` before concatenating to string. Use `"UNKNOWN (no GP dx)"` for NaN cases. - **Rule**: Use `dash.callback_context.triggered` (or `ctx.triggered_id` in Dash 2.4+) to determine the triggering input.
- **Why**: NaN handling prevents TypeError and ensures meaningful fallback labels. - **Why**: Without this, the callback runs for every input change and you can't distinguish which filter changed.
### Use parameterized queries for SQLite ### Pattern-matching callbacks for dynamic drug chips
- **When**: Building WHERE clauses with user-selected filters - **When**: Building the card browser drawer with clickable drug chips
- **Rule**: Use `?` placeholders and pass params tuple — never string interpolation - **Rule**: Use `{"type": "drug-chip", "index": drug_name}` pattern for chip IDs. Register callbacks with `Input({"type": "drug-chip", "index": ALL}, "n_clicks")`. Access triggered chip via `ctx.triggered_id["index"]`.
- **Why**: Prevents SQL injection and handles special characters in drug/directory names - **Why**: The number of drug chips is dynamic (changes per directorate/indication). Pattern-matching callbacks handle this without hardcoding IDs.
### Use existing pathway_analyzer functions
- **When**: Processing pathway data for the icicle chart
- **Rule**: Reuse functions from `analysis/pathway_analyzer.py` — don't reinvent
- **Why**: The existing code handles edge cases (empty groups, statistics calculation, color mapping)
--- ---
## Reflex Guardrails ## Plotly Figure
### Use .to() methods for Var operations in rx.foreach ### Preserve create_icicle_from_nodes() in src/visualization/plotly_generator.py
- **When**: Working with items inside `rx.foreach` render functions - **When**: Modifying the icicle chart
- **Rule**: Use `item.to(int)` for numeric comparisons, `item.to_string()` for text operations - **Rule**: `create_icicle_from_nodes(nodes, title)` in `src/visualization/plotly_generator.py` is the shared icicle chart function. It accepts list-of-dicts from dcc.Store. Key properties:
- **Why**: Items from rx.foreach are Var objects, not plain Python values. - 10-field customdata structure (value, colour, cost, costpp, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa)
- NHS colorscale: `[[0.0, "#003087"], [0.25, "#0066CC"], [0.5, "#1E88E5"], [0.75, "#4FC3F7"], [1.0, "#E3F2FD"]]`
- `maxdepth=3`, `branchvalues="total"`, `sort=False`
- Layout: transparent background, reduced margins, autosize
- **Why**: The icicle chart is tested and correct. The Dash callback in `dash_app/callbacks/chart.py` calls this function.
### Use rx.cond for conditional rendering, not Python if ### Chart data is a list of dicts
- **When**: Conditionally showing/hiding components or changing styles based on state - **When**: Passing data between `chart-data` store and chart callback
- **Rule**: Use `rx.cond(condition, true_component, false_component)` — not Python `if` - **Rule**: `chart-data` store holds `{"nodes": [...], "unique_patients": int, "total_drugs": int, "total_cost": float}`. Each node is a dict with keys matching the SQLite columns needed for the figure: `parents, ids, labels, value, cost, costpp, colour, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa`.
- **Why**: Python `if` evaluates at definition time; `rx.cond` evaluates reactively at render time - **Why**: `dcc.Store` serializes to JSON. Keep the same dict structure that `pathways_app.py` uses for `chart_data` so the figure callback works identically.
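The payload shape described in this rule can be checked before it goes into the store. The following is a minimal sketch, not code from the repository: `validate_chart_data` and `NODE_KEYS` are hypothetical names, and the sample values are illustrative only. It round-trips the payload through JSON, which is roughly what `dcc.Store` does, and confirms that the expected keys survive.

```python
import json

# Node keys as listed in the rule above.
NODE_KEYS = (
    "parents", "ids", "labels", "value", "cost", "costpp", "colour",
    "first_seen", "last_seen", "first_seen_parent", "last_seen_parent",
    "average_spacing", "cost_pp_pa",
)
TOP_LEVEL_KEYS = ("nodes", "unique_patients", "total_drugs", "total_cost")


def validate_chart_data(chart_data: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape is as expected."""
    problems = [f"missing top-level key: {k}" for k in TOP_LEVEL_KEYS if k not in chart_data]
    for i, node in enumerate(chart_data.get("nodes", [])):
        missing = [k for k in NODE_KEYS if k not in node]
        if missing:
            problems.append(f"node {i} missing: {', '.join(missing)}")
    return problems


# Illustrative payload (hypothetical values, not real data).
payload = {
    "nodes": [dict.fromkeys(NODE_KEYS, "")],
    "unique_patients": 0,
    "total_drugs": 0,
    "total_cost": 0.0,
}
round_tripped = json.loads(json.dumps(payload))  # mimics JSON serialization in the store
print(validate_chart_data(round_tripped))
```

A check like this could run in the callback that writes `chart-data`, so a missing column surfaces there rather than inside the figure code.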
---
## Data Extraction
### Keep data logic in shared src/ functions, not dash_app/ duplicates
- **When**: Adding or modifying data loading functions
- **Rule**: SQL queries and data logic live in `src/data_processing/pathway_queries.py`. The `dash_app/data/queries.py` is a thin wrapper that resolves the DB path and delegates. Do not duplicate queries in `dash_app/`.
- **Why**: Shared code in `src/` prevents query drift and keeps the single source of truth for data access.
### DimSearchTerm.csv fragments are substrings
- **When**: Building the card browser or matching drugs to indications
- **Rule**: `CleanedDrugName` values in DimSearchTerm.csv are drug name FRAGMENTS (e.g., "ADALIMUMAB", "PEGYLATED", "INHALED"). They're matched against full drug names using a case-insensitive substring test (`fragment.upper() in drug_name.upper()`). Don't assume exact match.
- **Why**: Some fragments are partial (INHALED matches "INHALED BECLOMETASONE", "INHALED FLUTICASONE", etc.).
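The substring rule above can be sketched in plain Python. This is an illustration under assumptions, not the project's implementation: `matching_fragments` is a hypothetical helper, and the sample cell assumes that `CleanedDrugName` holds pipe-separated fragments.

```python
def matching_fragments(drug_name: str, fragments: list[str]) -> list[str]:
    """Return the fragments that occur as substrings of drug_name (case-insensitive)."""
    name = drug_name.upper()
    return [f for f in fragments if f.strip() and f.strip().upper() in name]


# Hypothetical CleanedDrugName cell, assumed to be pipe-separated.
cleaned_drug_name = "ADALIMUMAB|PEGYLATED|INHALED"
fragments = cleaned_drug_name.split("|")

print(matching_fragments("Inhaled Beclometasone", fragments))
print(matching_fragments("PEGYLATED LIPOSOMAL DOXORUBICIN", fragments))
```

Empty fragments are skipped on purpose: an empty string is a substring of every drug name and would otherwise match everything.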
### Apply SEARCH_TERM_MERGE_MAP when loading DimSearchTerm.csv
- **When**: Building the directorate tree in `card_browser.py`
- **Rule**: Import and apply `SEARCH_TERM_MERGE_MAP` from `data_processing.diagnosis_lookup` to normalize "allergic asthma" → "asthma" and "severe persistent allergic asthma" → "asthma". Keep "urticaria" separate.
- **Why**: The Snowflake query and pathway processing already use merged Search_Terms. The card browser must match.
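A minimal sketch of applying the merge map follows. The dictionary below is a stand-in written for this example, assuming only the two mappings named in the rule; the real `SEARCH_TERM_MERGE_MAP` should be imported from `data_processing.diagnosis_lookup`, and `normalize_search_term` is a hypothetical helper.

```python
# Stand-in for illustration only; import the real map from
# data_processing.diagnosis_lookup in application code.
SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def normalize_search_term(term: str) -> str:
    """Map a raw Search_Term to its merged form; unmapped terms pass through unchanged."""
    return SEARCH_TERM_MERGE_MAP.get(term, term)


for raw in ("allergic asthma", "severe persistent allergic asthma", "urticaria"):
    print(raw, "->", normalize_search_term(raw))
```

Using `dict.get` with the term itself as the default keeps "urticaria" and every other unmapped term separate without listing them.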
---
## SQLite Queries
### Use parameterized queries for all filters
- **When**: Building WHERE clauses with user-selected values
- **Rule**: Use `?` placeholders and pass params as a list. Never use f-strings or string interpolation for filter values.
- **Why**: Prevents SQL injection and handles special characters in drug/directory names (e.g., "CROHN'S DISEASE").
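As a sketch of this rule, the example below builds an `IN (...)` clause with one `?` per selected value and passes the values as a list. `fetch_labels` is a hypothetical helper, and the throwaway in-memory table only borrows the `pathway_nodes` and `labels` names for illustration; it does not reflect the real schema, and nothing here touches `pathways.db`.

```python
import sqlite3


def fetch_labels(conn: sqlite3.Connection, selected: list[str]) -> list[str]:
    """Return labels matching the selection, using only ? placeholders for values."""
    if not selected:
        return []
    # Only the placeholder count is interpolated, never the filter values themselves.
    placeholders = ", ".join("?" for _ in selected)
    sql = f"SELECT labels FROM pathway_nodes WHERE labels IN ({placeholders}) ORDER BY labels"
    return [row[0] for row in conn.execute(sql, list(selected))]


# Throwaway in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pathway_nodes (labels TEXT)")
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?)",
    [("CROHN'S DISEASE",), ("ADALIMUMAB",)],
)
print(fetch_labels(conn, ["CROHN'S DISEASE"]))
```

The apostrophe in "CROHN'S DISEASE" needs no escaping because the value never becomes part of the SQL text.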
### Database path resolution
- **When**: Connecting to pathways.db from dash_app/
- **Rule**: Use `Path(__file__).resolve().parents[2] / "data" / "pathways.db"` from files in `dash_app/data/`. This resolves from `dash_app/data/queries.py` → project root → `data/pathways.db`.
- **Why**: Relative paths break depending on the working directory. Absolute path resolution is reliable.
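The path arithmetic in this rule can be checked in isolation. In the sketch below, `resolve_db_path` is a hypothetical helper and `/srv/project` is a made-up project root; in `dash_app/data/queries.py` the argument would be `__file__`.

```python
from pathlib import Path


def resolve_db_path(module_file: str) -> Path:
    """For a file in dash_app/data/, parents[2] is the project root; the DB sits in data/."""
    return Path(module_file).resolve().parents[2] / "data" / "pathways.db"


# Hypothetical location, used only to show which directory each parent refers to.
example = resolve_db_path("/srv/project/dash_app/data/queries.py")
print(example)
```

Here `parents[0]` is `dash_app/data`, `parents[1]` is `dash_app`, and `parents[2]` is the project root, so the result does not depend on the working directory.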
---
## Dash Framework
### Wrap layout in dmc.MantineProvider
- **When**: Setting up the app layout in `app.py`
- **Rule**: The outermost layout element must be `dmc.MantineProvider(children=[...])`. Without this, DMC components (Drawer, Accordion, Chip, etc.) won't render.
- **Why**: Dash Mantine Components requires the Provider context to function.
### dcc.Store storage_type matters
- **When**: Creating the 3 store components
- **Rule**:
- `app-state`: `storage_type="session"` — persists across page refreshes within a tab
- `chart-data`: `storage_type="memory"` — cleared on page refresh (reloaded from SQLite)
- `reference-data`: `storage_type="session"` — loaded once, persists across refreshes
- **Why**: The wrong storage type causes stale-data bugs (session persists when it shouldn't) or wasted queries (memory clears too often).
### Dash assets directory is auto-served
- **When**: Placing CSS, JS, or images
- **Rule**: Put static assets in `dash_app/assets/`. Dash serves them automatically. Reference CSS via `className`, not `<link>` tags.
- **Why**: Dash's asset pipeline handles caching and serving. Manual `<link>` tags are unnecessary and may not work.
--- ---
@@ -148,20 +164,10 @@ If you discover a new failure pattern during your work, add it to this file.
- **Rule**: The "Next iteration should" section must contain specific, actionable guidance - **Rule**: The "Next iteration should" section must contain specific, actionable guidance
- **Why**: The next iteration has zero memory. If you don't write it down, it's lost. - **Why**: The next iteration has zero memory. If you don't write it down, it's lost.
### Check existing code for patterns ### Validate with `python run_dash.py`
- **When**: Unsure how to implement something - **When**: After completing any task
- **Rule**: Look at `pathways_app/pathways_app.py`, `analysis/pathway_analyzer.py`, `cli/refresh_pathways.py` - **Rule**: Run `python run_dash.py` (or `python -c "from dash_app.app import app"` for import checks). The app must start without errors after EVERY task.
- **Why**: The existing codebase has solved many quirks already - **Why**: Broken imports or circular dependencies compound across tasks. Catch them immediately.
### Snowflake connection_timeout must be high enough for GP lookup queries
- **When**: GP record queries against PrimaryCareClinicalCoding time out
- **Rule**: Ensure `connection_timeout` in config/snowflake.toml is at least 600 (currently set to 600). This controls the Python client's `network_timeout`, which is how long the client waits for ANY Snowflake response. Do NOT lower this value.
- **Why**: GP lookup queries take ~40s per batch due to CTE compilation overhead. With connection_timeout=30, every batch timed out silently (error 000604/57014).
### Use large batch sizes (5000+) for GP record lookups
- **When**: Calling `get_patient_indication_groups()` with patient batches
- **Rule**: Use batch_size=5000 or larger. The query time is ~40s regardless of batch size (5 patients ≈ 500 patients ≈ 5000 patients). Smaller batches just multiply the fixed overhead.
- **Why**: With batch_size=500, 36K patients needed 74 batches × 40s = ~50 min. With batch_size=5000, only 8 batches × 45s = ~6 min. The bottleneck is CTE compilation, not data volume.
<!-- <!--
ADD NEW GUARDRAILS BELOW as failures are observed during the loop. ADD NEW GUARDRAILS BELOW as failures are observed during the loop.
+37 -2
@@ -885,11 +885,46 @@ Migrating the HCD Analysis frontend from Reflex to Dash (Plotly) + Dash Mantine
### Blocked items (iter 16): ### Blocked items (iter 16):
- None - None
## MIGRATION COMPLETE ## Iteration 17 — 2026-02-06
All 20 tasks across 5 phases complete. 16 iterations total. ### Task: Phase 6 — Update all documentation
### Why this task:
- Phases 0-5 complete. Phase 6 (documentation cleanup) was the only remaining work.
- All non-archive .md files still referenced Reflex (commands, architecture, deployment).
### Status: COMPLETE
### What was done:
- **README.md**: Complete rewrite — replaced `reflex run` with `python run_dash.py`, updated features list (dual charts, pre-computed pathways, drug browser), updated project structure (dash_app/ instead of pathways_app/), updated quick start, usage, troubleshooting
- **docs/USER_GUIDE.md**: Complete rewrite — new single-page Dash UI guide covering header, sidebar, KPI row, filter bar, chart card, drawer, drug browser, indication matching
- **docs/DEPLOYMENT.md**: Complete rewrite — Dash/Gunicorn deployment replacing Reflex/FastAPI/Next.js, Docker examples, nginx config, systemd service, single port 8050
- **docs/DESIGN_SYSTEM.md**: Updated "Reflex Implementation" section → "Dash Implementation" with correct CSS/component references
- **RALPH_PROMPT.md**: Updated title ("Dash Application Maintenance" not "Reflex → Dash Migration"), updated data reference section to point to src/ shared functions instead of pathways_app/
- **guardrails.md**: Updated 3 rules to reflect current Dash architecture (shared utilities exist, icicle function exists, data queries exist) instead of migration instructions
- **IMPLEMENTATION_PLAN.md**: Phase 6 tasks marked [x]
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` — OK, 7 callbacks registered
- Grep for Reflex in non-archive .md files: only CLAUDE.md line 140 (archive description — accurate) and IMPLEMENTATION_PLAN.md (historical migration log — accurate)
### Files changed:
- README.md — Rewritten for Dash
- docs/USER_GUIDE.md — Rewritten for Dash
- docs/DEPLOYMENT.md — Rewritten for Dash
- docs/DESIGN_SYSTEM.md — Updated implementation section
- RALPH_PROMPT.md — Updated title and references
- guardrails.md — Updated 3 rules
- IMPLEMENTATION_PLAN.md — Phase 6 marked [x]
- progress.txt — This entry
### Patterns discovered:
- archive/ files (IMPROVEMENT_RECOMMENDATIONS.md) retain Reflex references intentionally — they're historical
- IMPLEMENTATION_PLAN.md retains Reflex references in completed task descriptions — these are accurate migration history
### Next iteration should:
- Phase 6 is complete. All tasks across all phases are now [x].
### Blocked items:
- None
## ALL PHASES COMPLETE
All 24 tasks across 6 phases complete. 17 iterations total.
- Phase 0: Scaffolding (2 tasks) — iteration 1 - Phase 0: Scaffolding (2 tasks) — iteration 1
- Phase 1: Data Access (2 tasks) — iterations 2-3 - Phase 1: Data Access (2 tasks) — iterations 2-3
- Phase 2: Static Layout (3 tasks) — iterations 4-6 - Phase 2: Static Layout (3 tasks) — iterations 4-6
- Phase 3: Core Callbacks (4 tasks) — iterations 7-10 - Phase 3: Core Callbacks (4 tasks) — iterations 7-10
- Phase 4: Drawer (2 tasks) — iterations 11-12 - Phase 4: Drawer (2 tasks) — iterations 11-12
- Phase 5: Polish & Cleanup (4 tasks) — iterations 13-16 - Phase 5: Polish & Cleanup (4 tasks) — iterations 13-16
- Phase 6: Documentation (4 tasks) — iteration 17