docs: update all documentation for Dash migration (Phase 6)

Rewrote README.md, USER_GUIDE.md, and DEPLOYMENT.md to reflect
the Dash application. Updated RALPH_PROMPT.md, guardrails.md, and
DESIGN_SYSTEM.md to remove Reflex references. All non-archive
documentation now reflects the current Dash + DMC architecture.
This commit is contained in:
Andrew Charlwood
2026-02-06 14:54:12 +00:00
parent 4cb5641c2d
commit 54b4a0f743
8 changed files with 635 additions and 956 deletions
+8
View File
@@ -256,6 +256,14 @@ Drawer selection → update_drug_selection → app-state store → load_pathway_
- [x] Verify: No Reflex imports anywhere in `dash_app/`
- **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated
## Phase 6: Update all documentation
- [x] Remove `reflex` references from all documentation
- [x] Verify: No Reflex mentions of reflex in any md files (archive/ excluded — historical)
- [x] Add documentation in readme re how to run dash app
- [x] Update all claude.md files (CLAUDE.md was updated in Task 5.4)
- **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated
---
## Completion Criteria
+106 -113
View File
@@ -1,128 +1,124 @@
# Ralph Wiggum Loop - Drug-Aware Indication Matching
# Ralph Wiggum Loop Dash Application Maintenance
You are operating inside an automated loop extending a pathway analysis application with drug-aware indication matching. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem.
You are operating inside an automated loop maintaining an NHS patient pathway analysis tool built with Dash (Plotly) + Dash Mantine Components. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem.
**Current Focus**: Update indication charts so that patient indications are matched **per drug**, not just per patient. Each drug must be validated against the patient's GP diagnoses AND the drug-to-indication mapping from DimSearchTerm.csv.
**Current Focus**: Maintain and enhance the Dash application in `dash_app/`. The backend (`src/`) provides shared data access and visualization functions. The design target is `01_nhs_classic.html`.
## First Actions Every Iteration
Read these files in this order before doing anything else:
1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next. The most recent entry is most important.
2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, project overview, and completion criteria.
1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next.
2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, architecture overview, and completion criteria.
3. `guardrails.md` — Known failure patterns to avoid. You MUST read and follow these.
4. `CLAUDE.md` — Project architecture and code patterns.
4. `CLAUDE.md` — Project architecture and backend code patterns.
Then run `git log --oneline -5` to see recent commits.
## Reading the Design Reference
**When building ANY UI component**, read `01_nhs_classic.html` first:
- It contains the exact CSS classes, HTML structure, and visual layout you must replicate
- CSS lives in the `<style>` block (lines 8-314) — this becomes `dash_app/assets/nhs.css`
- HTML structure (lines 316-480+) shows the component hierarchy and class usage
- Match the design as closely as possible — `className` in Dash = `class` in HTML
**When building data loading or chart callbacks**, reference the shared functions in `src/`:
- `src/data_processing/pathway_queries.py`: `load_initial_data()` and `load_pathway_nodes()` — shared query functions
- `src/visualization/plotly_generator.py`: `create_icicle_from_nodes()` — icicle chart from list-of-dicts
- `dash_app/data/queries.py`: Thin wrapper calling shared functions with correct DB path
- The original logic is archived in `archive/pathways_app/pathways_app.py` for reference.
## Narration
Narrate your work as you go. Your output is the only visibility the operator has into what's happening. For every significant action, explain what you're doing and why:
- **Reading files**: "Reading progress.txt to check what the last iteration accomplished..."
- **Creating code**: "Adding assign_drug_indications() function to diagnosis_lookup.py..."
- **Debugging**: "Drug matching returned 0 results for ADALIMUMAB. Checking DimSearchTerm lookup..."
- **Testing**: "Running import check to verify the new function is accessible..."
- **Making decisions**: "The guardrails say to use substring matching for drug fragments."
- **Committing**: "Committing drug-indication matching logic."
- **Reading files**: "Reading 01_nhs_classic.html to get CSS classes for the header component..."
- **Creating code**: "Creating dash_app/components/header.py with make_header() function..."
- **Debugging**: "Import error for dmc.Drawer — checking dash-mantine-components version..."
- **Testing**: "Running python run_dash.py to verify the app starts..."
- **Making decisions**: "The guardrails say to use className from nhs.css, not inline styles."
- **Committing**: "Committing header and sidebar components."
Do NOT just output a summary at the end. Narrate throughout. Think of this as a live log of your reasoning.
Do NOT just output a summary at the end. Narrate throughout.
## Task Selection
You have flexibility to choose which task to work on. Use your judgement, but document your reasoning.
1. Read ALL tasks in IMPLEMENTATION_PLAN.md — understand the full picture
2. Skip any marked `[x]` (complete) or `[B]` (blocked)
3. Check progress.txt for guidance — the previous iteration may have recommendations
4. **Choose a task** based on:
- Dependencies (some tasks require others to be done first)
- Logical flow (query changes before matching logic, matching before pipeline integration)
- Your assessment of what would be most valuable to tackle next
- Previous iteration's recommendations (consider but don't blindly follow)
5. **Document your reasoning**: Before starting work, briefly explain WHY you chose this task over others
- Dependencies (scaffolding before components, components before callbacks)
- Logical flow (Phase 0 → 1 → 2 → 3 → 4 → 5)
- Previous iteration's recommendations
5. **Document your reasoning**: Before starting, explain WHY you chose this task
6. Mark your chosen task `[~]` (in progress) in IMPLEMENTATION_PLAN.md
If your chosen task turns out to be blocked during work:
- Mark it `[B]` with a reason in IMPLEMENTATION_PLAN.md
If your chosen task is blocked:
- Mark it `[B]` with a reason
- Document the blocker in progress.txt
- Move to a different ready task within this same iteration
- Move to a different ready task
## Development
Work on ONE task per iteration. Build incrementally and verify as you go.
### Key Concepts
### Key Technologies
**Drug-Indication Matching Flow:**
1. Get patient's GP-matched Search_Terms from Snowflake (ALL matches, not just most recent, with code_frequency)
- Only count GP codes from MIN(Intervention Date) onwards (the HCD data window)
2. Load DimSearchTerm.csv to get which drugs belong to which Search_Terms
3. For each patient-drug pair: intersection of (Search_Terms listing this drug) AND (patient's GP matches)
- If multiple matches: pick highest code_frequency (most GP coding = most likely indication)
4. Modify UPID to include matched indication: `{UPID}|{search_term}`
5. Drugs sharing the same indication for the same patient → same modified UPID → same pathway
6. Drugs under different indications → different modified UPIDs → separate pathways
- **Dash 2.x**: `from dash import Dash, html, dcc, Input, Output, State, callback_context, ALL`
- **Dash Mantine Components 0.14.x**: `import dash_mantine_components as dmc` — needs `dmc.MantineProvider` wrapping the layout
- **Plotly**: `import plotly.graph_objects as go` — for the icicle chart
- **SQLite**: `import sqlite3` — read-only access to `data/pathways.db`
- **CSS**: All in `dash_app/assets/nhs.css` — auto-served by Dash
**DimSearchTerm.csv:**
- `Search_Term`: Clinical condition (e.g., "rheumatoid arthritis")
- `CleanedDrugName`: Pipe-separated drug fragments (e.g., "ADALIMUMAB|GOLIMUMAB|...")
- `PrimaryDirectorate`: The directorate for this condition
- Drug matching: check if any fragment is a substring of the HCD drug name (case-insensitive)
### Dash Component Patterns
**Modified UPID Format:**
- Original: `RMV12345` (Provider Code[:3] + PersonKey)
- Modified: `RMV12345|rheumatoid arthritis`
- Fallback: `RMV12345|RHEUMATOLOGY (no GP dx)`
- The existing pathway analyzer treats UPID as an opaque identifier — this works transparently
### Code Patterns
- **Snowflake queries**: Use parameterized queries, embed the cluster CTE from CLUSTER_MAPPING_SQL
- **GP record matching**: Return ALL matches per patient (not just most recent)
- **Drug mapping**: Load from `data/DimSearchTerm.csv`, match drug name fragments
- **Pathway pipeline**: Use existing functions — modified UPIDs flow through naturally
- **Reflex state**: No changes expected — indication charts already work, just with better matching
### Key Data Structures
**GP Matches (from Snowflake) — updated to return ALL matches with frequency:**
```python
# Multiple rows per patient (one per matched Search_Term)
# code_frequency = COUNT of matching SNOMED codes (used as tiebreaker)
# Only counts codes from MIN(Intervention Date) onwards
DataFrame with: PatientPseudonym, Search_Term, code_frequency
# HTML elements use dash.html
from dash import html
html.Div(className="top-header", children=[...])
# Mantine components for rich UI
import dash_mantine_components as dmc
dmc.Drawer(id="drug-drawer", position="right", size="480px", children=[...])
dmc.Accordion(children=[dmc.AccordionItem(...)])
# State management
dcc.Store(id="app-state", storage_type="session", data={})
# Callbacks
@app.callback(
Output("chart-data", "data"),
Input("app-state", "data"),
)
def load_pathway_data(app_state):
...
```
**Drug-to-Indication Mapping (from DimSearchTerm.csv):**
```python
# search_term → list of drug fragments
{"rheumatoid arthritis": ["ABATACEPT", "ADALIMUMAB", "ANAKINRA", ...]}
```
### Database Access Pattern
**Modified HCD Data:**
```python
# Original UPID replaced with indication-aware UPID
df["UPID"] = "RMV12345|rheumatoid arthritis" # for matched drugs
df["UPID"] = "RMV12345|RHEUMATOLOGY (no GP dx)" # for unmatched drugs
```
from pathlib import Path
import sqlite3
**Indication DataFrame:**
```python
# Maps modified UPID → Search_Term (for pathway hierarchy level 2)
indication_df = pd.DataFrame({
'Directory': ['rheumatoid arthritis', 'asthma', 'CARDIOLOGY (no GP dx)']
}, index=['RMV12345|rheumatoid arthritis', 'RMV12345|asthma', 'RMV67890|CARDIOLOGY (no GP dx)'])
DB_PATH = Path(__file__).resolve().parents[2] / "data" / "pathways.db"
def load_pathway_data(filter_id, chart_type, selected_drugs=None, selected_directorates=None):
conn = sqlite3.connect(str(DB_PATH))
conn.row_factory = sqlite3.Row
# ... query with parameterized WHERE ...
conn.close()
return result_dict
```
### Verification Steps
After writing code, ALWAYS verify:
1. **Syntax check**: `python -m py_compile <file.py>`
2. **Import check**: `python -c "from module import function"`
3. **For database changes**: Test with query against pathways.db
4. **For Reflex changes**: `python -m reflex compile`
1. **Import check**: `python -c "from dash_app.app import app"` (or specific module)
2. **App starts**: `python run_dash.py` — must start without errors
3. **Visual check** (when building UI): describe what you expect to see at localhost:8050
4. **For callbacks**: verify the callback chain fires correctly (add temporary `print()` statements if needed)
If any step fails, fix the issue before proceeding.
@@ -133,24 +129,23 @@ Every task MUST pass validation before being marked complete:
### Tier 1: Code Validation (MANDATORY)
- Code compiles without Python syntax errors
- Imports work without errors
- No TypeErrors, ImportErrors, or AttributeErrors
- `python run_dash.py` starts without exceptions
### Tier 2: Data Validation (for data/pipeline tasks)
- Queries return expected row counts
- Data structures have correct columns/types
- Drug-indication matching produces valid results
- Modified UPIDs have correct format
### Tier 2: Layout Validation (for UI component tasks)
- Component renders in the browser
- CSS classes match 01_nhs_classic.html
- Layout structure matches the HTML concept
### Tier 3: Functional Validation (for UI/integration tasks)
- Reflex compiles the app without errors
- State changes trigger expected behavior
- Both chart types render correctly
### Tier 3: Functional Validation (for callback tasks)
- Callbacks fire when inputs change
- Data flows correctly through dcc.Store chain
- Chart renders with real data from SQLite
### Validation Failure
If any tier fails:
- DO NOT mark the task complete
- Document the failure details in progress.txt
- Document the failure in progress.txt
- Fix the issue within this iteration if possible
- If you cannot fix it, mark the task `[B]` with details
@@ -159,34 +154,33 @@ If any tier fails:
Before marking ANY task `[x]`, ALL of these must be true:
1. Code is saved to the appropriate file(s)
2. Tier 1 code validation passed
2. Tier 1 validation passed (imports + app starts)
3. Tier 2/3 validation passed (as applicable)
4. All changes committed to git with a descriptive message
These are non-negotiable. A task that "feels done" but hasn't passed all gates is NOT done.
These are non-negotiable.
## Update Progress
After completing your work (whether the task succeeded, failed, or was blocked), append to progress.txt using this format:
After completing your work, append to progress.txt using this format:
```
## Iteration [N] — [YYYY-MM-DD]
### Task: [which task you worked on]
### Why this task:
- [Brief explanation of why you chose this task over others]
- [What dependencies or logical flow led to this choice]
### Status: COMPLETE | BLOCKED | IN PROGRESS
### What was done:
- [Specific actions taken]
### Validation results:
- Tier 1 (Code): [syntax check, import check]
- Tier 2 (Data): [query results, row counts]
- Tier 3 (Functional): [reflex compile, UI check]
- Tier 1 (Code): [import check, app starts]
- Tier 2 (Layout): [renders correctly, CSS matches]
- Tier 3 (Functional): [callbacks fire, data flows]
### Files changed:
- [list of files created/modified]
### Committed: [git hash] "[commit message]"
### Patterns discovered:
- [Any reusable learnings — query patterns, matching logic quirks]
- [Any reusable learnings — Dash patterns, DMC quirks, CSS gotchas]
### Next iteration should:
- [Explicit guidance for what the next fresh instance should do first]
- [Note any context that would be lost without writing it here]
@@ -194,20 +188,20 @@ After completing your work (whether the task succeeded, failed, or was blocked),
- [Any tasks that are blocked and why]
```
If you discover a failure pattern that future iterations should avoid, add it to `guardrails.md`.
If you discover a failure pattern, add it to `guardrails.md`.
## Commit Changes
1. Stage changed files
2. Use a descriptive commit message referencing the task (e.g., "feat: add drug-indication matching function (Task 2.1)")
3. Commit after your task is validated and complete — one commit per logical unit of work
2. Use a descriptive commit message referencing the task (e.g., "feat: create dash_app skeleton with nhs.css (Task 0.1 + 0.2)")
3. Commit after your task is validated and complete
4. If you updated progress.txt with a blocked status, commit that too
## Completion Check
If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:
1. Run `reflex compile` to verify app compiles
1. Run `python run_dash.py` to verify app starts cleanly
2. Verify all completion criteria at the bottom of IMPLEMENTATION_PLAN.md are satisfied
3. Only then output the completion signal on its own line:
@@ -217,20 +211,19 @@ If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:
DO NOT output this string under any other circumstances.
DO NOT output it if any task is still `[ ]` or `[B]` or `[~]`.
DO NOT paraphrase, vary, or conditionally output this string.
## Rules
- Complete ONE task per iteration, then update progress and stop
- ALWAYS read progress.txt, guardrails.md before starting work
- **Match drugs to indications** — not just patients to indications
- **Use DimSearchTerm.csv** for drug-to-Search_Term mapping
- **Return ALL GP matches** — not just most recent (remove QUALIFY ROW_NUMBER = 1)
- **Modified UPID format**: `{UPID}|{search_term}` — pipe delimiter is safe
- **Use PseudoNHSNoLinked** — NOT PersonKey for GP record matching
- **Substring matching** for drug fragments from DimSearchTerm.csv
- **Read 01_nhs_classic.html** when building ANY visual component
- **Read src/data_processing/pathway_queries.py and src/visualization/plotly_generator.py** when building data logic or chart callbacks
- **DO NOT modify pipeline/analysis logic** in src/ (pathway_pipeline, transforms, diagnosis_lookup, pathway_analyzer, refresh_pathways)
- **DO add shared utilities** to src/ (visualization/plotly_generator.py, data_processing/database.py) rather than duplicating logic in dash_app/
- **Use className from nhs.css** — not inline styles
- **dcc.Store for state** — no server-side globals
- **Unidirectional callbacks** — app-state → chart-data → UI
- **Port icicle_figure exactly** — same customdata, colorscale, templates
- Keep commits atomic and well-described
- If stuck on the same issue for more than 2 attempts within one iteration, document it in progress.txt and move to the next ready task
- When in doubt, check existing code for patterns that work
- **Pipeline before UI** — processing logic before Reflex changes
- **Don't change directory charts** — only indication chart matching changes
- If stuck for 2+ attempts, document in progress.txt and move on
- `python run_dash.py` must work after every task
+123 -140
View File
@@ -5,142 +5,144 @@ A web-based application for analyzing secondary care patient treatment pathways.
## Features
- **Interactive Visualization**: Plotly icicle charts showing patient treatment hierarchies with cost and frequency statistics
- **Multi-Source Data Loading**: CSV/Parquet files, SQLite database, or direct Snowflake integration
- **GP Diagnosis Validation**: Validate patient indications against GP SNOMED codes via NHS Snowflake
- **Modern Web Interface**: Browser-based UI using Reflex framework with NHS branding
- **Dual Chart Types**: Directory-based (Trust → Directorate → Drug → Pathway) and Indication-based (Trust → GP Diagnosis → Drug → Pathway) views
- **Pre-computed Pathways**: Treatment pathways pre-processed and stored in SQLite for sub-50ms filter response times
- **GP Diagnosis Matching**: Patient indications matched from GP records using SNOMED cluster codes (~93% match rate)
- **Modern Web Interface**: Browser-based UI using Dash (Plotly) + Dash Mantine Components with NHS branding
- **Drug Browser**: Drawer-based card browser organized by clinical directorate for drug/indication selection
- **Flexible Filtering**: Filter by date range, NHS trusts, drugs, and medical directories
- **Export Options**: Export charts as interactive HTML or data as CSV
## Requirements
- Python 3.10 or higher
- pip or uv package manager
- uv package manager (recommended)
### Optional (for Snowflake integration)
- `snowflake-connector-python` package
### Optional (for data refresh)
- Access to NHS Snowflake data warehouse with SSO authentication
## Installation
### Using pip
```bash
# Clone the repository
git clone <repository-url>
cd patient-pathway-analysis
# Install dependencies
pip install -r requirements.txt
```
### Using uv (recommended)
```bash
# Install uv if not already installed
pip install uv
# Sync dependencies
uv sync
```
### Install with test dependencies
```bash
pip install -e ".[test]"
# One-time dev setup: adds src/ to Python path via .pth file
uv run python setup_dev.py
```
## Quick Start
### 1. Run the Web Application (Recommended)
### Run the Web Application
```bash
reflex run
python run_dash.py
```
Open http://localhost:3000 in your browser.
Open http://localhost:8050 in your browser.
The application loads pre-computed pathway data from SQLite on startup. No additional configuration is needed for viewing existing data.
### Refresh Pathway Data (requires Snowflake)
```bash
# Initialize/migrate the database
python -m data_processing.migrate
# Full refresh — both chart types, all date filters
python -m cli.refresh_pathways --chart-type all
# Directory charts only (faster, ~5 minutes)
python -m cli.refresh_pathways --chart-type directory
# Indication charts only (~12 minutes, includes GP lookup)
python -m cli.refresh_pathways --chart-type indication
# Dry run (test without database changes)
python -m cli.refresh_pathways --chart-type all --dry-run -v
```
## Usage
### Web Interface (Reflex)
### Interface Overview
1. **Load Data**: On the home page, select your data source:
- **SQLite Database**: Uses pre-loaded data from `data/pathways.db`
- **File Upload**: Drag and drop a CSV or Parquet file
- **Snowflake**: Fetch data directly from NHS Snowflake (requires configuration)
The application has a single-page layout with:
2. **Configure Filters**:
- Set date range (Start Date, End Date, Last Seen After)
- Navigate to Drug/Trust/Directory selection pages using the sidebar
- Use search boxes to find and select items
- Set minimum patient threshold to filter small groups
| Component | Purpose |
|-----------|---------|
| **Header** | NHS branding, data freshness indicator (patient count + relative time) |
| **Sidebar** | Navigation items with drawer triggers for Drug Selection, Trust Selection, Indications |
| **KPI Row** | 4 cards: Unique Patients, Drug Types, Total Cost, Indication Match Rate |
| **Filter Bar** | Chart type toggle (By Directory / By Indication) + date filter dropdowns |
| **Chart Card** | Interactive Plotly icicle chart with loading spinner |
| **Drawer** | Right-side panel with drug chips, trust chips, and directorate card browser |
3. **Run Analysis**: Click "Run Analysis" to generate the icicle chart
### Filtering Data
4. **Export Results**:
- **Export HTML**: Save the interactive chart as a standalone HTML file
- **Export CSV**: Export the filtered data as a CSV file
1. **Chart Type**: Toggle between "By Directory" and "By Indication" views
2. **Date Filters**: Select treatment initiation period and last-seen window
3. **Drug Selection**: Open the drawer to select specific drugs via chips
4. **Trust Selection**: Open the drawer to filter by NHS trusts
5. **Directorate Browser**: Navigate directorates → indications → drug fragments in the drawer
6. **Clear Filters**: Reset all selections to show full dataset
### Data Migration
### Understanding the Pathway Chart
To populate the SQLite database from CSV files:
The icicle chart displays hierarchical treatment pathways:
```bash
# Initialize database schema
python -m data_processing.migrate
# Load reference data from CSV files
python -m data_processing.migrate --reference-data --verify
# Load patient data from a CSV/Parquet file
python -m data_processing.migrate --load-patient-data path/to/data.csv
```
Root (Regional Total)
└─ Trust Name (e.g., "Norfolk and Norwich University Hospitals")
└─ Directory/Indication (e.g., "Rheumatology" or "rheumatoid arthritis")
└─ Drug Name (e.g., "ADALIMUMAB")
└─ Treatment Pathway (e.g., "ADALIMUMAB → INFLIXIMAB")
```
### Snowflake Configuration
- **Width**: Relative patient count
- **Color intensity**: Proportion of parent group
- **Hover**: Shows cost, dosing frequency, date range, and per-patient statistics
- **Click**: Zoom into a specific branch
To use Snowflake integration, edit `config/snowflake.toml`:
### Date Filter Combinations
```toml
[connection]
account = "your-account-identifier"
warehouse = "your-warehouse"
database = "DATA_HUB"
schema = "CDM"
authenticator = "externalbrowser" # NHS SSO authentication
```
| Initiated | Last Seen | Description |
|-----------|-----------|-------------|
| All years | Last 6 months | Default — all patients active recently |
| All years | Last 12 months | Broader activity window |
| Last 1 year | Last 6 months | Recently initiated, active |
| Last 1 year | Last 12 months | Recently initiated, any activity |
| Last 2 years | Last 6 months | Medium history, active |
| Last 2 years | Last 12 months | Medium history, any activity |
## Project Structure
```
.
├── core/ # Core configuration and models
├── data_processing/ # Data layer (SQLite, Snowflake, loaders)
├── analysis/ # Analysis pipeline (refactored from generate_graph)
├── visualization/ # Chart generation (Plotly)
├── pathways_app/ # Reflex web application
├── tools/ # Legacy modules (original analysis engine)
├── config/ # Configuration files
├── data/ # Reference data and SQLite database
├── docs/ # Additional documentation
└── tests/ # Test suite
├── src/ # All application library code
│ ├── core/ # Foundation: paths, models, logging
│ ├── config/ # Snowflake connection settings
│ ├── data_processing/ # Data layer (SQLite, Snowflake, transforms)
│ ├── analysis/ # Analysis pipeline
│ ├── visualization/ # Plotly chart generation
│ └── cli/ # CLI tools (refresh_pathways)
├── dash_app/ # Dash web application
│ ├── app.py # App entry point, layout, stores
│ ├── assets/nhs.css # NHS design system CSS
│ ├── data/ # Query wrappers + card browser data
│ ├── components/ # UI components (header, sidebar, etc.)
│ └── callbacks/ # Dash callbacks (filters, chart, KPI, drawer)
├── run_dash.py # Entry point: python run_dash.py
├── data/ # Reference data + SQLite DB (pathways.db)
├── tests/ # Test suite (113 tests)
├── docs/ # Documentation
└── archive/ # Historical/deprecated code
```
See `CLAUDE.md` for detailed architecture documentation.
## Documentation
- [docs/USER_GUIDE.md](docs/USER_GUIDE.md) - End-user guide for using the web interface
- [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) - Production deployment guide (Docker, nginx, cloud)
- [CLAUDE.md](CLAUDE.md) - Technical architecture documentation for developers
## Deployment
Quick production start:
```bash
# Run in production mode
reflex run --env prod
```
## Running Tests
```bash
@@ -150,75 +152,56 @@ python -m pytest tests/ -v
# Run with coverage
python -m pytest tests/ -v --cov=core --cov=data_processing --cov=analysis
# Run only fast tests (exclude slow/integration)
# Run only fast tests
python -m pytest tests/ -v -m "not slow"
```
## Reference Data Files
## Configuration
The `data/` directory contains essential reference files:
### Snowflake Connection (`src/config/snowflake.toml`)
| File | Purpose |
|------|---------|
| `include.csv` | Drug filter list with default selections |
| `defaultTrusts.csv` | NHS Trust list for filtering |
| `directory_list.csv` | Medical specialties/directories |
| `drugnames.csv` | Drug name standardization mapping |
| `org_codes.csv` | Provider code to organization name mapping |
| `drug_directory_list.csv` | Valid drug-to-directory mappings |
| `drug_indication_clusters.csv` | Drug to SNOMED cluster mappings |
| `ta-recommendations.xlsx` | NICE TA recommendations |
```toml
[snowflake]
account = "your-account"
database = "DATA_HUB"
schema = "CDM"
warehouse = "your-warehouse"
authenticator = "externalbrowser" # Required for NHS SSO
```
## Troubleshooting
### Reflex compilation errors
If you encounter compilation errors when running `reflex run`:
### App won't start
```bash
# Clear the build cache and restart
rm -rf .web
reflex run
# Ensure dependencies are installed
uv sync
# Ensure src/ is on Python path
uv run python setup_dev.py
# Try running with uv
uv run python run_dash.py
```
### Database not found
```bash
# Check data/pathways.db exists
python -m data_processing.migrate
```
### Snowflake connection issues
1. Ensure `snowflake-connector-python` is installed:
```bash
pip install snowflake-connector-python
```
1. Ensure `src/config/snowflake.toml` has the correct account identifier
2. A browser window will open for SSO authentication
3. Verify your network allows Snowflake connections
2. Check that `config/snowflake.toml` has the correct account identifier
## Documentation
3. For SSO authentication, a browser window will open automatically
### SQLite database not found
If `data/pathways.db` doesn't exist, create it:
```bash
python -m data_processing.migrate
python -m data_processing.migrate --reference-data
```
## Development
### Code Quality
```bash
# Type checking
python -m mypy core/ data_processing/ analysis/ --ignore-missing-imports
# Run tests with coverage report
python -m pytest tests/ -v --cov=core --cov=data_processing --cov-report=html
```
### Adding New Reference Data
1. Add CSV file to `data/` directory
2. Define schema in `data_processing/schema.py`
3. Create migration function in `data_processing/reference_data.py`
4. Add path to `PathConfig` in `core/config.py`
- [CLAUDE.md](CLAUDE.md) — Technical architecture documentation
- [docs/USER_GUIDE.md](docs/USER_GUIDE.md) — End-user guide
- [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) — Deployment guide
## License
+89 -289
View File
@@ -1,10 +1,10 @@
# Reflex Deployment Guide
# Deployment Guide
This guide covers deployment options for the Patient Pathway Analysis web application built with Reflex.
This guide covers deployment options for the Patient Pathway Analysis web application built with Dash.
## Overview
Reflex applications compile to a FastAPI backend and Next.js frontend. This creates two deployment artifacts that can be deployed together or separately depending on your infrastructure requirements.
The application is a single-process Python Dash app that serves both the frontend and API from one server. It reads pre-computed data from a local SQLite database.
## Development Mode
@@ -12,9 +12,9 @@ For local development:
```bash
# Start development server with hot reload
reflex run
python run_dash.py
# Access the application at http://localhost:3000
# Access the application at http://localhost:8050
```
## Production Deployment Options
@@ -24,84 +24,55 @@ reflex run
The simplest approach for internal deployments:
```bash
# Run in production mode (optimized build)
reflex run --env prod
```
# Run with Gunicorn (Linux/macOS)
gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4
This starts:
- FastAPI backend on port 8000
- Next.js frontend on port 3000
# Or directly with Python
python run_dash.py
```
For background execution:
```bash
# Using nohup (Linux/macOS)
nohup reflex run --env prod > reflex.log 2>&1 &
nohup gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4 > dash.log 2>&1 &
# Using PowerShell (Windows)
Start-Process -NoNewWindow -FilePath "reflex" -ArgumentList "run --env prod"
Start-Process -NoNewWindow -FilePath "python" -ArgumentList "run_dash.py"
```
### Option 2: Separate Backend and Frontend
For more control, run backend and frontend separately:
```bash
# Terminal 1: Start backend only
reflex run --env prod --backend-only
# Terminal 2: Start frontend only
reflex run --env prod --frontend-only
```
### Option 3: Static Export
Export the frontend as static files for deployment on static hosting or CDN:
```bash
# Export application
reflex export
# This creates:
# - frontend.zip (static Next.js build)
# - backend.zip (Python application source)
```
Then:
1. Unzip `frontend.zip` and serve via nginx, Apache, or any static file server
2. Run the backend separately using uvicorn/gunicorn
### Option 4: Docker Deployment
### Option 2: Docker Deployment
Create a `Dockerfile` for containerized deployment:
```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install Node.js for Reflex frontend build
RUN apt-get update && apt-get install -y curl && \
curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && \
apt-get install -y nodejs && \
rm -rf /var/lib/apt/lists/*
# Install uv for fast dependency management
RUN pip install uv
# Copy requirements and install dependencies
COPY requirements.txt pyproject.toml ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy dependency files
COPY pyproject.toml uv.lock ./
# Install dependencies
RUN uv sync --no-dev
# Copy application code
COPY . .
COPY src/ src/
COPY dash_app/ dash_app/
COPY data/ data/
COPY run_dash.py setup_dev.py ./
# Initialize Reflex (downloads frontend dependencies)
RUN reflex init --loglevel debug
# Set up Python path
RUN uv run python setup_dev.py
# Expose ports
EXPOSE 3000 8000
# Expose port
EXPOSE 8050
# Start in production mode
CMD ["reflex", "run", "--env", "prod"]
# Start the application
CMD ["uv", "run", "gunicorn", "dash_app.app:server", "-b", "0.0.0.0:8050", "--workers", "4"]
```
Build and run:
@@ -111,41 +82,24 @@ Build and run:
docker build -t pathway-analysis .
# Run the container
docker run -p 3000:3000 -p 8000:8000 \
docker run -p 8050:8050 \
-v $(pwd)/data:/app/data \
-v $(pwd)/config:/app/config \
pathway-analysis
```
### Option 5: Docker Compose (Recommended for Production)
Create `docker-compose.yml` for multi-container deployment:
### Option 3: Docker Compose
```yaml
version: '3.8'
services:
backend:
app:
build: .
command: reflex run --env prod --backend-only
ports:
- "8000:8000"
- "8050:8050"
volumes:
- ./data:/app/data
- ./config:/app/config
environment:
- REFLEX_ENV=prod
restart: unless-stopped
frontend:
build: .
command: reflex run --env prod --frontend-only
ports:
- "3000:3000"
depends_on:
- backend
environment:
- REFLEX_ENV=prod
- ./src/config:/app/src/config
restart: unless-stopped
```
@@ -162,42 +116,16 @@ docker-compose up -d
For production deployments behind nginx:
```nginx
# /etc/nginx/sites-available/pathway-analysis
server {
listen 80;
server_name your-server.nhs.uk;
# Backend API endpoints
location /admin {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
location /ping {
proxy_pass http://localhost:8000;
}
location /upload {
proxy_pass http://localhost:8000;
client_max_body_size 100M; # For large data file uploads
}
# WebSocket connections (required for Reflex state sync)
location /_event/ {
proxy_pass http://localhost:8000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_read_timeout 86400; # 24 hours for long-running connections
}
# Frontend (all other requests)
location / {
proxy_pass http://localhost:3000;
proxy_pass http://localhost:8050;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
```
@@ -209,69 +137,21 @@ sudo ln -s /etc/nginx/sites-available/pathway-analysis /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```
### Caddy (Alternative)
Caddy provides automatic HTTPS:
```caddyfile
# Caddyfile
your-server.nhs.uk {
# Backend API
handle /admin/* {
reverse_proxy localhost:8000
}
handle /ping {
reverse_proxy localhost:8000
}
handle /upload {
reverse_proxy localhost:8000
}
handle /_event/* {
reverse_proxy localhost:8000
}
# Frontend
handle {
reverse_proxy localhost:3000
}
}
```
## Process Management
### Systemd (Linux)
Create service files for automatic startup:
```ini
# /etc/systemd/system/pathway-backend.service
# /etc/systemd/system/pathway-analysis.service
[Unit]
Description=Pathway Analysis Backend
Description=Pathway Analysis Dash App
After=network.target
[Service]
Type=simple
User=www-data
WorkingDirectory=/opt/pathway-analysis
ExecStart=/usr/bin/reflex run --env prod --backend-only
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
```ini
# /etc/systemd/system/pathway-frontend.service
[Unit]
Description=Pathway Analysis Frontend
After=network.target pathway-backend.service
[Service]
Type=simple
User=www-data
WorkingDirectory=/opt/pathway-analysis
ExecStart=/usr/bin/reflex run --env prod --frontend-only
ExecStart=/opt/pathway-analysis/.venv/bin/gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4
Restart=always
RestartSec=10
@@ -283,8 +163,8 @@ Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable pathway-backend pathway-frontend
sudo systemctl start pathway-backend pathway-frontend
sudo systemctl enable pathway-analysis
sudo systemctl start pathway-analysis
```
### Windows Service
@@ -296,8 +176,8 @@ Use NSSM (Non-Sucking Service Manager) on Windows:
choco install nssm
# Create service
nssm install PathwayAnalysis "C:\Path\To\reflex.exe" "run --env prod"
nssm set PathwayAnalysis AppDirectory "C:\Path\To\Patient pathway analysis"
nssm install PathwayAnalysis "C:\Path\To\python.exe" "run_dash.py"
nssm set PathwayAnalysis AppDirectory "C:\Path\To\pathway-analysis"
nssm start PathwayAnalysis
```
@@ -305,192 +185,112 @@ nssm start PathwayAnalysis
### Production Environment Variables
Set these environment variables for production:
```bash
# Reflex configuration
export REFLEX_ENV=prod
# Database paths (if using custom locations)
# Database path (if using custom location)
export PATHWAY_DB_PATH=/var/data/pathways.db
export PATHWAY_CACHE_DIR=/var/cache/pathway-analysis
# Snowflake (if using)
# Snowflake (for data refresh only — not needed for the web app)
export SNOWFLAKE_ACCOUNT=your-account
export SNOWFLAKE_WAREHOUSE=your-warehouse
```
### Snowflake Configuration
Ensure `config/snowflake.toml` is properly configured for production:
Snowflake is only needed for the data refresh CLI command, not for running the web application. Ensure `src/config/snowflake.toml` is configured:
```toml
[connection]
[snowflake]
account = "your-production-account"
warehouse = "ANALYTICS_WH"
database = "DATA_HUB"
schema = "CDM"
authenticator = "externalbrowser" # or "oauth" for service accounts
[cache]
enabled = true
directory = "/var/cache/pathway-analysis"
ttl_seconds = 86400 # 24 hours
authenticator = "externalbrowser"
```
## Reflex Cloud
## Data Refresh
For managed hosting, consider [Reflex Cloud](https://reflex.dev/cloud/):
The web application reads pre-computed data from SQLite. To update the data:
```bash
# Deploy to Reflex Cloud
reflex deploy
# Full refresh (both chart types, all date filters)
python -m cli.refresh_pathways --chart-type all
# The app will serve new data immediately — no restart needed
```
Benefits:
- Zero configuration deployment
- Automatic scaling
- Built-in SSL certificates
- Managed state management with Redis
Schedule this as a cron job or Windows Task Scheduler task for periodic updates.
## Security Considerations
### Network Security
1. **Firewall Rules**: Only expose necessary ports (typically just 80/443)
2. **HTTPS**: Use TLS certificates (Let's Encrypt or organizational certs)
1. **Firewall Rules**: Only expose port 8050 (or 80/443 behind reverse proxy)
2. **HTTPS**: Use TLS certificates via reverse proxy (nginx, Caddy)
3. **VPN**: Consider restricting access to NHS network only
### Data Security
1. **Database Access**: Ensure SQLite database permissions are restricted
2. **File Uploads**: Validate file types and scan for malware
3. **Snowflake**: Use least-privilege service accounts
### Authentication
For NHS deployments, consider adding authentication:
```python
# Example: Add basic auth middleware
import reflex as rx
from starlette.middleware import Middleware
from starlette.middleware.authentication import AuthenticationMiddleware
# In rxconfig.py
config = rx.Config(
app_name="pathways_app",
# Add authentication middleware
)
```
1. **Database Access**: The app uses read-only SQLite access
2. **No file uploads**: The Dash app does not accept file uploads
3. **No authentication built in**: Add authentication via reverse proxy or middleware if needed
## Monitoring
### Health Checks
The application provides endpoints for monitoring:
- `/ping` - Basic health check
- Backend port 8000 - FastAPI health
The application serves at `/` — a 200 response indicates the app is running.
### Logging
Configure logging for production:
Dash outputs request logs to stdout. Configure log aggregation as needed:
```python
# In pathways_app/pathways_app.py
import logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('/var/log/pathway-analysis/app.log'),
logging.StreamHandler()
]
)
```bash
# Redirect logs to file
gunicorn dash_app.app:server -b 0.0.0.0:8050 --access-logfile /var/log/pathway-analysis/access.log --error-logfile /var/log/pathway-analysis/error.log
```
## Troubleshooting
### Common Issues
### Port already in use
**Port already in use:**
```bash
# Find and kill process using port 3000
lsof -i :3000
kill -9 <PID>
# Find process using port 8050
lsof -i :8050 # Linux/macOS
netstat -ano | findstr :8050 # Windows
```
**Build cache issues:**
```bash
# Clear Reflex build cache
rm -rf .web
reflex run --env prod
```
### Database not found
**Database connection errors:**
```bash
# Verify database exists and has correct permissions
# Verify database exists
ls -la data/pathways.db
sqlite3 data/pathways.db ".tables"
# Recreate if needed
python -m data_processing.migrate
python -m cli.refresh_pathways --chart-type all
```
**Snowflake authentication:**
- Ensure browser is available for SSO popup
- Check firewall allows connections to Snowflake endpoints
- Verify account identifier is correct
## Performance Tuning
### Backend (FastAPI/Uvicorn)
For high-traffic deployments:
### Import errors
```bash
# Run with multiple workers
uvicorn pathways_app:app --workers 4 --host 0.0.0.0 --port 8000
```
# Ensure src/ is on Python path
uv run python setup_dev.py
### State Management
For multi-instance deployments, configure Redis for state management:
```python
# rxconfig.py
config = rx.Config(
app_name="pathways_app",
state_manager_mode="redis",
redis_url="redis://localhost:6379/0",
)
```
### Caching
Enable aggressive caching for Snowflake queries in `config/snowflake.toml`:
```toml
[cache]
enabled = true
ttl_seconds = 86400 # 24 hours for historical data
ttl_current_data_seconds = 3600 # 1 hour for recent data
max_size_mb = 1000 # 1GB cache
# Verify imports
uv run python -c "from dash_app.app import app; print('OK')"
```
---
## Quick Reference
| Environment | Command | Ports |
|-------------|---------|-------|
| Development | `reflex run` | 3000, 8000 |
| Production | `reflex run --env prod` | 3000, 8000 |
| Backend only | `reflex run --backend-only` | 8000 |
| Frontend only | `reflex run --frontend-only` | 3000 |
| Export | `reflex export` | Static files |
| Cloud | `reflex deploy` | Managed |
| Environment | Command | Port |
|-------------|---------|------|
| Development | `python run_dash.py` | 8050 |
| Production | `gunicorn dash_app.app:server -b 0.0.0.0:8050 --workers 4` | 8050 |
| Docker | `docker run -p 8050:8050 pathway-analysis` | 8050 |
For more information, see:
- [Reflex Documentation](https://reflex.dev/docs/)
- [Reflex Cloud](https://reflex.dev/cloud/)
- [FastAPI Deployment](https://fastapi.tiangolo.com/deployment/)
- [Dash Documentation](https://dash.plotly.com/)
- [Gunicorn Deployment](https://docs.gunicorn.org/en/stable/deploy.html)
+5 -5
View File
@@ -187,8 +187,8 @@ All transitions: 150ms ease-out (faster than before)
}
```
### Reflex Implementation
- Use `height="calc(100vh - 96px)"` for chart container
- Use `width="100%"` with `padding_x="16px"` for full-width
- Use `flex="1"` to let chart grow
- Keep `min_height="500px"` as fallback
### Dash Implementation
- Chart container uses `dcc.Loading` wrapper around `dcc.Graph`
- Full-width layout via CSS class `.chart-card` in `dash_app/assets/nhs.css`
- Minimum height set via CSS: `min-height: 500px`
- Margins controlled in `create_icicle_from_nodes()`: `t:40, l:8, r:8, b:24`
+145 -291
View File
@@ -6,15 +6,11 @@ This guide explains how to use the NHS High-Cost Drug Patient Pathway Analysis T
1. [Getting Started](#getting-started)
2. [Interface Overview](#interface-overview)
3. [Selecting Your Data Source](#selecting-your-data-source)
4. [Configuring Analysis Filters](#configuring-analysis-filters)
5. [Selecting Drugs, Trusts, and Directories](#selecting-drugs-trusts-and-directories)
6. [Running the Analysis](#running-the-analysis)
7. [Understanding the Pathway Chart](#understanding-the-pathway-chart)
8. [Exporting Results](#exporting-results)
9. [GP Indication Validation](#gp-indication-validation)
10. [Keyboard Navigation and Accessibility](#keyboard-navigation-and-accessibility)
11. [Troubleshooting](#troubleshooting)
3. [Filtering Data](#filtering-data)
4. [Using the Drug Browser](#using-the-drug-browser)
5. [Understanding the Pathway Chart](#understanding-the-pathway-chart)
6. [GP Indication Matching](#gp-indication-matching)
7. [Troubleshooting](#troubleshooting)
---
@@ -25,371 +21,229 @@ This guide explains how to use the NHS High-Cost Drug Patient Pathway Analysis T
Start the application by running:
```bash
reflex run
python run_dash.py
```
Then open your browser to **http://localhost:3000**
Then open your browser to **http://localhost:8050**
The application will automatically load reference data (drugs, trusts, directories) when you first access it.
The application automatically loads pre-computed pathway data from SQLite on startup. No additional setup is needed to view existing data.
### First-Time Setup
### Data Freshness
1. Click **Load Reference Data** on the Home page to populate the filter options
2. Select your preferred data source (SQLite, File Upload, or Snowflake)
3. Configure your date range and other filters
4. Click **Run Analysis** to generate your first pathway chart
The header bar shows when data was last refreshed:
- **Patient count**: Total patients in the dataset (e.g., "11,118 patients")
- **Last updated**: Relative time since the last data refresh (e.g., "2h ago")
To refresh the data, run the CLI command (requires Snowflake access):
```bash
python -m cli.refresh_pathways --chart-type all
```
---
## Interface Overview
The application has four main pages, accessible from the sidebar navigation:
The application is a single-page layout with the following components:
| Page | Purpose |
|------|---------|
| **Home** | Main analysis dashboard with data source selection, filters, and chart display |
| **Drug Selection** | Select which high-cost drugs to include in the analysis |
| **Trust Selection** | Filter by specific NHS trusts |
| **Directory Selection** | Filter by medical directories/specialties |
### Header
- NHS branding and application title ("HCD Analysis")
- Green status dot with patient count and last-updated time
### Navigation
### Sidebar (Left)
Navigation items including:
- **Pathway Overview** — main view (always active)
- **Drug Selection** — opens the drug browser drawer
- **Trust Selection** — opens the drawer with trust chips
- **Indications** — opens the drawer with directorate browser
- **Desktop**: Use the sidebar on the left to switch between pages
- **Mobile**: Use the top navigation bar
- **Keyboard**: Press Tab to navigate, Enter to select
### KPI Row
Four summary cards that update dynamically:
- **Unique Patients** — number of distinct patients matching current filters
- **Drug Types** — number of distinct drugs in filtered data
- **Total Cost** — total cost of treatments in the filtered dataset
- **Indication Match** — GP diagnosis match rate (~93% for indication charts, shown as "—" for directory charts)
### Filter Bar
- **Chart type toggle**: "By Directory" / "By Indication" pills
- **Treatment Initiated**: All years, Last 2 years, or Last 1 year
- **Last Seen**: Last 6 months or Last 12 months
### Chart Card
- Dynamic subtitle showing the current hierarchy (e.g., "Trust → Directorate → Drug → Pathway")
- Interactive Plotly icicle chart
- Loading spinner during data fetch
---
## Selecting Your Data Source
## Filtering Data
The application supports three data sources:
### Chart Type
### 1. SQLite Database (Recommended)
Toggle between two views using the pills in the filter bar:
Pre-loaded patient data stored locally for fast performance.
| View | Hierarchy | Best For |
|------|-----------|----------|
| **By Directory** | Trust → Directorate → Drug → Pathway | Understanding treatment by medical specialty |
| **By Indication** | Trust → GP Diagnosis → Drug → Pathway | Understanding treatment by patient condition |
**Advantages:**
- Fastest analysis performance
- Works offline
- No authentication required
### Date Filters
**To use:** Click "Use SQLite" in the Data Source section
Two dropdowns control the time window:
### 2. File Upload
| Filter | Options | Effect |
|--------|---------|--------|
| **Treatment Initiated** | All years, Last 2 years, Last 1 year | When patients started treatment |
| **Last Seen** | Last 6 months, Last 12 months | Most recent activity window |
Upload CSV or Parquet files directly.
The default is "All years / Last 6 months" — showing all patients who have been active in the last 6 months.
**Supported formats:**
- CSV files (.csv)
- Apache Parquet files (.parquet, .pq)
### Drug and Trust Selection
**To use:**
1. Drag and drop a file, or click the upload area
2. Wait for the file to process
3. Click "Use File" to select it as your data source
Open the drawer (right panel) by clicking "Drug Selection" or "Trust Selection" in the sidebar:
### 3. Snowflake
- **Drug chips**: Click to select/deselect specific drugs. Selected drugs filter the chart.
- **Trust chips**: Click to select/deselect specific NHS trusts.
- **Clear All Filters**: Button at the bottom resets all drug and trust selections.
Query live data from the NHS data warehouse.
**Requirements:**
- Snowflake must be configured (see `config/snowflake.toml`)
- Browser-based NHS SSO authentication
**To use:** Click "Use Snowflake" - you'll be prompted to authenticate via your browser
**No selections = show everything.** Leaving chips unselected is the same as selecting all.
---
## Configuring Analysis Filters
## Using the Drug Browser
The Home page provides several filter options:
The drawer contains three sections:
### Date Range
### All Drugs
A flat list of all 42 available drugs as selectable chips. Click one or more to filter the chart to those drugs only.
| Field | Description |
|-------|-------------|
| **Start Date** | Include patients initiated from this date onwards |
| **End Date** | Include patients initiated until this date |
| **Last Seen After** | Only include patients with activity after this date (excludes patients who haven't been seen recently) |
### Trusts
A list of 7 NHS trusts as selectable chips. Click to filter by specific organizations.
**Tip:** The default range is the last 12 months.
### By Directorate
An accordion browser organized by clinical directorate:
### Minimum Patients
1. Click a **directorate** (e.g., "CARDIOLOGY") to expand it
2. Inside, click an **indication** (e.g., "heart failure") to expand further
3. Each indication shows **drug fragment badges** (e.g., "SACUBITRIL", "IVABRADINE")
4. Clicking a drug fragment badge selects all full drug names that contain that fragment
Filter out pathways with fewer patients than the threshold you set.
For example, clicking the "ADALIMUMAB" badge would select "ADALIMUMAB" in the drug chips above.
- Use the slider for quick adjustment (0-100)
- Or type a specific number in the text field
- Set to 0 to show all pathways regardless of patient count
### Fragment Matching
### Custom Title
Drug fragments are substrings, not exact matches. The fragment "INHALED" would match drugs like "INHALED BECLOMETASONE" and "INHALED FLUTICASONE".
Override the automatically generated chart title with your own text.
- Leave empty to use the default title: "Patients initiated [start date] to [end date]"
- Useful for specific reports or presentations
---
## Selecting Drugs, Trusts, and Directories
Each selection page works the same way:
### Navigation
1. Click "Drug Selection", "Trust Selection", or "Directory Selection" in the sidebar
2. The page shows all available options with checkboxes
### Search
Type in the search box to filter the list. The list updates as you type.
### Selection Actions
| Button | Action |
|--------|--------|
| **Select All** | Check all visible items |
| **Clear All** | Uncheck all items |
| **Select Defaults** | (Drugs only) Select pre-configured default drugs (Include=1 in include.csv) |
### Selection Behavior
- **No items selected** = Include ALL items in analysis
- **Some items selected** = Include ONLY the selected items
This means leaving a filter empty is equivalent to "select all".
---
## Running the Analysis
### Steps
1. Ensure your data source is selected and configured
2. Set your date range and other filters
3. Select desired drugs, trusts, and directories (or leave empty for all)
4. Click the green **Run Analysis** button
### During Analysis
- The button shows a spinner while analysis is running
- Status messages appear below the button
- The interface remains responsive - you can review settings
### After Analysis
- The pathway chart appears in the chart section
- Export buttons become available
- GP indication validation results appear (if Snowflake is connected)
Clicking a fragment toggles its matching drugs:
- **First click**: Selects all matching drugs
- **Second click**: Deselects all matching drugs (if all were already selected)
---
## Understanding the Pathway Chart
The analysis generates an interactive **icicle chart** showing patient treatment pathways.
### Hierarchy Structure
The chart displays a hierarchical structure:
The icicle chart displays a hierarchical breakdown:
**Directory view:**
```
N&WICS (Regional Total)
└─ Trust Name (e.g., "Norfolk and Norwich University Hospitals")
└─ Directory (e.g., "Rheumatology", "Gastroenterology")
└─ Drug Name (e.g., "ADALIMUMAB", "INFLIXIMAB")
Root (Regional Total)
└─ Trust (e.g., "Norfolk and Norwich University Hospitals")
└─ Directorate (e.g., "RHEUMATOLOGY")
└─ Drug (e.g., "ADALIMUMAB")
└─ Pathway (e.g., "ADALIMUMAB → INFLIXIMAB")
```
**Indication view:**
```
Root (Regional Total)
└─ Trust
└─ GP Diagnosis (e.g., "rheumatoid arthritis")
└─ Drug
└─ Pathway
```
### Reading the Chart
- **Width** of each section indicates relative patient count
- **Color intensity** indicates proportion of patients at that level
- **Labels** show the category name and patient count
- **Color intensity** (NHS blue gradient) indicates proportion of parent group
- **Labels** show the name and patient count
### Interacting with the Chart
| Action | Effect |
|--------|--------|
| **Click** a section | Zoom in to show details for that branch |
| **Click** the root | Zoom out to show full hierarchy |
| **Hover** over a section | See tooltip with patient count |
| Use the **toolbar** | Reset, download image, pan, zoom |
| **Click** the parent/root | Zoom back out |
| **Hover** over a section | See tooltip with patient count, cost, dosing frequency, dates |
### Plotly Toolbar
### Hover Tooltip Information
The chart includes a Plotly toolbar (top right) with:
- **Download as PNG** - Save static image
- **Zoom controls** - Zoom in/out
- **Pan** - Click and drag to move
- **Reset** - Return to original view
When hovering over a chart section, you'll see:
- Patient count and percentage of parent
- Total cost and cost per patient
- First and last seen dates
- Treatment dosing frequency (for drug nodes)
- Cost per patient per annum
---
## Exporting Results
## GP Indication Matching
Two export options are available after running an analysis:
When viewing "By Indication" charts, the application uses pre-computed GP diagnosis matches:
### Export HTML
### How It Works
Creates an interactive HTML file that can be opened in any browser.
1. During data refresh, each patient's NHS pseudonym is queried against GP primary care records
2. SNOMED cluster codes map clinical conditions to drug indications
3. The most recent GP diagnosis match is used for each patient
4. ~93% of patients are matched to a GP diagnosis
- **Output**: `data/exports/pathway_chart_[timestamp].html`
- **Use case**: Sharing interactive charts via email or file share
- **Features**: Full interactivity, no software required to view
### Unmatched Patients
### Export CSV
Patients without a GP diagnosis match appear under their directorate with a "(no GP dx)" suffix (e.g., "RHEUMATOLOGY (no GP dx)").
Exports the underlying data as a spreadsheet.
- **Output**: `data/exports/pathway_data_[timestamp].csv`
- **Use case**: Further analysis in Excel, importing to other tools
- **Includes**: Patient IDs, drugs, dates, costs, directories, indication validation status
### Export Location
All exports are saved to the `data/exports/` directory with timestamped filenames to prevent overwriting.
---
## GP Indication Validation
When connected to Snowflake, the application validates whether patients have appropriate GP diagnoses for their prescribed drugs.
### What It Does
1. Looks up the drug's licensed indications (e.g., ADALIMUMAB for rheumatoid arthritis)
2. Finds corresponding SNOMED codes for those indications
3. Checks each patient's GP records for matching diagnoses
4. Reports the match rate per drug
### Understanding Results
After analysis, a table shows:
| Column | Meaning |
|--------|---------|
| **Drug Name** | The high-cost drug |
| **Total Patients** | Number of patients prescribed this drug |
| **With GP Indication** | Patients with matching GP diagnosis |
| **Match Rate** | Percentage with valid indication |
### Match Rate Interpretation
| Rate | Meaning | Color |
|------|---------|-------|
| **80%+** | Good coverage - most patients have GP diagnoses | Green |
| **50-79%** | Moderate coverage - investigate missing cases | Orange |
| **<50%** | Low coverage - may indicate data quality issues or off-label use | Red |
### Why Rates May Be Low
Low match rates don't necessarily indicate problems:
- **Cross-provider treatment**: Patient's GP is outside the data coverage
- **Recent diagnoses**: Diagnosis not yet recorded in GP system
- **Specialist-only conditions**: Some conditions are only managed in secondary care
- **Off-label prescribing**: Legitimate use for indications not in the mapping
### Enabling/Disabling
Indication validation is enabled by default when Snowflake is connected. It requires:
- Active Snowflake connection
- Drug-to-cluster mappings in the database
---
## Keyboard Navigation and Accessibility
The application is designed to be accessible:
### Skip Link
Press **Tab** when the page loads to reveal a "Skip to main content" link that bypasses navigation.
### Keyboard Navigation
| Key | Action |
|-----|--------|
| **Tab** | Move to next interactive element |
| **Shift+Tab** | Move to previous element |
| **Enter** | Activate buttons, links, checkboxes |
| **Space** | Toggle checkboxes |
| **Arrow keys** | Adjust sliders |
### Screen Reader Support
- All buttons and inputs have descriptive labels
- Status messages announce via ARIA live regions
- Charts include figure descriptions
### Theme Toggle
A dark/light mode toggle is available at the bottom of the sidebar for visual preference.
Reasons for unmatched patients:
- GP is outside the data coverage area
- Diagnosis not yet recorded in GP system
- Condition managed only in secondary care
- Off-label prescribing
---
## Troubleshooting
### "No data available" Error
### No data showing
**Cause**: No data matches your current filter settings
1. Check the filter bar — are filters too restrictive?
2. Try clearing all drug/trust selections in the drawer
3. Widen the date range (e.g., "All years / Last 12 months")
**Solutions:**
1. Check your date range - is it too narrow?
2. Verify your data source has data loaded
3. Check if selected trusts/drugs have any matching records
4. Try clearing all selections (to include everything)
### Chart shows "No matching pathways found"
### Chart Not Displaying
The current filter combination matches zero patients. Adjust filters or click "Clear All Filters" in the drawer.
**Cause**: Analysis completed but no data met the minimum patients threshold
### App won't start
**Solutions:**
1. Lower the minimum patients threshold
2. Expand your date range
3. Select more drugs or trusts
```bash
# Ensure dependencies are installed
uv sync
### Snowflake Connection Failed
# Ensure src/ is on Python path
uv run python setup_dev.py
**Cause**: Unable to connect to Snowflake
# Run with uv
uv run python run_dash.py
```
**Solutions:**
1. Check that `config/snowflake.toml` exists and is configured
2. Complete browser authentication when prompted
3. Verify your network allows Snowflake connections
4. Try using SQLite as an alternative data source
### Stale data
### File Upload Failed
Data is as fresh as the last CLI refresh. Check the header's "Last updated" indicator. To refresh:
**Cause**: File format or content issue
**Solutions:**
1. Ensure file is CSV or Parquet format
2. Check file isn't corrupted or empty
3. Verify file contains required columns
4. Try a smaller file to test
### Slow Performance
**Cause**: Large data volume or complex filtering
**Solutions:**
1. Use SQLite instead of file upload for large datasets
2. Narrow your date range
3. Select fewer drugs/trusts to analyze
4. Increase minimum patients threshold to reduce chart complexity
### Reference Data Not Loading
**Cause**: Missing or corrupted reference files
**Solutions:**
1. Click "Load Reference Data" to retry
2. Check that `data/` directory contains required CSV files:
- `include.csv`
- `defaultTrusts.csv`
- `directory_list.csv`
3. Verify files aren't empty or malformed
```bash
python -m cli.refresh_pathways --chart-type all
```
---
@@ -397,7 +251,7 @@ A dark/light mode toggle is available at the bottom of the sidebar for visual pr
If you encounter issues not covered in this guide:
1. Check the [README](../README.md) for installation and setup information
1. Check the [README](../README.md) for installation and setup
2. Review [DEPLOYMENT.md](./DEPLOYMENT.md) for server configuration
3. Consult [CLAUDE.md](../CLAUDE.md) for technical architecture details
4. Contact your local support team for NHS-specific questions
4. Contact the Medicines Intelligence team for NHS-specific questions
+122 -116
View File
@@ -5,129 +5,145 @@ If you discover a new failure pattern during your work, add it to this file.
---
## Drug-Indication Matching Guardrails
## Backend Isolation
### Match drugs to indications, not just patients to indications
- **When**: Building the indication mapping for pathway charts
- **Rule**: Each drug must be validated against BOTH the patient's GP diagnoses AND the drug-to-indication mapping from DimSearchTerm.csv. A patient being diagnosed with rheumatoid arthritis does NOT mean all their drugs are for rheumatoid arthritis.
- **Why**: The previous approach assigned ONE indication per patient (most recent GP dx), ignoring which drugs actually treat which conditions. This produced misleading pathways.
### Do NOT modify pipeline/analysis logic in src/
- **When**: Building Dash integration
- **Rule**: Do NOT change the logic in these files — they are the data pipeline and must stay as-is:
- `data_processing/pathway_pipeline.py`, `transforms.py`, `diagnosis_lookup.py` (matching/query logic)
- `analysis/pathway_analyzer.py`, `statistics.py`
- `cli/refresh_pathways.py`
- `data_processing/schema.py`, `reference_data.py`, `cache.py`, `data_source.py`
- **Why**: The pipeline is complete and tested. Changing it risks breaking the data refresh workflow.
### Use DimSearchTerm.csv for drug-to-Search_Term mapping
- **When**: Determining which Search_Term a drug belongs to
- **Rule**: Load `data/DimSearchTerm.csv`. The `CleanedDrugName` column has pipe-separated drug name fragments. Match HCD drug names against these fragments using substring matching (case-insensitive).
- **Why**: This CSV is the authoritative mapping of which drugs are used for which clinical indications.
### DO use shared utilities in src/ rather than duplicating
- **When**: The Dash app needs data loading or figure construction
- **Rule**: Dash callbacks should CALL INTO `src/`, not duplicate the code. Shared functions:
- `data_processing/pathway_queries.py``load_initial_data()` and `load_pathway_nodes()` for all SQLite queries
- `visualization/plotly_generator.py``create_icicle_from_nodes()` for icicle chart from list-of-dicts
- `dash_app/data/queries.py` — thin wrapper that resolves DB path and delegates to shared functions
- **Why**: Duplicating SQL queries and figure logic creates copies that drift apart. Shared code in `src/` is the cleaner architecture.
### Use substring matching for drug fragments
- **When**: Matching HCD drug names against DimSearchTerm CleanedDrugName fragments
- **Rule**: Check if any fragment from DimSearchTerm is a SUBSTRING of the HCD drug name (case-insensitive). E.g., "PEGYLATED" should match "PEGYLATED LIPOSOMAL DOXORUBICIN".
- **Why**: DimSearchTerm contains both full drug names (ADALIMUMAB) and partial fragments (PEGYLATED, INHALED). Exact match would miss the partial ones.
### Modified UPID uses pipe delimiter
- **When**: Creating indication-aware UPIDs
- **Rule**: Format is `{original_UPID}|{search_term}`. Use pipe `|` as delimiter. Do NOT use ` - ` (hyphen with spaces) as that's used for pathway hierarchy levels in the `ids` column.
- **Why**: The `ids` column uses " - " to separate hierarchy levels (e.g., "N&WICS - NNUH - rheumatoid arthritis - ADALIMUMAB"). Using the same delimiter in UPIDs would break hierarchy parsing.
### Return ALL GP matches per patient, not just most recent
- **When**: Querying Snowflake for patient GP diagnoses
- **Rule**: Remove `QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY EventDateTime DESC) = 1`. Return ALL matching Search_Terms per patient with `GROUP BY + COUNT(*)` for code_frequency.
- **Why**: A patient may have GP diagnoses for both rheumatoid arthritis AND asthma. We need ALL matches to cross-reference with their drugs.
### Restrict GP code lookup to HCD data window
- **When**: Building the WHERE clause for the GP record query
- **Rule**: Add `AND pc."EventDateTime" >= :earliest_hcd_date` where `earliest_hcd_date` is `MIN(Intervention Date)` from the HCD DataFrame. Pass this as a parameter to `get_patient_indication_groups()`.
- **Why**: Old GP codes from years before treatment started add noise. A diagnosis coded 10 years ago may no longer be relevant. Restricting to the HCD window ensures code_frequency reflects recent clinical activity for the conditions being actively treated.
### Tiebreaker: highest GP code frequency when a drug matches multiple indications
- **When**: A single drug maps to multiple Search_Terms AND the patient has GP dx for multiple
- **Rule**: Use `code_frequency` (COUNT of matching SNOMED codes per Search_Term per patient) from the GP query. The Search_Term with the most matching codes in the patient's GP record wins. If tied, use alphabetical Search_Term for determinism.
- **Why**: E.g., ADALIMUMAB is listed under rheumatoid arthritis, crohn's disease, psoriatic arthritis, etc. A patient with 47 RA codes and 2 crohn's codes is almost certainly on ADALIMUMAB for RA. Frequency of GP coding is a much stronger signal of clinical intent than recency — a recent one-off asthma check doesn't mean ADALIMUMAB is for asthma.
### Same patient, different indications = separate modified UPIDs
- **When**: A patient's drugs map to different Search_Terms
- **Rule**: Create separate modified UPIDs for each indication. E.g., `RMV12345|rheumatoid arthritis` and `RMV12345|asthma`. These are treated as separate "patients" by the pathway analyzer.
- **Why**: This is the core design — drugs for different indications should create separate treatment pathways, even for the same physical patient.
### Fallback to directory for unmatched drugs
- **When**: A drug doesn't match any Search_Term OR the patient has no GP dx for any of the drug's Search_Terms
- **Rule**: Use fallback format: `{UPID}|{Directory} (no GP dx)`. The indication_df maps this to `"{Directory} (no GP dx)"`.
- **Why**: Maintains consistent behavior with the previous approach for patients/drugs without GP diagnosis matches.
### Merge asthma Search_Terms but keep urticaria separate
- **When**: Working with asthma-related Search_Terms from CLUSTER_MAPPING_SQL or DimSearchTerm.csv
- **Rule**: Merge "allergic asthma", "asthma", and "severe persistent allergic asthma" into a single "asthma" Search_Term. Keep "urticaria" as a separate Search_Term — do NOT merge it with asthma.
- **Why**: These are clinically the same condition at different severity levels. Splitting them fragments the data. Urticaria is a distinct dermatological condition that happens to share OMALIZUMAB.
### Don't modify directory chart processing
- **When**: Making changes to the indication matching logic
- **Rule**: Only modify the indication chart path (`elif current_chart_type == "indication":`). Directory charts use unmodified UPIDs and directory-based grouping.
- **Why**: Directory charts work correctly and should not be affected by indication matching changes.
### Do NOT modify pathways.db schema or data
- **When**: Querying the database from Dash callbacks
- **Rule**: Read-only access. Use `sqlite3.connect(db_path)` with SELECT queries only. Never INSERT, UPDATE, DELETE, or ALTER.
- **Why**: pathways.db is populated by `python -m cli.refresh_pathways`. The Dash app is a read-only consumer.
---
## Snowflake Query Guardrails
## CSS & Design Fidelity
### Use PseudoNHSNoLinked for GP record matching
- **When**: Querying GP records (PrimaryCareClinicalCoding) for patient diagnoses
- **Rule**: Use `PseudoNHSNoLinked` column from HCD data, NOT `PersonKey` (LocalPatientID)
- **Why**: PersonKey is provider-specific local ID. Only PseudoNHSNoLinked matches PatientPseudonym in GP records.
### Use className matching 01_nhs_classic.html, not inline styles
- **When**: Building any Dash HTML component
- **Rule**: Use `className="css-class-name"` referencing classes from `dash_app/assets/nhs.css`. Do NOT use inline `style={}` dicts for layout/visual styling. Only use inline styles for truly dynamic values (e.g., `style={"flex": patient_count}` for proportional widths).
- **Why**: CSS fidelity to the HTML concept is a primary goal. Inline styles drift from the design and are harder to maintain.
### Embed cluster query as CTE in Snowflake
- **When**: Looking up patient indications during data refresh
- **Rule**: Use the `CLUSTER_MAPPING_SQL` content as a WITH clause in the patient lookup query
- **Why**: This ensures we always use the complete cluster mapping and don't need local storage
### nhs.css is the single source of CSS truth
- **When**: Adding or modifying styles
- **Rule**: All styles go in `dash_app/assets/nhs.css`. If the concept HTML doesn't have a class for something, add it to nhs.css with the same naming convention (`.component__element--modifier`).
- **Why**: Dash auto-serves files from `assets/`. Keeping CSS in one file matches the design source (01_nhs_classic.html) and avoids style fragmentation.
### Quote mixed-case column aliases in Snowflake SQL
- **When**: Writing SELECT queries that return results to Python code
- **Rule**: Use `AS "ColumnName"` (quoted) for any column alias you'll access by name in Python
- **Why**: Snowflake uppercases unquoted identifiers. `SELECT foo AS Search_Term` returns `SEARCH_TERM`, so `row.get('Search_Term')` returns None. Fix: `SELECT foo AS "Search_Term"`
### Build indication_df from all unique UPIDs, not PseudoNHSNoLinked
- **When**: Creating the indication mapping DataFrame for pathway processing
- **Rule**: Use `df.drop_duplicates(subset=['UPID'])` not `drop_duplicates(subset=['PseudoNHSNoLinked'])`
- **Why**: A patient visiting multiple providers has multiple UPIDs. Using unique PseudoNHSNoLinked only maps one UPID per patient, leaving others as NaN.
### Read 01_nhs_classic.html when building UI components
- **When**: Creating any component in `dash_app/components/`
- **Rule**: Read `01_nhs_classic.html` first to see the exact HTML structure, CSS classes, and element hierarchy for that component. Match it as closely as possible.
- **Why**: The HTML concept IS the design spec. Deviating creates visual inconsistency.
---
## Data Processing Guardrails
## Callback Architecture
### Copy DataFrames in functions that modify columns
- **When**: Writing functions like `prepare_data()` that modify DataFrame columns
- **Rule**: Always `df = df.copy()` at the start of any function that modifies column values on the input DataFrame
- **Why**: `prepare_data()` mapped Provider Code → Name in-place. When called multiple times on the same DataFrame, only the first call worked. The fix: `df.copy()` prevents destructive mutation.
### No circular callback dependencies
- **When**: Writing Dash callbacks
- **Rule**: Callbacks must flow unidirectionally: filter inputs → `app-state` store → `chart-data` store → UI components. Never have a component that is both Input and Output in the same callback chain without an intermediate store.
- **Why**: Dash raises `DuplicateCallback` errors for circular dependencies, and they're extremely hard to debug.
### Include chart_type in UNIQUE constraints for pathway_nodes
- **When**: Creating or modifying the pathway_nodes table schema
- **Rule**: The UNIQUE constraint MUST include `chart_type`: `UNIQUE(date_filter_id, chart_type, ids)`
- **Why**: Without `chart_type`, `INSERT OR REPLACE` silently overwrites directory chart nodes when indication chart nodes are inserted.
### Use dcc.Store for all state, not server-side globals
- **When**: Managing application state (selected filters, chart data, reference data)
- **Rule**: ALL state lives in `dcc.Store` components. Never use module-level globals, class variables, or `flask.g` for state. The 3 stores are: `app-state` (session), `chart-data` (memory), `reference-data` (session).
- **Why**: Dash is stateless per request. Server-side state breaks with multiple users and causes subtle bugs during development.
### Handle NaN in Directory when building fallback labels
- **When**: Creating fallback indication labels for patients without GP diagnosis match
- **Rule**: Check `pd.notna(directory)` before concatenating to string. Use `"UNKNOWN (no GP dx)"` for NaN cases.
- **Why**: NaN handling prevents TypeError and ensures meaningful fallback labels.
### Use callback_context for multi-input callbacks
- **When**: A callback has multiple Inputs and needs to know which one triggered it
- **Rule**: Use `dash.callback_context.triggered` (or `ctx.triggered_id` in Dash 2.x) to determine the triggering input.
- **Why**: Without this, the callback runs for every input change and you can't distinguish which filter changed.
### Use parameterized queries for SQLite
- **When**: Building WHERE clauses with user-selected filters
- **Rule**: Use `?` placeholders and pass params tuple — never string interpolation
- **Why**: Prevents SQL injection and handles special characters in drug/directory names
### Use existing pathway_analyzer functions
- **When**: Processing pathway data for the icicle chart
- **Rule**: Reuse functions from `analysis/pathway_analyzer.py` — don't reinvent
- **Why**: The existing code handles edge cases (empty groups, statistics calculation, color mapping)
### Pattern-matching callbacks for dynamic drug chips
- **When**: Building the card browser drawer with clickable drug chips
- **Rule**: Use `{"type": "drug-chip", "index": drug_name}` pattern for chip IDs. Register callbacks with `Input({"type": "drug-chip", "index": ALL}, "n_clicks")`. Access triggered chip via `ctx.triggered_id["index"]`.
- **Why**: The number of drug chips is dynamic (changes per directorate/indication). Pattern-matching callbacks handle this without hardcoding IDs.
---
## Reflex Guardrails
## Plotly Figure
### Use .to() methods for Var operations in rx.foreach
- **When**: Working with items inside `rx.foreach` render functions
- **Rule**: Use `item.to(int)` for numeric comparisons, `item.to_string()` for text operations
- **Why**: Items from rx.foreach are Var objects, not plain Python values.
### Preserve create_icicle_from_nodes() in src/visualization/plotly_generator.py
- **When**: Modifying the icicle chart
- **Rule**: `create_icicle_from_nodes(nodes, title)` in `src/visualization/plotly_generator.py` is the shared icicle chart function. It accepts list-of-dicts from dcc.Store. Key properties:
- 10-field customdata structure (value, colour, cost, costpp, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa)
- NHS colorscale: `[[0.0, "#003087"], [0.25, "#0066CC"], [0.5, "#1E88E5"], [0.75, "#4FC3F7"], [1.0, "#E3F2FD"]]`
- `maxdepth=3`, `branchvalues="total"`, `sort=False`
- Layout: transparent background, reduced margins, autosize
- **Why**: The icicle chart is tested and correct. The Dash callback in `dash_app/callbacks/chart.py` calls this function.
### Use rx.cond for conditional rendering, not Python if
- **When**: Conditionally showing/hiding components or changing styles based on state
- **Rule**: Use `rx.cond(condition, true_component, false_component)` — not Python `if`
- **Why**: Python `if` evaluates at definition time; `rx.cond` evaluates reactively at render time
### Chart data is a list of dicts
- **When**: Passing data between `chart-data` store and chart callback
- **Rule**: `chart-data` store holds `{"nodes": [...], "unique_patients": int, "total_drugs": int, "total_cost": float}`. Each node is a dict with keys matching the SQLite columns needed for the figure: `parents, ids, labels, value, cost, costpp, colour, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa`.
- **Why**: `dcc.Store` serializes to JSON. Keep the same dict structure that `pathways_app.py` uses for `chart_data` so the figure callback works identically.
---
## Data Extraction
### Keep data logic in shared src/ functions, not dash_app/ duplicates
- **When**: Adding or modifying data loading functions
- **Rule**: SQL queries and data logic live in `src/data_processing/pathway_queries.py`. The `dash_app/data/queries.py` is a thin wrapper that resolves the DB path and delegates. Do not duplicate queries in `dash_app/`.
- **Why**: Shared code in `src/` prevents query drift and keeps the single source of truth for data access.
### DimSearchTerm.csv fragments are substrings
- **When**: Building the card browser or matching drugs to indications
- **Rule**: `CleanedDrugName` values in DimSearchTerm.csv are drug name FRAGMENTS (e.g., "ADALIMUMAB", "PEGYLATED", "INHALED"). They're matched against full drug names using `drug_name.upper().contains(fragment)`. Don't assume exact match.
- **Why**: Some fragments are partial (INHALED matches "INHALED BECLOMETASONE", "INHALED FLUTICASONE", etc.).
### Apply SEARCH_TERM_MERGE_MAP when loading DimSearchTerm.csv
- **When**: Building the directorate tree in `card_browser.py`
- **Rule**: Import and apply `SEARCH_TERM_MERGE_MAP` from `data_processing.diagnosis_lookup` to normalize "allergic asthma" → "asthma" and "severe persistent allergic asthma" → "asthma". Keep "urticaria" separate.
- **Why**: The Snowflake query and pathway processing already use merged Search_Terms. The card browser must match.
---
## SQLite Queries
### Use parameterized queries for all filters
- **When**: Building WHERE clauses with user-selected values
- **Rule**: Use `?` placeholders and pass params as a list. Never use f-strings or string interpolation for filter values.
- **Why**: Prevents SQL injection and handles special characters in drug/directory names (e.g., "CROHN'S DISEASE").
### Database path resolution
- **When**: Connecting to pathways.db from dash_app/
- **Rule**: Use `Path(__file__).resolve().parents[2] / "data" / "pathways.db"` from files in `dash_app/data/`. This resolves from `dash_app/data/queries.py` → project root → `data/pathways.db`.
- **Why**: Relative paths break depending on the working directory. Absolute path resolution is reliable.
---
## Dash Framework
### Wrap layout in dmc.MantineProvider
- **When**: Setting up the app layout in `app.py`
- **Rule**: The outermost layout element must be `dmc.MantineProvider(children=[...])`. Without this, DMC components (Drawer, Accordion, Chip, etc.) won't render.
- **Why**: Dash Mantine Components requires the Provider context to function.
### dcc.Store storage_type matters
- **When**: Creating the 3 store components
- **Rule**:
- `app-state`: `storage_type="session"` — persists across page refreshes within a tab
- `chart-data`: `storage_type="memory"` — cleared on page refresh (reloaded from SQLite)
- `reference-data`: `storage_type="session"` — loaded once, persists across refreshes
- **Why**: Wrong storage type causes stale data bugs (memory clears too often) or wasted queries (session persists when it shouldn't).
### Dash assets directory is auto-served
- **When**: Placing CSS, JS, or images
- **Rule**: Put static assets in `dash_app/assets/`. Dash serves them automatically. Reference CSS via `className`, not `<link>` tags.
- **Why**: Dash's asset pipeline handles caching and serving. Manual `<link>` tags are unnecessary and may not work.
---
@@ -148,20 +164,10 @@ If you discover a new failure pattern during your work, add it to this file.
- **Rule**: The "Next iteration should" section must contain specific, actionable guidance
- **Why**: The next iteration has zero memory. If you don't write it down, it's lost.
### Check existing code for patterns
- **When**: Unsure how to implement something
- **Rule**: Look at `pathways_app/pathways_app.py`, `analysis/pathway_analyzer.py`, `cli/refresh_pathways.py`
- **Why**: The existing codebase has solved many quirks already
### Snowflake connection_timeout must be high enough for GP lookup queries
- **When**: GP record queries against PrimaryCareClinicalCoding time out
- **Rule**: Ensure `connection_timeout` in config/snowflake.toml is at least 600 (currently set to 600). This controls the Python client's `network_timeout`, which is how long the client waits for ANY Snowflake response. Do NOT lower this value.
- **Why**: GP lookup queries take ~40s per batch due to CTE compilation overhead. With connection_timeout=30, every batch timed out silently (error 000604/57014).
### Use large batch sizes (5000+) for GP record lookups
- **When**: Calling `get_patient_indication_groups()` with patient batches
- **Rule**: Use batch_size=5000 or larger. The query time is ~40s regardless of batch size (5 patients ≈ 500 patients ≈ 5000 patients). Smaller batches just multiply the fixed overhead.
- **Why**: With batch_size=500, 36K patients needed 74 batches × 40s = ~50 min. With batch_size=5000, only 8 batches × 45s = ~6 min. The bottleneck is CTE compilation, not data volume.
### Validate with `python run_dash.py`
- **When**: After completing any task
- **Rule**: Run `python run_dash.py` (or `python -c "from dash_app.app import app"` for import checks). The app must start without errors after EVERY task.
- **Why**: Broken imports or circular dependencies compound across tasks. Catch them immediately.
<!--
ADD NEW GUARDRAILS BELOW as failures are observed during the loop.
+37 -2
View File
@@ -885,11 +885,46 @@ Migrating the HCD Analysis frontend from Reflex to Dash (Plotly) + Dash Mantine
### Blocked items (iter 16):
- None
## MIGRATION COMPLETE
All 20 tasks across 5 phases complete. 16 iterations total.
## Iteration 17 — 2026-02-06
### Task: Phase 6 — Update all documentation
### Why this task:
- Phases 0-5 complete. Phase 6 (documentation cleanup) was the only remaining work.
- All non-archive .md files still referenced Reflex (commands, architecture, deployment).
### Status: COMPLETE
### What was done:
- **README.md**: Complete rewrite — replaced `reflex run` with `python run_dash.py`, updated features list (dual charts, pre-computed pathways, drug browser), updated project structure (dash_app/ instead of pathways_app/), updated quick start, usage, troubleshooting
- **docs/USER_GUIDE.md**: Complete rewrite — new single-page Dash UI guide covering header, sidebar, KPI row, filter bar, chart card, drawer, drug browser, indication matching
- **docs/DEPLOYMENT.md**: Complete rewrite — Dash/Gunicorn deployment replacing Reflex/FastAPI/Next.js, Docker examples, nginx config, systemd service, single port 8050
- **docs/DESIGN_SYSTEM.md**: Updated "Reflex Implementation" section → "Dash Implementation" with correct CSS/component references
- **RALPH_PROMPT.md**: Updated title ("Dash Application Maintenance" not "Reflex → Dash Migration"), updated data reference section to point to src/ shared functions instead of pathways_app/
- **guardrails.md**: Updated 3 rules to reflect current Dash architecture (shared utilities exist, icicle function exists, data queries exist) instead of migration instructions
- **IMPLEMENTATION_PLAN.md**: Phase 6 tasks marked [x]
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` — OK, 7 callbacks registered
- Grep for Reflex in non-archive .md files: only CLAUDE.md line 140 (archive description — accurate) and IMPLEMENTATION_PLAN.md (historical migration log — accurate)
### Files changed:
- README.md — Rewritten for Dash
- docs/USER_GUIDE.md — Rewritten for Dash
- docs/DEPLOYMENT.md — Rewritten for Dash
- docs/DESIGN_SYSTEM.md — Updated implementation section
- RALPH_PROMPT.md — Updated title and references
- guardrails.md — Updated 3 rules
- IMPLEMENTATION_PLAN.md — Phase 6 marked [x]
- progress.txt — This entry
### Patterns discovered:
- archive/ files (IMPROVEMENT_RECOMMENDATIONS.md) retain Reflex references intentionally — they're historical
- IMPLEMENTATION_PLAN.md retains Reflex references in completed task descriptions — these are accurate migration history
### Next iteration should:
- Phase 6 is complete. All tasks across all phases are now [x].
### Blocked items:
- None
## ALL PHASES COMPLETE
All 24 tasks across 6 phases complete. 17 iterations total.
- Phase 0: Scaffolding (2 tasks) — iteration 1
- Phase 1: Data Access (2 tasks) — iterations 2-3
- Phase 2: Static Layout (3 tasks) — iterations 4-6
- Phase 3: Core Callbacks (4 tasks) — iterations 7-10
- Phase 4: Drawer (2 tasks) — iterations 11-12
- Phase 5: Polish & Cleanup (4 tasks) — iterations 13-16
- Phase 6: Documentation (4 tasks) — iteration 17