From 1c3ece6480be6e9a74a2453034fbe2ff582e0c51 Mon Sep 17 00:00:00 2001
From: Andrew Charlwood
Date: Fri, 6 Feb 2026 12:57:47 +0000
Subject: [PATCH] feat: create dash_app skeleton with nhs.css and MantineProvider (Phase 0)

- dash_app/ directory structure: app.py, assets/, data/, components/, callbacks/, utils/
- run_dash.py entry point at project root
- Added dash>=2.14.0 and dash-mantine-components>=0.14.0 to pyproject.toml
- app.py: Dash app with MantineProvider wrapper and 3 dcc.Store components
- nhs.css: extracted from 01_nhs_classic.html (sans mock icicle CSS)
- Validated: app starts cleanly at localhost:8050
---
 IMPLEMENTATION_PLAN.md          | 472 ++++++++++++++++++--------------
 dash_app/__init__.py            |   0
 dash_app/app.py                 |  36 +++
 dash_app/assets/nhs.css         | 258 +++++++++++++++++
 dash_app/callbacks/__init__.py  |   0
 dash_app/components/__init__.py |   0
 dash_app/data/__init__.py       |   0
 dash_app/utils/__init__.py      |   0
 progress.txt                    | 457 ++++++------------------------
 pyproject.toml                  |   2 +
 run_dash.py                     |   5 +
 uv.lock                         | 131 +++++++++
 12 files changed, 783 insertions(+), 578 deletions(-)
 create mode 100644 dash_app/__init__.py
 create mode 100644 dash_app/app.py
 create mode 100644 dash_app/assets/nhs.css
 create mode 100644 dash_app/callbacks/__init__.py
 create mode 100644 dash_app/components/__init__.py
 create mode 100644 dash_app/data/__init__.py
 create mode 100644 dash_app/utils/__init__.py
 create mode 100644 run_dash.py

diff --git a/IMPLEMENTATION_PLAN.md b/IMPLEMENTATION_PLAN.md
index 7e618b6..3f99497 100644
--- a/IMPLEMENTATION_PLAN.md
+++ b/IMPLEMENTATION_PLAN.md
@@ -1,246 +1,314 @@
-# Implementation Plan - Drug-Aware Indication Matching
+# Implementation Plan — Reflex → Dash Migration
 
 ## Project Overview
 
-Update the indication-based pathway charts so that patient indications are matched **per drug**, not just per patient. Currently, each patient gets ONE indication (most recent GP diagnosis match). This ignores which drugs the patient is actually taking.
+Migrate the Reflex web application to Dash (Plotly) + Dash Mantine Components. The backend (`src/`) is untouched — only the frontend changes.
 
-### The Problem
+### What Changes
+- `pathways_app/` (Reflex) → `dash_app/` (Dash + DMC)
+- `run_dash.py` entry point replaces `reflex run`
+- CSS extracted from `01_nhs_classic.html` → `dash_app/assets/nhs.css`
+- Drug/Directory/Indication filters consolidated into a right-side `dmc.Drawer`
 
-A patient on ADALIMUMAB + OMALIZUMAB currently gets assigned a single indication (e.g., "rheumatoid arthritis" — the most recent GP match). But:
-- ADALIMUMAB is used for rheumatoid arthritis, axial spondyloarthritis, crohn's disease, etc.
-- OMALIZUMAB is used for asthma, allergic asthma, urticaria
+### What Stays (DO NOT MODIFY pipeline/analysis logic)
+- `data_processing/pathway_pipeline.py`, `transforms.py`, `diagnosis_lookup.py` (matching logic)
+- `analysis/pathway_analyzer.py`, `statistics.py`
+- `cli/refresh_pathways.py`
+- `data_processing/schema.py`, `reference_data.py`, `cache.py`, `data_source.py`
+- SQLite schema and `pathway_nodes` table
+- `data/` reference files (CSVs, pathways.db)
 
-These are different clinical pathways and should be treated as separate treatment journeys.
+### What CAN be edited in `src/` (shared utilities)
+- `visualization/plotly_generator.py` — add/refactor a function to accept list-of-dicts (what Dash produces) instead of only DataFrames
+- `data_processing/database.py` — add shared query functions for pathway node loading so both Reflex and Dash use the same queries
+- `core/config.py` — if path resolution needs adjusting
 
-### The Solution
+### Dash App Structure
+```
+dash_app/
+├── __init__.py
+├── app.py                  # Entry point, layout root, dcc.Store components
+├── assets/
+│   └── nhs.css             # Extracted from 01_nhs_classic.html
+├── data/
+│   ├── queries.py          # SQLite queries (extracted from Reflex AppState)
+│   └── card_browser.py     # DimSearchTerm.csv → directorate tree
+├── components/
+│   ├── header.py           # Top header bar
+│   ├── sidebar.py          # Left navigation
+│   ├── kpi_row.py          # 4 KPI cards
+│   ├── filter_bar.py       # Chart type toggle + date dropdowns
+│   ├── chart_card.py       # Chart area with tabs + dcc.Graph
+│   ├── drawer.py           # dmc.Drawer with card browser
+│   └── footer.py           # Page footer
+├── callbacks/
+│   ├── __init__.py         # register_callbacks(app)
+│   ├── filters.py          # Date/chart-type → app-state store
+│   ├── chart.py            # chart-data → go.Icicle figure
+│   ├── drawer.py           # Drawer open/close + drug selection
+│   └── kpi.py              # chart-data → KPI card values
+└── utils/
+    └── formatting.py       # Cost/patient display formatters
+```
 
-Match each drug to an indication by cross-referencing:
-1. **GP diagnosis** — which Search_Terms the patient has matching SNOMED codes for
-2. **Drug mapping** — which Search_Terms list each drug (from `DimSearchTerm.csv`)
+### State Management (3 dcc.Store components)
+- **app-state** (session): `chart_type`, `initiated`, `last_seen`, `selected_drugs`, `selected_directorates`, `date_filter_id`
+- **chart-data** (memory): `nodes[]`, `unique_patients`, `total_drugs`, `total_cost`
+- **reference-data** (session): `available_drugs`, `directorate_tree` (loaded once)
 
-Only assign a drug to an indication if BOTH conditions are met. If a patient's drugs map to different indications, they become separate pathways (via modified UPID).
+### Callback Chain
+```
+Page Load → load_reference_data → reference-data store
+          → load_pathway_data → chart-data store
+              ├→ update_kpis → KPI cards
+              └→ update_chart → dcc.Graph
 
-### Key Design Decisions
+Filter change → update_app_state → app-state store → load_pathway_data → (chain above)
 
-| Aspect | Decision |
-|--------|----------|
-| Drug-indication source | `data/DimSearchTerm.csv` — Search_Term → CleanedDrugName mapping |
-| UPID modification | `{original_UPID}\|{search_term}` for drugs with matched indication |
-| GP diagnosis matching | Return ALL matches per patient (not just most recent) |
-| Drug matching | Substring match: HCD drug name contains DimSearchTerm fragment |
-| Multiple indication matches per drug | Use highest GP code frequency as tiebreaker (COUNT of matching SNOMED codes per Search_Term) |
-| GP code time range | Only codes from MIN(Intervention Date) onwards — restricts to HCD data window |
-| No indication match | Fallback to directory (same as current behavior) |
-| Same patient, different indications | Separate pathways via different modified UPIDs |
+Drawer selection → update_drug_selection → app-state store → load_pathway_data → (chain above)
+```
 
-### Examples
-
-**Patient on ADALIMUMAB + GOLIMUMAB, GP dx: axial spondyloarthritis + asthma**
-- axial spondyloarthritis drug list includes both ADALIMUMAB and GOLIMUMAB
-- → Both drugs grouped under "axial spondyloarthritis", single pathway
-- Modified UPID: `RMV12345|axial spondyloarthritis`
-
-**Patient on ADALIMUMAB + OMALIZUMAB, GP dx: axial spondyloarthritis + asthma**
-- axial spondyloarthritis lists ADALIMUMAB but not OMALIZUMAB
-- asthma lists OMALIZUMAB but not ADALIMUMAB
-- → Two separate pathways:
-  - `RMV12345|axial spondyloarthritis` with ADALIMUMAB
-  - `RMV12345|asthma` with OMALIZUMAB
-
-**Patient on ADALIMUMAB, GP dx: rheumatoid arthritis (47 codes) + crohn's disease (2 codes)**
-- Both Search_Terms list ADALIMUMAB AND patient has GP dx for both
-- → Tiebreaker: highest code frequency — rheumatoid arthritis has 47 matching SNOMED codes vs 2 for crohn's
-- → Single pathway under rheumatoid arthritis (more clinical activity = more likely the treatment indication)
+### Directorate Card Browser (dmc.Drawer)
+- Position: right, ~480px wide
+- **Top card**: "All Drugs" — flat list from `pathway_nodes` level 3. Pick one drug → see it across all directorates/indications.
+- **Below**: Cards per PrimaryDirectorate (from DimSearchTerm.csv). Each has `dmc.Accordion` with indication items → drug chips inside.
+- **Clear Filters** button resets all selections.
+- Data model: `DimSearchTerm.csv` grouped by PrimaryDirectorate → Search_Term → CleanedDrugName
 
 ---
 
-## Phase 1: Update Snowflake Query & Drug Mapping
+## Phase 0: Project Scaffolding
 
-### 1.1 Update `get_patient_indication_groups()` to return ALL matches with frequency
-- [x] Modify the Snowflake query in `get_patient_indication_groups()` (diagnosis_lookup.py):
-  - Remove `QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY EventDateTime DESC) = 1`
-  - Return ALL matching Search_Terms per patient with code frequency:
-    ```sql
-    SELECT pc."PatientPseudonym" AS "PatientPseudonym",
-           aic.Search_Term AS "Search_Term",
-           COUNT(*) AS "code_frequency"
-    FROM PrimaryCareClinicalCoding pc
-    JOIN AllIndicationCodes aic ON pc."SNOMEDCode" = aic.SNOMEDCode
-    WHERE pc."PatientPseudonym" IN (...)
-      AND pc."EventDateTime" >= :earliest_hcd_date
-    GROUP BY pc."PatientPseudonym", aic.Search_Term
-    ```
-  - `code_frequency` = number of matching SNOMED codes per Search_Term per patient
-  - Higher frequency = more clinical activity = stronger signal for tiebreaker
-  - `earliest_hcd_date` = `MIN(Intervention Date)` from the HCD DataFrame — restricts GP codes to the HCD data window, reducing noise from old/irrelevant diagnoses
-- [x] Accept `earliest_hcd_date` parameter in `get_patient_indication_groups()` and pass to query
-- [x] Keep batch processing (500 patients per query)
-- [x] Update return type: DataFrame now has multiple rows per patient (PatientPseudonym, Search_Term, code_frequency)
-- [x] Verify: Query returns more rows than before — 537,794 patient-indication rows (avg 16.0 per matched patient) vs previous single row per patient
+### 0.1 Create dash_app/ skeleton + update pyproject.toml
+- [x] Create `dash_app/` directory with `__init__.py`, `app.py`, subdirectories (`assets/`, `data/`, `components/`, `callbacks/`, `utils/`)
+- [x] Create `run_dash.py` at project root (simple `from dash_app.app import app; app.run(debug=True, port=8050)`)
+- [x] Update `pyproject.toml`: add `dash>=2.14.0`, `dash-mantine-components>=0.14.0` to dependencies (keep `reflex` temporarily)
+- [x] Create minimal `app.py` with `dash.Dash(__name__)`, DMC provider wrapper, and "Hello Dash" placeholder layout
+- **Checkpoint**: `python run_dash.py` starts, shows "Hello Dash" at localhost:8050 ✓
 
-### 1.2 Merge related asthma Search_Terms in CLUSTER_MAPPING_SQL
-- [x] In `CLUSTER_MAPPING_SQL` (diagnosis_lookup.py), merge these 3 Search_Terms into one `"asthma"` entry:
-  - `allergic asthma` (Cluster: OMALIZUMAB only)
-  - `asthma` (Cluster: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, OMALIZUMAB, RESLIZUMAB)
-  - `severe persistent allergic asthma` (Cluster: OMALIZUMAB only)
-- [x] Map all 3 Cluster_IDs to `Search_Term = 'asthma'` in the CTE VALUES
-- [x] `urticaria` (OMALIZUMAB, DERMATOLOGY) stays SEPARATE — do NOT merge with asthma
-- [x] Also update `load_drug_indication_mapping()` to apply the same merge when loading DimSearchTerm.csv:
-  - Combine drug lists from all 3 entries under a single `"asthma"` key
-  - Deduplicate drug fragments (OMALIZUMAB appears in all 3)
-- [x] Verify: GP code lookup returns `"asthma"` (not `"allergic asthma"` or `"severe persistent allergic asthma"`)
-- [x] Verify: Drug mapping for `"asthma"` includes full combined drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, OMALIZUMAB, RESLIZUMAB
-
-### 1.3 Build drug-to-Search_Term lookup from DimSearchTerm.csv
-- [x] Add function `load_drug_indication_mapping()` to `diagnosis_lookup.py`:
-  - Loads `data/DimSearchTerm.csv`
-  - Builds dict: `drug_fragment (uppercase) → list[Search_Term]`
-  - Also builds reverse: `search_term → list[drug_fragments]`
-  - CleanedDrugName is pipe-separated (e.g., "ADALIMUMAB|GOLIMUMAB|IXEKIZUMAB")
-- [x] Add function `get_search_terms_for_drug(drug_name, search_term_to_fragments) -> list[str]`:
-  - Returns all Search_Terms whose drug fragments are substrings of the drug name (case-insensitive)
-  - More practical than per-term boolean check — returns all matches at once for Phase 2 use
-- [x] Verify: ADALIMUMAB matches "axial spondyloarthritis", OMALIZUMAB matches "asthma"
+### 0.2 Extract CSS from 01_nhs_classic.html into dash_app/assets/nhs.css
+- [x] Copy the `