feat: create dash_app skeleton with nhs.css and MantineProvider (Phase 0)

- dash_app/ directory structure: app.py, assets/, data/, components/, callbacks/, utils/
- run_dash.py entry point at project root
- Added dash>=2.14.0 and dash-mantine-components>=0.14.0 to pyproject.toml
- app.py: Dash app with MantineProvider wrapper and 3 dcc.Store components
- nhs.css: extracted from 01_nhs_classic.html (sans mock icicle CSS)
- Validated: app starts cleanly at localhost:8050
This commit is contained in:
Andrew Charlwood
2026-02-06 12:57:47 +00:00
parent 76838887e6
commit 1c3ece6480
12 changed files with 783 additions and 578 deletions
+270 -202
View File
@@ -1,246 +1,314 @@
# Implementation Plan - Drug-Aware Indication Matching
# Implementation Plan — Reflex → Dash Migration
## Project Overview
Update the indication-based pathway charts so that patient indications are matched **per drug**, not just per patient. Currently, each patient gets ONE indication (most recent GP diagnosis match). This ignores which drugs the patient is actually taking.
Migrate the Reflex web application to Dash (Plotly) + Dash Mantine Components. The backend (`src/`) is untouched — only the frontend changes.
### The Problem
### What Changes
- `pathways_app/` (Reflex) → `dash_app/` (Dash + DMC)
- `run_dash.py` entry point replaces `reflex run`
- CSS extracted from `01_nhs_classic.html``dash_app/assets/nhs.css`
- Drug/Directory/Indication filters consolidated into a right-side `dmc.Drawer`
A patient on ADALIMUMAB + OMALIZUMAB currently gets assigned a single indication (e.g., "rheumatoid arthritis" — the most recent GP match). But:
- ADALIMUMAB is used for rheumatoid arthritis, axial spondyloarthritis, crohn's disease, etc.
- OMALIZUMAB is used for asthma, allergic asthma, urticaria
### What Stays (DO NOT MODIFY pipeline/analysis logic)
- `data_processing/pathway_pipeline.py`, `transforms.py`, `diagnosis_lookup.py` (matching logic)
- `analysis/pathway_analyzer.py`, `statistics.py`
- `cli/refresh_pathways.py`
- `data_processing/schema.py`, `reference_data.py`, `cache.py`, `data_source.py`
- SQLite schema and `pathway_nodes` table
- `data/` reference files (CSVs, pathways.db)
These are different clinical pathways and should be treated as separate treatment journeys.
### What CAN be edited in `src/` (shared utilities)
- `visualization/plotly_generator.py` — add/refactor a function to accept list-of-dicts (what Dash produces) instead of only DataFrames
- `data_processing/database.py` — add shared query functions for pathway node loading so both Reflex and Dash use the same queries
- `core/config.py` — if path resolution needs adjusting
### The Solution
### Dash App Structure
```
dash_app/
├── __init__.py
├── app.py # Entry point, layout root, dcc.Store components
├── assets/
│ └── nhs.css # Extracted from 01_nhs_classic.html
├── data/
│ ├── queries.py # SQLite queries (extracted from Reflex AppState)
│ └── card_browser.py # DimSearchTerm.csv → directorate tree
├── components/
│ ├── header.py # Top header bar
│ ├── sidebar.py # Left navigation
│ ├── kpi_row.py # 4 KPI cards
│ ├── filter_bar.py # Chart type toggle + date dropdowns
│ ├── chart_card.py # Chart area with tabs + dcc.Graph
│ ├── drawer.py # dmc.Drawer with card browser
│ └── footer.py # Page footer
├── callbacks/
│ ├── __init__.py # register_callbacks(app)
│ ├── filters.py # Date/chart-type → app-state store
│ ├── chart.py # chart-data → go.Icicle figure
│ ├── drawer.py # Drawer open/close + drug selection
│ └── kpi.py # chart-data → KPI card values
└── utils/
└── formatting.py # Cost/patient display formatters
```
Match each drug to an indication by cross-referencing:
1. **GP diagnosis** — which Search_Terms the patient has matching SNOMED codes for
2. **Drug mapping** — which Search_Terms list each drug (from `DimSearchTerm.csv`)
### State Management (3 dcc.Store components)
- **app-state** (session): `chart_type`, `initiated`, `last_seen`, `selected_drugs`, `selected_directorates`, `date_filter_id`
- **chart-data** (memory): `nodes[]`, `unique_patients`, `total_drugs`, `total_cost`
- **reference-data** (session): `available_drugs`, `directorate_tree` (loaded once)
Only assign a drug to an indication if BOTH conditions are met. If a patient's drugs map to different indications, they become separate pathways (via modified UPID).
### Callback Chain
```
Page Load → load_reference_data → reference-data store
→ load_pathway_data → chart-data store
├→ update_kpis → KPI cards
└→ update_chart → dcc.Graph
### Key Design Decisions
Filter change → update_app_state → app-state store → load_pathway_data → (chain above)
| Aspect | Decision |
|--------|----------|
| Drug-indication source | `data/DimSearchTerm.csv` — Search_Term → CleanedDrugName mapping |
| UPID modification | `{original_UPID}\|{search_term}` for drugs with matched indication |
| GP diagnosis matching | Return ALL matches per patient (not just most recent) |
| Drug matching | Substring match: HCD drug name contains DimSearchTerm fragment |
| Multiple indication matches per drug | Use highest GP code frequency as tiebreaker (COUNT of matching SNOMED codes per Search_Term) |
| GP code time range | Only codes from MIN(Intervention Date) onwards — restricts to HCD data window |
| No indication match | Fallback to directory (same as current behavior) |
| Same patient, different indications | Separate pathways via different modified UPIDs |
Drawer selection → update_drug_selection → app-state store → load_pathway_data → (chain above)
```
### Examples
**Patient on ADALIMUMAB + GOLIMUMAB, GP dx: axial spondyloarthritis + asthma**
- axial spondyloarthritis drug list includes both ADALIMUMAB and GOLIMUMAB
- → Both drugs grouped under "axial spondyloarthritis", single pathway
- Modified UPID: `RMV12345|axial spondyloarthritis`
**Patient on ADALIMUMAB + OMALIZUMAB, GP dx: axial spondyloarthritis + asthma**
- axial spondyloarthritis lists ADALIMUMAB but not OMALIZUMAB
- asthma lists OMALIZUMAB but not ADALIMUMAB
- → Two separate pathways:
- `RMV12345|axial spondyloarthritis` with ADALIMUMAB
- `RMV12345|asthma` with OMALIZUMAB
**Patient on ADALIMUMAB, GP dx: rheumatoid arthritis (47 codes) + crohn's disease (2 codes)**
- Both Search_Terms list ADALIMUMAB AND patient has GP dx for both
- → Tiebreaker: highest code frequency — rheumatoid arthritis has 47 matching SNOMED codes vs 2 for crohn's
- → Single pathway under rheumatoid arthritis (more clinical activity = more likely the treatment indication)
### Directorate Card Browser (dmc.Drawer)
- Position: right, ~480px wide
- **Top card**: "All Drugs" — flat list from `pathway_nodes` level 3. Pick one drug → see it across all directorates/indications.
- **Below**: Cards per PrimaryDirectorate (from DimSearchTerm.csv). Each has `dmc.Accordion` with indication items → drug chips inside.
- **Clear Filters** button resets all selections.
- Data model: `DimSearchTerm.csv` grouped by PrimaryDirectorate → Search_Term → CleanedDrugName
---
## Phase 1: Update Snowflake Query & Drug Mapping
## Phase 0: Project Scaffolding
### 1.1 Update `get_patient_indication_groups()` to return ALL matches with frequency
- [x] Modify the Snowflake query in `get_patient_indication_groups()` (diagnosis_lookup.py):
- Remove `QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY EventDateTime DESC) = 1`
- Return ALL matching Search_Terms per patient with code frequency:
```sql
SELECT pc."PatientPseudonym" AS "PatientPseudonym",
aic.Search_Term AS "Search_Term",
COUNT(*) AS "code_frequency"
FROM PrimaryCareClinicalCoding pc
JOIN AllIndicationCodes aic ON pc."SNOMEDCode" = aic.SNOMEDCode
WHERE pc."PatientPseudonym" IN (...)
AND pc."EventDateTime" >= :earliest_hcd_date
GROUP BY pc."PatientPseudonym", aic.Search_Term
```
- `code_frequency` = number of matching SNOMED codes per Search_Term per patient
- Higher frequency = more clinical activity = stronger signal for tiebreaker
- `earliest_hcd_date` = `MIN(Intervention Date)` from the HCD DataFrame — restricts GP codes to the HCD data window, reducing noise from old/irrelevant diagnoses
- [x] Accept `earliest_hcd_date` parameter in `get_patient_indication_groups()` and pass to query
- [x] Keep batch processing (500 patients per query)
- [x] Update return type: DataFrame now has multiple rows per patient (PatientPseudonym, Search_Term, code_frequency)
- [x] Verify: Query returns more rows than before — 537,794 patient-indication rows (avg 16.0 per matched patient) vs previous single row per patient
### 0.1 Create dash_app/ skeleton + update pyproject.toml
- [x] Create `dash_app/` directory with `__init__.py`, `app.py`, subdirectories (`assets/`, `data/`, `components/`, `callbacks/`, `utils/`)
- [x] Create `run_dash.py` at project root (simple `from dash_app.app import app; app.run(debug=True, port=8050)`)
- [x] Update `pyproject.toml`: add `dash>=2.14.0`, `dash-mantine-components>=0.14.0` to dependencies (keep `reflex` temporarily)
- [x] Create minimal `app.py` with `dash.Dash(__name__)`, DMC provider wrapper, and "Hello Dash" placeholder layout
- **Checkpoint**: `python run_dash.py` starts, shows "Hello Dash" at localhost:8050 ✓
### 1.2 Merge related asthma Search_Terms in CLUSTER_MAPPING_SQL
- [x] In `CLUSTER_MAPPING_SQL` (diagnosis_lookup.py), merge these 3 Search_Terms into one `"asthma"` entry:
- `allergic asthma` (Cluster: OMALIZUMAB only)
- `asthma` (Cluster: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, OMALIZUMAB, RESLIZUMAB)
- `severe persistent allergic asthma` (Cluster: OMALIZUMAB only)
- [x] Map all 3 Cluster_IDs to `Search_Term = 'asthma'` in the CTE VALUES
- [x] `urticaria` (OMALIZUMAB, DERMATOLOGY) stays SEPARATE — do NOT merge with asthma
- [x] Also update `load_drug_indication_mapping()` to apply the same merge when loading DimSearchTerm.csv:
- Combine drug lists from all 3 entries under a single `"asthma"` key
- Deduplicate drug fragments (OMALIZUMAB appears in all 3)
- [x] Verify: GP code lookup returns `"asthma"` (not `"allergic asthma"` or `"severe persistent allergic asthma"`)
- [x] Verify: Drug mapping for `"asthma"` includes full combined drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, OMALIZUMAB, RESLIZUMAB
### 1.3 Build drug-to-Search_Term lookup from DimSearchTerm.csv
- [x] Add function `load_drug_indication_mapping()` to `diagnosis_lookup.py`:
- Loads `data/DimSearchTerm.csv`
- Builds dict: `drug_fragment (uppercase) → list[Search_Term]`
- Also builds reverse: `search_term → list[drug_fragments]`
- CleanedDrugName is pipe-separated (e.g., "ADALIMUMAB|GOLIMUMAB|IXEKIZUMAB")
- [x] Add function `get_search_terms_for_drug(drug_name, search_term_to_fragments) -> list[str]`:
- Returns all Search_Terms whose drug fragments are substrings of the drug name (case-insensitive)
- More practical than per-term boolean check — returns all matches at once for Phase 2 use
- [x] Verify: ADALIMUMAB matches "axial spondyloarthritis", OMALIZUMAB matches "asthma"
### 0.2 Extract CSS from 01_nhs_classic.html into dash_app/assets/nhs.css
- [x] Copy the `<style>` block from `01_nhs_classic.html` (lines 8-314) into `dash_app/assets/nhs.css`
- [x] Add Google Fonts `@import` for Source Sans 3 at top of CSS file
- [x] Remove the mock icicle chart CSS (`.icicle`, `.icicle__row`, `.icicle__cell`, `.lvl-*` classes) — Plotly handles the real chart
- [x] Verify CSS loads by checking browser dev tools when app starts
- **Checkpoint**: `python run_dash.py` loads CSS (check font renders as Source Sans 3) ✓
---
## Phase 2: Drug-Aware Indication Matching Logic
## Phase 1: Data Access Layer
### 2.1 Create `assign_drug_indications()` function
- [x] Add to `diagnosis_lookup.py` or `pathway_pipeline.py`:
```
def assign_drug_indications(
df: pd.DataFrame, # HCD data with UPID, Drug Name columns
gp_matches_df: pd.DataFrame, # PatientPseudonym → list of matched Search_Terms
drug_mapping: dict, # From load_drug_indication_mapping()
) -> tuple[pd.DataFrame, pd.DataFrame]:
Returns: (modified_df, indication_df)
- modified_df: HCD data with UPID replaced by {UPID}|{indication}
- indication_df: mapping modified_UPID → Search_Term
```
- [x] Logic per UPID + Drug Name pair:
1. Get patient's GP-matched Search_Terms with code_frequency (from gp_matches_df via PseudoNHSNoLinked)
2. Get which Search_Terms include this drug (from drug_mapping)
3. Intersection = valid indications for this drug-patient pair
4. If 1 match: use it
5. If multiple matches: use highest code_frequency as tiebreaker (most GP coding activity = most likely treatment indication)
6. If 0 matches: use fallback directory
- [x] Modify UPID in df rows: `{original_UPID}|{matched_search_term}`
- [x] Build indication_df: `{modified_UPID}` → `Search_Term` (or fallback label)
- [x] Verify: Function compiles, handles edge cases (no GP match, no drug match)
### 1.1 Create shared data access functions
- [ ] Add query functions to `src/data_processing/database.py` (or a new `src/data_processing/pathway_queries.py` if database.py is already large):
- `load_initial_data(db_path) -> dict` — extracted from `AppState.load_data()` (pathways_app.py lines 407-488): returns `{"available_drugs": [...], "available_directorates": [...], "available_indications": [...], "total_records": int, "last_updated": str}`
- `load_pathway_data(db_path, filter_id, chart_type, selected_drugs=None, selected_directorates=None) -> dict` — extracted from `AppState.load_pathway_data()` (lines 490-642): returns `{"nodes": [...], "unique_patients": int, "total_drugs": int, "total_cost": float, "last_updated": str}`
- These are plain Python functions that accept `db_path` as a parameter (no Reflex state objects)
- [ ] Create thin `dash_app/data/queries.py` that imports and calls the shared functions with the correct `db_path`
- [ ] Return plain dicts/lists — JSON-serializable for dcc.Store
- **Checkpoint**: `python -c "from dash_app.data.queries import load_initial_data; print(load_initial_data())"` returns valid data
### 2.2 Handle tiebreaker for multiple indication matches
- [x] When a drug matches multiple Search_Terms AND patient has GP dx for multiple:
- Use `code_frequency` from the GP query (COUNT of matching SNOMED codes per Search_Term)
- Higher code_frequency = more clinical activity for that condition = more likely treatment indication
- E.g., patient with 47 RA codes and 2 crohn's codes → ADALIMUMAB assigned to RA
- code_frequency is already returned by the updated query in Task 1.1
- [x] Verify: Tiebreaker logic correctly picks highest-frequency diagnosis
- [x] Verify: Tie on frequency (rare but possible) falls back to alphabetical Search_Term for determinism
### 1.2 Build directorate card tree from DimSearchTerm.csv
- [ ] Create `dash_app/data/card_browser.py` with:
- `build_directorate_tree()` → dict structured as `{PrimaryDirectorate: {Search_Term: [drug_fragment, ...]}}`
- Loads `data/DimSearchTerm.csv`, groups by PrimaryDirectorate → Search_Term → split CleanedDrugName by pipe
- Applies SEARCH_TERM_MERGE_MAP from `data_processing.diagnosis_lookup` (merge asthma variants)
- `get_all_drugs()` → sorted flat list of all unique drug labels from `pathway_nodes` level 3
- **Checkpoint**: `python -c "from dash_app.data.card_browser import build_directorate_tree; import json; print(json.dumps(build_directorate_tree(), indent=2))"` returns valid tree
---
## Phase 3: Pipeline Integration
## Phase 2: Static Layout
### 3.1 Update `refresh_pathways.py` indication processing
- [x] In the `elif current_chart_type == "indication":` block:
1. Call `get_patient_indication_groups()` as before (but now returns ALL matches)
2. Load drug mapping: `drug_mapping = load_drug_indication_mapping()`
3. Call `assign_drug_indications(df, gp_matches_df, drug_mapping)`
4. Use modified_df (with indication-aware UPIDs) for pathway processing
5. Use indication_df for the indication mapping
- [x] Pass modified_df (not original df) to `process_indication_pathway_for_date_filter()`
- [x] Verify: Pipeline compiles, `python -m py_compile cli/refresh_pathways.py`
### 2.1 Header + sidebar components
- [ ] Create `dash_app/components/header.py``make_header()` function returning Dash HTML component
- NHS logo, title "HCD Analysis", breadcrumb, data freshness indicator (status dot + record count + last updated)
- Use CSS classes from `nhs.css`: `.top-header`, `.top-header__brand`, `.top-header__logo`, `.top-header__title`, etc.
- Record count and last updated are `html.Span` with IDs for callback updates: `id="header-record-count"`, `id="header-last-updated"`
- [ ] Create `dash_app/components/sidebar.py``make_sidebar()` function
- Navigation items matching 01_nhs_classic.html sidebar (Pathway Overview active, Drug Selection, Trust Selection, Directory Selection, Indications, Cost Analysis, Export Data)
- SVG icons as raw HTML (copy from 01_nhs_classic.html)
- "Drug Selection" and "Indications" items trigger the dmc.Drawer (via callback, wired in Phase 4)
- Footer: "NHS Norfolk & Waveney ICB / High Cost Drugs Programme"
- **Checkpoint**: Components render in browser with correct NHS styling
### 3.2 Test with dry run
- [x] Run `python -m cli.refresh_pathways --chart-type indication --dry-run -v`
- [x] Verify:
- Modified UPIDs appear in pipeline log (42,072 unique modified UPIDs)
- Patient counts are reasonable (42,072 modified UPIDs vs 36,628 original patients)
- Drug-indication matching is logged (49.3% match, 50.7% fallback, 15,238 tiebreakers)
- Pathway hierarchy shows drug-specific grouping under correct indications (1,846 total nodes)
- [x] Fixed: network_timeout increased from 30→600 (was killing GP lookup queries)
- [x] Fixed: batch_size increased from 500→5000 (reduces CTE compilation overhead from 74 to 8 batches)
### 2.2 Main content area: KPI row + filter bar + chart card
- [ ] Create `dash_app/components/kpi_row.py``make_kpi_row()` function
- 4 KPI cards: Unique Patients, Drug Types, Total Cost, Indication Match Rate
- Each card value has an ID for callback updates: `id="kpi-patients"`, `id="kpi-drugs"`, `id="kpi-cost"`, `id="kpi-match"`
- CSS classes: `.kpi-row`, `.kpi-card`, `.kpi-card__label`, `.kpi-card__value`, `.kpi-card__sub`
- [ ] Create `dash_app/components/filter_bar.py``make_filter_bar()` function
- Chart type toggle pills ("By Directory" / "By Indication") — use `html.Button` with `.toggle-pill` CSS
- Initiated dropdown: All years, Last 2 years, Last 1 year — use `dcc.Dropdown` or `html.Select` with `.filter-select`
- Last seen dropdown: Last 6 months, Last 12 months
- NO drug/directorate dropdowns here (those are in the drawer)
- Component IDs: `id="chart-type-directory"`, `id="chart-type-indication"`, `id="filter-initiated"`, `id="filter-last-seen"`
- [ ] Create `dash_app/components/chart_card.py``make_chart_card()` function
- Card header with title + dynamic subtitle (hierarchy label: "Trust → Directorate → Drug → Pathway")
- Tab row: Icicle (active), Sankey (disabled placeholder), Timeline (disabled placeholder)
- `dcc.Graph(id="pathway-chart")` filling the card body
- CSS classes: `.chart-card`, `.chart-card__header`, `.chart-card__tabs`, `.chart-tab`
- **Checkpoint**: All three components render with correct layout and styling
### 2.3 Footer + full page assembly
- [ ] Create `dash_app/components/footer.py``make_footer()` function
- CSS class `.page-footer`, same text as 01_nhs_classic.html
- [ ] Update `dash_app/app.py` to assemble full page layout:
- `dmc.MantineProvider(children=[header, sidebar, main_content])`
- Main content: KPI row → filter bar → chart card → footer
- Add 3 `dcc.Store` components: `id="app-state"`, `id="chart-data"`, `id="reference-data"`
- Wrap main content in `html.Main(className="main")`
- **Checkpoint**: Full page renders at localhost:8050, layout matches 01_nhs_classic.html visually
---
## Phase 4: Full Refresh & Validation
## Phase 3: Core Callbacks
### 4.1 Full refresh with both chart types
- [x] Run `python -m cli.refresh_pathways --chart-type all`
- [x] Verify:
- Both chart types generate data (directory: 1,101 nodes, indication: 1,846 nodes)
- Directory charts unchanged (293-329 nodes per date filter, same as before)
- Indication charts reflect drug-aware matching (42,072 modified UPIDs, 49.3% match rate)
### 3.1 Reference data loading + filter state management
- [ ] Create `dash_app/callbacks/filters.py`:
- `load_reference_data` callback: fires on page load, calls `queries.load_initial_data()`, populates `reference-data` store + header indicators
- `update_app_state` callback: fires when chart-type toggle or date dropdowns change, computes `date_filter_id` (e.g., `"all_6mo"`), updates `app-state` store
- Chart type toggle: use `callback_context` to determine which button was clicked, set active class via `className`
- [ ] Create `dash_app/callbacks/__init__.py` with `register_callbacks(app)` that imports and registers all callback modules
- [ ] Wire `register_callbacks(app)` in `app.py`
- **Checkpoint**: Page loads reference data, filter dropdowns update app-state store (verify via browser dev tools → dcc.Store)
### 4.2 Validate indication chart correctness
- [x] Check that drugs under an indication all appear in that Search_Term's drug list
- RA: ADALIMUMAB, RITUXIMAB, BARICITINIB, CERTOLIZUMAB PEGOL, TOCILIZUMAB ✓
- Asthma: DUPILUMAB, OMALIZUMAB ✓
- [x] Verify that a patient on drugs for different indications creates separate pathway branches
- 42,072 modified UPIDs vs 36,628 original patients confirms splitting ✓
- [x] Verify that drugs sharing an indication are grouped in the same pathway
- Multiple RA drugs (ADALIMUMAB, RITUXIMAB, etc.) all under "rheumatoid arthritis" ✓
- [x] Log: patient count comparison (old vs new approach)
- Old: 36,628 patients → single indication each
- New: 42,072 modified UPIDs → drug-specific indications (15% increase from splitting)
### 3.2 Pathway data loading callback
- [ ] Create `dash_app/callbacks/chart.py` (or add to filters.py):
- `load_pathway_data` callback: Input=`app-state` store, Output=`chart-data` store
- Calls `queries.load_pathway_data(filter_id, chart_type, selected_drugs, selected_directorates)`
- Runs on page load AND whenever `app-state` changes
- **Checkpoint**: Changing date filter updates chart-data store with new pathway nodes
### 4.3 Validate Reflex UI
- [x] Run `python -m reflex compile` to verify app compiles (compiled in 16.6s)
- [x] Verify chart type toggle still works (no code changes to UI, toggle mechanism unchanged)
- [x] Verify indication chart shows correct hierarchy (42 unique search_terms at level 2 for all_6mo)
### 3.3 KPI update callback
- [ ] Create `dash_app/callbacks/kpi.py`:
- `update_kpis` callback: Input=`chart-data` store, Output=KPI card values (4 outputs)
- Extracts `unique_patients`, `total_drugs`, `total_cost` from chart-data
- Formats numbers: patients with commas, cost as "£XXX.XM", drugs as plain number
- **Checkpoint**: KPIs update when date filters change
### 3.4 Icicle chart rendering callback
- [ ] Add a `create_icicle_from_nodes(nodes: list[dict], title: str) -> go.Figure` function to `src/visualization/plotly_generator.py`:
- Accepts list-of-dicts (the format stored in `chart-data` dcc.Store / returned by `load_pathway_data`)
- Same 10-field customdata, colorscale, texttemplate, hovertemplate as the existing Reflex `icicle_figure` (pathways_app.py lines 769-920)
- The existing `create_icicle_figure(ice_df)` stays untouched — the new function is an additional entry point for dict-based data
- Use the NHS blue gradient colorscale from the Reflex version: `[[0.0, "#003087"], [0.25, "#0066CC"], ...]`
- [ ] Add to `dash_app/callbacks/chart.py`:
- `update_chart` callback: Input=`chart-data` store, Output=`pathway-chart` figure
- Calls `create_icicle_from_nodes(chart_data["nodes"], title)` from the shared visualization module
- Dynamic title based on chart type and filters
- **Checkpoint**: Real icicle chart renders with SQLite data, filters change the chart, hover shows full statistics
---
## Phase 4: Directorate Card Browser
### 4.1 dmc.Drawer layout
- [ ] Create `dash_app/components/drawer.py``make_drawer()` function:
- `dmc.Drawer(id="drug-drawer", position="right", size="480px")`
- **Top section**: "All Drugs" card — flat alphabetical list of all drug names from pathway_nodes level 3
- Each drug as a `dmc.Chip` or clickable badge, ID pattern: `{"type": "drug-chip", "index": drug_name}`
- **Below**: One `dmc.Card` per PrimaryDirectorate from DimSearchTerm.csv
- Card title = PrimaryDirectorate name
- Inside: `dmc.Accordion` with one item per Search_Term (indication)
- Inside each accordion item: drug fragment chips
- **Bottom**: `dmc.Button("Clear Filters", id="clear-drug-filters")` — full width
- **Checkpoint**: Drawer opens with correct layout, all directorates and drugs visible
### 4.2 Drawer callbacks
- [ ] Create `dash_app/callbacks/drawer.py`:
- Open/close drawer: sidebar "Drug Selection" or "Indications" click → open drawer
- Drug selection: clicking a drug chip → adds drug to `selected_drugs` in `app-state` → triggers chart reload
- Indication selection: clicking an indication accordion item → filters to drugs under that indication
- Visual highlights: selected drugs get active styling (e.g., blue background on chips)
- Clear filters: resets `selected_drugs` and `selected_directorates` in `app-state`
- Use pattern-matching callbacks for dynamic drug chips: `@app.callback(..., Input({"type": "drug-chip", "index": ALL}, "n_clicks"))`
- **Checkpoint**: Select drug from drawer → chart filters to show that drug → clear resets
---
## Phase 5: Polish & Cleanup
### 5.1 Trust selection
- [ ] Add trust selection either:
- In the dmc.Drawer as a "Trusts" section (preferred — keeps all filters in one place), OR
- As sidebar checkboxes
- [ ] Wire trust selection to `selected_trusts` in `app-state` → pathway data reload
- **Checkpoint**: Selecting trusts filters the chart correctly
### 5.2 Loading/error/empty states + dynamic hierarchy label
- [ ] Add `dcc.Loading` wrapper around chart area
- [ ] Show "No data" message when chart-data is empty
- [ ] Show error toast/alert when database query fails
- [ ] Dynamic chart subtitle: "Trust → Directorate → Drug → Pathway" or "Trust → Indication → Drug → Pathway" based on chart type
- **Checkpoint**: Loading spinner appears during data fetch, empty state shows message
### 5.3 Data freshness indicator
- [ ] Header shows: green dot + "{N} records" + "Last updated: {relative_time}"
- [ ] Pull from `pathway_refresh_log` via `queries.load_initial_data()`
- [ ] Format as relative time (e.g., "2h ago", "yesterday")
- **Checkpoint**: Header shows correct data freshness
### 5.4 Remove Reflex + final validation
- [ ] Remove `reflex` from `pyproject.toml` dependencies
- [ ] Delete or archive `pathways_app/` directory (move to `archive/`)
- [ ] Delete `pathways_app/styles.py` and any Reflex-specific files
- [ ] Update project `CLAUDE.md` to document Dash app structure, new run command, callback architecture
- [ ] Verify: `python run_dash.py` starts cleanly, full end-to-end workflow works
- [ ] Verify: No Reflex imports anywhere in `dash_app/`
- **Checkpoint**: Full application works, no Reflex remnants, CLAUDE.md updated
---
## Completion Criteria
All tasks marked `[x]` AND:
- [x] App compiles without errors (`reflex compile` succeeds — 16.6s)
- [x] Both chart types generate pathway data (directory: 1,101, indication: 1,846)
- [x] Indication charts show drug-specific indication matching (49.3% match rate)
- [x] Drugs under the same indication for the same patient are in one pathway (validated via SQLite queries)
- [x] Drugs under different indications for the same patient create separate pathways (42,072 modified UPIDs > 36,628 original)
- [x] Fallback works for drugs with no indication match (RHEUMATOLOGY/OPHTHALMOLOGY/etc. "(no GP dx)" labels present)
- [x] Full refresh completes successfully (2,947 records in 738.4s)
- [x] Existing directory charts are unaffected (1,101 nodes, same count range as previous refresh)
- [ ] `python run_dash.py` starts cleanly at localhost:8050
- [ ] Layout matches 01_nhs_classic.html (header, sidebar, KPIs, filter bar, chart card, footer)
- [ ] Icicle chart renders with real SQLite data (pathway_nodes)
- [ ] Date filters + chart type toggle update chart correctly
- [ ] dmc.Drawer opens, shows directorate cards with indications/drugs
- [ ] Selecting a drug from drawer filters the chart
- [ ] "All Drugs" card allows selecting any drug across all contexts
- [ ] "Clear Filters" resets all selections
- [ ] KPIs update dynamically (patients, drugs, cost)
- [ ] No Reflex imports in `dash_app/`
---
## Reference
## Key Reference Files
### DimSearchTerm.csv Structure
```
Search_Term,CleanedDrugName,PrimaryDirectorate
rheumatoid arthritis,ABATACEPT|ADALIMUMAB|ANAKINRA|BARICITINIB|...,RHEUMATOLOGY
asthma,BENRALIZUMAB|DUPILUMAB|INHALED|MEPOLIZUMAB|OMALIZUMAB|RESLIZUMAB,THORACIC MEDICINE
```
### Modified UPID Format
```
Original: RMV12345
Modified: RMV12345|rheumatoid arthritis
Fallback: RMV12345|RHEUMATOLOGY (no GP dx)
```
### Current vs New Indication Flow
```
CURRENT:
Patient → GP dx (most recent) → single Search_Term → one pathway
NEW:
Patient + Drug A → GP dx matching Drug A → Search_Term X
Patient + Drug B → GP dx matching Drug B → Search_Term Y
→ If X == Y: one pathway under X
→ If X != Y: two pathways (modified UPIDs)
```
### Key Files
| File | Changes |
| File | Purpose |
|------|---------|
| `data_processing/diagnosis_lookup.py` | Update query, add drug mapping functions |
| `data_processing/pathway_pipeline.py` | Possibly minor changes for modified UPIDs |
| `cli/refresh_pathways.py` | Integrate drug-aware matching into pipeline |
| `data/DimSearchTerm.csv` | Reference data (read-only) |
| `analysis/pathway_analyzer.py` | No changes expected (UPID changes are transparent) |
| `pathways_app/pathways_app.py` | No changes expected |
| `01_nhs_classic.html` | Design reference — CSS classes, layout structure, visual targets |
| `pathways_app/pathways_app.py` | Source of truth for data loading logic (lines 407-642) and icicle chart (lines 769-920) |
| `data/pathways.db` | SQLite database with pre-computed pathway_nodes |
| `data/DimSearchTerm.csv` | Directorate → Search_Term → drug mapping for card browser |
| `src/data_processing/diagnosis_lookup.py` | SEARCH_TERM_MERGE_MAP constant for asthma normalization |
## Key Data Patterns
### Date Filter IDs
| ID | Initiated | Last Seen |
|----|-----------|-----------|
| `all_6mo` | All years | Last 6 months (DEFAULT) |
| `all_12mo` | All years | Last 12 months |
| `1yr_6mo` | Last 1 year | Last 6 months |
| `1yr_12mo` | Last 1 year | Last 12 months |
| `2yr_6mo` | Last 2 years | Last 6 months |
| `2yr_12mo` | Last 2 years | Last 12 months |
### Pathway Node Columns (from SQLite)
`parents, ids, labels, level, value, cost, costpp, cost_pp_pa, colour, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, average_administered, avg_days, trust_name, directory, drug_sequence, chart_type, date_filter_id`
### Icicle Chart Customdata (10 fields)
```
[0] value — patient count
[1] colour — proportion of parent
[2] cost — total cost
[3] costpp — cost per patient
[4] first_seen — first intervention date
[5] last_seen — last intervention date
[6] first_seen_parent — earliest date in parent group
[7] last_seen_parent — latest date in parent group
[8] average_spacing — dosing information string
[9] cost_pp_pa — cost per patient per annum
```