Files
HighCostDrugsDemo/progress.txt
T

242 lines
18 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Progress Log — Reflex → Dash Migration
## Project Context
Migrating the HCD Analysis frontend from Reflex to Dash (Plotly) + Dash Mantine Components. Pipeline/analysis logic in `src/` is untouched, but shared utilities (data queries, figure construction) should be added TO `src/` so Dash callbacks call into them rather than duplicating code.
**Previous state**: Fully working Reflex app with pre-computed pathway architecture (SQLite), dual chart types (directory + indication), drug-aware indication matching. All pipeline work is done.
**New goal**: Replace Reflex with Dash for better control over layout, CSS, and component behavior. Add a dmc.Drawer-based "card browser" for drug/indication selection organized by clinical directorate.
## Key Data Patterns
### SQLite pathway_nodes table
- ~3,600 rows across 12 datasets (6 date filters × 2 chart types)
- Key columns: `parents, ids, labels, level, value, cost, costpp, cost_pp_pa, colour, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, trust_name, directory, drug_sequence, chart_type, date_filter_id`
- Level 0 = Root, Level 1 = Trust, Level 2 = Directory/Indication, Level 3 = Drug, Level 4+ = Pathway
- `chart_type`: "directory" or "indication"
- `date_filter_id`: "all_6mo" (default), "all_12mo", "1yr_6mo", "1yr_12mo", "2yr_6mo", "2yr_12mo"
- UNIQUE constraint: (date_filter_id, chart_type, ids)
### DimSearchTerm.csv (for card browser)
- Located at `data/DimSearchTerm.csv`
- Columns: Search_Term, CleanedDrugName (pipe-separated drug fragments), PrimaryDirectorate
- ~165 rows; some Search_Terms appear twice (e.g., "diabetes" under DIABETIC MEDICINE and OPHTHALMOLOGY)
- Drug fragments are UPPERCASE substrings matched against standardized drug names
- SEARCH_TERM_MERGE_MAP in `src/data_processing/diagnosis_lookup.py` merges asthma variants: {"allergic asthma": "asthma", "severe persistent allergic asthma": "asthma"}
### Data loading logic to extract
- `pathways_app/pathways_app.py` lines 407-488: `load_data()` — loads available drugs, directorates, indications, total records, last updated from SQLite
- `pathways_app/pathways_app.py` lines 490-642: `load_pathway_data()` — queries pathway_nodes with date_filter_id + chart_type + optional drug/directory filters
- `pathways_app/pathways_app.py` lines 769-920: `icicle_figure` — builds go.Icicle with 10-field customdata, NHS colorscale, texttemplate, hovertemplate
### CSS from 01_nhs_classic.html
- Lines 8-314 contain the full CSS (copy to dash_app/assets/nhs.css)
- Google Fonts: `Source Sans 3` weights 300,400,600,700,900
- CSS variables: `--nhs-blue: #005EB8`, `--nhs-dark-blue: #003087`, `--nhs-light-blue: #41B6E6`, etc.
- Key classes: `.top-header`, `.sidebar`, `.main`, `.kpi-row`, `.kpi-card`, `.filter-bar`, `.toggle-pill`, `.chart-card`, `.chart-tab`, `.page-footer`
- Remove `.icicle`, `.icicle__row`, `.icicle__cell`, `.lvl-*` classes — those are mock chart CSS, Plotly handles the real chart
### Dash-specific patterns
- State via `dcc.Store`: 3 stores (app-state, chart-data, reference-data)
- Callbacks: unidirectional flow (filter change → app-state → chart-data → UI components)
- DMC components: `dmc.MantineProvider` wraps everything, `dmc.Drawer` for card browser
- Pattern-matching callbacks: `{"type": "drug-chip", "index": drug_name}` for dynamic drug chip selection
- Assets auto-served from `dash_app/assets/` directory
### Database path from dash_app/
- From `dash_app/data/queries.py`: `Path(__file__).resolve().parents[2] / "data" / "pathways.db"`
- From `dash_app/data/card_browser.py`: same pattern for `data/DimSearchTerm.csv`
### Existing src/ code to build on (not duplicate)
- `src/visualization/plotly_generator.py` already has `create_icicle_figure(ice_df, title)` that takes a DataFrame with columns like `"First seen"`, `"Last seen (Parent)"` (with spaces). The Reflex AppState `icicle_figure` (pathways_app.py:769) takes list-of-dicts with keys like `first_seen`, `last_seen_parent` (underscores). For Dash, add a NEW function `create_icicle_from_nodes(nodes, title)` that accepts list-of-dicts. Don't modify the existing DataFrame-based function.
- `src/data_processing/database.py` has `DatabaseManager` class. Add standalone query functions here (or a new `pathway_queries.py`) so Dash and Reflex share the same SQL.
- The existing `create_icicle_figure` uses Viridis colorscale; the Reflex version uses NHS blue gradient. The Dash version should use the NHS blue gradient from pathways_app.py.
### Architecture decision: shared code in src/
- Pipeline/analysis logic is OFF LIMITS: pathway_pipeline.py, transforms.py, diagnosis_lookup.py (matching), pathway_analyzer.py, statistics.py, refresh_pathways.py
- Shared utilities are ENCOURAGED to add to src/: plotly_generator.py (new dict-based function), database.py (query functions)
- dash_app/data/queries.py should be a thin wrapper that calls into src/
## Iteration Log
## Iteration 1 — 2026-02-06
### Task: Phase 0 — Tasks 0.1 + 0.2 (Project Scaffolding + CSS Extraction)
### Why this task:
- This is the first iteration. Phase 0 scaffolding is the foundation everything else depends on.
- Tasks 0.1 and 0.2 are tightly coupled (CSS needs the assets dir from 0.1), so both done together.
### Status: COMPLETE
### What was done:
- Created `dash_app/` directory with subdirectories: `assets/`, `data/`, `components/`, `callbacks/`, `utils/`
- Created `__init__.py` in all packages
- Created `run_dash.py` entry point at project root
- Updated `pyproject.toml` with `dash>=2.14.0` and `dash-mantine-components>=0.14.0`
- Ran `uv sync` — installed Dash 4.0.0 and DMC 2.5.1 (newer than plan expected)
- Created `dash_app/app.py` with `MantineProvider` wrapper, 3 `dcc.Store` components (app-state, chart-data, reference-data), and placeholder layout
- Extracted CSS from `01_nhs_classic.html` into `dash_app/assets/nhs.css` (7.5KB)
- Added Google Fonts `@import` for Source Sans 3
- Removed mock icicle chart CSS (`.icicle`, `.icicle__row`, `.icicle__cell`, `.lvl-*`)
- Kept all real component CSS: header, sidebar, KPI, filter bar, chart card, footer, responsive
### Validation results:
- Tier 1 (Code): `python -c "from dash_app.app import app"` — OK, layout type is MantineProvider
- Tier 1 (App starts): `python run_dash.py` — serves at http://127.0.0.1:8050/
- CSS: nhs.css (7,491 bytes) found in Dash assets folder, auto-served
### Files changed:
- `pyproject.toml` — added dash + dmc dependencies
- `run_dash.py` — new entry point
- `dash_app/__init__.py` — new package
- `dash_app/app.py` — new app with MantineProvider + stores + placeholder
- `dash_app/assets/nhs.css` — extracted from 01_nhs_classic.html
- `dash_app/data/__init__.py` — empty package
- `dash_app/components/__init__.py` — empty package
- `dash_app/callbacks/__init__.py` — empty package
- `dash_app/utils/__init__.py` — empty package
### Committed: 1c3ece6 "feat: create dash_app skeleton with nhs.css and MantineProvider (Phase 0)"
### Patterns discovered:
- Dash 4.0.0 and DMC 2.5.1 installed (plan said 2.x and 0.14.x). The API is compatible — `MantineProvider`, `Drawer`, `Accordion`, `Chip` all available. DMC 2.x is based on Mantine v7.
- `dmc.MantineProvider(children=[...])` works for wrapping the layout in both DMC versions.
- `suppress_callback_exceptions=True` needed in Dash app since callbacks reference IDs not yet in layout during phased development.
### Next iteration should:
- Start Phase 1: Task 1.1 — Create shared data access functions
- Read `pathways_app/pathways_app.py` lines 407-642 to extract `load_data()` and `load_pathway_data()` logic
- Create shared functions in `src/data_processing/pathway_queries.py` (database.py is already used for DatabaseManager)
- Create thin wrapper `dash_app/data/queries.py` that calls the shared functions
- Verify with `python -c "from dash_app.data.queries import load_initial_data; print(load_initial_data())"`
### Blocked items:
- None
## Iteration 2 — 2026-02-06
### Task: Phase 1 — Task 1.1 (Create shared data access functions)
### Why this task:
- Phase 0 complete in iteration 1; Phase 1 is next in dependency order
- progress.txt from iteration 1 explicitly recommended this task
- All UI components and callbacks (Phases 2-5) depend on having data access working
### Status: COMPLETE
### What was done:
- Created `src/data_processing/pathway_queries.py` with two shared functions:
- `load_initial_data(db_path)` — returns available drugs (42), directorates (14), indications (32), total_records, last_updated
- `load_pathway_nodes(db_path, filter_id, chart_type, selected_drugs, selected_directorates)` — returns nodes list, unique_patients, total_drugs, total_cost, last_updated
- Both functions extracted directly from AppState methods in pathways_app.py (lines 407-642), with Reflex `self.*` references replaced by function parameters
- All return values are plain dicts/lists — JSON-serializable for dcc.Store
- Created thin wrapper `dash_app/data/queries.py` that resolves DB_PATH and delegates to shared functions
- Used separate file (pathway_queries.py) rather than adding to database.py because database.py is connection management (240 lines), queries are a distinct concern
### Validation results:
- Tier 1 (Code): `python -c "from dash_app.data.queries import load_initial_data"` — OK (requires uv run for .pth file)
- Tier 1 (App starts): `from dash_app.app import app` — OK, layout type is MantineProvider
- Tier 3 (Functional):
- `load_initial_data()`: 42 drugs, 14 directorates, 32 indications, last_updated=2026-02-06T00:08:55
- `load_pathway_data("all_6mo", "directory")`: 293 nodes, 11,118 patients, 39 drugs, £130.5M cost
- `load_pathway_data("all_6mo", "indication")`: 438 nodes, 11,252 patients
- `load_pathway_data("all_6mo", "directory", selected_drugs=["ADALIMUMAB"])`: 70 nodes (drug filter works)
### Files changed:
- `src/data_processing/pathway_queries.py` — NEW: shared query functions
- `dash_app/data/queries.py` — NEW: thin Dash wrapper with DB_PATH resolution
- `IMPLEMENTATION_PLAN.md` — Task 1.1 marked [x]
### Committed: b71748f "feat: add shared pathway query functions for Dash data access (Task 1.1)"
### Patterns discovered:
- `src/` is on sys.path only when using `uv run` (via .pth file created by setup_dev.py). Running `python` directly won't find `data_processing` module. Always use `uv run python` for testing.
- `total_records` from `pathway_refresh_log` returns 0 — the refresh log's `source_row_count` field appears empty despite `completed_at` having a value. This is cosmetic — the KPI can use `unique_patients` from chart-data instead.
- Drug filtering correctly includes nodes with NULL drug_sequence (root, trust, directory levels) alongside matching drug nodes. Root node patient count becomes 0 when drug filter is active — this matches Reflex behavior.
### Next iteration should:
- Start Task 1.2 — Build directorate card tree from DimSearchTerm.csv
- Create `dash_app/data/card_browser.py` with `build_directorate_tree()` and `get_all_drugs()`
- Read `data/DimSearchTerm.csv` to understand the data format
- Import SEARCH_TERM_MERGE_MAP from `data_processing.diagnosis_lookup` for asthma normalization
- Remember: drug fragments in CleanedDrugName are UPPERCASE substrings, not exact matches
### Blocked items:
- None
## Iteration 3 — 2026-02-06
### Task: Phase 1 — Task 1.2 (Build directorate card tree from DimSearchTerm.csv)
### Why this task:
- Phase 1 data access layer needs both query functions (done in 1.1) AND the card browser data (this task)
- Progress.txt from iteration 2 explicitly recommended this task
- The drawer component (Phase 4) depends on this data structure
### Status: COMPLETE
### What was done:
- Created `dash_app/data/card_browser.py` with two functions:
- `build_directorate_tree()` — reads DimSearchTerm.csv, applies SEARCH_TERM_MERGE_MAP, returns nested dict: {PrimaryDirectorate: {Search_Term: [drug_fragment, ...]}}
- `get_all_drugs()` — delegates to `load_initial_data()` from queries.py (no SQL duplication)
- SEARCH_TERM_MERGE_MAP correctly imported from `data_processing.diagnosis_lookup`
- Asthma variants ("allergic asthma", "severe persistent allergic asthma") merge into single "asthma" entry with combined/deduplicated drug fragments
- Tree uses sets during construction to deduplicate drug fragments from merged terms, then converts to sorted lists
### Validation results:
- Tier 1 (Code): `from dash_app.data.card_browser import build_directorate_tree` — OK
- Tier 1 (App starts): `from dash_app.app import app` — OK, layout type is MantineProvider
- Tier 3 (Functional):
- `build_directorate_tree()`: 19 directorates, correctly nested dict
- CARDIOLOGY has 10 indications, CLINICAL HAEMATOLOGY has many more
- Asthma merge verified: only "asthma" present (not "allergic asthma" or "severe persistent allergic asthma"), OMALIZUMAB included in merged drug list
- `get_all_drugs()`: 42 drugs from pathway_nodes level 3
### Files changed:
- `dash_app/data/card_browser.py` — NEW: directorate tree builder + drug list accessor
- `IMPLEMENTATION_PLAN.md` — Task 1.2 marked [x]
### Committed: 7654942 "feat: add directorate card tree builder for drug browser drawer (Task 1.2)"
### Patterns discovered:
- `get_all_drugs()` uses a lazy import (`from dash_app.data.queries import load_initial_data`) to avoid circular imports since both modules are in `dash_app/data/`
- Drug fragments in DimSearchTerm.csv are already UPPERCASE in the CSV, but `.upper()` is applied defensively
- "diabetes" appears twice in DimSearchTerm.csv (DIABETIC MEDICINE and OPHTHALMOLOGY) — this is handled naturally by the directorate grouping (same search_term under two different directorates)
### Next iteration should:
- Start Phase 2: Task 2.1 — Header + sidebar components
- Read `01_nhs_classic.html` to get the exact HTML structure, CSS classes, and SVG icons
- Create `dash_app/components/header.py` with `make_header()` function
- Create `dash_app/components/sidebar.py` with `make_sidebar()` function
- Use CSS classes from nhs.css (`.top-header`, `.sidebar`, etc.) — NOT inline styles
- Header needs placeholder `html.Span` with IDs for callback updates: `id="header-record-count"`, `id="header-last-updated"`
- Sidebar needs click targets for drawer open (Drug Selection, Indications items)
### Blocked items:
- None
## Iteration 4 — 2026-02-06
### Task: Phase 2 — Task 2.1 (Header + sidebar components)
### Why this task:
- Phases 0 and 1 complete. Phase 2 is the static layout phase, and 2.1 is the first task.
- Previous iteration (3) explicitly recommended this as the next task.
- Header and sidebar are foundational layout components — everything else renders inside this shell.
### Status: COMPLETE
### What was done:
- Created `dash_app/components/header.py` with `make_header()` function:
- Top bar with `.top-header` class, NHS logo badge, "HCD Analysis" title, breadcrumb
- Data freshness indicators: green status dot + `html.Span(id="header-record-count")` + `html.Span(id="header-last-updated")` for callback updates
- Structure matches `01_nhs_classic.html` lines 319-333 exactly
- Created `dash_app/components/sidebar.py` with `make_sidebar()` function:
- 7 navigation items across 2 sections (Analysis: 5 items, Reports: 2 items)
- SVG icons from the HTML concept, embedded via data URI `html.Img` elements (Dash lacks inline SVG support)
- Added `.sidebar__icon` CSS class matching `.sidebar__item svg` sizing (18x18px)
- "Drug Selection" (`id="sidebar-drug-selection"`) and "Indications" (`id="sidebar-indications"`) have IDs + `n_clicks=0` for drawer callbacks in Phase 4
- Footer: "NHS Norfolk & Waveney ICB / High Cost Drugs Programme"
- Updated `dash_app/app.py`:
- Imports and uses `make_header()` and `make_sidebar()`
- Layout: MantineProvider → [3 stores, Header, Nav(sidebar), Main(placeholder)]
- Main content area uses `html.Main(className="main")` — ready for KPIs/filter bar/chart card in Task 2.2
### Validation results:
- Tier 1 (Code): All imports OK — `from dash_app.components.header import make_header`, sidebar, app all pass
- Tier 1 (App starts): `python run_dash.py` → "Dash is running on http://127.0.0.1:8050/" — no errors
- Tier 2 (Layout): Programmatic check confirms structure: MantineProvider with 6 children (3 stores + Header(top-header) + Nav(sidebar) + Main(main))
### Files changed:
- `dash_app/components/header.py` — NEW: header component
- `dash_app/components/sidebar.py` — NEW: sidebar component with SVG icons
- `dash_app/app.py` — Updated to import and assemble header + sidebar
- `dash_app/assets/nhs.css` — Added `.sidebar__icon` rule
- `IMPLEMENTATION_PLAN.md` — Task 2.1 marked [x]
### Committed: bdc1690 "feat: add header and sidebar components for Dash layout (Task 2.1)"
### Patterns discovered:
- Dash doesn't support inline SVG elements natively (no `html.Svg` or similar). Workaround: embed SVG as data URI in `html.Img` elements using `urllib.parse.quote()`. Limitation: `stroke="currentColor"` doesn't inherit from parent CSS, so icons appear in their default SVG color (black) rather than inheriting the sidebar item color. This is cosmetic and acceptable for now.
- Dash serves a minimal HTML shell; all React components render client-side. Don't test for CSS classes in the HTTP response — use programmatic layout inspection instead.
- `html.A` elements in Dash accept `n_clicks=0` property, making them callback-compatible for click events (needed for sidebar drawer triggers).
### Next iteration should:
- Start Task 2.2 — Main content area: KPI row + filter bar + chart card
- Create three component files:
- `dash_app/components/kpi_row.py` with `make_kpi_row()` — 4 KPI cards with IDs for callback updates
- `dash_app/components/filter_bar.py` with `make_filter_bar()` — chart type toggle pills + date dropdowns
- `dash_app/components/chart_card.py` with `make_chart_card()` — chart area with tabs + `dcc.Graph(id="pathway-chart")`
- Read `01_nhs_classic.html` lines 377-516 for the exact structure
- Use CSS classes: `.kpi-row`, `.kpi-card`, `.filter-bar`, `.toggle-pill`, `.chart-card`, `.chart-tab`
- Filter bar needs: chart type toggle pills (By Directory / By Indication), Initiated dropdown, Last seen dropdown
- Chart card needs: title, dynamic subtitle (hierarchy label), tab row (Icicle active, Sankey/Timeline disabled), `dcc.Graph`
- KPI card IDs: `id="kpi-patients"`, `id="kpi-drugs"`, `id="kpi-cost"`, `id="kpi-match"`
- Toggle pill IDs: `id="chart-type-directory"`, `id="chart-type-indication"`
- Filter IDs: `id="filter-initiated"`, `id="filter-last-seen"`
### Blocked items:
- None