# Progress Log — Reflex → Dash Migration
## Project Context
Migrating the HCD Analysis frontend from Reflex to Dash (Plotly) + Dash Mantine Components. Pipeline/analysis logic in `src/` is untouched, but shared utilities (data queries, figure construction) should be added TO `src/` so Dash callbacks call into them rather than duplicating code.

**Previous state**: Fully working Reflex app with pre-computed pathway architecture (SQLite), dual chart types (directory + indication), drug-aware indication matching. All pipeline work is done.

**New goal**: Replace Reflex with Dash for better control over layout, CSS, and component behavior. Add a dmc.Drawer-based "card browser" for drug/indication selection organized by clinical directorate.
## Key Data Patterns
### SQLite pathway_nodes table
- ~3,600 rows across 12 datasets (6 date filters × 2 chart types)
- Key columns: `parents, ids, labels, level, value, cost, costpp, cost_pp_pa, colour, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, trust_name, directory, drug_sequence, chart_type, date_filter_id`
- Level 0 = Root, Level 1 = Trust, Level 2 = Directory/Indication, Level 3 = Drug, Level 4+ = Pathway
- `chart_type`: "directory" or "indication"
- `date_filter_id`: "all_6mo" (default), "all_12mo", "1yr_6mo", "1yr_12mo", "2yr_6mo", "2yr_12mo"
- UNIQUE constraint: (date_filter_id, chart_type, ids)
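
As a minimal sketch of how one of the 12 datasets could be selected from this table, the snippet below builds an in-memory SQLite table with a reduced subset of the columns listed above and reads back the rows for one (`date_filter_id`, `chart_type`) pair. The helper name `fetch_nodes`, the reduced schema, and the sample rows are illustrative assumptions, not the project's actual query code.

```python
import sqlite3

# Reduced, illustrative schema: only a few of the columns listed above.
SCHEMA = """
CREATE TABLE pathway_nodes (
    parents TEXT,
    ids TEXT NOT NULL,
    labels TEXT,
    level INTEGER,
    value INTEGER,
    chart_type TEXT NOT NULL,
    date_filter_id TEXT NOT NULL,
    UNIQUE (date_filter_id, chart_type, ids)
)
"""


def fetch_nodes(conn, date_filter_id="all_6mo", chart_type="directory"):
    """Return the nodes of one dataset as plain dicts (hypothetical helper)."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT parents, ids, labels, level, value FROM pathway_nodes "
        "WHERE date_filter_id = ? AND chart_type = ? ORDER BY level, ids",
        (date_filter_id, chart_type),
    ).fetchall()
    return [dict(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up rows: the same ids may repeat across chart types without
# violating the UNIQUE constraint.
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("", "Root", "Root", 0, 10, "directory", "all_6mo"),
        ("Root", "Root - Trust A", "Trust A", 1, 10, "directory", "all_6mo"),
        ("", "Root", "Root", 0, 12, "indication", "all_6mo"),
    ],
)
nodes = fetch_nodes(conn)
```

Returning plain dicts rather than `sqlite3.Row` objects keeps the result JSON-serializable, which is what a `dcc.Store` needs.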
### DimSearchTerm.csv (for card browser)
- Located at `data/DimSearchTerm.csv`
- Columns: Search_Term, CleanedDrugName (pipe-separated drug fragments), PrimaryDirectorate
- ~165 rows; some Search_Terms appear twice (e.g., "diabetes" under DIABETIC MEDICINE and OPHTHALMOLOGY)
- Drug fragments are UPPERCASE substrings matched against standardized drug names
- SEARCH_TERM_MERGE_MAP in `src/data_processing/diagnosis_lookup.py` merges asthma variants: {"allergic asthma": "asthma", "severe persistent allergic asthma": "asthma"}
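
The parsing rules above can be sketched roughly as follows. The sample CSV rows, the placeholder drug fragments (`DRUGA` and so on), and the helper names `load_terms` and `matching_terms` are made up for illustration. Lowercasing the search term before the merge-map lookup is also an assumption about how the real file is cased.

```python
import csv
import io

# Copied from the note above; the real map lives in diagnosis_lookup.py.
SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}

# Made-up sample rows in the documented column layout (not real file contents).
SAMPLE_CSV = """Search_Term,CleanedDrugName,PrimaryDirectorate
allergic asthma,DRUGA|DRUGB,RESPIRATORY
asthma,DRUGB|DRUGC,RESPIRATORY
diabetes,DRUGD,DIABETIC MEDICINE
diabetes,DRUGE,OPHTHALMOLOGY
"""


def load_terms(csv_text):
    """Group drug fragments by (directorate, merged search term)."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["Search_Term"].strip().lower()
        term = SEARCH_TERM_MERGE_MAP.get(term, term)  # fold asthma variants
        key = (row["PrimaryDirectorate"].strip(), term)
        fragments = [
            frag.strip().upper()
            for frag in row["CleanedDrugName"].split("|")
            if frag.strip()
        ]
        grouped.setdefault(key, set()).update(fragments)
    return grouped


def matching_terms(drug_name, grouped):
    """Return keys whose fragments occur as substrings of the drug name."""
    name = drug_name.upper()
    return sorted(
        key for key, frags in grouped.items() if any(f in name for f in frags)
    )


grouped = load_terms(SAMPLE_CSV)
```

Keying on the (directorate, term) pair keeps a term such as "diabetes" separate under each directorate instead of collapsing the two rows into one.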
### Data loading logic to extract
- `pathways_app/pathways_app.py` lines 407-488: `load_data()` — loads available drugs, directorates, indications, total records, last updated from SQLite
- `pathways_app/pathways_app.py` lines 490-642: `load_pathway_data()` — queries pathway_nodes with date_filter_id + chart_type + optional drug/directory filters
- `pathways_app/pathways_app.py` lines 769-920: `icicle_figure` — builds go.Icicle with 10-field customdata, NHS colorscale, texttemplate, hovertemplate
### CSS from 01_nhs_classic.html
- Lines 8-314 contain the full CSS (copy to dash_app/assets/nhs.css)
- Google Fonts: `Source Sans 3` weights 300,400,600,700,900
- CSS variables: `--nhs-blue: #005EB8`, `--nhs-dark-blue: #003087`, `--nhs-light-blue: #41B6E6`, etc.
- Key classes: `.top-header`, `.sidebar`, `.main`, `.kpi-row`, `.kpi-card`, `.filter-bar`, `.toggle-pill`, `.chart-card`, `.chart-tab`, `.page-footer`
- Remove `.icicle`, `.icicle__row`, `.icicle__cell`, `.lvl-*` classes — those are mock chart CSS, Plotly handles the real chart
### Dash-specific patterns
- State via `dcc.Store`: 3 stores (app-state, chart-data, reference-data)
- Callbacks: unidirectional flow (filter change → app-state → chart-data → UI components)
- DMC components: `dmc.MantineProvider` wraps everything, `dmc.Drawer` for card browser
- Pattern-matching callbacks: `{"type": "drug-chip", "index": drug_name}` for dynamic drug chip selection
- Assets auto-served from `dash_app/assets/` directory
### Database path from dash_app/
- From `dash_app/data/queries.py`: `Path(__file__).resolve().parents[2] / "data" / "pathways.db"`
- From `dash_app/data/card_browser.py`: same pattern for `data/DimSearchTerm.csv`
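
To make the `parents[2]` index concrete, here is a small sketch using a hypothetical checkout location (`/project`) in place of `Path(__file__).resolve()`. `PurePosixPath` is used only so the example behaves the same on any OS; the real modules would use `Path`.

```python
from pathlib import PurePosixPath

# Hypothetical location of the module, standing in for Path(__file__).resolve().
module_path = PurePosixPath("/project/dash_app/data/queries.py")

# parents[0] = /project/dash_app/data
# parents[1] = /project/dash_app
# parents[2] = /project  (the project root)
project_root = module_path.parents[2]

db_path = project_root / "data" / "pathways.db"
csv_path = project_root / "data" / "DimSearchTerm.csv"
```

Both files resolve relative to the same project root, so the two modules can share the `parents[2]` pattern as long as they sit at the same depth under `dash_app/`.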
### Existing src/ code to build on (not duplicate)
- `src/visualization/plotly_generator.py` already has `create_icicle_figure(ice_df, title)` that takes a DataFrame with columns like `"First seen"`, `"Last seen (Parent)"` (with spaces). The Reflex AppState `icicle_figure` (pathways_app.py:769) takes list-of-dicts with keys like `first_seen`, `last_seen_parent` (underscores). For Dash, add a NEW function `create_icicle_from_nodes(nodes, title)` that accepts list-of-dicts. Don't modify the existing DataFrame-based function.
- `src/data_processing/database.py` has `DatabaseManager` class. Add standalone query functions here (or a new `pathway_queries.py`) so Dash and Reflex share the same SQL.
- The existing `create_icicle_figure` uses Viridis colorscale; the Reflex version uses NHS blue gradient. The Dash version should use the NHS blue gradient from pathways_app.py.
### Architecture decision: shared code in src/
- Pipeline/analysis logic is OFF LIMITS: pathway_pipeline.py, transforms.py, diagnosis_lookup.py (matching), pathway_analyzer.py, statistics.py, refresh_pathways.py
- Adding shared utilities to src/ is ENCOURAGED: plotly_generator.py (new dict-based function), database.py (query functions)
- dash_app/data/queries.py should be a thin wrapper that calls into src/
## Iteration Log
## Iteration 1 — 2026-02-06
### Task: Phase 0 — Tasks 0.1 + 0.2 (Project Scaffolding + CSS Extraction)
### Why this task:
- This is the first iteration. Phase 0 scaffolding is the foundation everything else depends on.
- Tasks 0.1 and 0.2 are tightly coupled (CSS needs the assets dir from 0.1), so both were done together.
### Status: COMPLETE
### What was done:
- Created `dash_app/` directory with subdirectories: `assets/`, `data/`, `components/`, `callbacks/`, `utils/`
- Created `__init__.py` in all packages
- Created `run_dash.py` entry point at project root
- Updated `pyproject.toml` with `dash>=2.14.0` and `dash-mantine-components>=0.14.0`
- Ran `uv sync` — installed Dash 4.0.0 and DMC 2.5.1 (newer than plan expected)
- Created `dash_app/app.py` with `MantineProvider` wrapper, 3 `dcc.Store` components (app-state, chart-data, reference-data), and placeholder layout
- Extracted CSS from `01_nhs_classic.html` into `dash_app/assets/nhs.css` (7.5KB)
- Added Google Fonts `@import` for Source Sans 3
- Removed mock icicle chart CSS (`.icicle`, `.icicle__row`, `.icicle__cell`, `.lvl-*`)
- Kept all real component CSS: header, sidebar, KPI, filter bar, chart card, footer, responsive
### Validation results:
- Tier 1 (Code): `python -c "from dash_app.app import app"` — OK, layout type is MantineProvider
- Tier 1 (App starts): `python run_dash.py` — serves at http://127.0.0.1:8050/
- CSS: nhs.css (7,491 bytes) found in Dash assets folder, auto-served
### Files changed:
- `pyproject.toml` — added dash + dmc dependencies
- `run_dash.py` — new entry point
- `dash_app/__init__.py` — new package
- `dash_app/app.py` — new app with MantineProvider + stores + placeholder
- `dash_app/assets/nhs.css` — extracted from 01_nhs_classic.html
- `dash_app/data/__init__.py` — empty package
- `dash_app/components/__init__.py` — empty package
- `dash_app/callbacks/__init__.py` — empty package
- `dash_app/utils/__init__.py` — empty package
### Committed: 1c3ece6 "feat: create dash_app skeleton with nhs.css and MantineProvider (Phase 0)"
### Patterns discovered:
- Dash 4.0.0 and DMC 2.5.1 were installed (the plan expected Dash 2.x and DMC 0.14.x). The API is compatible: `MantineProvider`, `Drawer`, `Accordion`, and `Chip` are all available. DMC 2.x is based on Mantine v8.
- `dmc.MantineProvider(children=[...])` works for wrapping the layout in both DMC versions.
- `suppress_callback_exceptions=True` is needed in the Dash app because callbacks reference IDs that are not yet in the layout during phased development.
### Next iteration should:
- Start Phase 1: Task 1.1 — Create shared data access functions
- Read `pathways_app/pathways_app.py` lines 407-642 to extract `load_data()` and `load_pathway_data()` logic
- Create shared functions in `src/data_processing/pathway_queries.py` (database.py is already used for DatabaseManager)
- Create thin wrapper `dash_app/data/queries.py` that calls the shared functions
- Verify with `python -c "from dash_app.data.queries import load_initial_data; print(load_initial_data())"`
### Blocked items:
- None
## Iteration 2 — 2026-02-06
### Task: Phase 1 — Task 1.1 (Create shared data access functions)
### Why this task:
- Phase 0 was completed in iteration 1; Phase 1 is next in dependency order
- progress.txt from iteration 1 explicitly recommended this task
- All UI components and callbacks (Phases 2-5) depend on having data access working
### Status: COMPLETE
### What was done:
- Created `src/data_processing/pathway_queries.py` with two shared functions:
  - `load_initial_data(db_path)`: returns available drugs (42), directorates (14), indications (32), total_records, last_updated
  - `load_pathway_nodes(db_path, filter_id, chart_type, selected_drugs, selected_directorates)`: returns nodes list, unique_patients, total_drugs, total_cost, last_updated
- Both functions extracted directly from AppState methods in pathways_app.py (lines 407-642), with Reflex `self.*` references replaced by function parameters
- All return values are plain dicts/lists — JSON-serializable for dcc.Store
- Created thin wrapper `dash_app/data/queries.py` that resolves DB_PATH and delegates to shared functions
- Used a separate file (`pathway_queries.py`) rather than adding to `database.py`, because `database.py` handles connection management (240 lines) and queries are a distinct concern
### Validation results:
- Tier 1 (Code): `python -c "from dash_app.data.queries import load_initial_data"` — OK (requires uv run for .pth file)
- Tier 1 (App starts): `from dash_app.app import app` — OK, layout type is MantineProvider
- Tier 3 (Functional):
  - `load_initial_data()`: 42 drugs, 14 directorates, 32 indications, last_updated=2026-02-06T00:08:55
  - `load_pathway_data("all_6mo", "directory")`: 293 nodes, 11,118 patients, 39 drugs, £130.5M cost
  - `load_pathway_data("all_6mo", "indication")`: 438 nodes, 11,252 patients
  - `load_pathway_data("all_6mo", "directory", selected_drugs=["ADALIMUMAB"])`: 70 nodes (drug filter works)
### Files changed:
- `src/data_processing/pathway_queries.py` — NEW: shared query functions
- `dash_app/data/queries.py` — NEW: thin Dash wrapper with DB_PATH resolution
- `IMPLEMENTATION_PLAN.md` — Task 1.1 marked [x]
### Committed: (pending)
### Patterns discovered:
- `src/` is on sys.path only when using `uv run` (via the .pth file created by setup_dev.py). Running `python` directly won't find the `data_processing` module. Always use `uv run python` for testing.
- `total_records` from `pathway_refresh_log` returns 0 — the refresh log's `source_row_count` field appears empty despite `completed_at` having a value. This is cosmetic — the KPI can use `unique_patients` from chart-data instead.
- Drug filtering correctly includes nodes with NULL drug_sequence (root, trust, directory levels) alongside matching drug nodes. Root node patient count becomes 0 when drug filter is active — this matches Reflex behavior.
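
The NULL-inclusive drug filter described in the last point can be sketched like this. It is an illustration only: the helper `filter_by_drugs`, the reduced three-column table, the `|`-joined `drug_sequence` values, and the use of a substring `LIKE` match are all assumptions, since the log does not show the actual SQL.

```python
import sqlite3


def filter_by_drugs(conn, selected_drugs):
    """Keep structural nodes (NULL drug_sequence) plus nodes matching a drug.

    Assumption: a node matches when its drug_sequence contains the drug name.
    """
    sql = "SELECT ids FROM pathway_nodes"
    params = []
    if selected_drugs:
        likes = " OR ".join("drug_sequence LIKE ?" for _ in selected_drugs)
        sql += f" WHERE drug_sequence IS NULL OR {likes}"
        params = [f"%{drug}%" for drug in selected_drugs]
    sql += " ORDER BY level, ids"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pathway_nodes (ids TEXT, level INTEGER, drug_sequence TEXT)"
)
# Made-up rows: levels 0-2 carry no drug_sequence, levels 3+ do.
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?)",
    [
        ("root", 0, None),
        ("trust", 1, None),
        ("directory", 2, None),
        ("adalimumab", 3, "ADALIMUMAB"),
        ("other", 3, "OTHERDRUG"),
        ("adalimumab-then-other", 4, "ADALIMUMAB|OTHERDRUG"),
    ],
)
kept = filter_by_drugs(conn, ["ADALIMUMAB"])
```

Without the `IS NULL` branch the root, trust, and directory rows would be dropped and the remaining drug nodes would have no ancestors to attach to in the icicle chart.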
### Next iteration should:
- Start Task 1.2 — Build directorate card tree from DimSearchTerm.csv
- Create `dash_app/data/card_browser.py` with `build_directorate_tree()` and `get_all_drugs()`
- Read `data/DimSearchTerm.csv` to understand the data format
- Import SEARCH_TERM_MERGE_MAP from `data_processing.diagnosis_lookup` for asthma normalization
- Remember: drug fragments in CleanedDrugName are UPPERCASE substrings, not exact matches
### Blocked items:
- None