docs: update progress.txt with iteration 24 (Task 9.2 complete — query functions)

Andrew Charlwood
2026-02-06 19:22:23 +00:00
parent d98cd4fd69
commit 4375d22022
+45
@@ -1341,3 +1341,48 @@ Console error: `WARN: Multiple implied roots, cannot build icicle hierarchy of t
- Test each query with `python -c "..."` against real data
### Blocked items:
- None
## Iteration 24 — 2026-02-06
### Task: Phase 9 — Task 9.2 (Query functions for all 7 chart types)
### Why this task:
- Task 9.1 (parsing + tab infrastructure) complete in iteration 23
- The previous `progress.txt` entry explicitly recommended this task next
- All 7 chart implementations (Tasks 9.3 to 9.9) depend on these query functions
### Status: COMPLETE
### What was done:
- **Added 7 query functions to `src/data_processing/pathway_queries.py`**:
1. `get_drug_market_share()` — Level 3 nodes aggregated across trusts by directory+drug, with computed proportions. 63 rows for the "directory" chart type, 111 for "indication".
2. `get_pathway_costs()` — Level 4+ nodes with pathway labels (drug_sequence joined with →). 38 pathways for RHEUMATOLOGY.
3. `get_cost_waterfall()` — Aggregates level 3 cost/patients by directory (level 2 cost_pp_pa is always "N/A"). 12 directorates sorted by cost_pp desc.
4. `get_drug_transitions()` — Parses drug_sequence into source→target transitions with ordinal line labels (e.g., "ADALIMUMAB (1st)" → "ETANERCEPT (2nd)"). Returns {nodes, links} for Sankey. 49 nodes, 65 links.
5. `get_dosing_intervals()` — Uses `parse_average_spacing()` to extract weekly_interval/dose_count/total_weeks. 124 rows for all drugs.
6. `get_drug_directory_matrix()` — Pivots level 3 into directory × drug matrix with patients/cost/cost_pp_pa. 12×39 matrix, 63 non-empty cells.
7. `get_treatment_durations()` — Patient-weighted average of `avg_days` across trusts. 59 entries unfiltered.
- **Added helpers**: `_safe_float()` for None/N/A handling, `_ordinal()` for Sankey node labels
- **Added 7 thin wrappers** to `dash_app/data/queries.py` (resolve DB_PATH, delegate to src/)
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` — OK, 11 callbacks registered
- Tier 1 (App starts): `python run_dash.py` → "Dash is running on http://127.0.0.1:8050/" — no errors
- Tier 3 (Functional): All 7 queries tested with real data for both chart types ("directory" and "indication")
### Files changed:
- `src/data_processing/pathway_queries.py` — Added: 7 query functions + 2 helpers
- `dash_app/data/queries.py` — Added: 7 thin wrapper functions
- `IMPLEMENTATION_PLAN.md` — Task 9.2 marked [x]
### Committed: d98cd4f "feat: add 7 analytics chart query functions (Task 9.2)"
### Patterns discovered:
- `cost_pp_pa` at level 2 is always "N/A", so the waterfall must compute cost per patient from level 3 aggregates.
- `cost_pp_pa` at level 3 is a string (including "N/A"). Use `_safe_float()`.
- Sankey ordinal suffix ("1st", "2nd") prevents self-loops for same-drug transitions.
- Treatment duration uses a patient-weighted average when aggregating across trusts.
- All queries work unchanged with both "directory" and "indication" chart types.
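The two aggregation patterns above can be sketched as follows. This assumes level 3 rows arrive as dicts with `directory`, `drug`, `cost`, `patients` and `avg_days` keys, where numeric fields may be strings or "N/A". `directorate_cost_pp` and `weighted_avg_days` are hypothetical names, and cost per patient here is simply total cost divided by total patients, which may not match how the committed code derives `cost_pp`.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Optional


def _safe_float(value: Any) -> Optional[float]:
    """Float conversion that maps None, "N/A" and unparseable strings to None."""
    try:
        return None if value in (None, "N/A") else float(value)
    except (TypeError, ValueError):
        return None


def directorate_cost_pp(level3_rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sum level 3 cost and patients per directory, then divide (level 2 has no usable cost_pp_pa)."""
    totals: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0])  # directory -> [cost, patients]
    for row in level3_rows:
        cost = _safe_float(row.get("cost"))
        patients = _safe_float(row.get("patients"))
        if cost is None or not patients:
            continue  # skip "N/A" costs and zero or missing patient counts
        totals[row["directory"]][0] += cost
        totals[row["directory"]][1] += patients
    rows = [
        {"directory": d, "cost": c, "patients": p, "cost_pp": c / p}
        for d, (c, p) in totals.items()
    ]
    return sorted(rows, key=lambda r: r["cost_pp"], reverse=True)


def weighted_avg_days(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Patient-weighted mean of avg_days per drug across trusts."""
    sums: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0])  # drug -> [sum(days*patients), sum(patients)]
    for row in rows:
        days = _safe_float(row.get("avg_days"))
        patients = _safe_float(row.get("patients"))
        if days is None or not patients:
            continue
        sums[row["drug"]][0] += days * patients
        sums[row["drug"]][1] += patients
    return {drug: total / weight for drug, (total, weight) in sums.items()}
```

With two trusts reporting 100 days for 30 patients and 200 days for 10 patients, the weighted result is (100×30 + 200×10) / 40 = 125 days, whereas a plain mean of the two trust averages would give 150 and overstate the smaller trust. Note that summing patients across drugs within a directorate could count a patient more than once if they received several drugs.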
### Next iteration should:
- Start Task 9.3 — First-Line Market Share chart (Tab 2)
- Sub-steps:
   1. Create a figure function in `src/visualization/`: `create_market_share_figure(data)` for a horizontal grouped bar chart
2. Wire into `update_chart` in `dash_app/callbacks/chart.py` — dispatch on active_tab="market-share"
3. The query `get_drug_market_share()` returns [{directory, drug, patients, proportion, cost, cost_pp_pa}] sorted by total desc
4. Use NHS blue palette, one cluster per directorate, drugs as bars within
- Read `dash_app/callbacks/chart.py` to understand the tab dispatch pattern
- Read `src/visualization/plotly_generator.py` to see existing figure function pattern
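For the "one cluster per directorate, drugs as bars within" layout, Plotly's grouped bars expect one trace per drug sharing a common category axis of directorates. A sketch of that reshaping, assuming each row from `get_drug_market_share()` carries `directory`, `drug` and `proportion` as listed in sub-step 3 (`to_grouped_bar_series` is a hypothetical helper, not existing code):

```python
from __future__ import annotations

from typing import Any, Optional


def to_grouped_bar_series(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical: pivot market-share rows into one series per drug over shared directorates."""
    directories: list[str] = []
    for row in rows:
        if row["directory"] not in directories:
            directories.append(row["directory"])  # keep the query's sort order
    position = {d: i for i, d in enumerate(directories)}
    series: dict[str, list[Optional[float]]] = {}
    for row in rows:
        # None marks a directorate where this drug has no rows.
        values = series.setdefault(row["drug"], [None] * len(directories))
        values[position[row["directory"]]] = float(row["proportion"])
    return {"directories": directories, "series": series}
```

Each entry in `series` could then become a `plotly.graph_objects.Bar(x=values, y=directories, orientation="h", name=drug)` trace, with `barmode="group"` set on the figure layout. Using `None` rather than `0.0` lets Plotly treat a missing drug/directorate pair as absent data instead of a real zero share.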
### Blocked items:
- None