Files
HighCostDrugsDemo/AdditionalAnalytics.md
T
Andrew Charlwood a943bee8f2 chore: cleanup archive, update ralph loop files, add analytics spec
- Remove old iteration logs and deprecated files from archive/can_delete/
- Update RALPH_PROMPT.md and guardrails.md for Phase 10+ work
- Update ralph.ps1 banner text
- Add AdditionalAnalytics.md chart specification
- Add run.bat convenience script
2026-02-07 02:10:03 +00:00

155 lines
8.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Additional Analytics Charts — Implementation Plan
## UI Approach: Tabbed Chart Area
Extend existing `chart_card.py` tab bar. Currently has Icicle (active), Sankey (disabled), Timeline (disabled). Replace/extend with new tabs.
## Charts to Build (Priority Order)
### Tab 1: Icicle (existing — no change)
### Tab 2: First-Line Market Share — Horizontal Bar Chart
**What**: % of patients starting on each first-line drug, grouped by directorate or indication
**Data source**: `pathway_nodes WHERE level = 3` (drug level). The `colour` column already holds proportion of parent. `value` = patient count.
**Query**: Filter by `chart_type`, `date_filter_id`, optionally `directory` or `trust_name`. Group by `directory`, then show drugs as bars.
**Viz**: Horizontal grouped bar chart. One cluster per directorate/indication (top N), bars within = drugs, length = % of patients. Sorted by total patients desc. NHS blue palette.
**Interaction**: Responds to all existing filters (date, chart type, trust, drug, directorate). Clicking a directorate cluster could filter the icicle.
### Tab 3: Pathway Cost Effectiveness — Lollipop/Dot Plot
**What**: Compare annualized cost per patient across complete treatment pathways within a directorate/indication. Highlights most vs least cost-effective pathways.
**Data source**: `pathway_nodes WHERE level >= 4` (pathway nodes). Fields: `cost_pp_pa` (annualized), `value` (patient count), `ids` (parse to get pathway sequence), `directory`.
**Calculation**: `cost_pp_pa` is already computed as `(total_cost / patients) * (365 / avg_days)` — this IS the "total cost over N years / N years" the user described.
**Query**: Filter to a specific directorate/indication, then show all pathway variants ranked by `cost_pp_pa`.
**Viz**: Horizontal lollipop chart (dot on stick). Y-axis = pathway label (e.g., "Adalimumab → Secukinumab → Rituximab"), X-axis = £ per patient per annum. Dot size = patient count. Colour gradient: green (cheap) → amber → red (expensive).
**Interaction**: Directorate/indication selector drives which pathways are shown. Could also compare across directorates at the drug level (level 3).
**Bonus metric — "Pathway Retention" (fewest switches)**:
- For each 2nd-line pathway (e.g., "Drug A → Drug B"), calculate what % of patients escalate to a 3rd line
- Derivation: `value("Drug A → Drug B") - SUM(value("Drug A → Drug B → *"))` = patients who stayed on 2nd line
- Show as a secondary annotation or companion chart: "Drug B retains 72% of patients (no 3rd-line switch needed)"
- This identifies the most effective 2nd-line choices
### Tab 4: Cost Waterfall — Waterfall Chart
**What**: Break down £ per patient per annum by directorate, showing relative cost contribution
**Data source**: `pathway_nodes WHERE level = 2` (directorate/indication level). Field: `cost_pp_pa`, `value`.
**Viz**: Plotly waterfall chart. Each bar = one directorate's average cost_pp_pa. Sorted highest to lowest. Running reference line optional. Use NHS colours.
**Note**: User specifically wants cost_pp_pa (annualized), not total cost.
**Interaction**: Responds to chart_type toggle, date filter, trust filter.
### Tab 5: Drug Switching Sankey — Sankey Diagram
**What**: Flow of patients from 1st-line → 2nd-line → 3rd-line drugs
**Data source**: `pathway_nodes WHERE level >= 3`. Parse `ids` to extract drug transition sequences.
**Parsing**: `ids` format at level 4+: `"TRUST - DIRECTORY - DRUG_A - DRUG_A|DRUG_B"`. Split by " - ", take segments from level 3 onwards, split by "|" to get ordered drug list.
**Viz**: Plotly Sankey. Left nodes = 1st-line drugs, middle = 2nd-line, right = 3rd-line. Link width = patient count. Colour by drug or by directorate.
**Interaction**: Filter by directorate/indication to see switching within a specialty. Filter by trust to compare switching patterns.
### Tab 6: Dosing Interval Comparison — Grouped Bar Chart
**What**: Compare average dosing frequency/weekly interval for a drug across trusts or directorates
**Data source**: Level 3+ nodes, `average_spacing` (HTML string), `average_administered` (JSON array)
**Parsing needed**:
- `average_spacing`: regex to extract weekly interval number from `"given X times with Y weekly interval"`
- `average_administered`: `json.loads()` to get dose counts
**Viz**: Horizontal grouped bars. Y-axis = trust or directorate, X-axis = weekly interval (or total administrations). One colour per drug if comparing multiple.
**Interaction**: Drug selector to pick which drug to compare. Group-by selector (trust vs directorate).
### Tab 7: Directorate × Drug Heatmap
**What**: Matrix showing which drugs are used in which directorates, cells coloured by patient count or cost_pp_pa
**Data source**: Level 3 nodes, pivot `directory` × drug (parsed from `labels` or `ids`)
**Viz**: Plotly heatmap. Rows = directorates (sorted by total patients), columns = drugs (sorted by frequency). Cell colour = patient count or cost. Hover shows details.
**Interaction**: Toggle between patient count / cost / cost_pp_pa colouring.
### Tab 8: Treatment Duration Bars
**What**: Compare average treatment durations across drugs within a directorate
**Data source**: Level 3 nodes, `avg_days` field
**Viz**: Horizontal bar chart. Y-axis = drug, X-axis = average days. Colour intensity by patient count.
**Interaction**: Directorate filter drives which drugs are shown.
---
## Data Layer Changes
### New query functions needed (in `src/data_processing/pathway_queries.py`):
```python
def get_drug_market_share(db_path, date_filter_id, chart_type, directory=None, trust=None):
"""Level 3 nodes grouped by directory, returning drug, value, colour."""
def get_pathway_costs(db_path, date_filter_id, chart_type, directory=None):
"""Level 4+ nodes with cost_pp_pa, parsed pathway labels, patient counts."""
def get_cost_waterfall(db_path, date_filter_id, chart_type, trust=None):
"""Level 2 nodes with cost_pp_pa per directorate/indication."""
def get_drug_transitions(db_path, date_filter_id, chart_type, directory=None):
"""Level 3+ nodes parsed into source→target drug transitions with patient counts."""
def get_dosing_intervals(db_path, date_filter_id, chart_type, drug=None):
"""Level 3 nodes for a specific drug, parsed average_spacing by trust/directory."""
def get_drug_directory_matrix(db_path, date_filter_id, chart_type):
"""Level 3 nodes pivoted as directory × drug with value/cost metrics."""
def get_treatment_durations(db_path, date_filter_id, chart_type, directory=None):
"""Level 3 nodes with avg_days by drug within a directorate."""
```
### Parsing utilities needed:
```python
def parse_average_spacing(spacing_html: str) -> dict:
"""Extract drug_name, dose_count, weekly_interval, total_weeks from HTML string."""
def parse_pathway_drugs(ids: str, level: int) -> list[str]:
"""Extract ordered drug list from ids column at level 4+."""
def calculate_retention_rate(nodes: list[dict]) -> dict:
"""For each N-drug pathway, calculate % not escalating to N+1 drugs."""
```
---
## Callback Architecture
Each tab gets its own callback triggered by `chart-data` store + `active-tab` state:
```
active-tab change → render selected chart
chart-data change → re-render active chart
```
Only the active tab's chart is computed (lazy rendering). Store the active tab in `app-state`.
New callback per chart type in `dash_app/callbacks/`:
- `market_share.py` — builds bar chart from level 3 data
- `pathway_costs.py` — builds lollipop + retention annotations
- `cost_waterfall.py` — builds waterfall from level 2 data
- `sankey.py` — builds Sankey from parsed transitions
- `dosing.py` — builds grouped bars from parsed spacing
- `heatmap.py` — builds heatmap from pivoted matrix
- `duration.py` — builds bar chart from avg_days
---
## Implementation Order
1. **Data parsing utilities** — shared parsing for spacing, pathway drugs, retention
2. **Query functions** — one per chart type in pathway_queries.py
3. **Tab infrastructure** — extend chart_card.py with all tab labels, lazy rendering
4. **Charts one at a time** (in priority order):
- First-Line Market Share (simplest, validates the tab pattern)
- Pathway Cost Effectiveness + Retention (user's key insight)
- Cost Waterfall
- Drug Switching Sankey
- Dosing Interval
- Heatmap
- Treatment Duration
---
## Verification
- Run `python run_dash.py` after each chart addition
- Verify each chart responds to filter changes (date, chart type, trust, directorate, drug)
- Test with both "directory" and "indication" chart types
- Verify icicle chart still works correctly (no regressions)
- Check tab switching is smooth with no unnecessary recomputation