# Progress Log — Dashboard Visualization Improvements
## Project Context
Working Dash application with 2 views (Patient Pathways + Trust Comparison), 13 chart functions in `plotly_generator.py`, and a complete callback chain. Now improving chart quality: bug fixes, visual polish, and new analytics.
**Current state**: Fully functional Dash app at http://localhost:8050 with icicle, Sankey, market share, cost effectiveness, cost waterfall, dosing, heatmap, and duration charts. Trust Comparison has 6 dedicated charts. All filters work.
**New goal**: Fix chart bugs (heatmap colorscale, legend overflow, trust color differentiation), add visual polish (consistent styling, smooth gradients), add new analytics (retention funnel, pathway depth, scatter, network), and new backend analytics (trends, dose distribution, timeline, NICE compliance).
## Key Architecture Patterns
### plotly_generator.py (PRIMARY target file)
- 13 chart functions, all accept list-of-dicts, return `go.Figure`
- Located at `src/visualization/plotly_generator.py` (~1782 lines)
- Key functions and approximate line numbers:
- `create_icicle_from_nodes(nodes, title)` — L113
- `create_market_share_figure(data, title)` — L247
- `create_cost_effectiveness_figure(data, retention, title)` — L384
- `create_cost_waterfall_figure(data, title)` — L562
- `create_sankey_figure(data, title)` — L706
- `create_dosing_figure(data, title, group_by)` — L837
- `_dosing_by_drug(data, colours)` — L926
- `_dosing_by_trust(data, colours)` — L1007
- `create_heatmap_figure(data, title, metric)` — L1189
- `create_duration_figure(data, title, show_directory)` — L1329
- `create_trust_market_share_figure(data, title)` — L1481
- `create_trust_heatmap_figure(data, title, metric)` — L1582
- `create_trust_duration_figure(data, title)` — L1689
- NOTE: Line numbers will shift as you edit. Re-read the file each iteration.
### Callback chain
- `dash_app/callbacks/chart.py` — Patient Pathways tab dispatch (`_render_*` helpers → `update_chart`)
- `dash_app/callbacks/trust_comparison.py` — 6 Trust Comparison chart callbacks
- Tab switching: `active-tab` dcc.Store, tab IDs = `"tab-{short_id}"`
- TAB_DEFINITIONS in `chart_card.py` — currently: icicle, sankey
### Adding a new Patient Pathways tab
1. Query function in `src/data_processing/pathway_queries.py` (accept `db_path` param)
2. Thin wrapper in `dash_app/data/queries.py` (resolve DB_PATH)
3. Figure function in `src/visualization/plotly_generator.py`
4. Add to `TAB_DEFINITIONS` in `dash_app/components/chart_card.py`
5. Add `_render_*()` helper in `dash_app/callbacks/chart.py`
6. Add elif case in `update_chart()` dispatch
### State management
- 4 `dcc.Store` components: app-state, chart-data, reference-data, active-tab
- Unidirectional: filter inputs → app-state → chart-data → UI
- 20 registered callbacks total
### DMC version
- Dash 4.0.0 + DMC 2.5.1 (Mantine v7 based)
- `dmc.MantineProvider` wraps layout
- `dmc.SegmentedControl` available for metric toggles
### Flex chain for chart filling viewport
- Full flex chain: `.main` → `#view-container` → `#patient-pathways-view` → `.chart-card` → loading wrapper → `#chart-container` → `#pathway-chart`
- `responsive=True` on dcc.Graph + `autosize=True` in figure layout
- `dcc.Loading` wraps children in `.dash-loading-callback > div` — CSS must propagate flex through both
### Known heatmap bugs (to fix)
- Non-linear colorscale compresses 99% of range into identical blues
- No cell text — must hover every cell
- Light end (#F0F4F8) invisible against transparent background
- Fixed width can overflow container
- Fixed l=200 left margin wastes space
### Known legend bugs (to fix)
- Horizontal legends at y=-0.15 with fixed bottom margins overflow with 42 drugs
- Affects: market_share, trust_market_share, dosing, trust_duration
### Known color bugs (to fix)
- First 6 of 10 trust palette colors are blue variants — nearly indistinguishable
- `_dosing_by_drug()` interpolates from one blue to another blue
## Iteration Log
## Iteration 1 — 2026-02-07
### Task: A.1 — Extract shared styling constants + `_base_layout()` helper
### Why this task:
- A.1 is the foundation for all subsequent Phase A tasks (A.2-A.4 all reference `_base_layout()` and the palette constants). Must be done first.
### Status: COMPLETE
### What was done:
- Added 7 module-level constants after `logger` line: `CHART_FONT_FAMILY`, `CHART_TITLE_SIZE`, `CHART_TITLE_COLOR`, `GRID_COLOR`, `ANNOTATION_COLOR`, `TRUST_PALETTE` (7 colors), `DRUG_PALETTE` (15 colors)
- Created `_base_layout(title, **overrides)` helper returning dict with: title (font family/size/color, centered), hoverlabel (white bg, #CBD5E1 border, font), paper/plot bgcolor transparent, autosize=True, base font family
- Applied `_base_layout()` to `create_icicle_from_nodes()` — replaced 20-line explicit layout block with `_base_layout()` call + 3 overrides (margin, hoverlabel size=14, clickmode)
- Also replaced hardcoded `"Source Sans 3, system-ui, sans-serif"` in icicle textfont with `CHART_FONT_FAMILY` constant
### Validation results:
- Tier 1 (Code): all imports pass, `python run_dash.py` starts cleanly
- Tier 2 (Visual): icicle figure title correct ("Patient Pathways — By Directory"), font family/size/color all match expected values
### Files changed:
- `src/visualization/plotly_generator.py` — added constants + `_base_layout()` + refactored icicle layout
- `IMPLEMENTATION_PLAN.md` — marked A.1 subtasks [x]
### Committed: 63c1801 "feat: add shared styling constants and _base_layout() helper (Task A.1)"
### Patterns discovered:
- The `_base_layout()` returns a plain dict that gets unpacked via `fig.update_layout(**layout)`. Callers pass chart-specific overrides as kwargs.
- Icicle hoverlabel uses `size=14` (slightly larger than base `13`) — preserved as override.
- Constants are at module level, so all functions in the file can reference them directly.
- Line numbers in IMPLEMENTATION_PLAN.md are now stale (shifted ~70 lines due to constants/helper insertion). Future iterations should search by function name.
### Next iteration should:
- Start with Task A.2: Fix heatmap colorscale + cell annotations. Read `create_heatmap_figure()` and `create_trust_heatmap_figure()` functions by searching for their names (line numbers have shifted).
- The `_base_layout()` and palette constants are now available — A.2 should use `_base_layout()` for both heatmap functions.
- Key heatmap fixes: linear 5-stop colorscale, cell text annotations, zmin=0, autosize, automargin, subtitle when >25 drugs.
### Blocked items:
- None
## Iteration 2 — 2026-02-07
### Task: A.2 — Fix heatmap colorscale + cell annotations
### Why this task:
- A.2 is the next Phase A task after A.1. It depends on `_base_layout()` (done in A.1). It addresses the most-reported heatmap bugs: non-linear colorscale, missing cell text, fixed width overflow, fixed left margin.
### Status: COMPLETE
### What was done:
- **create_heatmap_figure()**: Replaced non-linear 7-stop colorscale with linear 5-stop `[0.0 #E3F2FD, 0.25 #90CAF9, 0.5 #42A5F5, 0.75 #1E88E5, 1.0 #003087]`. Added `text=text_values, texttemplate="%{text}"` with metric-specific formatting (patients: `N`, cost: `£Nk`, cost_pp_pa: `£N`). Set `zmin=0`. Removed fixed `width`, using `autosize=True` from `_base_layout()`. Replaced `l=200` with `l=8` + `yaxis automargin=True`. Added subtitle annotation when 25-drug cap is hit. Reduced `xgap/ygap` to 1 when >15 columns.
- **create_trust_heatmap_figure()**: Applied identical fixes. Also fixed Unicode escapes (`\u00a3`, `\u00d7`, `\u2014`) back to literal characters.
- Both functions now use `_base_layout(chart_title)` for consistent styling — title color is now `#1E293B` (CHART_TITLE_COLOR) instead of old hardcoded `#003087`, title size is 18 (CHART_TITLE_SIZE) for both (trust heatmap was 16 before).
### Validation results:
- Tier 1 (Code): imports pass, `uv run python -c "from dash_app.app import app"` OK, `uv run python run_dash.py` starts cleanly
- Tier 2 (Visual): Functional test confirms: zmin=0, text present, texttemplate="%{text}", 5-stop colorscale, autosize=True, yaxis automargin=True, title color #1E293B
### Files changed:
- `src/visualization/plotly_generator.py` — both heatmap functions rewritten
- `IMPLEMENTATION_PLAN.md` — marked A.2 subtasks [x]
### Committed: 56ca11e "fix: heatmap linear colorscale, cell annotations, autosize (Task A.2)"
### Patterns discovered:
- The `_base_layout()` helper makes chart function updates significantly easier — just `layout = _base_layout(title)` + `layout.update(...)` + `fig.update_layout(**layout)`. No need to repeat font/bgcolor/autosize.
- `textfont=dict(size=10)` on the heatmap trace controls cell annotation size — needs to be small enough not to overflow cells.
- Cost formatting: `£Nk` for cost ≥ 1000, `£N` for small values. This avoids `£125,000` flooding cells.
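The cell-text formatting above could look something like this sketch. The metric names, the `£Nk` / `£N` rule, and the 5-stop colorscale come from this log; the function name `format_cell`, the thousands separator, and the empty string for missing cells are assumptions.

```python
# Linear 5-stop colorscale as recorded for Task A.2.
HEATMAP_COLORSCALE = [
    [0.0, "#E3F2FD"],
    [0.25, "#90CAF9"],
    [0.5, "#42A5F5"],
    [0.75, "#1E88E5"],
    [1.0, "#003087"],
]


def format_cell(value, metric):
    """Format one heatmap cell as short text for the given metric (hypothetical helper)."""
    if value is None:
        return ""  # assumed handling of empty cells
    if metric == "patients":
        return f"{int(value):,}"
    if metric == "cost" and value >= 1000:
        return f"£{value / 1000:.0f}k"  # keeps large totals short
    return f"£{value:,.0f}"  # small costs and cost_pp_pa


# Build the text grid that would be passed as text=text_values to the heatmap trace.
z = [[125000, 850], [None, 42000]]
text_values = [[format_cell(v, "cost") for v in row] for row in z]
```

The stops are evenly spaced at 0.25 intervals, which is what makes the scale linear rather than compressing most of the range into one shade.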
### Next iteration should:
- Start with Task A.3: Fix legend overflow in 4 charts. This requires creating `_smart_legend(n_items)` helper and applying it + `_base_layout()` to: `create_market_share_figure()`, `create_trust_market_share_figure()`, `create_dosing_figure()`, `create_trust_duration_figure()`.
- Search for these function names by `def create_market_share_figure` etc. — line numbers have shifted again.
- The `_smart_legend()` helper should: >15 items → vertical right legend; ≤15 → horizontal with dynamic bottom margin.
### Blocked items:
- None
## Iteration 3 — 2026-02-07
### Task: A.3 — Fix legend overflow in 4 charts
### Why this task:
- A.3 is the next Phase A task after A.1 and A.2. It has no blockers and was explicitly recommended by Iteration 2. It addresses the legend overflow bug reported for charts with 42 drugs.
### Status: COMPLETE
### What was done:
- **Created `_smart_legend(n_items, legend_title)` helper**: Returns legend dict with adaptive positioning. >15 items → vertical right legend (orientation="v", x=1.02, xanchor="left"). ≤15 items → horizontal below chart (orientation="h", y=-0.12, xanchor="center").
- **Created `_smart_legend_margin(n_items)` helper**: Returns margin dict. >15 items → r=140, b=40. ≤15 items → dynamic b based on estimated row count (~6 items per row), r=24.
- **`create_market_share_figure()`**: Replaced local `nhs_colours` with `DRUG_PALETTE`. Replaced manual layout block with `_base_layout()` + `_smart_legend()`. Replaced the hardcoded grid color with the `GRID_COLOR` constant.
- **`create_trust_market_share_figure()`**: Same treatment. Also replaced Unicode escapes (`\u00a3`, `\u2014`) with literal characters (`£`, `—`).
- **`create_dosing_figure()`**: Replaced local `nhs_colours` with `DRUG_PALETTE`. Legend adapts to trace count using `sum(1 for t in fig.data if t.showlegend is not False)`. Uses `_base_layout()`.
- **`create_trust_duration_figure()`**: Replaced local `nhs_colours` with `TRUST_PALETTE` (7 maximally-distinct colors — also satisfies A.4 subtask). Fixed `l=200` → `l=8` + automargin. Uses `_base_layout()`.
- **Note**: This also completed two A.4 subtasks: `create_trust_duration_figure()` now uses `TRUST_PALETTE`, and `create_trust_market_share_figure()` now uses `DRUG_PALETTE`.
### Validation results:
- Tier 1 (Code): All imports pass. `from dash_app.app import app` OK. `python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): Functional tests confirm: 20-drug chart → vertical right legend; 5-drug → horizontal; 7-trust → horizontal with TRUST_PALETTE colors [#005EB8, #DA291C, #009639, #ED8B00, #7C2855, #00A499, #330072].
- Tier 3 (Functional): All 4 chart functions produce correct figures with sample data. Legend orientation switches at 15-item threshold.
### Files changed:
- `src/visualization/plotly_generator.py` — added `_smart_legend()` + `_smart_legend_margin()`, refactored 4 chart functions
- `IMPLEMENTATION_PLAN.md` — marked A.3 subtasks [x], updated A.4 for completed subtasks
### Committed: 90de24c "fix: adaptive legends + _base_layout for 4 chart functions (Task A.3)"
### Patterns discovered:
- Plotly trace.showlegend defaults to None (not True), so checking `t.showlegend is not False` correctly identifies visible legend items.
- The `_smart_legend_margin()` helper returns only `b` and `r` keys, allowing callers to merge with their own `t` and `l` via `dict(t=60, l=8, **legend_margins)`.
- Net code reduction of 48 lines across 4 functions by replacing repeated layout blocks with `_base_layout()`.
- Several A.4 subtasks were completed opportunistically during A.3 (replacing nhs_colours with palette constants).
### Next iteration should:
- Start with Task A.4: Fix remaining trust comparison color differentiation. Two subtasks remain:
1. Add `is_trust_comparison=False` param to `create_cost_waterfall_figure()` — use `TRUST_PALETTE` when True. Update `tc_cost_waterfall` callback to pass `is_trust_comparison=True`.
2. Fix `_dosing_by_drug()` blue→blue interpolation: replace with `plotly.colors.sample_colorscale("Viridis", ...)` for meaningful gradient.
- The `create_trust_duration_figure()` TRUST_PALETTE fix and `create_trust_market_share_figure()` DRUG_PALETTE fix are already done (marked [x] in plan).
- Search for `create_cost_waterfall_figure` and `_dosing_by_drug` by function name (line numbers have shifted).
### Blocked items:
- None
## Iteration 4 — 2026-02-07
### Task: A.4 — Fix trust comparison color differentiation (remaining 3 subtasks)
### Why this task:
- A.4 is the last Phase A task. A.1-A.3 are complete. Two A.4 subtasks were already done opportunistically in A.3. The remaining 3 subtasks complete Phase A.
### Status: COMPLETE
### What was done:
- **`create_cost_waterfall_figure()`**: Added `is_trust_comparison=False` parameter. When True, uses `TRUST_PALETTE` (7 distinct colors) instead of `DRUG_PALETTE`. Replaced local `nhs_colours` list. Applied `_base_layout()` — removed 20+ lines of hardcoded layout. Updated annotation fonts to use `ANNOTATION_COLOR` and `CHART_FONT_FAMILY` constants. Used `GRID_COLOR` constant for gridcolor.
- **`tc_cost_waterfall` callback**: Updated call to pass `is_trust_comparison=True`, so Trust Comparison cost waterfall now uses 7 maximally-distinct trust colors.
- **`_dosing_by_drug()`**: Replaced manual RGB interpolation (blue `#005EB8` → blue `#41B6E6`) with `plotly.colors.sample_colorscale("Viridis", ratios)`. Result: bars now range from yellow (high interval) through teal to purple (low interval) — clearly distinguishable.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): Cost waterfall normal mode uses DRUG_PALETTE colors (#005EB8, #DA291C, #009639...). Trust mode uses TRUST_PALETTE. Dosing by drug uses Viridis: rgb(253,231,37), rgb(34,144,140), rgb(59,81,138) — visually distinct.
- Tier 3 (Functional): is_trust_comparison=False (default) preserves existing behavior. is_trust_comparison=True switches to TRUST_PALETTE. Viridis sampling produces correct gradients.
### Files changed:
- `src/visualization/plotly_generator.py` — `create_cost_waterfall_figure()` + `_dosing_by_drug()`
- `dash_app/callbacks/trust_comparison.py` — `tc_cost_waterfall` callback
- `IMPLEMENTATION_PLAN.md` — marked A.4 subtasks [x]
### Committed: 950d93b "fix: trust palette for cost waterfall + Viridis dosing gradient (Task A.4)"
### Patterns discovered:
- `plotly.colors.sample_colorscale("Viridis", ratios)` returns a list of `rgb(r,g,b)` strings — can be passed directly to `marker.color` as a list. Very clean replacement for manual interpolation.
- The `_base_layout()` + `layout.update()` pattern removed ~30 lines from `create_cost_waterfall_figure()` (net -33 lines in diff). Each function converted gets simpler.
- Phase A is now COMPLETE. All 4 tasks (A.1-A.4) are done.
### Next iteration should:
- Start with Task B.1: Fix title inconsistencies across all charts. This requires:
1. Apply `_base_layout()` to remaining unconverted chart functions: `create_sankey_figure()`, `create_cost_effectiveness_figure()`, `create_duration_figure()`
2. These functions still have hardcoded `#003087` title colors and `"Source Sans 3"` font strings
3. Some Trust Comparison functions may still use `size=16` instead of `CHART_TITLE_SIZE` (18)
4. Search for `def create_sankey_figure`, `def create_cost_effectiveness_figure`, `def create_duration_figure` to find current line numbers
- After B.1, the next tasks are B.2 (cost effectiveness gradient) and B.3 (Sankey narrow-screen fix) — both are small and independent.
### Blocked items:
- None
## Iteration 5 — 2026-02-07
### Task: B.1 — Fix title inconsistencies across all charts
### Why this task:
- B.1 is the first Phase B task. Phase A is complete. The Iteration 4 entry in progress.txt explicitly recommended B.1 next. It ensures all chart functions use consistent styling via `_base_layout()`.
### Status: COMPLETE
### What was done:
- **`create_sankey_figure()`**: Replaced local `nhs_colours` (15 blue-heavy colors) with `DRUG_PALETTE`. Replaced 20-line hardcoded layout block (title color `#003087`, manual font/bgcolor) with `_base_layout()` call + 2 overrides (font size 12, margin/height).
- **`create_cost_effectiveness_figure()`**: Replaced 38-line manual layout block (title, xaxis, yaxis, margin, bgcolor, hoverlabel, font) with `_base_layout()` + 5-key update. Replaced hardcoded annotation font strings with `ANNOTATION_COLOR` and `CHART_FONT_FAMILY` constants. Replaced `gridcolor="#E2E8F0"` with `GRID_COLOR`.
- **`create_duration_figure()`**: Replaced 30-line manual layout (title color `#003087`, l=200 fixed margin, manual bgcolor/font) with `_base_layout()` + 6-key update. Fixed `margin.l` from 200 → 8 + `yaxis automargin=True`. Replaced hardcoded annotation font with constants. Used `ANNOTATION_COLOR` in subtitle HTML span.
- Net result: -52 lines (24 added, 76 removed). All 11 chart functions now use `_base_layout()`.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly, no errors.
- Tier 2 (Visual): All three functions produce figures with title color `#1E293B`, title size 18, font family `Source Sans 3, system-ui, sans-serif`, transparent bgcolor. Duration uses automargin instead of fixed l=200.
### Files changed:
- `src/visualization/plotly_generator.py` — converted 3 remaining chart functions to `_base_layout()`
- `IMPLEMENTATION_PLAN.md` — marked B.1 subtasks [x]
### Committed: 8d05adc "fix: consistent titles via _base_layout() for Sankey, Cost Effectiveness, Duration (Task B.1)"
### Patterns discovered:
- All 11 chart functions in plotly_generator.py now use `_base_layout()`. No more hardcoded `"Source Sans 3"` strings or `#003087` title colors exist outside of constants/colorscale definitions.
- Sankey's local `nhs_colours` was different from `DRUG_PALETTE` (had more blue variants like `#003087`, `#41B6E6`, `#0066CC`). Replacing with `DRUG_PALETTE` gives more distinguishable drug colors.
- The `_base_layout()` conversion is a reliable pattern: replace the entire `fig.update_layout(...)` call with `layout = _base_layout(title)` + `layout.update({chart-specific})` + `fig.update_layout(**layout)`. Removes ~15-30 lines per function.
### Next iteration should:
- Choose Task B.2 (cost effectiveness smooth gradient) or B.3 (Sankey narrow-screen fix) — both are small, independent tasks with no blockers.
- B.2: In `create_cost_effectiveness_figure()`, replace the 3-bin hard threshold (green/amber/red at 0.33/0.66) with smooth RGB interpolation. Green (#009639) → Amber (#ED8B00) for ratios 0 to 0.5, Amber → Red (#DA291C) for ratios 0.5 to 1.0.
- B.3: In `create_sankey_figure()`, change `arrangement="snap"` → `arrangement="freeform"` and increase `pad` from 20 → 25.
- Both are quick changes — could potentially do B.2+B.3 together if scoped carefully, but the loop rules say one task per iteration.
### Blocked items:
- None
## Iteration 6 — 2026-02-07
### Task: B.2 — Cost effectiveness smooth gradient
### Why this task:
- B.2 is the next Phase B task after B.1. Iteration 5 explicitly recommended B.2 or B.3. B.2 addresses the crude 3-bin color threshold in the cost effectiveness lollipop chart.
### Status: COMPLETE
### What was done:
- **`create_cost_effectiveness_figure()`**: Replaced 3-bin hard threshold color assignment (green if ratio<0.33, amber if <0.66, red otherwise) with a smooth `_lerp_color()` inner function that does linear RGB interpolation:
- Ratio 0 to 0.5: Green (#009639) → Amber (#ED8B00)
- Ratio 0.5 to 1.0: Amber (#ED8B00) → Red (#DA291C)
- Includes clamping to [0,1] for safety
- Net change: +9 lines (replaced 8-line loop with 17-line function + 1-line list comprehension)
- `_base_layout()` was already applied in B.1 — no further layout changes needed
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): Functional test confirmed 5 gradient points: 1000→rgb(0,150,57) green, 3000→rgb(118,144,28) yellow-green, 5000→rgb(237,139,0) amber, 7000→rgb(227,90,14) orange-red, 9000→rgb(218,41,28) red. Smooth interpolation verified.
### Files changed:
- `src/visualization/plotly_generator.py` — replaced 3-bin color logic with `_lerp_color()` smooth gradient
- `IMPLEMENTATION_PLAN.md` — marked B.2 subtasks [x]
### Committed: cbac37e "fix: smooth green→amber→red gradient for cost effectiveness chart (Task B.2)"
### Patterns discovered:
- Inner function `_lerp_color()` defined inside `create_cost_effectiveness_figure()` keeps it scoped to where it's used. If other charts need similar gradients, could promote to module level.
- The data is reversed before coloring (line 486), so colors are computed over the reversed list. Because `_lerp_color` derives each color from the actual cost value via `(c - min_cost) / cost_range`, not from list position, the reversal doesn't affect color correctness.
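A sketch of the two-segment interpolation, written here as a module-level function rather than the inner `_lerp_color()`. The three anchor colors, the 0.5 split, the clamping, and the `(c - min_cost) / cost_range` ratio come from this log. Truncating with `int()` is an assumption: it reproduces the five values recorded in the Tier 2 test, but the real function may round differently. The zero-range guard is also an assumption.

```python
GREEN = (0, 150, 57)   # #009639
AMBER = (237, 139, 0)  # #ED8B00
RED = (218, 41, 28)    # #DA291C


def lerp_color(ratio):
    """Map a ratio in [0, 1] onto a green -> amber -> red gradient."""
    ratio = max(0.0, min(1.0, ratio))  # clamp for safety
    if ratio <= 0.5:
        start, end, t = GREEN, AMBER, ratio / 0.5
    else:
        start, end, t = AMBER, RED, (ratio - 0.5) / 0.5
    # Linear interpolation per channel; int() truncates (assumed).
    r, g, b = (int(s + (e - s) * t) for s, e in zip(start, end))
    return f"rgb({r},{g},{b})"


def colors_for_costs(costs):
    """Color each cost by its position between the min and max cost."""
    min_cost, max_cost = min(costs), max(costs)
    cost_range = (max_cost - min_cost) or 1  # avoid dividing by zero
    return [lerp_color((c - min_cost) / cost_range) for c in costs]


colors = colors_for_costs([1000, 3000, 5000, 7000, 9000])
```

Since each color depends only on the cost value, reordering the input list reorders the output colors but never changes which color a given cost receives.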
### Next iteration should:
- Do Task B.3: Sankey narrow-screen fix. Very small change:
1. Search for `def create_sankey_figure` to find the function
2. Change `arrangement="snap"` → `arrangement="freeform"`
3. Increase `pad` from 20 → 25
- After B.3, Task B.4 (heatmap metric toggle) is the last Phase B task — it's more involved (adding SegmentedControl components + callback wiring).
### Blocked items:
- None
## Iteration 7 — 2026-02-07
### Task: B.3 — Sankey narrow-screen fix
### Why this task:
- B.3 is the next uncompleted Phase B task. Iteration 6 explicitly recommended it. It's a small, focused change with no dependencies.
### Status: COMPLETE
### What was done:
- **`create_sankey_figure()`** (line 808): Changed `arrangement="snap"` → `arrangement="freeform"` so users can drag nodes freely on narrow screens. Increased `pad` from 20 → 25 for more spacing between nodes.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly.
- Tier 2 (Visual): Functional test confirms arrangement="freeform" and pad=25 on generated figure.
### Files changed:
- `src/visualization/plotly_generator.py` — Sankey arrangement + pad
- `IMPLEMENTATION_PLAN.md` — marked B.3 [x]
### Committed: fb30f5f "fix: Sankey freeform arrangement + increased padding (Task B.3)"
### Patterns discovered:
- `arrangement="freeform"` allows users to manually reposition Sankey nodes by dragging, which is useful when nodes overlap on narrow viewports. `"snap"` constrains nodes to columns.
### Next iteration should:
- Do Task B.4: Heatmap metric toggle (both views). This is the last Phase B task and is more involved:
1. Add `dmc.SegmentedControl` component next to Patient Pathways heatmap (id: `heatmap-metric-toggle`, visible only when heatmap tab active). Add to `dash_app/components/chart_card.py`.
2. Add `dmc.SegmentedControl` next to Trust Comparison heatmap (id: `tc-heatmap-metric-toggle`). Add to `dash_app/components/trust_comparison.py`.
3. Update `_render_heatmap()` in `dash_app/callbacks/chart.py` to read the metric toggle value.
4. Update `tc_heatmap` callback in `dash_app/callbacks/trust_comparison.py` to read the metric toggle value.
5. Both heatmap functions (`create_heatmap_figure`, `create_trust_heatmap_figure`) already accept a `metric` parameter — the toggle just needs to pass the selected metric through.
- Key files to read: `dash_app/components/chart_card.py` (for PP toggle placement), `dash_app/components/trust_comparison.py` (for TC toggle placement), `dash_app/callbacks/chart.py` (for `_render_heatmap`), `dash_app/callbacks/trust_comparison.py` (for `tc_heatmap`).
### Blocked items:
- None
## Iteration 8 — 2026-02-07
### Task: B.4 — Heatmap metric toggle (both views)
### Why this task:
- B.4 is the last Phase B task. B.1-B.3 are complete. Iteration 7 explicitly recommended B.4. It requires adding `dmc.SegmentedControl` components and wiring them into callbacks.
### Status: COMPLETE
### What was done:
- **chart_card.py**: Added "heatmap" to `TAB_DEFINITIONS` (was only in `ALL_TAB_DEFINITIONS`). Added `dmc.SegmentedControl` (id: `heatmap-metric-toggle`) in `heatmap-metric-wrapper` div inside chart card header, hidden by default (`display: none`).
- **trust_comparison.py**: Replaced generic `_tc_chart_cell` for heatmap with custom inline layout containing `dmc.SegmentedControl` (id: `tc-heatmap-metric-toggle`) in a flex row next to the title.
- **chart.py**: Updated `_render_heatmap()` to accept `metric` parameter (default "patients"). Updated `update_chart` callback to:
- Add `Output("heatmap-metric-wrapper", "style")` — controls toggle visibility
- Add `Input("heatmap-metric-toggle", "value")` — reads metric selection
- Show toggle when `active_tab == "heatmap"`, hide otherwise
- Pass metric to `_render_heatmap()` when heatmap tab is active
- **trust_comparison.py callbacks**: Updated `tc_heatmap` callback to add `Input("tc-heatmap-metric-toggle", "value")` and pass `metric=heatmap_metric` to `create_trust_heatmap_figure()`.
- Both `create_heatmap_figure()` and `create_trust_heatmap_figure()` already accept `metric` param — no changes needed in plotly_generator.py.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` passes. `python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): N/A — toggle components rendered in layout, heatmap functions already tested in A.2.
- Tier 3 (Functional): Callback wiring verified — `update_chart` has 3 Outputs (figure, subtitle, toggle style) and 4 Inputs (chart-data, active-tab, app-state, heatmap-metric-toggle). TC heatmap callback has 2 Inputs (app-state, tc-heatmap-metric-toggle). Toggle options: patients, cost, cost_pp_pa.
### Files changed:
- `dash_app/components/chart_card.py` — added heatmap to TAB_DEFINITIONS, added SegmentedControl
- `dash_app/components/trust_comparison.py` — custom heatmap cell with SegmentedControl
- `dash_app/callbacks/chart.py` — updated _render_heatmap + update_chart callback
- `dash_app/callbacks/trust_comparison.py` — updated tc_heatmap callback
- `IMPLEMENTATION_PLAN.md` — marked B.4 subtasks [x]
### Committed: 8e2e2b7 "feat: heatmap metric toggle for both PP and TC views (Task B.4)"
### Patterns discovered:
- Adding a new Output to an existing callback (e.g., `heatmap-metric-wrapper` style) requires updating ALL return paths in that callback to include the new output value. The toggle_style is computed once at the top and returned in every code path.
- `dmc.SegmentedControl` with `size="xs"` fits nicely in chart headers without overwhelming the title.
- The heatmap tab was missing from PP `TAB_DEFINITIONS` — only in `ALL_TAB_DEFINITIONS`. Adding it to `TAB_DEFINITIONS` automatically creates the tab button and wires it into `switch_tab` because `_TAB_IDS` is built dynamically.
- Phase B is now COMPLETE. All 4 tasks (B.1-B.4) are done.
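The "update all return paths" pattern can be illustrated with a plain function, without the Dash decorator. The three outputs (figure, subtitle, toggle style), the four inputs, and the hidden style `display: none` follow this log; the visible style value, the subtitle strings, and the dict figures are placeholders.

```python
HIDDEN = {"display": "none"}
VISIBLE = {"display": "flex"}  # assumed visible style


def update_chart(chart_data, active_tab, app_state, heatmap_metric):
    """Return (figure, subtitle, toggle_style) on every code path."""
    # Computed once at the top so no return path can forget it.
    toggle_style = VISIBLE if active_tab == "heatmap" else HIDDEN

    if not chart_data:
        # Early-exit path still returns all three outputs.
        return {}, "No data", toggle_style

    if active_tab == "heatmap":
        figure = {"chart": "heatmap", "metric": heatmap_metric or "patients"}
    else:
        figure = {"chart": active_tab}
    return figure, "", toggle_style


empty = update_chart([], "heatmap", {}, "cost")
normal = update_chart([{"n": 1}], "icicle", {}, None)
```

In a real Dash callback, returning fewer values than declared Outputs raises an error at runtime, which is why every path needs the extra element.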
### Next iteration should:
- Start Phase C: New Analytics (Existing Data). Task C.1 (Retention funnel chart) is the first task.
- C.1 requires the 6-step pattern for adding a new chart tab:
1. Create `get_retention_funnel()` in `src/data_processing/pathway_queries.py` — query level 4+ nodes, aggregate by treatment line depth
2. Add thin wrapper in `dash_app/data/queries.py`
3. Create `create_retention_funnel_figure(data, title)` in `src/visualization/plotly_generator.py` — use `go.Funnel`
4. Add "Funnel" tab to `TAB_DEFINITIONS` in `chart_card.py`
5. Add `_render_funnel()` helper in `dash_app/callbacks/chart.py`
6. Add elif case in `update_chart()` dispatch
- Key: The query should count patients at each treatment line depth (1-drug, 2-drug, 3-drug, etc.) using the level column in pathway_nodes. Level 3 = 1st drug, level 4 = 2-drug pathway, etc.
- Use `_base_layout()` and NHS blue gradient for the funnel figure.
### Blocked items:
- None
## Iteration 9 — 2026-02-07
### Task: C.1 — Retention funnel chart
### Why this task:
- C.1 is the first Phase C task. Phases A and B are complete. Iteration 8 explicitly recommended C.1. It follows the 6-step pattern for adding a new chart tab.
### Status: COMPLETE
### What was done:
- **`get_retention_funnel()`** in `pathway_queries.py`: Queries level 3+ nodes grouped by level, aggregates patient counts. Level 3→depth 1 (1st drug), level 4→depth 2 (2nd drug), etc. Supports directory/trust filters. Returns list of dicts with depth, label, patients, pct.
- **Thin wrapper** in `dash_app/data/queries.py`: Imports and delegates to shared function with DB_PATH resolution.
- **`create_retention_funnel_figure()`** in `plotly_generator.py`: Uses `go.Funnel` with NHS blue gradient (#003087→#005EB8→#1E88E5). Text inside bars shows "N patients (X%)". Uses `_base_layout()` for consistent styling. Dynamic height based on depth count.
- **TAB_DEFINITIONS**: Added `("funnel", "Funnel")` — now 4 tabs: Icicle, Sankey, Heatmap, Funnel.
- **`_render_funnel()`** in `chart.py`: Reads filter state, calls query, passes to figure function. Handles empty data and errors.
- **Dispatch case**: Added `elif active_tab == "funnel"` in `update_chart()`.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): Funnel shows 3 levels: 1st drug (10,819 patients, 100%), 2nd drug (2,142, 19.8%), 3rd drug (176, 1.6%). NHS blue gradient applied.
- Tier 3 (Functional): Responds to directory filter (RHEUMATOLOGY: 3,448→551→50). Works with indication chart type (10,782→1,519→125). Returns empty figure for nonexistent directory. Tab switching wired via dynamic `_TAB_IDS`.
### Files changed:
- `src/data_processing/pathway_queries.py` — added `get_retention_funnel()`
- `dash_app/data/queries.py` — added import + thin wrapper
- `src/visualization/plotly_generator.py` — added `create_retention_funnel_figure()`
- `dash_app/components/chart_card.py` — added funnel to TAB_DEFINITIONS
- `dash_app/callbacks/chart.py` — added `_render_funnel()` + dispatch case
- `IMPLEMENTATION_PLAN.md` — marked C.1 subtasks [x]
### Committed: a6cf6ef "feat: retention funnel chart tab with treatment line depth (Task C.1)"
### Patterns discovered:
- The 6-step pattern works cleanly: query → wrapper → figure → tab def → render helper → dispatch. No surprises.
- Level 3 sum (10,819) ≈ root value (11,118) — difference is due to minimum_patients thresholds. Close enough for funnel percentages.
- `go.Funnel` positions text automatically. `textposition="inside"` with white text on dark NHS blue is readable.
- Funnel connector lines use `GRID_COLOR` for visual consistency with other charts.
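A simplified sketch of the funnel aggregation, using the totals recorded above. Several things here are assumptions: the `value` column name, the table schema, passing an open connection instead of `db_path` (so it runs against an in-memory database), and the omission of the directory/trust filters. Only the `level` column, the level-to-depth mapping, and the returned keys follow this log.

```python
import sqlite3


def _ordinal(n):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def get_retention_funnel(conn):
    """Sum patients per level (3+) and express each as a percentage of depth 1."""
    rows = conn.execute(
        "SELECT level, SUM(value) FROM pathway_nodes "
        "WHERE level >= 3 GROUP BY level ORDER BY level"
    ).fetchall()
    if not rows or not rows[0][1]:
        return []
    base = rows[0][1]  # patients on a first drug = 100%
    return [
        {
            "depth": level - 2,  # level 3 -> depth 1, level 4 -> depth 2, ...
            "label": f"{_ordinal(level - 2)} drug",
            "patients": patients,
            "pct": round(100 * patients / base, 1),
        }
        for level, patients in rows
    ]


# Illustrative data: node values per level chosen to match the logged totals.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pathway_nodes (level INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?)",
    [(2, 11118), (3, 6000), (3, 4819), (4, 2142), (5, 176)],
)
funnel = get_retention_funnel(conn)
```

The percentages are relative to the level 3 sum rather than the root value, which matches the 100% shown for the first drug.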
### Next iteration should:
- Do Task C.2: Pathway depth distribution chart. Same 6-step pattern.
1. Create `get_pathway_depth_distribution()` in `pathway_queries.py` — aggregate patients who STOPPED at each depth (not cumulative like funnel, but exclusive)
2. Key difference from funnel: subtract child counts. Patients at depth 1 only = level 3 total - level 4 total. Patients at depth 2 only = level 4 total - level 5 total.
3. Create `create_pathway_depth_figure(data, title)` — horizontal bar chart with NHS blue gradient
4. Add "Depth" tab to TAB_DEFINITIONS (will be 5th tab)
5. Wire callback helpers
- The query logic for "stopped at depth N" is: patients_at_level_N - patients_at_level_(N+1). The last level has no children so stopped = total.
### Blocked items:
- None
## Iteration 10 — 2026-02-07
### Task: C.2 — Pathway depth distribution chart
### Why this task:
- C.2 is the next Phase C task after C.1. Iteration 9 explicitly recommended it. Same 6-step pattern as C.1.
### Status: COMPLETE
### What was done:
- **`get_pathway_depth_distribution()`** in `pathway_queries.py`: Queries level 3+ nodes grouped by level, then subtracts next-level counts to get exclusive "stopped at depth N" patients. Supports directory/trust filters. Returns list of dicts with depth, label, patients, pct.
- **Thin wrapper** in `dash_app/data/queries.py`: Imports and delegates with DB_PATH resolution.
- **`create_pathway_depth_figure()`** in `plotly_generator.py`: Horizontal bar chart (`go.Bar` with `orientation="h"`) with NHS blue gradient (#003087→#005EB8→#1E88E5). Text inside bars shows "N (pct%)". Uses `_base_layout()`. Dynamic height based on depth count. Y-axis reversed so depth 1 (most patients) is at top.
- **TAB_DEFINITIONS**: Added `("depth", "Depth")` — now 5 tabs: Icicle, Sankey, Heatmap, Funnel, Depth.
- **`_render_depth()`** in `chart.py`: Reads filter state, calls query, passes to figure function. Handles empty data and errors.
- **Dispatch case**: Added `elif active_tab == "depth"` in `update_chart()`.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly. 20 callbacks registered.
- Tier 2 (Visual): Depth chart shows 3 levels: 1 drug only (8,677, 80.2%), 2 drugs only (1,966, 18.2%), 3 drugs only (176, 1.6%). NHS blue gradient applied. Autosize + automargin.
- Tier 3 (Functional): Directory filter works (RHEUMATOLOGY: 2,897/501/50). Indication chart type works (9,263/1,394/125). Empty data returns empty figure. Tab switching wired via dynamic `_TAB_IDS`.
### Files changed:
- `src/data_processing/pathway_queries.py` — added `get_pathway_depth_distribution()`
- `dash_app/data/queries.py` — added import + thin wrapper
- `src/visualization/plotly_generator.py` — added `create_pathway_depth_figure()`
- `dash_app/components/chart_card.py` — added depth to TAB_DEFINITIONS
- `dash_app/callbacks/chart.py` — added `_render_depth()` + dispatch case
- `IMPLEMENTATION_PLAN.md` — marked C.2 subtasks [x]
### Committed: 55c9af2 "feat: pathway depth distribution chart tab (Task C.2)"
### Patterns discovered:
- The depth calculation (cumulative - next level) is simple: iterate pairs and subtract. Total of exclusive counts equals the level 3 total, confirming correctness (8,677 + 1,966 + 176 = 10,819 = funnel total).
- `autorange="reversed"` on yaxis puts depth 1 at the top of the horizontal bar chart, matching the natural reading order (most patients first).
- The 6-step pattern continues to work cleanly for new tabs. Each step is small and independently verifiable.
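
The pairwise subtraction can be checked with a few lines of plain Python. This is a minimal sketch, not the code in `pathway_queries.py`: the function name and the dict keys are illustrative, and the input is the cumulative funnel counts reported above.

```python
def exclusive_depth_counts(cumulative):
    """Convert cumulative per-level counts into 'stopped at depth N' counts."""
    total = cumulative[0] if cumulative else 0
    rows = []
    for depth, count in enumerate(cumulative, start=1):
        # Patients who reached the next level did not stop here.
        # The last level has no next level, so everyone there stopped.
        next_count = cumulative[depth] if depth < len(cumulative) else 0
        stopped = count - next_count
        rows.append({
            "depth": depth,
            "patients": stopped,
            "pct": round(100 * stopped / total, 1) if total else 0.0,
        })
    return rows

# Cumulative counts from the funnel: 1st, 2nd, 3rd drug.
rows = exclusive_depth_counts([10819, 2142, 176])
for row in rows:
    print(row)
```

Because each subtraction removes exactly the patients counted at the next level, the exclusive counts always sum back to the first cumulative value.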
### Next iteration should:
- Do Task C.3: Duration vs Cost scatter plot. Same 6-step pattern:
1. Create `get_duration_cost_scatter()` in `pathway_queries.py` — query level 3 nodes for drug-level data (drug, directory, avg_days, cost_pp_pa, patients)
2. Add thin wrapper in `queries.py`
3. Create `create_duration_cost_scatter_figure(data, title)` in `plotly_generator.py` — scatter: x=avg_days, y=cost_pp_pa, size=patients, color=directory. Add quadrant lines at median values.
4. Add "Scatter" tab to TAB_DEFINITIONS (6th tab)
5. Wire `_render_scatter()` + dispatch
- Key design decision: use `go.Scatter` with marker size proportional to patient count. Color by directory (use DRUG_PALETTE cycling or assign by directory). Quadrant lines use median avg_days and median cost_pp_pa as thresholds.
### Blocked items:
- None
## Iteration 11 — 2026-02-07
### Task: C.3 — Duration vs Cost scatter plot
### Why this task:
- C.3 is the next Phase C task after C.1 and C.2. Iteration 10 explicitly recommended it with design details. Same 6-step pattern.
### Status: COMPLETE
### What was done:
- **`get_duration_cost_scatter()`** in `pathway_queries.py`: Queries level 3 nodes with avg_days and cost_pp_pa, aggregates across trusts using weighted averages. Supports directory/trust filters. Returns list of dicts.
- **Thin wrapper** in `dash_app/data/queries.py`: Standard import + DB_PATH delegation.
- **`create_duration_cost_scatter_figure()`** in `plotly_generator.py`: `go.Scatter` with one trace per directory for legend grouping. Marker size proportional to patient count (global max for consistent sizing). DRUG_PALETTE for directory colors. Quadrant lines at median avg_days and median cost_pp_pa with annotations. Uses `_base_layout()` + `_smart_legend()`.
- **TAB_DEFINITIONS**: Added `("scatter", "Scatter")` — now 6 tabs: Icicle, Sankey, Heatmap, Funnel, Depth, Scatter.
- **`_render_scatter()`** in `chart.py`: Standard render helper with filter extraction and error handling.
- **Dispatch case**: Added `elif active_tab == "scatter"` in `update_chart()`.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly.
- Tier 2 (Visual): 59 data points across 12 directories. Days range 48–2237, cost range £994–£162k. Median quadrant lines at 928 days and £4,629. Marker sizes proportional (8–40px).
- Tier 3 (Functional): Directory filter works (RHEUMATOLOGY: 16 drugs). Indication chart type works (108 points). Empty data returns empty figure. Tab switching wired via dynamic `_TAB_IDS`.
### Files changed:
- `src/data_processing/pathway_queries.py` — added `get_duration_cost_scatter()`
- `dash_app/data/queries.py` — added import + thin wrapper
- `src/visualization/plotly_generator.py` — added `create_duration_cost_scatter_figure()`
- `dash_app/components/chart_card.py` — added scatter to TAB_DEFINITIONS
- `dash_app/callbacks/chart.py` — added `_render_scatter()` + dispatch case
- `IMPLEMENTATION_PLAN.md` — marked C.3 subtasks [x]
### Committed: d8df416 "feat: duration vs cost scatter plot tab (Task C.3)"
### Patterns discovered:
- `statistics.median()` imported inside the function to avoid module-level import for a stdlib module only used by one function.
- Marker size must use global max (not per-directory max) for consistent visual comparison across all directories. Initially coded per-directory, fixed before commit.
- `fig.add_hline()` and `fig.add_vline()` are the clean Plotly API for quadrant lines — they create shape objects and annotation objects automatically.
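
The global-max sizing and the median thresholds can be sketched without Plotly. The linear scaling formula, the default pixel bounds, and the sample points are assumptions for illustration; the log only states that sizes are proportional to patient count against a global maximum.

```python
from statistics import median

def scale_marker_sizes(points, min_px=8.0, max_px=40.0):
    """Scale patient counts linearly to pixel sizes against ONE global maximum."""
    global_max = max((p["patients"] for p in points), default=0)
    if global_max <= 0:
        return [min_px for _ in points]
    span = max_px - min_px
    return [min_px + span * p["patients"] / global_max for p in points]

def quadrant_thresholds(points):
    """Median x and y values, usable as positions for add_vline / add_hline."""
    return (
        median(p["avg_days"] for p in points),
        median(p["cost_pp_pa"] for p in points),
    )

# Made-up points spanning two directories.
points = [
    {"drug": "DRUG_A", "directory": "DIR_X", "avg_days": 100, "cost_pp_pa": 1000.0, "patients": 50},
    {"drug": "DRUG_B", "directory": "DIR_X", "avg_days": 300, "cost_pp_pa": 5000.0, "patients": 200},
    {"drug": "DRUG_C", "directory": "DIR_Y", "avg_days": 900, "cost_pp_pa": 3000.0, "patients": 100},
]

sizes = scale_marker_sizes(points)      # computed once, before splitting into traces
x_mid, y_mid = quadrant_thresholds(points)
print(sizes, x_mid, y_mid)
```

Computing sizes before the points are split into per-directory traces is what keeps a 100-patient marker the same size in every directory.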
### Next iteration should:
- Do Task C.4: Drug switching network graph. This is the last Phase C task. Options from IMPLEMENTATION_PLAN.md:
1. Create `get_drug_network()` in `pathway_queries.py` — returns undirected edges (source, target, patients) and nodes (name, total_patients). Different from `get_drug_transitions()` which returns directed Sankey data.
2. Create `create_drug_network_figure(data, title)` using `go.Scatter` for circular layout nodes + edges as lines.
3. Add as separate "Network" tab or sub-toggle within Sankey tab.
4. The plan says "Add as sub-toggle within Sankey tab or as separate Network tab" — separate tab is simpler (follows established pattern).
- After C.4, Phase D begins (backend work: trends, dose distribution, timeline, NICE compliance).
### Blocked items:
- None
## Iteration 12 — 2026-02-07
### Task: C.4 — Drug switching network graph
### Why this task:
- C.4 is the last Phase C task. C.1-C.3 are complete. Iteration 11 explicitly recommended C.4. It follows the established 6-step pattern for adding a new chart tab.
### Status: COMPLETE
### What was done:
- **`get_drug_network()`** in `pathway_queries.py`: Queries level 4+ nodes for drug_sequence co-occurrence edges (undirected, sorted pairs to avoid duplicates). Also queries level 3 nodes for per-drug patient totals. Supports directory/trust filters. Returns `{nodes: [{name, total_patients}], edges: [{source, target, patients}]}`.
- **Thin wrapper** in `dash_app/data/queries.py`: Standard import + DB_PATH delegation.
- **`create_drug_network_figure()`** in `plotly_generator.py`: Circular layout using `math.cos/sin` for node positions. Individual `go.Scatter` traces for each edge (variable width 0.5–6px and opacity 0.15–0.7 scaled by patient count). Node scatter with `markers+text` mode, size 12–50px proportional to patients, colors from `DRUG_PALETTE`. Uses `_base_layout()`. Axes hidden, `scaleanchor="y"` for square aspect ratio.
- **TAB_DEFINITIONS**: Added `("network", "Network")` — now 7 tabs: Icicle, Sankey, Heatmap, Funnel, Depth, Scatter, Network.
- **`_render_network()`** in `chart.py`: Standard render helper with filter extraction and error handling. Checks `data.get("nodes")` for empty state.
- **Dispatch case**: Added `elif active_tab == "network"` in `update_chart()`.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): 39 drug nodes, 45 co-occurrence edges. Top connections: FARICIMAB↔RANIBIZUMAB (452), AFLIBERCEPT↔FARICIMAB (392), ADALIMUMAB↔ETANERCEPT (305). Figure has 46 traces (45 edges + 1 node scatter).
- Tier 3 (Functional): Directory filter works (RHEUMATOLOGY: 17 nodes, 20 edges). Indication chart type works (39 nodes, 28 edges). Empty data returns empty figure. Tab switching wired via dynamic `_TAB_IDS`.
### Files changed:
- `src/data_processing/pathway_queries.py` — added `get_drug_network()`
- `dash_app/data/queries.py` — added import + thin wrapper
- `src/visualization/plotly_generator.py` — added `create_drug_network_figure()`
- `dash_app/components/chart_card.py` — added network to TAB_DEFINITIONS
- `dash_app/callbacks/chart.py` — added `_render_network()` + dispatch case
- `IMPLEMENTATION_PLAN.md` — marked C.4 subtasks [x]
### Committed: 1405476 "feat: drug switching network graph tab (Task C.4)"
### Patterns discovered:
- Individual edge traces (one `go.Scatter` per edge) are necessary for variable width/opacity per edge. A single trace would only support uniform line properties.
- `scaleanchor="y", scaleratio=1` on xaxis ensures the circular layout is actually circular, not elliptical.
- The undirected edge approach (sort pair to canonical form) correctly deduplicates A→B and B→A transitions.
- Phase C is now COMPLETE. All 4 tasks (C.1-C.4) are done. 7 Patient Pathways tabs total.
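
The canonical-pair deduplication and the circular layout are both small enough to sketch in plain Python. The function names, the input shape (directed `(source, target, patients)` tuples), and the sample counts are assumptions; the real query derives co-occurrence from level 4+ nodes.

```python
import math
from collections import defaultdict

def undirected_edges(transitions):
    """Merge directed (source, target, patients) rows into undirected totals."""
    totals = defaultdict(int)
    for source, target, patients in transitions:
        if source == target:
            continue  # self-loops are ignored in this sketch
        key = tuple(sorted((source, target)))  # A->B and B->A share one key
        totals[key] += patients
    return dict(totals)

def circular_layout(names, radius=1.0):
    """Place nodes evenly on a circle, starting at angle 0."""
    n = len(names)
    return {
        name: (
            radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n),
        )
        for i, name in enumerate(names)
    }

# Made-up transitions: the first two rows describe the same undirected pair.
edges = undirected_edges([
    ("DRUG_A", "DRUG_B", 20),
    ("DRUG_B", "DRUG_A", 10),
    ("DRUG_C", "DRUG_A", 5),
])
positions = circular_layout(["DRUG_A", "DRUG_B", "DRUG_C"])
print(edges)
print(positions)
```

With equal x and y scaling on the axes, every node sits at the same distance from the centre, so the layout renders as a circle.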
### Next iteration should:
- Start Phase D: New Analytics (Backend Work). D.1 (Temporal trend analysis) is the first task.
- **Important**: D.1 requires schema changes (`pathway_trends` table) and CLI pipeline extension — this is backend work that has been avoided so far. Read guardrails.md carefully: DO NOT modify the existing pipeline logic, but adding new tables and extending refresh_pathways.py to compute trends is allowed per the plan.
- D.1 steps: (1) Design `pathway_trends` table schema in schema.py, (2) Add migration, (3) Extend refresh_pathways.py to insert trend snapshots, (4) Create query function, (5) Create figure function (line chart), (6) Add tab + callbacks.
- However: trends need at least 2 refresh cycles to show meaningful data. Consider whether D.1 is worth implementing vs. skipping to D.2 (dose distribution) or D.3 (drug timeline) which work with existing data.
- Alternative: D.3 (Drug timeline / Gantt chart) uses existing `first_seen`/`last_seen` data in pathway_nodes — no backend changes needed. Could be a better next task.
### Blocked items:
- None
## Iteration 13 — 2026-02-07
### Task: D.3 — Drug timeline (Gantt chart)
### Why this task:
- D.3 uses existing `first_seen`/`last_seen` data in pathway_nodes — no backend/schema changes needed. D.1 requires schema changes + pipeline extension (both in guardrail DO NOT MODIFY list) and needs ≥2 refresh cycles for meaningful data. D.2 requires parsing complex `average_administered` JSON. D.3 was the cleanest next task.
### Status: COMPLETE
### What was done:
- **`get_drug_timeline()`** in `pathway_queries.py`: Queries level 3 nodes aggregated across trusts — MIN(first_seen), MAX(last_seen), SUM(value), weighted avg cost_pp_pa per drug × directory. Supports directory/trust filters. Returns 59 entries for all-directory view.
- **Thin wrapper** in `dash_app/data/queries.py`: Standard import + DB_PATH delegation.
- **`create_drug_timeline_figure()`** in `plotly_generator.py`: Gantt-style using `go.Bar(orientation="h")` with `base` set to `first_seen` datetime and `x` as duration in milliseconds. One trace per bar, legend grouped by directory. Colors from `DRUG_PALETTE` (one color per directory). Patient count as white text inside bars. Hover shows drug, directory, first/last seen (month/year), duration in days, patients, cost p.a. Dynamic height (28px per bar). Uses `_base_layout()` + `_smart_legend()` + `_smart_legend_margin()`.
- **TAB_DEFINITIONS**: Added `("timeline", "Timeline")` — now 8 tabs: Icicle, Sankey, Heatmap, Funnel, Depth, Scatter, Network, Timeline.
- **`_render_timeline()`** in `chart.py`: Standard render helper with directory/trust filter extraction and error handling.
- **Dispatch case**: Added `elif active_tab == "timeline"` in `update_chart()`.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): 59 data points across 12 directories. Date x-axis with 6-month ticks. Bars span 2019–2025. Newest drug DOCETAXEL (BREAST SURGERY) starts May 2025. Single-directory mode (RHEUMATOLOGY): 16 drugs, y-labels without directory suffix.
- Tier 3 (Functional): Directory filter works (RHEUMATOLOGY: 16 drugs). Trust filter works. Empty data returns empty figure. Tab switching wired via dynamic `_TAB_IDS`. 8 tabs visible.
### Files changed:
- `src/data_processing/pathway_queries.py` — added `get_drug_timeline()`
- `dash_app/data/queries.py` — added import + thin wrapper
- `src/visualization/plotly_generator.py` — added `create_drug_timeline_figure()`
- `dash_app/components/chart_card.py` — added timeline to TAB_DEFINITIONS
- `dash_app/callbacks/chart.py` — added `_render_timeline()` + dispatch case
- `IMPLEMENTATION_PLAN.md` — marked D.3 subtasks [x]
### Committed: 0a14f1f "feat: drug timeline Gantt chart tab (Task D.3)"
### Patterns discovered:
- Plotly `go.Bar` Gantt trick: set `base` to start datetime, `x` to duration in milliseconds (days × 86,400,000), `orientation="h"`. Plotly auto-detects date axis type from the datetime base values.
- `datetime.fromisoformat()` handles the `T00:00:00` suffix in ISO timestamps from SQLite without issue.
- Single-directory detection (`len(directories) == 1`) lets us simplify y-labels to just drug names, avoiding redundant "(RHEUMATOLOGY)" suffix when the user already filtered to that directory.
- With 59 bars, 12 directories → `_smart_legend()` uses horizontal mode (≤15 items), which works well since directory names aren't too long.
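
The Gantt arithmetic can be sketched with the standard library alone. The helper name and the sample dates are hypothetical; the returned dict only mirrors the keyword arguments (`base`, `x`, `y`, `orientation`) that would be handed to `go.Bar`.

```python
from datetime import datetime

MS_PER_DAY = 86_400_000  # 24 h * 60 min * 60 s * 1000 ms

def gantt_bar_kwargs(label, first_seen, last_seen):
    """Build go.Bar-style keyword arguments for one horizontal Gantt bar."""
    # fromisoformat accepts the 'T00:00:00' suffix found in the stored timestamps.
    start = datetime.fromisoformat(first_seen)
    end = datetime.fromisoformat(last_seen)
    duration_days = (end - start).days
    return {
        "y": [label],
        "base": [start],                       # bar starts at this datetime
        "x": [duration_days * MS_PER_DAY],     # bar length in milliseconds
        "orientation": "h",
    }

bar = gantt_bar_kwargs("DRUG_A", "2019-04-01T00:00:00", "2019-04-11T00:00:00")
print(bar)
```

The length has to be in milliseconds because that is the unit a Plotly date axis uses for numeric offsets.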
### Next iteration should:
- Choose between D.1 (Temporal trends), D.2 (Dose distribution), or D.4 (NICE TA compliance).
- **D.1** is problematic: requires modifying `schema.py` (guardrail protected), `reference_data.py` (guardrail protected), and `refresh_pathways.py` (guardrail protected). The plan allows it as an exception, but it also needs ≥2 refresh cycles for meaningful data. Consider marking D.1 as [B] (blocked on pipeline changes being out of scope).
- **D.2** (Dose distribution): Requires parsing `average_administered` JSON from pathway_nodes. Check if the data exists and is parseable first — run `SELECT average_administered FROM pathway_nodes WHERE average_administered IS NOT NULL AND average_administered != '' LIMIT 5` to inspect the format.
- **D.4** (NICE TA compliance): Requires parsing `data/ta-recommendations.xlsx` — check if this file exists and what it contains. This is also substantial (schema + migration + compliance scoring).
- Recommendation: Try D.2 next if `average_administered` data is available and parseable. If the JSON format is too complex or data is sparse, mark D.2 as [B] and assess D.4.
### Blocked items:
- D.1: Likely blocked — requires modifying guardrail-protected files (schema.py, reference_data.py, refresh_pathways.py) + needs multiple refresh cycles for meaningful data.
## Iteration 14 — 2026-02-07
### Task: D.2 — Average administered doses analysis
### Why this task:
- D.2 was explicitly recommended by Iteration 13. The `average_administered` JSON data exists (2031 rows, simple array format) and requires no schema changes. D.1 is blocked (guardrail-protected files). D.4 is complex (schema + migration + Excel parsing). D.2 was the cleanest ready task.
### Status: COMPLETE
### What was done:
- **`get_dosing_distribution()`** in `pathway_queries.py`: Queries level 3 nodes with `average_administered` JSON, parses position 0 (average dose count for the drug), aggregates across trusts using weighted averages by patient count. Supports directory/trust filters. Returns `[{drug, directory, avg_doses, patients}]`.
- **Thin wrapper** in `dash_app/data/queries.py`: Standard import + DB_PATH delegation.
- **`create_dosing_distribution_figure()`** in `plotly_generator.py`: Horizontal bar chart (`go.Bar` with `orientation="h"`) showing average administered doses per drug. One trace per bar with legend grouped by directory. Colors from `DRUG_PALETTE`. Dynamic height (24px per bar). `_base_layout()` + `_smart_legend()`. Hover shows drug, directory, avg doses, patients.
- **TAB_DEFINITIONS**: Added `("doses", "Doses")` — now 9 tabs: Icicle, Sankey, Heatmap, Funnel, Depth, Scatter, Network, Timeline, Doses.
- **`_render_doses()`** in `chart.py`: Standard render helper with directory/trust filter extraction and error handling.
- **Dispatch case**: Added `elif active_tab == "doses"` in `update_chart()`.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `uv run python run_dash.py` starts cleanly, HTTP 200.
- Tier 2 (Visual): 59 data points across 12 directories. Top: TOCILIZUMAB (RHEUMATOLOGY) avg 70.5 doses, INFLIXIMAB (OPHTHALMOLOGY) 47.7, EVOLOCUMAB (CHEMICAL PATHOLOGY) 46.6. Dynamic height 1536px for all, 504px for single directory.
- Tier 3 (Functional): Directory filter works (RHEUMATOLOGY: 16 drugs). Empty data returns empty figure. Tab switching wired via dynamic `_TAB_IDS`. 9 tabs visible.
### Files changed:
- `src/data_processing/pathway_queries.py` — added `get_dosing_distribution()`
- `dash_app/data/queries.py` — added import + thin wrapper
- `src/visualization/plotly_generator.py` — added `create_dosing_distribution_figure()`
- `dash_app/components/chart_card.py` — added doses to TAB_DEFINITIONS
- `dash_app/callbacks/chart.py` — added `_render_doses()` + dispatch case
- `IMPLEMENTATION_PLAN.md` — marked D.2 subtasks [x]
### Committed: c7e9398 "feat: average administered doses chart tab (Task D.2)"
### Patterns discovered:
- The `average_administered` column is a JSON array of floats (with NaN as string). Position 0 = average doses for the drug at that node. Level 3 nodes have only position 0, level 4 has positions 0+1, etc.
- `json.loads(s.replace("NaN", "null"))` safely handles NaN values in the JSON. Returns None for NaN positions.
- No need for a separate parsing function in `parsing.py` — the JSON parsing is simple enough to inline in the query function (3 lines).
- With 59 bars (one per drug×directory), the chart is readable with 24px per bar and legend grouped by directory.
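
The NaN-safe parse and the patient-weighted average can be sketched as follows. The function names and the sample rows are illustrative, and the sketch assumes `NaN` appears as a bare token in the stored array, which is what the `replace` call targets.

```python
import json

def first_dose_average(raw):
    """Return position 0 of the JSON array, or None when missing or NaN."""
    if not raw:
        return None
    values = json.loads(raw.replace("NaN", "null"))  # NaN tokens become None
    return values[0] if values else None

def weighted_average_doses(rows):
    """Patient-weighted mean of position-0 averages, skipping unusable rows."""
    weighted_sum = 0.0
    patients_total = 0
    for row in rows:
        avg = first_dose_average(row["average_administered"])
        if avg is None:
            continue
        weighted_sum += avg * row["patients"]
        patients_total += row["patients"]
    return weighted_sum / patients_total if patients_total else None

# Made-up rows for one drug across three trusts.
rows = [
    {"average_administered": "[10.0]", "patients": 30},
    {"average_administered": "[20.0]", "patients": 10},
    {"average_administered": "[NaN]", "patients": 5},
]
result = weighted_average_doses(rows)
print(result)
```

Rows whose first position is NaN are left out of both the numerator and the denominator, so they cannot drag the average towards zero.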
### Next iteration should:
- Two Phase D tasks remain: D.1 (Temporal trends) and D.4 (NICE TA compliance).
- **D.1** is very likely blocked: requires modifying `schema.py`, `reference_data.py`, and `refresh_pathways.py` — all in the guardrails DO NOT MODIFY list. Also needs multiple refresh cycles. Recommend marking D.1 as [B] (blocked).
- **D.4** (NICE TA compliance): Check if `data/ta-recommendations.xlsx` exists and inspect its contents. This requires: (1) parsing the Excel into a reference table, (2) creating schema + migration (guardrail-protected), (3) compliance scoring algorithm, (4) figure function, (5) tab/callback wiring. Steps 1-2 are also likely blocked by guardrails.
- If both D.1 and D.4 are blocked, the project may be effectively complete for the current scope. Mark them [B] and check completion criteria.
- Alternatively: assess whether D.4 can be done without schema changes by keeping TA data in memory (load Excel directly from disk, no migration needed).
### Blocked items:
- D.1: Blocked — requires modifying guardrail-protected files + needs multiple refresh cycles.
- D.4: Potentially blocked — requires schema + migration for TA reference data (guardrail-protected files).
## Iteration 15 — 2026-02-07
### Task: D.1 + D.4 — Final assessment and project completion
### Why this task:
- Only D.1 and D.4 remain. Both were flagged as potentially blocked in Iteration 14. This iteration assesses their feasibility and makes a final determination.
### Status: COMPLETE (project finalization)
### What was done:
- **D.1 (Temporal trends)**: Confirmed BLOCKED. Requires modifying 3 guardrail-protected files (`schema.py`, `reference_data.py`, `refresh_pathways.py`) + needs ≥2 data refresh cycles to show meaningful trend data. Marked `[B]` in IMPLEMENTATION_PLAN.md.
- **D.4 (NICE TA compliance)**: Confirmed BLOCKED. `data/ta-recommendations.xlsx` does not exist — the source data file is missing entirely. Even if it existed, would require schema + migration changes to guardrail-protected files. Marked `[B]` in IMPLEMENTATION_PLAN.md.
- **Completion criteria**: Updated all achievable criteria to `[x]`. Phase D blocked items marked `[B]` with explanations.
- **Final validation**: `from dash_app.app import app` OK. `python run_dash.py` starts cleanly on http://127.0.0.1:8050/.
### Validation results:
- Tier 1 (Code): Import check passes, app starts without errors
### Files changed:
- `IMPLEMENTATION_PLAN.md` — marked D.1 and D.4 as [B] (blocked), updated all completion criteria
- `progress.txt` — this entry
### Summary of all completed work (14 iterations):
- **Phase A (4 tasks)**: Shared constants + `_base_layout()`, heatmap fixes (colorscale + annotations), legend overflow fix (4 charts), trust color differentiation
- **Phase B (4 tasks)**: Title consistency (all 11 charts), cost effectiveness gradient, Sankey freeform, heatmap metric toggle
- **Phase C (4 tasks)**: Retention funnel, pathway depth distribution, duration vs cost scatter, drug network graph
- **Phase D (2 of 4 tasks)**: Drug timeline Gantt chart, average administered doses chart
- **Total**: 14 tasks completed, 2 blocked, 9 Patient Pathways tabs, 15+ chart functions, all using shared styling
### Blocked items:
- D.1: BLOCKED — guardrail-protected file modifications required + needs ≥2 refresh cycles
- D.4: BLOCKED — source data file (`ta-recommendations.xlsx`) missing + guardrail-protected file modifications required
## Manual Intervention — 2026-02-07
### Reason: Unblock D.1 with historical snapshots approach, remove D.4
### Changes made:
- `IMPLEMENTATION_PLAN.md` — rewrote D.1 as two subtasks (D.1a: CLI script, D.1b: Dash tab), removed D.4 entirely, updated completion criteria
- `guardrails.md` — updated DB guardrail exception for trends CLI script, added new guardrail about using existing pipeline functions as-is
- `progress.txt` — this entry
### Tasks reset: None (D.1 was already blocked `[B]`, now unblocked as `[ ]`)
### Tasks added: None (D.1 rewritten in-place as D.1a + D.1b)
### Tasks removed: D.4 (NICE TA compliance — source data doesn't exist, not viable)
### Context for next iteration:
- D.1a is the next task: create `cli/compute_trends.py` standalone CLI script
- The script imports `fetch_and_transform_data()` and `process_pathway_for_date_filter()` from `pathway_pipeline.py` — calls them, does NOT modify them
- Creates `pathway_trends` table via `CREATE TABLE IF NOT EXISTS` in pathways.db
- Loops over 6-month endpoints (2021-06-30 through 2025-12-31) using `all_6mo` date filter config with different `max_date` values
- For each endpoint: run pathway computation, extract level 3 node summaries (drug, directory, patients, cost, cost_pp_pa)
- The key insight: `compute_date_ranges(config, max_date)` and `process_pathway_for_date_filter(df, config, ..., max_date=...)` already accept `max_date` as a parameter — no pipeline changes needed
- This requires Snowflake access (fetches all activity data once, then filters locally for each period)
- After D.1a, D.1b adds the Dash tab using the standard 6-step pattern
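
The endpoint loop can be sketched with the standard library. The function name is hypothetical, and the sketch assumes every endpoint is the last day of its month, which matches the two dates given above.

```python
import calendar
from datetime import date

def period_endpoints(start, end, interval_months=6):
    """Month-end dates from start to end inclusive, stepping by interval_months."""
    endpoints = []
    year, month = start.year, start.month
    while True:
        last_day = calendar.monthrange(year, month)[1]
        endpoint = date(year, month, last_day)
        if endpoint > end:
            break
        endpoints.append(endpoint)
        # Advance by the interval and roll the month over into the year.
        month += interval_months
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
    return endpoints

endpoints = period_endpoints(date(2021, 6, 30), date(2025, 12, 31))
for endpoint in endpoints:
    print(endpoint.isoformat())
```

Each endpoint would then be passed as `max_date` to the existing pipeline functions, so the activity data is fetched once and filtered per period.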
### New guardrails added:
- "Trend computation uses existing pipeline functions as-is" — import and call, don't modify
## Iteration 16 — 2026-02-07
### Task: D.1 — Temporal trend analysis (D.1a + D.1b combined)
### Why this task:
- D.1 is the only remaining task. D.1a (CLI script) and D.1b (Dash tab) are both needed to complete D.1. Manual intervention unblocked D.1 by rewriting it as a standalone CLI that imports existing pipeline functions without modifying them.
### Status: COMPLETE
### What was done:
- **D.1a — `cli/compute_trends.py`**: Standalone CLI script. Imports `fetch_and_transform_data()` and `process_pathway_for_date_filter()` from `pathway_pipeline.py`. Creates `pathway_trends` table via `CREATE TABLE IF NOT EXISTS`. Generates period endpoints from 2021-06-30 to 2025-12-31 at configurable intervals (default 6 months). For each endpoint: runs pathway computation with `max_date=endpoint`, extracts level 3 node summaries (drug, directory, patients, total_cost, cost_pp_pa). Supports `--dry-run`, `--start/--end`, `--interval`, `--verbose`.
- **D.1b — Trends tab (6-step pattern)**:
1. `get_trend_data()` in `pathway_queries.py` — checks table existence first, aggregates by drug (or directory), supports directory/drug filters, handles cost_pp_pa as weighted average
2. Thin wrapper in `dash_app/data/queries.py`
3. `create_trend_figure()` in `plotly_generator.py` — line chart with `go.Scatter` (lines+markers), one trace per drug/directory, `_base_layout()` + `_smart_legend()`, empty state shows "Run python -m cli.compute_trends" message
4. Added "Trends" tab to `TAB_DEFINITIONS` (10th tab: Icicle, Sankey, Heatmap, Funnel, Depth, Scatter, Network, Timeline, Doses, Trends)
5. Added `_render_trends()` helper + dispatch case. Trends tab handles empty data independently of chart-data store.
6. Added `dmc.SegmentedControl` metric toggle (patients/cost/cost_pp_pa) in chart card header, visible only when trends tab active
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly on http://127.0.0.1:8050/. `python -m cli.compute_trends --help` works.
- Tier 2 (Visual): Empty data figure shows "No trend data available. Run python -m cli.compute_trends to generate." Sample data (2 drugs × 2 periods) renders 2 traces with correct names and Patients y-axis label.
- Tier 3 (Functional): Query returns [] when pathway_trends table doesn't exist (graceful empty state). Tab switching wired via dynamic `_TAB_IDS`. Metric toggle shows/hides based on active tab. 10 tabs visible.
### Files changed:
- `src/cli/compute_trends.py` — NEW: standalone CLI script for computing historical trend snapshots
- `src/data_processing/pathway_queries.py` — added `get_trend_data()`
- `dash_app/data/queries.py` — added import + thin wrapper
- `src/visualization/plotly_generator.py` — added `create_trend_figure()`
- `dash_app/components/chart_card.py` — added trends to TAB_DEFINITIONS + metric toggle
- `dash_app/callbacks/chart.py` — added `_render_trends()` + dispatch case + trends metric toggle I/O
- `IMPLEMENTATION_PLAN.md` — marked D.1a, D.1b, and completion criteria [x]
### Committed: d0404aa "feat: temporal trends CLI script + Dash tab (Task D.1)"
### Patterns discovered:
- The trends tab is unique: it doesn't depend on the `chart-data` store (which holds pathway_nodes). It queries `pathway_trends` directly. The dispatch case checks for "trends" before the "no nodes" check.
- `sqlite3` table existence check via `SELECT name FROM sqlite_master WHERE type='table' AND name='pathway_trends'` is the clean way to handle the table-doesn't-exist case.
- Adding a new Output/Input to an existing callback requires updating ALL return paths (4 return sites in update_chart).
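
The table-existence guard can be exercised against an in-memory database. The helper names and the three-column table are assumptions for the demo; only the `sqlite_master` query comes from the entry above.

```python
import sqlite3

def table_exists(conn, table_name):
    """True when a table with this name is listed in sqlite_master."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    ).fetchone()
    return row is not None

def get_trend_rows(conn):
    """Return all trend rows, or an empty list when the table is missing."""
    if not table_exists(conn, "pathway_trends"):
        return []
    return conn.execute("SELECT * FROM pathway_trends").fetchall()

conn = sqlite3.connect(":memory:")
before = get_trend_rows(conn)  # table not created yet

# Hypothetical minimal schema, for the demo only.
conn.execute(
    "CREATE TABLE IF NOT EXISTS pathway_trends "
    "(period_end TEXT, drug TEXT, patients INTEGER)"
)
conn.execute("INSERT INTO pathway_trends VALUES ('2024-06-30', 'DRUG_A', 10)")
after = get_trend_rows(conn)
print(before, after)
```

Returning `[]` rather than raising lets the figure function show its "no trend data" message without a separate error path.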
### Next iteration should:
- See Manual Intervention below — Phase E tasks added.
### Blocked items:
- None — all tasks complete
## Manual Intervention — 2026-02-07
### Reason: Redesign temporal trends as standalone view + fix chart height
### Changes made:
- `IMPLEMENTATION_PLAN.md` — added Phase E with 5 tasks (E.1–E.5), updated "What Changes" section, added Phase E completion criteria
- `guardrails.md` — added guardrails for 3-view navigation and Trends view state
- `progress.txt` — this entry
### Tasks reset: None (all Phase A–D tasks remain complete)
### Tasks added:
- E.1: Remove Trends tab from Patient Pathways
- E.2: Add Trends sidebar nav item + view container (3rd top-level view)
- E.3: Create Trends landing page — directorate-level overview chart with metric toggle
- E.4: Add drug drill-down within Trends view (click directorate → drug-level trends)
- E.5: Fix chart height to fill viewport + rename "Cost" to "Cost per Patient"
### Context for next iteration:
- Start with E.1 (remove Trends from Patient Pathways) — this is a cleanup task that simplifies the codebase before adding the new view
- E.1 involves removing the trends tab from TAB_DEFINITIONS, removing the trends-metric-wrapper/toggle from chart_card.py, removing _render_trends() and its dispatch case from chart.py, and cleaning up the update_chart() callback signature (remove trends Output/Input). CRITICAL: update ALL return paths in update_chart() when removing the trends toggle style output.
- After E.1, E.2 adds the 3rd sidebar item and empty view container. Key files: sidebar.py (add icon + nav item), app.py (add trends-view div), navigation.py (3-way switch_view), filters.py (add nav-trends Input)
- E.3 creates the new Trends view components and callbacks. The existing `get_trend_data()` in pathway_queries.py needs a `group_by` parameter added. `create_trend_figure()` in plotly_generator.py is reused as-is.
- E.4 adds drill-down using the same landing/detail toggle pattern as Trust Comparison (selected_trends_directorate in app-state)
- E.5 fixes chart height by removing fixed height values and relying on CSS flex + responsive=True
- The existing `get_trend_data()` query already supports directory filter and drug filter. For directorate-level grouping, add a `group_by="directory"` parameter that changes the SQL GROUP BY from drug to directory.
- Keep `create_trend_figure()` — it already handles any number of named series
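
One way the `group_by` switch could look is sketched below against an in-memory table. The column names, the `_sketch` function name, and the sample rows are assumptions; the grouping column is taken from a fixed whitelist so that no caller-supplied text reaches the SQL string.

```python
import sqlite3

# Whitelist of allowed grouping columns (hypothetical column names).
_GROUP_COLUMNS = {"drug": "drug", "directory": "directory"}

def get_trend_data_sketch(conn, group_by="drug"):
    """Aggregate trend rows per period by drug or by directory."""
    if group_by not in _GROUP_COLUMNS:
        raise ValueError(f"unsupported group_by: {group_by!r}")
    column = _GROUP_COLUMNS[group_by]
    sql = f"""
        SELECT period_end,
               {column} AS name,
               SUM(patients) AS patients,
               SUM(total_cost) AS total_cost,
               SUM(cost_pp_pa * patients) * 1.0 / SUM(patients) AS cost_pp_pa
        FROM pathway_trends
        GROUP BY period_end, {column}
        ORDER BY period_end, name
    """
    keys = ("period_end", "name", "patients", "total_cost", "cost_pp_pa")
    return [dict(zip(keys, row)) for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pathway_trends (period_end TEXT, drug TEXT, directory TEXT, "
    "patients INTEGER, total_cost REAL, cost_pp_pa REAL)"
)
conn.executemany(
    "INSERT INTO pathway_trends VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2024-06-30", "DRUG_A", "DIR_X", 10, 1000.0, 100.0),
        ("2024-06-30", "DRUG_B", "DIR_X", 30, 6000.0, 200.0),
        ("2024-06-30", "DRUG_C", "DIR_Y", 5, 500.0, 100.0),
    ],
)
by_directory = get_trend_data_sketch(conn, group_by="directory")
by_drug = get_trend_data_sketch(conn, group_by="drug")
print(by_directory)
```

Both groupings return the same row shape with a generic `name` key, which is what allows one figure function to plot either level.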
### New guardrails added:
- "3-view navigation pattern" — when modifying switch_view or update_app_state, update ALL outputs/return paths for 3 views
- "Trends view state in app-state" — selected_trends_directorate must be in app-state initial data
## Iteration 17 — 2026-02-07
### Task: E.1 — Remove Trends tab from Patient Pathways
### Why this task:
- E.1 is the first Phase E task. Progress.txt explicitly recommended it. It's a cleanup prerequisite: the Trends tab must be removed from Patient Pathways before re-adding it as a standalone 3rd view in E.2–E.4.
### Status: COMPLETE
### What was done:
- **chart_card.py**: Removed `("trends", "Trends")` from `TAB_DEFINITIONS` (now 9 tabs: Icicle through Doses). Removed `trends-metric-wrapper` div and `trends-metric-toggle` SegmentedControl from chart card header.
- **chart.py**: Removed `_render_trends()` helper function (~17 lines). Removed `Output("trends-metric-wrapper", "style")` and `Input("trends-metric-toggle", "value")` from `update_chart()` callback. Removed `trends_metric` parameter. Removed `trends_toggle_style` variable and the entire `if active_tab == "trends"` dispatch block. Updated ALL 4 return paths from 4-tuple to 3-tuple (figure, subtitle, heatmap_toggle_style).
- **queries.py**: Removed `get_trend_data` import from the import block and removed the `get_trend_data()` thin wrapper function.
- **Preserved**: `get_trend_data()` in `pathway_queries.py` and `create_trend_figure()` in `plotly_generator.py` — both still needed for the new Trends view.
### Validation results:
- Tier 1 (Code): `from dash_app.app import app` OK. `python run_dash.py` starts cleanly on http://127.0.0.1:8050/.
- Tier 2 (Visual): 9 tabs visible (Icicle, Sankey, Heatmap, Funnel, Depth, Scatter, Network, Timeline, Doses). No Trends tab.
- Tier 3 (Functional): Callback signature now has 3 Outputs, 4 Inputs (was 4 Outputs, 5 Inputs). All 4 return paths updated correctly.
### Files changed:
- `dash_app/components/chart_card.py` — removed trends tab + metric toggle
- `dash_app/callbacks/chart.py` — removed _render_trends + trends dispatch + trends I/O
- `dash_app/data/queries.py` — removed get_trend_data import + wrapper
- `IMPLEMENTATION_PLAN.md` — marked E.1 subtasks [x]
### Committed: d052d2b "refactor: remove Trends tab from Patient Pathways (Task E.1)"
### Patterns discovered:
- Removing a callback Output/Input required updating exactly 4 return paths: (1) no chart_data, (2) error_msg, (3) no nodes, (4) final return with fig. Guardrail about counting return paths before/after was essential.
- The `dmc` import in chart_card.py is still needed for the heatmap metric toggle — only the trends toggle was removed.
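
The four return paths can be sketched without Dash to show why they must change together. This is an illustration, not the code in `chart.py`: the shape of `chart_data` (an `error_msg` key and a `nodes` key), the helper `empty_figure`, and the name `update_chart_sketch` are assumptions, and plain dicts stand in for `go.Figure`. Every path returns the same 3-tuple (figure, subtitle, heatmap_toggle_style).

```python
HIDDEN = {"display": "none"}
VISIBLE = {"display": "block"}


def empty_figure(message):
    # Stand-in for a placeholder figure; a dict keeps the sketch dependency-free.
    return {"data": [], "layout": {"title": message}}


def update_chart_sketch(chart_data, active_tab):
    """Return (figure, subtitle, heatmap_toggle_style) on every path."""
    toggle_style = VISIBLE if active_tab == "heatmap" else HIDDEN

    # Path 1: no chart_data yet
    if not chart_data:
        return empty_figure("No data loaded"), "", toggle_style

    # Path 2: upstream reported an error
    if chart_data.get("error_msg"):
        return empty_figure(chart_data["error_msg"]), "", toggle_style

    # Path 3: data loaded but no nodes
    nodes = chart_data.get("nodes")
    if not nodes:
        return empty_figure("No pathways match"), "", toggle_style

    # Path 4: final return with a real figure
    fig = {"data": nodes, "layout": {"title": active_tab}}
    return fig, f"{len(nodes)} nodes", toggle_style
```

Dash raises an error when a callback returns a tuple whose length differs from its number of Outputs, so a missed path fails only when that branch is actually hit at runtime.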
### Next iteration should:
- Do Task E.2: Add Trends sidebar nav item + view container. Key files to read:
1. `dash_app/components/sidebar.py` — add "trends" icon + 3rd nav item
2. `dash_app/app.py` — add `trends-view` div to layout, add `selected_trends_directorate` to app-state initial data
3. `dash_app/callbacks/navigation.py` — update `switch_view()` for 3 views (6 outputs: 3 view styles + 3 nav classNames)
4. `dash_app/callbacks/filters.py` — add `Input("nav-trends", "n_clicks")` to `update_app_state()`
- CRITICAL: The `switch_view()` callback must handle 3 views with 6 Outputs. Read the existing 2-view implementation first to understand the pattern, then extend to 3.
- CRITICAL: `update_app_state()` has multiple return paths — ensure ALL handle the new "trends" active_view case.
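
One possible shape for the 3-way `switch_view()` logic, written as a pure function so it can be checked without running the app. The view names other than `trends`, the CSS class names, the display values, and the function name `switch_view_sketch` are assumptions; the existing 2-view implementation should be read first and may use different values. Deriving all 6 outputs from a single tuple of views keeps the styles and classNames in the same order and the same length.

```python
# Order must match the order of the callback Outputs:
# 3 view styles first, then 3 nav classNames.
VIEWS = ("patient-pathways", "trust-comparison", "trends")  # first two names assumed


def switch_view_sketch(app_state):
    """Return 6 values: a style dict per view, then a className per nav item."""
    active = (app_state or {}).get("active_view", VIEWS[0])
    if active not in VIEWS:
        active = VIEWS[0]  # unknown view falls back to the first one

    styles = tuple(
        {"display": "flex"} if view == active else {"display": "none"}
        for view in VIEWS
    )
    class_names = tuple(
        "nav-item active" if view == active else "nav-item"
        for view in VIEWS
    )
    return styles + class_names
```

Adding a fourth view later would then mean extending `VIEWS` and the Output list, with no new branching.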
### Blocked items:
- None