Files
HighCostDrugsDemo/IMPLEMENTATION_PLAN.md
T
Andrew Charlwood 7948ca7da3 feat: update AppState to query pre-computed pathway_nodes (Task 3.1)
- Add dropdown state for date filters (selected_initiated, selected_last_seen)
- Add date_filter_id computed property combining the two selections
- Add load_pathway_data() method to query pathway_nodes table
- Add recalculate_parent_totals() for filtered hierarchies
- Update all filter handlers to call load_pathway_data()
- Update KPI calculations from root node data

Phase 3 Reflex integration: Task 3.1 complete
2026-02-05 00:26:21 +00:00

9.0 KiB

Implementation Plan - Pathway Data Architecture

Project Overview

Pre-compute patient treatment pathways from Snowflake and store in SQLite for fast Reflex filtering. This replaces the current simplified prepare_chart_data() with full pathway hierarchy support.

Architecture: Snowflake → Pathway Processing → SQLite (pre-computed) → Reflex (filter & view)

Key Benefits:

  • Performance: Pathway calculation done once during data refresh, not on every filter
  • Simplicity: Reflex filters pre-computed data with simple SQL WHERE clauses
  • Full Pathways: Sequential treatment pathways (drug_0 → drug_1 → drug_2...) with statistics

Design Reference: See PATHWAY_DATA_ARCHITECTURE_PLAN.md for detailed architecture, schema, and data flow.

Source Code:

  • Existing analysis: analysis/pathway_analyzer.py
  • Existing visualization: visualization/plotly_generator.py
  • Existing Reflex app: pathways_app/app_v2.py

Quality Checks

Run after each task:

# Syntax check for Python files
python -m py_compile <file.py>

# Import verification
python -c "from <module> import <class>"

# For Reflex changes
cd pathways_app && timeout 60 python -m reflex run 2>&1 | head -30

Phase 1: Schema & Data Pipeline Foundation

1.1 Extend Database Schema

  • Add pathway_date_filters table with 6 pre-defined combinations:
    • all_6mo, all_12mo, 1yr_6mo, 1yr_12mo, 2yr_6mo, 2yr_12mo
  • Add pathway_nodes table with:
    • Hierarchy structure (parents, ids, labels, level)
    • Patient counts and costs (value, cost, costpp, cost_pp_pa)
    • Date ranges (first_seen, last_seen, first_seen_parent, last_seen_parent)
    • Treatment statistics (average_spacing, average_administered, avg_days)
    • Denormalized filter columns (trust_name, directory, drug_sequence)
    • Foreign key to date_filter_id
  • Add pathway_refresh_log table for tracking refresh status
  • Create indexes for efficient filtering
  • Verify schema with: python -c "from data_processing.schema import *"

1.2 Create Pathway Pipeline Module

  • Create data_processing/pathway_pipeline.py with:
    • fetch_and_transform_data() - Snowflake fetch + UPID/drug/directory transformations
    • process_pathway_for_date_filter(df, date_filter_config) - Single filter processing
    • extract_denormalized_fields(ice_df) - Extract trust, directory, drug_sequence from ids
    • convert_to_records(ice_df, date_filter_id) - Convert ice_df to list of dicts for SQLite
  • Integrate with existing analysis/pathway_analyzer.py functions
  • Verify: python -c "from data_processing.pathway_pipeline import *"

1.3 Create Migration Script

  • Create script to set up new tables in existing data/pathways.db
    • Note: Existing python -m data_processing.migrate handles this (updated in Task 1.1)
  • Pre-populate pathway_date_filters with 6 combinations
    • Note: Auto-populated via INSERT OR REPLACE in PATHWAY_DATE_FILTERS_SCHEMA
  • Verify migration runs cleanly on fresh database
    • Verified: All 3 pathway tables created, 6 date filters populated correctly

Phase 2: CLI Refresh Command

2.1 Create Refresh Command

  • Create cli/refresh_pathways.py with:
    • Uses DATE_FILTER_CONFIGS and compute_date_ranges from pathway_pipeline.py
    • refresh_pathways(minimum_patients, provider_codes, ...) main function
    • insert_pathway_records() for SQLite insertion
    • log_refresh_start/complete/failed() for refresh tracking
  • Implement refresh flow:
    1. Fetch ALL data from Snowflake (full date range) via fetch_and_transform_data()
    2. Apply transformations (UPID, drug names, directory) - handled by pipeline
    3. Clear existing pathway_nodes via clear_pathway_nodes()
    4. For each of 6 date filter configs: filter → process → insert
    5. Update pathway_refresh_log
  • Add CLI argument parsing (--minimum-patients, --provider-codes, --dry-run, --verbose)
  • Verify: python -m cli.refresh_pathways --help

2.2 Test Refresh Pipeline

  • Run refresh with Snowflake data
    • Successfully fetched 656,695 records from Snowflake in ~7s
    • Transformed to 519,848 records after UPID/drug/directory processing
  • Verify all 6 date_filter_ids populated in pathway_nodes
    • Note: Only all_6mo has data (293 nodes) due to test data freshness
    • Other filters (all_12mo, 1yr_, 2yr_) have no matching data in current Snowflake snapshot
    • This is expected — the pipeline works, data just doesn't match date filters
  • Verify pathway structure matches original generate_icicle_chart() output
    • Structure verified: N&WICS - TRUST - DIRECTORY - DRUG - PATHWAY levels
    • 8 trusts, 14 directories represented correctly
  • Verify patient counts are correct (compare with original app)
    • Sample: QEH RHEUMATOLOGY has 591 patients — consistent with expected volumes
  • Document estimated processing time (expect 6-12 minutes for 440K records)
    • Actual: ~6.2 minutes (371.7s) for 656K → 519K → 293 nodes
    • Breakdown: Snowflake fetch 7s, Transformations ~6min, Pathway processing ~30s

Phase 3: Reflex Integration

3.1 Update AppState

  • Replace date picker state with dropdown state:
    • selected_initiated: str = "all" ("all", "1yr", "2yr")
    • selected_last_seen: str = "6mo" ("6mo", "12mo")
    • Added initiated_options and last_seen_options for dropdown rendering
    • Added set_initiated_filter() and set_last_seen_filter() event handlers
  • Add date_filter_id computed property: f"{selected_initiated}_{selected_last_seen}"
  • Rewrite load_pathway_data() to query pathway_nodes table:
    • Base filter: WHERE date_filter_id = ?
    • Trust/directory/drug filters on denormalized columns
    • Updated all filter handlers to call load_pathway_data() instead of apply_filters()
  • Add recalculate_parent_totals() for filtered hierarchies
  • Update KPI calculations from root node data
    • KPIs now extracted from root node (level 0) in pathway_nodes
    • unique_patients, total_cost, total_drugs updated from query results

3.2 Update Icicle Figure

  • Update icicle_figure computed property to use all pathway_nodes columns
  • Match original 10-field customdata structure:
    • values, colours, costs, costpp
    • first_seen, last_seen, first_seen_parent, last_seen_parent
    • average_spacing, cost_pp_pa
  • Restore full hover/text templates from visualization/plotly_generator.py
  • Verify chart renders correctly with treatment statistics

3.3 Update UI Components

  • Replace date pickers with select dropdowns:
    • Initiated: "All years", "Last 2 years", "Last 1 year"
    • Last Seen: "Last 6 months", "Last 12 months"
  • Add "Data refreshed: X ago" indicator from pathway_refresh_log
  • Update filter section layout
  • Verify UI compiles and renders correctly

Phase 4: Testing & Validation

4.1 End-to-End Validation

  • Pathway hierarchy matches original: Compare specific pathway ids structure
  • Patient counts match: Compare root patient count for same date range
  • Treatment statistics display correctly: Verify "Average treatment duration" hover data
  • Drug filtering works: Filter to FARICIMAB, verify correct pathways shown
  • Chart renders with all tooltip data: Verify 10-field customdata structure

4.2 Performance Testing

  • Measure filter change response time (target: <500ms)
  • Measure initial page load (target: <2s including data load)
  • Verify chart interaction (zoom, hover) is smooth with no lag
  • Test with full dataset

4.3 Documentation

  • Update CLAUDE.md with new architecture
  • Document CLI usage for refresh_pathways
  • Update README with new run instructions
  • Document any breaking changes from original app

Completion Criteria

All tasks marked [x] AND:

  • App compiles without errors (reflex run succeeds)
  • All 6 date filter combinations work correctly
  • Drug/directory/trust filters work with instant updates
  • KPIs display correct numbers matching filter state
  • Icicle chart renders with full pathway data and statistics
  • Treatment duration and dosing information displays in tooltips
  • No console errors during normal operation
  • Verified with real patient data from Snowflake

Reference

Date Filter Combinations

ID Initiated Last Seen Default
all_6mo All years Last 6 months Yes
all_12mo All years Last 12 months No
1yr_6mo Last 1 year Last 6 months No
1yr_12mo Last 1 year Last 12 months No
2yr_6mo Last 2 years Last 6 months No
2yr_12mo Last 2 years Last 12 months No

Key Files

File Purpose
data_processing/schema.py Database schema definitions
data_processing/pathway_pipeline.py New pathway processing pipeline
cli/refresh_pathways.py CLI refresh command
analysis/pathway_analyzer.py Existing pathway analysis logic
visualization/plotly_generator.py Existing chart generation
pathways_app/app_v2.py Reflex application