Implementation Plan - Pathway Data Architecture
Project Overview
Pre-compute patient treatment pathways from Snowflake and store in SQLite for fast Reflex filtering. This replaces the current simplified prepare_chart_data() with full pathway hierarchy support.
Architecture: Snowflake → Pathway Processing → SQLite (pre-computed) → Reflex (filter & view)
Key Benefits:
- Performance: Pathway calculation done once during data refresh, not on every filter
- Simplicity: Reflex filters pre-computed data with simple SQL WHERE clauses
- Full Pathways: Sequential treatment pathways (drug_0 → drug_1 → drug_2...) with statistics
Design Reference: See PATHWAY_DATA_ARCHITECTURE_PLAN.md for detailed architecture, schema, and data flow.
Source Code:
- Existing analysis: `analysis/pathway_analyzer.py`
- Existing visualization: `visualization/plotly_generator.py`
- Existing Reflex app: `pathways_app/app_v2.py`
Quality Checks
Run after each task:
```bash
# Syntax check for Python files
python -m py_compile <file.py>
# Import verification
python -c "from <module> import <class>"
# For Reflex changes
cd pathways_app && timeout 60 python -m reflex run 2>&1 | head -30
```
Phase 1: Schema & Data Pipeline Foundation
1.1 Extend Database Schema
- Add `pathway_date_filters` table with 6 pre-defined combinations: `all_6mo`, `all_12mo`, `1yr_6mo`, `1yr_12mo`, `2yr_6mo`, `2yr_12mo`
- Add `pathway_nodes` table with:
  - Hierarchy structure (parents, ids, labels, level)
  - Patient counts and costs (value, cost, costpp, cost_pp_pa)
  - Date ranges (first_seen, last_seen, first_seen_parent, last_seen_parent)
  - Treatment statistics (average_spacing, average_administered, avg_days)
  - Denormalized filter columns (trust_name, directory, drug_sequence)
  - Foreign key to `date_filter_id`
- Add `pathway_refresh_log` table for tracking refresh status
- Create indexes for efficient filtering
- Verify schema with: `python -c "from data_processing.schema import *"`
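For orientation, here is a minimal sketch of DDL for the three tables, written with the standard-library `sqlite3` module. Several details are assumptions for illustration: the column types, the `id`/`refresh_id` surrogate keys, the `is_default`, `started_at`, `completed_at`, `status` and `record_count` columns, and the index names. The authoritative definitions are whatever `data_processing/schema.py` contains.

```python
import sqlite3

# Hypothetical DDL: column names follow the plan; types, extra columns and index names are assumed.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS pathway_date_filters (
    date_filter_id TEXT PRIMARY KEY,        -- e.g. 'all_6mo'
    initiated      TEXT NOT NULL,           -- 'all', '1yr', '2yr'
    last_seen      TEXT NOT NULL,           -- '6mo', '12mo'
    is_default     INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE IF NOT EXISTS pathway_nodes (
    id                   INTEGER PRIMARY KEY AUTOINCREMENT,
    date_filter_id       TEXT NOT NULL REFERENCES pathway_date_filters(date_filter_id),
    -- hierarchy
    parents              TEXT,
    ids                  TEXT NOT NULL,
    labels               TEXT,
    level                INTEGER NOT NULL,
    -- patient counts and costs
    value                INTEGER,
    cost                 REAL,
    costpp               REAL,
    cost_pp_pa           REAL,
    -- date ranges
    first_seen           TEXT,
    last_seen            TEXT,
    first_seen_parent    TEXT,
    last_seen_parent     TEXT,
    -- treatment statistics
    average_spacing      TEXT,
    average_administered TEXT,
    avg_days             REAL,
    -- denormalized filter columns
    trust_name           TEXT,
    directory            TEXT,
    drug_sequence        TEXT
);

CREATE TABLE IF NOT EXISTS pathway_refresh_log (
    refresh_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at   TEXT NOT NULL,
    completed_at TEXT,
    status       TEXT NOT NULL,
    record_count INTEGER
);

-- Composite indexes lead with date_filter_id, so they also serve the base filter alone.
CREATE INDEX IF NOT EXISTS idx_nodes_filter_trust ON pathway_nodes(date_filter_id, trust_name);
CREATE INDEX IF NOT EXISTS idx_nodes_filter_directory ON pathway_nodes(date_filter_id, directory);
CREATE INDEX IF NOT EXISTS idx_nodes_filter_level ON pathway_nodes(date_filter_id, level);
"""


def create_pathway_tables(conn: sqlite3.Connection) -> None:
    """Create the pathway tables and indexes if they do not already exist."""
    conn.executescript(SCHEMA_SQL)
```

Two design notes apply to this sketch.

- SQLite only enforces the `REFERENCES` clause when `PRAGMA foreign_keys = ON` is set on the connection.
- Each composite index leads with `date_filter_id`, so a separate single-column index on `date_filter_id` would be redundant.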
1.2 Create Pathway Pipeline Module
- Create `data_processing/pathway_pipeline.py` with:
  - `fetch_and_transform_data()` - Snowflake fetch + UPID/drug/directory transformations
  - `process_pathway_for_date_filter(df, date_filter_config)` - Single filter processing
  - `extract_denormalized_fields(ice_df)` - Extract trust, directory, drug_sequence from ids
  - `convert_to_records(ice_df, date_filter_id)` - Convert ice_df to list of dicts for SQLite
- Integrate with existing `analysis/pathway_analyzer.py` functions
- Verify: `python -c "from data_processing.pathway_pipeline import *"`
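The denormalized columns can be derived from each node's `ids` string. The sketch below works on plain dicts (one per `ice_df` row) so it runs without pandas. A DataFrame-based `extract_denormalized_fields(ice_df)` could apply the same per-id logic across the `ids` column.

The sketch rests on two assumptions:

- Ids are `" - "`-joined paths laid out as root, trust, directory, then drugs in order.
- `drug_sequence` is stored `"|"`-joined.

Both separators and the helper name `parse_node_id` are hypothetical. Check them against how `analysis/pathway_analyzer.py` actually builds ids.

```python
from typing import Any, Dict, List, Optional

ID_SEPARATOR = " - "   # assumed separator between hierarchy levels in node ids
DRUG_SEPARATOR = "|"   # assumed storage format for drug_sequence


def parse_node_id(node_id: str) -> Dict[str, Optional[str]]:
    """Split a hierarchical node id into the denormalized filter columns.

    Assumed layout: level 0 = root, 1 = trust, 2 = directory, 3+ = drugs in order.
    Shallower nodes get None for the columns they do not determine.
    """
    parts = node_id.split(ID_SEPARATOR)
    drugs = parts[3:]
    return {
        "trust_name": parts[1] if len(parts) > 1 else None,
        "directory": parts[2] if len(parts) > 2 else None,
        "drug_sequence": DRUG_SEPARATOR.join(drugs) if drugs else None,
    }


def convert_to_records(rows: List[Dict[str, Any]], date_filter_id: str) -> List[Dict[str, Any]]:
    """Attach date_filter_id and the denormalized columns to each node row."""
    records = []
    for row in rows:
        record = dict(row)  # copy so the caller's rows are not mutated
        record.update(parse_node_id(row["ids"]))
        record["date_filter_id"] = date_filter_id
        records.append(record)
    return records
```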
1.3 Create Migration Script
- Create script to set up new tables in existing `data/pathways.db`
- Pre-populate `pathway_date_filters` with 6 combinations
- Verify migration runs cleanly on fresh database
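Here is a minimal, idempotent migration sketch. It assumes the `pathway_date_filters` columns used in this example (`date_filter_id`, `initiated`, `last_seen`, `is_default`), and the function name `migrate` is hypothetical. `CREATE TABLE IF NOT EXISTS` plus `INSERT OR IGNORE` means a re-run against an already-migrated database leaves the six rows unchanged. That property is also easy to check on a fresh database.

```python
import sqlite3

# The six combinations from the Reference table, in the sketch's assumed column order.
DATE_FILTER_ROWS = [
    ("all_6mo", "all", "6mo", 1),
    ("all_12mo", "all", "12mo", 0),
    ("1yr_6mo", "1yr", "6mo", 0),
    ("1yr_12mo", "1yr", "12mo", 0),
    ("2yr_6mo", "2yr", "6mo", 0),
    ("2yr_12mo", "2yr", "12mo", 0),
]


def migrate(db_path: str) -> int:
    """Create and seed pathway_date_filters; return the row count. Safe to re-run."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """CREATE TABLE IF NOT EXISTS pathway_date_filters (
                       date_filter_id TEXT PRIMARY KEY,
                       initiated      TEXT NOT NULL,
                       last_seen      TEXT NOT NULL,
                       is_default     INTEGER NOT NULL DEFAULT 0)"""
            )
            # INSERT OR IGNORE skips rows whose primary key already exists.
            conn.executemany(
                "INSERT OR IGNORE INTO pathway_date_filters VALUES (?, ?, ?, ?)",
                DATE_FILTER_ROWS,
            )
        return conn.execute("SELECT COUNT(*) FROM pathway_date_filters").fetchone()[0]
    finally:
        conn.close()
```

With this sketch's schema, `migrate("data/pathways.db")` should return `6` on both the first and the second run.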
Phase 2: CLI Refresh Command
2.1 Create Refresh Command
- Create `cli/refresh_pathways.py` with:
  - `DATE_FILTER_CONFIGS` constant (6 combinations)
  - `compute_date_ranges(config, max_date)` - Calculate actual dates from config
  - `refresh_pathways(minimum_patients, provider_codes, ...)` - Main function
- Implement refresh flow:
  - Fetch ALL data from Snowflake (full date range)
  - Apply transformations (UPID, drug names, directory)
  - Clear existing `pathway_nodes`
  - For each of 6 date filter configs: filter → process → insert
  - Update `pathway_refresh_log`
- Add CLI argument parsing (`--minimum-patients`, `--provider-codes`, etc.)
- Verify: `python -m cli.refresh_pathways --help`
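One way to encode the six combinations and the write side of the refresh flow is sketched below. It makes several assumptions:

- Each config stores a look-back in years for treatment initiation (`None` means all years) and a look-back in months for "last seen".
- Cut-offs are anchored on the latest date in the fetched data (`max_date`) and use calendar-month arithmetic.
- The filter-and-process step is passed in as a callable, since its real form depends on `analysis/pathway_analyzer.py`.

`rebuild_pathway_nodes` is a hypothetical name, and it inserts only four columns to stay short.

```python
import calendar
import sqlite3
from datetime import date, datetime, timezone
from typing import Any, Callable, Dict, List, Optional

# Hypothetical encoding of the six combinations (None = no initiation cut-off).
DATE_FILTER_CONFIGS: Dict[str, Dict[str, Optional[int]]] = {
    "all_6mo": {"initiated_years": None, "last_seen_months": 6},
    "all_12mo": {"initiated_years": None, "last_seen_months": 12},
    "1yr_6mo": {"initiated_years": 1, "last_seen_months": 6},
    "1yr_12mo": {"initiated_years": 1, "last_seen_months": 12},
    "2yr_6mo": {"initiated_years": 2, "last_seen_months": 6},
    "2yr_12mo": {"initiated_years": 2, "last_seen_months": 12},
}


def subtract_months(d: date, months: int) -> date:
    """Go back whole calendar months, clamping to the last valid day of the target month."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def compute_date_ranges(config: Dict[str, Optional[int]], max_date: date) -> Dict[str, Optional[date]]:
    """Turn a relative config into absolute cut-offs anchored on max_date."""
    years = config["initiated_years"]
    return {
        "initiated_from": None if years is None else subtract_months(max_date, 12 * years),
        "last_seen_from": subtract_months(max_date, config["last_seen_months"]),
        "max_date": max_date,
    }


# (data, date ranges) -> node records; stands in for filter + pathway processing.
Processor = Callable[[Any, Dict[str, Optional[date]]], List[Dict[str, Any]]]


def rebuild_pathway_nodes(conn: sqlite3.Connection, data: Any, max_date: date, process: Processor) -> int:
    """Clear pathway_nodes, insert all six filters, and log the run, in one transaction."""
    started = datetime.now(timezone.utc).isoformat()
    inserted = 0
    with conn:  # commit only if every step succeeds; otherwise roll back
        conn.execute("DELETE FROM pathway_nodes")
        for filter_id, config in DATE_FILTER_CONFIGS.items():
            records = process(data, compute_date_ranges(config, max_date))
            conn.executemany(
                "INSERT INTO pathway_nodes (date_filter_id, ids, parents, value) VALUES (?, ?, ?, ?)",
                [(filter_id, r["ids"], r["parents"], r["value"]) for r in records],
            )
            inserted += len(records)
        conn.execute(
            "INSERT INTO pathway_refresh_log (started_at, completed_at, status, record_count) "
            "VALUES (?, ?, ?, ?)",
            (started, datetime.now(timezone.utc).isoformat(), "success", inserted),
        )
    return inserted
```

The delete and all six inserts run in one transaction, which has two effects:

- A failed refresh rolls back to the previous data instead of leaving `pathway_nodes` partly empty.
- Other connections, such as the running Reflex app, keep reading the old rows until the commit.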
2.2 Test Refresh Pipeline
- Run refresh with Snowflake data
- Verify all 6 `date_filter_id`s are populated in `pathway_nodes`
- Verify pathway structure matches original `generate_icicle_chart()` output
- Verify patient counts are correct (compare with original app)
- Document estimated processing time (expect 6-12 minutes for 440K records)
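The coverage check and the root patient-count check can be scripted. This sketch assumes the root node of each hierarchy is stored with `level = 0`, and both helper names are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable

EXPECTED_FILTER_IDS = ("all_6mo", "all_12mo", "1yr_6mo", "1yr_12mo", "2yr_6mo", "2yr_12mo")


def check_filter_coverage(conn: sqlite3.Connection,
                          expected: Iterable[str] = EXPECTED_FILTER_IDS) -> Dict[str, int]:
    """Return node counts per date_filter_id; raise ValueError if any expected id has no rows."""
    counts = dict(conn.execute(
        "SELECT date_filter_id, COUNT(*) FROM pathway_nodes GROUP BY date_filter_id"
    ).fetchall())
    missing = [fid for fid in expected if counts.get(fid, 0) == 0]
    if missing:
        raise ValueError(f"No pathway_nodes rows for: {', '.join(missing)}")
    return counts


def root_patient_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Root-node patient count per filter, for comparison with the original app."""
    return dict(conn.execute(
        "SELECT date_filter_id, value FROM pathway_nodes WHERE level = 0"
    ).fetchall())
```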
Phase 3: Reflex Integration
3.1 Update AppState
- Replace date picker state with dropdown state:
  - `selected_initiated: str = "all"` ("all", "1yr", "2yr")
  - `selected_last_seen: str = "6mo"` ("6mo", "12mo")
- Add `date_filter_id` computed property: `f"{selected_initiated}_{selected_last_seen}"`
- Rewrite `load_pathway_data()` to query the `pathway_nodes` table:
  - Base filter: `WHERE date_filter_id = ?`
  - Trust/directory/drug filters on denormalized columns
- Add `recalculate_parent_totals()` for filtered hierarchies
- Update KPI calculations from root node data
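Below is a sketch of the query side of `load_pathway_data()` and one possible shape for `recalculate_parent_totals()`. It assumes `drug_sequence` is `"|"`-joined and that `level` increases with depth.

Rows matching a trust, directory, or drug filter do not include their ancestors. The root, for instance, has no `trust_name`. Those ancestor rows would need to be fetched separately (for example by following `parents`) before totals are recalculated. Ancestor-only nodes then get the sum of their retained children, while nodes that matched keep their stored counts.

Only `value` is recomputed here; costs would need the same treatment. The signatures are hypothetical.

```python
from typing import AbstractSet, Any, Dict, List, Optional, Sequence, Tuple


def build_pathway_query(
    date_filter_id: str,
    trusts: Optional[Sequence[str]] = None,
    directories: Optional[Sequence[str]] = None,
    drugs: Optional[Sequence[str]] = None,
) -> Tuple[str, List[str]]:
    """Build a parameterized SELECT on pathway_nodes; user values only travel as parameters."""
    sql = "SELECT ids, parents, labels, level, value FROM pathway_nodes WHERE date_filter_id = ?"
    params: List[str] = [date_filter_id]
    for column, selected in (("trust_name", trusts), ("directory", directories)):
        if selected:  # column names come from the fixed tuple above, never from input
            sql += f" AND {column} IN ({', '.join('?' * len(selected))})"
            params.extend(selected)
    if drugs:
        # Wrapping drug_sequence in '|' stops 'DRUG_A' from matching 'DRUG_AB'.
        sql += " AND (" + " OR ".join("('|' || drug_sequence || '|') LIKE ?" for _ in drugs) + ")"
        params.extend(f"%|{drug}|%" for drug in drugs)
    return sql, params


def recalculate_parent_totals(nodes: List[Dict[str, Any]],
                              matched_ids: AbstractSet[str]) -> List[Dict[str, Any]]:
    """Recompute `value` for ancestor-only nodes as the sum of their retained children.

    Nodes in matched_ids keep their stored value. Deeper levels are processed first,
    so each child's value is final before its parent is summed. Input is not mutated.
    """
    by_id = {n["ids"]: dict(n) for n in nodes}
    children: Dict[str, List[Dict[str, Any]]] = {}
    for node in by_id.values():
        children.setdefault(node["parents"], []).append(node)
    for node in sorted(by_id.values(), key=lambda n: n["level"], reverse=True):
        if node["ids"] not in matched_ids:
            node["value"] = sum(child["value"] for child in children.get(node["ids"], []))
    return list(by_id.values())
```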
3.2 Update Icicle Figure
- Update `icicle_figure` computed property to use all `pathway_nodes` columns
- Match original 10-field customdata structure:
  - values, colours, costs, costpp
  - first_seen, last_seen, first_seen_parent, last_seen_parent
  - average_spacing, cost_pp_pa
- Restore full hover/text templates from `visualization/plotly_generator.py`
- Verify chart renders correctly with treatment statistics
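Here is a sketch of assembling the 10-field `customdata` from `pathway_nodes` rows, in the order listed above. The row keys (`value`, `colour`, `cost`, ...) are assumed to mirror the column names. No colour column appears in the schema list, so where colours come from is left open.

The hover template is illustrative only. The real templates are the ones to restore from `visualization/plotly_generator.py`.

```python
from typing import Any, Dict, List

# Position i in this list is what %{customdata[i]} refers to in a Plotly hovertemplate.
CUSTOMDATA_FIELDS = [
    "value", "colour", "cost", "costpp",
    "first_seen", "last_seen", "first_seen_parent", "last_seen_parent",
    "average_spacing", "cost_pp_pa",
]


def build_customdata(nodes: List[Dict[str, Any]]) -> List[List[Any]]:
    """One 10-element row per node in CUSTOMDATA_FIELDS order; missing keys become None."""
    return [[node.get(field) for field in CUSTOMDATA_FIELDS] for node in nodes]


# Illustrative template only; <extra></extra> hides Plotly's secondary hover box.
HOVERTEMPLATE = (
    "<b>%{label}</b><br>"
    "Patients: %{customdata[0]}<br>"
    "Total cost: %{customdata[2]:,.0f}<br>"
    "Cost per patient: %{customdata[3]:,.0f}<br>"
    "First seen: %{customdata[4]}<br>"
    "Last seen: %{customdata[5]}<br>"
    "Average spacing: %{customdata[8]}<br>"
    "Cost per patient per annum: %{customdata[9]:,.0f}"
    "<extra></extra>"
)
```

Both pieces would be passed as `customdata=` and `hovertemplate=` to `plotly.graph_objects.Icicle`, alongside `ids`, `labels`, `parents` and `values`.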
3.3 Update UI Components
- Replace date pickers with select dropdowns:
  - Initiated: "All years", "Last 2 years", "Last 1 year"
  - Last Seen: "Last 6 months", "Last 12 months"
- Add "Data refreshed: X ago" indicator from pathway_refresh_log
- Update filter section layout
- Verify UI compiles and renders correctly
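This sketch covers two UI helpers:

- Label-to-value maps for the dropdowns, so `date_filter_id` is still `f"{initiated}_{last_seen}"`.
- A coarse "Data refreshed: X ago" formatter.

It assumes the refresh timestamp is read from `pathway_refresh_log` as a timezone-aware UTC `datetime` (for example via `datetime.fromisoformat` on an ISO-8601 string). The Reflex component wiring is not shown, and the names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Display label -> state value for selected_initiated / selected_last_seen.
INITIATED_OPTIONS = {"All years": "all", "Last 2 years": "2yr", "Last 1 year": "1yr"}
LAST_SEEN_OPTIONS = {"Last 6 months": "6mo", "Last 12 months": "12mo"}


def format_refreshed_ago(refreshed_at: datetime, now: Optional[datetime] = None) -> str:
    """Render 'Data refreshed: X ago' using the largest whole unit (days, hours, minutes)."""
    if now is None:
        now = datetime.now(timezone.utc)
    seconds = max(0, int((now - refreshed_at).total_seconds()))
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            return f"Data refreshed: {count} {unit}{'s' if count != 1 else ''} ago"
    return "Data refreshed: just now"
```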
Phase 4: Testing & Validation
4.1 End-to-End Validation
- Pathway hierarchy matches original: Compare the `ids` structure of specific pathways
- Patient counts match: Compare root patient count for same date range
- Treatment statistics display correctly: Verify "Average treatment duration" hover data
- Drug filtering works: Filter to FARICIMAB, verify correct pathways shown
- Chart renders with all tooltip data: Verify 10-field customdata structure
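For the hierarchy and patient-count comparisons, a small diff over `(ids, patient_count)` pairs is enough to localize discrepancies. How those pairs are pulled out of the original `generate_icicle_chart()` output is left open, and `compare_hierarchies` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def compare_hierarchies(original: Iterable[Tuple[str, int]],
                        rebuilt: Iterable[Tuple[str, int]]) -> Dict[str, object]:
    """Diff two (ids, patient_count) collections built for the same date range."""
    a, b = dict(original), dict(rebuilt)
    return {
        "missing": sorted(a.keys() - b.keys()),     # in the original chart, absent from pathway_nodes
        "unexpected": sorted(b.keys() - a.keys()),  # in pathway_nodes, absent from the original chart
        "count_mismatches": {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]},
    }
```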
4.2 Performance Testing
- Measure filter change response time (target: <500ms)
- Measure initial page load (target: <2s including data load)
- Verify chart interaction (zoom, hover) is smooth with no lag
- Test with full dataset
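A simple way to put a number on filter-change latency is the median of repeated runs timed with `time.perf_counter()`. The action passed in could be, for example, the `pathway_nodes` query for one filter state. This sketch times only that backend work, not browser rendering, so it gives a lower bound on what users see. `median_latency_ms` is a hypothetical helper.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(action: Callable[[], object], repeats: int = 20) -> float:
    """Run `action` `repeats` times and return the median wall-clock time in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)
```

The median is used rather than the mean so that a single slow run (for example a cold cache) does not dominate the figure compared against the 500 ms target.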
4.3 Documentation
- Update CLAUDE.md with new architecture
- Document CLI usage for `refresh_pathways`
- Update README with new run instructions
- Document any breaking changes from original app
Completion Criteria
All tasks marked [x] AND:
- App compiles without errors (`reflex run` succeeds)
- All 6 date filter combinations work correctly
- Drug/directory/trust filters work with instant updates
- KPIs display correct numbers matching filter state
- Icicle chart renders with full pathway data and statistics
- Treatment duration and dosing information displays in tooltips
- No console errors during normal operation
- Verified with real patient data from Snowflake
Reference
Date Filter Combinations
| ID | Initiated | Last Seen | Default |
|---|---|---|---|
| `all_6mo` | All years | Last 6 months | Yes |
| `all_12mo` | All years | Last 12 months | No |
| `1yr_6mo` | Last 1 year | Last 6 months | No |
| `1yr_12mo` | Last 1 year | Last 12 months | No |
| `2yr_6mo` | Last 2 years | Last 6 months | No |
| `2yr_12mo` | Last 2 years | Last 12 months | No |
Key Files
| File | Purpose |
|---|---|
| `data_processing/schema.py` | Database schema definitions |
| `data_processing/pathway_pipeline.py` | New pathway processing pipeline |
| `cli/refresh_pathways.py` | CLI refresh command |
| `analysis/pathway_analyzer.py` | Existing pathway analysis logic |
| `visualization/plotly_generator.py` | Existing chart generation |
| `pathways_app/app_v2.py` | Reflex application |