# Implementation Plan - Pathway Data Architecture

## Project Overview

Pre-compute patient treatment pathways from Snowflake and store in SQLite for fast Reflex filtering. This replaces the current simplified `prepare_chart_data()` with full pathway hierarchy support.

**Architecture**: Snowflake → Pathway Processing → SQLite (pre-computed) → Reflex (filter & view)

**Key Benefits**:
- Performance: Pathway calculation done once during data refresh, not on every filter
- Simplicity: Reflex filters pre-computed data with simple SQL WHERE clauses
- Full Pathways: Sequential treatment pathways (drug_0 → drug_1 → drug_2...) with statistics

**Design Reference**: See `PATHWAY_DATA_ARCHITECTURE_PLAN.md` for detailed architecture, schema, and data flow.

**Source Code**:
- Existing analysis: `analysis/pathway_analyzer.py`
- Existing visualization: `visualization/plotly_generator.py`
- Existing Reflex app: `pathways_app/app_v2.py`

## Quality Checks

Run after each task:

```bash
# Syntax check for Python files
python -m py_compile <file.py>

# Import verification
python -c "from <module> import <name>"

# For Reflex changes
cd pathways_app && timeout 60 python -m reflex run 2>&1 | head -30
```

## Phase 1: Schema & Data Pipeline Foundation

### 1.1 Extend Database Schema

- [x] Add `pathway_date_filters` table with 6 pre-defined combinations:
  - `all_6mo`, `all_12mo`, `1yr_6mo`, `1yr_12mo`, `2yr_6mo`, `2yr_12mo`
- [x] Add `pathway_nodes` table with:
  - Hierarchy structure (parents, ids, labels, level)
  - Patient counts and costs (value, cost, costpp, cost_pp_pa)
  - Date ranges (first_seen, last_seen, first_seen_parent, last_seen_parent)
  - Treatment statistics (average_spacing, average_administered, avg_days)
  - Denormalized filter columns (trust_name, directory, drug_sequence)
  - Foreign key to date_filter_id
- [x] Add `pathway_refresh_log` table for tracking refresh status
- [x] Create indexes for efficient filtering
- [x] Verify schema with: `python -c "from data_processing.schema import *"`

### 1.2 Create Pathway Pipeline Module

- [x] Create `data_processing/pathway_pipeline.py` with:
  - `fetch_and_transform_data()` - Snowflake fetch + UPID/drug/directory transformations
  - `process_pathway_for_date_filter(df, date_filter_config)` - Single filter processing
  - `extract_denormalized_fields(ice_df)` - Extract trust, directory, drug_sequence from ids
  - `convert_to_records(ice_df, date_filter_id)` - Convert ice_df to list of dicts for SQLite
- [x] Integrate with existing `analysis/pathway_analyzer.py` functions
- [x] Verify: `python -c "from data_processing.pathway_pipeline import *"`

### 1.3 Create Migration Script

- [x] Create script to set up new tables in existing `data/pathways.db`
  - Note: Existing `python -m data_processing.migrate` handles this (updated in Task 1.1)
- [x] Pre-populate `pathway_date_filters` with 6 combinations
  - Note: Auto-populated via INSERT OR REPLACE in PATHWAY_DATE_FILTERS_SCHEMA
- [x] Verify migration runs cleanly on fresh database
  - Verified: All 3 pathway tables created, 6 date filters populated correctly

## Phase 2: CLI Refresh Command

### 2.1 Create Refresh Command

- [x] Create `cli/refresh_pathways.py` with:
  - Uses DATE_FILTER_CONFIGS and compute_date_ranges from pathway_pipeline.py
  - `refresh_pathways(minimum_patients, provider_codes, ...)` main function
  - `insert_pathway_records()` for SQLite insertion
  - `log_refresh_start/complete/failed()` for refresh tracking
- [x] Implement refresh flow:
  1. Fetch ALL data from Snowflake (full date range) via fetch_and_transform_data()
  2. Apply transformations (UPID, drug names, directory) - handled by pipeline
  3. Clear existing pathway_nodes via clear_pathway_nodes()
  4. For each of 6 date filter configs: filter → process → insert
  5. Update pathway_refresh_log
- [x] Add CLI argument parsing (--minimum-patients, --provider-codes, --dry-run, --verbose)
- [x] Verify: `python -m cli.refresh_pathways --help`

### 2.2 Test Refresh Pipeline

- [ ] Run refresh with Snowflake data
- [ ] Verify all 6 date_filter_ids populated in pathway_nodes
- [ ] Verify pathway structure matches original `generate_icicle_chart()` output
- [ ] Verify patient counts are correct (compare with original app)
- [ ] Document estimated processing time (expect 6-12 minutes for 440K records)

## Phase 3: Reflex Integration

### 3.1 Update AppState

- [ ] Replace date picker state with dropdown state:
  - `selected_initiated: str = "all"` ("all", "1yr", "2yr")
  - `selected_last_seen: str = "6mo"` ("6mo", "12mo")
- [ ] Add `date_filter_id` computed property: `f"{selected_initiated}_{selected_last_seen}"`
- [ ] Rewrite `load_pathway_data()` to query `pathway_nodes` table:
  - Base filter: `WHERE date_filter_id = ?`
  - Trust/directory/drug filters on denormalized columns
- [ ] Add `recalculate_parent_totals()` for filtered hierarchies
- [ ] Update KPI calculations from root node data

### 3.2 Update Icicle Figure

- [ ] Update `icicle_figure` computed property to use all pathway_nodes columns
- [ ] Match original 10-field customdata structure:
  - values, colours, costs, costpp
  - first_seen, last_seen, first_seen_parent, last_seen_parent
  - average_spacing, cost_pp_pa
- [ ] Restore full hover/text templates from `visualization/plotly_generator.py`
- [ ] Verify chart renders correctly with treatment statistics

### 3.3 Update UI Components

- [ ] Replace date pickers with select dropdowns:
  - Initiated: "All years", "Last 2 years", "Last 1 year"
  - Last Seen: "Last 6 months", "Last 12 months"
- [ ] Add "Data refreshed: X ago" indicator from pathway_refresh_log
- [ ] Update filter section layout
- [ ] Verify UI compiles and renders correctly

## Phase 4: Testing & Validation

### 4.1 End-to-End Validation

- [ ] **Pathway hierarchy matches original**: Compare specific pathway ids structure
- [ ] **Patient counts match**: Compare root patient count for same date range
- [ ] **Treatment statistics display correctly**: Verify "Average treatment duration" hover data
- [ ] **Drug filtering works**: Filter to FARICIMAB, verify correct pathways shown
- [ ] **Chart renders with all tooltip data**: Verify 10-field customdata structure

### 4.2 Performance Testing

- [ ] Measure filter change response time (target: <500ms)
- [ ] Measure initial page load (target: <2s including data load)
- [ ] Verify chart interaction (zoom, hover) is smooth with no lag
- [ ] Test with full dataset

### 4.3 Documentation

- [ ] Update CLAUDE.md with new architecture
- [ ] Document CLI usage for `refresh_pathways`
- [ ] Update README with new run instructions
- [ ] Document any breaking changes from original app

## Completion Criteria

All tasks marked `[x]` AND:
- [ ] App compiles without errors (`reflex run` succeeds)
- [ ] All 6 date filter combinations work correctly
- [ ] Drug/directory/trust filters work with instant updates
- [ ] KPIs display correct numbers matching filter state
- [ ] Icicle chart renders with full pathway data and statistics
- [ ] Treatment duration and dosing information displays in tooltips
- [ ] No console errors during normal operation
- [ ] Verified with real patient data from Snowflake

## Reference

### Date Filter Combinations

| ID | Initiated | Last Seen | Default |
|----|-----------|-----------|---------|
| `all_6mo` | All years | Last 6 months | Yes |
| `all_12mo` | All years | Last 12 months | No |
| `1yr_6mo` | Last 1 year | Last 6 months | No |
| `1yr_12mo` | Last 1 year | Last 12 months | No |
| `2yr_6mo` | Last 2 years | Last 6 months | No |
| `2yr_12mo` | Last 2 years | Last 12 months | No |

### Key Files

| File | Purpose |
|------|---------|
| `data_processing/schema.py` | Database schema definitions |
| `data_processing/pathway_pipeline.py` | New pathway processing pipeline |
| `cli/refresh_pathways.py` | CLI refresh command |
| `analysis/pathway_analyzer.py` | Existing pathway analysis logic |
| `visualization/plotly_generator.py` | Existing chart generation |
| `pathways_app/app_v2.py` | Reflex application |
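
### Example: Filtering `pathway_nodes`

The Phase 3.1 items (composing `date_filter_id` from the two dropdown values, then querying `pathway_nodes` with `WHERE date_filter_id = ?` plus filters on the denormalized columns) might look roughly like the sketch below. It is illustrative only: `make_date_filter_id` and `build_pathway_query` are hypothetical helper names, the in-memory table is a cut-down stand-in for the real schema in `data_processing/schema.py`, the demo rows are made-up placeholders, and storing `date_filter_id` as text is an assumption.

```python
import sqlite3

# Option values taken from the Date Filter Combinations table.
INITIATED_OPTIONS = ("all", "1yr", "2yr")
LAST_SEEN_OPTIONS = ("6mo", "12mo")


def make_date_filter_id(selected_initiated="all", selected_last_seen="6mo"):
    """Combine the two dropdown values into an ID such as 'all_6mo'."""
    if selected_initiated not in INITIATED_OPTIONS:
        raise ValueError(f"unknown initiated option: {selected_initiated!r}")
    if selected_last_seen not in LAST_SEEN_OPTIONS:
        raise ValueError(f"unknown last-seen option: {selected_last_seen!r}")
    return f"{selected_initiated}_{selected_last_seen}"


def build_pathway_query(date_filter_id, trusts=(), directories=()):
    """Return (sql, params) for a parameterised query on pathway_nodes.

    Column names come from a fixed tuple below, never from user input;
    all values are bound through '?' placeholders.
    """
    clauses = ["date_filter_id = ?"]
    params = [date_filter_id]
    for column, values in (("trust_name", trusts), ("directory", directories)):
        if values:
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
    sql = (
        "SELECT ids, parents, labels, value FROM pathway_nodes WHERE "
        + " AND ".join(clauses)
    )
    return sql, params


# Demo against an in-memory table (assumed, simplified columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pathway_nodes ("
    "date_filter_id TEXT, ids TEXT, parents TEXT, labels TEXT, "
    "value INTEGER, trust_name TEXT, directory TEXT)"
)
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Placeholder rows, not real patient data.
        ("all_6mo", "Trust A - Dir X", "Trust A", "Dir X", 10, "Trust A", "Dir X"),
        ("all_6mo", "Trust B - Dir X", "Trust B", "Dir X", 7, "Trust B", "Dir X"),
        ("1yr_6mo", "Trust A - Dir X", "Trust A", "Dir X", 4, "Trust A", "Dir X"),
    ],
)

sql, params = build_pathway_query(make_date_filter_id("all", "6mo"), trusts=["Trust A"])
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Because a trust or directory filter removes sibling nodes, parent rows returned this way still carry their unfiltered totals, which is the reason the plan lists `recalculate_parent_totals()` as a separate task.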