# Implementation Plan - Pathway Data Architecture

## Project Overview

Pre-compute patient treatment pathways from Snowflake and store them in SQLite for fast filtering in Reflex. This replaces the current simplified `prepare_chart_data()` with full pathway hierarchy support.

**Architecture**: Snowflake → Pathway Processing → SQLite (pre-computed) → Reflex (filter & view)

**Key Benefits**:

- Performance: Pathways are calculated once during the data refresh, not on every filter change
- Simplicity: Reflex filters the pre-computed data with simple SQL WHERE clauses
- Full Pathways: Sequential treatment pathways (drug_0 → drug_1 → drug_2 ...) with statistics

**Design Reference**: See `PATHWAY_DATA_ARCHITECTURE_PLAN.md` for the detailed architecture, schema, and data flow.

**Source Code**:

- Existing analysis: `analysis/pathway_analyzer.py`
- Existing visualization: `visualization/plotly_generator.py`
- Existing Reflex app: `pathways_app/app_v2.py`

## Quality Checks

Run after each task:

```bash
# Syntax check for Python files
python -m py_compile <file.py>

# Import verification
python -c "from <module> import <class>"

# For Reflex changes
cd pathways_app && timeout 60 python -m reflex run 2>&1 | head -30
```

## Phase 1: Schema & Data Pipeline Foundation

### 1.1 Extend Database Schema

- [x] Add `pathway_date_filters` table with 6 pre-defined combinations:
  - `all_6mo`, `all_12mo`, `1yr_6mo`, `1yr_12mo`, `2yr_6mo`, `2yr_12mo`
- [x] Add `pathway_nodes` table with:
  - Hierarchy structure (parents, ids, labels, level)
  - Patient counts and costs (value, cost, costpp, cost_pp_pa)
  - Date ranges (first_seen, last_seen, first_seen_parent, last_seen_parent)
  - Treatment statistics (average_spacing, average_administered, avg_days)
  - Denormalized filter columns (trust_name, directory, drug_sequence)
  - Foreign key column `date_filter_id` (references `pathway_date_filters`)
- [x] Add `pathway_refresh_log` table for tracking refresh status
- [x] Create indexes for efficient filtering
- [x] Verify schema with: `python -c "from data_processing.schema import *"`

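A minimal sketch of what the three tables could look like as SQLite DDL, using Python's built-in `sqlite3` module. The column names follow the list above; the column types, the `pathway_refresh_log` columns, the `is_default` column, the index names and the `create_pathway_tables()` helper are assumptions for illustration, not the actual contents of `data_processing/schema.py`.

```python
import sqlite3

# Illustrative DDL only: column names follow the plan, but types, constraints,
# the refresh-log columns and index names are ASSUMPTIONS.
PATHWAY_SCHEMA = """
CREATE TABLE IF NOT EXISTS pathway_date_filters (
    id TEXT PRIMARY KEY,              -- e.g. 'all_6mo'
    initiated TEXT NOT NULL,          -- 'all', '1yr', '2yr'
    last_seen TEXT NOT NULL,          -- '6mo', '12mo'
    is_default INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE IF NOT EXISTS pathway_nodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    date_filter_id TEXT NOT NULL REFERENCES pathway_date_filters(id),
    parents TEXT,
    ids TEXT NOT NULL,
    labels TEXT,
    level INTEGER,
    value INTEGER,
    cost REAL,
    costpp REAL,
    cost_pp_pa REAL,
    first_seen TEXT,
    last_seen TEXT,
    first_seen_parent TEXT,
    last_seen_parent TEXT,
    average_spacing TEXT,
    average_administered REAL,
    avg_days REAL,
    trust_name TEXT,
    directory TEXT,
    drug_sequence TEXT
);

CREATE TABLE IF NOT EXISTS pathway_refresh_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT NOT NULL,
    completed_at TEXT,
    status TEXT NOT NULL,             -- e.g. 'running', 'success', 'failed'
    record_count INTEGER
);

-- Every query is narrowed to one date filter first, so indexes lead with it.
CREATE INDEX IF NOT EXISTS idx_nodes_date_filter ON pathway_nodes(date_filter_id);
CREATE INDEX IF NOT EXISTS idx_nodes_filter_trust ON pathway_nodes(date_filter_id, trust_name);
CREATE INDEX IF NOT EXISTS idx_nodes_filter_directory ON pathway_nodes(date_filter_id, directory);
CREATE INDEX IF NOT EXISTS idx_nodes_filter_drug ON pathway_nodes(date_filter_id, drug_sequence);
"""


def create_pathway_tables(conn: sqlite3.Connection) -> None:
    """Create the three pathway tables and their indexes if they are missing."""
    conn.executescript(PATHWAY_SCHEMA)
    conn.commit()
```

Because every statement uses `IF NOT EXISTS`, calling the helper twice is harmless. Only a few representative indexes are shown.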
### 1.2 Create Pathway Pipeline Module

- [ ] Create `data_processing/pathway_pipeline.py` with:
  - `fetch_and_transform_data()` - Snowflake fetch + UPID/drug/directory transformations
  - `process_pathway_for_date_filter(df, date_filter_config)` - Single filter processing
  - `extract_denormalized_fields(ice_df)` - Extract trust, directory, drug_sequence from ids
  - `convert_to_records(ice_df, date_filter_id)` - Convert ice_df to list of dicts for SQLite
- [ ] Integrate with existing `analysis/pathway_analyzer.py` functions
- [ ] Verify: `python -c "from data_processing.pathway_pipeline import *"`

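The sketch below shows one way the denormalized fields could be derived from a node's `ids`. It assumes, hypothetically, that ids are `" - "`-joined paths of the form root, trust, directory, then the drugs in order; the real format produced by `analysis/pathway_analyzer.py` should be checked first. To stay dependency-free it works on a single id and on a list of row dicts rather than on the `ice_df` DataFrame named in the plan.

```python
from typing import Any, Optional

# ASSUMPTION: ids look like "<root> - <trust> - <directory> - <drug_0> - <drug_1> ...".
ID_SEPARATOR = " - "


def extract_denormalized_fields(node_id: str) -> dict[str, Optional[str]]:
    """Split one hierarchical id into the denormalized filter columns."""
    parts = node_id.split(ID_SEPARATOR)
    return {
        "trust_name": parts[1] if len(parts) > 1 else None,
        "directory": parts[2] if len(parts) > 2 else None,
        # Drugs kept in treatment order as a comma-joined string, e.g. "A,B".
        "drug_sequence": ",".join(parts[3:]) if len(parts) > 3 else None,
    }


def convert_to_records(rows: list[dict[str, Any]], date_filter_id: str) -> list[dict[str, Any]]:
    """Attach date_filter_id and the denormalized fields to each node row."""
    records = []
    for row in rows:
        record = dict(row)  # copy so the caller's rows are not mutated
        record["date_filter_id"] = date_filter_id
        record.update(extract_denormalized_fields(row["ids"]))
        records.append(record)
    return records
```

With a DataFrame, the same per-id function could be applied across the `ids` column before the rows are inserted.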
### 1.3 Create Migration Script

- [ ] Create a script to set up the new tables in the existing `data/pathways.db`
- [ ] Pre-populate `pathway_date_filters` with the 6 combinations
- [ ] Verify the migration runs cleanly on a fresh database

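A possible shape for the migration step, assuming the six rows from the Reference table and an idempotent `INSERT OR IGNORE` seed so the script can be re-run safely. The `migrate()` function, the `is_default` column and the column layout are hypothetical; the real script would more likely reuse the DDL from `data_processing/schema.py` than repeat it.

```python
import sqlite3

# (id, initiated, last_seen, is_default) from the Date Filter Combinations table.
DATE_FILTERS = [
    ("all_6mo", "all", "6mo", 1),
    ("all_12mo", "all", "12mo", 0),
    ("1yr_6mo", "1yr", "6mo", 0),
    ("1yr_12mo", "1yr", "12mo", 0),
    ("2yr_6mo", "2yr", "6mo", 0),
    ("2yr_12mo", "2yr", "12mo", 0),
]


def migrate(db_path: str = "data/pathways.db") -> int:
    """Create pathway_date_filters if missing, seed it, and return its row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """CREATE TABLE IF NOT EXISTS pathway_date_filters (
                       id TEXT PRIMARY KEY,
                       initiated TEXT NOT NULL,
                       last_seen TEXT NOT NULL,
                       is_default INTEGER NOT NULL DEFAULT 0)"""
            )
            # INSERT OR IGNORE skips ids that already exist, so re-runs are no-ops.
            conn.executemany(
                "INSERT OR IGNORE INTO pathway_date_filters VALUES (?, ?, ?, ?)",
                DATE_FILTERS,
            )
        return conn.execute("SELECT COUNT(*) FROM pathway_date_filters").fetchone()[0]
    finally:
        conn.close()
```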
## Phase 2: CLI Refresh Command

### 2.1 Create Refresh Command

- [ ] Create `cli/refresh_pathways.py` with:
  - `DATE_FILTER_CONFIGS` constant (6 combinations)
  - `compute_date_ranges(config, max_date)` - Calculate actual dates from a config
  - `refresh_pathways(minimum_patients, provider_codes, ...)` - Main function
- [ ] Implement refresh flow:
  1. Fetch ALL data from Snowflake (full date range)
  2. Apply transformations (UPID, drug names, directory)
  3. Clear existing pathway_nodes
  4. For each of the 6 date filter configs: filter → process → insert
  5. Update pathway_refresh_log
- [ ] Add CLI argument parsing (`--minimum-patients`, `--provider-codes`, etc.)
- [ ] Verify: `python -m cli.refresh_pathways --help`

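One way `DATE_FILTER_CONFIGS` and `compute_date_ranges()` could be expressed. This assumes that "Initiated: Last N years" and "Last Seen: Last N months" are look-back windows measured from `max_date`, the latest date in the fetched data; the config keys, the `subtract_months()` helper and the argument defaults are illustrative assumptions. Only the two flags named in the plan are shown.

```python
import argparse
import calendar
from datetime import date
from typing import Optional

# ASSUMPTION: initiated = look-back in years (None = all), last seen = months.
DATE_FILTER_CONFIGS = {
    f"{init}_{seen}": {"initiated_years": years, "last_seen_months": months}
    for init, years in (("all", None), ("1yr", 1), ("2yr", 2))
    for seen, months in (("6mo", 6), ("12mo", 12))
}


def subtract_months(d: date, months: int) -> date:
    """Move d back by whole months, clamping the day (31 Aug - 6 months -> end of Feb)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def compute_date_ranges(config: dict, max_date: date) -> dict[str, Optional[date]]:
    """Return the initiated-from and last-seen-from cut-offs for one config."""
    years = config["initiated_years"]
    return {
        "initiated_from": None if years is None else subtract_months(max_date, 12 * years),
        "last_seen_from": subtract_months(max_date, config["last_seen_months"]),
        "max_date": max_date,
    }


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Refresh pre-computed pathway data.")
    # Placeholder default; the real threshold is a project decision.
    parser.add_argument("--minimum-patients", type=int, default=5)
    parser.add_argument("--provider-codes", nargs="*", default=None)
    return parser.parse_args(argv)
```

Keying the configs by the same ids as `pathway_date_filters` lets the refresh loop insert each batch under its config key directly.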
### 2.2 Test Refresh Pipeline

- [ ] Run refresh with Snowflake data
- [ ] Verify all 6 date_filter_ids are populated in pathway_nodes
- [ ] Verify pathway structure matches the original `generate_icicle_chart()` output
- [ ] Verify patient counts are correct (compare with the original app)
- [ ] Document the estimated processing time (expected 6-12 minutes for 440K records)

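A small check for the first two verification steps, assuming root nodes are stored with `level = 0`. The function names are hypothetical.

```python
import sqlite3

EXPECTED_FILTER_IDS = {"all_6mo", "all_12mo", "1yr_6mo", "1yr_12mo", "2yr_6mo", "2yr_12mo"}


def missing_date_filters(conn: sqlite3.Connection) -> set[str]:
    """Return the expected date_filter_ids that have no rows in pathway_nodes."""
    populated = {
        row[0] for row in conn.execute("SELECT DISTINCT date_filter_id FROM pathway_nodes")
    }
    return EXPECTED_FILTER_IDS - populated


def root_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Root patient count per date filter, for comparison with the original app."""
    # ASSUMPTION: the root node is stored with level = 0.
    rows = conn.execute(
        "SELECT date_filter_id, value FROM pathway_nodes WHERE level = 0"
    ).fetchall()
    return dict(rows)
```

An empty set from `missing_date_filters()` means every combination was written by the refresh.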
## Phase 3: Reflex Integration

### 3.1 Update AppState

- [ ] Replace date picker state with dropdown state:
  - `selected_initiated: str = "all"` (`"all"`, `"1yr"`, `"2yr"`)
  - `selected_last_seen: str = "6mo"` (`"6mo"`, `"12mo"`)
- [ ] Add a `date_filter_id` computed property: `f"{self.selected_initiated}_{self.selected_last_seen}"`
- [ ] Rewrite `load_pathway_data()` to query the `pathway_nodes` table:
  - Base filter: `WHERE date_filter_id = ?`
  - Trust/directory/drug filters on the denormalized columns
- [ ] Add `recalculate_parent_totals()` for filtered hierarchies
- [ ] Update KPI calculations to use root node data

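A sketch of the query building and parent-total recalculation behind `load_pathway_data()`, written as plain functions so it runs without Reflex. It assumes levels 0 to 2 are root, trust and directory groupings and that deeper levels are drug steps; `build_node_query()`, the `FIRST_DRUG_LEVEL` constant and this version of `recalculate_parent_totals()` are hypothetical. Drug filtering is left out, because keeping a matching drug node would also require keeping its non-matching ancestors.

```python
from typing import Any, Sequence

# ASSUMPTION: levels 0-2 are root/trust/directory groupings; level 3+ are drug steps.
FIRST_DRUG_LEVEL = 3


def build_node_query(
    date_filter_id: str,
    trusts: Sequence[str] = (),
    directories: Sequence[str] = (),
) -> tuple[str, list[Any]]:
    """Build a parameterised SELECT on pathway_nodes for the current filters."""
    sql = "SELECT * FROM pathway_nodes WHERE date_filter_id = ?"
    params: list[Any] = [date_filter_id]
    for column, values in (("trust_name", trusts), ("directory", directories)):
        if values:
            placeholders = ", ".join("?" * len(values))
            # Rows above this level (e.g. the root) are NULL here; keep them
            # so the hierarchy stays connected.
            sql += f" AND ({column} IS NULL OR {column} IN ({placeholders}))"
            params.extend(values)
    return sql, params


def recalculate_parent_totals(nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Re-derive value/cost/costpp of grouping nodes from surviving children."""
    copies = [dict(n) for n in nodes]  # never mutate the caller's rows
    children: dict[str, list[dict[str, Any]]] = {}
    for node in copies:
        children.setdefault(node["parents"], []).append(node)
    # Deepest grouping level first so each new total feeds its parent.
    for level in range(FIRST_DRUG_LEVEL - 1, -1, -1):
        for node in copies:
            kids = children.get(node["ids"], [])
            if node["level"] == level and kids:
                node["value"] = sum(k["value"] for k in kids)
                node["cost"] = sum(k["cost"] for k in kids)
                node["costpp"] = node["cost"] / node["value"] if node["value"] else 0.0
    return copies
```

Only grouping nodes are recomputed: a drug node keeps its stored count, since patients who stop at that drug are counted in it but in none of its children.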
### 3.2 Update Icicle Figure

- [ ] Update `icicle_figure` computed property to use all pathway_nodes columns
- [ ] Match original 10-field customdata structure:
  - values, colours, costs, costpp
  - first_seen, last_seen, first_seen_parent, last_seen_parent
  - average_spacing, cost_pp_pa
- [ ] Restore full hover/text templates from `visualization/plotly_generator.py`
- [ ] Verify chart renders correctly with treatment statistics

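A sketch of how the 10-field customdata could be assembled from `pathway_nodes` rows in a fixed order, so that `%{customdata[i]}` in a Plotly hover template always refers to the same field. The `colour` key is an assumption (no colour column appears in the 1.1 schema list, so it would be computed before plotting), and the template text is illustrative only, not the original wording.

```python
from typing import Any

# Fixed order of the 10 fields listed above; index i maps to %{customdata[i]}.
CUSTOMDATA_FIELDS = (
    "value", "colour", "cost", "costpp",
    "first_seen", "last_seen", "first_seen_parent", "last_seen_parent",
    "average_spacing", "cost_pp_pa",
)


def build_customdata(nodes: list[dict[str, Any]]) -> list[list[Any]]:
    """One 10-element row per node; missing fields become None."""
    return [[node.get(field) for field in CUSTOMDATA_FIELDS] for node in nodes]


# Illustrative template; the real text should be restored from
# visualization/plotly_generator.py.
HOVER_TEMPLATE = (
    "<b>%{label}</b><br>"
    "Patients: %{customdata[0]}<br>"
    "Total cost: %{customdata[2]:,.0f}<br>"
    "Cost per patient: %{customdata[3]:,.0f}<br>"
    "First seen: %{customdata[4]}<br>"
    "Last seen: %{customdata[5]}<br>"
    "%{customdata[8]}<br>"
    "Cost per patient per annum: %{customdata[9]:,.0f}"
    "<extra></extra>"
)
```

The resulting rows would be passed as `customdata=` alongside `hovertemplate=` when constructing `plotly.graph_objects.Icicle`.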
### 3.3 Update UI Components

- [ ] Replace date pickers with select dropdowns:
  - Initiated: "All years", "Last 2 years", "Last 1 year"
  - Last Seen: "Last 6 months", "Last 12 months"
- [ ] Add "Data refreshed: X ago" indicator from pathway_refresh_log
- [ ] Update filter section layout
- [ ] Verify UI compiles and renders correctly

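A minimal sketch of the "Data refreshed: X ago" label. It assumes `pathway_refresh_log` has a `completed_at` column (ISO-8601 text, UTC) and a `status` column set to `'success'` for finished runs; those column names and both helper functions are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def time_ago(then: datetime, now: datetime) -> str:
    """Render the gap between two datetimes as a coarse 'X ago' string."""
    seconds = int((now - then).total_seconds())
    if seconds < 60:
        return "just now"
    for unit, size in (("day", 86400), ("hour", 3600)):
        if seconds >= size:
            n = seconds // size
            return f"{n} {unit}{'s' if n != 1 else ''} ago"
    n = seconds // 60
    return f"{n} minute{'s' if n != 1 else ''} ago"


def refreshed_label(conn: sqlite3.Connection, now: Optional[datetime] = None) -> str:
    """Build the indicator text from the latest successful refresh."""
    row = conn.execute(
        "SELECT completed_at FROM pathway_refresh_log "
        "WHERE status = 'success' ORDER BY completed_at DESC LIMIT 1"
    ).fetchone()
    if row is None or row[0] is None:
        return "Data refreshed: never"
    then = datetime.fromisoformat(row[0])
    if then.tzinfo is None:  # ASSUMPTION: naive timestamps are stored in UTC
        then = then.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return f"Data refreshed: {time_ago(then, now)}"
```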
## Phase 4: Testing & Validation

### 4.1 End-to-End Validation

- [ ] **Pathway hierarchy matches original**: Compare the ids structure of specific pathways
- [ ] **Patient counts match**: Compare the root patient count for the same date range
- [ ] **Treatment statistics display correctly**: Verify the "Average treatment duration" hover data
- [ ] **Drug filtering works**: Filter to FARICIMAB and verify the correct pathways are shown
- [ ] **Chart renders with all tooltip data**: Verify the 10-field customdata structure

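For the hierarchy and count comparisons, both sides can be reduced to `{ids: patient_count}` maps, for example from the original figure's `ids`/`values` arrays and from `SELECT ids, value FROM pathway_nodes WHERE date_filter_id = ?`. The hypothetical helper below reports ids that exist on only one side and ids whose counts differ.

```python
from typing import Any, Mapping


def compare_pathways(original: Mapping[str, int], precomputed: Mapping[str, int]) -> dict[str, Any]:
    """Diff two {ids: patient_count} maps built for the same date range."""
    orig_ids, new_ids = set(original), set(precomputed)
    return {
        "missing": sorted(orig_ids - new_ids),  # only in the original app
        "extra": sorted(new_ids - orig_ids),    # only in pathway_nodes
        "count_mismatches": {
            node_id: (original[node_id], precomputed[node_id])
            for node_id in sorted(orig_ids & new_ids)
            if original[node_id] != precomputed[node_id]
        },
    }
```

All three entries being empty would indicate that the structure and counts match for that date range.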
### 4.2 Performance Testing

- [ ] Measure filter change response time (target: <500ms)
- [ ] Measure initial page load (target: <2s including data load)
- [ ] Verify chart interaction (zoom, hover) is smooth with no lag
- [ ] Test with full dataset

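A rough way to check the SQLite part of the <500ms target, taking the median of several runs timed with `time.perf_counter()`. This is an assumption-level sketch with a hypothetical helper name: it times only the query, not Reflex state updates or chart rendering, so the end-to-end figure still has to be measured in the browser.

```python
import sqlite3
import statistics
import time
from typing import Any, Sequence

FILTER_TARGET_MS = 500  # filter-change target from this section


def median_query_ms(
    conn: sqlite3.Connection, sql: str, params: Sequence[Any] = (), repeats: int = 5
) -> float:
    """Run a query several times and return the median wall-clock time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch so the full result is read
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)
```

The median is less sensitive than the mean to a single slow first run while SQLite's page cache warms up.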
### 4.3 Documentation

- [ ] Update CLAUDE.md with new architecture
- [ ] Document CLI usage for `refresh_pathways`
- [ ] Update README with new run instructions
- [ ] Document any breaking changes from original app

## Completion Criteria

All tasks marked `[x]` AND:

- [ ] App compiles without errors (`reflex run` succeeds)
- [ ] All 6 date filter combinations work correctly
- [ ] Drug/directory/trust filters work with instant updates
- [ ] KPIs display correct numbers matching filter state
- [ ] Icicle chart renders with full pathway data and statistics
- [ ] Treatment duration and dosing information displays in tooltips
- [ ] No console errors during normal operation
- [ ] Verified with real patient data from Snowflake

## Reference

### Date Filter Combinations

| ID | Initiated | Last Seen | Default |
|----|-----------|-----------|---------|
| `all_6mo` | All years | Last 6 months | Yes |
| `all_12mo` | All years | Last 12 months | No |
| `1yr_6mo` | Last 1 year | Last 6 months | No |
| `1yr_12mo` | Last 1 year | Last 12 months | No |
| `2yr_6mo` | Last 2 years | Last 6 months | No |
| `2yr_12mo` | Last 2 years | Last 12 months | No |

### Key Files

| File | Purpose |
|------|---------|
| `data_processing/schema.py` | Database schema definitions |
| `data_processing/pathway_pipeline.py` | New pathway processing pipeline |
| `cli/refresh_pathways.py` | CLI refresh command |
| `analysis/pathway_analyzer.py` | Existing pathway analysis logic |
| `visualization/plotly_generator.py` | Existing chart generation |
| `pathways_app/app_v2.py` | Reflex application |