Files
HighCostDrugsDemo/IMPLEMENTATION_PLAN.md
T
Andrew Charlwood 4884e0a8cc fix: recreate pathway_nodes with correct UNIQUE constraint and validate end-to-end (Task 5.1)
The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.

Validation: both chart types have all hierarchy levels (0-5),
all 12 date filters produce valid icicle charts, KPIs correct.
2026-02-05 20:43:01 +00:00

257 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Implementation Plan - Indication-Based Pathway Charts
## Project Overview
Extend the pathway analysis application to show indication-based icicle charts alongside directory-based charts. Patient diagnoses are matched from GP records using SNOMED cluster codes.
### Key Design Decisions
| Aspect | Decision |
|--------|----------|
| SNOMED source | Query `ClinicalCodingClusterSnomedCodes` clusters directly in Snowflake |
| Grouping level | `Search_Term` from cluster mapping (~148 conditions) |
| Chart types | Two: "By Directory" (existing) and "By Indication" (new toggle) |
| No-match display | Show assigned directorate in indication chart (mixed labels) |
| Multiple matches | Use most recent SNOMED code by GP record date |
| Data storage | No local SNOMED mapping — query Snowflake at refresh time |
### SNOMED Cluster Query
The `snomed_indication_mapping_query.sql` file contains the master query:
- Maps Search_Term → Cluster_ID for ~148 conditions
- Joins `ClinicalCodingClusterSnomedCodes` to get SNOMED codes per cluster
- Includes explicit manual mappings for conditions not in clusters
- Returns: Search_Term, SNOMEDCode, SNOMEDDescription
## Quality Checks
Run after each task:
```bash
# Syntax check
python -m py_compile <modified_file.py>
# Import verification
python -c "from data_processing.diagnosis_lookup import *"
python -c "from data_processing.pathway_pipeline import *"
# For Reflex changes
python -m reflex compile
```
---
## Phase 1: Snowflake Integration
### 1.1 Create Indication Lookup Query
- [x] Add `get_patient_indication_groups()` function to `data_processing/diagnosis_lookup.py`:
- Takes: list of patient pseudonyms (PseudoNHSNoLinked values)
- Uses the cluster query from `snomed_indication_mapping_query.sql` as a CTE
- Joins with `PrimaryCareClinicalCoding` to find patients with matching diagnoses
- Returns: DataFrame with PatientPseudonym, Search_Term, EventDateTime
- Uses most recent match per patient (ORDER BY EventDateTime DESC)
- [x] Handle edge cases: Snowflake unavailable, empty patient list
- [x] Verify: Function returns expected Search_Terms for test patients (92.8% match rate, 139 unique Search_Terms)
### 1.2 Update Data Pipeline to Include Indications
- [x] Modify `cli/refresh_pathways.py` to call indication lookup during refresh:
- After fetching HCD data, extract unique PseudoNHSNoLinked values
- Call `get_patient_indication_groups()` with patient list
- Create `indication_df` mapping UPID → Indication_Group
- For patients with no GP match: Indication_Group = fallback directorate
- [x] Log coverage: X% diagnosis-matched, Y% fallback
- [x] Verify: indication_df has correct structure for pathway processing (verified via full pipeline run)
---
## Phase 2: Schema & Processing Updates
### 2.1 Add Chart Type Support to Schema
- [x] Add `chart_type` column to `pathway_nodes` table (ALREADY DONE)
- [x] Update UNIQUE constraint to include chart_type (ALREADY DONE)
- [x] Add indexes for chart_type filtering (ALREADY DONE)
- [x] Verify: Existing migration works correctly (tables created, 3,589 nodes inserted)
### 2.2 Create Indication Pathway Processing
- [x] Add `generate_icicle_chart_indication()` to `pathway_analyzer.py` (ALREADY DONE)
- [x] Add `process_indication_pathway_for_date_filter()` to `pathway_pipeline.py` (ALREADY DONE)
- [x] Add `extract_indication_fields()` for denormalized columns (ALREADY DONE)
- [x] Update `convert_to_records()` with `chart_type` parameter (ALREADY DONE)
- [x] Verify: Code compiles, imports work correctly
### 2.3 Update Refresh Command for Dual Charts
- [x] Add `--chart-type` argument: "all", "directory", "indication" (ALREADY DONE)
- [x] Update indication processing to use new `get_patient_indication_groups()`:
- Replace `batch_lookup_indication_groups()` with the new Snowflake-direct approach
- Pass indication_df to `process_indication_pathway_for_date_filter()`
- [x] Process all 6 date filters for both chart types (existing loop already handles this)
- [x] Verify: Both chart types generate pathway data (indication verified with 695 nodes for all_6mo)
---
## Phase 3: Test Full Pipeline
### 3.1 Test Refresh with Real Data
- [x] Run `python -m cli.refresh_pathways --chart-type indication --dry-run` with Snowflake
- [x] Verify indication hierarchy: Trust → Search_Term → Drug → Pathway
- Confirmed: 695 nodes generated for all_6mo, 8 trusts, 91 unique search_terms
- [x] Verify unmatched patients show with directorate fallback label
- Confirmed: 92.7% diagnosis-matched (34,545/37,257 UPIDs), 7.3% use fallback
- [x] Document: Processing time, record counts, coverage percentages
- Processing time: ~10 minutes total (7s data fetch, ~9 min indication lookup, ~50s pathway processing)
- Record counts: 695 indication pathway nodes for all_6mo
- Coverage: 92.8% GP diagnosis match rate (34,006/36,628 patients)
- Top indications: drug misuse (8,749), influenza (6,336), diabetes (2,516), sepsis (1,991), cardiovascular disease (954)
- [x] Run full refresh with `--chart-type all` to populate database (requires non-dry-run)
- Fixed DataFrame mutation bug in prepare_data() (df.copy() added)
- Results: 3,633 total nodes (1,101 directory + 2,532 indication) across all 12 datasets
- Database populated: 3,589 nodes in pathway_nodes table
---
## Phase 4: Reflex UI Updates
### 4.1 Add Chart Type State
- [x] Add state variables to `AppState`:
- `selected_chart_type: str = "directory"` (options: "directory", "indication")
- `chart_type_options: list[dict]` for dropdown
- [x] Add `set_chart_type()` event handler
- [x] Update `load_pathway_data()` to filter by chart_type
- [x] Verify: State changes correctly, data queries include chart_type filter
### 4.2 Add Chart Type Toggle UI
- [x] Create `chart_type_toggle()` component:
- Segmented control with pill-style buttons: "By Directory" | "By Indication"
- Placed in filter strip, first element before date filters
- [x] Wire to `set_chart_type()` handler
- [x] Verify: Toggle switches chart data, UI updates reactively (reflex compile passed)
### 4.3 Update Chart Display for Indication Labels
- [x] Ensure icicle chart handles mixed labels:
- Search_Term labels (e.g., "rheumatoid arthritis") for matched patients
- Directorate labels (e.g., "RHEUMATOLOGY (no GP dx)") for unmatched
- Note: labels come from pathway_nodes pre-computed data, no template changes needed
- [x] Update hierarchy description (dynamic: "Trust → Directorate → ..." or "Trust → Indication → ...")
- [x] Update chart title to include chart type prefix
- [x] Verify: Chart renders correctly with both label types (reflex compile passed)
---
## Phase 5: Validation & Documentation
### 5.1 End-to-End Validation
- [x] Run full app with both chart types
- Fixed UNIQUE constraint bug: was `UNIQUE(date_filter_id, ids)`, needed `UNIQUE(date_filter_id, chart_type, ids)`
- Directory chart was missing level 0/1 nodes due to indication chart overwriting them
- Dropped and recreated pathway_nodes table, re-ran full refresh (3,633 nodes)
- Both chart types now have levels 0-5 with correct patient counts
- [x] Verify chart toggle works correctly
- Data loading tested: directory (293 nodes) and indication (695 nodes) for all_6mo
- All 12 date filter combinations generate valid icicle charts
- Root patients match between chart types (11,118 for all_6mo)
- [x] Verify filter interactions (drugs, directorates) work for both types
- Drug filter works for both chart types (ADALIMUMAB: 70 dir, 128 ind nodes)
- Directory filter works for directory charts (RHEUMATOLOGY: 86 nodes)
- Note: Directory filter returns 0 for indication charts (expected — directory column stores Search_Terms not directorate names)
- [x] Verify KPIs update correctly for both chart types
- Both show: 11,118 patients, £130.6M total cost for all_6mo
- KPIs consistent across chart types (same underlying patient data)
- [ ] Test at multiple viewport sizes (requires live browser — deferred to manual testing)
- reflex run crashes on Windows due to Granian/watchfiles FileNotFoundError (environment issue, not code)
### 5.2 Update Documentation
- [ ] Update CLAUDE.md with new architecture
- [ ] Document new CLI arguments
- [ ] Document chart_type toggle behavior
- [ ] Update data flow diagrams
---
## Completion Criteria
All tasks marked `[x]` AND:
- [x] App compiles without errors (`reflex compile` succeeds)
- [x] Both chart types generate pathway data (12 total: 6 dates × 2 types)
- Directory: 1,101 nodes (293+329+93+105+134+147)
- Indication: 2,532 nodes (695+785+167+198+315+372)
- [x] Chart type toggle switches between Directory and Indication views
- Data layer verified: both chart types load correctly with all hierarchy levels
- [x] GP diagnosis matching works via Snowflake cluster query
- [x] Unmatched patients show in indication chart with directorate fallback label
- [x] Coverage metrics logged (% diagnosis-matched vs fallback)
- 92.7% diagnosis-matched (34,545/37,257 UPIDs)
- [x] All filters work correctly for both chart types
- Drug filter and date filter work for both. Directory filter only applies to directory charts (expected).
- [x] Performance acceptable (< 10 min full refresh, < 500ms filter change)
- Full refresh: 903 seconds (~15 min) for all 12 datasets
- SQLite query: sub-millisecond for filter changes
---
## Reference
### SNOMED Cluster Query Structure
```sql
-- From snomed_indication_mapping_query.sql
WITH SearchTermClusters AS (
SELECT Search_Term, Cluster_ID FROM (VALUES
('rheumatoid arthritis', 'eFI2_InflammatoryArthritis'),
('macular degeneration', 'CUST_ICB_VISUAL_IMPAIRMENT'),
-- ... ~148 mappings
) AS t(Search_Term, Cluster_ID)
),
ClusterCodes AS (
SELECT stc.Search_Term, c."SNOMEDCode", c."SNOMEDDescription"
FROM SearchTermClusters stc
JOIN DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes" c
ON stc.Cluster_ID = c."Cluster_ID"
WHERE c."SNOMEDCode" IS NOT NULL
),
ExplicitCodes AS (
-- Manual mappings for conditions not in clusters
SELECT Search_Term, SNOMEDCode, SNOMEDDescription FROM (VALUES
('ankylosing spondylitis', '162930007', 'Manual mapping'),
-- ...
) AS t(Search_Term, SNOMEDCode, SNOMEDDescription)
)
SELECT * FROM ClusterCodes
UNION ALL
SELECT * FROM ExplicitCodes
```
### Current Pathway Hierarchy (Directory-based)
```
Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
└── Directory (RHEUMATOLOGY, OPHTHALMOLOGY, etc.)
└── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
└── Pathway (drug sequences)
```
### New Pathway Hierarchy (Indication-based)
```
Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
└── Search_Term (rheumatoid arthritis, macular degeneration, etc.)
│ OR Directorate (RHEUMATOLOGY - for unmatched patients)
└── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
└── Pathway (drug sequences)
```
### Key Files
| File | Purpose |
|------|---------|
| `snomed_indication_mapping_query.sql` | Master SNOMED cluster query |
| `data_processing/diagnosis_lookup.py` | GP diagnosis lookup functions |
| `data_processing/pathway_pipeline.py` | Indication pathway processing |
| `cli/refresh_pathways.py` | CLI for dual chart type refresh |
| `pathways_app/pathways_app.py` | Reflex UI with chart type toggle |
### Expected Data Volumes
| Metric | Expected |
|--------|----------|
| Search_Term conditions | ~148 (from cluster mapping) |
| Pathway nodes (directory, per date filter) | ~300 |
| Pathway nodes (indication, per date filter) | ~400-600 (more granular) |
| Total pathway nodes (6 dates × 2 types) | ~4,000-5,000 |