6331d44165
prepare_data() mapped Provider Code → Name in-place. When called for directory charts first, then indication charts, the second call re-mapped already-mapped values to NaN, silently dropping all data. Added df.copy() to prevent mutation. Also fixes directory charts only generating data for the first date filter. Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication) across all 12 datasets (6 date filters × 2 chart types).
10 KiB
10 KiB
Implementation Plan - Indication-Based Pathway Charts
Project Overview
Extend the pathway analysis application to show indication-based icicle charts alongside directory-based charts. Patient diagnoses are matched from GP records using SNOMED cluster codes.
Key Design Decisions
| Aspect | Decision |
|---|---|
| SNOMED source | Query ClinicalCodingClusterSnomedCodes clusters directly in Snowflake |
| Grouping level | Search_Term from cluster mapping (~148 conditions) |
| Chart types | Two: "By Directory" (existing) and "By Indication" (new toggle) |
| No-match display | Show assigned directorate in indication chart (mixed labels) |
| Multiple matches | Use most recent SNOMED code by GP record date |
| Data storage | No local SNOMED mapping — query Snowflake at refresh time |
SNOMED Cluster Query
The snomed_indication_mapping_query.sql file contains the master query:
- Maps Search_Term → Cluster_ID for ~148 conditions
- Joins
ClinicalCodingClusterSnomedCodesto get SNOMED codes per cluster - Includes explicit manual mappings for conditions not in clusters
- Returns: Search_Term, SNOMEDCode, SNOMEDDescription
Quality Checks
Run after each task:
# Syntax check
python -m py_compile <modified_file.py>
# Import verification
python -c "from data_processing.diagnosis_lookup import *"
python -c "from data_processing.pathway_pipeline import *"
# For Reflex changes
python -m reflex compile
Phase 1: Snowflake Integration
1.1 Create Indication Lookup Query
- Add
get_patient_indication_groups()function todata_processing/diagnosis_lookup.py:- Takes: list of patient pseudonyms (PseudoNHSNoLinked values)
- Uses the cluster query from
snomed_indication_mapping_query.sqlas a CTE - Joins with
PrimaryCareClinicalCodingto find patients with matching diagnoses - Returns: DataFrame with PatientPseudonym, Search_Term, EventDateTime
- Uses most recent match per patient (ORDER BY EventDateTime DESC)
- Handle edge cases: Snowflake unavailable, empty patient list
- Verify: Function returns expected Search_Terms for test patients (92.8% match rate, 139 unique Search_Terms)
1.2 Update Data Pipeline to Include Indications
- Modify
cli/refresh_pathways.pyto call indication lookup during refresh:- After fetching HCD data, extract unique PseudoNHSNoLinked values
- Call
get_patient_indication_groups()with patient list - Create
indication_dfmapping UPID → Indication_Group - For patients with no GP match: Indication_Group = fallback directorate
- Log coverage: X% diagnosis-matched, Y% fallback
- Verify: indication_df has correct structure for pathway processing (verified via full pipeline run)
Phase 2: Schema & Processing Updates
2.1 Add Chart Type Support to Schema
- Add
chart_typecolumn topathway_nodestable (ALREADY DONE) - Update UNIQUE constraint to include chart_type (ALREADY DONE)
- Add indexes for chart_type filtering (ALREADY DONE)
- Verify: Existing migration works correctly (tables created, 3,589 nodes inserted)
2.2 Create Indication Pathway Processing
- Add
generate_icicle_chart_indication()topathway_analyzer.py(ALREADY DONE) - Add
process_indication_pathway_for_date_filter()topathway_pipeline.py(ALREADY DONE) - Add
extract_indication_fields()for denormalized columns (ALREADY DONE) - Update
convert_to_records()withchart_typeparameter (ALREADY DONE) - Verify: Code compiles, imports work correctly
2.3 Update Refresh Command for Dual Charts
- Add
--chart-typeargument: "all", "directory", "indication" (ALREADY DONE) - Update indication processing to use new
get_patient_indication_groups():- Replace
batch_lookup_indication_groups()with the new Snowflake-direct approach - Pass indication_df to
process_indication_pathway_for_date_filter()
- Replace
- Process all 6 date filters for both chart types (existing loop already handles this)
- Verify: Both chart types generate pathway data (indication verified with 695 nodes for all_6mo)
Phase 3: Test Full Pipeline
3.1 Test Refresh with Real Data
- Run
python -m cli.refresh_pathways --chart-type indication --dry-runwith Snowflake - Verify indication hierarchy: Trust → Search_Term → Drug → Pathway
- Confirmed: 695 nodes generated for all_6mo, 8 trusts, 91 unique search_terms
- Verify unmatched patients show with directorate fallback label
- Confirmed: 92.7% diagnosis-matched (34,545/37,257 UPIDs), 7.3% use fallback
- Document: Processing time, record counts, coverage percentages
- Processing time: ~10 minutes total (7s data fetch, ~9 min indication lookup, ~50s pathway processing)
- Record counts: 695 indication pathway nodes for all_6mo
- Coverage: 92.8% GP diagnosis match rate (34,006/36,628 patients)
- Top indications: drug misuse (8,749), influenza (6,336), diabetes (2,516), sepsis (1,991), cardiovascular disease (954)
- Run full refresh with
--chart-type allto populate database (requires non-dry-run)- Fixed DataFrame mutation bug in prepare_data() (df.copy() added)
- Results: 3,633 total nodes (1,101 directory + 2,532 indication) across all 12 datasets
- Database populated: 3,589 nodes in pathway_nodes table
Phase 4: Reflex UI Updates
4.1 Add Chart Type State
- Add state variables to
AppState:selected_chart_type: str = "directory"(options: "directory", "indication")chart_type_options: list[dict]for dropdown
- Add
set_chart_type()event handler - Update
load_pathway_data()to filter by chart_type - Verify: State changes correctly, data queries include chart_type filter
4.2 Add Chart Type Toggle UI
- Create
chart_type_toggle()component:- Segmented control with pill-style buttons: "By Directory" | "By Indication"
- Placed in filter strip, first element before date filters
- Wire to
set_chart_type()handler - Verify: Toggle switches chart data, UI updates reactively (reflex compile passed)
4.3 Update Chart Display for Indication Labels
- Ensure icicle chart handles mixed labels:
- Search_Term labels (e.g., "rheumatoid arthritis") for matched patients
- Directorate labels (e.g., "RHEUMATOLOGY (no GP dx)") for unmatched
- Note: labels come from pathway_nodes pre-computed data, no template changes needed
- Update hierarchy description (dynamic: "Trust → Directorate → ..." or "Trust → Indication → ...")
- Update chart title to include chart type prefix
- Verify: Chart renders correctly with both label types (reflex compile passed)
Phase 5: Validation & Documentation
5.1 End-to-End Validation
- Run full app with both chart types
- Verify chart toggle works correctly
- Verify filter interactions (drugs, directorates) work for both types
- Verify KPIs update correctly for both chart types
- Test at multiple viewport sizes
5.2 Update Documentation
- Update CLAUDE.md with new architecture
- Document new CLI arguments
- Document chart_type toggle behavior
- Update data flow diagrams
Completion Criteria
All tasks marked [x] AND:
- App compiles without errors (
reflex compilesucceeds) - Both chart types generate pathway data (12 total: 6 dates × 2 types)
- Directory: 1,101 nodes (293+329+93+105+134+147)
- Indication: 2,532 nodes (695+785+167+198+315+372)
- Chart type toggle switches between Directory and Indication views
- GP diagnosis matching works via Snowflake cluster query
- Unmatched patients show in indication chart with directorate fallback label
- Coverage metrics logged (% diagnosis-matched vs fallback)
- 92.7% diagnosis-matched (34,545/37,257 UPIDs)
- All filters work correctly for both chart types
- Performance acceptable (< 10 min full refresh, < 500ms filter change)
Reference
SNOMED Cluster Query Structure
-- From snomed_indication_mapping_query.sql
WITH SearchTermClusters AS (
SELECT Search_Term, Cluster_ID FROM (VALUES
('rheumatoid arthritis', 'eFI2_InflammatoryArthritis'),
('macular degeneration', 'CUST_ICB_VISUAL_IMPAIRMENT'),
-- ... ~148 mappings
) AS t(Search_Term, Cluster_ID)
),
ClusterCodes AS (
SELECT stc.Search_Term, c."SNOMEDCode", c."SNOMEDDescription"
FROM SearchTermClusters stc
JOIN DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes" c
ON stc.Cluster_ID = c."Cluster_ID"
WHERE c."SNOMEDCode" IS NOT NULL
),
ExplicitCodes AS (
-- Manual mappings for conditions not in clusters
SELECT Search_Term, SNOMEDCode, SNOMEDDescription FROM (VALUES
('ankylosing spondylitis', '162930007', 'Manual mapping'),
-- ...
) AS t(Search_Term, SNOMEDCode, SNOMEDDescription)
)
SELECT * FROM ClusterCodes
UNION ALL
SELECT * FROM ExplicitCodes
Current Pathway Hierarchy (Directory-based)
Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
└── Directory (RHEUMATOLOGY, OPHTHALMOLOGY, etc.)
└── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
└── Pathway (drug sequences)
New Pathway Hierarchy (Indication-based)
Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
└── Search_Term (rheumatoid arthritis, macular degeneration, etc.)
│ OR Directorate (RHEUMATOLOGY - for unmatched patients)
└── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
└── Pathway (drug sequences)
Key Files
| File | Purpose |
|---|---|
snomed_indication_mapping_query.sql |
Master SNOMED cluster query |
data_processing/diagnosis_lookup.py |
GP diagnosis lookup functions |
data_processing/pathway_pipeline.py |
Indication pathway processing |
cli/refresh_pathways.py |
CLI for dual chart type refresh |
pathways_app/pathways_app.py |
Reflex UI with chart type toggle |
Expected Data Volumes
| Metric | Expected |
|---|---|
| Search_Term conditions | ~148 (from cluster mapping) |
| Pathway nodes (directory, per date filter) | ~300 |
| Pathway nodes (indication, per date filter) | ~400-600 (more granular) |
| Total pathway nodes (6 dates × 2 types) | ~4,000-5,000 |