Files
HighCostDrugsDemo/IMPLEMENTATION_PLAN.md
T
Andrew Charlwood 506769470d feat: add get_directorate_from_diagnosis() function (Task 2.1)
- Added DirectorateAssignment dataclass for return type
- Added get_directorate_from_diagnosis() function to diagnosis_lookup.py
- Logic: Try diagnosis-based lookup first (direct SNOMED match)
- Returns FALLBACK source if no match found, letting caller handle fallback
- Extracts PatientPseudonym from UPID (last part after provider code)
- Updated __all__ exports with new dataclass and function
- Tested: function handles no-match cases correctly
2026-02-05 14:19:18 +00:00

9.6 KiB
Raw Blame History

Implementation Plan - Direct SNOMED Indication Mapping

Project Overview

Extend the pathway analysis application to use direct SNOMED code matching from GP records to:

  1. Improve directorate assignment - Use diagnosis-based directorate as primary method
  2. Add indication-based icicle chart - New chart type showing Trust → Search_Term → Drug → Pathway

Data Source

data/drug_snomed_mapping_enriched.csv - 163K rows mapping:

  • Drug → Indication → TA_ID → Search_Term → SNOMEDCode → PrimaryDirectorate

Key Design Decisions

Aspect Decision
Primary directorate method Diagnosis-based (SNOMED match → PrimaryDirectorate)
Fallback department_identification() chain
Grouping level Search_Term column (187 unique values)
Chart types Two: "By Directory" and "By Indication" (user toggle)
No-match display Show assigned directorate in indication chart (mixed labels)
Multiple matches Use most recent SNOMED code by GP record date
Data storage SQLite table ref_drug_snomed_mapping, accessed at ingestion

Quality Checks

Run after each task:

# Syntax check
python -m py_compile <modified_file.py>

# Import verification
python -c "from data_processing.diagnosis_lookup import *"
python -c "from data_processing.pathway_pipeline import *"

# For Reflex changes
python -m reflex compile

Phase 1: Data Infrastructure

1.1 Create SQLite Table for SNOMED Mapping

  • Add REF_DRUG_SNOMED_MAPPING_SCHEMA to data_processing/schema.py:
    • Columns: drug_name, indication, ta_id, search_term, snomed_code, snomed_description, cleaned_drug_name, primary_directorate, all_directorates
    • Index on: cleaned_drug_name, snomed_code, search_term
  • Add create_drug_snomed_mapping_table() helper function
  • Add to ALL_TABLES_SCHEMA and migration
  • Verify: python -m data_processing.migrate creates table

1.2 Load Enriched Mapping Data

  • Create data_processing/load_snomed_mapping.py script:
    • Read data/drug_snomed_mapping_enriched.csv
    • Insert into ref_drug_snomed_mapping table
    • Log: row count, unique drugs, unique search terms
  • Add CLI entry point: python -m data_processing.load_snomed_mapping
  • Verify: Query confirms 163K+ rows, 187 search terms

1.3 Extend Diagnosis Lookup Module

  • Add get_drug_snomed_codes(drug_name) to diagnosis_lookup.py:
    • Query ref_drug_snomed_mapping for all SNOMED codes for a drug
    • Return list of DrugSnomedMapping(snomed_code, snomed_description, search_term, primary_directorate, indication, ta_id)
  • Add patient_has_indication_direct(patient_pseudonym, snomed_codes, connector):
    • Query PrimaryCareClinicalCoding directly for exact SNOMED code matches
    • Return most recent match by EventDateTime
    • Return: DirectSnomedMatchResult(matched_code, search_term, primary_directorate, event_date) or unmatched
  • Verify: Tested with ADALIMUMAB (1320 mappings, 10 Search_Terms), RANIBIZUMAB (104 mappings), case-insensitivity

Phase 2: Pathway Processing Updates

2.1 Update Directorate Assignment Logic

  • Modify tools/data.py department_identification() or create wrapper:
    • Add get_directorate_from_diagnosis(upid, drug_name, connector) function
    • Logic: Try diagnosis-based first → fallback to department_identification()
    • Return: (directorate, source) where source is "DIAGNOSIS" or "FALLBACK"
  • Track assignment source for metrics (how many diagnosis-based vs fallback)
  • Verify: Test with sample patient data

2.2 Add Chart Type Support to Schema

  • Add chart_type column to pathway_nodes table:
    • Values: "directory" (existing), "indication" (new)
    • Update schema in data_processing/schema.py
  • Update pathway_date_filters or create pathway_chart_types reference table
  • Verify: Migration adds column, existing data defaults to "directory"

2.3 Create Indication Pathway Processing

  • Add process_indication_pathways() to pathway_pipeline.py:
    • Group by: Trust → Search_Term → Drug → Pathway
    • For unmatched patients: use directorate name as Search_Term fallback
    • Output: Same structure as directory pathways but with indication grouping
  • Add extract_indication_fields() for denormalized columns:
    • Extract: trust_name, search_term (or fallback_directorate), drug_sequence
  • Verify: Process sample data, check hierarchy structure

Phase 3: CLI & Data Refresh Updates

3.1 Update Refresh Command for Dual Chart Types

  • Modify cli/refresh_pathways.py:
    • Process both "directory" and "indication" chart types
    • For each of 6 date filters: generate 2 chart datasets
    • Total: 12 pathway datasets (6 dates × 2 chart types)
  • Add --chart-type argument: "all" (default), "directory", "indication"
  • Update progress logging to show both chart types
  • Verify: Dry run shows both chart types being processed

3.2 Integrate Diagnosis-Based Directorate in Pipeline

  • Update fetch_and_transform_data() to include diagnosis lookup:
    • After UPID creation, batch lookup SNOMED matches for all patients
    • Store: matched_search_term, matched_directorate, match_source
  • Handle Snowflake connection for GP record queries (batched for performance)
  • Log coverage: X% diagnosis-matched, Y% fallback
  • Verify: Test refresh with --dry-run, check coverage stats

3.3 Test Full Refresh Pipeline

  • Run python -m cli.refresh_pathways with real data
  • Verify pathway_nodes table has both chart_type values
  • Verify indication chart has expected hierarchy (Trust → SearchTerm → Drug)
  • Verify unmatched patients appear with directorate fallback label
  • Document: Processing time, record counts, coverage percentages

Phase 4: Reflex UI Updates

4.1 Add Chart Type State

  • Add state variables to AppState:
    • selected_chart_type: str = "directory" (options: "directory", "indication")
    • chart_type_options: list[dict] for dropdown
  • Add set_chart_type() event handler
  • Update load_pathway_data() to filter by chart_type
  • Verify: State changes correctly, data queries include chart_type filter

4.2 Add Chart Type Toggle UI

  • Create chart_type_toggle() component:
    • Radio buttons or segmented control: "By Directory" | "By Indication"
    • Place in filter strip or chart header area
  • Wire to set_chart_type() handler
  • Verify: Toggle switches chart data, UI updates reactively

4.3 Update Chart Display for Indication Labels

  • Ensure icicle chart handles mixed labels:
    • Search_Term labels (e.g., "rheumatoid arthritis") for matched patients
    • Directorate labels (e.g., "RHEUMATOLOGY (no GP dx)") for unmatched
  • Update hover templates if needed for indication context
  • Verify: Chart renders correctly with both label types

Phase 5: Validation & Documentation

5.1 Measure Coverage Improvement

  • Compare match rates: cluster-only vs cluster+direct SNOMED
  • Generate report: % of patients with diagnosis-based directorate
  • Identify drugs with best/worst coverage improvement
  • Document results in progress.txt

5.2 End-to-End Validation

  • Run full app with both chart types
  • Verify chart toggle works correctly
  • Verify filter interactions (drugs, directorates) work for both types
  • Verify KPIs update correctly for both chart types
  • Test at multiple viewport sizes

5.3 Update Documentation

  • Update CLAUDE.md with new architecture
  • Document new CLI arguments
  • Document chart_type toggle behavior
  • Update data flow diagrams

Completion Criteria

All tasks marked [x] AND:

  • App compiles without errors (reflex compile succeeds)
  • Both chart types generate pathway data (12 total: 6 dates × 2 types)
  • Chart type toggle switches between Directory and Indication views
  • Diagnosis-based directorate is primary method with fallback working
  • Unmatched patients show in indication chart with directorate fallback label
  • Coverage metrics logged (% diagnosis-matched vs fallback)
  • All filters work correctly for both chart types
  • Performance acceptable (< 10 min full refresh, < 500ms filter change)

Reference

Current Pathway Hierarchy (Directory-based)

Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
    └── Directory (RHEUMATOLOGY, OPHTHALMOLOGY, etc.)
        └── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
            └── Pathway (drug sequences)

New Pathway Hierarchy (Indication-based)

Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
    └── Search_Term (rheumatoid arthritis, macular degeneration, etc.)
        │   OR Directorate (RHEUMATOLOGY - for unmatched patients)
        └── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
            └── Pathway (drug sequences)

Key Files

File Purpose
data_processing/schema.py SQLite schema for ref_drug_snomed_mapping
data_processing/diagnosis_lookup.py Direct SNOMED lookup functions
data_processing/pathway_pipeline.py Indication pathway processing
cli/refresh_pathways.py CLI for dual chart type refresh
pathways_app/pathways_app.py Reflex UI with chart type toggle
data/drug_snomed_mapping_enriched.csv Source mapping data

Expected Data Volumes

Metric Expected
SNOMED mapping rows ~163K
Unique Search_Terms 187
Unique drugs ~364
Pathway nodes (directory, per date filter) ~300
Pathway nodes (indication, per date filter) ~400-600 (more granular)
Total pathway nodes (6 dates × 2 types) ~4,000-5,000