6d68b5eaa5
- Create data_processing/load_snomed_mapping.py with: - migrate_drug_snomed_mapping() for CSV to SQLite migration - get_drug_snomed_mapping_counts() for statistics - verify_drug_snomed_mapping_migration() for validation - clean_snomed_code() to remove trailing .0 from SNOMED codes - CLI interface: python -m data_processing.load_snomed_mapping - Loaded 144,056 mappings from enriched CSV: - 707 unique drugs - 187 unique search terms - 21,265 unique SNOMED codes
9.4 KiB
9.4 KiB
Implementation Plan - Direct SNOMED Indication Mapping
Project Overview
Extend the pathway analysis application to use direct SNOMED code matching from GP records to:
- Improve directorate assignment - Use diagnosis-based directorate as primary method
- Add indication-based icicle chart - New chart type showing Trust → Search_Term → Drug → Pathway
Data Source
data/drug_snomed_mapping_enriched.csv - 163K rows mapping:
- Drug → Indication → TA_ID → Search_Term → SNOMEDCode → PrimaryDirectorate
Key Design Decisions
| Aspect | Decision |
|---|---|
| Primary directorate method | Diagnosis-based (SNOMED match → PrimaryDirectorate) |
| Fallback | department_identification() chain |
| Grouping level | Search_Term column (187 unique values) |
| Chart types | Two: "By Directory" and "By Indication" (user toggle) |
| No-match display | Show assigned directorate in indication chart (mixed labels) |
| Multiple matches | Use most recent SNOMED code by GP record date |
| Data storage | SQLite table ref_drug_snomed_mapping, accessed at ingestion |
Quality Checks
Run after each task:
# Syntax check
python -m py_compile <modified_file.py>
# Import verification
python -c "from data_processing.diagnosis_lookup import *"
python -c "from data_processing.pathway_pipeline import *"
# For Reflex changes
python -m reflex compile
Phase 1: Data Infrastructure
1.1 Create SQLite Table for SNOMED Mapping
- Add
REF_DRUG_SNOMED_MAPPING_SCHEMAtodata_processing/schema.py:- Columns: drug_name, indication, ta_id, search_term, snomed_code, snomed_description, cleaned_drug_name, primary_directorate, all_directorates
- Index on: cleaned_drug_name, snomed_code, search_term
- Add
create_drug_snomed_mapping_table()helper function - Add to
ALL_TABLES_SCHEMAand migration - Verify:
python -m data_processing.migratecreates table
1.2 Load Enriched Mapping Data
- Create
data_processing/load_snomed_mapping.pyscript:- Read
data/drug_snomed_mapping_enriched.csv - Insert into
ref_drug_snomed_mappingtable - Log: row count, unique drugs, unique search terms
- Read
- Add CLI entry point:
python -m data_processing.load_snomed_mapping - Verify: Query confirms 163K+ rows, 187 search terms
1.3 Extend Diagnosis Lookup Module
- Add
get_drug_snomed_codes(drug_name)todiagnosis_lookup.py:- Query
ref_drug_snomed_mappingfor all SNOMED codes for a drug - Return list of (snomed_code, snomed_description, search_term, primary_directorate)
- Query
- Add
patient_has_indication_direct(patient_pseudonym, snomed_codes, connector):- Query
PrimaryCareClinicalCodingdirectly for exact SNOMED code matches - Return most recent match by EventDateTime
- Return: (matched_code, search_term, primary_directorate, event_date) or None
- Query
- Verify: Unit test with known drug/patient pair
Phase 2: Pathway Processing Updates
2.1 Update Directorate Assignment Logic
- Modify
tools/data.pydepartment_identification()or create wrapper:- Add
get_directorate_from_diagnosis(upid, drug_name, connector)function - Logic: Try diagnosis-based first → fallback to department_identification()
- Return: (directorate, source) where source is "DIAGNOSIS" or "FALLBACK"
- Add
- Track assignment source for metrics (how many diagnosis-based vs fallback)
- Verify: Test with sample patient data
2.2 Add Chart Type Support to Schema
- Add
chart_typecolumn topathway_nodestable:- Values: "directory" (existing), "indication" (new)
- Update schema in
data_processing/schema.py
- Update
pathway_date_filtersor createpathway_chart_typesreference table - Verify: Migration adds column, existing data defaults to "directory"
2.3 Create Indication Pathway Processing
- Add
process_indication_pathways()topathway_pipeline.py:- Group by: Trust → Search_Term → Drug → Pathway
- For unmatched patients: use directorate name as Search_Term fallback
- Output: Same structure as directory pathways but with indication grouping
- Add
extract_indication_fields()for denormalized columns:- Extract: trust_name, search_term (or fallback_directorate), drug_sequence
- Verify: Process sample data, check hierarchy structure
Phase 3: CLI & Data Refresh Updates
3.1 Update Refresh Command for Dual Chart Types
- Modify
cli/refresh_pathways.py:- Process both "directory" and "indication" chart types
- For each of 6 date filters: generate 2 chart datasets
- Total: 12 pathway datasets (6 dates × 2 chart types)
- Add
--chart-typeargument: "all" (default), "directory", "indication" - Update progress logging to show both chart types
- Verify: Dry run shows both chart types being processed
3.2 Integrate Diagnosis-Based Directorate in Pipeline
- Update
fetch_and_transform_data()to include diagnosis lookup:- After UPID creation, batch lookup SNOMED matches for all patients
- Store: matched_search_term, matched_directorate, match_source
- Handle Snowflake connection for GP record queries (batched for performance)
- Log coverage: X% diagnosis-matched, Y% fallback
- Verify: Test refresh with --dry-run, check coverage stats
3.3 Test Full Refresh Pipeline
- Run
python -m cli.refresh_pathwayswith real data - Verify pathway_nodes table has both chart_type values
- Verify indication chart has expected hierarchy (Trust → SearchTerm → Drug)
- Verify unmatched patients appear with directorate fallback label
- Document: Processing time, record counts, coverage percentages
Phase 4: Reflex UI Updates
4.1 Add Chart Type State
- Add state variables to
AppState:selected_chart_type: str = "directory"(options: "directory", "indication")chart_type_options: list[dict]for dropdown
- Add
set_chart_type()event handler - Update
load_pathway_data()to filter by chart_type - Verify: State changes correctly, data queries include chart_type filter
4.2 Add Chart Type Toggle UI
- Create
chart_type_toggle()component:- Radio buttons or segmented control: "By Directory" | "By Indication"
- Place in filter strip or chart header area
- Wire to
set_chart_type()handler - Verify: Toggle switches chart data, UI updates reactively
4.3 Update Chart Display for Indication Labels
- Ensure icicle chart handles mixed labels:
- Search_Term labels (e.g., "rheumatoid arthritis") for matched patients
- Directorate labels (e.g., "RHEUMATOLOGY (no GP dx)") for unmatched
- Update hover templates if needed for indication context
- Verify: Chart renders correctly with both label types
Phase 5: Validation & Documentation
5.1 Measure Coverage Improvement
- Compare match rates: cluster-only vs cluster+direct SNOMED
- Generate report: % of patients with diagnosis-based directorate
- Identify drugs with best/worst coverage improvement
- Document results in progress.txt
5.2 End-to-End Validation
- Run full app with both chart types
- Verify chart toggle works correctly
- Verify filter interactions (drugs, directorates) work for both types
- Verify KPIs update correctly for both chart types
- Test at multiple viewport sizes
5.3 Update Documentation
- Update CLAUDE.md with new architecture
- Document new CLI arguments
- Document chart_type toggle behavior
- Update data flow diagrams
Completion Criteria
All tasks marked [x] AND:
- App compiles without errors (
reflex compilesucceeds) - Both chart types generate pathway data (12 total: 6 dates × 2 types)
- Chart type toggle switches between Directory and Indication views
- Diagnosis-based directorate is primary method with fallback working
- Unmatched patients show in indication chart with directorate fallback label
- Coverage metrics logged (% diagnosis-matched vs fallback)
- All filters work correctly for both chart types
- Performance acceptable (< 10 min full refresh, < 500ms filter change)
Reference
Current Pathway Hierarchy (Directory-based)
Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
└── Directory (RHEUMATOLOGY, OPHTHALMOLOGY, etc.)
└── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
└── Pathway (drug sequences)
New Pathway Hierarchy (Indication-based)
Root (N&W ICS)
└── Trust (NNUH, QEH, JPH, etc.)
└── Search_Term (rheumatoid arthritis, macular degeneration, etc.)
│ OR Directorate (RHEUMATOLOGY - for unmatched patients)
└── Drug (ADALIMUMAB, RANIBIZUMAB, etc.)
└── Pathway (drug sequences)
Key Files
| File | Purpose |
|---|---|
data_processing/schema.py |
SQLite schema for ref_drug_snomed_mapping |
data_processing/diagnosis_lookup.py |
Direct SNOMED lookup functions |
data_processing/pathway_pipeline.py |
Indication pathway processing |
cli/refresh_pathways.py |
CLI for dual chart type refresh |
pathways_app/pathways_app.py |
Reflex UI with chart type toggle |
data/drug_snomed_mapping_enriched.csv |
Source mapping data |
Expected Data Volumes
| Metric | Expected |
|---|---|
| SNOMED mapping rows | ~163K |
| Unique Search_Terms | 187 |
| Unique drugs | ~364 |
| Pathway nodes (directory, per date filter) | ~300 |
| Pathway nodes (indication, per date filter) | ~400-600 (more granular) |
| Total pathway nodes (6 dates × 2 types) | ~4,000-5,000 |