# Progress Log - Direct SNOMED Indication Mapping ## Project Context This project extends the existing HCD Pathway Analysis application with direct SNOMED code matching from GP records. The previous project (Phases 1-5) established the pre-computed pathway architecture and modern UI. This phase adds: 1. **Diagnosis-based directorate assignment** - Primary method using GP SNOMED codes 2. **Indication-based icicle chart** - New chart type showing Trust → Search_Term → Drug → Pathway ## Key Files Reference **Existing (reuse these):** - `data_processing/schema.py` - SQLite schema (add new table) - `data_processing/diagnosis_lookup.py` - Existing cluster-based lookup (extend with direct SNOMED) - `data_processing/pathway_pipeline.py` - Pathway processing (add indication type) - `cli/refresh_pathways.py` - CLI refresh command (add chart type support) - `pathways_app/pathways_app.py` - Reflex app (add chart type toggle) - `tools/data.py` - Data transformations including department_identification() **New data:** - `data/drug_snomed_mapping_enriched.csv` - 163K rows, 187 Search_Terms, 364 drugs ## Known Patterns ### SNOMED Mapping Structure The enriched mapping CSV has columns: - Drug, Indication, TA_ID (from NICE TAs) - Search_Term (simplified grouping, 187 unique values) - SNOMEDCode, SNOMEDDescription - CleanedDrugName, PrimaryDirectorate, AllDirectorates ### Direct SNOMED Lookup Logic For a patient on drug X: 1. Get all SNOMED codes for that drug from ref_drug_snomed_mapping 2. Query PrimaryCareClinicalCoding for those codes (patient's GP record) 3. If match found → use Search_Term and PrimaryDirectorate from matched row 4. If no match → fall back to department_identification() 5. Use most recent SNOMED code by EventDateTime if multiple matches ### Chart Type Architecture - `chart_type` column in pathway_nodes: "directory" or "indication" - 12 total pathway datasets: 6 date filters × 2 chart types - Indication chart: mixed labels (Search_Term for matched, Directorate for unmatched) ### Date Filter Combinations | ID | Initiated | Last Seen | Default | |----|-----------|-----------|---------| | `all_6mo` | All years | Last 6 months | Yes | | `all_12mo` | All years | Last 12 months | No | | `1yr_6mo` | Last 1 year | Last 6 months | No | | `1yr_12mo` | Last 1 year | Last 12 months | No | | `2yr_6mo` | Last 2 years | Last 6 months | No | | `2yr_12mo` | Last 2 years | Last 12 months | No | ### Expected Volumes - SNOMED mapping: 163K rows - Search_Terms: 187 unique - Pathway nodes per date filter: ~300 (directory), ~400-600 (indication) --- ## Iteration Log ## Iteration 1 — 2026-02-05 ### Task: 1.1 Create SQLite Table for SNOMED Mapping ### Why this task: - First task in Phase 1 (Data Infrastructure) — all other phases depend on having the data layer in place - No external dependencies — pure schema definition work - Follows "data infrastructure first" principle ### Status: COMPLETE ### What was done: - Added `REF_DRUG_SNOMED_MAPPING_SCHEMA` to `data_processing/schema.py` with 11 columns: - id, drug_name, indication, ta_id, search_term, snomed_code, snomed_description - cleaned_drug_name, primary_directorate, all_directorates, created_at - Added 5 custom indexes for lookup performance: - idx_ref_drug_snomed_mapping_drug (drug_name) - idx_ref_drug_snomed_mapping_cleaned (cleaned_drug_name) - idx_ref_drug_snomed_mapping_snomed (snomed_code) - idx_ref_drug_snomed_mapping_search_term (search_term) - idx_ref_drug_snomed_mapping_drug_snomed (composite: cleaned_drug_name, snomed_code) - Added `create_drug_snomed_mapping_table()` helper function - Added schema to `REFERENCE_TABLES_SCHEMA` (included in `ALL_TABLES_SCHEMA`) - Updated helper functions to include new table: - `drop_reference_tables()` — drops new table - `get_reference_table_counts()` — counts new table (with try/except for safety) - `verify_reference_tables_exist()` — checks for new table ### Validation results: - Tier 1 (Code): `python -m py_compile data_processing/schema.py` — PASSED - Tier 1 (Code): Import check — PASSED - Tier 2 (Data): Migration created table with 0 rows — PASSED - Tier 2 (Data): All 11 columns and 6 indexes present — PASSED ### Files changed: - `data_processing/schema.py` — added schema, updated helpers ### Committed: cf35937 "feat: add ref_drug_snomed_mapping schema (Task 1.1)" ### Patterns discovered: - Existing schema pattern: CREATE TABLE + CREATE INDEX in single SQL string - Helper functions follow consistent pattern: create, drop, get_counts, verify_exists - `get_reference_table_counts` needed try/except for tables that might not exist yet ### Next iteration should: - Start Task 1.2: Create `data_processing/load_snomed_mapping.py` script to load the CSV - The CSV has columns: Drug, Indication, TA_ID, Search_Term, SNOMEDCode, SNOMEDDescription, CleanedDrugName, PrimaryDirectorate, AllDirectorates - Note: SNOMEDCode in CSV has `.0` suffix (e.g., "156370009.0") — may need cleanup ### Blocked items: - None