HighCostDrugsDemo/progress.txt
Andrew Charlwood 9943e85761 feat: add ref_drug_snomed_mapping schema (Task 1.1)
- Add REF_DRUG_SNOMED_MAPPING_SCHEMA with 11 columns for direct SNOMED mapping
- Add 5 indexes for lookup performance (drug, cleaned_drug, snomed, search_term, composite)
- Add create_drug_snomed_mapping_table() helper function
- Update helper functions (drop, get_counts, verify_exists) to include new table
- Table is included in REFERENCE_TABLES_SCHEMA and created by migration
2026-02-05 14:06:31 +00:00

# Progress Log - Direct SNOMED Indication Mapping
## Project Context
This project extends the existing HCD Pathway Analysis application with direct SNOMED code matching from GP records. The previous project (Phases 1-5) established the pre-computed pathway architecture and modern UI. This phase adds:
1. **Diagnosis-based directorate assignment** - Primary method using GP SNOMED codes
2. **Indication-based icicle chart** - New chart type showing Trust → Search_Term → Drug → Pathway
## Key Files Reference
**Existing (reuse these):**
- `data_processing/schema.py` - SQLite schema (add new table)
- `data_processing/diagnosis_lookup.py` - Existing cluster-based lookup (extend with direct SNOMED)
- `data_processing/pathway_pipeline.py` - Pathway processing (add indication type)
- `cli/refresh_pathways.py` - CLI refresh command (add chart type support)
- `pathways_app/pathways_app.py` - Reflex app (add chart type toggle)
- `tools/data.py` - Data transformations including department_identification()
**New data:**
- `data/drug_snomed_mapping_enriched.csv` - 163K rows, 187 Search_Terms, 364 drugs
## Known Patterns
### SNOMED Mapping Structure
The enriched mapping CSV has columns:
- Drug, Indication, TA_ID (from NICE TAs)
- Search_Term (simplified grouping, 187 unique values)
- SNOMEDCode, SNOMEDDescription
- CleanedDrugName, PrimaryDirectorate, AllDirectorates
### Direct SNOMED Lookup Logic
For a patient on drug X:
1. Get all SNOMED codes for that drug from `ref_drug_snomed_mapping`
2. Query PrimaryCareClinicalCoding for those codes in the patient's GP record
3. If one or more codes match → take the most recent match by EventDateTime and use Search_Term and PrimaryDirectorate from its mapping row
4. If no match → fall back to `department_identification()`
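
The lookup logic above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: `MappingRow`, `GpEvent`, `resolve_indication` and the `fallback` callable (standing in for `department_identification()`) are hypothetical names, and the real data lives in database tables rather than in-memory lists.

```python
from datetime import datetime
from typing import Callable, Iterable, NamedTuple, Tuple


class MappingRow(NamedTuple):
    """One row of ref_drug_snomed_mapping (only the fields used here)."""
    cleaned_drug_name: str
    snomed_code: str
    search_term: str
    primary_directorate: str


class GpEvent(NamedTuple):
    """One coded entry from the patient's GP record."""
    snomed_code: str
    event_datetime: datetime


def resolve_indication(
    drug: str,
    mapping_rows: Iterable[MappingRow],
    gp_events: Iterable[GpEvent],
    fallback: Callable[[str], str],
) -> Tuple[str, str, bool]:
    """Return (label, directorate, matched) for one patient on one drug."""
    # Collect the SNOMED codes mapped to this drug. If a code appears on
    # several rows, this sketch keeps the first one (a simplification).
    by_code = {}
    for row in mapping_rows:
        if row.cleaned_drug_name == drug:
            by_code.setdefault(row.snomed_code, row)

    # Keep only GP events whose code is mapped to the drug.
    matches = [e for e in gp_events if e.snomed_code in by_code]

    if not matches:
        # No direct SNOMED match: use the fallback directorate as the label.
        directorate = fallback(drug)
        return directorate, directorate, False

    # Several matches: the most recent event wins.
    latest = max(matches, key=lambda e: e.event_datetime)
    row = by_code[latest.snomed_code]
    return row.search_term, row.primary_directorate, True
```

The `matched` flag makes it easy to tell Search_Term labels apart from fallback directorate labels later on.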
### Chart Type Architecture
- `chart_type` column in pathway_nodes: "directory" or "indication"
- 12 total pathway datasets: 6 date filters × 2 chart types
- Indication chart: mixed labels (Search_Term for matched, Directorate for unmatched)
### Date Filter Combinations
| ID | Initiated | Last Seen | Default |
|----|-----------|-----------|---------|
| `all_6mo` | All years | Last 6 months | Yes |
| `all_12mo` | All years | Last 12 months | No |
| `1yr_6mo` | Last 1 year | Last 6 months | No |
| `1yr_12mo` | Last 1 year | Last 12 months | No |
| `2yr_6mo` | Last 2 years | Last 6 months | No |
| `2yr_12mo` | Last 2 years | Last 12 months | No |
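
As a quick check on the arithmetic, the 12 pathway datasets can be enumerated from the filter IDs in the table and the two chart types. The constant and function names below are assumptions made for illustration; only the ID strings and chart type values come from this log.

```python
from itertools import product

# Components of the date filter IDs listed in the table above.
INITIATED = ("all", "1yr", "2yr")
LAST_SEEN = ("6mo", "12mo")
CHART_TYPES = ("directory", "indication")
DEFAULT_FILTER_ID = "all_6mo"


def date_filter_ids():
    """Build the six filter IDs, e.g. 'all_6mo' and '2yr_12mo'."""
    return [f"{init}_{seen}" for init, seen in product(INITIATED, LAST_SEEN)]


def pathway_datasets():
    """Pair every date filter with every chart type (6 x 2 = 12)."""
    return list(product(date_filter_ids(), CHART_TYPES))
```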
### Expected Volumes
- SNOMED mapping: 163K rows
- Search_Terms: 187 unique
- Pathway nodes per date filter: ~300 (directory), ~400-600 (indication)
---
## Iteration Log
## Iteration 1 — 2026-02-05
### Task: 1.1 Create SQLite Table for SNOMED Mapping
### Why this task:
- First task in Phase 1 (Data Infrastructure) — all other phases depend on having the data layer in place
- No external dependencies — pure schema definition work
- Follows "data infrastructure first" principle
### Status: COMPLETE
### What was done:
- Added `REF_DRUG_SNOMED_MAPPING_SCHEMA` to `data_processing/schema.py` with 11 columns:
  - id, drug_name, indication, ta_id, search_term, snomed_code, snomed_description
  - cleaned_drug_name, primary_directorate, all_directorates, created_at
- Added 5 custom indexes for lookup performance:
  - idx_ref_drug_snomed_mapping_drug (drug_name)
  - idx_ref_drug_snomed_mapping_cleaned (cleaned_drug_name)
  - idx_ref_drug_snomed_mapping_snomed (snomed_code)
  - idx_ref_drug_snomed_mapping_search_term (search_term)
  - idx_ref_drug_snomed_mapping_drug_snomed (composite: cleaned_drug_name, snomed_code)
- Added `create_drug_snomed_mapping_table()` helper function
- Added schema to `REFERENCE_TABLES_SCHEMA` (included in `ALL_TABLES_SCHEMA`)
- Updated helper functions to include new table:
  - `drop_reference_tables()`: drops new table
  - `get_reference_table_counts()`: counts new table (with try/except for safety)
  - `verify_reference_tables_exist()`: checks for new table
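
For orientation, a schema with these columns and indexes could look roughly like the sketch below. The table, column and index names come from this log. The column types, constraints, and the names `SKETCH_SCHEMA` and `create_sketch_table` are assumptions, so the actual definition in `data_processing/schema.py` may differ.

```python
import sqlite3

# CREATE TABLE and CREATE INDEX statements kept together in one SQL string.
# Column types and constraints are guesses; only the names are from the log.
SKETCH_SCHEMA = """
CREATE TABLE IF NOT EXISTS ref_drug_snomed_mapping (
    id INTEGER PRIMARY KEY,
    drug_name TEXT NOT NULL,
    indication TEXT,
    ta_id TEXT,
    search_term TEXT,
    snomed_code TEXT NOT NULL,
    snomed_description TEXT,
    cleaned_drug_name TEXT,
    primary_directorate TEXT,
    all_directorates TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_ref_drug_snomed_mapping_drug
    ON ref_drug_snomed_mapping (drug_name);
CREATE INDEX IF NOT EXISTS idx_ref_drug_snomed_mapping_cleaned
    ON ref_drug_snomed_mapping (cleaned_drug_name);
CREATE INDEX IF NOT EXISTS idx_ref_drug_snomed_mapping_snomed
    ON ref_drug_snomed_mapping (snomed_code);
CREATE INDEX IF NOT EXISTS idx_ref_drug_snomed_mapping_search_term
    ON ref_drug_snomed_mapping (search_term);
CREATE INDEX IF NOT EXISTS idx_ref_drug_snomed_mapping_drug_snomed
    ON ref_drug_snomed_mapping (cleaned_drug_name, snomed_code);
"""


def create_sketch_table(conn: sqlite3.Connection) -> None:
    """Run every statement in the schema string against the connection."""
    conn.executescript(SKETCH_SCHEMA)
    conn.commit()
```

Storing `snomed_code` as TEXT is a deliberate choice in this sketch: it avoids any numeric coercion of long identifiers.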
### Validation results:
- Tier 1 (Code): `python -m py_compile data_processing/schema.py` — PASSED
- Tier 1 (Code): Import check — PASSED
- Tier 2 (Data): Migration created table with 0 rows — PASSED
- Tier 2 (Data): All 11 columns and 6 indexes present — PASSED
### Files changed:
- `data_processing/schema.py` — added schema, updated helpers
### Committed: cf35937 "feat: add ref_drug_snomed_mapping schema (Task 1.1)"
### Patterns discovered:
- Existing schema pattern: CREATE TABLE + CREATE INDEX in single SQL string
- Helper functions follow consistent pattern: create, drop, get_counts, verify_exists
- `get_reference_table_counts` needed try/except for tables that might not exist yet
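
The try/except pattern for missing tables might look like the following sketch. `count_rows_safely` is a hypothetical name and the real `get_reference_table_counts` may behave differently, for example in what it reports for a missing table.

```python
import sqlite3
from typing import Dict, Iterable, Optional


def count_rows_safely(
    conn: sqlite3.Connection, table_names: Iterable[str]
) -> Dict[str, Optional[int]]:
    """Count rows per table, recording None for tables that do not exist."""
    counts: Dict[str, Optional[int]] = {}
    for name in table_names:
        try:
            # Table names cannot be bound as SQL parameters, so they must
            # come from a trusted, hard-coded list rather than user input.
            row = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
            counts[name] = row[0]
        except sqlite3.OperationalError:
            # SQLite raises OperationalError ("no such table") when the
            # table has not been created yet.
            counts[name] = None
    return counts
```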
### Next iteration should:
- Start Task 1.2: Create `data_processing/load_snomed_mapping.py` script to load the CSV
- The CSV has columns: Drug, Indication, TA_ID, Search_Term, SNOMEDCode, SNOMEDDescription, CleanedDrugName, PrimaryDirectorate, AllDirectorates
- Note: SNOMEDCode in CSV has `.0` suffix (e.g., "156370009.0") — may need cleanup
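
One possible cleanup for the `.0` suffix is sketched below. `clean_snomed_code` is a hypothetical helper, not part of the codebase. It works on strings rather than converting through `float`, because SNOMED CT identifiers can be up to 18 digits long and a 64-bit float only represents integers exactly up to 2^53.

```python
from typing import Optional


def clean_snomed_code(raw: object) -> Optional[str]:
    """Normalise a SNOMED code read from the CSV to a plain digit string.

    Returns None for blank values and raises ValueError for anything that
    is not purely digits after the trailing '.0' has been removed.
    """
    text = str(raw).strip()
    if text == "":
        return None
    if text.endswith(".0"):
        # Drop the float-style suffix without any numeric conversion.
        text = text[:-2]
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"Unexpected SNOMED code value: {raw!r}")
    return text
```

For example, `clean_snomed_code("156370009.0")` returns `"156370009"`. Reading the column as text when loading the CSV would keep longer codes from being rounded before this cleanup runs.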
### Blocked items:
- None