9943e85761
- Add REF_DRUG_SNOMED_MAPPING_SCHEMA with 11 columns for direct SNOMED mapping - Add 5 indexes for lookup performance (drug, cleaned_drug, snomed, search_term, composite) - Add create_drug_snomed_mapping_table() helper function - Update helper functions (drop, get_counts, verify_exists) to include new table - Table is included in REFERENCE_TABLES_SCHEMA and created by migration
106 lines
4.9 KiB
Plaintext
106 lines
4.9 KiB
Plaintext
# Progress Log - Direct SNOMED Indication Mapping
|
||
|
||
## Project Context
|
||
|
||
This project extends the existing HCD Pathway Analysis application with direct SNOMED code matching from GP records. The previous project (Phases 1-5) established the pre-computed pathway architecture and modern UI. This phase adds:
|
||
|
||
1. **Diagnosis-based directorate assignment** - Primary method using GP SNOMED codes
|
||
2. **Indication-based icicle chart** - New chart type showing Trust → Search_Term → Drug → Pathway
|
||
|
||
## Key Files Reference
|
||
|
||
**Existing (reuse these):**
|
||
- `data_processing/schema.py` - SQLite schema (add new table)
|
||
- `data_processing/diagnosis_lookup.py` - Existing cluster-based lookup (extend with direct SNOMED)
|
||
- `data_processing/pathway_pipeline.py` - Pathway processing (add indication type)
|
||
- `cli/refresh_pathways.py` - CLI refresh command (add chart type support)
|
||
- `pathways_app/pathways_app.py` - Reflex app (add chart type toggle)
|
||
- `tools/data.py` - Data transformations including department_identification()
|
||
|
||
**New data:**
|
||
- `data/drug_snomed_mapping_enriched.csv` - 163K rows, 187 Search_Terms, 364 drugs
|
||
|
||
## Known Patterns
|
||
|
||
### SNOMED Mapping Structure
|
||
The enriched mapping CSV has columns:
|
||
- Drug, Indication, TA_ID (from NICE TAs)
|
||
- Search_Term (simplified grouping, 187 unique values)
|
||
- SNOMEDCode, SNOMEDDescription
|
||
- CleanedDrugName, PrimaryDirectorate, AllDirectorates
|
||
|
||
### Direct SNOMED Lookup Logic
|
||
For a patient on drug X:
|
||
1. Get all SNOMED codes for that drug from ref_drug_snomed_mapping
|
||
2. Query PrimaryCareClinicalCoding for those codes (patient's GP record)
|
||
3. If match found → use Search_Term and PrimaryDirectorate from matched row
|
||
4. If no match → fall back to department_identification()
|
||
5. Use most recent SNOMED code by EventDateTime if multiple matches
|
||
|
||
### Chart Type Architecture
|
||
- `chart_type` column in pathway_nodes: "directory" or "indication"
|
||
- 12 total pathway datasets: 6 date filters × 2 chart types
|
||
- Indication chart: mixed labels (Search_Term for matched, Directorate for unmatched)
|
||
|
||
### Date Filter Combinations
|
||
| ID | Initiated | Last Seen | Default |
|
||
|----|-----------|-----------|---------|
|
||
| `all_6mo` | All years | Last 6 months | Yes |
|
||
| `all_12mo` | All years | Last 12 months | No |
|
||
| `1yr_6mo` | Last 1 year | Last 6 months | No |
|
||
| `1yr_12mo` | Last 1 year | Last 12 months | No |
|
||
| `2yr_6mo` | Last 2 years | Last 6 months | No |
|
||
| `2yr_12mo` | Last 2 years | Last 12 months | No |
|
||
|
||
### Expected Volumes
|
||
- SNOMED mapping: 163K rows
|
||
- Search_Terms: 187 unique
|
||
- Pathway nodes per date filter: ~300 (directory), ~400-600 (indication)
|
||
|
||
---
|
||
|
||
## Iteration Log
|
||
|
||
## Iteration 1 — 2026-02-05
|
||
### Task: 1.1 Create SQLite Table for SNOMED Mapping
|
||
### Why this task:
|
||
- First task in Phase 1 (Data Infrastructure) — all other phases depend on having the data layer in place
|
||
- No external dependencies — pure schema definition work
|
||
- Follows "data infrastructure first" principle
|
||
### Status: COMPLETE
|
||
### What was done:
|
||
- Added `REF_DRUG_SNOMED_MAPPING_SCHEMA` to `data_processing/schema.py` with 11 columns:
|
||
- id, drug_name, indication, ta_id, search_term, snomed_code, snomed_description
|
||
- cleaned_drug_name, primary_directorate, all_directorates, created_at
|
||
- Added 5 custom indexes for lookup performance:
|
||
- idx_ref_drug_snomed_mapping_drug (drug_name)
|
||
- idx_ref_drug_snomed_mapping_cleaned (cleaned_drug_name)
|
||
- idx_ref_drug_snomed_mapping_snomed (snomed_code)
|
||
- idx_ref_drug_snomed_mapping_search_term (search_term)
|
||
- idx_ref_drug_snomed_mapping_drug_snomed (composite: cleaned_drug_name, snomed_code)
|
||
- Added `create_drug_snomed_mapping_table()` helper function
|
||
- Added schema to `REFERENCE_TABLES_SCHEMA` (included in `ALL_TABLES_SCHEMA`)
|
||
- Updated helper functions to include new table:
|
||
- `drop_reference_tables()` — drops new table
|
||
- `get_reference_table_counts()` — counts new table (with try/except for safety)
|
||
- `verify_reference_tables_exist()` — checks for new table
|
||
### Validation results:
|
||
- Tier 1 (Code): `python -m py_compile data_processing/schema.py` — PASSED
|
||
- Tier 1 (Code): Import check — PASSED
|
||
- Tier 2 (Data): Migration created table with 0 rows — PASSED
|
||
- Tier 2 (Data): All 11 columns and 6 indexes present — PASSED
|
||
### Files changed:
|
||
- `data_processing/schema.py` — added schema, updated helpers
|
||
### Committed: cf35937 "feat: add ref_drug_snomed_mapping schema (Task 1.1)"
|
||
### Patterns discovered:
|
||
- Existing schema pattern: CREATE TABLE + CREATE INDEX in single SQL string
|
||
- Helper functions follow consistent pattern: create, drop, get_counts, verify_exists
|
||
- `get_reference_table_counts` needed try/except for tables that might not exist yet
|
||
### Next iteration should:
|
||
- Start Task 1.2: Create `data_processing/load_snomed_mapping.py` script to load the CSV
|
||
- The CSV has columns: Drug, Indication, TA_ID, Search_Term, SNOMEDCode, SNOMEDDescription, CleanedDrugName, PrimaryDirectorate, AllDirectorates
|
||
- Note: SNOMEDCode in CSV has `.0` suffix (e.g., "156370009.0") — may need cleanup
|
||
### Blocked items:
|
||
- None
|
||
|