- Add CLUSTER_MAPPING_SQL constant embedding full snomed_indication_mapping_query.sql
- Add get_patient_indication_groups() function that queries Snowflake directly
- Uses QUALIFY ROW_NUMBER() to get most recent diagnosis per patient
- Returns DataFrame with PatientPseudonym, Search_Term, EventDateTime
- Handles edge cases: empty list, Snowflake unavailable
- Batch processing with configurable batch_size (default 500)
- Comprehensive logging for match statistics
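The batching and most-recent-row selection listed above might look roughly like the sketch below. It is not the code from this commit: `iter_batches`, `build_latest_diagnosis_query`, the `mapped_diagnoses` source name, and the `%s` placeholder style are illustrative assumptions. Only the column names, the `QUALIFY ROW_NUMBER()` approach, and the default batch size of 500 come from the notes above.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_batches(items: Sequence[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items.

    An empty input yields nothing, so no query is built for an empty list.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def build_latest_diagnosis_query(batch: Sequence[str]) -> Tuple[str, List[str]]:
    """Return (sql, params) selecting the most recent diagnosis per patient.

    `mapped_diagnoses` is a hypothetical name standing in for the output of
    the cluster mapping query; the real source is not shown in these notes.
    """
    # One placeholder per patient so the values are bound, not interpolated.
    placeholders = ", ".join(["%s"] * len(batch))
    sql = (
        "SELECT PatientPseudonym, Search_Term, EventDateTime\n"
        "FROM mapped_diagnoses\n"
        f"WHERE PatientPseudonym IN ({placeholders})\n"
        "QUALIFY ROW_NUMBER() OVER (\n"
        "    PARTITION BY PatientPseudonym ORDER BY EventDateTime DESC\n"
        ") = 1"
    )
    return sql, list(batch)
```

Each batch would then be executed separately and the per-batch results concatenated into one DataFrame.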
Two critical fixes for the indication-based pathway feature:
1. clean_snomed_code() now handles scientific notation (e.g., "1.06e+16")
   - CSV export from pandas/Excel converts large SNOMED codes to scientific notation
   - Without this fix, codes like "10629311000119108" were stored as "1.06e+16"
   - Now properly converts to full integer strings
2. batch_lookup_indication_groups() now uses PseudoNHSNoLinked instead of PersonKey
   - PersonKey is LocalPatientID (provider-specific like "J188448")
   - PseudoNHSNoLinked is the pseudonymised NHS number that matches PatientPseudonym in GP records
   - Without this fix, 0% of patients matched GP records
   - Test shows ~20% match rate for ADALIMUMAB patients with correct identifier
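One way to expand scientific notation into a full integer string is sketched below with Python's `decimal` module. The function name `clean_snomed_code_sketch` and its pass-through behaviour for non-numeric input are assumptions for illustration, not the implementation in this commit. Note that expansion can only restore the digits still present in the string: "1.06e+16" expands to "10600000000000000", so digits truncated by the export are not recovered.

```python
from decimal import Decimal, InvalidOperation


def clean_snomed_code_sketch(raw) -> str:
    """Normalise a SNOMED code to a plain digit string where possible.

    Hypothetical helper: values that are not whole numbers are returned
    stripped but otherwise unchanged, so the caller can decide what to do.
    """
    text = str(raw).strip()
    if not text or text.isdigit():
        return text  # already clean (or empty)
    try:
        value = Decimal(text)  # exact parse, no float rounding
    except InvalidOperation:
        return text  # not numeric at all
    if not value.is_finite() or value != value.to_integral_value():
        return text  # NaN, infinity, or a fractional value
    return str(int(value))
```

Parsing with `Decimal` rather than `float` avoids introducing further rounding for 17-digit and longer codes.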
- Added DirectorateAssignment dataclass for return type
- Added get_directorate_from_diagnosis() function to diagnosis_lookup.py
- Logic: Try diagnosis-based lookup first (direct SNOMED match)
- Returns FALLBACK source if no match found, letting caller handle fallback
- Extracts PatientPseudonym from UPID (last part after provider code)
- Updated __all__ exports with new dataclass and function
- Tested: function handles no-match cases correctly
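The "try diagnosis first, otherwise report FALLBACK" flow could be sketched as follows. The field names on `DirectorateAssignment`, the "DIAGNOSIS" source label, the function name, and the injected `lookup` callable are all assumptions made for this sketch. Only the dataclass name and the FALLBACK source come from the notes above, and the sketch takes the patient pseudonym directly rather than parsing a UPID.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class DirectorateAssignment:
    """Hypothetical field layout; the real dataclass may differ."""
    directorate: Optional[str]
    source: str  # "DIAGNOSIS" (assumed label) or "FALLBACK"
    search_term: Optional[str] = None


# A lookup returns (search_term, directorate) for a patient, or None.
Lookup = Callable[[str], Optional[Tuple[str, str]]]


def get_directorate_sketch(patient_pseudonym: str, lookup: Lookup) -> DirectorateAssignment:
    """Return a diagnosis-based assignment, or a FALLBACK marker if none."""
    match = lookup(patient_pseudonym)
    if match is None:
        # No direct SNOMED match: leave the fallback decision to the caller.
        return DirectorateAssignment(directorate=None, source="FALLBACK")
    search_term, directorate = match
    return DirectorateAssignment(directorate, "DIAGNOSIS", search_term)
```

Returning a marker instead of raising keeps the no-match case an ordinary value, so the caller can apply its own fallback rule.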
Add two new functions to diagnosis_lookup.py for direct SNOMED code matching:
- get_drug_snomed_codes(drug_name): Query ref_drug_snomed_mapping for all
  SNOMED codes mapped to a drug. Returns list of DrugSnomedMapping with
  snomed_code, snomed_description, search_term, primary_directorate.
  Tested: ADALIMUMAB returns 1320 mappings across 10 Search_Terms.
- patient_has_indication_direct(patient_pseudonym, mappings, connector):
  Query PrimaryCareClinicalCoding for exact SNOMED code matches.
  Returns most recent match by EventDateTime with DirectSnomedMatchResult.
Both functions follow existing patterns in the module and are exported
in __all__. The lookup is case-insensitive for drug names.
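The "exact code match, most recent by EventDateTime" selection can be illustrated in memory, without a database. In the sketch below the `DrugSnomedMapping` fields follow the list above, while `most_recent_direct_match` and the `(snomed_code, event_datetime)` event tuples are hypothetical stand-ins for the Snowflake query and its result rows.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class DrugSnomedMapping:
    snomed_code: str
    snomed_description: str
    search_term: str
    primary_directorate: str


def most_recent_direct_match(
    events: Iterable[Tuple[str, datetime]],
    mappings: Iterable[DrugSnomedMapping],
) -> Optional[Tuple[DrugSnomedMapping, datetime]]:
    """Return the mapping and timestamp of the latest exactly matching event."""
    by_code = {m.snomed_code: m for m in mappings}  # exact-match index
    best: Optional[Tuple[DrugSnomedMapping, datetime]] = None
    for code, when in events:
        mapping = by_code.get(code)
        if mapping is None:
            continue  # event code is not mapped to this drug
        if best is None or when > best[1]:
            best = (mapping, when)
    return best
```

In the real function the filtering and ordering would presumably happen in SQL; the sketch only shows the selection rule.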