fix: correct patient identifier for GP diagnosis lookup (Task 3.3)

Two critical fixes for the indication-based pathway feature:

1. clean_snomed_code() now handles scientific notation (e.g., "1.06e+16")
   - CSV export from pandas/Excel converts large SNOMED codes to scientific notation
   - Without this fix, codes like "10629311000119108" were stored as "1.06e+16"
   - Now properly converts to full integer strings

2. batch_lookup_indication_groups() now uses PseudoNHSNoLinked instead of PersonKey
   - PersonKey is LocalPatientID (provider-specific like "J188448")
   - PseudoNHSNoLinked is the pseudonymised NHS number that matches PatientPseudonym in GP records
   - Without this fix, 0% of patients matched GP records
   - A test run showed a ~20% match rate for ADALIMUMAB patients when using the correct identifier
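Below is a minimal standard-library sketch of both fixes. It assumes rows arrive as plain dicts and is not the actual implementation. The helper `match_gp_codes` and the `SNOMEDCode` column name are hypothetical, and so is this signature for `clean_snomed_code`. One caveat applies to the first fix: a string such as "1.06e+16" keeps only three significant digits. Converting it yields "10600000000000000", not the original code. Recovering exact codes means reading the column as text upstream, for example with pandas `read_csv(..., dtype=str)`.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def clean_snomed_code(raw):
    """Hypothetical sketch: normalise a SNOMED code read from CSV to a digit string."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    # Plain ASCII digit strings pass through untouched (no float round-trip).
    if text.isascii() and text.isdigit():
        return text
    try:
        # Decimal parses "1.06e+16" and "123.0" exactly, unlike float().
        value = Decimal(text)
    except InvalidOperation:
        return None
    # Reject NaN/Infinity, negatives and non-integral values.
    if not value.is_finite() or value < 0 or value != value.to_integral_value():
        return None
    return str(int(value))


def match_gp_codes(patients, gp_records):
    """Hypothetical helper: map each PersonKey to the GP SNOMED codes on record."""
    # Index GP records once by PatientPseudonym so each patient lookup is O(1).
    codes_by_pseudonym = defaultdict(set)
    for rec in gp_records:
        code = clean_snomed_code(rec.get("SNOMEDCode"))  # assumed column name
        if code:
            codes_by_pseudonym[rec["PatientPseudonym"]].add(code)

    matches = {}
    for patient in patients:
        # Join on PseudoNHSNoLinked, not PersonKey (a provider-local ID).
        key = patient.get("PseudoNHSNoLinked")
        matches[patient["PersonKey"]] = codes_by_pseudonym.get(key, set())
    return matches
```

Building the pseudonym index once keeps the batch lookup linear in the number of GP rows plus patients. It avoids a per-patient scan.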
Andrew Charlwood
2026-02-05 15:49:24 +00:00
parent b9f4041670
commit 5b1569ed5c
3 changed files with 42 additions and 24 deletions
+1 -1
@@ -130,7 +130,7 @@ python -m reflex compile
- [ ] Verify: Test refresh with --dry-run, check coverage stats
### 3.3 Test Full Refresh Pipeline
-- [ ] Run `python -m cli.refresh_pathways` with real data
+- [~] Run `python -m cli.refresh_pathways` with real data
- [ ] Verify pathway_nodes table has both chart_type values
- [ ] Verify indication chart has expected hierarchy (Trust → SearchTerm → Drug)
- [ ] Verify unmatched patients appear with directorate fallback label