docs: add guardrails for patient identifier and SNOMED code handling

This commit is contained in:
Andrew Charlwood
2026-02-05 15:51:52 +00:00
parent 843b4f23cc
commit 99bab08402
+105 -3
View File
@@ -73,6 +73,30 @@ def filtered_count(self) -> int:
--- ---
## Data Processing Guardrails
### Use existing pathway_analyzer functions
- **When**: Processing pathway data for the icicle chart
- **Rule**: Reuse functions from `analysis/pathway_analyzer.py` — don't reinvent
- **Why**: The existing code handles edge cases (empty groups, statistics calculation, color mapping)
### Extract denormalized fields from ids string
- **When**: Creating denormalized columns (trust_name, directory, drug_sequence)
- **Rule**: Parse the `ids` column which contains the full hierarchical path
- **Why**: The ids format is "Trust|Directory|Drug1|Drug2|..." — split on "|" to extract components
### Handle None/NULL values in pathway data
- **When**: Reading pathway_nodes from SQLite
- **Rule**: Always use `or ""` / `or 0` / `or "N/A"` when accessing optional columns
- **Why**: Many columns (costpp, average_spacing, etc.) can be NULL for certain hierarchy levels
### Use parameterized queries for SQLite
- **When**: Building WHERE clauses with user-selected filters
- **Rule**: Use `?` placeholders and pass params tuple — never string interpolation
- **Why**: Prevents SQL injection and handles special characters in drug/directory names
---
## Code Quality Guardrails ## Code Quality Guardrails
### Verify compilation before committing ### Verify compilation before committing
@@ -110,12 +134,90 @@ def filtered_count(self) -> int:
- **Why**: The next iteration has zero memory. If you don't write it down, it's lost. - **Why**: The next iteration has zero memory. If you don't write it down, it's lost.
### Check existing code for patterns ### Check existing code for patterns
- **When**: Unsure how to implement something in Reflex - **When**: Unsure how to implement something in Reflex or pathway processing
- **Rule**: Look at `pathways_app.py` for working examples before inventing new patterns - **Rule**: Look at `pathways_app/app_v2.py`, `analysis/pathway_analyzer.py`, `visualization/plotly_generator.py`
- **Why**: The existing codebase has solved many Reflex quirks already - **Why**: The existing codebase has solved many quirks already
--- ---
---
## UI Redesign Guardrails
### Clear Reflex cache before running
- **When**: Before running `reflex run` or `reflex compile`, especially after style/layout changes
- **Rule**: Delete `.states` and `.web` folders first: `Remove-Item -Recurse -Force .states, .web -ErrorAction SilentlyContinue`
- **Why**: Stale cache causes old styles/components to persist, making it appear changes didn't work
### Test visual changes with reflex run
- **When**: After any layout or styling changes
- **Rule**: Run `reflex run` and visually verify in browser. Screenshots are not enough.
- **Why**: CSS calculations and flex layouts often behave differently than expected
### Don't break existing functionality
- **When**: Refactoring layout components
- **Rule**: Ensure all filter handlers, KPI updates, and chart rendering still work after changes
- **Why**: It's easy to accidentally disconnect event handlers when restructuring components
### Use calc() for responsive heights
- **When**: Making elements fill remaining viewport space
- **Rule**: Use `height="calc(100vh - Xpx)"` where X is the sum of fixed-height elements above
- **Why**: Fixed heights don't adapt to content changes; calc() keeps things responsive
### Test at multiple viewport widths
- **When**: Making full-width changes
- **Rule**: Test at 1366px, 1920px, and 2560px widths minimum
- **Why**: Full-width layouts can break or look sparse at extreme sizes
### Keep filter dropdown z-index high
- **When**: Restructuring filter section
- **Rule**: Dropdown panels need `z_index="50"` or higher to appear above chart
- **Why**: Plotly charts have their own stacking context and can overlap dropdowns
---
## SNOMED Mapping Guardrails
### Use Search_Term for grouping, not Indication
- **When**: Creating indication-based pathway hierarchy
- **Rule**: Group patients by `Search_Term` column, NOT `Indication` column
- **Why**: Indication has 603 granular values; Search_Term has 187 broader categories suitable for chart grouping
### Handle unmatched patients in indication chart
- **When**: Patient has no GP diagnosis matching their drug's SNOMED codes
- **Rule**: Use their assigned directorate (from fallback logic) as the grouping label, not "Unknown"
- **Why**: User wants mixed labels - indication Search_Terms for matched patients, directorate names for unmatched
### Use most recent SNOMED code for multiple matches
- **When**: Patient has GP records matching multiple SNOMED codes for their drug
- **Rule**: Use the match with the most recent `EventDateTime` from PrimaryCareClinicalCoding
- **Why**: Most recent diagnosis reflects current clinical state
### Batch Snowflake queries for performance
- **When**: Looking up GP records for many patients
- **Rule**: Batch SNOMED lookups (e.g., 1000 patients at a time) rather than one query per patient
- **Why**: Individual queries for 35K+ patients would be extremely slow
### Track diagnosis match source
- **When**: Assigning directorate to a patient
- **Rule**: Track whether assignment came from "DIAGNOSIS" (SNOMED match) or "FALLBACK" (department_identification)
- **Why**: Needed for coverage metrics and debugging
### Chart type column in pathway_nodes
- **When**: Inserting pathway records to SQLite
- **Rule**: Include `chart_type` column with value "directory" or "indication"
- **Why**: Needed to filter pathways when user toggles chart type in UI
### Use PseudoNHSNoLinked for GP record matching
- **When**: Querying GP records (PrimaryCareClinicalCoding) for patient diagnoses
- **Rule**: Use `PseudoNHSNoLinked` column, NOT `PersonKey` (LocalPatientID)
- **Why**: PersonKey is provider-specific local ID. Only PseudoNHSNoLinked matches PatientPseudonym in GP records. Using PersonKey caused 0% GP match rate.
### Handle scientific notation in SNOMED codes
- **When**: Loading SNOMED codes from CSV files
- **Rule**: Convert scientific notation (e.g., "1.06e+16") back to full integers before storing
- **Why**: Large SNOMED codes (15-16 digits) exceed float precision. Pandas/Excel exports them as scientific notation. String matching will fail unless converted.
<!-- <!--
ADD NEW GUARDRAILS BELOW as failures are observed during the loop. ADD NEW GUARDRAILS BELOW as failures are observed during the loop.