docs: add guardrails for patient identifier and SNOMED code handling

2026-02-05 15:51:52 +00:00
parent 843b4f23cc
commit 99bab08402
1 changed files with 105 additions and 3 deletions
@@ -73,6 +73,30 @@ def filtered_count(self) -> int:

 ---

+## Data Processing Guardrails
+
+### Use existing pathway_analyzer functions
+- **When**: Processing pathway data for the icicle chart
+- **Rule**: Reuse functions from `analysis/pathway_analyzer.py`; don't reinvent them
+- **Why**: The existing code already handles edge cases such as empty groups, as well as statistics calculation and color mapping
+
+### Extract denormalized fields from ids string
+- **When**: Creating denormalized columns (trust_name, directory, drug_sequence)
+- **Rule**: Parse the `ids` column which contains the full hierarchical path
+- **Why**: The ids format is "Trust|Directory|Drug1|Drug2|...", so split on "|" to extract the components
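
A minimal parsing sketch, assuming `ids` follows exactly the "Trust|Directory|Drug1|Drug2|..." layout described above and that shallower nodes simply have fewer segments. `parse_ids` and `PathwayParts` are hypothetical names, not existing project code.

```python
from typing import List, Optional, TypedDict


class PathwayParts(TypedDict):
    trust_name: str
    directory: str
    drug_sequence: List[str]


def parse_ids(ids: Optional[str]) -> PathwayParts:
    """Split an ids path "Trust|Directory|Drug1|Drug2|..." into its parts.

    Nodes higher in the hierarchy (e.g. a trust-level node whose ids is just
    "Trust") get empty defaults for the levels they don't have.
    """
    parts = ids.split("|") if ids else []
    return PathwayParts(
        trust_name=parts[0] if len(parts) > 0 else "",
        directory=parts[1] if len(parts) > 1 else "",
        drug_sequence=parts[2:],  # empty list when there are no drug levels
    )
```

How `drug_sequence` is stored in SQLite (for example, re-joined into a single string) is a separate choice; the list form here is only for illustration.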
+
+### Handle None/NULL values in pathway data
+- **When**: Reading pathway_nodes from SQLite
+- **Rule**: Always use `or ""` / `or 0` / `or "N/A"` when accessing optional columns
+- **Why**: Many columns (costpp, average_spacing, etc.) can be NULL for certain hierarchy levels
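
A sketch of the NULL-safe reads, assuming rows come from `sqlite3` with `row_factory = sqlite3.Row` and that `costpp` is numeric while `average_spacing` is text (both types are assumptions). One caveat is worth keeping in mind: `value or "N/A"` on a numeric column also turns a genuine `0` into "N/A", so the display fallback below checks for `None` explicitly.

```python
import sqlite3
from typing import Any, Dict


def read_node(row: sqlite3.Row) -> Dict[str, Any]:
    """Read optional pathway_nodes columns, substituting defaults for NULLs."""
    costpp = row["costpp"]
    return {
        "ids": row["ids"] or "",
        "costpp": costpp or 0,  # NULL -> 0 so arithmetic never sees None
        # Display fallback for a numeric column: test for None explicitly,
        # because `costpp or "N/A"` would also hide a real 0.
        "costpp_label": "N/A" if costpp is None else str(costpp),
        "average_spacing": row["average_spacing"] or "N/A",
    }
```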
+
+### Use parameterized queries for SQLite
+- **When**: Building WHERE clauses with user-selected filters
+- **Rule**: Use `?` placeholders and pass a params tuple; never use string interpolation
+- **Why**: Prevents SQL injection and handles special characters in drug/directory names
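
A sketch of a dynamic WHERE clause that still follows the rule: only the placeholder text ("?, ?, ...") is generated, and user-selected values travel separately in the params tuple. It assumes `pathway_nodes` has `trust_name`, `directory`, and `chart_type` columns; the function names are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple


def _placeholders(n: int) -> str:
    # "?, ?, ?" for n == 3: placeholder text only, never user values
    return ", ".join(["?"] * n)


def build_filter_query(
    chart_type: str,
    trusts: Sequence[str] = (),
    directories: Sequence[str] = (),
) -> Tuple[str, Tuple[str, ...]]:
    clauses: List[str] = ["chart_type = ?"]
    params: List[str] = [chart_type]
    if trusts:
        clauses.append(f"trust_name IN ({_placeholders(len(trusts))})")
        params.extend(trusts)
    if directories:
        clauses.append(f"directory IN ({_placeholders(len(directories))})")
        params.extend(directories)
    sql = "SELECT ids FROM pathway_nodes WHERE " + " AND ".join(clauses)
    return sql, tuple(params)


def fetch_filtered_ids(
    conn: sqlite3.Connection,
    chart_type: str,
    trusts: Sequence[str] = (),
    directories: Sequence[str] = (),
) -> List[str]:
    sql, params = build_filter_query(chart_type, trusts, directories)
    return [row[0] for row in conn.execute(sql, params)]
```

A directory name such as "Children's Services" needs no escaping here, because SQLite binds it as a value rather than parsing it as SQL.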
+
+---
+
 ## Code Quality Guardrails

 ### Verify compilation before committing
@@ -110,12 +134,90 @@ def filtered_count(self) -> int:
 - **Why**: The next iteration has zero memory. If you don't write it down, it's lost.

 ### Check existing code for patterns
-- **When**: Unsure how to implement something in Reflex
-- **Rule**: Look at `pathways_app.py` for working examples before inventing new patterns
-- **Why**: The existing codebase has solved many Reflex quirks already
+- **When**: Unsure how to implement something in Reflex or pathway processing
+- **Rule**: Look at `pathways_app/app_v2.py`, `analysis/pathway_analyzer.py`, `visualization/plotly_generator.py`
+- **Why**: The existing codebase has solved many quirks already

 ---

+## UI Redesign Guardrails
+
+### Clear Reflex cache before running
+- **When**: Before running `reflex run` or `reflex compile`, especially after style/layout changes
+- **Rule**: Delete `.states` and `.web` folders first: `Remove-Item -Recurse -Force .states, .web -ErrorAction SilentlyContinue`
+- **Why**: A stale cache makes old styles/components persist, so it looks as if your changes had no effect
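
For non-PowerShell shells, a cross-platform sketch of the same cleanup in Python. `clear_reflex_cache` is a hypothetical helper; like the PowerShell command, it skips folders that don't exist instead of treating that as an error.

```python
import shutil
from pathlib import Path
from typing import List


def clear_reflex_cache(project_root: Path = Path(".")) -> List[str]:
    """Delete Reflex's .states and .web folders if present; return the names removed."""
    removed: List[str] = []
    for name in (".states", ".web"):
        target = project_root / name
        if target.is_dir():  # missing folders are skipped silently
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Call `clear_reflex_cache()` from the project root before `reflex run`.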
+
+### Test visual changes with reflex run
+- **When**: After any layout or styling changes
+- **Rule**: Run `reflex run` and visually verify in browser. Screenshots are not enough.
+- **Why**: CSS calculations and flex layouts often behave differently than expected
+
+### Don't break existing functionality
+- **When**: Refactoring layout components
+- **Rule**: Ensure all filter handlers, KPI updates, and chart rendering still work after changes
+- **Why**: It's easy to accidentally disconnect event handlers when restructuring components
+
+### Use calc() for responsive heights
+- **When**: Making elements fill remaining viewport space
+- **Rule**: Use `height="calc(100vh - Xpx)"`, where X is the sum of the heights of all fixed-height elements above it
+- **Why**: Fixed heights don't adapt to content changes; calc() keeps things responsive
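
A small sketch that computes X from the fixed heights instead of hand-adding them, so changing one header height updates the calc() automatically. `fill_remaining_height` is a hypothetical helper, and the 64px/56px heights in the usage comment are made up.

```python
def fill_remaining_height(*fixed_heights_px: int) -> str:
    """CSS height that fills the viewport below the given fixed-height elements."""
    return f"calc(100vh - {sum(fixed_heights_px)}px)"


# Hypothetical Reflex usage (header and filter-bar heights are illustrative):
# rx.box(chart, height=fill_remaining_height(64, 56))  # -> "calc(100vh - 120px)"
```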
+
+### Test at multiple viewport widths
+- **When**: Making full-width changes
+- **Rule**: At minimum, test at 1366px, 1920px, and 2560px widths
+- **Why**: Full-width layouts can break or look sparse at extreme sizes
+
+### Keep filter dropdown z-index high
+- **When**: Restructuring filter section
+- **Rule**: Dropdown panels need `z_index="50"` or higher to appear above chart
+- **Why**: Plotly charts have their own stacking context and can overlap dropdowns
+
+---
+
+## SNOMED Mapping Guardrails
+
+### Use Search_Term for grouping, not Indication
+- **When**: Creating indication-based pathway hierarchy
+- **Rule**: Group patients by `Search_Term` column, NOT `Indication` column
+- **Why**: Indication has 603 granular values; Search_Term has 187 broader categories suitable for chart grouping
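
A sketch of the grouping, assuming each matched row carries `Indication`, `Search_Term`, and a patient identifier (`patient_id` is a hypothetical key). Counting distinct patients per `Search_Term` collapses granular indications into the broader chart categories. The sample values in the test are illustrative, not taken from the real mapping file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def patients_per_group(
    rows: Iterable[Mapping[str, str]], key: str = "Search_Term"
) -> Dict[str, int]:
    """Count distinct patients per grouping label (Search_Term by default)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        groups[row[key]].add(row["patient_id"])
    return {label: len(ids) for label, ids in groups.items()}
```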
+
+### Handle unmatched patients in indication chart
+- **When**: Patient has no GP diagnosis matching their drug's SNOMED codes
+- **Rule**: Use their assigned directorate (from fallback logic) as the grouping label, not "Unknown"
+- **Why**: The user wants mixed labels: indication Search_Terms for matched patients and directorate names for unmatched ones
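
A sketch of the label rule, assuming the GP-diagnosis match is a dict with a `Search_Term` key (or `None` when nothing matched) and that the fallback directorate has already been computed. `indication_label` is a hypothetical name.

```python
from typing import Mapping, Optional


def indication_label(match: Optional[Mapping[str, str]], fallback_directorate: str) -> str:
    """Grouping label for the indication chart."""
    if match and match.get("Search_Term"):
        return match["Search_Term"]  # matched: broad Search_Term, not Indication
    # Unmatched: use the directorate from the fallback logic, never "Unknown"
    return fallback_directorate
```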
+
+### Use most recent SNOMED code for multiple matches
+- **When**: Patient has GP records matching multiple SNOMED codes for their drug
+- **Rule**: Use the match with the most recent `EventDateTime` from PrimaryCareClinicalCoding
+- **Why**: Most recent diagnosis reflects current clinical state
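
A client-side sketch, assuming the rows are already filtered to the SNOMED codes of each patient's drug and that `EventDateTime` values are datetimes. `SNOMEDCode` is a hypothetical column name, and the codes below are made up. If you prefer to do this in the query, Snowflake's `QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY EventDateTime DESC) = 1` is one option.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Mapping

Record = Mapping[str, Any]


def most_recent_match(matches: Iterable[Record]) -> Dict[str, Record]:
    """Keep one GP match per patient: the row with the latest EventDateTime."""
    latest: Dict[str, Record] = {}
    for rec in matches:
        pid = rec["PatientPseudonym"]
        best = latest.get(pid)
        # Strict ">" keeps the first row seen when two events share a timestamp
        if best is None or rec["EventDateTime"] > best["EventDateTime"]:
            latest[pid] = rec
    return latest


# Illustrative rows only
rows = [
    {"PatientPseudonym": "A", "SNOMEDCode": "111", "EventDateTime": datetime(2023, 1, 5)},
    {"PatientPseudonym": "A", "SNOMEDCode": "222", "EventDateTime": datetime(2024, 6, 1)},
    {"PatientPseudonym": "B", "SNOMEDCode": "111", "EventDateTime": datetime(2022, 3, 9)},
]
print(most_recent_match(rows)["A"]["SNOMEDCode"])  # 222
```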
+
+### Batch Snowflake queries for performance
+- **When**: Looking up GP records for many patients
+- **Rule**: Batch SNOMED lookups (e.g., 1000 patients at a time) rather than one query per patient
+- **Why**: One query per patient means 35K+ round trips to Snowflake, which is extremely slow
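
A batching sketch. The `%s` placeholders assume the Snowflake Python connector's default `pyformat` paramstyle, and the column list (`SNOMEDCode` in particular) is a hypothetical stand-in for the real query. On Python 3.12+, `itertools.batched` can replace the helper.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], size: int = 1000) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def gp_lookup_sql(n_patients: int) -> str:
    """SQL for one batch: one %s placeholder per patient pseudonym."""
    placeholders = ", ".join(["%s"] * n_patients)
    return (
        "SELECT PatientPseudonym, SNOMEDCode, EventDateTime "
        "FROM PrimaryCareClinicalCoding "
        f"WHERE PatientPseudonym IN ({placeholders})"
    )


# Hypothetical usage with a DB-API cursor:
# for batch in batched(pseudonyms):
#     cursor.execute(gp_lookup_sql(len(batch)), tuple(batch))
```

At 1000 patients per batch, 35K patients need about 35 queries instead of 35K.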
+
+### Track diagnosis match source
+- **When**: Assigning directorate to a patient
+- **Rule**: Track whether assignment came from "DIAGNOSIS" (SNOMED match) or "FALLBACK" (department_identification)
+- **Why**: Needed for coverage metrics and debugging
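
A sketch of carrying the match source alongside each assignment, with a coverage metric computed from it. The dataclass, field names, and function names are hypothetical; only the "DIAGNOSIS"/"FALLBACK" values come from the rule above.

```python
from dataclasses import dataclass
from typing import Iterable, Literal, Optional

MatchSource = Literal["DIAGNOSIS", "FALLBACK"]


@dataclass(frozen=True)
class DirectorateAssignment:
    patient_id: str
    directorate: str
    source: MatchSource  # "DIAGNOSIS" = SNOMED match, "FALLBACK" = department_identification


def assign_directorate(
    patient_id: str, diagnosis_directorate: Optional[str], fallback_directorate: str
) -> DirectorateAssignment:
    if diagnosis_directorate:
        return DirectorateAssignment(patient_id, diagnosis_directorate, "DIAGNOSIS")
    return DirectorateAssignment(patient_id, fallback_directorate, "FALLBACK")


def diagnosis_coverage(assignments: Iterable[DirectorateAssignment]) -> float:
    """Share of patients whose directorate came from a SNOMED diagnosis match."""
    items = list(assignments)
    if not items:
        return 0.0
    return sum(a.source == "DIAGNOSIS" for a in items) / len(items)
```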
+
+### Include the chart_type column in pathway_nodes
+- **When**: Inserting pathway records to SQLite
+- **Rule**: Include `chart_type` column with value "directory" or "indication"
+- **Why**: Needed to filter pathways when user toggles chart type in UI
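
A sketch that has SQLite enforce the two allowed values with a `CHECK` constraint, so a typo such as "Directory" fails at insert time instead of silently disappearing from the UI filter. The schema is trimmed to the columns needed for the illustration and is hypothetical; the real table has more columns.

```python
import sqlite3
from typing import Iterable

# Trimmed, hypothetical schema: only what is needed to show chart_type
SCHEMA = """
CREATE TABLE IF NOT EXISTS pathway_nodes (
    ids        TEXT NOT NULL,
    chart_type TEXT NOT NULL CHECK (chart_type IN ('directory', 'indication'))
)
"""


def insert_nodes(conn: sqlite3.Connection, ids_values: Iterable[str], chart_type: str) -> None:
    """Insert pathway rows tagged with their chart type (parameterized)."""
    conn.executemany(
        "INSERT INTO pathway_nodes (ids, chart_type) VALUES (?, ?)",
        [(ids, chart_type) for ids in ids_values],
    )
```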
+
+### Use PseudoNHSNoLinked for GP record matching
+- **When**: Querying GP records (PrimaryCareClinicalCoding) for patient diagnoses
+- **Rule**: Use `PseudoNHSNoLinked` column, NOT `PersonKey` (LocalPatientID)
+- **Why**: PersonKey is a provider-specific local ID; only PseudoNHSNoLinked matches PatientPseudonym in GP records. Using PersonKey caused a 0% GP match rate.
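
A sketch of the join key plus a match-rate check, assuming patient rows are dicts with `PersonKey` and `PseudoNHSNoLinked` and that the GP side supplies `PatientPseudonym` values. `gp_match_rate` is a hypothetical helper. A rate near 0% after a run is the symptom described above and usually means the wrong key was used.

```python
from typing import Iterable, Mapping, Optional


def gp_match_rate(
    patients: Iterable[Mapping[str, Optional[str]]], gp_pseudonyms: Iterable[str]
) -> float:
    """Fraction of patients with at least one GP record.

    Joins PseudoNHSNoLinked (patient side) to PatientPseudonym (GP side).
    PersonKey is deliberately ignored: it is a provider-local ID.
    """
    known = set(gp_pseudonyms)
    items = list(patients)
    if not items:
        return 0.0
    matched = sum(1 for p in items if p.get("PseudoNHSNoLinked") in known)
    return matched / len(items)
```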
+
+### Handle scientific notation in SNOMED codes
+- **When**: Loading SNOMED codes from CSV files
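+- **Rule**: Read SNOMED code columns as strings, and convert any scientific-notation values (e.g., "1.06e+16") to full integer strings before storing
+- **Why**: Large SNOMED codes (15-16 digits) reach the limit of exact float64 integers (2^53, about 9.0e15). When Pandas or Excel treats them as floats, they can be written in scientific notation (or with a trailing ".0"), so string matching fails unless they are converted. Conversion only fixes the format: digits already rounded away in an export cannot be recovered.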
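
A normalization sketch using `decimal.Decimal`, which parses strings like "1.06e+16" exactly instead of going through a float. `normalize_snomed_code` is a hypothetical helper. When loading with pandas, passing `dtype=str` to `read_csv` keeps the codes as text from the start, so this step only has to clean values that were already mangled upstream.

```python
from decimal import Decimal, InvalidOperation
from typing import Union


def normalize_snomed_code(value: Union[str, int, float]) -> str:
    """Return a SNOMED code as a plain digit string.

    "1.06e+16" -> "10600000000000000", "404684003.0" -> "404684003".
    This fixes the representation only; digits lost to earlier rounding
    cannot be restored here.
    """
    try:
        number = Decimal(value.strip() if isinstance(value, str) else value)
    except InvalidOperation:
        raise ValueError(f"not a number: {value!r}") from None
    # Reject NaN/Infinity, fractional values, and non-positive values
    if not number.is_finite() or number != number.to_integral_value() or number <= 0:
        raise ValueError(f"not a positive integer code: {value!r}")
    return str(int(number))
```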
+
 <!--
 ADD NEW GUARDRAILS BELOW as failures are observed during the loop.