Files
HighCostDrugsDemo/guardrails.md
T
Andrew Charlwood 6331d44165 fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail
prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.

Also fixes directory charts only generating data for the first date filter.

Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
2026-02-05 20:10:12 +00:00

11 KiB

Guardrails

Known failure patterns. Read EVERY iteration. Follow ALL of these rules. If you discover a new failure pattern during your work, add it to this file.


Reflex Guardrails

Use .to() methods for Var operations in rx.foreach

  • When: Working with items inside rx.foreach render functions
  • Rule: Use item.to(int) for numeric comparisons, item.to_string() for text operations
  • Why: Items from rx.foreach are ObjectItemOperation Vars, not plain Python values. Using >= or f-strings directly causes TypeError.

Bad:

def render_row(item):
    color = rx.cond(item["value"] >= 50, "green", "red")  # TypeError!
    return rx.text(f"{item['name']}: {item['value']}")    # Won't interpolate!

Good:

def render_row(item):
    color = rx.cond(item["value"].to(int) >= 50, "green", "red")
    return rx.text(item["name"].to_string() + ": " + item["value"].to_string())

Use rx.cond for conditional rendering, not Python if

  • When: Conditionally showing/hiding components or changing styles based on state
  • Rule: Use rx.cond(condition, true_component, false_component) — not Python if
  • Why: Python if evaluates at definition time; rx.cond evaluates reactively at render time

State variables must have default values

  • When: Defining state variables in the State class
  • Rule: Always provide a default: my_var: str = "" not just my_var: str
  • Why: Reflex requires defaults for state initialization

Computed vars use @rx.var decorator

  • When: Creating derived/computed values from state
  • Rule: Use @rx.var decorator, return a value, and include return type annotation
  • Why: Without the decorator, the method won't be reactive
@rx.var
def filtered_count(self) -> int:
    return len(self.filtered_data)

Event handlers don't return values to components

  • When: Creating methods that handle user interactions
  • Rule: Event handlers modify state; they don't return values directly to UI
  • Why: Use state variables and computed vars to communicate between handlers and UI

Design System Guardrails

Never hardcode colors

  • When: Any styling that involves color
  • Rule: Import from pathways_app.styles and use Colors.PRIMARY, Colors.SLATE_700, etc.
  • Why: Hardcoded colors break consistency and make theming impossible

Never hardcode spacing

  • When: Any padding, margin, gap values
  • Rule: Use Spacing.SM, Spacing.LG, etc. from the styles module
  • Why: Consistent spacing is fundamental to visual cohesion

Use design system typography

  • When: Any text styling
  • Rule: Use the typography classes/helpers from styles.py
  • Why: Typography hierarchy creates visual structure

Data Processing Guardrails

Use existing pathway_analyzer functions

  • When: Processing pathway data for the icicle chart
  • Rule: Reuse functions from analysis/pathway_analyzer.py — don't reinvent
  • Why: The existing code handles edge cases (empty groups, statistics calculation, color mapping)

Extract denormalized fields from ids string

  • When: Creating denormalized columns (trust_name, directory, drug_sequence)
  • Rule: Parse the ids column which contains the full hierarchical path
  • Why: The ids format is "Trust|Directory|Drug1|Drug2|..." — split on "|" to extract components

Handle None/NULL values in pathway data

  • When: Reading pathway_nodes from SQLite
  • Rule: Always use or "" / or 0 / or "N/A" when accessing optional columns
  • Why: Many columns (costpp, average_spacing, etc.) can be NULL for certain hierarchy levels

Use parameterized queries for SQLite

  • When: Building WHERE clauses with user-selected filters
  • Rule: Use ? placeholders and pass params tuple — never string interpolation
  • Why: Prevents SQL injection and handles special characters in drug/directory names

Code Quality Guardrails

Verify compilation before committing

  • When: After ANY code changes
  • Rule: Run python -m py_compile <file> AND reflex run (briefly) to check
  • Why: Committing broken code wastes the next iteration fixing preventable errors

One component per function

  • When: Creating UI components
  • Rule: Each logical component should be its own function returning rx.Component
  • Why: Smaller functions are easier to debug and reuse

Keep state minimal

  • When: Designing state structure
  • Rule: Only store what's necessary; derive everything else with computed vars
  • Why: Duplicate state leads to sync bugs

Process Guardrails

One task per iteration

  • When: Temptation to do additional tasks after completing the current one
  • Rule: Complete ONE task, validate it, commit it, update progress, then stop
  • Why: Multiple tasks increase error risk and make failures harder to diagnose

Never mark complete without validation

  • When: Task feels "done" but hasn't been tested
  • Rule: All validation tiers must pass before marking [x]
  • Why: "Feels done" is not "is done"

Write explicit handoff notes

  • When: Every iteration, before stopping
  • Rule: The "Next iteration should" section must contain specific, actionable guidance
  • Why: The next iteration has zero memory. If you don't write it down, it's lost.

Check existing code for patterns

  • When: Unsure how to implement something in Reflex or pathway processing
  • Rule: Look at pathways_app/pathways_app.py, analysis/pathway_analyzer.py, visualization/plotly_generator.py
  • Why: The existing codebase has solved many quirks already

UI Redesign Guardrails

Clear Reflex cache before running

  • When: Before running reflex run or reflex compile, especially after style/layout changes
  • Rule: Delete .states and .web folders first: Remove-Item -Recurse -Force .states, .web -ErrorAction SilentlyContinue
  • Why: Stale cache causes old styles/components to persist, making it appear changes didn't work

Test visual changes with reflex run

  • When: After any layout or styling changes
  • Rule: Run reflex run and visually verify in browser. Screenshots are not enough.
  • Why: CSS calculations and flex layouts often behave differently than expected

Don't break existing functionality

  • When: Refactoring layout components
  • Rule: Ensure all filter handlers, KPI updates, and chart rendering still work after changes
  • Why: It's easy to accidentally disconnect event handlers when restructuring components

Use calc() for responsive heights

  • When: Making elements fill remaining viewport space
  • Rule: Use height="calc(100vh - Xpx)" where X is the sum of fixed-height elements above
  • Why: Fixed heights don't adapt to content changes; calc() keeps things responsive

Test at multiple viewport widths

  • When: Making full-width changes
  • Rule: Test at 1366px, 1920px, and 2560px widths minimum
  • Why: Full-width layouts can break or look sparse at extreme sizes

Keep filter dropdown z-index high

  • When: Restructuring filter section
  • Rule: Dropdown panels need z_index="50" or higher to appear above chart
  • Why: Plotly charts have their own stacking context and can overlap dropdowns

Snowflake Query Guardrails

Use PseudoNHSNoLinked for GP record matching

  • When: Querying GP records (PrimaryCareClinicalCoding) for patient diagnoses
  • Rule: Use PseudoNHSNoLinked column from HCD data, NOT PersonKey (LocalPatientID)
  • Why: PersonKey is provider-specific local ID. Only PseudoNHSNoLinked matches PatientPseudonym in GP records.

Use Search_Term for grouping, not Indication

  • When: Creating indication-based pathway hierarchy
  • Rule: Group patients by Search_Term from the cluster query
  • Why: Search_Term provides meaningful clinical groupings (~148 values)

Handle unmatched patients in indication chart

  • When: Patient has no GP diagnosis matching cluster SNOMED codes
  • Rule: Use their assigned directorate (from fallback logic) as the grouping label, not "Unknown"
  • Why: User wants mixed labels - Search_Terms for matched patients, directorate names for unmatched

Use most recent SNOMED code for multiple matches

  • When: Patient has GP records matching multiple SNOMED codes
  • Rule: Use the match with the most recent EventDateTime from PrimaryCareClinicalCoding
  • Why: Most recent diagnosis reflects current clinical state

Embed cluster query as CTE in Snowflake

  • When: Looking up patient indications during data refresh
  • Rule: Use the snomed_indication_mapping_query.sql content as a WITH clause in the patient lookup query
  • Why: This ensures we always use the complete cluster mapping and don't need local storage

Chart type column in pathway_nodes

  • When: Inserting pathway records to SQLite
  • Rule: Include chart_type column with value "directory" or "indication"
  • Why: Needed to filter pathways when user toggles chart type in UI

Quote mixed-case column aliases in Snowflake SQL

  • When: Writing SELECT queries that return results to Python code
  • Rule: Use AS "ColumnName" (quoted) for any column alias you'll access by name in Python
  • Why: Snowflake uppercases unquoted identifiers. SELECT foo AS Search_Term returns SEARCH_TERM, so row.get('Search_Term') returns None. Fix: SELECT foo AS "Search_Term"

Build indication_df from all unique UPIDs, not PseudoNHSNoLinked

  • When: Creating the indication mapping DataFrame for pathway processing
  • Rule: Use df.drop_duplicates(subset=['UPID']) not drop_duplicates(subset=['PseudoNHSNoLinked'])
  • Why: A patient visiting multiple providers has multiple UPIDs (UPID = ProviderCode[:3] + PersonKey). Using unique PseudoNHSNoLinked only maps one UPID per patient, leaving others as NaN and causing TypeError in build_hierarchy.

Handle NaN in Directory when building fallback labels

  • When: Creating fallback indication labels for patients without GP diagnosis match
  • Rule: Check pd.notna(directory) before concatenating to string. Use "UNKNOWN (no GP dx)" for NaN cases.
  • Why: str(nan) + " (no GP dx)" doesn't cause error, but nan + " (no GP dx)" causes TypeError. Always be explicit about NaN handling.

Copy DataFrames in functions that modify columns

  • When: Writing functions like prepare_data() that modify DataFrame columns (e.g., mapping Provider Code to trust names)
  • Rule: Always df = df.copy() at the start of any function that modifies column values on the input DataFrame
  • Why: prepare_data() mapped Provider Code → Name in-place. When called for directory charts first, then indication charts second, the second call tried to map already-mapped names → NaN, silently dropping all data. The fix: df = df.copy() prevents destructive mutation of the caller's DataFrame.