Andrew Charlwood 6331d44165 fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail

prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.

Also fixes directory charts only generating data for the first date filter.

Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).

2026-02-05 20:10:12 +00:00

11 KiB

Guardrails

Known failure patterns. Read EVERY iteration. Follow ALL of these rules. If you discover a new failure pattern during your work, add it to this file.

Reflex Guardrails

Use .to() methods for Var operations in rx.foreach

When: Working with items inside rx.foreach render functions
Rule: Use item.to(int) for numeric comparisons, item.to_string() for text operations
Why: Items from rx.foreach are ObjectItemOperation Vars, not plain Python values. Comparing them with >= directly raises TypeError, and f-strings do not interpolate them as expected.

Bad:

def render_row(item):
    color = rx.cond(item["value"] >= 50, "green", "red")  # TypeError!
    return rx.text(f"{item['name']}: {item['value']}")    # Won't interpolate!

Good:

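import reflex as rx

def render_row(item):
    # Cast the Var before comparing; build the label by Var concatenation.
    color = rx.cond(item["value"].to(int) >= 50, "green", "red")
    return rx.text(item["name"].to_string() + ": " + item["value"].to_string(), color=color)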

Use rx.cond for conditional rendering, not Python if

When: Conditionally showing/hiding components or changing styles based on state
Rule: Use rx.cond(condition, true_component, false_component) — not Python if
Why: Python if evaluates at definition time; rx.cond evaluates reactively at render time

State variables must have default values

When: Defining state variables in the State class
Rule: Always provide a default. Write my_var: str = "", not just my_var: str
Why: Reflex requires defaults for state initialization

Computed vars use @rx.var decorator

When: Creating derived/computed values from state
Rule: Use @rx.var decorator, return a value, and include return type annotation
Why: Without the decorator, the method won't be reactive

@rx.var
def filtered_count(self) -> int:
    return len(self.filtered_data)

Event handlers don't return values to components

When: Creating methods that handle user interactions
Rule: Event handlers modify state; they don't return values directly to UI
Why: Use state variables and computed vars to communicate between handlers and UI

Design System Guardrails

Never hardcode colors

When: Any styling that involves color
Rule: Import from pathways_app.styles and use Colors.PRIMARY, Colors.SLATE_700, etc.
Why: Hardcoded colors break consistency and make theming impossible

Never hardcode spacing

When: Any padding, margin, gap values
Rule: Use Spacing.SM, Spacing.LG, etc. from the styles module
Why: Consistent spacing is fundamental to visual cohesion

Use design system typography

When: Any text styling
Rule: Use the typography classes/helpers from styles.py
Why: Typography hierarchy creates visual structure

Data Processing Guardrails

Use existing pathway_analyzer functions

When: Processing pathway data for the icicle chart
Rule: Reuse functions from analysis/pathway_analyzer.py — don't reinvent
Why: The existing code handles edge cases (empty groups, statistics calculation, color mapping)

Extract denormalized fields from ids string

When: Creating denormalized columns (trust_name, directory, drug_sequence)
Rule: Parse the ids column which contains the full hierarchical path
Why: The ids format is "Trust|Directory|Drug1|Drug2|..." — split on "|" to extract components
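
Example (a minimal sketch, not the project's code): the split described above, assuming the first segment is the trust, the second the directory, and everything after that the drugs. The helper name parse_ids and the choice to re-join drug_sequence with "|" are hypothetical.

```python
def parse_ids(ids: str) -> dict:
    """Split a hierarchical ids path into denormalized fields.

    Assumes the layout "Trust|Directory|Drug1|Drug2|...". Shorter paths
    (for example a trust-level node) leave the missing fields empty.
    """
    parts = ids.split("|") if ids else []
    return {
        "trust_name": parts[0] if len(parts) > 0 else "",
        "directory": parts[1] if len(parts) > 1 else "",
        # Re-joining with "|" is an illustrative choice, not a known format.
        "drug_sequence": "|".join(parts[2:]),
    }


fields = parse_ids("Trust A|Cardiology|DrugX|DrugY")
```

Returning empty strings for missing levels keeps the columns type-stable, which fits the None/NULL rule in the next guardrail.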

Handle None/NULL values in pathway data

When: Reading pathway_nodes from SQLite
Rule: Always use or "" / or 0 / or "N/A" when accessing optional columns
Why: Many columns (costpp, average_spacing, etc.) can be NULL for certain hierarchy levels
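
Example (a sketch under assumptions): the table below is a cut-down stand-in for pathway_nodes, and the column types are guesses. It only shows how the or-fallback behaves when SQLite returns NULL as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access by column name

# Hypothetical, simplified schema; the real table has more columns.
conn.execute("CREATE TABLE pathway_nodes (ids TEXT, costpp REAL, average_spacing TEXT)")
conn.execute("INSERT INTO pathway_nodes VALUES (?, ?, ?)", ("Trust A", None, None))

row = conn.execute("SELECT * FROM pathway_nodes").fetchone()

# NULL arrives as None, so fall back to a safe default before formatting.
costpp = row["costpp"] or 0
spacing = row["average_spacing"] or "N/A"
label = f"{row['ids']}: cost {costpp:.2f}, spacing {spacing}"
```

Note that `or` also replaces other falsy values such as 0.0 or an empty string, which is harmless here because the defaults are equivalent.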

Use parameterized queries for SQLite

When: Building WHERE clauses with user-selected filters
Rule: Use ? placeholders and pass params tuple — never string interpolation
Why: Prevents SQL injection and handles special characters in drug/directory names
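
Example (a sketch, not the project's query code): building an IN filter from a user-selected list. The helper name build_filter_query and the three-column schema are assumptions made for illustration.

```python
import sqlite3


def build_filter_query(chart_type, directories):
    """Return (sql, params) for a chart_type + directory filter."""
    # Only "?" markers are formatted into the SQL text; values travel separately.
    placeholders = ", ".join("?" for _ in directories)
    sql = (
        "SELECT ids FROM pathway_nodes "
        f"WHERE chart_type = ? AND directory IN ({placeholders})"
    )
    return sql, (chart_type, *directories)


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema for the demonstration.
conn.execute("CREATE TABLE pathway_nodes (ids TEXT, directory TEXT, chart_type TEXT)")
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?)",
    [
        ("Trust A|Children's Services", "Children's Services", "directory"),
        ("Trust A|Cardiology", "Cardiology", "directory"),
        ("Trust A|asthma", "asthma", "indication"),
    ],
)

sql, params = build_filter_query("directory", ["Children's Services"])
rows = conn.execute(sql, params).fetchall()
```

The apostrophe in "Children's Services" needs no escaping because the value never becomes part of the SQL string.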

Code Quality Guardrails

Verify compilation before committing

When: After ANY code changes
Rule: Run python -m py_compile <file> AND reflex run (briefly) to check
Why: Committing broken code wastes the next iteration fixing preventable errors

One component per function

When: Creating UI components
Rule: Each logical component should be its own function returning rx.Component
Why: Smaller functions are easier to debug and reuse

Keep state minimal

When: Designing state structure
Rule: Only store what's necessary; derive everything else with computed vars
Why: Duplicate state leads to sync bugs

Process Guardrails

One task per iteration

When: Temptation to do additional tasks after completing the current one
Rule: Complete ONE task, validate it, commit it, update progress, then stop
Why: Multiple tasks increase error risk and make failures harder to diagnose

Never mark complete without validation

When: Task feels "done" but hasn't been tested
Rule: All validation tiers must pass before marking [x]
Why: "Feels done" is not "is done"

Write explicit handoff notes

When: Every iteration, before stopping
Rule: The "Next iteration should" section must contain specific, actionable guidance
Why: The next iteration has zero memory. If you don't write it down, it's lost.

Check existing code for patterns

When: Unsure how to implement something in Reflex or pathway processing
Rule: Look at pathways_app/pathways_app.py, analysis/pathway_analyzer.py, visualization/plotly_generator.py
Why: The existing codebase has solved many quirks already

UI Redesign Guardrails

Clear Reflex cache before running

When: Before running reflex run or reflex compile, especially after style/layout changes
Rule: Delete .states and .web folders first: Remove-Item -Recurse -Force .states, .web -ErrorAction SilentlyContinue
Why: Stale cache causes old styles/components to persist, making it appear changes didn't work

Test visual changes with reflex run

When: After any layout or styling changes
Rule: Run reflex run and visually verify in browser. Screenshots are not enough.
Why: CSS calculations and flex layouts often behave differently than expected

Don't break existing functionality

When: Refactoring layout components
Rule: Ensure all filter handlers, KPI updates, and chart rendering still work after changes
Why: It's easy to accidentally disconnect event handlers when restructuring components

Use calc() for responsive heights

When: Making elements fill remaining viewport space
Rule: Use height="calc(100vh - Xpx)" where X is the sum of fixed-height elements above
Why: Fixed heights don't adapt to content changes; calc() keeps things responsive
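
Example (a sketch): computing X from the fixed-height elements instead of hardcoding the sum. The helper name fill_height and the pixel values are hypothetical.

```python
def fill_height(*fixed_px: int) -> str:
    """Build a CSS calc() height that fills the viewport below fixed elements."""
    return f"calc(100vh - {sum(fixed_px)}px)"


# Illustrative heights only: a 64px header and a 120px filter bar.
chart_height = fill_height(64, 120)
# The string would then be passed as a height prop, e.g. height=chart_height.
```

Keeping the individual heights as arguments makes it obvious which value to change when an element above the chart is resized.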

Test at multiple viewport widths

When: Making full-width changes
Rule: Test at 1366px, 1920px, and 2560px widths minimum
Why: Full-width layouts can break or look sparse at extreme sizes

Keep dropdown panels above the chart

When: Restructuring filter section
Rule: Dropdown panels need z_index="50" or higher to appear above chart
Why: Plotly charts have their own stacking context and can overlap dropdowns

Snowflake Query Guardrails

Use PseudoNHSNoLinked for GP record matching

When: Querying GP records (PrimaryCareClinicalCoding) for patient diagnoses
Rule: Use PseudoNHSNoLinked column from HCD data, NOT PersonKey (LocalPatientID)
Why: PersonKey is provider-specific local ID. Only PseudoNHSNoLinked matches PatientPseudonym in GP records.

Use Search_Term for grouping, not Indication

When: Creating indication-based pathway hierarchy
Rule: Group patients by Search_Term from the cluster query
Why: Search_Term provides meaningful clinical groupings (~148 values)

Handle unmatched patients in indication chart

When: Patient has no GP diagnosis matching cluster SNOMED codes
Rule: Use their assigned directorate (from fallback logic) as the grouping label, not "Unknown"
Why: User wants mixed labels: Search_Term values for matched patients, directorate names for unmatched

Use most recent SNOMED code for multiple matches

When: Patient has GP records matching multiple SNOMED codes
Rule: Use the match with the most recent EventDateTime from PrimaryCareClinicalCoding
Why: Most recent diagnosis reflects current clinical state
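
Example (a sketch, assuming the selection is done in pandas after the matches are fetched, not in SQL): keep one row per patient, the one with the latest EventDateTime. The sample rows are made up.

```python
import pandas as pd

# Made-up matches: patient p1 matched two SNOMED clusters on different dates.
matches = pd.DataFrame(
    {
        "PatientPseudonym": ["p1", "p1", "p2"],
        "Search_Term": ["asthma", "copd", "psoriasis"],
        "EventDateTime": pd.to_datetime(["2021-03-01", "2023-07-15", "2022-01-10"]),
    }
)

# Sort oldest to newest, then keep the last (most recent) row per patient.
latest = (
    matches.sort_values("EventDateTime")
    .drop_duplicates(subset="PatientPseudonym", keep="last")
    .set_index("PatientPseudonym")["Search_Term"]
    .to_dict()
)
```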

Embed cluster query as CTE in Snowflake

When: Looking up patient indications during data refresh
Rule: Use the snomed_indication_mapping_query.sql content as a WITH clause in the patient lookup query
Why: This ensures we always use the complete cluster mapping and don't need local storage

Chart type column in pathway_nodes

When: Inserting pathway records to SQLite
Rule: Include chart_type column with value "directory" or "indication"
Why: Needed to filter pathways when user toggles chart type in UI

Quote mixed-case column aliases in Snowflake SQL

When: Writing SELECT queries that return results to Python code
Rule: Use AS "ColumnName" (quoted) for any column alias you'll access by name in Python
Why: Snowflake uppercases unquoted identifiers. SELECT foo AS Search_Term returns SEARCH_TERM, so row.get('Search_Term') returns None. Fix: SELECT foo AS "Search_Term"

Build indication_df from all unique UPIDs, not PseudoNHSNoLinked

When: Creating the indication mapping DataFrame for pathway processing
Rule: Use df.drop_duplicates(subset=['UPID']) not drop_duplicates(subset=['PseudoNHSNoLinked'])
Why: A patient visiting multiple providers has multiple UPIDs (UPID = ProviderCode[:3] + PersonKey). Using unique PseudoNHSNoLinked only maps one UPID per patient, leaving others as NaN and causing TypeError in build_hierarchy.
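
Example (a sketch with made-up rows): one patient seen at two providers has two UPIDs. Deduplicating on PseudoNHSNoLinked loses one of them, while deduplicating on UPID keeps both.

```python
import pandas as pd

# Made-up data: patient nhs1 appears under two providers, hence two UPIDs.
df = pd.DataFrame(
    {
        "PseudoNHSNoLinked": ["nhs1", "nhs1", "nhs2"],
        "UPID": ["RAB123", "RCD456", "RAB789"],
        "Search_Term": ["asthma", "asthma", "psoriasis"],
    }
)

by_patient = df.drop_duplicates(subset=["PseudoNHSNoLinked"])  # wrong key
by_upid = df.drop_duplicates(subset=["UPID"])  # correct key

# UPIDs that would be left without an indication mapping.
missing = set(df["UPID"]) - set(by_patient["UPID"])
```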

Handle NaN in Directory when building fallback labels

When: Creating fallback indication labels for patients without GP diagnosis match
Rule: Check pd.notna(directory) before concatenating to string. Use "UNKNOWN (no GP dx)" for NaN cases.
Why: str(nan) + " (no GP dx)" raises no error but produces the misleading label "nan (no GP dx)", while nan + " (no GP dx)" raises TypeError. Always be explicit about NaN handling.
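
Example (a sketch): an explicit NaN check before building the label. The helper name fallback_label is hypothetical; the label texts follow the rule above.

```python
import pandas as pd


def fallback_label(directory) -> str:
    """Build the fallback indication label for a patient without a GP match."""
    if pd.notna(directory):
        return f"{directory} (no GP dx)"
    # NaN or None: never let "nan" leak into a chart label.
    return "UNKNOWN (no GP dx)"


labels = [fallback_label(d) for d in ["CARDIOLOGY", float("nan"), None]]
```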

Copy DataFrames in functions that modify columns

When: Writing functions like prepare_data() that modify DataFrame columns (e.g., mapping Provider Code to trust names)
Rule: Always call df = df.copy() at the start of any function that modifies column values on the input DataFrame
Why: prepare_data() mapped Provider Code → Name in-place. When it was called for directory charts and then again for indication charts, the second call tried to map the already-mapped names, which produced NaN and silently dropped all data. Copying first prevents destructive mutation of the caller's DataFrame.
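
Example (a sketch that reproduces the failure with stand-in functions, not the real prepare_data()): the provider mapping and the two function bodies are hypothetical and reduced to the one column that matters.

```python
import pandas as pd

PROVIDER_NAMES = {"RAB": "Trust A", "RCD": "Trust B"}  # hypothetical mapping


def prepare_data_inplace(df: pd.DataFrame) -> pd.DataFrame:
    # Bug: overwrites the caller's column, so a second call maps names, not codes.
    df["Provider Code"] = df["Provider Code"].map(PROVIDER_NAMES)
    return df


def prepare_data(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()  # work on a copy; the caller's DataFrame stays untouched
    df["Provider Code"] = df["Provider Code"].map(PROVIDER_NAMES)
    return df


# Buggy version: the second call finds no matching keys and yields NaN.
broken = pd.DataFrame({"Provider Code": ["RAB", "RCD"]})
prepare_data_inplace(broken)
second_broken = prepare_data_inplace(broken)

# Fixed version: both calls see the original codes.
raw = pd.DataFrame({"Provider Code": ["RAB", "RCD"]})
first = prepare_data(raw)
second = prepare_data(raw)
```

Series.map returns NaN for any value missing from the dictionary, which is why the failure is silent: nothing raises, the rows simply stop matching downstream.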
