Guardrails
Known failure patterns. Read EVERY iteration. Follow ALL of these rules. If you discover a new failure pattern during your work, add it to this file.
Backend Isolation
Do NOT modify pipeline/analysis logic in src/
- When: Building Dash integration
- Rule: Do NOT change the logic in these files. They are the data pipeline and must stay as-is:
  - data_processing/pathway_pipeline.py, transforms.py, diagnosis_lookup.py (matching/query logic)
  - analysis/pathway_analyzer.py, statistics.py
  - cli/refresh_pathways.py
  - data_processing/schema.py, reference_data.py, cache.py, data_source.py
- Why: The pipeline is complete and tested. Changing it risks breaking the data refresh workflow.
DO use shared utilities in src/ rather than duplicating
- When: The Dash app needs data loading or figure construction
- Rule: Dash callbacks should CALL INTO src/, not duplicate the code. Shared functions:
  - data_processing/pathway_queries.py: load_initial_data() and load_pathway_nodes() for all SQLite queries
  - visualization/plotly_generator.py: create_icicle_from_nodes() for icicle chart from list-of-dicts
  - dash_app/data/queries.py: thin wrapper that resolves DB path and delegates to shared functions
- Why: Duplicating SQL queries and figure logic creates copies that drift apart. Shared code in src/ is the cleaner architecture.
Do NOT modify pathways.db schema or data
- When: Querying the database from Dash callbacks
- Rule: Read-only access. Use
sqlite3.connect(db_path)with SELECT queries only. Never INSERT, UPDATE, DELETE, or ALTER. - Why: pathways.db is populated by
python -m cli.refresh_pathways. The Dash app is a read-only consumer.
CSS & Design Fidelity
Use className matching 01_nhs_classic.html, not inline styles
- When: Building any Dash HTML component
- Rule: Use className="css-class-name" referencing classes from dash_app/assets/nhs.css. Do NOT use inline style={} dicts for layout/visual styling. Only use inline styles for truly dynamic values (e.g., style={"flex": patient_count} for proportional widths).
- Why: CSS fidelity to the HTML concept is a primary goal. Inline styles drift from the design and are harder to maintain.
nhs.css is the single source of CSS truth
- When: Adding or modifying styles
- Rule: All styles go in dash_app/assets/nhs.css. If the concept HTML doesn't have a class for something, add it to nhs.css with the same naming convention (.component__element--modifier).
- Why: Dash auto-serves files from assets/. Keeping CSS in one file matches the design source (01_nhs_classic.html) and avoids style fragmentation.
Read 01_nhs_classic.html when building UI components
- When: Creating any component in dash_app/components/
- Rule: Read 01_nhs_classic.html first to see the exact HTML structure, CSS classes, and element hierarchy for that component. Match it as closely as possible.
- Why: The HTML concept IS the design spec. Deviating creates visual inconsistency.
Callback Architecture
No circular callback dependencies
- When: Writing Dash callbacks
- Rule: Callbacks must flow unidirectionally: filter inputs → app-state store → chart-data store → UI components. Never have a component that is both Input and Output in the same callback chain without an intermediate store.
- Why: Dash raises DuplicateCallback errors for circular dependencies, and they're extremely hard to debug.
Use dcc.Store for all state, not server-side globals
- When: Managing application state (selected filters, chart data, reference data)
- Rule: ALL state lives in dcc.Store components. Never use module-level globals, class variables, or flask.g for state. The 3 stores are: app-state (session), chart-data (memory), reference-data (session).
- Why: Dash is stateless per request. Server-side state breaks with multiple users and causes subtle bugs during development.
Use callback_context for multi-input callbacks
- When: A callback has multiple Inputs and needs to know which one triggered it
- Rule: Use dash.callback_context.triggered (or ctx.triggered_id in Dash 2.4+) to determine the triggering input.
- Why: The callback runs on every input change. Without this, you can't tell which filter changed.
Pattern-matching callbacks for dynamic drug chips
- When: Building the card browser drawer with clickable drug chips
- Rule: Use {"type": "drug-chip", "index": drug_name} pattern for chip IDs. Register callbacks with Input({"type": "drug-chip", "index": ALL}, "n_clicks"). Access triggered chip via ctx.triggered_id["index"].
- Why: The number of drug chips is dynamic (changes per directorate/indication). Pattern-matching callbacks handle this without hardcoding IDs.
Plotly Figure
Preserve create_icicle_from_nodes() in src/visualization/plotly_generator.py
- When: Modifying the icicle chart
- Rule: create_icicle_from_nodes(nodes, title) in src/visualization/plotly_generator.py is the shared icicle chart function. It accepts list-of-dicts from dcc.Store. Key properties:
  - 10-field customdata structure (value, colour, cost, costpp, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa)
  - NHS colorscale: [[0.0, "#003087"], [0.25, "#0066CC"], [0.5, "#1E88E5"], [0.75, "#4FC3F7"], [1.0, "#E3F2FD"]]
  - maxdepth=3, branchvalues="total", sort=False
  - Layout: transparent background, reduced margins, autosize
- Why: The icicle chart is tested and correct. The Dash callback in dash_app/callbacks/chart.py calls this function.
Chart data is a list of dicts
- When: Passing data between chart-data store and chart callback
- Rule: chart-data store holds {"nodes": [...], "unique_patients": int, "total_drugs": int, "total_cost": float}. Each node is a dict with keys matching the SQLite columns needed for the figure: parents, ids, labels, value, cost, costpp, colour, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa.
- Why: dcc.Store serializes to JSON. Keep the same dict structure that pathways_app.py uses for chart_data so the figure callback works identically.
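The payload shape described above can be checked with a small helper before it goes into the store. This is a sketch only: the function find_problems is hypothetical and every sample value is invented for illustration. Only the key names come from the rule.

```python
import json

NODE_KEYS = (
    "parents", "ids", "labels", "value", "cost", "costpp", "colour",
    "first_seen", "last_seen", "first_seen_parent", "last_seen_parent",
    "average_spacing", "cost_pp_pa",
)
TOP_LEVEL_KEYS = ("nodes", "unique_patients", "total_drugs", "total_cost")


def find_problems(payload):
    """Return a list of human-readable problems; empty means the shape is fine."""
    problems = [f"missing top-level key: {k}" for k in TOP_LEVEL_KEYS if k not in payload]
    for i, node in enumerate(payload.get("nodes", [])):
        missing = [k for k in NODE_KEYS if k not in node]
        if missing:
            problems.append(f"node {i} missing: {', '.join(missing)}")
    return problems


# Invented sample values, for illustration only.
sample_node = {
    "parents": "", "ids": "root", "labels": "All patients", "value": 120,
    "cost": 54000.0, "costpp": 450.0, "colour": 0.5,
    "first_seen": "2020-01-01", "last_seen": "2024-01-01",
    "first_seen_parent": "2020-01-01", "last_seen_parent": "2024-01-01",
    "average_spacing": "", "cost_pp_pa": "112.5",
}
payload = {"nodes": [sample_node], "unique_patients": 120,
           "total_drugs": 1, "total_cost": 54000.0}

# dcc.Store serializes to JSON, so the payload must survive a round trip unchanged.
round_tripped = json.loads(json.dumps(payload))
```

Anything that does not survive json.dumps (dates, NumPy scalars, DataFrames) needs converting to plain Python types before it reaches the store.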
Data Extraction
Keep data logic in shared src/ functions, not dash_app/ duplicates
- When: Adding or modifying data loading functions
- Rule: SQL queries and data logic live in src/data_processing/pathway_queries.py. The dash_app/data/queries.py is a thin wrapper that resolves the DB path and delegates. Do not duplicate queries in dash_app/.
- Why: Shared code in src/ prevents query drift and keeps the single source of truth for data access.
DimSearchTerm.csv fragments are substrings
- When: Building the card browser or matching drugs to indications
- Rule: CleanedDrugName values in DimSearchTerm.csv are drug name FRAGMENTS (e.g., "ADALIMUMAB", "PEGYLATED", "INHALED"). They're matched against full drug names with a case-insensitive substring test (fragment in drug_name.upper()). Don't assume an exact match.
- Why: Some fragments are partial (INHALED matches "INHALED BECLOMETASONE", "INHALED FLUTICASONE", etc.).
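A minimal sketch of fragment matching as a substring test. The function name matching_fragments is hypothetical; the fragments and drug names are the examples from the rule.

```python
def matching_fragments(drug_name, fragments):
    """Return every fragment that appears inside the upper-cased drug name."""
    upper_name = drug_name.upper()
    return [fragment for fragment in fragments if fragment.upper() in upper_name]


fragments = ["ADALIMUMAB", "PEGYLATED", "INHALED"]

inhaled_hits = matching_fragments("Inhaled Beclometasone", fragments)
exact_hits = matching_fragments("ADALIMUMAB", fragments)
no_hits = matching_fragments("FLUTICASONE", fragments)
```

An equality check would miss "INHALED BECLOMETASONE" entirely, which is why the test is membership rather than ==.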
Apply SEARCH_TERM_MERGE_MAP when loading DimSearchTerm.csv
- When: Building the directorate tree in card_browser.py
- Rule: Import and apply SEARCH_TERM_MERGE_MAP from data_processing.diagnosis_lookup to normalize "allergic asthma" → "asthma" and "severe persistent allergic asthma" → "asthma". Keep "urticaria" separate.
- Why: The Snowflake query and pathway processing already use merged Search_Terms. The card browser must match.
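The normalization step might look like the sketch below. The dict here is a stand-in containing only the two mappings named in the rule; real code should import SEARCH_TERM_MERGE_MAP from data_processing.diagnosis_lookup instead. The function name and the lower-casing of the input are assumptions.

```python
# Stand-in for the real map; import it from data_processing.diagnosis_lookup in the app.
SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def normalize_search_term(term):
    """Map a raw search term to its merged form; unmapped terms pass through."""
    key = term.strip().lower()  # assumption: map keys are lower-case
    return SEARCH_TERM_MERGE_MAP.get(key, key)


raw_terms = ["allergic asthma", "severe persistent allergic asthma", "urticaria"]
merged_terms = sorted({normalize_search_term(t) for t in raw_terms})
```

Because "urticaria" has no entry in the map, it passes through unchanged and stays a separate term.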
SQLite Queries
Use parameterized queries for all filters
- When: Building WHERE clauses with user-selected values
- Rule: Use ? placeholders and pass params as a list. Never use f-strings or string interpolation for filter values.
- Why: Prevents SQL injection and handles special characters in drug/directory names (e.g., "CROHN'S DISEASE").
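A minimal sketch of building a WHERE clause with placeholders. The table name pathway_nodes, its columns, and the function select_nodes are hypothetical, and the inserts run only against a throwaway in-memory database, never pathways.db.

```python
import sqlite3


def select_nodes(conn, directorate=None, indication=None):
    """Build the WHERE clause from fixed SQL fragments; values travel as params."""
    clauses, params = [], []
    if directorate is not None:
        clauses.append("directorate = ?")
        params.append(directorate)
    if indication is not None:
        clauses.append("indication = ?")
        params.append(indication)

    sql = "SELECT ids, value FROM pathway_nodes"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY ids"
    return conn.execute(sql, params).fetchall()


# Throwaway in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pathway_nodes (ids TEXT, value INTEGER, directorate TEXT, indication TEXT)"
)
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?, ?)",
    [
        ("n1", 12, "GASTROENTEROLOGY", "CROHN'S DISEASE"),
        ("n2", 7, "RESPIRATORY", "asthma"),
    ],
)

# The apostrophe needs no escaping because the value is never spliced into the SQL text.
crohns_rows = select_nodes(conn, indication="CROHN'S DISEASE")
all_rows = select_nodes(conn)
```

Only the fixed column fragments are concatenated into the SQL string; user-selected values never are.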
Database path resolution
- When: Connecting to pathways.db from dash_app/
- Rule: Use Path(__file__).resolve().parents[2] / "data" / "pathways.db" from files in dash_app/data/. This resolves from dash_app/data/queries.py → project root → data/pathways.db.
- Why: Relative paths break depending on the working directory. Absolute path resolution is reliable.
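To see why parents[2] lands on the project root, the sketch below walks the same steps on a made-up absolute path. The /srv/project prefix and the function name are hypothetical; PurePosixPath is used so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath


def db_path_from(module_file):
    """Mirror the rule: go up from dash_app/data/<file> to the project root."""
    module_path = PurePosixPath(module_file)
    # parents[0] = dash_app/data, parents[1] = dash_app, parents[2] = project root
    return module_path.parents[2] / "data" / "pathways.db"


example = db_path_from("/srv/project/dash_app/data/queries.py")
```

The index is tied to the file's depth: a module moved one directory deeper would need parents[3].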
Dash Framework
Wrap layout in dmc.MantineProvider
- When: Setting up the app layout in app.py
- Rule: The outermost layout element must be dmc.MantineProvider(children=[...]). Without this, DMC components (Drawer, Accordion, Chip, etc.) won't render.
- Why: Dash Mantine Components requires the Provider context to function.
dcc.Store storage_type matters
- When: Creating the 3 store components
- Rule:
  - app-state: storage_type="session". Persists across page refreshes within a tab.
  - chart-data: storage_type="memory". Cleared on page refresh (reloaded from SQLite).
  - reference-data: storage_type="session". Loaded once, persists across refreshes.
- Why: The wrong storage type causes stale-data bugs (session persists when it shouldn't) or wasted queries (memory clears more often than needed).
Dash assets directory is auto-served
- When: Placing CSS, JS, or images
- Rule: Put static assets in dash_app/assets/. Dash serves them automatically. Reference CSS via className, not <link> tags.
- Why: Dash's asset pipeline handles caching and serving. Manual <link> tags are unnecessary and may not work.
Process Guardrails
One task per iteration
- When: Temptation to do additional tasks after completing the current one
- Rule: Complete ONE task, validate it, commit it, update progress, then stop
- Why: Multiple tasks increase error risk and make failures harder to diagnose
Never mark complete without validation
- When: Task feels "done" but hasn't been tested
- Rule: All validation tiers must pass before marking [x]
- Why: "Feels done" is not "is done"
Write explicit handoff notes
- When: Every iteration, before stopping
- Rule: The "Next iteration should" section must contain specific, actionable guidance
- Why: The next iteration has zero memory. If you don't write it down, it's lost.
Validate with python run_dash.py
- When: After completing any task
- Rule: Run python run_dash.py (or python -c "from dash_app.app import app" for import checks). The app must start without errors after EVERY task.
- Why: Broken imports or circular dependencies compound across tasks. Catch them immediately.