docs: update all documentation for Dash migration (Phase 6)

Rewrote README.md, USER_GUIDE.md, and DEPLOYMENT.md to reflect the Dash application. Updated RALPH_PROMPT.md, guardrails.md, and DESIGN_SYSTEM.md to remove Reflex references. All non-archive documentation now reflects the current Dash + DMC architecture.
2026-02-06 14:54:12 +00:00
parent 4cb5641c2d
commit 54b4a0f743
8 changed files with 635 additions and 956 deletions
@@ -1,128 +1,124 @@
-# Ralph Wiggum Loop - Drug-Aware Indication Matching
+# Ralph Wiggum Loop — Dash Application Maintenance

-You are operating inside an automated loop extending a pathway analysis application with drug-aware indication matching. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem.
+You are operating inside an automated loop maintaining an NHS patient pathway analysis tool built with Dash (Plotly) + Dash Mantine Components. Each iteration you receive fresh context — you have NO memory of previous iterations. Your only memory is the filesystem.

-**Current Focus**: Update indication charts so that patient indications are matched **per drug**, not just per patient. Each drug must be validated against the patient's GP diagnoses AND the drug-to-indication mapping from DimSearchTerm.csv.
+**Current Focus**: Maintain and enhance the Dash application in `dash_app/`. The backend (`src/`) provides shared data access and visualization functions. The design target is `01_nhs_classic.html`.

 ## First Actions Every Iteration

 Read these files in this order before doing anything else:

-1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next. The most recent entry is most important.
-2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, project overview, and completion criteria.
+1. `progress.txt` — What previous iterations accomplished, what's blocked, and what to do next.
+2. `IMPLEMENTATION_PLAN.md` — Task list with status markers, architecture overview, and completion criteria.
 3. `guardrails.md` — Known failure patterns to avoid. You MUST read and follow these.
-4. `CLAUDE.md` — Project architecture and code patterns.
+4. `CLAUDE.md` — Project architecture and backend code patterns.

 Then run `git log --oneline -5` to see recent commits.

+## Reading the Design Reference
+
+**When building ANY UI component**, read `01_nhs_classic.html` first:
+- It contains the exact CSS classes, HTML structure, and visual layout you must replicate
+- CSS lives in the `<style>` block (lines 8-314) — this becomes `dash_app/assets/nhs.css`
+- HTML structure (lines 316-480+) shows the component hierarchy and class usage
+- Match the design as closely as possible — `className` in Dash = `class` in HTML
+
+**When building data loading or chart callbacks**, reference the shared functions in `src/`:
+- `src/data_processing/pathway_queries.py`: `load_initial_data()` and `load_pathway_nodes()` — shared query functions
+- `src/visualization/plotly_generator.py`: `create_icicle_from_nodes()` — icicle chart from list-of-dicts
+- `dash_app/data/queries.py`: Thin wrapper calling shared functions with correct DB path
+- The original logic is archived in `archive/pathways_app/pathways_app.py` for reference.
+
 ## Narration

 Narrate your work as you go. Your output is the only visibility the operator has into what's happening. For every significant action, explain what you're doing and why:

- **Reading files**: "Reading progress.txt to check what the last iteration accomplished..."
- **Creating code**: "Adding assign_drug_indications() function to diagnosis_lookup.py..."
- **Debugging**: "Drug matching returned 0 results for ADALIMUMAB. Checking DimSearchTerm lookup..."
- **Testing**: "Running import check to verify the new function is accessible..."
- **Making decisions**: "The guardrails say to use substring matching for drug fragments."
- **Committing**: "Committing drug-indication matching logic."
+- **Reading files**: "Reading 01_nhs_classic.html to get CSS classes for the header component..."
+- **Creating code**: "Creating dash_app/components/header.py with make_header() function..."
+- **Debugging**: "Import error for dmc.Drawer — checking dash-mantine-components version..."
+- **Testing**: "Running python run_dash.py to verify the app starts..."
+- **Making decisions**: "The guardrails say to use className from nhs.css, not inline styles."
+- **Committing**: "Committing header and sidebar components."

-Do NOT just output a summary at the end. Narrate throughout. Think of this as a live log of your reasoning.
+Do NOT just output a summary at the end. Narrate throughout.

 ## Task Selection

-You have flexibility to choose which task to work on. Use your judgement, but document your reasoning.
-
 1. Read ALL tasks in IMPLEMENTATION_PLAN.md — understand the full picture
 2. Skip any marked `[x]` (complete) or `[B]` (blocked)
 3. Check progress.txt for guidance — the previous iteration may have recommendations
 4. **Choose a task** based on:
-   - Dependencies (some tasks require others to be done first)
-   - Logical flow (query changes before matching logic, matching before pipeline integration)
-   - Your assessment of what would be most valuable to tackle next
-   - Previous iteration's recommendations (consider but don't blindly follow)
-5. **Document your reasoning**: Before starting work, briefly explain WHY you chose this task over others
+   - Dependencies (scaffolding before components, components before callbacks)
+   - Logical flow (Phase 0 → 1 → 2 → 3 → 4 → 5)
+   - Previous iteration's recommendations
+5. **Document your reasoning**: Before starting, explain WHY you chose this task
 6. Mark your chosen task `[~]` (in progress) in IMPLEMENTATION_PLAN.md

-If your chosen task turns out to be blocked during work:
- Mark it `[B]` with a reason in IMPLEMENTATION_PLAN.md
+If your chosen task is blocked:
+- Mark it `[B]` with a reason
 - Document the blocker in progress.txt
- Move to a different ready task within this same iteration
+- Move to a different ready task

 ## Development

 Work on ONE task per iteration. Build incrementally and verify as you go.

-### Key Concepts
+### Key Technologies

-**Drug-Indication Matching Flow:**
-1. Get patient's GP-matched Search_Terms from Snowflake (ALL matches, not just most recent, with code_frequency)
-   - Only count GP codes from MIN(Intervention Date) onwards (the HCD data window)
-2. Load DimSearchTerm.csv to get which drugs belong to which Search_Terms
-3. For each patient-drug pair: intersection of (Search_Terms listing this drug) AND (patient's GP matches)
-   - If multiple matches: pick highest code_frequency (most GP coding = most likely indication)
-4. Modify UPID to include matched indication: `{UPID}|{search_term}`
-5. Drugs sharing the same indication for the same patient → same modified UPID → same pathway
-6. Drugs under different indications → different modified UPIDs → separate pathways
+- **Dash 2.x**: `from dash import Dash, html, dcc, Input, Output, State, callback_context, ALL`
+- **Dash Mantine Components 0.14.x**: `import dash_mantine_components as dmc` — needs `dmc.MantineProvider` wrapping the layout
+- **Plotly**: `import plotly.graph_objects as go` — for the icicle chart
+- **SQLite**: `import sqlite3` — read-only access to `data/pathways.db`
+- **CSS**: All in `dash_app/assets/nhs.css` — auto-served by Dash

-**DimSearchTerm.csv:**
- `Search_Term`: Clinical condition (e.g., "rheumatoid arthritis")
- `CleanedDrugName`: Pipe-separated drug fragments (e.g., "ADALIMUMAB|GOLIMUMAB|...")
- `PrimaryDirectorate`: The directorate for this condition
- Drug matching: check if any fragment is a substring of the HCD drug name (case-insensitive)
+### Dash Component Patterns

-**Modified UPID Format:**
- Original: `RMV12345` (Provider Code[:3] + PersonKey)
- Modified: `RMV12345|rheumatoid arthritis`
- Fallback: `RMV12345|RHEUMATOLOGY (no GP dx)`
- The existing pathway analyzer treats UPID as an opaque identifier — this works transparently
-
-### Code Patterns
-
- **Snowflake queries**: Use parameterized queries, embed the cluster CTE from CLUSTER_MAPPING_SQL
- **GP record matching**: Return ALL matches per patient (not just most recent)
- **Drug mapping**: Load from `data/DimSearchTerm.csv`, match drug name fragments
- **Pathway pipeline**: Use existing functions — modified UPIDs flow through naturally
- **Reflex state**: No changes expected — indication charts already work, just with better matching
-
-### Key Data Structures
-
-**GP Matches (from Snowflake) — updated to return ALL matches with frequency:**
 ```python
-# Multiple rows per patient (one per matched Search_Term)
-# code_frequency = COUNT of matching SNOMED codes (used as tiebreaker)
-# Only counts codes from MIN(Intervention Date) onwards
-DataFrame with: PatientPseudonym, Search_Term, code_frequency
+# HTML elements use dash.html
+from dash import html
+html.Div(className="top-header", children=[...])
+
+# Mantine components for rich UI
+import dash_mantine_components as dmc
+dmc.Drawer(id="drug-drawer", position="right", size="480px", children=[...])
+dmc.Accordion(children=[dmc.AccordionItem(...)])
+
+# State management
+dcc.Store(id="app-state", storage_type="session", data={})
+
+# Callbacks
+@app.callback(
+    Output("chart-data", "data"),
+    Input("app-state", "data"),
+)
+def load_pathway_data(app_state):
+    ...
 ```

-**Drug-to-Indication Mapping (from DimSearchTerm.csv):**
-```python
-# search_term → list of drug fragments
-{"rheumatoid arthritis": ["ABATACEPT", "ADALIMUMAB", "ANAKINRA", ...]}
-```
+### Database Access Pattern

-**Modified HCD Data:**
 ```python
-# Original UPID replaced with indication-aware UPID
-df["UPID"] = "RMV12345|rheumatoid arthritis"  # for matched drugs
-df["UPID"] = "RMV12345|RHEUMATOLOGY (no GP dx)"  # for unmatched drugs
-```
+from pathlib import Path
+import sqlite3

-**Indication DataFrame:**
-```python
-# Maps modified UPID → Search_Term (for pathway hierarchy level 2)
-indication_df = pd.DataFrame({
-    'Directory': ['rheumatoid arthritis', 'asthma', 'CARDIOLOGY (no GP dx)']
-}, index=['RMV12345|rheumatoid arthritis', 'RMV12345|asthma', 'RMV67890|CARDIOLOGY (no GP dx)'])
+DB_PATH = Path(__file__).resolve().parents[2] / "data" / "pathways.db"
+
+def load_pathway_data(filter_id, chart_type, selected_drugs=None, selected_directorates=None):
+    conn = sqlite3.connect(str(DB_PATH))
+    conn.row_factory = sqlite3.Row
+    # ... query with parameterized WHERE ...
+    conn.close()
+    return result_dict
 ```

 ### Verification Steps

 After writing code, ALWAYS verify:

-1. **Syntax check**: `python -m py_compile <file.py>`
-2. **Import check**: `python -c "from module import function"`
-3. **For database changes**: Test with query against pathways.db
-4. **For Reflex changes**: `python -m reflex compile`
+1. **Import check**: `python -c "from dash_app.app import app"` (or specific module)
+2. **App starts**: `python run_dash.py` — must start without errors
+3. **Visual check** (when building UI): describe what you expect to see at localhost:8050
+4. **For callbacks**: verify the callback chain fires correctly (add temporary `print()` statements if needed)

 If any step fails, fix the issue before proceeding.

@@ -133,24 +129,23 @@ Every task MUST pass validation before being marked complete:
 ### Tier 1: Code Validation (MANDATORY)
 - Code compiles without Python syntax errors
 - Imports work without errors
- No TypeErrors, ImportErrors, or AttributeErrors
+- `python run_dash.py` starts without exceptions

-### Tier 2: Data Validation (for data/pipeline tasks)
- Queries return expected row counts
- Data structures have correct columns/types
- Drug-indication matching produces valid results
- Modified UPIDs have correct format
+### Tier 2: Layout Validation (for UI component tasks)
+- Component renders in the browser
+- CSS classes match 01_nhs_classic.html
+- Layout structure matches the HTML concept

-### Tier 3: Functional Validation (for UI/integration tasks)
- Reflex compiles the app without errors
- State changes trigger expected behavior
- Both chart types render correctly
+### Tier 3: Functional Validation (for callback tasks)
+- Callbacks fire when inputs change
+- Data flows correctly through dcc.Store chain
+- Chart renders with real data from SQLite

 ### Validation Failure

 If any tier fails:
 - DO NOT mark the task complete
- Document the failure details in progress.txt
+- Document the failure in progress.txt
 - Fix the issue within this iteration if possible
 - If you cannot fix it, mark the task `[B]` with details

@@ -159,34 +154,33 @@ If any tier fails:
 Before marking ANY task `[x]`, ALL of these must be true:

 1. Code is saved to the appropriate file(s)
-2. Tier 1 code validation passed
+2. Tier 1 validation passed (imports + app starts)
 3. Tier 2/3 validation passed (as applicable)
 4. All changes committed to git with a descriptive message

-These are non-negotiable. A task that "feels done" but hasn't passed all gates is NOT done.
+These are non-negotiable.

 ## Update Progress

-After completing your work (whether the task succeeded, failed, or was blocked), append to progress.txt using this format:
+After completing your work, append to progress.txt using this format:

 ```
 ## Iteration [N] — [YYYY-MM-DD]
 ### Task: [which task you worked on]
 ### Why this task:
 - [Brief explanation of why you chose this task over others]
- [What dependencies or logical flow led to this choice]
 ### Status: COMPLETE | BLOCKED | IN PROGRESS
 ### What was done:
 - [Specific actions taken]
 ### Validation results:
- Tier 1 (Code): [syntax check, import check]
- Tier 2 (Data): [query results, row counts]
- Tier 3 (Functional): [reflex compile, UI check]
+- Tier 1 (Code): [import check, app starts]
+- Tier 2 (Layout): [renders correctly, CSS matches]
+- Tier 3 (Functional): [callbacks fire, data flows]
 ### Files changed:
 - [list of files created/modified]
 ### Committed: [git hash] "[commit message]"
 ### Patterns discovered:
- [Any reusable learnings — query patterns, matching logic quirks]
+- [Any reusable learnings — Dash patterns, DMC quirks, CSS gotchas]
 ### Next iteration should:
 - [Explicit guidance for what the next fresh instance should do first]
 - [Note any context that would be lost without writing it here]
@@ -194,20 +188,20 @@ After completing your work (whether the task succeeded, failed, or was blocked),
 - [Any tasks that are blocked and why]
 ```

-If you discover a failure pattern that future iterations should avoid, add it to `guardrails.md`.
+If you discover a failure pattern, add it to `guardrails.md`.

 ## Commit Changes

 1. Stage changed files
-2. Use a descriptive commit message referencing the task (e.g., "feat: add drug-indication matching function (Task 2.1)")
-3. Commit after your task is validated and complete — one commit per logical unit of work
+2. Use a descriptive commit message referencing the task (e.g., "feat: create dash_app skeleton with nhs.css (Task 0.1 + 0.2)")
+3. Commit after your task is validated and complete
 4. If you updated progress.txt with a blocked status, commit that too

 ## Completion Check

 If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:

-1. Run `reflex compile` to verify app compiles
+1. Run `python run_dash.py` to verify app starts cleanly
 2. Verify all completion criteria at the bottom of IMPLEMENTATION_PLAN.md are satisfied
 3. Only then output the completion signal on its own line:

@@ -217,20 +211,19 @@ If ALL tasks in IMPLEMENTATION_PLAN.md are marked `[x]`:

 DO NOT output this string under any other circumstances.
 DO NOT output it if any task is still `[ ]` or `[B]` or `[~]`.
-DO NOT paraphrase, vary, or conditionally output this string.

 ## Rules

 - Complete ONE task per iteration, then update progress and stop
 - ALWAYS read progress.txt, guardrails.md before starting work
- **Match drugs to indications** — not just patients to indications
- **Use DimSearchTerm.csv** for drug-to-Search_Term mapping
- **Return ALL GP matches** — not just most recent (remove QUALIFY ROW_NUMBER = 1)
- **Modified UPID format**: `{UPID}|{search_term}` — pipe delimiter is safe
- **Use PseudoNHSNoLinked** — NOT PersonKey for GP record matching
- **Substring matching** for drug fragments from DimSearchTerm.csv
+- **Read 01_nhs_classic.html** when building ANY visual component
+- **Read src/data_processing/pathway_queries.py and src/visualization/plotly_generator.py** when building data logic or chart callbacks
+- **DO NOT modify pipeline/analysis logic** in src/ (pathway_pipeline, transforms, diagnosis_lookup, pathway_analyzer, refresh_pathways)
+- **DO add shared utilities** to src/ (visualization/plotly_generator.py, data_processing/database.py) rather than duplicating logic in dash_app/
+- **Use className from nhs.css** — not inline styles
+- **dcc.Store for state** — no server-side globals
+- **Unidirectional callbacks** — app-state → chart-data → UI
+- **Port icicle_figure exactly** — same customdata, colorscale, templates
 - Keep commits atomic and well-described
- If stuck on the same issue for more than 2 attempts within one iteration, document it in progress.txt and move to the next ready task
- When in doubt, check existing code for patterns that work
- **Pipeline before UI** — processing logic before Reflex changes
- **Don't change directory charts** — only indication chart matching changes
+- If stuck for 2+ attempts, document in progress.txt and move on
+- `python run_dash.py` must work after every task