- Remove old iteration logs and deprecated files from archive/can_delete/
- Update RALPH_PROMPT.md and guardrails.md for Phase 10+ work
- Update ralph.ps1 banner text
- Add AdditionalAnalytics.md chart specification
- Add run.bat convenience script

Rewrote README.md, USER_GUIDE.md, and DEPLOYMENT.md to reflect
the Dash application. Updated RALPH_PROMPT.md, guardrails.md, and
DESIGN_SYSTEM.md to remove Reflex references. All non-archive
documentation now reflects the current Dash + DMC architecture.

Dry run test revealed GP lookup queries timing out at 30s (connection_timeout
in snowflake.toml). Increased to 600s. Also increased batch_size from 500 to
5000 — query time is ~40s regardless of batch size (CTE compilation overhead),
so larger batches reduce total time from ~50min to ~6min for 36K patients.
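
The batch-size arithmetic above can be checked with a rough sketch. It assumes every batch costs a flat ~40s and that the cohort is exactly 36,000 patients; the function name `estimated_minutes` is hypothetical and not part of the codebase.

```python
import math


def estimated_minutes(patients: int, batch_size: int, seconds_per_query: float = 40.0) -> float:
    """Estimate total lookup time when each batch costs a fixed query time."""
    batches = math.ceil(patients / batch_size)
    return batches * seconds_per_query / 60


# Assumed figures: 36,000 patients, ~40s per query regardless of batch size.
old_minutes = estimated_minutes(36_000, 500)    # 72 batches
new_minutes = estimated_minutes(36_000, 5000)   # 8 batches

print(f"batch_size=500:  {old_minutes:.1f} min")
print(f"batch_size=5000: {new_minutes:.1f} min")
```

Under these assumptions the estimate is 48 minutes for the old setting and a little over 5 minutes for the new one, in line with the ~50min and ~6min observed.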
Dry run results: 91.8% GP match rate, 49.3% drug-indication match rate,
42,072 modified UPIDs, 1,846 pathway nodes across 6 date filters.

The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.
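
The overwrite described above can be reproduced with a minimal sketch in an in-memory SQLite database. The table name `pathway_nodes`, the `value` column, and the sample row values are assumptions for illustration; only the three constraint columns come from this commit.

```python
import sqlite3


def surviving_chart_types(unique_columns: str) -> list[str]:
    """Insert a directory root and an indication root sharing the same ids,
    then report which chart types are still present."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only the UNIQUE column list varies between runs.
    conn.execute(
        f"""
        CREATE TABLE pathway_nodes (
            date_filter_id TEXT,
            chart_type     TEXT,
            ids            TEXT,
            value          INTEGER,
            UNIQUE({unique_columns})
        )
        """
    )
    rows = [
        ("example_filter", "directory", "root", 100),
        ("example_filter", "indication", "root", 100),
    ]
    conn.executemany("INSERT OR REPLACE INTO pathway_nodes VALUES (?, ?, ?, ?)", rows)
    remaining = [
        row[0]
        for row in conn.execute("SELECT chart_type FROM pathway_nodes ORDER BY chart_type")
    ]
    conn.close()
    return remaining


# Old constraint: the indication row replaces the directory row.
print(surviving_chart_types("date_filter_id, ids"))
# Corrected constraint: both rows survive.
print(surviving_chart_types("date_filter_id, chart_type, ids"))
```

With the two-column constraint only the indication row remains. Adding `chart_type` to the constraint keeps both rows.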
Validation: both chart types have all hierarchy levels (0-5),
all 12 date filters produce valid icicle charts, KPIs correct.

prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.
Also fixes directory charts only generating data for the first date filter.
Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
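
The double-mapping failure can be illustrated with a small pandas sketch. The provider codes, trust names, and both function names are made up for the example; the real prepare_data() does more than this, and only the in-place mapping and the df.copy() fix are taken from this commit.

```python
import pandas as pd

# Hypothetical lookup; the real Provider Code -> Name mapping is not shown here.
PROVIDER_NAMES = {"RX1": "Trust A", "RY2": "Trust B"}


def prepare_data_in_place(df: pd.DataFrame) -> pd.DataFrame:
    """Buggy variant: overwrites the caller's column with mapped names."""
    df["Provider Code"] = df["Provider Code"].map(PROVIDER_NAMES)
    return df


def prepare_data_with_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Fixed variant: works on a copy, so the caller's frame is untouched."""
    df = df.copy()
    df["Provider Code"] = df["Provider Code"].map(PROVIDER_NAMES)
    return df


# Buggy path: the second call looks up names (not codes) and gets NaN.
raw_mutated = pd.DataFrame({"Provider Code": ["RX1", "RY2"]})
prepare_data_in_place(raw_mutated)                    # e.g. directory charts
mutated_second = prepare_data_in_place(raw_mutated)   # e.g. indication charts

# Fixed path: both calls see the original codes.
raw_safe = pd.DataFrame({"Provider Code": ["RX1", "RY2"]})
prepare_data_with_copy(raw_safe)
safe_second = prepare_data_with_copy(raw_safe)

print(mutated_second["Provider Code"].isna().all())
print(safe_second["Provider Code"].tolist())
```

Series.map returns NaN for keys missing from the lookup, which is why the second in-place call drops every row's provider without raising an error.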