Commit Graph

6 Commits

Author SHA1 Message Date
Andrew Charlwood c6e426e36c fix: increase network timeout and batch size for GP lookup queries (Task 3.2)
Dry run test revealed GP lookup queries timing out at 30s (connection_timeout
in snowflake.toml). Increased to 600s. Also increased batch_size from 500 to
5000 — query time is ~40s regardless of batch size (CTE compilation overhead),
so larger batches reduce total time from ~50min to ~6min for 36K patients.

Dry run results: 91.8% GP match rate, 49.3% drug-indication match rate,
42,072 modified UPIDs, 1,846 pathway nodes across 6 date filters.
2026-02-05 23:55:12 +00:00
Andrew Charlwood 4884e0a8cc fix: recreate pathway_nodes with correct UNIQUE constraint and validate end-to-end (Task 5.1)
The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.

Validation: both chart types have all hierarchy levels (0-5),
all 12 date filters produce valid icicle charts, KPIs correct.
2026-02-05 20:43:01 +00:00
Andrew Charlwood 6331d44165 fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail
prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.

Also fixes directory charts only generating data for the first date filter.

Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
2026-02-05 20:10:12 +00:00
Andrew Charlwood 0b5b462766 docs: update progress.txt with iteration 3, add new guardrails (Task 3.1) 2026-02-05 18:31:29 +00:00
Andrew Charlwood 99bab08402 docs: add guardrails for patient identifier and SNOMED code handling 2026-02-05 15:51:52 +00:00
Andrew Charlwood fdd33a67af Initial commit before Ralph loop 2026-02-04 13:04:29 +00:00