fix: increase network timeout and batch size for GP lookup queries (Task 3.2)

The dry run test showed GP lookup queries timing out at 30s (connection_timeout in snowflake.toml). Increased the timeout to 600s.

Also increased batch_size from 500 to 5000. Each query takes ~40s regardless of batch size because CTE compilation overhead dominates. Fewer, larger batches therefore cut total lookup time for 36K patients from ~50min (~72 batches) to ~6min (~8 batches).

Dry run results:
- GP match rate: 91.8%
- Drug-indication match rate: 49.3%
- Modified UPIDs: 42,072
- Pathway nodes: 1,846 across 6 date filters
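The batch-size reasoning above can be checked with a back-of-envelope estimate. This is a minimal sketch, not code from this repository. It assumes exactly 36,000 patients and a flat 40s per query, independent of batch size. estimate_total_minutes is a hypothetical helper introduced only for illustration.

```python
import math


def estimate_total_minutes(n_patients: int, batch_size: int,
                           seconds_per_query: float) -> tuple[int, float]:
    """Hypothetical helper: estimate total lookup time when each query costs a
    fixed overhead (e.g. CTE compilation) regardless of how many rows it covers."""
    # The last batch may be partial, so round the batch count up.
    n_batches = math.ceil(n_patients / batch_size)
    total_minutes = n_batches * seconds_per_query / 60
    return n_batches, total_minutes


# Assumed inputs: 36,000 patients, 40s per query (flat).
for batch_size in (500, 5000):
    n_batches, minutes = estimate_total_minutes(36_000, batch_size, 40.0)
    print(f"batch_size={batch_size}: {n_batches} batches, ~{minutes:.0f} min")
```

Under these assumptions the estimate gives 72 batches (~48min) at batch_size=500 and 8 batches (~5.3min) at batch_size=5000. That roughly matches the observed ~50min and ~6min; the remaining gap is plausibly per-batch overhead outside the query itself.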
2026-02-05 23:55:12 +00:00
parent 73088b063b
commit c6e426e36c
7 changed files with 197 additions and 207 deletions
@@ -36,8 +36,9 @@ user = "ANDREW.CHARLWOOD@NHS.NET"
 role = ""

 [timeouts]
-# Connection timeout in seconds
-connection_timeout = 30
+# Network timeout in seconds (how long client waits for Snowflake response)
+# Must be high enough for GP record lookups which can take 30-60s per batch
+connection_timeout = 600

 # Query execution timeout in seconds (for long-running queries)
 # Set to 0 for no timeout