fix: increase network timeout and batch size for GP lookup queries (Task 3.2)
A dry-run test showed GP lookup queries timing out at 30s (connection_timeout in snowflake.toml), so the timeout is increased to 600s.

Also increased batch_size from 500 to 5000. Query time is ~40s regardless of batch size (CTE compilation overhead), so larger batches cut total time from ~50min to ~6min for 36K patients.

Dry-run results: 91.8% GP match rate, 49.3% drug-indication match rate, 42,072 modified UPIDs, 1,846 pathway nodes across 6 date filters.
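The batch-size reasoning above can be checked with a rough estimate. This is only a sketch: it assumes every query costs a flat ~40s and that 36K means 36,000 patients, and `estimated_minutes` is a hypothetical helper, not part of the codebase.

```python
import math


def estimated_minutes(n_patients: int, batch_size: int,
                      seconds_per_query: float = 40.0) -> float:
    """Estimate total lookup time, assuming a fixed cost per batch query."""
    n_batches = math.ceil(n_patients / batch_size)
    return n_batches * seconds_per_query / 60


# 36,000 patients at the old and new batch sizes (assumed flat ~40s per query)
old_minutes = estimated_minutes(36_000, 500)    # 72 batches
new_minutes = estimated_minutes(36_000, 5000)   # 8 batches
print(f"batch_size=500:  ~{old_minutes:.0f} min")
print(f"batch_size=5000: ~{new_minutes:.1f} min")
```

Under these assumptions the estimate comes out at roughly 48 minutes versus a little over 5 minutes, which is in line with the ~50min and ~6min figures reported from the dry run.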
+1
-1
@@ -182,7 +182,7 @@ def load_snowflake_config(config_path: Optional[Path] = None) -> SnowflakeConfig
     # Parse timeout settings
     timeout_data = data.get("timeouts", {})
     timeouts = TimeoutConfig(
-        connection_timeout=timeout_data.get("connection_timeout", 30),
+        connection_timeout=timeout_data.get("connection_timeout", 600),
         query_timeout=timeout_data.get("query_timeout", 300),
         login_timeout=timeout_data.get("login_timeout", 120),
     )
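The changed line only sets a fallback default, so an explicit `connection_timeout` in the config file would still take precedence. A minimal sketch of that behaviour follows, assuming the parsed TOML arrives as a plain dict. The `TimeoutConfig` dataclass here is a hypothetical stand-in (the real class may have more fields), and `parse_timeouts` is an illustrative helper, not a function from the repository.

```python
from dataclasses import dataclass


@dataclass
class TimeoutConfig:
    # Hypothetical stand-in for the project's TimeoutConfig
    connection_timeout: int
    query_timeout: int
    login_timeout: int


def parse_timeouts(data: dict) -> TimeoutConfig:
    """Mirror the diff's logic: use config values, else fall back to defaults."""
    timeout_data = data.get("timeouts", {})
    return TimeoutConfig(
        connection_timeout=timeout_data.get("connection_timeout", 600),
        query_timeout=timeout_data.get("query_timeout", 300),
        login_timeout=timeout_data.get("login_timeout", 120),
    )


# No [timeouts] table: the new 600s default applies
no_override = parse_timeouts({})
# Explicit value in the config: it wins over the default
with_override = parse_timeouts({"timeouts": {"connection_timeout": 30}})
print(no_override.connection_timeout, with_override.connection_timeout)
```

If snowflake.toml still sets `connection_timeout = 30` explicitly, that value would need updating as well for the new default to have any effect.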