feat: complete Task 2.2 - test refresh pipeline with Snowflake data
Tested full refresh pipeline end-to-end with real Snowflake data:
- Fixed trust filter to read Name column from defaultTrusts.csv
- Fixed Decimal type handling in calculate_cost_per_patient_per_annum
- Fixed array handling in convert_to_records for average_administered
- Added required reference CSV files to data/ directory
- Configured Snowflake connection (account, warehouse, user)

Results:
- Snowflake fetch: 656,695 records in ~7s
- Transformations: 519,848 records after UPID/drug/directory
- Pathway nodes: 293 for all_6mo (8 trusts, 14 directories)
- Total processing time: ~6.2 minutes
@@ -153,7 +153,7 @@ def calculate_cost_per_patient_per_annum(
     patients with different treatment durations.
 
     Args:
-        total_cost: Total cost for the patient
+        total_cost: Total cost for the patient (can be Decimal or float)
         days_treated: Treatment duration as timedelta
 
     Returns:
@@ -171,7 +171,8 @@ def calculate_cost_per_patient_per_annum(
     if days <= 0:
         return None
 
-    return total_cost / (days / 365)
+    # Convert total_cost to float to handle Decimal from Snowflake
+    return float(total_cost) / (days / 365)
 
 
 def calculate_treatment_duration(
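
Note on the Decimal fix: Python refuses to divide a decimal.Decimal by a float, and Snowflake numeric columns commonly arrive as Decimal, which is presumably why the old return line failed on real data. The sketch below is a minimal reproduction, not the project's actual function: the name cost_per_annum_sketch is hypothetical, and deriving days from days_treated.days is an assumption, since the diff does not show how days is computed.

```python
from datetime import timedelta
from decimal import Decimal
from typing import Optional, Union


def cost_per_annum_sketch(
    total_cost: Union[Decimal, float],
    days_treated: timedelta,
) -> Optional[float]:
    """Hypothetical stand-in for calculate_cost_per_patient_per_annum."""
    # Assumption: the real function may derive `days` differently.
    days = days_treated.days
    if days <= 0:
        return None
    # Coerce to float first; Decimal / float raises TypeError.
    return float(total_cost) / (days / 365)


# The pre-fix expression: a Decimal divided by a float.
try:
    Decimal("1000.00") / (730 / 365)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The post-fix behaviour: 1000.00 over two years of treatment.
annual = cost_per_annum_sketch(Decimal("1000.00"), timedelta(days=730))
```

Converting to float trades Decimal's exact arithmetic for compatibility with the float-based day ratio; if exact currency arithmetic mattered, the alternative would be to keep everything in Decimal instead.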