Commit Graph

2 Commits

Author SHA1 Message Date
Andrew Charlwood adc1dbfc58 feat: complete Task 2.2 - test refresh pipeline with Snowflake data
Tested full refresh pipeline end-to-end with real Snowflake data:
- Fixed trust filter to read Name column from defaultTrusts.csv
- Fixed Decimal type handling in calculate_cost_per_patient_per_annum
- Fixed array handling in convert_to_records for average_administered
- Added required reference CSV files to data/ directory
- Configured Snowflake connection (account, warehouse, user)

Results:
- Snowflake fetch: 656,695 records in ~7s
- Transformations: 519,848 records after UPID/drug/directory
- Pathway nodes: 293 for all_6mo (8 trusts, 14 directories)
- Total processing time: ~6.2 minutes
2026-02-05 00:20:12 +00:00
Andrew Charlwood 5945649ae3 feat: add pathway pipeline module (Task 1.2)
Create data_processing/pathway_pipeline.py with:
- DateFilterConfig dataclass for date filter configuration
- DATE_FILTER_CONFIGS with 6 pre-defined combinations
- compute_date_ranges() for computing actual dates from config
- fetch_and_transform_data() for Snowflake fetch + transformations
- process_pathway_for_date_filter() using existing generate_icicle_chart()
- extract_denormalized_fields() to parse trust/directory/drugs from ids
- convert_to_records() for SQLite insertion
- process_all_date_filters() convenience function
2026-02-04 23:21:39 +00:00