prepare_data() mapped Provider Code → Name in place, mutating the caller's
DataFrame. When it was called for directory charts first and then for indication
charts, the second call re-mapped the already-mapped values to NaN, silently
dropping all data. Added df.copy() to prevent the mutation.
Also fixes directory charts generating data only for the first date filter.
Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
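The double-mapping failure can be reproduced in a few lines. This is a minimal sketch, not the project's actual prepare_data(): the function names prepare_data_inplace and prepare_data_copy, the sample codes, and the lookup dict are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical lookup; the real mapping comes from the project's reference data.
CODE_TO_NAME = {"RX1": "Trust A", "RY2": "Trust B"}


def prepare_data_inplace(df: pd.DataFrame) -> pd.DataFrame:
    """Buggy variant: overwrites the column on the caller's DataFrame."""
    df["Provider Code"] = df["Provider Code"].map(CODE_TO_NAME)
    return df


def prepare_data_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Fixed variant: works on a copy, so the input is left untouched."""
    df = df.copy()
    df["Provider Code"] = df["Provider Code"].map(CODE_TO_NAME)
    return df


raw = pd.DataFrame({"Provider Code": ["RX1", "RY2"]})

# The second call on the same frame sees names instead of codes,
# so every dict lookup misses and Series.map() fills in NaN.
buggy = raw.copy()
prepare_data_inplace(buggy)
second_buggy = prepare_data_inplace(buggy)
print(second_buggy["Provider Code"].isna().all())  # True

# With the copy, both calls start from the original codes.
first = prepare_data_copy(raw)
second = prepare_data_copy(raw)
print(second["Provider Code"].tolist())  # ['Trust A', 'Trust B']
```

Copying at the top of the function costs one extra DataFrame per call, but it makes the function safe to call repeatedly on the same source frame.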
- Add generate_icicle_chart_indication() to pathway_analyzer.py
  - Variant that uses indication_df instead of directory_df
  - Groups by Trust → Search_Term → Drug → Pathway
  - Accepts indication_df mapping UPID → Indication_Group
- Add process_indication_pathway_for_date_filter() to pathway_pipeline.py
  - Processes indication-based pathway for a single date filter
  - Uses generate_icicle_chart_indication() for hierarchy building
- Add extract_indication_fields() to pathway_pipeline.py
  - Extracts trust_name, search_term, drug_sequence from ids column
  - Similar to extract_denormalized_fields() but for indication charts
- Update convert_to_records() with chart_type parameter
  - Includes chart_type column in output records
  - Supports "directory" and "indication" values
- Add ChartType type alias (Literal["directory", "indication"])
- Update __all__ exports with new functions
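A rough sketch of how the ChartType alias and the chart_type parameter might fit together. Only the alias definition and the two allowed values come from the notes above; the signature of convert_to_records(), its default value, the runtime check, and the sample node are assumptions for illustration and may differ from the real function.

```python
from typing import Any, Literal, get_args

# Alias as described in the change list.
ChartType = Literal["directory", "indication"]


def convert_to_records(
    nodes: list[dict[str, Any]],
    chart_type: ChartType = "directory",  # default is an assumption
) -> list[dict[str, Any]]:
    """Return copies of the node dicts, each tagged with its chart type."""
    # Literal is only checked statically, so validate at runtime as well.
    if chart_type not in get_args(ChartType):
        raise ValueError(f"unknown chart_type: {chart_type!r}")
    return [{**node, "chart_type": chart_type} for node in nodes]


# Hypothetical node; the real records carry many more columns.
records = convert_to_records([{"ids": "root", "value": 10}], chart_type="indication")
print(records)  # [{'ids': 'root', 'value': 10, 'chart_type': 'indication'}]
```

Carrying chart_type as a column lets directory and indication records share one output table while remaining filterable.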
Tested full refresh pipeline end-to-end with real Snowflake data:
- Fixed trust filter to read Name column from defaultTrusts.csv
- Fixed Decimal type handling in calculate_cost_per_patient_per_annum
- Fixed array handling in convert_to_records for average_administered
- Added required reference CSV files to data/ directory
- Configured Snowflake connection (account, warehouse, user)
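For context on the Decimal fix: Python refuses to mix decimal.Decimal with float in arithmetic, which is a likely failure when numeric database columns arrive as Decimal. The sketch below shows the error and one way around it. The helper to_float(), the function name cost_per_patient_per_annum(), and its formula are hypothetical; this is not the body of calculate_cost_per_patient_per_annum.

```python
from decimal import Decimal
from typing import Union

Number = Union[int, float, Decimal]

# Mixing Decimal and float raises TypeError instead of silently converting.
try:
    Decimal("1200.50") * 365.25
except TypeError as exc:
    print("TypeError:", exc)


def to_float(value: Number) -> float:
    """Coerce Decimal (or int) to float before doing float arithmetic."""
    return float(value)


def cost_per_patient_per_annum(total_cost: Number, patients: int, days: Number) -> float:
    """Hypothetical formula: annualised cost divided by patient count."""
    if patients <= 0 or days <= 0:
        return 0.0
    return to_float(total_cost) / patients * (365.25 / to_float(days))


annual = cost_per_patient_per_annum(Decimal("1000"), 4, Decimal("365.25"))
print(annual)  # 250.0
```

Converting at the boundary keeps the rest of the calculation in plain floats; if exact decimal arithmetic matters for costs, the alternative is to keep everything in Decimal instead.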
Results:
- Snowflake fetch: 656,695 records in ~7s
- Transformations: 519,848 records after UPID/drug/directory steps
- Pathway nodes: 293 for all_6mo (8 trusts, 14 directories)
- Total processing time: ~6.2 minutes