HighCostDrugsDemo

Author	SHA1	Message	Date
Andrew Charlwood	22222fe9ca	fix: resolve Snowflake column casing and UPID mapping issues (Task 3.1)	2026-02-05 18:30:23 +00:00
    Three issues identified and fixed during Task 3.1 testing:
    1. Snowflake column name casing:
       - Unquoted columns in Snowflake are returned as UPPERCASE
       - Fixed by aliasing columns with quoted names: AS "Search_Term"
       - Now correctly populates 139 unique Search_Terms (was 0)
    2. Duplicate UPID index error:
       - indication_df_for_chart could have duplicate UPIDs
       - Added drop_duplicates(subset=['UPID']) before set_index()
       - Keeps first occurrence (DIAGNOSIS over FALLBACK)
    3. Missing UPIDs in indication lookup:
       - Old code: built indication_df from unique PseudoNHSNoLinked only
       - Problem: patients with multiple UPIDs (multi-provider) were missing
       - Fixed: now builds indication_df from ALL unique UPIDs in df
       - Also handles NaN values in Directory column safely
    Validation results from test run:
    - 36,628 patients queried
    - 34,006 (92.8%) had GP diagnosis matches
    - 139 unique Search_Terms found
    - Top 5: drug misuse (8602), influenza (6239), diabetes (2476)
    Still to verify: full pathway processing after these fixes.
Andrew Charlwood	ad10b374cb	feat: integrate Snowflake-direct indication lookup into CLI refresh (Task 1.2, 2.3)	2026-02-05 17:06:34 +00:00
    Replace batch_lookup_indication_groups() with get_patient_indication_groups() for indication chart processing.
    The new approach:
    - Extracts unique PseudoNHSNoLinked values from HCD data
    - Queries Snowflake directly using the cluster CTE
    - Builds indication_df mapping UPID → Search_Term (matched) or Directory (fallback)
    - Logs coverage statistics (diagnosis % vs fallback %)
    This completes the integration of the new Snowflake-direct GP lookup approach.
Andrew Charlwood	8952156798	feat: integrate batch GP diagnosis lookup for indication charts (Task 3.2)	2026-02-05 14:45:06 +00:00
    - Add batch_lookup_indication_groups() to diagnosis_lookup.py
      - Efficient batch Snowflake queries (500 patients per batch)
      - Returns UPID → Indication_Group mapping
      - Source tracking: DIAGNOSIS vs FALLBACK
    - Update cli/refresh_pathways.py indication processing
      - Call batch_lookup_indication_groups() before chart generation
      - Build indication_df for process_indication_pathway_for_date_filter()
      - Log diagnosis coverage statistics
    - Enables full --chart-type all functionality
Andrew Charlwood	593d14c70f	feat: add chart_type argument to refresh command (Task 3.1)	2026-02-05 14:38:57 +00:00
    - Add --chart-type argument with choices: directory, indication, all
    - Update insert_pathway_records to include chart_type column
    - Update refresh_pathways to process multiple chart types
    - Update logging to show chart type counts
    - Indication chart processing deferred to Task 3.2 (GP diagnosis integration)
Andrew Charlwood	adc1dbfc58	feat: complete Task 2.2 - test refresh pipeline with Snowflake data	2026-02-05 00:20:12 +00:00
    Tested full refresh pipeline end-to-end with real Snowflake data:
    - Fixed trust filter to read Name column from defaultTrusts.csv
    - Fixed Decimal type handling in calculate_cost_per_patient_per_annum
    - Fixed array handling in convert_to_records for average_administered
    - Added required reference CSV files to data/ directory
    - Configured Snowflake connection (account, warehouse, user)
    Results:
    - Snowflake fetch: 656,695 records in ~7s
    - Transformations: 519,848 records after UPID/drug/directory
    - Pathway nodes: 293 for all_6mo (8 trusts, 14 directories)
    - Total processing time: ~6.2 minutes
Andrew Charlwood	092fdbba5a	feat: add CLI refresh command for pathway data (Task 2.1)	2026-02-04 23:30:11 +00:00
    Add cli/refresh_pathways.py with:
    - refresh_pathways() main function for full pipeline orchestration
    - insert_pathway_records() for SQLite insertion
    - log_refresh_start/complete/failed() for refresh tracking
    - CLI with --minimum-patients, --provider-codes, --dry-run, --verbose
    Uses existing pipeline functions:
    - fetch_and_transform_data() from pathway_pipeline.py
    - process_all_date_filters() for 6 date filter combinations
    - Schema helpers from data_processing/schema.py
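
Commit 8952156798 mentions batched Snowflake queries of 500 patients per batch. The repository code is not shown in this log, so the following is only a minimal sketch of how such batching might look. The helper name iter_batches and the sample identifiers are hypothetical; only the batch size of 500 comes from the commit message.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 500  # batch size stated in commit 8952156798


def iter_batches(ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` identifiers (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Invented identifiers for illustration; real values would come from the HCD data.
patient_ids = [f"P{n:05d}" for n in range(1234)]
batch_lengths = [len(batch) for batch in iter_batches(patient_ids)]
print(batch_lengths)  # [500, 500, 234]
```

Each yielded slice could then be passed to a single query, which keeps the number of round trips proportional to the patient count divided by 500.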

6 Commits
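
Commit 593d14c70f adds a --chart-type argument with the choices directory, indication and all. As an illustration only, such an argument could be declared with argparse roughly as below. The prog name, the default value and the expand_chart_types helper are assumptions, not taken from the repository.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # "refresh_pathways" as the program name is an assumption.
    parser = argparse.ArgumentParser(prog="refresh_pathways")
    parser.add_argument(
        "--chart-type",
        choices=["directory", "indication", "all"],
        default="all",  # assumption: the commit message does not state a default
        help="Which chart type(s) to process",
    )
    return parser


def expand_chart_types(chart_type: str) -> List[str]:
    """Map the CLI value to the concrete chart types to process (hypothetical helper)."""
    if chart_type == "all":
        return ["directory", "indication"]
    return [chart_type]


args = build_parser().parse_args(["--chart-type", "indication"])
print(expand_chart_types(args.chart_type))  # ['indication']
```

Restricting the value with choices lets argparse reject unknown chart types before any Snowflake query runs.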