fix: resolve Snowflake column casing and UPID mapping issues (Task 3.1)
Three issues identified and fixed during Task 3.1 testing:

1. Snowflake column name casing:
   - Unquoted columns in Snowflake are returned as UPPERCASE
   - Fixed by aliasing columns with quoted names: AS "Search_Term"
   - Now correctly populates 139 unique Search_Terms (was 0)

2. Duplicate UPID index error:
   - indication_df_for_chart could have duplicate UPIDs
   - Added drop_duplicates(subset=['UPID']) before set_index()
   - Keeps first occurrence (DIAGNOSIS over FALLBACK)

3. Missing UPIDs in indication lookup:
   - Old code: built indication_df from unique PseudoNHSNoLinked only
   - Problem: patients with multiple UPIDs (multi-provider) were missing
   - Fixed: now builds indication_df from ALL unique UPIDs in df
   - Also handles NaN values in Directory column safely

Validation results from test run:
- 36,628 patients queried
- 34,006 (92.8%) had GP diagnosis matches
- 139 unique Search_Terms found
- Top 5: drug misuse (8602), influenza (6239), diabetes (2476)

Still to verify: full pathway processing after these fixes.
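Fixes 2 and 3 can be illustrated with a minimal pandas sketch. It is not the project's actual code: the sample data, the function name build_indication_lookup, the Indication and Source columns, and the "UNKNOWN" placeholder for a missing Directory are all assumptions made for illustration. Only UPID, PseudoNHSNoLinked, Directory, Search_Term, and the DIAGNOSIS/FALLBACK labels come from the commit message.

```python
import pandas as pd

# Hypothetical activity data: patient P1 was seen by two providers,
# so the same PseudoNHSNoLinked maps to two different UPIDs.
df = pd.DataFrame({
    "UPID": ["RM1_001", "RM2_001", "RM1_002", "RM1_003", "RM1_004"],
    "PseudoNHSNoLinked": ["P1", "P1", "P2", "P3", "P4"],
    "Directory": ["RHEUMATOLOGY", "DERMATOLOGY", None, "CARDIOLOGY", None],
})

# Hypothetical GP diagnosis matches; P1 matches twice, which would
# otherwise produce duplicate UPID rows after the merge.
gp_matches = pd.DataFrame({
    "PseudoNHSNoLinked": ["P1", "P1", "P2"],
    "Search_Term": ["diabetes", "influenza", "influenza"],
})


def build_indication_lookup(df, gp_matches):
    """Return one indication row per UPID, preferring DIAGNOSIS over FALLBACK."""
    # Start from every unique UPID, not from unique patients (fix 3).
    base = df[["UPID", "PseudoNHSNoLinked", "Directory"]].drop_duplicates(subset=["UPID"])
    merged = base.merge(gp_matches, on="PseudoNHSNoLinked", how="left")

    has_dx = merged["Search_Term"].notna()
    # NaN Directory values become a placeholder string (assumed label).
    fallback = merged["Directory"].fillna("UNKNOWN").astype(str)
    merged["Indication"] = merged["Search_Term"].where(has_dx, fallback)
    merged["Source"] = "FALLBACK"
    merged.loc[has_dx, "Source"] = "DIAGNOSIS"

    # "DIAGNOSIS" sorts before "FALLBACK", so keep="first" prefers diagnoses.
    merged = merged.sort_values("Source", kind="stable")
    # De-duplicate before set_index() so the index is unique (fix 2).
    merged = merged.drop_duplicates(subset=["UPID"], keep="first")
    return merged.set_index("UPID")[["Indication", "Source"]]


lookup = build_indication_lookup(df, gp_matches)
print(lookup)
```

Both of P1's UPIDs receive an entry, and rows without a GP match fall back to the Directory value or the placeholder.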
@@ -1348,12 +1348,13 @@ def get_patient_indication_groups(
 # Build the full query with cluster CTE
 # This finds the most recent matching diagnosis for each patient
+# Note: Column names must be aliased to ensure consistent casing in results
 query = f"""
 {CLUSTER_MAPPING_SQL}
 SELECT
-pc."PatientPseudonym",
-aic.Search_Term,
-pc."EventDateTime"
+pc."PatientPseudonym" AS "PatientPseudonym",
+aic.Search_Term AS "Search_Term",
+pc."EventDateTime" AS "EventDateTime"
 FROM DATA_HUB.PHM."PrimaryCareClinicalCoding" pc
 INNER JOIN AllIndicationCodes aic
 ON pc."SNOMEDCode" = aic.SNOMEDCode
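The casing rule behind this change can be sketched in plain Python. This is a simplified model for illustration only, not Snowflake's parser: the helper name resolved_column_name is hypothetical, and it only handles simple select items of the form expr or expr AS alias. It encodes the documented Snowflake behaviour that unquoted identifiers resolve to uppercase while double-quoted identifiers keep their case.

```python
import re


def resolved_column_name(select_item: str) -> str:
    """Approximate the result-set label for a simple SELECT item.

    Simplified model (assumption): the label is the alias if one is given,
    otherwise the column name without its table qualifier.
    """
    # Use the alias when the item is written as "expr AS alias".
    parts = re.split(r"\s+AS\s+", select_item.strip(), maxsplit=1, flags=re.IGNORECASE)
    label = parts[-1].strip()
    # Drop a table qualifier such as pc. or aic.
    label = label.rsplit(".", 1)[-1]
    # Double-quoted identifiers keep their exact case.
    if label.startswith('"') and label.endswith('"'):
        return label[1:-1]
    # Unquoted identifiers are folded to uppercase.
    return label.upper()


old_label = resolved_column_name("aic.Search_Term")
new_label = resolved_column_name('aic.Search_Term AS "Search_Term"')
print(old_label, new_label)
```

Under this model the old select item yields SEARCH_TERM, so downstream code looking up a Search_Term column finds nothing, which is consistent with the 0 unique Search_Terms reported before the fix.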