data_processing Package

Data layer for the NHS High-Cost Drug Patient Pathway Analysis Tool.

Core Responsibilities

Data Pipeline: Snowflake → Transforms → Pathway Generation → SQLite

Key Modules

transforms.py — Core data transformations (moved from tools/data.py):

  • patient_id() — Creates UPID = Provider Code (first 3 chars) + PersonKey
  • drug_names() — Standardizes drug names via drugnames.csv lookup
  • department_identification() — 5-level fallback chain for directory assignment
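The three transforms can be sketched roughly as follows. This uses only the standard library and works on plain strings and dicts, not on whatever tabular structures the module actually uses.

The UPID rule (first three characters of the provider code followed by PersonKey) comes from the description above. The following are assumptions:

- the helper names `make_upid`, `standardize_drug_name` and `first_match`;
- the upper-casing of drug names;
- the `UNKNOWN` default;
- the two example resolver fields.

The five real fallback levels are not listed here, so `first_match` only illustrates the general shape of a fallback chain.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical helper names; only the UPID rule itself comes from the text above.


def make_upid(provider_code: str, person_key: str) -> str:
    """UPID = first 3 characters of the provider code + PersonKey."""
    return f"{provider_code[:3]}{person_key}"


def standardize_drug_name(raw: str, lookup: Mapping[str, str]) -> str:
    """Map a raw drug name to its standard name.

    `lookup` stands in for a mapping built from drugnames.csv (assumed to use
    upper-case keys). Unknown names fall back to the cleaned input.
    """
    key = raw.strip().upper()
    return lookup.get(key, key)


# A resolver inspects one record and returns a directory, or None/"" if it cannot decide.
Resolver = Callable[[Mapping[str, str]], Optional[str]]


def first_match(
    record: Mapping[str, str],
    resolvers: Sequence[Resolver],
    default: str = "UNKNOWN",  # hypothetical final fallback value
) -> str:
    """Try each level in order and return the first non-empty result."""
    for resolve in resolvers:
        result = resolve(record)
        if result:
            return result
    return default


# Usage with two hypothetical levels (the real chain has five).
levels = [
    lambda r: r.get("primary_directory"),
    lambda r: r.get("specialty_directory"),
]
print(make_upid("ABC12", "123456"))                                  # ABC123456
print(standardize_drug_name(" Humira ", {"HUMIRA": "ADALIMUMAB"}))   # ADALIMUMAB
print(first_match({"specialty_directory": "RHEUMATOLOGY"}, levels))  # RHEUMATOLOGY
```

An ordered list of small resolver functions keeps each fallback level independently testable. Adding or reordering a level is then a one-line change.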

pathway_pipeline.py — Pipeline orchestration:

  • Processes 6 date filter combinations × 2 chart types (directory + indication)
  • fetch_and_transform_data() — Snowflake fetch + UPID/drug/directory transforms
  • process_pathway_for_date_filter() — Directory charts using generate_icicle_chart()
  • process_indication_pathway_for_date_filter() — Indication charts using generate_icicle_chart_indication()
  • insert_pathway_records() — SQLite insertion with parameterized queries
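The orchestration arithmetic (6 date filters × 2 chart types = 12 pathway runs) and the parameterized insert can be sketched with `itertools` and `sqlite3`. This is a minimal illustration under stated assumptions:

- the date filter identifiers, the `pathway_nodes` column list, `PATHWAY_NODES_DDL` and `plan_runs` are all hypothetical;
- the real table definition lives in schema.py and is not reproduced here.

```python
import itertools
import sqlite3
from typing import List, Optional, Sequence, Tuple

# Hypothetical identifiers: the six real date filters are not named in this document.
DATE_FILTERS: Tuple[str, ...] = tuple(f"date_filter_{i}" for i in range(1, 7))
CHART_TYPES: Tuple[str, ...] = ("directory", "indication")

# Hypothetical table layout; the real definition lives in schema.py.
PATHWAY_NODES_DDL = """
CREATE TABLE IF NOT EXISTS pathway_nodes (
    date_filter_id TEXT NOT NULL,
    chart_type     TEXT NOT NULL,
    node_id        TEXT NOT NULL,
    parent_id      TEXT,
    patient_count  INTEGER NOT NULL
)
"""

Row = Tuple[str, str, str, Optional[str], int]


def plan_runs(
    date_filters: Sequence[str] = DATE_FILTERS,
    chart_types: Sequence[str] = CHART_TYPES,
) -> List[Tuple[str, str]]:
    """Every (date filter, chart type) pair: 6 x 2 = 12 pathway runs."""
    return list(itertools.product(date_filters, chart_types))


def insert_pathway_records(conn: sqlite3.Connection, rows: Sequence[Row]) -> int:
    """Insert nodes using ? placeholders, so values are bound, never spliced into SQL."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            "INSERT INTO pathway_nodes "
            "(date_filter_id, chart_type, node_id, parent_id, patient_count) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(PATHWAY_NODES_DDL)
for date_filter, chart_type in plan_runs():
    # Placeholder root node per run; real nodes would come from the icicle-chart generators.
    insert_pathway_records(conn, [(date_filter, chart_type, "root", None, 0)])
print(conn.execute("SELECT COUNT(*) FROM pathway_nodes").fetchone()[0])  # 12
```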

diagnosis_lookup.py — GP diagnosis matching:

  • get_patient_indication_groups() — Batch queries Snowflake (500 patients at a time)
  • Embeds ~148 Search_Term → Cluster_ID mappings as a SQL CTE
  • Returns the most recent match per patient via QUALIFY ROW_NUMBER()
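The batching and the CTE-plus-QUALIFY pattern might look roughly like this. The sketch only builds the SQL text and bind parameters, using the `%s` placeholder style that the Snowflake Python connector accepts by default.

The following are assumptions, not the module's actual query:

- the table `gp_diagnoses` and its columns;
- the join on `cluster_id`;
- the helper names `batched` and `build_indication_query`.

```python
from typing import Iterator, List, Sequence, Tuple

BATCH_SIZE = 500  # patients per Snowflake query, per the description above


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` patient IDs."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_indication_query(
    patient_ids: Sequence[str],
    mappings: Sequence[Tuple[str, str]],  # (search_term, cluster_id) pairs
) -> Tuple[str, List[str]]:
    """Return SQL text and bind parameters for one batch (hypothetical schema)."""
    if not patient_ids or not mappings:
        raise ValueError("need at least one patient and one mapping")
    mapping_rows = ", ".join(["(%s, %s)"] * len(mappings))
    patient_marks = ", ".join(["%s"] * len(patient_ids))
    sql = f"""
WITH cluster_map (search_term, cluster_id) AS (
    SELECT * FROM (VALUES {mapping_rows})
)
SELECT d.patient_id, m.search_term, d.event_date
FROM gp_diagnoses AS d
JOIN cluster_map AS m ON d.cluster_id = m.cluster_id
WHERE d.patient_id IN ({patient_marks})
-- keep only each patient's most recent matching diagnosis
QUALIFY ROW_NUMBER() OVER (PARTITION BY d.patient_id ORDER BY d.event_date DESC) = 1
"""
    # Parameters must follow placeholder order: CTE values first, then patient IDs.
    params = [value for pair in mappings for value in pair] + list(patient_ids)
    return sql, params


ids = [f"P{i}" for i in range(1201)]
print([len(batch) for batch in batched(ids)])  # [500, 500, 201]
sql, params = build_indication_query(["P1", "P2"], [("asthma", "AST_COD")])
print(sql.count("%s") == len(params))  # True
```

Each `(sql, params)` pair would then be passed to a cursor's `execute(sql, params)`, one call per batch. QUALIFY filters on the window function directly, so no wrapping subquery is needed to keep one row per patient.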

database.py — SQLite connection pooling and transaction management
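One common way to combine connection pooling with transaction management in `sqlite3` has two parts:

- a fixed-size queue of reusable connections;
- a context manager that commits on success and rolls back on error.

This is a sketch of that general pattern, not database.py's code. `SQLitePool`, `transaction()` and `close()` are hypothetical names.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class SQLitePool:
    """Fixed-size pool: connections are opened once and reused across requests."""

    def __init__(self, path: str, size: int = 4) -> None:
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False allows a connection to be used from a thread
            # other than the one that created it; the pool lends each one to a
            # single borrower at a time.
            self._idle.put(sqlite3.connect(path, check_same_thread=False))

    @contextmanager
    def transaction(self) -> Iterator[sqlite3.Connection]:
        """Borrow a connection; COMMIT if the block succeeds, ROLLBACK if it raises."""
        conn = self._idle.get()  # blocks until a connection is returned
        try:
            with conn:  # sqlite3's connection context manager handles commit/rollback
                yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back to the pool

    def close(self) -> None:
        """Close every idle connection (call once no transactions are open)."""
        while True:
            try:
                self._idle.get_nowait().close()
            except queue.Empty:
                break


pool = SQLitePool(":memory:", size=1)  # size=1 so every borrow sees the same in-memory DB
with pool.transaction() as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
try:
    with pool.transaction() as conn:
        conn.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the second insert was rolled back
with pool.transaction() as conn:
    print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1
```

Note that with `":memory:"` each pooled connection gets its own separate database. A pool larger than one therefore only makes sense with a file path.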

schema.py — SQL schema definitions (reference tables + pathway_nodes)

snowflake_connector.py — Snowflake SSO integration via externalbrowser authenticator
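For reference, the Snowflake Python connector selects browser-based SSO through the `authenticator="externalbrowser"` argument to `snowflake.connector.connect`.

The wrapper below is a hypothetical sketch; `sso_connect_params` and `connect_with_sso` are not names taken from snowflake_connector.py. It imports the driver lazily, so the rest of the package can load without `snowflake-connector-python` installed. Account and user values in any call are placeholders.

```python
from typing import Any, Dict


def sso_connect_params(account: str, user: str, **extra: Any) -> Dict[str, Any]:
    """Keyword arguments for snowflake.connector.connect() using browser SSO."""
    params: Dict[str, Any] = dict(extra)  # e.g. warehouse, database, schema, role
    # Applied last so a caller cannot accidentally switch off SSO via **extra.
    # "externalbrowser" opens the identity provider's login page in a browser.
    params.update(account=account, user=user, authenticator="externalbrowser")
    return params


def connect_with_sso(account: str, user: str, **extra: Any):
    """Open a Snowflake connection; imported lazily so the driver is optional."""
    import snowflake.connector  # provided by snowflake-connector-python

    return snowflake.connector.connect(**sso_connect_params(account, user, **extra))


params = sso_connect_params("my_account", "analyst@example.org", warehouse="ANALYTICS_WH")
print(params["authenticator"])  # externalbrowser
```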

cache.py — Query result caching with TTL-based invalidation
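TTL-based invalidation means each cached result carries an expiry time and is recomputed on the first access after that time passes. The sketch below assumes results are keyed by a hashable query key.

The `TTLCache` class, its `get_or_compute()` and `invalidate()` methods, and the injected `clock` parameter are hypothetical, not cache.py's API. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Cache query results for `ttl_seconds`, then recompute on next access."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh hit
        value = compute()  # miss or expired: run the real query
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Optional[Hashable] = None) -> None:
        """Drop one key, or everything when no key is given."""
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)


# Usage with a fake clock: the second read is served from cache, the third has expired.
fake_now = [0.0]
calls = []


def run_query() -> int:
    calls.append(1)
    return len(calls)


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
print(cache.get_or_compute("drug_counts", run_query))  # 1
print(cache.get_or_compute("drug_counts", run_query))  # 1 (cached)
fake_now[0] = 61.0
print(cache.get_or_compute("drug_counts", run_query))  # 2 (recomputed)
```

`time.monotonic` is the default clock because it never jumps backwards when the system clock is adjusted. That keeps expiry times reliable.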

Import Pattern

All imports are absolute and start from the package name:

from data_processing.transforms import patient_id, drug_names, department_identification
from data_processing.pathway_pipeline import process_all_date_filters