Andrew Charlwood 547bc7c867 docs: complete Phase 9 final integration (Task 9.10)

All 8 chart tabs verified — queries, figures, and filter dispatch
tested in both directory and indication modes. CLAUDE.md updated
with new chart types, query functions, and parsing utilities.
Phase 9 completion criteria all satisfied.

2026-02-06 20:22:21 +00:00

33 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

NHS High-Cost Drug Patient Pathway Analysis Tool - a web-based application that analyzes secondary care patient treatment pathways. It processes clinical activity data to visualize hierarchical treatment patterns as interactive Plotly icicle charts.

Key Features:

Dual chart types: Directory-based (Trust → Directory → Drug → Pathway) and Indication-based (Trust → GP Diagnosis → Drug → Pathway) views with toggle
Pre-computed pathway architecture: Treatment pathways pre-processed and stored in SQLite for instant filtering
GP diagnosis matching: Patient indications matched from GP records using SNOMED cluster codes queried directly from Snowflake (~93% match rate)
Data pipeline: Snowflake → pre-computed SQLite pathway nodes (CSV/Parquet file loading retained for legacy compatibility)
Interactive browser-based UI using Dash (Plotly) + Dash Mantine Components
6 pre-defined date filter combinations × 2 chart types = 12 pre-computed datasets with sub-50ms response times

Running the Application

# Install dependencies
uv sync

# One-time dev setup: adds src/ to Python path via .pth file
uv run python setup_dev.py

# Initialize/migrate the database (creates pathway tables)
python -m data_processing.migrate

# Refresh pathway data from Snowflake (requires SSO auth)
python -m cli.refresh_pathways

# Run the Dash web application
python run_dash.py

The application requires Python 3.10+ and runs on http://localhost:8050 by default.

CLI Commands

Refresh Pathway Data:

# Full refresh — both chart types (directory + indication), all date filters
python -m cli.refresh_pathways --chart-type all

# Directory charts only (faster, skips GP diagnosis lookup)
python -m cli.refresh_pathways --chart-type directory

# Indication charts only
python -m cli.refresh_pathways --chart-type indication

# Dry run (test without database changes)
python -m cli.refresh_pathways --chart-type all --dry-run -v

# Custom minimum patient threshold
python -m cli.refresh_pathways --minimum-patients 10

# Help
python -m cli.refresh_pathways --help

The --chart-type argument controls which pathway types are processed:

all (default) — generates both directory and indication charts (~15 minutes)
directory — directory-based charts only (~5 minutes)
indication — indication-based charts only (~12 minutes, includes GP lookup)
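The documented flags can be mirrored with argparse. This is a sketch of the command-line surface only, not the actual cli/refresh_pathways.py source. The -v flag is stored as a `verbose` attribute. The real default for --minimum-patients is not stated here, so `None` stands in as a placeholder meaning "use the pipeline default".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented refresh_pathways flags (not the real source)."""
    parser = argparse.ArgumentParser(
        prog="python -m cli.refresh_pathways",
        description="Refresh pre-computed pathway data from Snowflake into SQLite.",
    )
    parser.add_argument(
        "--chart-type",
        choices=["all", "directory", "indication"],
        default="all",
        help="Which pathway chart types to process (default: all).",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Run the pipeline without writing to the database.",
    )
    parser.add_argument("-v", dest="verbose", action="store_true", help="Verbose logging.")
    parser.add_argument(
        "--minimum-patients",
        type=int,
        default=None,  # placeholder: the real default is not documented here
        help="Custom minimum patient threshold.",
    )
    return parser


def chart_types_to_process(chart_type: str) -> list[str]:
    """Expand --chart-type into the concrete chart types the pipeline loops over."""
    return ["directory", "indication"] if chart_type == "all" else [chart_type]
```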

The refresh command:

Fetches activity data from Snowflake (656K+ records, ~7 seconds)
Applies UPID, drug name, and directory transformations (~6 minutes)
For indication charts: queries GP records via SNOMED clusters (~9 minutes for 37K patients)
Processes 6 date filter combinations × selected chart types
Inserts pathway nodes into SQLite for fast Dash filtering

Architecture

Package Structure

.
├── src/                         # All application library code
│   ├── core/                    # Foundation: paths, models, logging
│   │   ├── config.py           # PathConfig dataclass for file paths
│   │   ├── models.py           # AnalysisFilters dataclass
│   │   └── logging_config.py   # Structured logging setup
│   │
│   ├── config/                  # Service configuration
│   │   ├── __init__.py         # SnowflakeConfig + loader
│   │   └── snowflake.toml      # Connection settings (co-located with loader)
│   │
│   ├── data_processing/         # Data layer
│   │   ├── database.py         # SQLite connection management
│   │   ├── schema.py           # Database schema (reference + pathway tables)
│   │   ├── pathway_pipeline.py # Pipeline: Snowflake → SQLite
│   │   ├── transforms.py       # Data transformations (UPID, drug names, directory)
│   │   ├── loader.py           # FileDataLoader for CSV/Parquet files
│   │   ├── reference_data.py   # Reference data migration
│   │   ├── snowflake_connector.py  # Snowflake integration
│   │   ├── cache.py            # Query result caching
│   │   ├── data_source.py      # Data source fallback chain
│   │   ├── diagnosis_lookup.py # GP diagnosis lookup (SNOMED clusters)
│   │   ├── parsing.py          # Parse average_spacing HTML, pathway drugs, retention rates
│   │   └── pathway_queries.py  # Shared SQL query functions used by the Dash app
│   │
│   ├── analysis/                # Analysis pipeline
│   │   ├── pathway_analyzer.py # prepare_data, calculate_statistics, build_hierarchy
│   │   └── statistics.py       # Statistical calculation functions
│   │
│   ├── visualization/           # Chart generation
│   │   └── plotly_generator.py # Icicle, market share, cost effectiveness, waterfall, Sankey, dosing, heatmap, duration figures
│   │
│   └── cli/                     # CLI tools
│       └── refresh_pathways.py # Data refresh command
│
├── dash_app/                    # Dash web application
│   ├── app.py                  # Dash app, layout root, dcc.Store, register_callbacks
│   ├── assets/
│   │   └── nhs.css             # NHS design system CSS (from 01_nhs_classic.html)
│   ├── data/
│   │   ├── queries.py          # Thin wrapper calling src/data_processing/pathway_queries.py
│   │   └── card_browser.py     # DimSearchTerm.csv → directorate tree for drawer
│   ├── components/
│   │   ├── header.py           # Top header bar with data freshness indicator
│   │   ├── sidebar.py          # Left navigation with drawer triggers
│   │   ├── kpi_row.py          # 4 KPI cards (patients, drugs, cost, match rate)
│   │   ├── filter_bar.py       # Chart type toggle pills + date filter dropdowns
│   │   ├── chart_card.py       # Chart area with tabs + dcc.Graph + loading spinner
│   │   ├── drawer.py           # dmc.Drawer with drug/trust chips + directorate cards
│   │   └── footer.py           # Page footer
│   ├── callbacks/
│   │   ├── __init__.py         # register_callbacks(app)
│   │   ├── filters.py          # Reference data loading + filter state management
│   │   ├── chart.py            # Tab switching, pathway data loading, 8-chart dispatch
│   │   ├── drawer.py           # Drawer open/close + drug/trust selection
│   │   └── kpi.py              # KPI card value updates
│   └── utils/
│       └── __init__.py
│
├── run_dash.py                  # Entry point: python run_dash.py
├── tests/                       # Test suite (113 tests)
├── data/                        # Reference data + SQLite DB
├── docs/                        # Documentation
├── assets/                      # Static assets (logo, favicon)
├── archive/                     # Historical/deprecated (includes old Reflex app)
└── logs/                        # Runtime logs

Path resolution: src/ is added to sys.path via a .pth file (created by setup_dev.py). All imports use package names directly: from core import ..., from data_processing import ..., etc.

Pathway Data Architecture

The application uses a pre-computed pathway architecture for performance:

Architecture: Snowflake → Pathway Processing → SQLite (pre-computed) → Dash (filter & view)

Key Benefits:

Performance: Pathway calculation done once during data refresh, not on every filter change
Simplicity: Dash callbacks filter pre-computed data with simple SQL WHERE clauses
Full Pathways: Sequential treatment pathways (drug_0 → drug_1 → drug_2...) with statistics

Chart Types:

Type	Hierarchy	Level 2 Source
`directory`	Trust → Directory → Drug → Pathway	Assigned directorate (5-level fallback)
`indication`	Trust → GP Diagnosis → Drug → Pathway	SNOMED cluster Search_Term from GP records

For indication charts, ~93% of patients are matched to a GP diagnosis (Search_Term). Unmatched patients use their directorate as a fallback label (e.g., "RHEUMATOLOGY (no GP dx)").

Date Filter Combinations:

ID	Initiated	Last Seen	Default
`all_6mo`	All years	Last 6 months	Yes
`all_12mo`	All years	Last 12 months	No
`1yr_6mo`	Last 1 year	Last 6 months	No
`1yr_12mo`	Last 1 year	Last 12 months	No
`2yr_6mo`	Last 2 years	Last 6 months	No
`2yr_12mo`	Last 2 years	Last 12 months	No

Total pre-computed datasets: 6 date filters × 2 chart types = 12 datasets (~3,600 pathway nodes).
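The table above can be encoded as data. Below is a sketch of a DateFilterConfig-style dataclass and a compute_date_ranges(config, max_date) function. It assumes three things:

- "Last N months/years" is counted back in calendar months from the latest activity date (max_date).
- "All years" means no lower bound on initiation.
- The field names (initiated_years, last_seen_months) and the returned keys are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DateFilterConfig:
    id: str
    initiated_years: Optional[int]  # None means "All years"
    last_seen_months: int
    is_default: bool = False


DATE_FILTER_CONFIGS = [
    DateFilterConfig("all_6mo", None, 6, is_default=True),
    DateFilterConfig("all_12mo", None, 12),
    DateFilterConfig("1yr_6mo", 1, 6),
    DateFilterConfig("1yr_12mo", 1, 12),
    DateFilterConfig("2yr_6mo", 2, 6),
    DateFilterConfig("2yr_12mo", 2, 12),
]


def subtract_months(d: date, months: int) -> date:
    """Step back whole calendar months, clamping the day (e.g. 31 Aug -> 28 Feb)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def compute_date_ranges(config: DateFilterConfig, max_date: date) -> dict:
    """Return ISO date bounds for one filter, relative to the latest data date."""
    initiated_from = (
        None
        if config.initiated_years is None
        else subtract_months(max_date, 12 * config.initiated_years).isoformat()
    )
    return {
        "initiated_from": initiated_from,  # None = no lower bound
        "last_seen_from": subtract_months(max_date, config.last_seen_months).isoformat(),
        "max_date": max_date.isoformat(),
    }
```

Looping these six configs over both chart types gives the 12 datasets.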

Pathway Node Structure: Each node in pathway_nodes contains:

Routing: chart_type ("directory" or "indication"), date_filter_id
Hierarchy: parents, ids, labels, level (0=Root, 1=Trust, 2=Directory/Indication, 3=Drug, 4+=Pathway)
Counts: value (patient count)
Costs: cost, costpp, cost_pp_pa (per patient per annum)
Dates: first_seen, last_seen, first_seen_parent, last_seen_parent
Statistics: average_spacing, average_administered, avg_days
Denormalized: trust_name, directory, drug_sequence (for efficient filtering)
Unique constraint: UNIQUE(date_filter_id, chart_type, ids)
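Why chart_type belongs in the unique key: INSERT OR REPLACE deletes any existing row that collides on a UNIQUE constraint. With a key on (date_filter_id, ids) alone, an indication node would overwrite a directory node that has the same ids string, such as a trust-level node. The sqlite3 sketch below uses a cut-down four-column table and made-up ids; it is not the real schema.

```python
import sqlite3


def make_demo_db(include_chart_type: bool) -> sqlite3.Connection:
    """Cut-down pathway_nodes table (four columns only) for illustration."""
    conn = sqlite3.connect(":memory:")
    key = "date_filter_id, chart_type, ids" if include_chart_type else "date_filter_id, ids"
    conn.execute(
        f"""CREATE TABLE pathway_nodes (
                date_filter_id TEXT NOT NULL,
                chart_type     TEXT NOT NULL,
                ids            TEXT NOT NULL,
                value          INTEGER,
                UNIQUE ({key}))"""
    )
    return conn


def insert_nodes(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    # INSERT OR REPLACE deletes any existing row that clashes on the UNIQUE key.
    conn.executemany(
        "INSERT OR REPLACE INTO pathway_nodes (date_filter_id, chart_type, ids, value) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


# Hypothetical trust-level node whose ids string is identical in both chart types.
ROWS = [
    ("all_6mo", "directory", "Root - Trust A", 10),
    ("all_6mo", "indication", "Root - Trust A", 9),
]
```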

Core Module (`core/`)

PathConfig - Dataclass encapsulating all file paths, with validate() method
AnalysisFilters - Dataclass for filter state (dates, drugs, trusts, directories)
logging_config - Structured logging with file and console output

CLI Module (`cli/`)

refresh_pathways.py - Command-line tool to refresh pre-computed pathway data:
- refresh_pathways() - Main function orchestrating the full pipeline
- insert_pathway_records() - SQLite insertion with parameterized queries
- log_refresh_start/complete/failed() - Refresh tracking in pathway_refresh_log
- get_default_filters() - Load trusts/drugs/directories from CSV files
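Below is a sketch of how the refresh-tracking helpers could record a run in pathway_refresh_log, using the column names listed under Pathway Tables. Several details are assumptions for illustration: the status strings ('running', 'completed', 'failed'), the column types, the function signatures and the run_tracked wrapper.

```python
import sqlite3
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def create_refresh_log(conn: sqlite3.Connection) -> None:
    # Column names from the schema section; types and AUTOINCREMENT are assumptions.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pathway_refresh_log (
               refresh_id        INTEGER PRIMARY KEY AUTOINCREMENT,
               started_at        TEXT NOT NULL,
               completed_at      TEXT,
               status            TEXT NOT NULL,
               records_processed INTEGER,
               error_message     TEXT,
               source_row_count  INTEGER)"""
    )


def log_refresh_start(conn: sqlite3.Connection) -> int:
    cur = conn.execute(
        "INSERT INTO pathway_refresh_log (started_at, status) VALUES (?, 'running')",
        (_utc_now(),),
    )
    conn.commit()
    return cur.lastrowid


def log_refresh_complete(conn, refresh_id: int, records_processed: int, source_row_count: int) -> None:
    conn.execute(
        "UPDATE pathway_refresh_log SET completed_at = ?, status = 'completed', "
        "records_processed = ?, source_row_count = ? WHERE refresh_id = ?",
        (_utc_now(), records_processed, source_row_count, refresh_id),
    )
    conn.commit()


def log_refresh_failed(conn, refresh_id: int, error: BaseException) -> None:
    conn.execute(
        "UPDATE pathway_refresh_log SET completed_at = ?, status = 'failed', "
        "error_message = ? WHERE refresh_id = ?",
        (_utc_now(), f"{type(error).__name__}: {error}", refresh_id),
    )
    conn.commit()


def run_tracked(conn, refresh) -> int:
    """Run a refresh callable returning (records_processed, source_row_count)."""
    refresh_id = log_refresh_start(conn)
    try:
        records, source_rows = refresh()
    except Exception as exc:
        log_refresh_failed(conn, refresh_id, exc)
        raise  # keep the original error visible to the CLI
    log_refresh_complete(conn, refresh_id, records, source_rows)
    return refresh_id
```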

Data Processing Module (`data_processing/`)

Database Management:

DatabaseManager - SQLite connection pooling and transaction management
Reference Tables: ref_drug_names, ref_organizations, ref_directories, ref_drug_directory_map, ref_drug_indication_clusters
Pathway Tables: pathway_date_filters, pathway_nodes, pathway_refresh_log

Pathway Pipeline (pathway_pipeline.py):

DateFilterConfig - Dataclass for date filter configuration
DATE_FILTER_CONFIGS - All 6 pre-defined date combinations
compute_date_ranges(config, max_date) - Computes actual ISO dates from config
fetch_and_transform_data() - Snowflake fetch + UPID/drug/directory transformations
Directory chart functions:
- process_pathway_for_date_filter() - Processes single date filter using generate_icicle_chart()
- extract_denormalized_fields() - Parses ids column to extract trust, directory, drug_sequence
Indication chart functions:
- process_indication_pathway_for_date_filter() - Processes single date filter using generate_icicle_chart_indication()
- extract_indication_fields() - Parses ids for indication charts (trust, search_term, drug_sequence)
Shared functions:
- convert_to_records(ice_df, chart_type) - Converts ice_df to list of dicts with chart_type column
- process_all_date_filters() - Convenience function to process all 6 filters

Data Loaders:

FileDataLoader - Loads from CSV/Parquet files (used by legacy pipeline, not by Dash app)
Factory function get_loader() creates a FileDataLoader

Snowflake Integration:

SSO authentication via externalbrowser authenticator
fetch_activity_data(start_date, end_date, provider_codes) method
Query caching with TTL-based invalidation
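TTL-based invalidation in miniature: each cached result remembers when it was stored and is discarded once it is older than the TTL. This in-memory class is a hypothetical sketch (TTLQueryCache, with an injectable clock). The document does not say how cache.py stores entries or what TTL it uses.

```python
import time
from typing import Any, Callable, Hashable


class TTLQueryCache:
    """In-memory cache where each entry expires ttl_seconds after it was stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def _lookup(self, key: Hashable) -> tuple[bool, Any]:
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # expired: invalidate lazily on read
            return False, None
        return True, value

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only on a miss or expiry."""
        found, value = self._lookup(key)
        if found:
            return value
        value = fetch()
        self._entries[key] = (self._clock(), value)
        return value

    def invalidate(self) -> None:
        self._entries.clear()
```

A natural key is the query text plus its parameter tuple, e.g. `("SELECT ...", (start_date, end_date))`.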

GP Diagnosis Lookup (diagnosis_lookup.py):

CLUSTER_MAPPING_SQL - Embedded SQL constant with ~148 Search_Term → Cluster_ID mappings plus explicit SNOMED codes
get_patient_indication_groups(patient_pseudonyms) - Batch queries Snowflake to match patients to GP diagnoses:
- Embeds cluster mapping as CTE, joins with PrimaryCareClinicalCoding
- Uses PseudoNHSNoLinked (not PersonKey) to match PatientPseudonym in GP records
- Returns most recent match per patient via QUALIFY ROW_NUMBER()
- Batches 500 patients per query, returns DataFrame with PatientPseudonym, Search_Term, EventDateTime
patient_has_indication(patient_pseudonym, cluster_ids) - Single-patient GP record check (legacy)
validate_indication(patient_pseudonym, drug_name) - Full validation result with source tracking (legacy)
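The two query mechanics above can be sketched against SQLite: 500-patient batches, and keeping only the most recent match per patient. SQLite has no QUALIFY clause, so the window function is filtered in an outer query instead (requires SQLite 3.25+). The gp_matches table is a hypothetical stand-in for the result of joining the cluster-mapping CTE with PrimaryCareClinicalCoding.

```python
import sqlite3
from typing import Iterator, Sequence

BATCH_SIZE = 500  # patients per query, as described above


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Equivalent of QUALIFY ROW_NUMBER() ... = 1 for engines without QUALIFY.
MOST_RECENT_MATCH_SQL = """
SELECT PatientPseudonym, Search_Term, EventDateTime
FROM (
    SELECT PatientPseudonym, Search_Term, EventDateTime,
           ROW_NUMBER() OVER (
               PARTITION BY PatientPseudonym ORDER BY EventDateTime DESC
           ) AS rn
    FROM gp_matches
    WHERE PatientPseudonym IN ({placeholders})
)
WHERE rn = 1
ORDER BY PatientPseudonym
"""


def most_recent_matches(conn: sqlite3.Connection, pseudonyms: Sequence[str]) -> list[tuple]:
    results = []
    for batch in batched(list(pseudonyms)):
        # One "?" per pseudonym keeps identifiers out of the SQL text.
        sql = MOST_RECENT_MATCH_SQL.format(placeholders=", ".join("?" * len(batch)))
        results.extend(conn.execute(sql, list(batch)).fetchall())
    return results
```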

Analysis Module (`analysis/`)

Refactored from the original 267-line generate_graph() function:

prepare_data() - Filter DataFrame by date range, trusts, drugs, directories (copies df to prevent mutation)
calculate_statistics() - Compute frequency, cost, duration statistics
build_hierarchy() - Create Trust → Directory → Drug → Pathway structure
prepare_chart_data() - Format data for Plotly icicle chart
generate_icicle_chart_indication(df, indication_df, ...) - Build indication-based hierarchy using Search_Term instead of Directory. Takes an indication_df (UPID → Search_Term mapping) alongside the main activity DataFrame.

Visualization Module (`visualization/`)

create_icicle_figure(ice_df) - Generate Plotly icicle chart from DataFrame (legacy/pipeline use)
create_icicle_from_nodes(nodes, title) - Generate icicle chart from list-of-dicts (Dash use). Accepts JSON-serializable node dicts from dcc.Store. Uses NHS blue gradient colorscale, 10-field customdata, Source Sans 3 font.
create_market_share_figure(data, title) - Horizontal stacked bar chart: drugs grouped by directorate/indication, bar length = % patients
create_cost_effectiveness_figure(data, retention, title) - Lollipop chart: pathway cost_pp_pa with dot size = patient count, retention annotations
create_cost_waterfall_figure(data, title) - Waterfall chart: directorate-level cost_pp_pa sorted highest to lowest
create_sankey_figure(data, title) - Sankey diagram: drug switching flows across treatment lines (1st → 2nd → 3rd)
create_dosing_figure(data, title, group_by) - Grouped horizontal bar chart: dosing intervals by drug or trust
create_heatmap_figure(data, title, metric) - Matrix heatmap: directorate × drug with patient/cost/cost_pp_pa colouring
create_duration_figure(data, title, show_directory) - Horizontal bar chart: average treatment duration in days per drug
save_figure_html() - Save interactive HTML file
open_figure_in_browser() - Open chart in default browser
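Plotly's go.Sankey takes a list of node labels plus parallel link source/target/value index lists. The sketch below builds those lists from drug sequences. It labels nodes by treatment line, so the same drug at 1st and 2nd line stays separate. It rests on two assumptions, neither confirmed for create_sankey_figure or get_drug_transitions:

- Each sequence's count is cumulative, as for icicle pathway nodes, so each sequence contributes only its final transition.
- The input is plain (drug_sequence, patient_count) pairs.

```python
from collections import defaultdict

ORDINALS = ["1st", "2nd", "3rd"]


def line_label(position: int, drug: str) -> str:
    ordinal = ORDINALS[position] if position < len(ORDINALS) else f"{position + 1}th"
    return f"{ordinal}: {drug}"


def sankey_links(pathways, max_lines: int = 3) -> dict:
    """Build node labels and link source/target/value lists for go.Sankey.

    pathways: iterable of (drug_sequence, patient_count), where the count covers
    patients who reached the last drug in that sequence (assumed cumulative).
    """
    flows = defaultdict(int)
    for sequence, count in pathways:
        if 2 <= len(sequence) <= max_lines:
            i = len(sequence) - 1
            # Only the final step: earlier steps are counted by the shorter prefix.
            flows[(line_label(i - 1, sequence[i - 1]), line_label(i, sequence[i]))] += count
    labels = sorted({label for pair in flows for label in pair})
    index = {label: i for i, label in enumerate(labels)}
    links = sorted(flows.items())
    return {
        "labels": labels,
        "source": [index[source] for (source, _target), _value in links],
        "target": [index[target] for (_source, target), _value in links],
        "value": [value for _pair, value in links],
    }
```

The result maps onto `go.Sankey(node=dict(label=result["labels"]), link=dict(source=result["source"], target=result["target"], value=result["value"]))`.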

Parsing Utilities (`data_processing/parsing.py`)

parse_average_spacing(spacing_html) - Extract drug_name, dose_count, weekly_interval, total_weeks from HTML string
parse_pathway_drugs(ids, level) - Extract ordered drug list from ids column at level 4+
calculate_retention_rate(nodes) - For each N-drug pathway, calculate % not escalating to N+1 drugs
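The ids format is not documented here, so this sketch assumes hypothetical " - "-delimited ids of the form Root - Trust - Directory - Drug1 - Drug2 ..., where segments from index 3 onward are the drug sequence. The retention rate is read as the percentage of a node's patients not counted in any child node (one more drug). Children are found through the parents column, which avoids parsing ids. This is one interpretation, not the confirmed calculation.

```python
from collections import defaultdict

DELIMITER = " - "  # hypothetical; check the real ids format in pathway_nodes


def parse_pathway_drugs(ids: str, level: int) -> list[str]:
    """Ordered drug list, assuming 'Root - Trust - Directory - Drug...' ids."""
    if level < 3:
        return []  # root, trust and directory nodes carry no drugs
    return ids.split(DELIMITER)[3:level + 1]


def calculate_retention_rate(nodes: list[dict]) -> dict[str, float]:
    """% of patients on each drug/pathway node who did not escalate further."""
    child_totals = defaultdict(int)
    for node in nodes:
        child_totals[node["parents"]] += node["value"]
    rates = {}
    for node in nodes:
        if node["level"] >= 3 and node["value"] > 0:
            escalated = child_totals.get(node["ids"], 0)
            rates[node["ids"]] = round(100 * (node["value"] - escalated) / node["value"], 1)
    return rates
```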

Shared Data Queries (`data_processing/pathway_queries.py`)

Shared query functions used by the Dash app (via thin wrappers in dash_app/data/queries.py):

load_initial_data(db_path) - Returns available drugs (42), directorates (14), indications (32), trusts (7), total_patients, last_updated
load_pathway_nodes(db_path, filter_id, chart_type, selected_drugs, selected_directorates, selected_trusts) - Returns pathway nodes, unique_patients, total_drugs, total_cost, last_updated. Parameterized SQL with optional drug/directorate/trust filters.
get_drug_market_share(db_path, filter_id, chart_type, directory, trust) - Level 3 nodes grouped by directory, returns drug, value, colour
get_pathway_costs(db_path, filter_id, chart_type, directory, trust) - Level 4+ nodes with cost_pp_pa, pathway labels, patient counts
get_cost_waterfall(db_path, filter_id, chart_type, trust) - Level 2 nodes with cost_pp_pa per directorate/indication
get_drug_transitions(db_path, filter_id, chart_type, directory, trust) - Level 3+ nodes parsed into source→target drug transitions
get_dosing_intervals(db_path, filter_id, chart_type, drug, trust) - Level 3 nodes with parsed average_spacing intervals
get_drug_directory_matrix(db_path, filter_id, chart_type, trust) - Level 3 nodes pivoted as directory × drug matrix
get_treatment_durations(db_path, filter_id, chart_type, directory, trust) - Level 3 nodes with avg_days by drug

Dash Application (`dash_app/`)

State Management via 3 dcc.Store components:

app-state (session): chart_type, initiated, last_seen, date_filter_id, selected_drugs, selected_directorates, selected_trusts
chart-data (memory): nodes[], unique_patients, total_drugs, total_cost, last_updated
reference-data (session): available_drugs, available_directorates, available_indications, available_trusts, total_patients, last_updated

Callback Chain (unidirectional):

Page Load → load_reference_data → reference-data store + header indicators
         → update_app_state → app-state store (default filters)
                             → load_pathway_data → chart-data store
                                                  ├→ update_kpis → KPI cards
                                                  └→ update_chart → dcc.Graph (dispatches by active-tab)

Filter change → update_app_state → app-state → load_pathway_data → (chain above)
Drawer selection → all-drugs-chips/trust-chips → update_app_state → (chain above)
Tab click → switch_tab → active-tab store → update_chart → dcc.Graph (lazy: only active tab computed)
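The two date dropdowns map onto the six ids in the Date Filter Combinations table. Below is a sketch of the pure state transition an update_app_state callback might delegate to, kept free of Dash imports so it can be unit-tested. Two things are assumed: the dropdown values ('all', '1yr', '2yr', '6mo', '12mo') match the id parts, and the default chart_type is 'directory'.

```python
VALID_FILTER_IDS = {"all_6mo", "all_12mo", "1yr_6mo", "1yr_12mo", "2yr_6mo", "2yr_12mo"}


def default_app_state() -> dict:
    # Fresh dict on each call so list values are never shared between sessions.
    return {
        "chart_type": "directory",  # assumed default
        "initiated": "all",
        "last_seen": "6mo",
        "date_filter_id": "all_6mo",  # the default row in the date filter table
        "selected_drugs": [],
        "selected_directorates": [],
        "selected_trusts": [],
    }


def next_app_state(current: dict, **changes) -> dict:
    """Return a new app-state dict with date_filter_id kept in sync with the dropdowns."""
    state = {**default_app_state(), **(current or {}), **changes}
    filter_id = f"{state['initiated']}_{state['last_seen']}"
    if filter_id not in VALID_FILTER_IDS:
        raise ValueError(f"Unknown date filter combination: {filter_id}")
    state["date_filter_id"] = filter_id
    return state
```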

Key Components:

Header (header.py): NHS branding, data freshness indicator (patient count + relative time)
Sidebar (sidebar.py): Navigation with Pathway Overview link (chart views moved to tab bar in chart_card.py)
Filter Bar (filter_bar.py): Chart type toggle pills (By Directory / By Indication) + date filter dropdowns
KPI Row (kpi_row.py): 4 cards — Unique Patients, Drug Types, Total Cost, Indication Match Rate (~93%)
Chart Card (chart_card.py): 8-tab chart area (Icicle, Market Share, Cost Effectiveness, Cost Waterfall, Sankey, Dosing, Heatmap, Duration) with dcc.Loading spinner, dynamic subtitle, and dcc.Store(id="active-tab")
Drawer (drawer.py): dmc.Drawer with drug chips (dmc.ChipGroup), trust chips, directorate accordion with indication sub-items and drug fragment badges
Footer (footer.py): NHS Norfolk and Waveney ICB branding

Drawer Drug Browser:

"All Drugs" section: flat dmc.ChipGroup with 42 drugs from pathway_nodes level 3
"Trusts" section: dmc.ChipGroup with 7 trusts
"By Directorate" section: nested dmc.Accordion — 19 directorates → indications → drug fragment dmc.Badge items
Clicking a drug fragment badge selects all full drug names containing that fragment (substring match)
"Clear All Filters" button resets drug and trust selections

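Fragment badges select drugs by substring match. The sketch below assumes case-insensitive matching and an order-preserving, duplicate-free merge into the current selection. The drug names in the test are placeholders.

```python
def drugs_matching_fragment(fragment: str, available_drugs: list[str]) -> list[str]:
    """All full drug names that contain the fragment (case-insensitive, assumed)."""
    needle = fragment.strip().upper()
    if not needle:
        return []
    return [drug for drug in available_drugs if needle in drug.upper()]


def add_fragment_to_selection(
    selected: list[str], fragment: str, available_drugs: list[str]
) -> list[str]:
    """Append every match to the current selection, keeping order and skipping duplicates."""
    merged = list(selected)
    for drug in drugs_matching_fragment(fragment, available_drugs):
        if drug not in merged:
            merged.append(drug)
    return merged
```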
Data Transformations (`data_processing/transforms.py`)

Core data transformation functions used by the pipeline:

patient_id() - Creates UPID = Provider Code (first 3 chars) + PersonKey
drug_names() - Standardizes via drugnames.csv lookup
department_identification() - 5-level fallback chain for directory assignment
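Below is a sketch of the first two transformations applied to single values; the real functions operate on DataFrames. Three details are hypothetical:

- the drugnames.csv column names (raw_name, standard_name);
- the upper-casing of lookup keys;
- falling back to the upper-cased input for unmapped names.

```python
import csv
import io


def patient_id(provider_code: str, person_key) -> str:
    """UPID = first 3 characters of the provider code + PersonKey."""
    return f"{provider_code.strip()[:3]}{person_key}"


def load_drug_lookup(csv_text: str) -> dict[str, str]:
    # Hypothetical columns: raw_name -> standard_name.
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["raw_name"].strip().upper(): row["standard_name"].strip() for row in reader}


def standardize_drug_name(raw_name: str, lookup: dict[str, str]) -> str:
    """Map a raw drug name to its standard form; unmapped names are upper-cased."""
    key = raw_name.strip().upper()
    return lookup.get(key, key)
```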

Data Flow

Pre-Computed Pathway Architecture (Current):

[CLI: python -m cli.refresh_pathways --chart-type all]

    Snowflake Data Warehouse
           │
           ▼ (fetch_and_transform_data)
    ┌──────────────────────────────────────────┐
    │ Data Transformations                     │
    │   (data_processing/transforms.py)        │
    │   → patient_id() creates UPID            │
    │   → drug_names() standardizes names      │
    │   → department_identification() → Dir    │
    └──────────────────────────────────────────┘
           │
           ├─── Directory Charts ──────────────────────────────────────┐
           │                                                           │
           │    ┌──────────────────────────────────────────┐           │
           │    │ For each of 6 date filter combos:        │           │
           │    │   → generate_icicle_chart()              │           │
           │    │   → extract_denormalized_fields()        │           │
           │    │   → convert_to_records("directory")      │           │
           │    └──────────────────────────────────────────┘           │
           │                                                           │
           ├─── Indication Charts ─────────────────────────────────────┤
           │                                                           │
           │    ┌──────────────────────────────────────────┐           │
           │    │ GP Diagnosis Lookup (diagnosis_lookup.py)│           │
           │    │   → Extract PseudoNHSNoLinked from HCD   │           │
           │    │   → get_patient_indication_groups()      │           │
           │    │     (SNOMED cluster CTE + GP records)    │           │
           │    │   → Build indication_df: UPID → Search   │           │
           │    │     Term (matched) or Directorate (no GP)│           │
           │    └──────────────────────────────────────────┘           │
           │                        │                                  │
           │                        ▼                                  │
           │    ┌──────────────────────────────────────────┐           │
           │    │ For each of 6 date filter combos:        │           │
           │    │   → generate_icicle_chart_indication()   │           │
           │    │   → extract_indication_fields()          │           │
           │    │   → convert_to_records("indication")     │           │
           │    └──────────────────────────────────────────┘           │
           │                                                           │
           └───────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼ (insert_pathway_records)
    ┌──────────────────────────────────────────┐
    │ SQLite: pathway_nodes table              │
    │   → ~3,600 nodes across 12 datasets      │
    │   → UNIQUE(date_filter_id, chart_type,   │
    │     ids) prevents cross-type overwrites  │
    │   → Indexed for fast filtering           │
    └──────────────────────────────────────────┘


[Dash App: python run_dash.py]

    ┌──────────────────────────────────────────┐
    │ Filter Bar + Drawer (toggle pills,       │
    │   date dropdowns, drug/trust chips)      │
    │   → Triggers update_app_state callback   │
    └──────────────────────────────────────────┘
           │
           ▼
    ┌──────────────────────────────────────────┐
    │ load_pathway_data callback               │
    │   → Input: app-state dcc.Store           │
    │   → Calls pathway_queries.load_pathway_  │
    │     nodes() with filters                 │
    │   → Output: chart-data dcc.Store         │
    └──────────────────────────────────────────┘
           │
           ├──────────────────────────────┐
           ▼                              ▼
    ┌────────────────────┐  ┌──────────────────────┐
    │ update_kpis        │  │ update_chart         │
    │   → 4 KPI cards    │  │   → create_icicle_   │
    │   → formatted      │  │     from_nodes()     │
    │     counts/costs   │  │   → 10-field custom- │
    └────────────────────┘  │     data + NHS blue  │
                            │   → dcc.Graph figure │
                            └──────────────────────┘

Reference Data Files (`data/`)

File	Purpose
`include.csv`	Drug filter list with default selections (Include=1)
`defaultTrusts.csv`	NHS Trust list for filter
`directory_list.csv`	Medical specialties/directories
`drugnames.csv`	Drug name standardization mapping
`org_codes.csv`	Provider code to organization name mapping
`drug_directory_list.csv`	Valid drug-to-directory mappings (pipe-separated)
`treatment_function_codes.csv`	NHS treatment function code mappings
`drug_indication_clusters.csv`	Drug to SNOMED cluster mappings
`ta-recommendations.xlsx`	NICE TA recommendations
`pathways.db`	SQLite database (~3.5 MB: reference tables + pathway nodes)

Key Patterns

Department Identification Fallback Chain: The department_identification() function has 5 levels of fallback:

SINGLE_VALID_DIR - Drug has only one valid directory
EXTRACTED - Extracted from Additional Detail/Description fields
CALCULATED_MOST_FREQ - Most frequent valid directory for UPID/Drug
UPID_INFERENCE - Inferred from other records with same UPID
UNDEFINED - No directory could be determined
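The fallback chain reads naturally as a sequence of early returns. This sketch assumes the per-row inputs are precomputed: the extracted directory, Counters of directories already seen for the UPID/drug pair and for the UPID overall, and the valid-directory sets from drug_directory_list.csv. The real department_identification() works on a whole DataFrame and may resolve ties differently.

```python
from collections import Counter
from typing import Optional


def assign_directory(
    drug: str,
    extracted: Optional[str],
    upid_drug_dirs: Counter,
    upid_dirs: Counter,
    valid_dirs_by_drug: dict[str, set[str]],
) -> tuple[str, str]:
    """Return (directory, fallback_level) following the 5-level chain."""
    valid = valid_dirs_by_drug.get(drug, set())
    # 1. The drug has only one valid directory.
    if len(valid) == 1:
        return next(iter(valid)), "SINGLE_VALID_DIR"
    # 2. A directory extracted from Additional Detail/Description that is valid for the drug.
    if extracted and extracted in valid:
        return extracted, "EXTRACTED"
    # 3. The most frequent valid directory seen for this UPID/drug pair.
    for directory, _count in upid_drug_dirs.most_common():
        if directory in valid:
            return directory, "CALCULATED_MOST_FREQ"
    # 4. Infer from the patient's other records.
    if upid_dirs:
        return upid_dirs.most_common(1)[0][0], "UPID_INFERENCE"
    # 5. Nothing to go on.
    return "UNDEFINED", "UNDEFINED"
```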

Indication Lookup Workflow (for indication charts):

Extract unique PseudoNHSNoLinked values from HCD activity data
Query Snowflake in batches of 500 patients:
- Embed CLUSTER_MAPPING_SQL (~148 Search_Term → Cluster_ID mappings) as CTE
- Join ClinicalCodingClusterSnomedCodes to get SNOMED codes per cluster
- Join PrimaryCareClinicalCoding on PatientPseudonym = PseudoNHSNoLinked
- Use QUALIFY ROW_NUMBER() OVER (PARTITION BY PatientPseudonym ORDER BY EventDateTime DESC) = 1 for most recent match
Build indication_df mapping UPID → Search_Term (matched) or Directorate + " (no GP dx)" (unmatched)
Pass to generate_icicle_chart_indication() for pathway hierarchy building
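The final mapping step is sketched below with plain dicts instead of the pandas indication_df. It assumes one directorate per UPID and uses 'UNDEFINED' when none is known. The match_rate helper shows how a figure like the ~93% KPI could be derived from the mapping; it is not how kpi.py actually computes it.

```python
NO_GP_SUFFIX = " (no GP dx)"


def build_indication_map(
    upid_to_pseudonym: dict[str, str],
    gp_matches: dict[str, str],
    upid_to_directorate: dict[str, str],
) -> dict[str, str]:
    """Map each UPID to its GP Search_Term, else '<Directorate> (no GP dx)'."""
    mapping = {}
    for upid, pseudonym in upid_to_pseudonym.items():
        search_term = gp_matches.get(pseudonym)
        if search_term:
            mapping[upid] = search_term
        else:
            mapping[upid] = f"{upid_to_directorate.get(upid, 'UNDEFINED')}{NO_GP_SUFFIX}"
    return mapping


def match_rate(mapping: dict[str, str]) -> float:
    """Fraction of UPIDs matched to a GP diagnosis."""
    if not mapping:
        return 0.0
    matched = sum(1 for label in mapping.values() if not label.endswith(NO_GP_SUFFIX))
    return matched / len(mapping)
```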

Data Source Fallback Chain (for raw data loading, not used by Dash app):

Query cache for recent results
Attempt Snowflake connection
Fall back to CSV/Parquet files
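The fallback chain generalises to "try each source in order, return the first that yields data". In this hypothetical sketch, each loader is a zero-argument callable that returns None or raises when its source is unavailable (e.g. a cache miss, a failed SSO login, a missing file).

```python
import logging
from typing import Any, Callable, Optional, Sequence

logger = logging.getLogger(__name__)


def load_with_fallback(
    sources: Sequence[tuple[str, Callable[[], Optional[Any]]]],
) -> tuple[str, Any]:
    """Return (source_name, data) from the first loader that produces data."""
    failures = []
    for name, loader in sources:
        try:
            data = loader()
        except Exception as exc:  # e.g. Snowflake connection error
            logger.warning("Data source %s failed: %s", name, exc)
            failures.append(f"{name}: {exc}")
            continue
        if data is None:
            failures.append(f"{name}: no data")
            continue
        return name, data
    raise RuntimeError("All data sources failed: " + "; ".join(failures))
```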

Database Schema (~3.5 MB)

Reference Tables

ref_drug_names - Drug name standardization
ref_organizations - Provider code to name mapping
ref_directories - Valid directory names
ref_drug_directory_map - Valid drug-directory pairs
ref_drug_indication_clusters - Drug to SNOMED cluster mapping

Pathway Tables

pathway_date_filters - 6 pre-defined date filter combinations
- Columns: id, initiated, last_seen, is_default, description
- Auto-populated via migration
pathway_nodes - Pre-computed pathway hierarchy nodes (~3,600 rows for 12 datasets)
- Routing: chart_type ("directory" or "indication"), date_filter_id
- Hierarchy: parents, ids, labels, level
- Metrics: value, cost, costpp, cost_pp_pa, colour
- Dates: first_seen, last_seen, first_seen_parent, last_seen_parent
- Statistics: average_spacing, average_administered, avg_days
- Denormalized: trust_name, directory, drug_sequence
- Foreign key: date_filter_id → pathway_date_filters.id
- Unique constraint: UNIQUE(date_filter_id, chart_type, ids) — critical for INSERT OR REPLACE correctness
- Indexed for: date_filter_id, chart_type, trust_name, directory, level
pathway_refresh_log - Tracks data refresh status
- Columns: refresh_id, started_at, completed_at, status, records_processed, error_message, source_row_count

Input Data Requirements

The input data (CSV/Parquet) must contain columns including:

Provider Code, PersonKey - Used to create UPID
PseudoNHSNoLinked - NHS pseudonym for GP record matching (indication charts)
Drug Name, Intervention Date, Price Actual
OrganisationName
Various Additional Detail/Description columns for directory extraction
Treatment Function Code

Output

8 interactive chart tabs in a single Dash application:

Icicle — Hierarchical pathway view (Directory: Trust → Directorate → Drug → Pathway; Indication: Trust → GP Diagnosis → Drug → Pathway)
Market Share — Horizontal stacked bars showing drug market share by directorate/indication
Cost Effectiveness — Lollipop chart of pathway cost per patient per annum with retention annotations
Cost Waterfall — Waterfall chart of directorate-level cost_pp_pa
Sankey — Drug switching flows across 1st → 2nd → 3rd treatment lines
Dosing — Grouped bar chart of dosing intervals by drug or trust
Heatmap — Directorate × Drug matrix coloured by patient count, cost, or cost_pp_pa
Duration — Horizontal bar chart of average treatment duration per drug

All charts support:

Directory / Indication toggle
Date filter combinations (6 options)
Trust, drug, and directorate filters
Lazy rendering (only active tab computed)

Testing

# Run all tests with coverage
python -m pytest tests/ -v --cov=core --cov=analysis

# Run specific test file
python -m pytest tests/test_config.py -v

# Run specific test class
python -m pytest tests/test_data_transformations.py::TestPatientId -v

Test coverage includes:

PathConfig validation (23 tests)
AnalysisFilters validation (26 tests)
Data transformation functions (23 tests)
Directory assignment logic (19 tests)

Configuration

Snowflake Connection (`src/config/snowflake.toml`)

[snowflake]
account = "your-account"
database = "DATA_HUB"
schema = "CDM"
warehouse = "your-warehouse"
authenticator = "externalbrowser"  # Required for NHS SSO
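Loading this file needs only the standard library on Python 3.11+ (tomllib). On 3.10, which the project also supports, the third-party tomli package provides the same API. The dataclass below is a sketch, not the SnowflakeConfig in src/config/__init__.py. Ignoring unknown keys and rejecting non-SSO authenticators are design assumptions.

```python
from dataclasses import dataclass, fields
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # Python 3.10: requires the third-party "tomli" package
    import tomli as tomllib


@dataclass(frozen=True)
class SnowflakeConfig:
    account: str
    database: str
    schema: str
    warehouse: str
    authenticator: str = "externalbrowser"


def parse_snowflake_config(toml_text: str) -> SnowflakeConfig:
    section = tomllib.loads(toml_text)["snowflake"]
    known = {f.name for f in fields(SnowflakeConfig)}
    # Ignore keys the dataclass doesn't model instead of failing on them.
    config = SnowflakeConfig(**{k: v for k, v in section.items() if k in known})
    if config.authenticator != "externalbrowser":
        raise ValueError("authenticator must be 'externalbrowser' for NHS SSO")
    return config


def load_snowflake_config(path: Path) -> SnowflakeConfig:
    return parse_snowflake_config(Path(path).read_text(encoding="utf-8"))
```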

Logging

Logs are written to logs/ directory with structured format. Configure via src/core/logging_config.py.

Breaking Changes from Original App

The pre-computed pathway architecture introduces these changes:

Date Filters

Old: Date pickers for arbitrary start_date and end_date
New: Two dropdowns:
- "Treatment Initiated": All years, Last 2 years, Last 1 year
- "Last Seen": Last 6 months, Last 12 months
Reason: Pre-computed pathways require fixed date combinations for performance

Data Refresh

Old: Real-time pathway calculation on each filter change
New: Pre-computed pathways stored in SQLite, refreshed via CLI command
Impact: Data is as fresh as the last python -m cli.refresh_pathways run
Benefit: Sub-50ms filter response time vs multi-minute calculations

State Management (Dash)

State lives in 3 dcc.Store components: app-state, chart-data, reference-data
Filter state: chart_type, initiated, last_seen, date_filter_id, selected_drugs, selected_directorates, selected_trusts
Chart type toggle: "By Directory" / "By Indication" pills in filter bar
Dynamic subtitle: "Trust → Directorate → Drug → Pathway" or "Trust → Indication → Drug → Pathway"
Drug/trust selection via dmc.ChipGroup in right-side drawer

Icicle Chart

Full 10-field customdata structure (value, colour, cost, costpp, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa)
NHS blue gradient colorscale: Heritage Blue #003087 → Pale Blue #E3F2FD
Treatment statistics (average_spacing, cost_pp_pa) in hover tooltips
First/last seen dates for drug nodes
create_icicle_from_nodes() in src/visualization/plotly_generator.py — shared function accepting list-of-dicts
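The customdata order matters because hover templates address fields by index: with the 10-field order above, %{customdata[9]} is cost_pp_pa. The sketch below builds the rows from node dicts. Filling missing fields with '' and the template text are assumptions, not the real create_icicle_from_nodes() code.

```python
CUSTOMDATA_FIELDS = (
    "value", "colour", "cost", "costpp", "first_seen", "last_seen",
    "first_seen_parent", "last_seen_parent", "average_spacing", "cost_pp_pa",
)


def build_customdata(nodes: list[dict]) -> list[list]:
    """One row per node, in CUSTOMDATA_FIELDS order; missing fields become ''."""
    return [[node.get(field, "") for field in CUSTOMDATA_FIELDS] for node in nodes]


# Indices follow CUSTOMDATA_FIELDS: 0=value, 2=cost, 8=average_spacing, 9=cost_pp_pa.
HOVERTEMPLATE = (
    "<b>%{label}</b><br>"
    "Patients: %{customdata[0]}<br>"
    "Total cost: £%{customdata[2]:,.0f}<br>"
    "Cost per patient p.a.: £%{customdata[9]:,.0f}<br>"
    "Dosing: %{customdata[8]}"
    "<extra></extra>"
)
```

Both can be passed straight to `go.Icicle(ids=..., labels=..., parents=..., values=..., customdata=build_customdata(nodes), hovertemplate=HOVERTEMPLATE)`.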
