Files
HighCostDrugsDemo/CLAUDE.md
T
Andrew Charlwood fe8642dfaf feat: remove Reflex, archive old app, update docs for Dash migration (Task 5.4)
- Remove reflex dependency from pyproject.toml
- Move pathways_app/ and rxconfig.py to archive/
- Update CLAUDE.md: Dash app structure, callback chain, run command
- All completion criteria validated (10/10 pass)
2026-02-06 14:35:43 +00:00

564 lines
30 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
NHS High-Cost Drug Patient Pathway Analysis Tool - a web-based application that analyzes secondary care patient treatment pathways. It processes clinical activity data to visualize hierarchical treatment patterns as interactive Plotly icicle charts.
**Key Features:**
- **Dual chart types**: Directory-based (Trust → Directory → Drug → Pathway) and Indication-based (Trust → GP Diagnosis → Drug → Pathway) views with toggle
- **Pre-computed pathway architecture**: Treatment pathways pre-processed and stored in SQLite for instant filtering
- **GP diagnosis matching**: Patient indications matched from GP records using SNOMED cluster codes queried directly from Snowflake (~93% match rate)
- Data pipeline: Snowflake → pre-computed SQLite pathway nodes (CSV/Parquet file loading retained for legacy compatibility)
- Interactive browser-based UI using Dash (Plotly) + Dash Mantine Components
- 6 pre-defined date filter combinations × 2 chart types = 12 pre-computed datasets with sub-50ms response times
## Running the Application
```bash
# Install dependencies
uv sync
# One-time dev setup: adds src/ to Python path via .pth file
uv run python setup_dev.py
# Initialize/migrate the database (creates pathway tables)
python -m data_processing.migrate
# Refresh pathway data from Snowflake (requires SSO auth)
python -m cli.refresh_pathways
# Run the Dash web application
python run_dash.py
```
The application requires Python 3.10+ and runs on http://localhost:8050 by default.
### CLI Commands
**Refresh Pathway Data:**
```bash
# Full refresh — both chart types (directory + indication), all date filters
python -m cli.refresh_pathways --chart-type all
# Directory charts only (faster, skips GP diagnosis lookup)
python -m cli.refresh_pathways --chart-type directory
# Indication charts only
python -m cli.refresh_pathways --chart-type indication
# Dry run (test without database changes)
python -m cli.refresh_pathways --chart-type all --dry-run -v
# Custom minimum patient threshold
python -m cli.refresh_pathways --minimum-patients 10
# Help
python -m cli.refresh_pathways --help
```
The `--chart-type` argument controls which pathway types are processed:
- `all` (default) — generates both directory and indication charts (~15 minutes)
- `directory` — directory-based charts only (~5 minutes)
- `indication` — indication-based charts only (~12 minutes, includes GP lookup)
The refresh command:
1. Fetches activity data from Snowflake (656K+ records, ~7 seconds)
2. Applies UPID, drug name, and directory transformations (~6 minutes)
3. For indication charts: queries GP records via SNOMED clusters (~9 minutes for 37K patients)
4. Processes 6 date filter combinations × selected chart types
5. Inserts pathway nodes to SQLite for fast Dash filtering
## Architecture
### Package Structure
```
.
├── src/ # All application library code
│ ├── core/ # Foundation: paths, models, logging
│ │ ├── config.py # PathConfig dataclass for file paths
│ │ ├── models.py # AnalysisFilters dataclass
│ │ └── logging_config.py # Structured logging setup
│ │
│ ├── config/ # Service configuration
│ │ ├── __init__.py # SnowflakeConfig + loader
│ │ └── snowflake.toml # Connection settings (co-located with loader)
│ │
│ ├── data_processing/ # Data layer
│ │ ├── database.py # SQLite connection management
│ │ ├── schema.py # Database schema (reference + pathway tables)
│ │ ├── pathway_pipeline.py # Pipeline: Snowflake → SQLite
│ │ ├── transforms.py # Data transformations (UPID, drug names, directory)
│ │ ├── loader.py # FileDataLoader for CSV/Parquet files
│ │ ├── reference_data.py # Reference data migration
│ │ ├── snowflake_connector.py # Snowflake integration
│ │ ├── cache.py # Query result caching
│ │ ├── data_source.py # Data source fallback chain
│ │ └── diagnosis_lookup.py # GP diagnosis lookup (SNOMED clusters)
│ │
│ ├── analysis/ # Analysis pipeline
│ │ ├── pathway_analyzer.py # prepare_data, calculate_statistics, build_hierarchy
│ │ └── statistics.py # Statistical calculation functions
│ │
│ ├── visualization/ # Chart generation
│ │ └── plotly_generator.py # create_icicle_figure, create_icicle_from_nodes
│ │
│ └── cli/ # CLI tools
│ └── refresh_pathways.py # Data refresh command
├── dash_app/ # Dash web application
│ ├── app.py # Dash app, layout root, dcc.Store, register_callbacks
│ ├── assets/
│ │ └── nhs.css # NHS design system CSS (from 01_nhs_classic.html)
│ ├── data/
│ │ ├── queries.py # Thin wrapper calling src/data_processing/pathway_queries.py
│ │ └── card_browser.py # DimSearchTerm.csv → directorate tree for drawer
│ ├── components/
│ │ ├── header.py # Top header bar with data freshness indicator
│ │ ├── sidebar.py # Left navigation with drawer triggers
│ │ ├── kpi_row.py # 4 KPI cards (patients, drugs, cost, match rate)
│ │ ├── filter_bar.py # Chart type toggle pills + date filter dropdowns
│ │ ├── chart_card.py # Chart area with tabs + dcc.Graph + loading spinner
│ │ ├── drawer.py # dmc.Drawer with drug/trust chips + directorate cards
│ │ └── footer.py # Page footer
│ ├── callbacks/
│ │ ├── __init__.py # register_callbacks(app)
│ │ ├── filters.py # Reference data loading + filter state management
│ │ ├── chart.py # Pathway data loading + icicle chart rendering
│ │ ├── drawer.py # Drawer open/close + drug/trust selection
│ │ └── kpi.py # KPI card value updates
│ └── utils/
│ └── __init__.py
├── run_dash.py # Entry point: python run_dash.py
├── tests/ # Test suite (113 tests)
├── data/ # Reference data + SQLite DB
├── docs/ # Documentation
├── assets/ # Static assets (logo, favicon)
├── archive/ # Historical/deprecated (includes old Reflex app)
└── logs/ # Runtime logs
```
**Path resolution**: `src/` is added to `sys.path` via a `.pth` file (created by `setup_dev.py`).
All imports use package names directly: `from core import ...`, `from data_processing import ...`, etc.
### Pathway Data Architecture
The application uses a pre-computed pathway architecture for performance:
**Architecture:** `Snowflake → Pathway Processing → SQLite (pre-computed) → Dash (filter & view)`
**Key Benefits:**
- **Performance**: Pathway calculation done once during data refresh, not on every filter change
- **Simplicity**: Dash callbacks filter pre-computed data with simple SQL WHERE clauses
- **Full Pathways**: Sequential treatment pathways (drug_0 → drug_1 → drug_2...) with statistics
**Chart Types:**
| Type | Hierarchy | Level 2 Source |
|------|-----------|----------------|
| `directory` | Trust → Directory → Drug → Pathway | Assigned directorate (5-level fallback) |
| `indication` | Trust → GP Diagnosis → Drug → Pathway | SNOMED cluster Search_Term from GP records |
For indication charts, ~93% of patients are matched to a GP diagnosis (Search_Term). Unmatched patients use their directorate as a fallback label (e.g., "RHEUMATOLOGY (no GP dx)").
**Date Filter Combinations:**
| ID | Initiated | Last Seen | Default |
|----|-----------|-----------|---------|
| `all_6mo` | All years | Last 6 months | Yes |
| `all_12mo` | All years | Last 12 months | No |
| `1yr_6mo` | Last 1 year | Last 6 months | No |
| `1yr_12mo` | Last 1 year | Last 12 months | No |
| `2yr_6mo` | Last 2 years | Last 6 months | No |
| `2yr_12mo` | Last 2 years | Last 12 months | No |
Total pre-computed datasets: 6 date filters × 2 chart types = 12 datasets (~3,600 pathway nodes).
**Pathway Node Structure:**
Each node in `pathway_nodes` contains:
- Routing: `chart_type` ("directory" or "indication"), `date_filter_id`
- Hierarchy: `parents`, `ids`, `labels`, `level` (0=Root, 1=Trust, 2=Directory/Indication, 3=Drug, 4+=Pathway)
- Counts: `value` (patient count)
- Costs: `cost`, `costpp`, `cost_pp_pa` (per patient per annum)
- Dates: `first_seen`, `last_seen`, `first_seen_parent`, `last_seen_parent`
- Statistics: `average_spacing`, `average_administered`, `avg_days`
- Denormalized: `trust_name`, `directory`, `drug_sequence` (for efficient filtering)
- Unique constraint: `UNIQUE(date_filter_id, chart_type, ids)`
### Core Module (`core/`)
- **PathConfig** - Dataclass encapsulating all file paths, with `validate()` method
- **AnalysisFilters** - Dataclass for filter state (dates, drugs, trusts, directories)
- **logging_config** - Structured logging with file and console output
### CLI Module (`cli/`)
- **refresh_pathways.py** - Command-line tool to refresh pre-computed pathway data:
- `refresh_pathways()` - Main function orchestrating the full pipeline
- `insert_pathway_records()` - SQLite insertion with parameterized queries
- `log_refresh_start/complete/failed()` - Refresh tracking in `pathway_refresh_log`
- `get_default_filters()` - Load trusts/drugs/directories from CSV files
### Data Processing Module (`data_processing/`)
**Database Management:**
- `DatabaseManager` - SQLite connection pooling and transaction management
- **Reference Tables**: `ref_drug_names`, `ref_organizations`, `ref_directories`, `ref_drug_directory_map`, `ref_drug_indication_clusters`
- **Pathway Tables**: `pathway_date_filters`, `pathway_nodes`, `pathway_refresh_log`
**Pathway Pipeline (`pathway_pipeline.py`):**
- `DateFilterConfig` - Dataclass for date filter configuration
- `DATE_FILTER_CONFIGS` - All 6 pre-defined date combinations
- `compute_date_ranges(config, max_date)` - Computes actual ISO dates from config
- `fetch_and_transform_data()` - Snowflake fetch + UPID/drug/directory transformations
- Directory chart functions:
- `process_pathway_for_date_filter()` - Processes single date filter using `generate_icicle_chart()`
- `extract_denormalized_fields()` - Parses `ids` column to extract trust, directory, drug_sequence
- Indication chart functions:
- `process_indication_pathway_for_date_filter()` - Processes single date filter using `generate_icicle_chart_indication()`
- `extract_indication_fields()` - Parses `ids` for indication charts (trust, search_term, drug_sequence)
- Shared functions:
- `convert_to_records(ice_df, chart_type)` - Converts ice_df to list of dicts with `chart_type` column
- `process_all_date_filters()` - Convenience function to process all 6 filters
**Data Loaders:**
- `FileDataLoader` - Loads from CSV/Parquet files (used by legacy pipeline, not by Dash app)
- Factory function `get_loader()` creates a `FileDataLoader`
**Snowflake Integration:**
- SSO authentication via `externalbrowser` authenticator
- `fetch_activity_data(start_date, end_date, provider_codes)` method
- Query caching with TTL-based invalidation
**GP Diagnosis Lookup (`diagnosis_lookup.py`):**
- `CLUSTER_MAPPING_SQL` - Embedded SQL constant with ~148 Search_Term → Cluster_ID mappings plus explicit SNOMED codes
- `get_patient_indication_groups(patient_pseudonyms)` - Batch queries Snowflake to match patients to GP diagnoses:
- Embeds cluster mapping as CTE, joins with `PrimaryCareClinicalCoding`
- Uses `PseudoNHSNoLinked` (not PersonKey) to match `PatientPseudonym` in GP records
- Returns most recent match per patient via `QUALIFY ROW_NUMBER()`
- Batches 500 patients per query, returns DataFrame with PatientPseudonym, Search_Term, EventDateTime
- `patient_has_indication(patient_pseudonym, cluster_ids)` - Single-patient GP record check (legacy)
- `validate_indication(patient_pseudonym, drug_name)` - Full validation result with source tracking (legacy)
### Analysis Module (`analysis/`)
Refactored from the original 267-line `generate_graph()` function:
- **prepare_data()** - Filter DataFrame by date range, trusts, drugs, directories (copies df to prevent mutation)
- **calculate_statistics()** - Compute frequency, cost, duration statistics
- **build_hierarchy()** - Create Trust → Directory → Drug → Pathway structure
- **prepare_chart_data()** - Format data for Plotly icicle chart
- **generate_icicle_chart_indication(df, indication_df, ...)** - Build indication-based hierarchy using Search_Term instead of Directory. Takes an `indication_df` (UPID → Search_Term mapping) alongside the main activity DataFrame.
### Visualization Module (`visualization/`)
- **create_icicle_figure(ice_df)** - Generate Plotly icicle chart from DataFrame (legacy/pipeline use)
- **create_icicle_from_nodes(nodes, title)** - Generate icicle chart from list-of-dicts (Dash use). Accepts JSON-serializable node dicts from `dcc.Store`. Uses NHS blue gradient colorscale, 10-field customdata, Source Sans 3 font.
- **save_figure_html()** - Save interactive HTML file
- **open_figure_in_browser()** - Open chart in default browser
### Shared Data Queries (`data_processing/pathway_queries.py`)
Shared query functions used by both the Dash app and potentially other consumers:
- **load_initial_data(db_path)** - Returns available drugs (42), directorates (14), indications (32), trusts (7), total_patients, last_updated
- **load_pathway_nodes(db_path, filter_id, chart_type, selected_drugs, selected_directorates, selected_trusts)** - Returns pathway nodes, unique_patients, total_drugs, total_cost, last_updated. Parameterized SQL with optional drug/directorate/trust filters.
### Dash Application (`dash_app/`)
**State Management** via 3 `dcc.Store` components:
- **app-state** (session): `chart_type`, `initiated`, `last_seen`, `date_filter_id`, `selected_drugs`, `selected_directorates`, `selected_trusts`
- **chart-data** (memory): `nodes[]`, `unique_patients`, `total_drugs`, `total_cost`, `last_updated`
- **reference-data** (session): `available_drugs`, `available_directorates`, `available_indications`, `available_trusts`, `total_patients`, `last_updated`
**Callback Chain** (unidirectional):
```
Page Load → load_reference_data → reference-data store + header indicators
→ update_app_state → app-state store (default filters)
→ load_pathway_data → chart-data store
├→ update_kpis → KPI cards
└→ update_chart → dcc.Graph
Filter change → update_app_state → app-state → load_pathway_data → (chain above)
Drawer selection → all-drugs-chips/trust-chips → update_app_state → (chain above)
```
**Key Components:**
- **Header** (`header.py`): NHS branding, data freshness indicator (patient count + relative time)
- **Sidebar** (`sidebar.py`): Navigation items with drawer trigger IDs for Drug Selection, Trust Selection, Indications
- **Filter Bar** (`filter_bar.py`): Chart type toggle pills (By Directory / By Indication) + date filter dropdowns
- **KPI Row** (`kpi_row.py`): 4 cards — Unique Patients, Drug Types, Total Cost, Indication Match Rate (~93%)
- **Chart Card** (`chart_card.py`): Icicle chart with `dcc.Loading` spinner, dynamic subtitle, tab row
- **Drawer** (`drawer.py`): `dmc.Drawer` with drug chips (`dmc.ChipGroup`), trust chips, directorate accordion with indication sub-items and drug fragment badges
- **Footer** (`footer.py`): NHS Norfolk and Waveney ICB branding
**Drawer Drug Browser:**
- "All Drugs" section: flat `dmc.ChipGroup` with 42 drugs from pathway_nodes level 3
- "Trusts" section: `dmc.ChipGroup` with 7 trusts
- "By Directorate" section: nested `dmc.Accordion` — 19 directorates → indications → drug fragment `dmc.Badge` items
- Clicking a drug fragment badge selects all full drug names containing that fragment (substring match)
- "Clear All Filters" button resets drug and trust selections
### Data Transformations (`data_processing/transforms.py`)
Core data transformation functions used by the pipeline:
- `patient_id()` - Creates UPID = Provider Code (first 3 chars) + PersonKey
- `drug_names()` - Standardizes via drugnames.csv lookup
- `department_identification()` - 5-level fallback chain for directory assignment
### Data Flow
**Pre-Computed Pathway Architecture (Current):**
```
[CLI: python -m cli.refresh_pathways --chart-type all]
Snowflake Data Warehouse
▼ (fetch_and_transform_data)
┌──────────────────────────────────────────┐
│ Data Transformations (data_processing/transforms.py) │
│ → patient_id() creates UPID │
│ → drug_names() standardizes names │
│ → department_identification() → Dir │
└──────────────────────────────────────────┘
├─── Directory Charts ──────────────────────────────────────┐
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ For each of 6 date filter combos: │ │
│ │ → generate_icicle_chart() │ │
│ │ → extract_denormalized_fields() │ │
│ │ → convert_to_records("directory") │ │
│ └──────────────────────────────────────────┘ │
│ │
├─── Indication Charts ─────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ GP Diagnosis Lookup (diagnosis_lookup.py)│ │
│ │ → Extract PseudoNHSNoLinked from HCD │ │
│ │ → get_patient_indication_groups() │ │
│ │ (SNOMED cluster CTE + GP records) │ │
│ │ → Build indication_df: UPID → Search │ │
│ │ Term (matched) or Directorate (no GP)│ │
│ └──────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ For each of 6 date filter combos: │ │
│ │ → generate_icicle_chart_indication() │ │
│ │ → extract_indication_fields() │ │
│ │ → convert_to_records("indication") │ │
│ └──────────────────────────────────────────┘ │
│ │
└───────────────────────┬───────────────────────────────────┘
▼ (insert_pathway_records)
┌──────────────────────────────────────────┐
│ SQLite: pathway_nodes table │
│ → ~3,600 nodes across 12 datasets │
│ → UNIQUE(date_filter_id, chart_type, │
│ ids) prevents cross-type overwrites │
│ → Indexed for fast filtering │
└──────────────────────────────────────────┘
[Dash App: python run_dash.py]
┌──────────────────────────────────────────┐
│ Filter Bar + Drawer (toggle pills, │
│ date dropdowns, drug/trust chips) │
│ → Triggers update_app_state callback │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────┐
│ load_pathway_data callback │
│ → Input: app-state dcc.Store │
│ → Calls pathway_queries.load_pathway_ │
│ nodes() with filters │
│ → Output: chart-data dcc.Store │
└──────────────────────────────────────────┘
├──────────────────────────────┐
▼ ▼
┌────────────────────┐ ┌──────────────────────┐
│ update_kpis │ │ update_chart │
│ → 4 KPI cards │ │ → create_icicle_ │
│ → formatted │ │ from_nodes() │
│ counts/costs │ │ → 10-field custom- │
└────────────────────┘ │ data + NHS blue │
│ → dcc.Graph figure │
└──────────────────────┘
```
### Reference Data Files (`data/`)
| File | Purpose |
|------|---------|
| `include.csv` | Drug filter list with default selections (Include=1) |
| `defaultTrusts.csv` | NHS Trust list for filter |
| `directory_list.csv` | Medical specialties/directories |
| `drugnames.csv` | Drug name standardization mapping |
| `org_codes.csv` | Provider code to organization name mapping |
| `drug_directory_list.csv` | Valid drug-to-directory mappings (pipe-separated) |
| `treatment_function_codes.csv` | NHS treatment function code mappings |
| `drug_indication_clusters.csv` | Drug to SNOMED cluster mappings |
| `ta-recommendations.xlsx` | NICE TA recommendations |
| `pathways.db` | SQLite database (~3.5 MB: reference tables + pathway nodes) |
### Key Patterns
**Department Identification Fallback Chain:**
The `department_identification()` function has 5 levels of fallback:
1. **SINGLE_VALID_DIR** - Drug has only one valid directory
2. **EXTRACTED** - Extracted from Additional Detail/Description fields
3. **CALCULATED_MOST_FREQ** - Most frequent valid directory for UPID/Drug
4. **UPID_INFERENCE** - Inferred from other records with same UPID
5. **UNDEFINED** - No directory could be determined
**Indication Lookup Workflow (for indication charts):**
1. Extract unique `PseudoNHSNoLinked` values from HCD activity data
2. Query Snowflake in batches of 500 patients:
- Embed `CLUSTER_MAPPING_SQL` (~148 Search_Term → Cluster_ID mappings) as CTE
- Join `ClinicalCodingClusterSnomedCodes` to get SNOMED codes per cluster
- Join `PrimaryCareClinicalCoding` on `PatientPseudonym` = `PseudoNHSNoLinked`
- Use `QUALIFY ROW_NUMBER() OVER (PARTITION BY PatientPseudonym ORDER BY EventDateTime DESC) = 1` for most recent match
3. Build `indication_df` mapping UPID → Search_Term (matched) or Directorate + " (no GP dx)" (unmatched)
4. Pass to `generate_icicle_chart_indication()` for pathway hierarchy building
**Data Source Fallback Chain** (for raw data loading, not used by Dash app):
1. Query cache for recent results
2. Attempt Snowflake connection
3. Fall back to CSV/Parquet files
## Database Schema (~3.5 MB)
### Reference Tables
- `ref_drug_names` - Drug name standardization
- `ref_organizations` - Provider code to name mapping
- `ref_directories` - Valid directory names
- `ref_drug_directory_map` - Valid drug-directory pairs
- `ref_drug_indication_clusters` - Drug to SNOMED cluster mapping
### Pathway Tables
- `pathway_date_filters` - 6 pre-defined date filter combinations
- Columns: `id`, `initiated`, `last_seen`, `is_default`, `description`
- Auto-populated via migration
- `pathway_nodes` - Pre-computed pathway hierarchy nodes (~3,600 rows for 12 datasets)
- Routing: `chart_type` ("directory" or "indication"), `date_filter_id`
- Hierarchy: `parents`, `ids`, `labels`, `level`
- Metrics: `value`, `cost`, `costpp`, `cost_pp_pa`, `colour`
- Dates: `first_seen`, `last_seen`, `first_seen_parent`, `last_seen_parent`
- Statistics: `average_spacing`, `average_administered`, `avg_days`
- Denormalized: `trust_name`, `directory`, `drug_sequence`
- Foreign key: `date_filter_id``pathway_date_filters.id`
- Unique constraint: `UNIQUE(date_filter_id, chart_type, ids)` — critical for INSERT OR REPLACE correctness
- Indexed for: date_filter_id, chart_type, trust_name, directory, level
- `pathway_refresh_log` - Tracks data refresh status
- Columns: `refresh_id`, `started_at`, `completed_at`, `status`, `records_processed`, `error_message`, `source_row_count`
## Input Data Requirements
The input data (CSV/Parquet) must contain columns including:
- `Provider Code`, `PersonKey` - Used to create UPID
- `PseudoNHSNoLinked` - NHS pseudonym for GP record matching (indication charts)
- `Drug Name`, `Intervention Date`, `Price Actual`
- `OrganisationName`
- Various `Additional Detail/Description` columns for directory extraction
- `Treatment Function Code`
## Output
Interactive Plotly icicle chart with toggleable views:
- **Directory view**: Trust → Directorate → Drug → Patient Pathway
- **Indication view**: Trust → GP Diagnosis (Search_Term) → Drug → Patient Pathway
- Patient counts and percentages at each hierarchy level
- Total and average costs
- Treatment duration and dosing frequency information
- Color gradient based on patient volume
## Testing
```bash
# Run all tests with coverage
python -m pytest tests/ -v --cov=core --cov=analysis
# Run specific test file
python -m pytest tests/test_config.py -v
# Run specific test class
python -m pytest tests/test_data_transformations.py::TestPatientId -v
```
Test coverage includes:
- PathConfig validation (23 tests)
- AnalysisFilters validation (26 tests)
- Data transformation functions (23 tests)
- Directory assignment logic (19 tests)
## Configuration
### Snowflake Connection (`src/config/snowflake.toml`)
```toml
[snowflake]
account = "your-account"
database = "DATA_HUB"
schema = "CDM"
warehouse = "your-warehouse"
authenticator = "externalbrowser" # Required for NHS SSO
```
### Logging
Logs are written to `logs/` directory with structured format.
Configure via `src/core/logging_config.py`.
## Breaking Changes from Original App
The pre-computed pathway architecture introduces these changes:
### Date Filters
- **Old**: Date pickers for arbitrary `start_date` and `end_date`
- **New**: Two dropdowns:
- "Treatment Initiated": All years, Last 2 years, Last 1 year
- "Last Seen": Last 6 months, Last 12 months
- **Reason**: Pre-computed pathways require fixed date combinations for performance
### Data Refresh
- **Old**: Real-time pathway calculation on each filter change
- **New**: Pre-computed pathways stored in SQLite, refreshed via CLI command
- **Impact**: Data is as fresh as the last `python -m cli.refresh_pathways` run
- **Benefit**: Sub-50ms filter response time vs multi-minute calculations
### State Management (Dash)
- State lives in 3 `dcc.Store` components: `app-state`, `chart-data`, `reference-data`
- Filter state: `chart_type`, `initiated`, `last_seen`, `date_filter_id`, `selected_drugs`, `selected_directorates`, `selected_trusts`
- Chart type toggle: "By Directory" / "By Indication" pills in filter bar
- Dynamic subtitle: "Trust → Directorate → Drug → Pathway" or "Trust → Indication → Drug → Pathway"
- Drug/trust selection via `dmc.ChipGroup` in right-side drawer
### Icicle Chart
- Full 10-field customdata structure (value, colour, cost, costpp, first_seen, last_seen, first_seen_parent, last_seen_parent, average_spacing, cost_pp_pa)
- NHS blue gradient colorscale: Heritage Blue #003087 → Pale Blue #E3F2FD
- Treatment statistics (average_spacing, cost_pp_pa) in hover tooltips
- First/last seen dates for drug nodes
- `create_icicle_from_nodes()` in `src/visualization/plotly_generator.py` — shared function accepting list-of-dicts
## Development
### Adding New Analysis Features
1. Add statistical functions to `src/analysis/statistics.py`
2. Integrate into pipeline in `src/analysis/pathway_analyzer.py`
3. Update visualization in `src/visualization/plotly_generator.py`
### Adding New Reference Data
1. Add CSV file to `data/` directory
2. Define schema in `src/data_processing/schema.py`
3. Create migration function in `src/data_processing/reference_data.py`
4. Add path to `PathConfig` in `src/core/config.py`