- Add create_icicle_from_nodes() to src/visualization/plotly_generator.py
accepting list-of-dicts from dcc.Store with NHS blue gradient colorscale,
10-field customdata, and matching text/hover templates from Reflex version
- Add update_chart callback to dash_app/callbacks/chart.py rendering
go.Icicle figure from chart-data store with dynamic subtitle
- Title generation helper mirrors Reflex _generate_pathway_chart_title()
- header.py: NHS branded top bar with logo, title, breadcrumb,
data freshness indicators (record count + last updated with IDs
for callback updates)
- sidebar.py: Navigation with 7 items across Analysis/Reports
sections, SVG icons via data URI, Drug Selection and Indications
items have IDs for drawer open callbacks (Phase 4)
- app.py: Assembles header + sidebar + main content placeholder
- nhs.css: Added .sidebar__icon rule for img-based SVG icons
Extract load_data() and load_pathway_data() logic from Reflex AppState
into standalone functions in src/data_processing/pathway_queries.py.
Create thin dash_app/data/queries.py wrapper with DB_PATH resolution.
Drop fact_interventions (440K rows), mv_patient_treatment_summary (35K rows),
ref_drug_snomed_mapping (144K rows), and processed_files — all unused since
the app moved to pre-computed pathway_nodes.
Key changes:
- Rewrite load_data() to source from pathway_nodes + pathway_refresh_log
- Remove 7 dead methods and 8 dead state vars from pathways_app.py
- Delete patient_data.py, load_snomed_mapping.py, test_large_dataset_performance.py
- Remove SQLiteDataLoader (depended on fact_interventions)
- Remove file tracking schema (processed_files tracked fact_interventions loads)
- Remove legacy diagnosis functions from diagnosis_lookup.py
- Add source_row_count migration for pathway_refresh_log
- Clean all cross-references in __init__.py, data_source.py, migrate.py
Dry run test revealed GP lookup queries timing out at 30s (connection_timeout
in snowflake.toml). Increased to 600s. Also increased batch_size from 500 to
5000 — query time is ~40s regardless of batch size (CTE compilation overhead),
so larger batches reduce total time from ~50min to ~6min for 36K patients.
Dry run results: 91.8% GP match rate, 49.3% drug-indication match rate,
42,072 modified UPIDs, 1,846 pathway nodes across 6 date filters.
Replace old per-patient indication matching in refresh_pathways.py with
drug-aware matching via assign_drug_indications(). Each drug is now
cross-referenced against both the patient's GP diagnoses AND the
DimSearchTerm.csv drug mapping. GP codes restricted to HCD data window
via earliest_hcd_date parameter.
- Replace QUALIFY ROW_NUMBER()=1 with GROUP BY + COUNT(*) to return all matching
Search_Terms per patient instead of just the most recent
- Add earliest_hcd_date parameter to restrict GP codes to HCD data window
- Return code_frequency column (count of matching SNOMED codes per Search_Term)
for use as tiebreaker in drug-aware indication matching
- Update empty DataFrame returns to match new column format
Merge 'allergic asthma' and 'severe persistent allergic asthma' into
canonical 'asthma' in both CLUSTER_MAPPING_SQL (Snowflake CTE) and
load_drug_indication_mapping() (DimSearchTerm.csv loader).
- CLUSTER_MAPPING_SQL: 3 Cluster_IDs (AST_COD, eFI2_Asthma, SEVAST_COD) now
all map to Search_Term = 'asthma'
- Added SEARCH_TERM_MERGE_MAP constant for reusable normalization
- load_drug_indication_mapping() applies merge at CSV load time
- urticaria (XSAL_COD) stays separate — not merged with asthma
- Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB,
OMALIZUMAB, RESLIZUMAB
Add load_drug_indication_mapping() and get_search_terms_for_drug() to
diagnosis_lookup.py. Loads DimSearchTerm.csv to build bidirectional
lookup between drug name fragments and Search_Terms. Uses substring
matching for drug fragments (handles both exact names like ADALIMUMAB
and partial fragments like PEGYLATED). Handles duplicate Search_Terms
(e.g., diabetes appearing under two directorates) by combining fragments.
The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.
Validation: both chart types have all hierarchy levels (0-5),
all 12 date filters produce valid icicle charts, KPIs correct.