Extract chart type toggle and date filter dropdowns from filter_bar.py
into a new sub-header component. The sub-header is fixed-position below
the main header and visible across both views. The filter bar now contains
only the drug/trust/directorate buttons for the Patient Pathways view.
All 8 chart tabs verified — queries, figures, and filter dispatch
tested in both directory and indication modes. CLAUDE.md updated
with new chart types, query functions, and parsing utilities.
Phase 9 completion criteria all satisfied.
- Create create_cost_effectiveness_figure() in plotly_generator.py
Horizontal lollipop chart with dot size by patient count,
colour gradient green→amber→red by cost, retention annotations
- Fix calculate_retention_rate() to accept both 'value' and 'patients' keys
- Add _render_cost_effectiveness() dispatch in chart.py callbacks
- Wire into tab switching for active_tab='cost-effectiveness'
- Add create_market_share_figure() to src/visualization/plotly_generator.py
- Horizontal stacked bar chart: directorates × drugs with patient %
- Wire into tab dispatch via _render_market_share() helper in chart.py
- Responds to date, chart type, trust, and directorate filters
New query functions in src/data_processing/pathway_queries.py:
- get_drug_market_share: Level 3 drug nodes grouped by directory
- get_pathway_costs: Level 4+ pathway nodes with cost_pp_pa
- get_cost_waterfall: Directorate cost per patient from level 3 aggregation
- get_drug_transitions: Sankey source/target drug transitions with ordinal line labels
- get_dosing_intervals: Parsed average_spacing by trust/directory
- get_drug_directory_matrix: Directory × drug pivot with patient/cost metrics
- get_treatment_durations: Weighted avg_days by drug within directorates
Thin wrappers added in dash_app/data/queries.py for all 7 functions.
- Create src/data_processing/parsing.py with parse_average_spacing(),
parse_pathway_drugs(), and calculate_retention_rate()
- Add 8-tab bar to chart_card.py (Icicle, Market Share, Cost Effectiveness,
Cost Waterfall, Sankey, Dosing, Heatmap, Duration)
- Add active-tab dcc.Store and tab switching callback in chart.py
- Remove Chart Views section from sidebar (now in tab bar)
- Lazy rendering: only active tab's chart is computed
- Add _prune_empty_ancestors() to remove directorate/trust nodes with no
matching children when drug or directorate filters are active (e.g.,
filtering by Immunoglobulin no longer shows empty Ophthalmology box)
- Sum level-3 drug nodes for KPI values when entity filters are active
instead of using the root node's pre-computed unfiltered totals
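The pruning and KPI summing described above can be sketched roughly as follows. This is a minimal illustration, not the actual _prune_empty_ancestors(): the node keys (`ids`, `parents`, `level`, `value`), the function names, and the sample hierarchy are assumptions made for the example.

```python
def prune_empty_ancestors_sketch(nodes, drug_level=3):
    """Keep drug-level (and deeper) nodes plus only the ancestors they need."""
    by_id = {node["ids"]: node for node in nodes}
    keep = set()
    for node in nodes:
        if node["level"] < drug_level:
            continue  # ancestors survive only if a kept descendant reaches them
        current = node
        # Walk up the parent chain, stopping at the root or an already-kept node.
        while current is not None and current["ids"] not in keep:
            keep.add(current["ids"])
            current = by_id.get(current["parents"])
    return [node for node in nodes if node["ids"] in keep]


def filtered_patient_total(nodes, drug_level=3):
    """Sum drug-level values instead of trusting the root's unfiltered total."""
    return sum(node["value"] for node in nodes if node["level"] == drug_level)


# Hypothetical rows as they might look after a drug filter has been applied.
nodes_after_drug_filter = [
    {"ids": "root", "parents": "", "level": 0, "value": 500},
    {"ids": "root/Trust A", "parents": "root", "level": 1, "value": 500},
    {"ids": "root/Trust A/Immunology", "parents": "root/Trust A", "level": 2, "value": 40},
    {"ids": "root/Trust A/Ophthalmology", "parents": "root/Trust A", "level": 2, "value": 120},
    {"ids": "root/Trust A/Immunology/IMMUNOGLOBULIN",
     "parents": "root/Trust A/Immunology", "level": 3, "value": 40},
]

pruned = prune_empty_ancestors_sketch(nodes_after_drug_filter)
kept_ids = [node["ids"] for node in pruned]
total = filtered_patient_total(pruned)
```

In this sketch the Ophthalmology node is dropped because no drug node points back to it, and the KPI total comes from the surviving level-3 node rather than the root's value of 500.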
- Created 3 separate modals: Drug Selection (lg), Trust Selection (sm),
Directorate Browser (xl) with centered overlay
- Added filter trigger buttons to filter bar with count badges
- Added "Clear All" button in filter bar for global filter reset
- Per-modal clear buttons for drugs and trusts
- Preserved all existing selection logic (same component IDs)
- Deleted drawer.py component and callbacks (replaced by modals.py)
- Updated CSS: filter-btn styles, modal chip/badge styles
The drug filter's WHERE clause used `drug_sequence IS NULL` to keep ancestor nodes,
but levels 0-2 store an empty string '' rather than NULL, so the check never matched
them. Changed to level-based gating:
- Drug filter: `(level < 3 OR drug_sequence LIKE ...)`
- Directorate filter: `(level < 2 OR directory IN (...) OR directory IS NULL OR directory = '')`
- Trust filter was already correct (had `OR trust_name = ''`)
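The difference between the two drug-filter clauses can be reproduced with an in-memory SQLite table. This is a sketch under assumptions: the table name `pathway_nodes`, its three columns, and the sample rows are invented for the example, and only the two WHERE clauses come from the fix above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pathway_nodes (ids TEXT, level INTEGER, drug_sequence TEXT)")
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?)",
    [
        # Ancestor levels 0-2 carry an empty string, not NULL.
        ("root", 0, ""),
        ("root|TRUST A", 1, ""),
        ("root|TRUST A|RHEUM", 2, ""),
        ("root|TRUST A|RHEUM|ADALIMUMAB", 3, "ADALIMUMAB"),
        ("root|TRUST A|RHEUM|RITUXIMAB", 3, "RITUXIMAB"),
    ],
)


def filtered_ids(where_clause, pattern):
    """Return node ids matching a WHERE clause with one LIKE parameter."""
    sql = f"SELECT ids FROM pathway_nodes WHERE {where_clause} ORDER BY level, ids"
    return [row[0] for row in conn.execute(sql, (pattern,))]


# Old clause: '' is not NULL, so every ancestor row is filtered out.
old_rows = filtered_ids("(drug_sequence IS NULL OR drug_sequence LIKE ?)", "%ADALIMUMAB%")

# New clause: ancestors pass on level alone; drug nodes must match the pattern.
new_rows = filtered_ids("(level < 3 OR drug_sequence LIKE ?)", "%ADALIMUMAB%")
```

With the old clause only the ADALIMUMAB row survives, which leaves the chart without its hierarchy; the level-based clause keeps the three ancestors and still excludes RITUXIMAB.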
Badge IDs changed from f"{directorate}|{frag}" to f"{directorate}|{search_term}|{frag}"
to handle fragments appearing under multiple indications within the same directorate.
Callback parsing updated to use rsplit("|", 1)[-1] for the 3-part key.
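A small sketch of the 3-part key and its parsing follows. The helper names and the sample directorate and indication values are hypothetical; the key format and the rsplit call are the ones described above. It assumes the fragment itself never contains a "|".

```python
def make_badge_id(directorate: str, search_term: str, frag: str) -> str:
    """Build the 3-part badge key: directorate|search_term|frag."""
    return f"{directorate}|{search_term}|{frag}"


def fragment_from_badge_id(badge_id: str) -> str:
    """Take everything after the last '|', which is the drug fragment."""
    return badge_id.rsplit("|", 1)[-1]


# The same fragment under two indications in one directorate (illustrative values).
badge_a = make_badge_id("RHEUMATOLOGY", "rheumatoid arthritis", "ADALIMUMAB")
badge_b = make_badge_id("RHEUMATOLOGY", "psoriatic arthritis", "ADALIMUMAB")

ids_are_distinct = badge_a != badge_b
frag_a = fragment_from_badge_id(badge_a)
frag_b = fragment_from_badge_id(badge_b)
```

Under the old 2-part format both badges would have produced the same ID; with the search term included they stay distinct while the callback still recovers the same fragment from each.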
Rewrote README.md, USER_GUIDE.md, and DEPLOYMENT.md to reflect
the Dash application. Updated RALPH_PROMPT.md, guardrails.md, and
DESIGN_SYSTEM.md to remove Reflex references. All non-archive
documentation now reflects the current Dash + DMC architecture.
- Add create_icicle_from_nodes() to src/visualization/plotly_generator.py
accepting list-of-dicts from dcc.Store with NHS blue gradient colorscale,
10-field customdata, and matching text/hover templates from Reflex version
- Add update_chart callback to dash_app/callbacks/chart.py rendering
go.Icicle figure from chart-data store with dynamic subtitle
- Title generation helper mirrors Reflex _generate_pathway_chart_title()
- header.py: NHS branded top bar with logo, title, breadcrumb,
data freshness indicators (record count + last updated with IDs
for callback updates)
- sidebar.py: Navigation with 7 items across Analysis/Reports
sections, SVG icons via data URI, Drug Selection and Indications
items have IDs for drawer open callbacks (Phase 4)
- app.py: Assembles header + sidebar + main content placeholder
- nhs.css: Added .sidebar__icon rule for img-based SVG icons
Extract load_data() and load_pathway_data() logic from Reflex AppState
into standalone functions in src/data_processing/pathway_queries.py.
Create thin dash_app/data/queries.py wrapper with DB_PATH resolution.
Dry run test revealed GP lookup queries timing out at 30s (connection_timeout
in snowflake.toml). Increased to 600s. Also increased batch_size from 500 to
5000 — query time is ~40s regardless of batch size (CTE compilation overhead),
so larger batches reduce total time from ~50min to ~6min for 36K patients.
Dry run results: 91.8% GP match rate, 49.3% drug-indication match rate,
42,072 modified UPIDs, 1,846 pathway nodes across 6 date filters.
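The batch-size estimate above follows from simple arithmetic. The sketch below assumes a flat ~40s per query and a round 36,000 patients; both are approximations of the figures quoted, and the function name is made up for the example.

```python
import math

PATIENTS = 36_000          # approximate patient count from the dry run
SECONDS_PER_QUERY = 40     # roughly constant regardless of batch size


def estimated_minutes(batch_size: int) -> float:
    """Total lookup time if each batch costs one fixed-duration query."""
    batches = math.ceil(PATIENTS / batch_size)
    return batches * SECONDS_PER_QUERY / 60


minutes_at_500 = estimated_minutes(500)    # 72 batches
minutes_at_5000 = estimated_minutes(5000)  # 8 batches
```

Because the per-query cost is dominated by fixed overhead, total time scales with the number of batches rather than the number of rows, which is why a 10x larger batch gives close to a 10x shorter run.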
Replace old per-patient indication matching in refresh_pathways.py with
drug-aware matching via assign_drug_indications(). Each drug is now
cross-referenced against both the patient's GP diagnoses AND the
DimSearchTerm.csv drug mapping. GP codes restricted to HCD data window
via earliest_hcd_date parameter.
- Replace QUALIFY ROW_NUMBER()=1 with GROUP BY + COUNT(*) to return all matching
Search_Terms per patient instead of just the most recent
- Add earliest_hcd_date parameter to restrict GP codes to HCD data window
- Return code_frequency column (count of matching SNOMED codes per Search_Term)
for use as tiebreaker in drug-aware indication matching
- Update empty DataFrame returns to match new column format
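The GROUP BY + COUNT(*) shape can be illustrated with SQLite in place of Snowflake. This is a sketch only: the table `gp_codes`, its columns, the sample rows, and the cutoff date are invented, and it assumes the HCD window means codes dated on or after earliest_hcd_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gp_codes (patient_id TEXT, search_term TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO gp_codes VALUES (?, ?, ?)",
    [
        ("P1", "asthma", "2021-01-05"),
        ("P1", "asthma", "2022-03-01"),
        ("P1", "diabetes", "2021-06-10"),
        ("P1", "asthma", "2015-01-01"),   # before the window, should be ignored
        ("P2", "diabetes", "2023-02-02"),
    ],
)

earliest_hcd_date = "2019-04-01"  # hypothetical start of the HCD data window

# One row per (patient, Search_Term) with a frequency column, rather than
# a single most-recent row per patient.
rows = conn.execute(
    """
    SELECT patient_id, search_term, COUNT(*) AS code_frequency
    FROM gp_codes
    WHERE event_date >= ?
    GROUP BY patient_id, search_term
    ORDER BY patient_id, code_frequency DESC, search_term
    """,
    (earliest_hcd_date,),
).fetchall()
```

Every matching Search_Term is returned for each patient, and code_frequency is available as a tiebreaker when a drug maps to more than one of them.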
Merge 'allergic asthma' and 'severe persistent allergic asthma' into
canonical 'asthma' in both CLUSTER_MAPPING_SQL (Snowflake CTE) and
load_drug_indication_mapping() (DimSearchTerm.csv loader).
- CLUSTER_MAPPING_SQL: 3 Cluster_IDs (AST_COD, eFI2_Asthma, SEVAST_COD) now
all map to Search_Term = 'asthma'
- Added SEARCH_TERM_MERGE_MAP constant for reusable normalization
- load_drug_indication_mapping() applies merge at CSV load time
- urticaria (XSAL_COD) stays separate — not merged with asthma
- Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB,
OMALIZUMAB, RESLIZUMAB
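A minimal sketch of applying the merge map at load time is shown below. The two map entries come from the merge described above; the helper names, the lowercasing step, and the sample (Search_Term, fragment) rows are assumptions for illustration and do not reproduce the contents of DimSearchTerm.csv.

```python
from collections import defaultdict

SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def normalize_search_term(term: str) -> str:
    """Map merged variants to their canonical term; leave others unchanged."""
    cleaned = term.strip().lower()  # assumption: terms are compared in lowercase
    return SEARCH_TERM_MERGE_MAP.get(cleaned, cleaned)


def merge_drug_lists(rows):
    """Combine drug fragments under the canonical Search_Term."""
    merged = defaultdict(set)
    for search_term, fragment in rows:
        merged[normalize_search_term(search_term)].add(fragment)
    return merged


# Illustrative rows only.
sample_rows = [
    ("allergic asthma", "OMALIZUMAB"),
    ("severe persistent allergic asthma", "MEPOLIZUMAB"),
    ("asthma", "INHALED"),
    ("urticaria", "OMALIZUMAB"),
]
merged = merge_drug_lists(sample_rows)
asthma_drugs = sorted(merged["asthma"])
urticaria_drugs = sorted(merged["urticaria"])
```

Terms absent from the map, such as urticaria, pass through untouched, so the merge stays limited to the asthma variants.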
Add load_drug_indication_mapping() and get_search_terms_for_drug() to
diagnosis_lookup.py. Loads DimSearchTerm.csv to build bidirectional
lookup between drug name fragments and Search_Terms. Uses substring
matching for drug fragments (handles both exact names like ADALIMUMAB
and partial fragments like PEGYLATED). Handles duplicate Search_Terms
(e.g., diabetes appearing under two directorates) by combining fragments.
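The lookup can be sketched along these lines. The function names, the row format, and the sample rows (including the placeholder fragments) are hypothetical; only the ideas of substring matching and combining fragments for duplicate Search_Terms are taken from the description above.

```python
from collections import defaultdict


def build_lookups(rows):
    """Build term -> fragments and fragment -> terms from (term, fragment) rows."""
    term_to_fragments = defaultdict(set)
    fragment_to_terms = defaultdict(set)
    for search_term, fragment in rows:
        frag = fragment.upper()
        term_to_fragments[search_term].add(frag)   # duplicates of a term combine here
        fragment_to_terms[frag].add(search_term)
    return term_to_fragments, fragment_to_terms


def search_terms_for_drug(drug_name, fragment_to_terms):
    """Return every Search_Term whose fragment occurs inside the drug name."""
    name = drug_name.upper()
    matches = set()
    for fragment, terms in fragment_to_terms.items():
        if fragment in name:  # substring match covers exact names and partial fragments
            matches |= terms
    return sorted(matches)


# Illustrative rows; "diabetes" appears twice, as if under two directorates.
sample_rows = [
    ("rheumatoid arthritis", "ADALIMUMAB"),
    ("hepatitis c", "PEGYLATED"),
    ("diabetes", "FRAGMENT_ONE"),
    ("diabetes", "FRAGMENT_TWO"),
]
term_to_fragments, fragment_to_terms = build_lookups(sample_rows)

exact_match = search_terms_for_drug("Adalimumab", fragment_to_terms)
partial_match = search_terms_for_drug("PEGYLATED INTERFERON ALFA", fragment_to_terms)
diabetes_fragments = sorted(term_to_fragments["diabetes"])
```

Substring matching is permissive by design, so a very short fragment could match unrelated drug names; that trade-off is worth keeping in mind when adding fragments to the CSV.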
The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.
Validation: both chart types have all hierarchy levels (0-5),
all 12 datasets (6 date filters × 2 chart types) produce valid icicle charts, KPIs correct.
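The overwrite can be reproduced with a throwaway SQLite table. The table name, the `value` column, and the sample filter ID below are assumptions; the two UNIQUE definitions are the ones named above.

```python
import sqlite3


def surviving_chart_types(unique_columns: str):
    """Insert the same node id for two chart types and report which rows remain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"""
        CREATE TABLE pathway_nodes (
            date_filter_id TEXT,
            chart_type TEXT,
            ids TEXT,
            value INTEGER,
            UNIQUE({unique_columns})
        )
        """
    )
    for chart_type, value in (("directory", 10), ("indication", 20)):
        conn.execute(
            "INSERT OR REPLACE INTO pathway_nodes VALUES (?, ?, ?, ?)",
            ("example_filter", chart_type, "root", value),
        )
    rows = conn.execute(
        "SELECT chart_type FROM pathway_nodes ORDER BY chart_type"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Without chart_type in the key, the indication root replaces the directory root.
old_constraint = surviving_chart_types("date_filter_id, ids")

# With chart_type in the key, both roots coexist.
new_constraint = surviving_chart_types("date_filter_id, chart_type, ids")
```

INSERT OR REPLACE deletes the conflicting row before inserting, so the loss is silent: no error is raised and the row count simply comes out lower than expected.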
prepare_data() mapped Provider Code → Name in place, mutating the caller's DataFrame.
When called for directory charts first, then indication charts, the second call
re-mapped the already-mapped values to NaN, silently dropping all data. Added
df.copy() so each call works on its own copy.
Also fixes directory charts only generating data for the first date filter.
Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
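The double-mapping failure can be shown with a pure-Python analogy. This is not the pandas code: a list of dicts stands in for the DataFrame, `dict.get` returning None stands in for a mapping that yields NaN on unknown keys, `copy.deepcopy` stands in for df.copy(), and the provider codes and names are placeholders.

```python
import copy

# Placeholder code -> name mapping.
PROVIDER_NAMES = {"RXX": "Example Trust A", "RYY": "Example Trust B"}


def prepare_data_in_place(rows):
    """Buggy shape: overwrites the caller's rows with mapped names."""
    for row in rows:
        # A name is not a key in the mapping, so a second pass yields None.
        row["provider"] = PROVIDER_NAMES.get(row["provider"])
    return rows


def prepare_data_with_copy(rows):
    """Fixed shape: map on a private copy and leave the input untouched."""
    return prepare_data_in_place(copy.deepcopy(rows))


raw = [{"provider": "RXX"}, {"provider": "RYY"}]

shared = copy.deepcopy(raw)
buggy_first = [row["provider"] for row in prepare_data_in_place(shared)]
buggy_second = [row["provider"] for row in prepare_data_in_place(shared)]

safe_first = [row["provider"] for row in prepare_data_with_copy(raw)]
safe_second = [row["provider"] for row in prepare_data_with_copy(raw)]
```

The first in-place call looks correct, which is what makes the bug easy to miss; only the second caller sees the emptied column.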
- Add selected_chart_type state variable and set_chart_type() handler
- Add chart_type filter to load_pathway_data() WHERE clause
- Create segmented control toggle component in filter strip
- Add dynamic hierarchy label (Directorate vs Indication)
- Update chart title to include chart type prefix
Three issues identified and fixed during Task 3.1 testing:
1. Snowflake column name casing:
- Unquoted columns in Snowflake are returned as UPPERCASE
- Fixed by aliasing columns with quoted names: AS "Search_Term"
- Now correctly populates 139 unique Search_Terms (was 0)
2. Duplicate UPID index error:
- indication_df_for_chart could have duplicate UPIDs
- Added drop_duplicates(subset=['UPID']) before set_index()
- Keeps first occurrence (DIAGNOSIS over FALLBACK)
3. Missing UPIDs in indication lookup:
- Old code: built indication_df from unique PseudoNHSNoLinked only
- Problem: patients with multiple UPIDs (multi-provider) were missing
- Fixed: now builds indication_df from ALL unique UPIDs in df
- Also handles NaN values in Directory column safely
Validation results from test run:
- 36,628 patients queried
- 34,006 (92.8%) had GP diagnosis matches
- 139 unique Search_Terms found
- Top 3: drug misuse (8602), influenza (6239), diabetes (2476)
Still to verify: full pathway processing after these fixes.