Commit Graph

91 Commits

Author SHA1 Message Date
Andrew Charlwood 0af76e68e0 feat: add Directorate × Drug Heatmap chart (Task 9.8) 2026-02-06 20:04:19 +00:00
Andrew Charlwood 02fe4b4e28 feat: add Dosing Interval Comparison chart (Task 9.7) 2026-02-06 19:58:28 +00:00
Andrew Charlwood 4ffcdf4268 feat: add Drug Switching Sankey diagram (Task 9.6) 2026-02-06 19:50:43 +00:00
Andrew Charlwood 73a8d1a49f feat: add Cost Waterfall bar chart (Task 9.5) 2026-02-06 19:44:37 +00:00
Andrew Charlwood 4ef7239eed feat: add Pathway Cost Effectiveness lollipop chart (Task 9.4)
- Create create_cost_effectiveness_figure() in plotly_generator.py
  Horizontal lollipop chart with dot size by patient count,
  colour gradient green→amber→red by cost, retention annotations
- Fix calculate_retention_rate() to accept both 'value' and 'patients' keys
- Add _render_cost_effectiveness() dispatch in chart.py callbacks
- Wire into tab switching for active_tab='cost-effectiveness'
2026-02-06 19:38:54 +00:00
Andrew Charlwood f8960a3064 feat: add First-Line Market Share chart (Task 9.3)
- create_market_share_figure() in src/visualization/plotly_generator.py
- Horizontal stacked bar chart: directorates × drugs with patient %
- Wire into tab dispatch via _render_market_share() helper in chart.py
- Responds to date, chart type, trust, and directorate filters
2026-02-06 19:28:20 +00:00
Andrew Charlwood d98cd4fd69 feat: add 7 analytics chart query functions (Task 9.2)
New query functions in src/data_processing/pathway_queries.py:
- get_drug_market_share: Level 3 drug nodes grouped by directory
- get_pathway_costs: Level 4+ pathway nodes with cost_pp_pa
- get_cost_waterfall: Directorate cost per patient from level 3 aggregation
- get_drug_transitions: Sankey source/target drug transitions with ordinal line labels
- get_dosing_intervals: Parsed average_spacing by trust/directory
- get_drug_directory_matrix: Directory x drug pivot with patient/cost metrics
- get_treatment_durations: Weighted avg_days by drug within directorates

Thin wrappers added in dash_app/data/queries.py for all 7 functions.
2026-02-06 19:21:10 +00:00
Andrew Charlwood fe2d048a21 feat: add parsing utilities and 8-tab chart infrastructure (Task 9.1)
- Create src/data_processing/parsing.py with parse_average_spacing(),
  parse_pathway_drugs(), and calculate_retention_rate()
- Add 8-tab bar to chart_card.py (Icicle, Market Share, Cost Effectiveness,
  Cost Waterfall, Sankey, Dosing, Heatmap, Duration)
- Add active-tab dcc.Store and tab switching callback in chart.py
- Remove Chart Views section from sidebar (now in tab bar)
- Lazy rendering: only active tab's chart is computed
2026-02-06 19:13:19 +00:00
Andrew Charlwood de08d4b520 fix: prune empty ancestor nodes and update KPIs for filtered views (Section 8)
- Add _prune_empty_ancestors() to remove directorate/trust nodes with no
  matching children when drug or directorate filters are active (e.g.,
  filtering by Immunoglobulin no longer shows empty Ophthalmology box)
- Sum level-3 drug nodes for KPI values when entity filters are active
  instead of using the root node's pre-computed unfiltered totals
2026-02-06 16:25:56 +00:00
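Note on de08d4b520: the pruning step can be sketched roughly as below. This is a minimal illustration, not the project's _prune_empty_ancestors(); the function name, the node keys ("ids", "parents", "level") and the sample ids are assumptions made for the example.

```python
def prune_empty_ancestors(nodes, leaf_level=3):
    """Keep nodes at leaf_level or deeper plus the ancestors that lead to them.

    Each node is a dict with hypothetical keys "ids", "parents" and "level".
    """
    by_id = {node["ids"]: node for node in nodes}
    keep = set()
    for node in nodes:
        if node["level"] < leaf_level:
            continue
        # Walk up the parent chain, stopping once we reach an ancestor
        # that has already been marked (or the root, whose parent is "").
        current = node
        while current is not None and current["ids"] not in keep:
            keep.add(current["ids"])
            current = by_id.get(current["parents"])
    return [node for node in nodes if node["ids"] in keep]


# Illustrative data: a drug filter has already removed every
# Ophthalmology drug, leaving that directorate node with no children.
nodes = [
    {"ids": "root", "parents": "", "level": 0},
    {"ids": "root - T", "parents": "root", "level": 1},
    {"ids": "root - T - OPHTHALMOLOGY", "parents": "root - T", "level": 2},
    {"ids": "root - T - IMMUNOLOGY", "parents": "root - T", "level": 2},
    {"ids": "root - T - IMMUNOLOGY - IMMUNOGLOBULIN",
     "parents": "root - T - IMMUNOLOGY", "level": 3},
]
pruned_ids = [node["ids"] for node in prune_empty_ancestors(nodes)]
```

The walk goes from surviving drug nodes upward, so an ancestor is kept only if at least one drug node still sits beneath it.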
Andrew Charlwood f2c5b2645e refactor: replace dmc.Drawer with dmc.Modal for filter selection (Task 7.4 + 7.5)
- Created 3 separate modals: Drug Selection (lg), Trust Selection (sm),
  Directorate Browser (xl) with centered overlay
- Added filter trigger buttons to filter bar with count badges
- Added "Clear All" button in filter bar for global filter reset
- Per-modal clear buttons for drugs and trusts
- Preserved all existing selection logic (same component IDs)
- Deleted drawer.py component and callbacks (replaced by modals.py)
- Updated CSS: filter-btn styles, modal chip/badge styles
2026-02-06 15:42:48 +00:00
Andrew Charlwood 7aa49b0d6b refactor: restructure sidebar with chart views, remove placeholder items (Task 7.3)
- Remove non-functional sidebar items: Cost Analysis, Export Data
- Remove filter trigger items: Drug/Trust/Directory Selection, Indications
- Add Chart Views section: Icicle Chart (active), Sankey Diagram (disabled), Timeline (disabled)
- Remove tab row from chart_card.py (chart view selection now in sidebar)
- Remove open_drawer callback (sidebar no longer has filter triggers)
- Add .sidebar__item--disabled CSS class
2026-02-06 15:29:53 +00:00
Andrew Charlwood 00627a7299 fix: preserve ancestor nodes in drug/directorate filters to prevent broken icicle hierarchy (Task 7.2)
Drug filter WHERE clause used `drug_sequence IS NULL` to keep ancestor nodes,
but levels 0-2 have empty string '' not NULL. Changed to level-based gating:
- Drug filter: `(level < 3 OR drug_sequence LIKE ...)`
- Directorate filter: `(level < 2 OR directory IN (...) OR directory IS NULL OR directory = '')`
- Trust filter was already correct (had `OR trust_name = ''`)
2026-02-06 15:24:09 +00:00
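Note on 00627a7299: the difference between the NULL check and level-based gating can be reproduced with an in-memory SQLite table. This is a sketch only; the table layout and the sample ids are assumptions, and only the two WHERE clauses follow the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pathway_nodes (ids TEXT, level INTEGER, drug_sequence TEXT)"
)
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?)",
    [
        # Levels 0-2 carry an empty string, not NULL, in drug_sequence.
        ("root", 0, ""),
        ("root - TRUST A", 1, ""),
        ("root - TRUST A - RHEUMATOLOGY", 2, ""),
        ("root - TRUST A - RHEUMATOLOGY - ADALIMUMAB", 3, "ADALIMUMAB"),
        ("root - TRUST A - RHEUMATOLOGY - ETANERCEPT", 3, "ETANERCEPT"),
    ],
)


def count_rows(where_clause):
    """Count rows matching a WHERE clause with one LIKE placeholder."""
    sql = f"SELECT COUNT(*) FROM pathway_nodes WHERE {where_clause}"
    return conn.execute(sql, ("%ADALIMUMAB%",)).fetchone()[0]


# Old filter: '' IS NULL is false, so every ancestor row is dropped.
old_count = count_rows("(drug_sequence IS NULL OR drug_sequence LIKE ?)")
# New filter: anything above drug level passes regardless of drug_sequence.
new_count = count_rows("(level < 3 OR drug_sequence LIKE ?)")
```

With the old clause only the ADALIMUMAB row survives, which leaves the icicle without its root, trust and directorate nodes; the level-based clause returns those three ancestors as well.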
Andrew Charlwood 7be136ac87 fix: resolve DuplicateIdError by including search_term in drug-fragment badge IDs (Task 7.1)
Badge IDs changed from f"{directorate}|{frag}" to f"{directorate}|{search_term}|{frag}"
to handle fragments appearing under multiple indications within the same directorate.
Callback parsing updated to use rsplit("|", 1)[-1] for the 3-part key.
2026-02-06 15:19:18 +00:00
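Note on 7be136ac87: a small sketch of why the two-part key collides and how the three-part key is parsed. The helper names and sample rows are hypothetical; only the f-string formats and the rsplit("|", 1)[-1] call come from the commit message.

```python
def badge_key(directorate, search_term, frag):
    """Build the three-part badge key described in the commit."""
    return f"{directorate}|{search_term}|{frag}"


def fragment_from_key(key):
    """Recover the drug fragment, assuming the fragment contains no '|'."""
    return key.rsplit("|", 1)[-1]


# Illustrative rows: the same fragment under two indications
# within one directorate.
rows = [
    ("RHEUMATOLOGY", "rheumatoid arthritis", "ADALIMUMAB"),
    ("RHEUMATOLOGY", "psoriatic arthritis", "ADALIMUMAB"),
]

old_ids = {f"{directorate}|{frag}" for directorate, _, frag in rows}
new_ids = {badge_key(*row) for row in rows}
fragments = {fragment_from_key(key) for key in new_ids}
```

Splitting from the right keeps the parser working even if a directorate or search term ever contains the separator, since only the final segment is read.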
Andrew Charlwood 54b4a0f743 docs: update all documentation for Dash migration (Phase 6)
Rewrote README.md, USER_GUIDE.md, and DEPLOYMENT.md to reflect
the Dash application. Updated RALPH_PROMPT.md, guardrails.md, and
DESIGN_SYSTEM.md to remove Reflex references. All non-archive
documentation now reflects the current Dash + DMC architecture.
2026-02-06 14:54:12 +00:00
Andrew Charlwood fe8642dfaf feat: remove Reflex, archive old app, update docs for Dash migration (Task 5.4)
- Remove reflex dependency from pyproject.toml
- Move pathways_app/ and rxconfig.py to archive/
- Update CLAUDE.md: Dash app structure, callback chain, run command
- All completion criteria validated (10/10 pass)
2026-02-06 14:35:43 +00:00
Andrew Charlwood e877268805 feat: add data freshness indicator with relative time and patient count (Task 5.3) 2026-02-06 14:21:45 +00:00
Andrew Charlwood 5593d08062 feat: add loading spinner, empty state, and error handling to chart area (Task 5.2) 2026-02-06 14:16:26 +00:00
Andrew Charlwood f0505ee43e feat: add trust selection to drawer with filter wiring (Task 5.1) 2026-02-06 14:09:36 +00:00
Andrew Charlwood fe76e5a313 feat: add drawer callbacks for drug selection, fragment matching, and clear (Task 4.2) 2026-02-06 13:59:00 +00:00
Andrew Charlwood 5dc552f8c5 feat: add dmc.Drawer drug browser with directorate cards and drug chips (Task 4.1) 2026-02-06 13:51:24 +00:00
Andrew Charlwood 40ce7fc5f9 feat: add icicle chart rendering with NHS colorscale and dynamic titles (Task 3.4)
- Add create_icicle_from_nodes() to src/visualization/plotly_generator.py
  accepting list-of-dicts from dcc.Store with NHS blue gradient colorscale,
  10-field customdata, and matching text/hover templates from Reflex version
- Add update_chart callback to dash_app/callbacks/chart.py rendering
  go.Icicle figure from chart-data store with dynamic subtitle
- Title generation helper mirrors Reflex _generate_pathway_chart_title()
2026-02-06 13:44:13 +00:00
Andrew Charlwood 9c971c083b feat: add KPI update callback with formatted patient/drug/cost display (Task 3.3) 2026-02-06 13:38:11 +00:00
Andrew Charlwood ad9fa1cfec feat: add pathway data loading callback bridging filters to chart-data (Task 3.2) 2026-02-06 13:33:31 +00:00
Andrew Charlwood eda35c7168 feat: add reference data loading and filter state callbacks (Task 3.1) 2026-02-06 13:29:30 +00:00
Andrew Charlwood 3568e03fc2 feat: add footer component and complete Phase 2 static layout (Task 2.3) 2026-02-06 13:24:33 +00:00
Andrew Charlwood 307563bb31 feat: add KPI row, filter bar, and chart card components (Task 2.2) 2026-02-06 13:20:42 +00:00
Andrew Charlwood bdc1690f0f feat: add header and sidebar components for Dash layout (Task 2.1)
- header.py: NHS branded top bar with logo, title, breadcrumb,
  data freshness indicators (record count + last updated with IDs
  for callback updates)
- sidebar.py: Navigation with 7 items across Analysis/Reports
  sections, SVG icons via data URI, Drug Selection and Indications
  items have IDs for drawer open callbacks (Phase 4)
- app.py: Assembles header + sidebar + main content placeholder
- nhs.css: Added .sidebar__icon rule for img-based SVG icons
2026-02-06 13:13:03 +00:00
Andrew Charlwood 76549420a0 feat: add directorate card tree builder for drug browser drawer (Task 1.2) 2026-02-06 13:06:29 +00:00
Andrew Charlwood b71748fa7d feat: add shared pathway query functions for Dash data access (Task 1.1)
Extract load_data() and load_pathway_data() logic from Reflex AppState
into standalone functions in src/data_processing/pathway_queries.py.
Create thin dash_app/data/queries.py wrapper with DB_PATH resolution.
2026-02-06 13:02:34 +00:00
Andrew Charlwood 1c3ece6480 feat: create dash_app skeleton with nhs.css and MantineProvider (Phase 0)
- dash_app/ directory structure: app.py, assets/, data/, components/, callbacks/, utils/
- run_dash.py entry point at project root
- Added dash>=2.14.0 and dash-mantine-components>=0.14.0 to pyproject.toml
- app.py: Dash app with MantineProvider wrapper and 3 dcc.Store components
- nhs.css: extracted from 01_nhs_classic.html (sans mock icicle CSS)
- Validated: app starts cleanly at localhost:8050
2026-02-06 12:57:47 +00:00
Andrew Charlwood f3bba6dfab docs: complete Phase 4 validation — full refresh and data verification (Task 4.1-4.3)
Full refresh: 2,947 nodes (1,101 directory + 1,846 indication) in 738s.
Validation: RA/asthma drugs correctly grouped, fallback labels present,
directory charts unchanged, Reflex compiles. All completion criteria met.
2026-02-06 00:12:53 +00:00
Andrew Charlwood c6e426e36c fix: increase network timeout and batch size for GP lookup queries (Task 3.2)
Dry run test revealed GP lookup queries timing out at 30s (connection_timeout
in snowflake.toml). Increased to 600s. Also increased batch_size from 500 to
5000 — query time is ~40s regardless of batch size (CTE compilation overhead),
so larger batches reduce total time from ~50min to ~6min for 36K patients.

Dry run results: 91.8% GP match rate, 49.3% drug-indication match rate,
42,072 modified UPIDs, 1,846 pathway nodes across 6 date filters.
2026-02-05 23:55:12 +00:00
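Note on c6e426e36c: the timing claim follows from simple arithmetic if each query costs about 40 seconds regardless of batch size. The sketch below assumes a round 36,000 patients and sequential queries; the function name is hypothetical.

```python
import math


def estimated_minutes(patients, batch_size, seconds_per_query=40):
    """Total lookup time when every batch costs a fixed query time."""
    batches = math.ceil(patients / batch_size)
    return batches * seconds_per_query / 60


small_batches = estimated_minutes(36_000, 500)    # 72 queries
large_batches = estimated_minutes(36_000, 5000)   # 8 queries
```

Under these assumptions the estimate is 48 minutes at a batch size of 500 and a little over 5 minutes at 5000, in line with the "~50min to ~6min" figures quoted in the commit.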
Andrew Charlwood 920570b437 feat: integrate drug-aware indication matching into refresh pipeline (Task 3.1)
Replace old per-patient indication matching in refresh_pathways.py with
drug-aware matching via assign_drug_indications(). Each drug is now
cross-referenced against both the patient's GP diagnoses AND the
DimSearchTerm.csv drug mapping. GP codes restricted to HCD data window
via earliest_hcd_date parameter.
2026-02-05 23:11:01 +00:00
Andrew Charlwood 408976e001 feat: add assign_drug_indications() for drug-aware indication matching (Task 2.1 + 2.2) 2026-02-05 23:05:40 +00:00
Andrew Charlwood c93417f0e7 feat: return ALL GP matches with code_frequency in get_patient_indication_groups (Task 1.1)
- Replace QUALIFY ROW_NUMBER()=1 with GROUP BY + COUNT(*) to return all matching
  Search_Terms per patient instead of just the most recent
- Add earliest_hcd_date parameter to restrict GP codes to HCD data window
- Return code_frequency column (count of matching SNOMED codes per Search_Term)
  for use as tiebreaker in drug-aware indication matching
- Update empty DataFrame returns to match new column format
2026-02-05 23:01:01 +00:00
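Note on c93417f0e7: the move from "most recent row per patient" to "all matches with a frequency count" can be sketched with SQLite standing in for Snowflake. The table, column names and sample rows are assumptions, as is the direction of the date restriction (events on or after earliest_hcd_date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gp_events (patient TEXT, search_term TEXT, event_date TEXT)"
)
conn.executemany(
    "INSERT INTO gp_events VALUES (?, ?, ?)",
    [
        ("P1", "asthma", "2021-03-01"),
        ("P1", "asthma", "2022-06-01"),
        ("P1", "rheumatoid arthritis", "2023-01-10"),
        ("P1", "diabetes", "2015-01-01"),  # before the data window
        ("P2", "asthma", "2022-02-02"),
    ],
)

earliest_hcd_date = "2019-01-01"  # illustrative value

# One row per (patient, search_term) with the number of matching codes,
# instead of a single most-recent row per patient.
matches = conn.execute(
    """
    SELECT patient, search_term, COUNT(*) AS code_frequency
    FROM gp_events
    WHERE event_date >= ?
    GROUP BY patient, search_term
    ORDER BY patient, code_frequency DESC, search_term
    """,
    (earliest_hcd_date,),
).fetchall()
```

Patient P1 now keeps both asthma and rheumatoid arthritis, and code_frequency gives a later matching step something to break ties with.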
Andrew Charlwood b0a8a9de1c feat: merge asthma Search_Term variants in CLUSTER_MAPPING_SQL and drug mapping (Task 1.2)
Merge 'allergic asthma' and 'severe persistent allergic asthma' into
canonical 'asthma' in both CLUSTER_MAPPING_SQL (Snowflake CTE) and
load_drug_indication_mapping() (DimSearchTerm.csv loader).

- CLUSTER_MAPPING_SQL: 3 Cluster_IDs (AST_COD, eFI2_Asthma, SEVAST_COD) now
  all map to Search_Term = 'asthma'
- Added SEARCH_TERM_MERGE_MAP constant for reusable normalization
- load_drug_indication_mapping() applies merge at CSV load time
- urticaria (XSAL_COD) stays separate — not merged with asthma
- Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB,
  OMALIZUMAB, RESLIZUMAB
2026-02-05 22:56:29 +00:00
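Note on b0a8a9de1c: a minimal sketch of applying the merge map at load time. SEARCH_TERM_MERGE_MAP is named in the commit and its two entries follow the message; the helper functions and the way drugs are split across the variants in the sample data are assumptions for illustration.

```python
SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def normalize_search_term(term):
    """Map a variant to its canonical Search_Term; others pass through."""
    return SEARCH_TERM_MERGE_MAP.get(term, term)


def merge_drug_lists(raw_mapping):
    """Combine drug lists of Search_Terms that normalize to the same name."""
    merged = {}
    for term, drugs in raw_mapping.items():
        merged.setdefault(normalize_search_term(term), set()).update(drugs)
    return {term: sorted(drugs) for term, drugs in merged.items()}


# Illustrative input only; the real per-variant drug lists are in
# DimSearchTerm.csv and are not reproduced here.
raw = {
    "asthma": ["INHALED"],
    "allergic asthma": ["OMALIZUMAB"],
    "severe persistent allergic asthma": ["BENRALIZUMAB", "MEPOLIZUMAB"],
    "urticaria": ["OMALIZUMAB"],
}
merged = merge_drug_lists(raw)
```

Because urticaria has no entry in the map it stays a separate Search_Term, matching the behaviour described in the commit.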
Andrew Charlwood 0779df78d1 feat: add drug-to-indication mapping from DimSearchTerm.csv (Task 1.2)
Add load_drug_indication_mapping() and get_search_terms_for_drug() to
diagnosis_lookup.py. Loads DimSearchTerm.csv to build bidirectional
lookup between drug name fragments and Search_Terms. Uses substring
matching for drug fragments (handles both exact names like ADALIMUMAB
and partial fragments like PEGYLATED). Handles duplicate Search_Terms
(e.g., diabetes appearing under two directorates) by combining fragments.
2026-02-05 22:48:09 +00:00
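Note on 0779df78d1: substring matching with combined fragments for duplicate Search_Terms might look like the sketch below. The function names, signatures and sample rows are hypothetical and are not the signatures used in diagnosis_lookup.py.

```python
def build_fragment_map(rows):
    """Group drug fragments by Search_Term, combining duplicate terms."""
    fragment_map = {}
    for search_term, fragment in rows:
        fragments = fragment_map.setdefault(search_term, [])
        if fragment.upper() not in fragments:
            fragments.append(fragment.upper())
    return fragment_map


def search_terms_for_drug(drug_name, fragment_map):
    """Return Search_Terms with at least one fragment inside the drug name."""
    name = drug_name.upper()
    return sorted(
        term
        for term, fragments in fragment_map.items()
        if any(fragment in name for fragment in fragments)
    )


# Illustrative rows: "diabetes" appears twice and its fragments are merged.
rows = [
    ("rheumatoid arthritis", "ADALIMUMAB"),
    ("diabetes", "FRAGMENT_A"),
    ("diabetes", "FRAGMENT_B"),
    ("example term", "PEGYLATED"),
]
fragment_map = build_fragment_map(rows)
exact = search_terms_for_drug("ADALIMUMAB", fragment_map)
partial = search_terms_for_drug("Pegylated Example Drug", fragment_map)
```

A plain `in` test covers both cases from the commit: a full drug name matches itself, and a partial fragment such as PEGYLATED matches any longer name that contains it.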
Andrew Charlwood 1c4d2c07ee docs: mark project complete - all tasks done, viewport testing blocked by env (Iteration 9) 2026-02-05 20:51:48 +00:00
Andrew Charlwood fed909481e docs: update CLAUDE.md with indication chart architecture and CLI docs (Task 5.2) 2026-02-05 20:50:01 +00:00
Andrew Charlwood 4884e0a8cc fix: recreate pathway_nodes with correct UNIQUE constraint and validate end-to-end (Task 5.1)
The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.

Validation: both chart types have all hierarchy levels (0-5),
all 12 date filters produce valid icicle charts, KPIs correct.
2026-02-05 20:43:01 +00:00
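Note on 4884e0a8cc: the overwrite is easy to reproduce in SQLite. The sketch below uses a cut-down pathway_nodes table; the "value" column and the sample rows are assumptions, while the two UNIQUE definitions follow the commit message.

```python
import sqlite3


def surviving_chart_types(unique_columns):
    """Insert one root node per chart type and report which rows remain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pathway_nodes ("
        "date_filter_id TEXT, chart_type TEXT, ids TEXT, value INTEGER, "
        f"UNIQUE({unique_columns}))"
    )
    rows = [
        ("all_6mo", "directory", "root", 100),
        ("all_6mo", "indication", "root", 100),
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO pathway_nodes VALUES (?, ?, ?, ?)", rows
    )
    cursor = conn.execute(
        "SELECT chart_type FROM pathway_nodes ORDER BY chart_type"
    )
    return [row[0] for row in cursor]


# Without chart_type in the key, the second insert replaces the first.
old_constraint = surviving_chart_types("date_filter_id, ids")
new_constraint = surviving_chart_types("date_filter_id, chart_type, ids")
```

INSERT OR REPLACE deletes any row that conflicts on the unique key before inserting, so nodes whose ids are shared by both chart types (root and trust level) silently lose their directory copy under the old constraint.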
Andrew Charlwood 6331d44165 fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail
prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.

Also fixes directory charts only generating data for the first date filter.

Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
2026-02-05 20:10:12 +00:00
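Note on 6331d44165: the double-mapping failure can be shown without pandas. The sketch below is a dependency-free analogue that uses a list of dicts in place of the DataFrame and None in place of NaN; the provider code, names and function names are made up for the example.

```python
import copy

PROVIDER_NAMES = {"RXX": "Example Trust"}  # hypothetical code -> name lookup


def prepare_in_place(rows):
    """Map provider code to name by mutating the caller's rows."""
    for row in rows:
        # A value that is already a name is not a key in the lookup,
        # so a second pass turns it into None.
        row["Provider"] = PROVIDER_NAMES.get(row["Provider"])
    return rows


def prepare_on_copy(rows):
    """Same mapping, but applied to a copy so the input stays untouched."""
    return prepare_in_place(copy.deepcopy(rows))


shared = [{"Provider": "RXX"}]
first_buggy = prepare_in_place(shared)[0]["Provider"]
second_buggy = prepare_in_place(shared)[0]["Provider"]

source = [{"Provider": "RXX"}]
first_fixed = prepare_on_copy(source)[0]["Provider"]
second_fixed = prepare_on_copy(source)[0]["Provider"]
```

The first in-place call succeeds and the second one maps the already-mapped name to None, which mirrors the directory-then-indication call order described in the commit. Working on a copy makes the function safe to call repeatedly on the same input.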
Andrew Charlwood 6f88a59978 feat: add chart type toggle for Directory/Indication views (Task 4.1, 4.2, 4.3)
- Add selected_chart_type state variable and set_chart_type() handler
- Add chart_type filter to load_pathway_data() WHERE clause
- Create segmented control toggle component in filter strip
- Add dynamic hierarchy label (Directorate vs Indication)
- Update chart title to include chart type prefix
2026-02-05 19:39:45 +00:00
Andrew Charlwood 2deaa2f6da docs: mark Task 3.1 complete - indication pipeline verified (Task 3.1)
Pipeline test results:
- 695 indication pathway nodes generated for all_6mo filter
- 92.8% GP diagnosis match rate (34,006/36,628 patients)
- 139 unique Search_Terms found
- Top indications: drug misuse, influenza, diabetes, sepsis, cardiovascular disease
- Full pipeline completes in ~10 minutes

Phase 3 complete, Phase 4 (Reflex UI) ready to begin.
2026-02-05 18:44:34 +00:00
Andrew Charlwood 22222fe9ca fix: resolve Snowflake column casing and UPID mapping issues (Task 3.1)
Three issues identified and fixed during Task 3.1 testing:

1. Snowflake column name casing:
   - Unquoted columns in Snowflake are returned as UPPERCASE
   - Fixed by aliasing columns with quoted names: AS "Search_Term"
   - Now correctly populates 139 unique Search_Terms (was 0)

2. Duplicate UPID index error:
   - indication_df_for_chart could have duplicate UPIDs
   - Added drop_duplicates(subset=['UPID']) before set_index()
   - Keeps first occurrence (DIAGNOSIS over FALLBACK)

3. Missing UPIDs in indication lookup:
   - Old code: built indication_df from unique PseudoNHSNoLinked only
   - Problem: patients with multiple UPIDs (multi-provider) were missing
   - Fixed: now builds indication_df from ALL unique UPIDs in df
   - Also handles NaN values in Directory column safely

Validation results from test run:
- 36,628 patients queried
- 34,006 (92.8%) had GP diagnosis matches
- 139 unique Search_Terms found
- Top Search_Terms: drug misuse (8602), influenza (6239), diabetes (2476)

Still to verify: full pathway processing after these fixes.
2026-02-05 18:30:23 +00:00
Andrew Charlwood ad10b374cb feat: integrate Snowflake-direct indication lookup into CLI refresh (Task 1.2, 2.3)
Replace batch_lookup_indication_groups() with get_patient_indication_groups()
for indication chart processing. The new approach:

- Extracts unique PseudoNHSNoLinked values from HCD data
- Queries Snowflake directly using the cluster CTE
- Builds indication_df mapping UPID → Search_Term (matched) or Directory (fallback)
- Logs coverage statistics (diagnosis % vs fallback %)

This completes the integration of the new Snowflake-direct GP lookup approach.
2026-02-05 17:06:34 +00:00
Andrew Charlwood 1a817b8257 feat: add get_patient_indication_groups() for Snowflake-direct GP lookup (Task 1.1)
- Add CLUSTER_MAPPING_SQL constant embedding full snomed_indication_mapping_query.sql
- Add get_patient_indication_groups() function that queries Snowflake directly
- Uses QUALIFY ROW_NUMBER() to get most recent diagnosis per patient
- Returns DataFrame with PatientPseudonym, Search_Term, EventDateTime
- Handles edge cases: empty list, Snowflake unavailable
- Batch processing with configurable batch_size (default 500)
- Comprehensive logging for match statistics
2026-02-05 17:03:12 +00:00
Andrew Charlwood 5b1569ed5c fix: correct patient identifier for GP diagnosis lookup (Task 3.3)
Two critical fixes for the indication-based pathway feature:

1. clean_snomed_code() now handles scientific notation (e.g., "1.06e+16")
   - CSV export from pandas/Excel converts large SNOMED codes to scientific notation
   - Without this fix, codes like "10629311000119108" were stored as "1.06e+16"
   - Now properly converts to full integer strings

2. batch_lookup_indication_groups() now uses PseudoNHSNoLinked instead of PersonKey
   - PersonKey is LocalPatientID (provider-specific like "J188448")
   - PseudoNHSNoLinked is the pseudonymised NHS number that matches PatientPseudonym in GP records
   - Without this fix, 0% of patients matched GP records
   - Test shows ~20% match rate for ADALIMUMAB patients with correct identifier
2026-02-05 15:49:24 +00:00
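Note on 5b1569ed5c: expanding scientific notation into an integer string can be sketched with the decimal module. The function name is hypothetical and this is not the project's clean_snomed_code(). Digits that were dropped when the value was written as "1.06e+16" cannot be recovered, so the expanded string is only as precise as the exported text.

```python
from decimal import Decimal, InvalidOperation


def expand_scientific_notation(code):
    """Return an integer string for values like '1.06e+16'.

    Anything that is not a whole number in scientific notation is
    returned unchanged (after stripping surrounding whitespace).
    """
    text = str(code).strip()
    if "e" not in text.lower():
        return text
    try:
        value = Decimal(text)
    except InvalidOperation:
        return text
    if value != value.to_integral_value():
        return text
    return str(int(value))


expanded = expand_scientific_notation("1.06e+16")
untouched = expand_scientific_notation("10629311000119108")
```

Decimal is used rather than float so that large codes are expanded exactly from the digits present in the text, without a binary floating-point round trip.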
Andrew Charlwood 8952156798 feat: integrate batch GP diagnosis lookup for indication charts (Task 3.2)
- Add batch_lookup_indication_groups() to diagnosis_lookup.py
  - Efficient batch Snowflake queries (500 patients per batch)
  - Returns UPID → Indication_Group mapping
  - Source tracking: DIAGNOSIS vs FALLBACK
- Update cli/refresh_pathways.py indication processing
  - Call batch_lookup_indication_groups() before chart generation
  - Build indication_df for process_indication_pathway_for_date_filter()
  - Log diagnosis coverage statistics
- Enables full --chart-type all functionality
2026-02-05 14:45:06 +00:00
Andrew Charlwood 593d14c70f feat: add chart_type argument to refresh command (Task 3.1)
- Add --chart-type argument with choices: directory, indication, all
- Update insert_pathway_records to include chart_type column
- Update refresh_pathways to process multiple chart types
- Update logging to show chart type counts
- Indication chart processing deferred to Task 3.2 (GP diagnosis integration)
2026-02-05 14:38:57 +00:00
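Note on 593d14c70f: a --chart-type option with fixed choices could be declared as below. This is a sketch that assumes argparse; the default of "all", the prog name and the helper function are assumptions and are not taken from the commit.

```python
import argparse


def build_parser():
    """Build a parser exposing the --chart-type option."""
    parser = argparse.ArgumentParser(prog="refresh_pathways")
    parser.add_argument(
        "--chart-type",
        choices=["directory", "indication", "all"],
        default="all",  # assumed default
        help="Which chart type(s) to refresh",
    )
    return parser


def chart_types_to_process(choice):
    """Expand 'all' into the individual chart types."""
    if choice == "all":
        return ["directory", "indication"]
    return [choice]


args = build_parser().parse_args(["--chart-type", "indication"])
selected = chart_types_to_process(args.chart_type)
everything = chart_types_to_process(build_parser().parse_args([]).chart_type)
```

Expanding "all" into a list lets the refresh loop treat one or several chart types the same way.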
Andrew Charlwood 7cbc648c6d feat: add indication pathway processing functions (Task 2.3)
- Add generate_icicle_chart_indication() to pathway_analyzer.py
  - Variant that uses indication_df instead of directory_df
  - Groups by Trust → Search_Term → Drug → Pathway
  - Accepts indication_df mapping UPID → Indication_Group

- Add process_indication_pathway_for_date_filter() to pathway_pipeline.py
  - Processes indication-based pathway for a single date filter
  - Uses generate_icicle_chart_indication() for hierarchy building

- Add extract_indication_fields() to pathway_pipeline.py
  - Extracts trust_name, search_term, drug_sequence from ids column
  - Similar to extract_denormalized_fields() but for indication charts

- Update convert_to_records() with chart_type parameter
  - Includes chart_type column in output records
  - Supports "directory" and "indication" values

- Add ChartType type alias (Literal["directory", "indication"])

- Update __all__ exports with new functions
2026-02-05 14:32:28 +00:00