HighCostDrugsDemo

Author	SHA1	Message	Date
Andrew Charlwood	f0505ee43e	feat: add trust selection to drawer with filter wiring (Task 5.1)	2026-02-06 14:09:36 +00:00
Andrew Charlwood	fe76e5a313	feat: add drawer callbacks for drug selection, fragment matching, and clear (Task 4.2)	2026-02-06 13:59:00 +00:00
Andrew Charlwood	5dc552f8c5	feat: add dmc.Drawer drug browser with directorate cards and drug chips (Task 4.1)	2026-02-06 13:51:24 +00:00
Andrew Charlwood	40ce7fc5f9	feat: add icicle chart rendering with NHS colorscale and dynamic titles (Task 3.4) - Add create_icicle_from_nodes() to src/visualization/plotly_generator.py accepting list-of-dicts from dcc.Store with NHS blue gradient colorscale, 10-field customdata, and matching text/hover templates from Reflex version - Add update_chart callback to dash_app/callbacks/chart.py rendering go.Icicle figure from chart-data store with dynamic subtitle - Title generation helper mirrors Reflex _generate_pathway_chart_title()	2026-02-06 13:44:13 +00:00
Andrew Charlwood	9c971c083b	feat: add KPI update callback with formatted patient/drug/cost display (Task 3.3)	2026-02-06 13:38:11 +00:00
Andrew Charlwood	ad9fa1cfec	feat: add pathway data loading callback bridging filters to chart-data (Task 3.2)	2026-02-06 13:33:31 +00:00
Andrew Charlwood	eda35c7168	feat: add reference data loading and filter state callbacks (Task 3.1)	2026-02-06 13:29:30 +00:00
Andrew Charlwood	3568e03fc2	feat: add footer component and complete Phase 2 static layout (Task 2.3)	2026-02-06 13:24:33 +00:00
Andrew Charlwood	307563bb31	feat: add KPI row, filter bar, and chart card components (Task 2.2)	2026-02-06 13:20:42 +00:00
Andrew Charlwood	bdc1690f0f	feat: add header and sidebar components for Dash layout (Task 2.1) - header.py: NHS branded top bar with logo, title, breadcrumb, data freshness indicators (record count + last updated with IDs for callback updates) - sidebar.py: Navigation with 7 items across Analysis/Reports sections, SVG icons via data URI, Drug Selection and Indications items have IDs for drawer open callbacks (Phase 4) - app.py: Assembles header + sidebar + main content placeholder - nhs.css: Added .sidebar__icon rule for img-based SVG icons	2026-02-06 13:13:03 +00:00
Andrew Charlwood	76549420a0	feat: add directorate card tree builder for drug browser drawer (Task 1.2)	2026-02-06 13:06:29 +00:00
Andrew Charlwood	b71748fa7d	feat: add shared pathway query functions for Dash data access (Task 1.1) Extract load_data() and load_pathway_data() logic from Reflex AppState into standalone functions in src/data_processing/pathway_queries.py. Create thin dash_app/data/queries.py wrapper with DB_PATH resolution.	2026-02-06 13:02:34 +00:00
Andrew Charlwood	1c3ece6480	feat: create dash_app skeleton with nhs.css and MantineProvider (Phase 0) - dash_app/ directory structure: app.py, assets/, data/, components/, callbacks/, utils/ - run_dash.py entry point at project root - Added dash>=2.14.0 and dash-mantine-components>=0.14.0 to pyproject.toml - app.py: Dash app with MantineProvider wrapper and 3 dcc.Store components - nhs.css: extracted from 01_nhs_classic.html (sans mock icicle CSS) - Validated: app starts cleanly at localhost:8050	2026-02-06 12:57:47 +00:00
Andrew Charlwood	f3bba6dfab	docs: complete Phase 4 validation — full refresh and data verification (Task 4.1-4.3) Full refresh: 2,947 nodes (1,101 directory + 1,846 indication) in 738s. Validation: RA/asthma drugs correctly grouped, fallback labels present, directory charts unchanged, Reflex compiles. All completion criteria met.	2026-02-06 00:12:53 +00:00
Andrew Charlwood	c6e426e36c	fix: increase network timeout and batch size for GP lookup queries (Task 3.2) Dry run test revealed GP lookup queries timing out at 30s (connection_timeout in snowflake.toml). Increased to 600s. Also increased batch_size from 500 to 5000 — query time is ~40s regardless of batch size (CTE compilation overhead), so larger batches reduce total time from ~50min to ~6min for 36K patients. Dry run results: 91.8% GP match rate, 49.3% drug-indication match rate, 42,072 modified UPIDs, 1,846 pathway nodes across 6 date filters.	2026-02-05 23:55:12 +00:00
Andrew Charlwood	920570b437	feat: integrate drug-aware indication matching into refresh pipeline (Task 3.1) Replace old per-patient indication matching in refresh_pathways.py with drug-aware matching via assign_drug_indications(). Each drug is now cross-referenced against both the patient's GP diagnoses AND the DimSearchTerm.csv drug mapping. GP codes restricted to HCD data window via earliest_hcd_date parameter.	2026-02-05 23:11:01 +00:00
Andrew Charlwood	408976e001	feat: add assign_drug_indications() for drug-aware indication matching (Task 2.1 + 2.2)	2026-02-05 23:05:40 +00:00
Andrew Charlwood	c93417f0e7	feat: return ALL GP matches with code_frequency in get_patient_indication_groups (Task 1.1) - Replace QUALIFY ROW_NUMBER()=1 with GROUP BY + COUNT(*) to return all matching Search_Terms per patient instead of just the most recent - Add earliest_hcd_date parameter to restrict GP codes to HCD data window - Return code_frequency column (count of matching SNOMED codes per Search_Term) for use as tiebreaker in drug-aware indication matching - Update empty DataFrame returns to match new column format	2026-02-05 23:01:01 +00:00
Andrew Charlwood	b0a8a9de1c	feat: merge asthma Search_Term variants in CLUSTER_MAPPING_SQL and drug mapping (Task 1.2) Merge 'allergic asthma' and 'severe persistent allergic asthma' into canonical 'asthma' in both CLUSTER_MAPPING_SQL (Snowflake CTE) and load_drug_indication_mapping() (DimSearchTerm.csv loader). - CLUSTER_MAPPING_SQL: 3 Cluster_IDs (AST_COD, eFI2_Asthma, SEVAST_COD) now all map to Search_Term = 'asthma' - Added SEARCH_TERM_MERGE_MAP constant for reusable normalization - load_drug_indication_mapping() applies merge at CSV load time - urticaria (XSAL_COD) stays separate — not merged with asthma - Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, OMALIZUMAB, RESLIZUMAB	2026-02-05 22:56:29 +00:00
Andrew Charlwood	0779df78d1	feat: add drug-to-indication mapping from DimSearchTerm.csv (Task 1.2) Add load_drug_indication_mapping() and get_search_terms_for_drug() to diagnosis_lookup.py. Loads DimSearchTerm.csv to build bidirectional lookup between drug name fragments and Search_Terms. Uses substring matching for drug fragments (handles both exact names like ADALIMUMAB and partial fragments like PEGYLATED). Handles duplicate Search_Terms (e.g., diabetes appearing under two directorates) by combining fragments.	2026-02-05 22:48:09 +00:00
Andrew Charlwood	1c4d2c07ee	docs: mark project complete - all tasks done, viewport testing blocked by env (Iteration 9)	2026-02-05 20:51:48 +00:00
Andrew Charlwood	fed909481e	docs: update CLAUDE.md with indication chart architecture and CLI docs (Task 5.2)	2026-02-05 20:50:01 +00:00
Andrew Charlwood	4884e0a8cc	fix: recreate pathway_nodes with correct UNIQUE constraint and validate end-to-end (Task 5.1) The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE to overwrite directory chart root/trust nodes when indication nodes were inserted. Dropped and recreated the table, re-ran full refresh. Validation: both chart types have all hierarchy levels (0-5), all 12 date filters produce valid icicle charts, KPIs correct.	2026-02-05 20:43:01 +00:00
Andrew Charlwood	6331d44165	fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail prepare_data() mapped Provider Code → Name in-place. When called for directory charts first, then indication charts, the second call re-mapped already-mapped values to NaN, silently dropping all data. Added df.copy() to prevent mutation. Also fixes directory charts only generating data for the first date filter. Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication) across all 12 datasets (6 date filters × 2 chart types).	2026-02-05 20:10:12 +00:00
Andrew Charlwood	6f88a59978	feat: add chart type toggle for Directory/Indication views (Task 4.1, 4.2, 4.3) - Add selected_chart_type state variable and set_chart_type() handler - Add chart_type filter to load_pathway_data() WHERE clause - Create segmented control toggle component in filter strip - Add dynamic hierarchy label (Directorate vs Indication) - Update chart title to include chart type prefix	2026-02-05 19:39:45 +00:00
Andrew Charlwood	2deaa2f6da	docs: mark Task 3.1 complete - indication pipeline verified (Task 3.1) Pipeline test results: - 695 indication pathway nodes generated for all_6mo filter - 92.8% GP diagnosis match rate (34,006/36,628 patients) - 139 unique Search_Terms found - Top indications: drug misuse, influenza, diabetes, sepsis, cardiovascular disease - Full pipeline completes in ~10 minutes Phase 3 complete, Phase 4 (Reflex UI) ready to begin.	2026-02-05 18:44:34 +00:00
Andrew Charlwood	22222fe9ca	fix: resolve Snowflake column casing and UPID mapping issues (Task 3.1) Three issues identified and fixed during Task 3.1 testing: 1. Snowflake column name casing: - Unquoted columns in Snowflake are returned as UPPERCASE - Fixed by aliasing columns with quoted names: AS "Search_Term" - Now correctly populates 139 unique Search_Terms (was 0) 2. Duplicate UPID index error: - indication_df_for_chart could have duplicate UPIDs - Added drop_duplicates(subset=['UPID']) before set_index() - Keeps first occurrence (DIAGNOSIS over FALLBACK) 3. Missing UPIDs in indication lookup: - Old code: built indication_df from unique PseudoNHSNoLinked only - Problem: patients with multiple UPIDs (multi-provider) were missing - Fixed: now builds indication_df from ALL unique UPIDs in df - Also handles NaN values in Directory column safely Validation results from test run: - 36,628 patients queried - 34,006 (92.8%) had GP diagnosis matches - 139 unique Search_Terms found - Top 5: drug misuse (8602), influenza (6239), diabetes (2476) Still to verify: full pathway processing after these fixes.	2026-02-05 18:30:23 +00:00
Andrew Charlwood	ad10b374cb	feat: integrate Snowflake-direct indication lookup into CLI refresh (Task 1.2, 2.3) Replace batch_lookup_indication_groups() with get_patient_indication_groups() for indication chart processing. The new approach: - Extracts unique PseudoNHSNoLinked values from HCD data - Queries Snowflake directly using the cluster CTE - Builds indication_df mapping UPID → Search_Term (matched) or Directory (fallback) - Logs coverage statistics (diagnosis % vs fallback %) This completes the integration of the new Snowflake-direct GP lookup approach.	2026-02-05 17:06:34 +00:00
Andrew Charlwood	1a817b8257	feat: add get_patient_indication_groups() for Snowflake-direct GP lookup (Task 1.1) - Add CLUSTER_MAPPING_SQL constant embedding full snomed_indication_mapping_query.sql - Add get_patient_indication_groups() function that queries Snowflake directly - Uses QUALIFY ROW_NUMBER() to get most recent diagnosis per patient - Returns DataFrame with PatientPseudonym, Search_Term, EventDateTime - Handles edge cases: empty list, Snowflake unavailable - Batch processing with configurable batch_size (default 500) - Comprehensive logging for match statistics	2026-02-05 17:03:12 +00:00
Andrew Charlwood	5b1569ed5c	fix: correct patient identifier for GP diagnosis lookup (Task 3.3) Two critical fixes for the indication-based pathway feature: 1. clean_snomed_code() now handles scientific notation (e.g., "1.06e+16") - CSV export from pandas/Excel converts large SNOMED codes to scientific notation - Without this fix, codes like "10629311000119108" were stored as "1.06e+16" - Now properly converts to full integer strings 2. batch_lookup_indication_groups() now uses PseudoNHSNoLinked instead of PersonKey - PersonKey is LocalPatientID (provider-specific like "J188448") - PseudoNHSNoLinked is the pseudonymised NHS number that matches PatientPseudonym in GP records - Without this fix, 0% of patients matched GP records - Test shows ~20% match rate for ADALIMUMAB patients with correct identifier	2026-02-05 15:49:24 +00:00
Andrew Charlwood	8952156798	feat: integrate batch GP diagnosis lookup for indication charts (Task 3.2) - Add batch_lookup_indication_groups() to diagnosis_lookup.py - Efficient batch Snowflake queries (500 patients per batch) - Returns UPID → Indication_Group mapping - Source tracking: DIAGNOSIS vs FALLBACK - Update cli/refresh_pathways.py indication processing - Call batch_lookup_indication_groups() before chart generation - Build indication_df for process_indication_pathway_for_date_filter() - Log diagnosis coverage statistics - Enables full --chart-type all functionality	2026-02-05 14:45:06 +00:00
Andrew Charlwood	593d14c70f	feat: add chart_type argument to refresh command (Task 3.1) - Add --chart-type argument with choices: directory, indication, all - Update insert_pathway_records to include chart_type column - Update refresh_pathways to process multiple chart types - Update logging to show chart type counts - Indication chart processing deferred to Task 3.2 (GP diagnosis integration)	2026-02-05 14:38:57 +00:00
Andrew Charlwood	7cbc648c6d	feat: add indication pathway processing functions (Task 2.3) - Add generate_icicle_chart_indication() to pathway_analyzer.py - Variant that uses indication_df instead of directory_df - Groups by Trust → Search_Term → Drug → Pathway - Accepts indication_df mapping UPID → Indication_Group - Add process_indication_pathway_for_date_filter() to pathway_pipeline.py - Processes indication-based pathway for a single date filter - Uses generate_icicle_chart_indication() for hierarchy building - Add extract_indication_fields() to pathway_pipeline.py - Extracts trust_name, search_term, drug_sequence from ids column - Similar to extract_denormalized_fields() but for indication charts - Update convert_to_records() with chart_type parameter - Includes chart_type column in output records - Supports "directory" and "indication" values - Add ChartType type alias (Literal["directory", "indication"]) - Update __all__ exports with new functions	2026-02-05 14:32:28 +00:00
Andrew Charlwood	19607d72b0	feat: add chart_type column to pathway_nodes schema (Task 2.2) - Add chart_type column (TEXT NOT NULL DEFAULT 'directory') - Update UNIQUE constraint to (date_filter_id, chart_type, ids) - Add idx_pathway_nodes_chart_type index for filtering - Add migrate_pathway_nodes_chart_type() function for existing databases - Update initialize_database() to run migration automatically - Existing rows default to 'directory' chart type	2026-02-05 14:24:57 +00:00
Andrew Charlwood	506769470d	feat: add get_directorate_from_diagnosis() function (Task 2.1) - Added DirectorateAssignment dataclass for return type - Added get_directorate_from_diagnosis() function to diagnosis_lookup.py - Logic: Try diagnosis-based lookup first (direct SNOMED match) - Returns FALLBACK source if no match found, letting caller handle fallback - Extracts PatientPseudonym from UPID (last part after provider code) - Updated __all__ exports with new dataclass and function - Tested: function handles no-match cases correctly	2026-02-05 14:19:18 +00:00
Andrew Charlwood	b44d22de2c	feat: add direct SNOMED lookup functions (Task 1.3) Add two new functions to diagnosis_lookup.py for direct SNOMED code matching: - get_drug_snomed_codes(drug_name): Query ref_drug_snomed_mapping for all SNOMED codes mapped to a drug. Returns list of DrugSnomedMapping with snomed_code, snomed_description, search_term, primary_directorate. Tested: ADALIMUMAB returns 1320 mappings across 10 Search_Terms. - patient_has_indication_direct(patient_pseudonym, mappings, connector): Query PrimaryCareClinicalCoding for exact SNOMED code matches. Returns most recent match by EventDateTime with DirectSnomedMatchResult. Both functions follow existing patterns in the module and are exported in __all__. The lookup is case-insensitive for drug names.	2026-02-05 14:14:55 +00:00
Andrew Charlwood	6d68b5eaa5	feat: add SNOMED mapping loader script (Task 1.2) - Create data_processing/load_snomed_mapping.py with: - migrate_drug_snomed_mapping() for CSV to SQLite migration - get_drug_snomed_mapping_counts() for statistics - verify_drug_snomed_mapping_migration() for validation - clean_snomed_code() to remove trailing .0 from SNOMED codes - CLI interface: python -m data_processing.load_snomed_mapping - Loaded 144,056 mappings from enriched CSV: - 707 unique drugs - 187 unique search terms - 21,265 unique SNOMED codes	2026-02-05 14:10:36 +00:00
Andrew Charlwood	9943e85761	feat: add ref_drug_snomed_mapping schema (Task 1.1) - Add REF_DRUG_SNOMED_MAPPING_SCHEMA with 11 columns for direct SNOMED mapping - Add 5 indexes for lookup performance (drug, cleaned_drug, snomed, search_term, composite) - Add create_drug_snomed_mapping_table() helper function - Update helper functions (drop, get_counts, verify_exists) to include new table - Table is included in REFERENCE_TABLES_SCHEMA and created by migration	2026-02-05 14:06:31 +00:00
Andrew Charlwood	fa72fb3098	docs: mark all tasks complete in IMPLEMENTATION_PLAN.md	2026-02-05 02:17:17 +00:00
Andrew Charlwood	9b466b4e6c	feat: add hover/focus states and clean up unused styles (Task 5.6) - Add subtle hover states to KPI badges, dropdown triggers, tabs - Add consistent focus rings for accessibility (2px Pale Blue) - Update button styles with focus/active states - Clean up unused styles: compact_kpi_* (Option B), unused imports - All interactive elements now have appropriate hover/focus feedback	2026-02-05 02:16:01 +00:00
Andrew Charlwood	754e98dbe5	feat: refine top bar with style helpers (Task 5.5) - Use top_bar_style() for 48px height container - Use logo_style() for 28px height logo (was 36px) - Use top_bar_tab_style() for 28px height pills - Simplify data freshness to single line - Remove max_width constraint for full-width bar - Lighter shadow (SM instead of MD)	2026-02-05 02:08:01 +00:00
Andrew Charlwood	ef2a109528	feat: full-width responsive chart layout (Task 5.4) - Remove PAGE_MAX_WIDTH constraint from main_content() - Update chart_display() with calc(100vh - 152px) height - Update icicle_figure with autosize=True and reduced margins - Update chart_section() with flex layout for height fill - Update page_layout() with 100vh height	2026-02-05 02:03:55 +00:00
Andrew Charlwood	826dd1c022	feat: compact KPI badges integrated into filter strip (Task 5.3) - Add kpi_badge() and kpi_badges() functions for inline pill-style KPIs - Integrate KPI badges into filter_section() on the right side - Remove separate kpi_row() from main_content() layout - Zero extra vertical height - KPIs now share the filter strip row Design: Follows Option A from DESIGN_SYSTEM.md (preferred approach)	2026-02-05 01:59:00 +00:00
Andrew Charlwood	d2bed71078	feat: compact filter section as single horizontal strip (Task 5.2) - Redesign filter_section() as 48px horizontal strip - Remove "Filters" header (saves vertical space) - Compact initiated_filter_dropdown() and last_seen_filter_dropdown() - 32px height triggers via compact_dropdown_trigger_style() - Labels moved inside dropdown panels - Compact searchable_dropdown() component - 32px trigger height, no external label - Reduced panel item height (150px max, was 200px) - Smaller search input (size="1"), tighter spacing - All filters now in ONE row with divider separator Target: filter section height ≤ 60px (from ~200px)	2026-02-05 01:53:38 +00:00
Andrew Charlwood	0a68c2a5a5	feat: update design tokens for SaaS redesign (Task 5.1) - Typography: Reduce sizes (Display 32→28, H1 24→18, H2 20→16, Caption 12→11) - Spacing: Tighten scale by ~25% (SM 8→6, MD 12→8, LG 16→12, etc.) - Shadows: Lighter values for modern feel - Colors: Modernize semantic colors (#10B981 success, #EF4444 error) - Layout: TOP_BAR_HEIGHT 64→48px, new FILTER_STRIP_HEIGHT 48px New style helpers added: - compact_kpi_card_style/value/label - 50% smaller KPI cards - kpi_badge_style - inline pill variant for zero-height KPIs - filter_strip_style - horizontal single-row container - compact_dropdown_trigger_style - 32px height triggers - chart_container_style/wrapper - full-width flex-grow - top_bar_style/tab/logo - compact 48px top bar All tokens verified via import and Reflex compile.	2026-02-05 01:46:58 +00:00
Andrew Charlwood	76e0d64820	docs: complete Task 4.3 Documentation Update CLAUDE.md with new pathway data architecture: - Add Pathway Data Architecture section with date filter table - Update package structure with cli/ and pathway_pipeline.py - Add CLI module and pathway pipeline documentation - Update data flow diagrams (pre-computed vs legacy) - Add pathway tables to database schema section - Add CLI commands section with usage examples - Add Breaking Changes section documenting: - Date filter changes (pickers -> dropdowns) - Data refresh model (real-time -> pre-computed) - State variable changes - Icicle chart enhancements Mark all Task 4.3 subtasks complete in IMPLEMENTATION_PLAN.md Update completion criteria status	2026-02-05 00:56:34 +00:00
Andrew Charlwood	870d2e6e0e	feat: complete Task 4.2 Performance Testing - all targets met	2026-02-05 00:50:14 +00:00
Andrew Charlwood	cabaa72e9d	feat: complete Task 4.1 End-to-End Validation All 5 validation tests pass: - Hierarchy structure: 6 levels (Root→Trust→Directory→Drug→Pathway) - Patient counts: 11,118 patients, £130.5M from root node - Treatment statistics: average_spacing, cost_pp_pa populated - Drug filtering: drug_sequence column for LIKE patterns - Customdata: All 10 fields present and populated	2026-02-05 00:43:55 +00:00
Andrew Charlwood	a6f1d8b30e	feat: replace date pickers with select dropdowns (Task 3.3) - Created initiated_filter_dropdown() and last_seen_filter_dropdown() components - Uses rx.select.root pattern with static options for Reflex compatibility - Updated filter_section() to use new dropdown components - Removed old date_range_picker() function (replaced by new dropdowns) - Data freshness indicator already working in top_bar via load_pathway_data() - Verified: py_compile PASS, imports PASS, reflex compile PASS (11.1s)	2026-02-05 00:37:18 +00:00
Andrew Charlwood	ced994f93f	feat: update icicle_figure with full 10-field customdata (Task 3.2) Updated the icicle_figure computed property in AppState to use the full 10-field customdata structure matching visualization/plotly_generator.py: - value (patient count) - colour (proportion of parent) - cost (total cost) - costpp (cost per patient) - first_seen (first intervention date) - last_seen (last intervention date) - first_seen_parent (earliest date in parent) - last_seen_parent (latest date in parent) - average_spacing (dosing information) - cost_pp_pa (cost per patient per annum) Updated texttemplate and hovertemplate to display treatment statistics including duration, dosing, and full cost breakdown.	2026-02-05 00:30:22 +00:00

74 Commits
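
The UNIQUE-constraint bug fixed in 4884e0a8cc can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch, not the project's schema. Only date_filter_id, chart_type and ids come from the commit messages. The value column, the "all_6mo"/"root" sample data and the helper name count_rows_after_inserts are hypothetical. INSERT OR REPLACE deletes any existing row that conflicts on a UNIQUE constraint, so leaving chart_type out of the constraint lets an indication row silently replace the directory row that has the same ids.

```python
import sqlite3


def count_rows_after_inserts(unique_cols: str) -> int:
    """Insert a directory row and an indication row that share
    (date_filter_id, ids), then report how many rows survive."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical cut-down table: only the columns needed to show the conflict.
    conn.execute(
        f"""
        CREATE TABLE pathway_nodes (
            date_filter_id TEXT NOT NULL,
            chart_type     TEXT NOT NULL DEFAULT 'directory',
            ids            TEXT NOT NULL,
            value          INTEGER,
            UNIQUE({unique_cols})
        )
        """
    )
    rows = [
        ("all_6mo", "directory", "root", 100),   # directory chart root node
        ("all_6mo", "indication", "root", 90),   # indication chart root node, same ids
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO pathway_nodes "
        "(date_filter_id, chart_type, ids, value) VALUES (?, ?, ?, ?)",
        rows,
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM pathway_nodes").fetchone()
    conn.close()
    return count


old = count_rows_after_inserts("date_filter_id, ids")              # conflict: second insert replaces the first
new = count_rows_after_inserts("date_filter_id, chart_type, ids")  # no conflict: both rows kept
print(old, new)
```

With the narrower constraint only one row remains; adding chart_type to the constraint, as 19607d72b0 intended, keeps both chart types side by side.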
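
The batch-size reasoning in c6e426e36c can be checked directly. If each Snowflake query costs roughly 40 s regardless of batch size, total time is the number of batches multiplied by 40 s. The sketch below assumes that fixed per-query cost and uses the 36,628-patient count reported in 22222fe9ca. It ignores network variance and any per-row cost, and estimated_runtime_minutes is a hypothetical helper.

```python
import math


def estimated_runtime_minutes(n_patients: int, batch_size: int,
                              seconds_per_query: float = 40.0) -> float:
    """Estimate total lookup time when per-query cost is fixed (assumption)."""
    batches = math.ceil(n_patients / batch_size)  # last partial batch still costs a full query
    return batches * seconds_per_query / 60


n = 36_628  # patients queried, as reported in 22222fe9ca
for size in (500, 5000):
    print(size, math.ceil(n / size), round(estimated_runtime_minutes(n, size), 1))
```

This gives about 49 minutes for 74 batches of 500 and about 5.3 minutes for 8 batches of 5,000, in line with the ~50 min and ~6 min quoted in the commit.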
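
Commits 0779df78d1 through 920570b437 describe drug-aware indication matching: each drug is checked against both the patient's GP Search_Terms and the DimSearchTerm.csv drug mapping, code_frequency acts as a tiebreaker, and asthma variants are merged into one term. The following is an in-memory sketch of that logic under stated assumptions, not the actual assign_drug_indications() implementation. The helper names, the summing of frequencies for merged variants, the alphabetical secondary tiebreak and the sample data are all hypothetical.

```python
from __future__ import annotations

# Illustrative merge map in the spirit of SEARCH_TERM_MERGE_MAP (b0a8a9de1c).
SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def normalise_term(term: str) -> str:
    """Map a Search_Term to its canonical form."""
    return SEARCH_TERM_MERGE_MAP.get(term, term)


def search_terms_for_drug(drug: str, term_fragments: dict[str, list[str]]) -> set[str]:
    """Canonical Search_Terms whose drug fragments occur as substrings of the drug name."""
    drug_upper = drug.upper()
    return {
        normalise_term(term)
        for term, fragments in term_fragments.items()
        if any(fragment.upper() in drug_upper for fragment in fragments)
    }


def assign_indication(drug: str, gp_matches: dict[str, int],
                      term_fragments: dict[str, list[str]], fallback: str) -> str:
    """Pick an indication for one drug of one patient.

    gp_matches maps the patient's GP Search_Terms to code_frequency.
    """
    candidates = search_terms_for_drug(drug, term_fragments)

    # Assumption: frequencies of merged variants are summed.
    merged: dict[str, int] = {}
    for term, freq in gp_matches.items():
        canonical = normalise_term(term)
        merged[canonical] = merged.get(canonical, 0) + freq

    # Keep only GP diagnoses that the drug mapping links to this drug.
    hits = {term: freq for term, freq in merged.items() if term in candidates}
    if not hits:
        return fallback  # no drug-compatible diagnosis: use the fallback label

    # Highest code_frequency wins; ties broken alphabetically for determinism.
    return min(hits, key=lambda term: (-hits[term], term))


# Hypothetical sample data.
term_fragments = {
    "rheumatoid arthritis": ["ADALIMUMAB"],
    "psoriasis": ["ADALIMUMAB"],
    "allergic asthma": ["OMALIZUMAB"],
}
gp_matches = {"psoriasis": 2, "rheumatoid arthritis": 5, "allergic asthma": 1}
print(assign_indication("ADALIMUMAB", gp_matches, term_fragments, "RHEUMATOLOGY"))
print(assign_indication("OMALIZUMAB", gp_matches, term_fragments, "RESPIRATORY"))
print(assign_indication("INFLIXIMAB", gp_matches, term_fragments, "GASTROENTEROLOGY"))
```

The key design point is that a GP diagnosis only counts if the drug mapping also links it to the drug, so a patient with both psoriasis and rheumatoid arthritis can receive different indications for different drugs.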
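
5b1569ed5c and 6d68b5eaa5 describe clean_snomed_code() stripping a trailing .0 and expanding scientific notation. A possible sketch, assuming one raw value in and a string (or None) out, parses with decimal.Decimal rather than float so that 17-digit codes are not rounded during conversion. The signature and None handling are assumptions. The sketch can only restore digits still present in the string: "1.06e+16" becomes "10600000000000000", not the original code.

```python
from __future__ import annotations

from decimal import Decimal, InvalidOperation


def clean_snomed_code(raw: object) -> str | None:
    """Normalise a SNOMED code read from CSV into a plain integer string.

    Handles "12345.0" (float-formatted) and "1.0629311000119108e+16"
    (scientific notation). Digits already rounded away cannot be recovered.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if not text or text.lower() == "nan":
        return None
    if text.isdigit():
        return text  # already a clean code
    try:
        value = Decimal(text)  # exact decimal parse, no float rounding
    except InvalidOperation:
        return None  # not numeric at all
    if not value.is_finite() or value != value.to_integral_value():
        return None  # infinities and genuinely fractional values are not codes
    return str(int(value))


for raw in ("10629311000119108.0", "1.0629311000119108e+16", "1.06e+16", "abc"):
    print(raw, "->", clean_snomed_code(raw))
```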
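
ced994f93f and 40ce7fc5f9 rely on a fixed 10-field customdata order, with the text and hover templates addressing fields by position. The sketch below shows one way to turn the list-of-dicts held in dcc.Store into positional rows. The field order comes from ced994f93f; build_customdata, HOVER_TEMPLATE and the sample node are hypothetical, and only the %{customdata[i]} placeholder syntax is standard Plotly.

```python
from __future__ import annotations

# Field order as listed in ced994f93f.
CUSTOMDATA_FIELDS = (
    "value", "colour", "cost", "costpp", "first_seen", "last_seen",
    "first_seen_parent", "last_seen_parent", "average_spacing", "cost_pp_pa",
)


def build_customdata(nodes: list[dict]) -> list[list]:
    """Convert node dicts into positional customdata rows.

    Missing keys become None so every row keeps exactly 10 columns,
    which keeps template indices stable.
    """
    return [[node.get(field) for field in CUSTOMDATA_FIELDS] for node in nodes]


# A hovertemplate then refers to fields by index, e.g. cost is index 2.
HOVER_TEMPLATE = (
    "<b>%{label}</b><br>"
    "Patients: %{customdata[0]}<br>"
    "Total cost: £%{customdata[2]:,.0f}<br>"
    "Cost per patient: £%{customdata[3]:,.0f}"
    "<extra></extra>"
)

node = {"value": 120, "colour": 0.4, "cost": 1_500_000.0, "costpp": 12_500.0}  # hypothetical
print(build_customdata([node])[0])
```

Building rows from a single tuple of field names keeps the data and both templates aligned; reordering the tuple would silently shift every %{customdata[i]} reference.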
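
The failure fixed in 6331d44165 comes from an in-place mapping that is not idempotent. Once Provider Code has been replaced by a name, mapping again finds no key and yields a missing value. The original code uses pandas, where an unmatched .map() produces NaN. This standard-library analogue, using hypothetical names (PROVIDER_NAMES, PROV1, prepare_data_in_place, prepare_data_copy), shows the same mechanism and why working on a copy fixes it.

```python
from __future__ import annotations

# Hypothetical lookup; the real code maps Provider Code -> Name from reference data.
PROVIDER_NAMES = {"PROV1": "Example Trust"}


def prepare_data_in_place(rows: list[dict]) -> list[dict]:
    """Buggy pattern: overwrites the caller's records."""
    for row in rows:
        # After the first call the value is a name, not a code, so .get() returns None.
        row["Provider"] = PROVIDER_NAMES.get(row["Provider"])
    return rows


def prepare_data_copy(rows: list[dict]) -> list[dict]:
    """Fixed pattern: builds new records and leaves the input untouched."""
    return [{**row, "Provider": PROVIDER_NAMES.get(row["Provider"])} for row in rows]


source = [{"Provider": "PROV1"}]
prepare_data_in_place(source)          # first chart type: mapped correctly
print(prepare_data_in_place(source))   # second chart type: provider lost

source = [{"Provider": "PROV1"}]
prepare_data_copy(source)
print(prepare_data_copy(source))       # second call still sees the original codes
```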