The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.
Validation: both chart types have all hierarchy levels (0-5),
all 12 datasets (6 date filters × 2 chart types) produce valid icicle charts, KPIs correct.
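Illustrative sketch of the overwrite (not the project schema): a reduced pathway_nodes table with only the columns needed to show the conflict. The column set and the sample values are assumptions; the two UNIQUE definitions are the ones named above.

```python
import sqlite3


def count_rows_after_upserts(unique_cols: str) -> int:
    """Create a reduced pathway_nodes table and upsert one root node per chart type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"""
        CREATE TABLE pathway_nodes (
            date_filter_id TEXT NOT NULL,
            chart_type     TEXT NOT NULL DEFAULT 'directory',
            ids            TEXT NOT NULL,
            value          INTEGER,
            UNIQUE({unique_cols})
        )
        """
    )
    rows = [
        ("all_6mo", "directory", "root", 100),   # directory chart root
        ("all_6mo", "indication", "root", 80),   # indication chart root, same ids
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO pathway_nodes "
        "(date_filter_id, chart_type, ids, value) VALUES (?, ?, ?, ?)",
        rows,
    )
    count = conn.execute("SELECT COUNT(*) FROM pathway_nodes").fetchone()[0]
    conn.close()
    return count


# Old constraint: the second insert conflicts and replaces the first row.
old_count = count_rows_after_upserts("date_filter_id, ids")
# New constraint: chart_type is part of the key, so both roots survive.
new_count = count_rows_after_upserts("date_filter_id, chart_type, ids")
print(old_count, new_count)
```

With the old key only one row remains after both inserts; with chart_type in the key both remain.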
prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.
Also fixes directory charts only generating data for the first date filter.
Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
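Minimal reproduction of the mutation bug, assuming the mapping is applied with Series.map(). The provider codes, names, and function names here are made up for illustration; only the "Provider Code" column name comes from the description above.

```python
import pandas as pd

# Hypothetical code -> name mapping, for illustration only.
PROVIDER_NAMES = {"RXX": "Example Trust A", "RYY": "Example Trust B"}


def prepare_data_in_place(df: pd.DataFrame) -> pd.DataFrame:
    """Buggy variant: overwrites the caller's column with mapped names."""
    df["Provider Code"] = df["Provider Code"].map(PROVIDER_NAMES)
    return df.dropna(subset=["Provider Code"])


def prepare_data_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Fixed variant: works on a copy, so the caller's frame keeps the raw codes."""
    out = df.copy()
    out["Provider Code"] = out["Provider Code"].map(PROVIDER_NAMES)
    return out.dropna(subset=["Provider Code"])


raw = pd.DataFrame({"Provider Code": ["RXX", "RYY"], "UPID": ["RXX1", "RYY2"]})

shared = raw.copy()
first_buggy = prepare_data_in_place(shared)    # codes -> names, 2 rows
second_buggy = prepare_data_in_place(shared)   # names -> NaN, every row dropped

first_fixed = prepare_data_copy(raw)           # 2 rows
second_fixed = prepare_data_copy(raw)          # still 2 rows
print(len(first_buggy), len(second_buggy), len(first_fixed), len(second_fixed))
```

The second in-place call finds no keys in the mapping because the column already holds names, so every value becomes NaN.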
- Add selected_chart_type state variable and set_chart_type() handler
- Add chart_type filter to load_pathway_data() WHERE clause
- Create segmented control toggle component in filter strip
- Add dynamic hierarchy label (Directorate vs Indication)
- Update chart title to include chart type prefix
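Rough sketch of the chart_type filter and the dynamic label, using a reduced pathway_nodes table. The function names, column subset, and sample rows are assumptions; the real load_pathway_data() selects more columns and applies further filters.

```python
import sqlite3
from typing import List, Tuple


def load_pathway_rows(
    conn: sqlite3.Connection, date_filter_id: str, chart_type: str
) -> List[Tuple[str, int]]:
    """Return (ids, value) rows for one date filter and one chart type."""
    cursor = conn.execute(
        "SELECT ids, value FROM pathway_nodes "
        "WHERE date_filter_id = ? AND chart_type = ? ORDER BY ids",
        (date_filter_id, chart_type),
    )
    return cursor.fetchall()


def hierarchy_label(chart_type: str) -> str:
    """Label for the second hierarchy level, depending on the selected chart type."""
    return "Indication" if chart_type == "indication" else "Directorate"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pathway_nodes "
    "(date_filter_id TEXT, chart_type TEXT, ids TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO pathway_nodes VALUES (?, ?, ?, ?)",
    [
        ("all_6mo", "directory", "root", 100),
        ("all_6mo", "indication", "root", 80),
    ],
)
indication_rows = load_pathway_rows(conn, "all_6mo", "indication")
print(hierarchy_label("indication"), indication_rows)
```

Binding chart_type as a query parameter keeps the toggle value out of the SQL string.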
Three issues identified and fixed during Task 3.1 testing:
1. Snowflake column name casing:
   - Unquoted columns in Snowflake are returned as UPPERCASE
   - Fixed by aliasing columns with quoted names: AS "Search_Term"
   - Now correctly populates 139 unique Search_Terms (was 0)
2. Duplicate UPID index error:
   - indication_df_for_chart could have duplicate UPIDs
   - Added drop_duplicates(subset=['UPID']) before set_index()
   - Keeps first occurrence (DIAGNOSIS over FALLBACK)
3. Missing UPIDs in indication lookup:
   - Old code: built indication_df from unique PseudoNHSNoLinked only
   - Problem: patients with multiple UPIDs (multi-provider) were missing
   - Fixed: now builds indication_df from ALL unique UPIDs in df
   - Also handles NaN values in Directory column safely
Validation results from test run:
- 36,628 patients queried
- 34,006 (92.8%) had GP diagnosis matches
- 139 unique Search_Terms found
- Top terms include: drug misuse (8602), influenza (6239), diabetes (2476)
Still to verify: full pathway processing after these fixes.
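Small sketch of fix 2, assuming the rows are already ordered so that the DIAGNOSIS row precedes the FALLBACK row for the same UPID. The sample UPIDs, terms, and the Source column name are invented for illustration.

```python
import pandas as pd

# Invented sample: RXX1 appears twice, once per source.
indication_df = pd.DataFrame(
    {
        "UPID": ["RXX1", "RXX1", "RYY2"],
        "Search_Term": ["diabetes", "CARDIOLOGY", "influenza"],
        "Source": ["DIAGNOSIS", "FALLBACK", "DIAGNOSIS"],
    }
)

# drop_duplicates keeps the first occurrence by default, so the DIAGNOSIS row
# wins as long as it comes first; the resulting index is then unique.
lookup = indication_df.drop_duplicates(subset=["UPID"]).set_index("UPID")
print(lookup)
```

If the input order is not guaranteed, sorting by source priority before dropping duplicates would be needed; that step is not shown here.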
Replace batch_lookup_indication_groups() with get_patient_indication_groups()
for indication chart processing. The new approach:
- Extracts unique PseudoNHSNoLinked values from HCD data
- Queries Snowflake directly using the cluster CTE
- Builds indication_df mapping UPID → Search_Term (matched) or Directory (fallback)
- Logs coverage statistics (diagnosis % vs fallback %)
This completes the integration of the new Snowflake-direct GP lookup approach.
- Add CLUSTER_MAPPING_SQL constant embedding full snomed_indication_mapping_query.sql
- Add get_patient_indication_groups() function that queries Snowflake directly
- Uses QUALIFY ROW_NUMBER() to get most recent diagnosis per patient
- Returns DataFrame with PatientPseudonym, Search_Term, EventDateTime
- Handles edge cases: empty list, Snowflake unavailable
- Batch processing with configurable batch_size (default 500)
- Comprehensive logging for match statistics
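Sketch of the batching and the "most recent row per patient" query shape. The helper names and the table name matched_diagnoses are hypothetical stand-ins (the real query embeds the cluster CTE), and the %s placeholders assume the connector is used with its pyformat parameter style.

```python
from typing import Iterator, List, Sequence


def iter_batches(values: Sequence[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items; nothing for empty input."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(values), batch_size):
        yield list(values[start:start + batch_size])


def build_latest_diagnosis_query(batch_len: int) -> str:
    """Build one batch query; quoted aliases preserve mixed-case column names."""
    placeholders = ", ".join(["%s"] * batch_len)
    return (
        'SELECT PatientPseudonym AS "PatientPseudonym",\n'
        '       Search_Term AS "Search_Term",\n'
        '       EventDateTime AS "EventDateTime"\n'
        "FROM matched_diagnoses  -- hypothetical name for the cluster CTE output\n"
        f"WHERE PatientPseudonym IN ({placeholders})\n"
        "QUALIFY ROW_NUMBER() OVER (\n"
        "    PARTITION BY PatientPseudonym ORDER BY EventDateTime DESC\n"
        ") = 1"
    )


pseudonyms = [f"P{i:04d}" for i in range(1201)]
batch_sizes = [len(batch) for batch in iter_batches(pseudonyms)]
example_query = build_latest_diagnosis_query(3)
print(batch_sizes)
print(example_query)
```

An empty patient list produces no batches, so no query is issued in that case.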
Two critical fixes for the indication-based pathway feature:
1. clean_snomed_code() now handles scientific notation (e.g., "1.06e+16")
   - CSV export from pandas/Excel converts large SNOMED codes to scientific notation
   - Without this fix, codes like "10629311000119108" were stored as "1.06e+16"
   - Now properly converts to full integer strings
2. batch_lookup_indication_groups() now uses PseudoNHSNoLinked instead of PersonKey
   - PersonKey is LocalPatientID (provider-specific like "J188448")
   - PseudoNHSNoLinked is the pseudonymised NHS number that matches PatientPseudonym in GP records
   - Without this fix, 0% of patients matched GP records
   - Test shows ~20% match rate for ADALIMUMAB patients with correct identifier
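One possible way to normalise scientific notation, sketched with decimal.Decimal to avoid float rounding. This is not the project's clean_snomed_code(); the function name and the fallback behaviour for non-numeric input are assumptions.

```python
from decimal import Decimal, InvalidOperation


def clean_snomed_code_sketch(raw: object) -> str:
    """Return an integer-valued code as a plain digit string; leave other input as-is."""
    text = str(raw).strip()
    if not text:
        return ""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return text  # not numeric at all
    if not value.is_finite() or value != value.to_integral_value():
        return text  # NaN, infinity, or a genuine fraction
    return str(int(value))


print(clean_snomed_code_sketch("1.0629311000119108E+16"))
print(clean_snomed_code_sketch("1.06e+16"))
print(clean_snomed_code_sketch("10629311000119108"))
```

A value already truncated to "1.06e+16" can only be expanded to 10600000000000000; the dropped digits cannot be recovered from the string alone, so the source file has to keep the codes as text.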
- Add --chart-type argument with choices: directory, indication, all
- Update insert_pathway_records to include chart_type column
- Update refresh_pathways to process multiple chart types
- Update logging to show chart type counts
- Indication chart processing deferred to Task 3.2 (GP diagnosis integration)
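Sketch of the argument definition with argparse. The choices come from the list above; the default of "all", the help text, and the helper names are assumptions.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the flags needed to illustrate --chart-type."""
    parser = argparse.ArgumentParser(description="Refresh pre-computed pathway data")
    parser.add_argument(
        "--chart-type",
        choices=["directory", "indication", "all"],
        default="all",  # assumed default, not confirmed by the commit
        help="Which chart type(s) to process",
    )
    parser.add_argument("--dry-run", action="store_true")
    return parser


def resolve_chart_types(choice: str) -> List[str]:
    """Expand 'all' into the concrete chart types to process."""
    return ["directory", "indication"] if choice == "all" else [choice]


args = build_parser().parse_args(["--chart-type", "indication", "--dry-run"])
print(resolve_chart_types(args.chart_type), args.dry_run)
```

argparse rejects any value outside choices before the pipeline starts, so the refresh code only ever sees the three listed values.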
- Add generate_icicle_chart_indication() to pathway_analyzer.py
  - Variant that uses indication_df instead of directory_df
  - Groups by Trust → Search_Term → Drug → Pathway
  - Accepts indication_df mapping UPID → Indication_Group
- Add process_indication_pathway_for_date_filter() to pathway_pipeline.py
  - Processes indication-based pathway for a single date filter
  - Uses generate_icicle_chart_indication() for hierarchy building
- Add extract_indication_fields() to pathway_pipeline.py
  - Extracts trust_name, search_term, drug_sequence from ids column
  - Similar to extract_denormalized_fields() but for indication charts
- Update convert_to_records() with chart_type parameter
  - Includes chart_type column in output records
  - Supports "directory" and "indication" values
- Add ChartType type alias (Literal["directory", "indication"])
- Update __all__ exports with new functions
- Add chart_type column (TEXT NOT NULL DEFAULT 'directory')
- Update UNIQUE constraint to (date_filter_id, chart_type, ids)
- Add idx_pathway_nodes_chart_type index for filtering
- Add migrate_pathway_nodes_chart_type() function for existing databases
- Update initialize_database() to run migration automatically
- Existing rows default to 'directory' chart type
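Sketch of an idempotent column migration on a reduced table. It is not the project's migrate_pathway_nodes_chart_type(); the function name and the reduced schema are assumptions. SQLite cannot alter an existing UNIQUE constraint in place, so the constraint change itself requires rebuilding the table and is not covered here.

```python
import sqlite3


def migrate_chart_type_sketch(conn: sqlite3.Connection) -> bool:
    """Add chart_type and its index if missing. Returns True if the column was added."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(pathway_nodes)")}
    added = "chart_type" not in columns
    if added:
        conn.execute(
            "ALTER TABLE pathway_nodes "
            "ADD COLUMN chart_type TEXT NOT NULL DEFAULT 'directory'"
        )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_pathway_nodes_chart_type "
        "ON pathway_nodes(chart_type)"
    )
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pathway_nodes (date_filter_id TEXT, ids TEXT, value INTEGER)")
conn.execute("INSERT INTO pathway_nodes VALUES ('all_6mo', 'root', 100)")

first_run = migrate_chart_type_sketch(conn)    # adds the column
second_run = migrate_chart_type_sketch(conn)   # no-op on an already migrated table
existing = conn.execute("SELECT chart_type FROM pathway_nodes").fetchall()
print(first_run, second_run, existing)
```

Checking PRAGMA table_info first makes the migration safe to run on every initialisation.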
- Added DirectorateAssignment dataclass for return type
- Added get_directorate_from_diagnosis() function to diagnosis_lookup.py
- Logic: Try diagnosis-based lookup first (direct SNOMED match)
- Returns FALLBACK source if no match found, letting caller handle fallback
- Extracts PatientPseudonym from UPID (last part after provider code)
- Updated __all__ exports with new dataclass and function
- Tested: function handles no-match cases correctly
Add two new functions to diagnosis_lookup.py for direct SNOMED code matching:
- get_drug_snomed_codes(drug_name): Query ref_drug_snomed_mapping for all
  SNOMED codes mapped to a drug. Returns list of DrugSnomedMapping with
  snomed_code, snomed_description, search_term, primary_directorate.
  Tested: ADALIMUMAB returns 1320 mappings across 10 Search_Terms.
- patient_has_indication_direct(patient_pseudonym, mappings, connector):
  Query PrimaryCareClinicalCoding for exact SNOMED code matches.
  Returns most recent match by EventDateTime with DirectSnomedMatchResult.
Both functions follow existing patterns in the module and are exported
in __all__. The lookup is case-insensitive for drug names.
- Add REF_DRUG_SNOMED_MAPPING_SCHEMA with 11 columns for direct SNOMED mapping
- Add 5 indexes for lookup performance (drug, cleaned_drug, snomed, search_term, composite)
- Add create_drug_snomed_mapping_table() helper function
- Update helper functions (drop, get_counts, verify_exists) to include new table
- Table is included in REFERENCE_TABLES_SCHEMA and created by migration
- Add subtle hover states to KPI badges, dropdown triggers, tabs
- Add consistent focus rings for accessibility (2px Pale Blue)
- Update button styles with focus/active states
- Clean up unused styles: compact_kpi_* (Option B), unused imports
- All interactive elements now have appropriate hover/focus feedback
- Use top_bar_style() for 48px height container
- Use logo_style() for 28px height logo (was 36px)
- Use top_bar_tab_style() for 28px height pills
- Simplify data freshness to single line
- Remove max_width constraint for full-width bar
- Lighter shadow (SM instead of MD)
- Remove PAGE_MAX_WIDTH constraint from main_content()
- Update chart_display() with calc(100vh - 152px) height
- Update icicle_figure with autosize=True and reduced margins
- Update chart_section() with flex layout for height fill
- Update page_layout() with 100vh height
- Add kpi_badge() and kpi_badges() functions for inline pill-style KPIs
- Integrate KPI badges into filter_section() on the right side
- Remove separate kpi_row() from main_content() layout
- Zero extra vertical height - KPIs now share the filter strip row
Design: Follows Option A from DESIGN_SYSTEM.md (preferred approach)
Update CLAUDE.md with new pathway data architecture:
- Add Pathway Data Architecture section with date filter table
- Update package structure with cli/ and pathway_pipeline.py
- Add CLI module and pathway pipeline documentation
- Update data flow diagrams (pre-computed vs legacy)
- Add pathway tables to database schema section
- Add CLI commands section with usage examples
- Add Breaking Changes section documenting:
  - Date filter changes (pickers -> dropdowns)
  - Data refresh model (real-time -> pre-computed)
  - State variable changes
  - Icicle chart enhancements
Mark all Task 4.3 subtasks complete in IMPLEMENTATION_PLAN.md
Update completion criteria status
- Created initiated_filter_dropdown() and last_seen_filter_dropdown() components
- Uses rx.select.root pattern with static options for Reflex compatibility
- Updated filter_section() to use new dropdown components
- Removed old date_range_picker() function (replaced by new dropdowns)
- Data freshness indicator already working in top_bar via load_pathway_data()
- Verified: py_compile PASS, imports PASS, reflex compile PASS (11.1s)
Updated the icicle_figure computed property in AppState to use the full
10-field customdata structure matching visualization/plotly_generator.py:
- value (patient count)
- colour (proportion of parent)
- cost (total cost)
- costpp (cost per patient)
- first_seen (first intervention date)
- last_seen (last intervention date)
- first_seen_parent (earliest date in parent)
- last_seen_parent (latest date in parent)
- average_spacing (dosing information)
- cost_pp_pa (cost per patient per annum)
Updated texttemplate and hovertemplate to display treatment statistics
including duration, dosing, and full cost breakdown.
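Sketch of how the 10-field order can be kept in one place so that templates refer to fields by name rather than by hard-coded index. The field order is the one listed above; the helper names, the sample node, and the template wording are assumptions.

```python
from typing import Any, Dict, List

# Order matches the 10-field customdata structure listed above.
CUSTOMDATA_FIELDS = (
    "value", "colour", "cost", "costpp", "first_seen", "last_seen",
    "first_seen_parent", "last_seen_parent", "average_spacing", "cost_pp_pa",
)


def build_customdata_row(node: Dict[str, Any]) -> List[Any]:
    """Return one node's customdata list in the fixed field order (None if missing)."""
    return [node.get(field) for field in CUSTOMDATA_FIELDS]


def customdata_ref(field: str) -> str:
    """Return the Plotly template placeholder for a named customdata field."""
    return f"%{{customdata[{CUSTOMDATA_FIELDS.index(field)}]}}"


# Hypothetical hover template built from named fields.
hovertemplate = (
    "Patients: " + customdata_ref("value") + "<br>"
    "Total cost: " + customdata_ref("cost") + "<br>"
    "Cost per patient per annum: " + customdata_ref("cost_pp_pa")
)

row = build_customdata_row({"value": 120, "cost": 54000.0, "cost_pp_pa": 900.0})
print(row)
print(hovertemplate)
```

Deriving the index from the tuple means a reordering only has to be made in CUSTOMDATA_FIELDS.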
- Add dropdown state for date filters (selected_initiated, selected_last_seen)
- Add date_filter_id computed property combining the two selections
- Add load_pathway_data() method to query pathway_nodes table
- Add recalculate_parent_totals() for filtered hierarchies
- Update all filter handlers to call load_pathway_data()
- Update KPI calculations from root node data
Phase 3 Reflex integration: Task 3.1 complete
Tested full refresh pipeline end-to-end with real Snowflake data:
- Fixed trust filter to read Name column from defaultTrusts.csv
- Fixed Decimal type handling in calculate_cost_per_patient_per_annum
- Fixed array handling in convert_to_records for average_administered
- Added required reference CSV files to data/ directory
- Configured Snowflake connection (account, warehouse, user)
Results:
- Snowflake fetch: 656,695 records in ~7s
- Transformations: 519,848 records after UPID/drug/directory
- Pathway nodes: 293 for all_6mo (8 trusts, 14 directories)
- Total processing time: ~6.2 minutes
Add cli/refresh_pathways.py with:
- refresh_pathways() main function for full pipeline orchestration
- insert_pathway_records() for SQLite insertion
- log_refresh_start/complete/failed() for refresh tracking
- CLI with --minimum-patients, --provider-codes, --dry-run, --verbose
Uses existing pipeline functions:
- fetch_and_transform_data() from pathway_pipeline.py
- process_all_date_filters() for 6 date filter combinations
- Schema helpers from data_processing/schema.py
Task 1.3 (Create Migration Script) is satisfied by existing code:
- python -m data_processing.migrate creates all pathway tables
- pathway_date_filters auto-populated via INSERT OR REPLACE in schema
- Verified: fresh database creates all 3 tables with 6 date filters
Create data_processing/pathway_pipeline.py with:
- DateFilterConfig dataclass for date filter configuration
- DATE_FILTER_CONFIGS with 6 pre-defined combinations
- compute_date_ranges() for computing actual dates from config
- fetch_and_transform_data() for Snowflake fetch + transformations
- process_pathway_for_date_filter() using existing generate_icicle_chart()
- extract_denormalized_fields() to parse trust/directory/drugs from ids
- convert_to_records() for SQLite insertion
- process_all_date_filters() convenience function
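Loose sketch of a date filter config and range computation. The field names, the reading of "all_6mo" as "initiated: any time, last seen: within 6 months", and the month arithmetic are all assumptions; the real DateFilterConfig and compute_date_ranges() may differ.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class DateFilterConfigSketch:
    filter_id: str
    initiated_months: Optional[int]  # None means no limit on first intervention date
    last_seen_months: int            # look-back window for last intervention date


def months_before(day: date, months: int) -> date:
    """Step back whole calendar months, clamping to the last day of the target month."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def compute_date_ranges_sketch(
    config: DateFilterConfigSketch, today: date
) -> Tuple[Optional[date], date, date]:
    """Return (initiated_from, last_seen_from, last_seen_to) for one config."""
    initiated_from = (
        None
        if config.initiated_months is None
        else months_before(today, config.initiated_months)
    )
    return initiated_from, months_before(today, config.last_seen_months), today


ALL_6MO = DateFilterConfigSketch("all_6mo", None, 6)
ranges = compute_date_ranges_sketch(ALL_6MO, date(2024, 8, 31))
print(ranges)
```

Passing today explicitly keeps the computation deterministic, which helps when the refresh is re-run or tested.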
Add three new tables to support pre-computed pathway data:
- pathway_date_filters: 6 pre-defined date filter combinations
- pathway_nodes: pre-computed pathway hierarchy with all visualization data
- pathway_refresh_log: tracks data refresh status
Includes:
- 8 indexes for efficient filtering by date_filter_id, trust, directory, drug
- Helper functions: create/drop/verify/get_counts for pathway tables
- clear_pathway_nodes() for selective or full data clearing
- get_pathway_refresh_status() for checking last refresh
- Integration with existing ALL_TABLES_SCHEMA and combined helpers
- Verified all design tokens match spec exactly
- Confirmed responsive behavior via flex_wrap patterns
- Audited hover states and transitions
- Validated chart colorscale uses design system palette
- Update pathways_app/__init__.py to re-export app from app_v2
- Verified reflex run compiles 33/33 components successfully
- App runs on localhost:3003 (frontend) and :8002 (backend)
- Mark completion criteria "App compiles" as verified
- Replace chart_ready_placeholder() with chart_display() function
- chart_display() wraps rx.plotly() with AppState.icicle_figure
- Chart updates reactively when filters change via computed property
- Loading, error, and empty states already handled in chart_section()
- Add prepare_chart_data() method for hierarchical chart data
- Build Trust → Directory → Drug hierarchy from filtered SQLite data
- Calculate patient counts and costs at each hierarchy level
- Compute color values (proportions) for visualization
- Generate dynamic chart title based on filter state
- Call prepare_chart_data() from apply_filters() for reactivity
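Sketch of the Trust → Directory → Drug aggregation with proportions of parent. The row layout, the " - " separator in ids, and the sample figures are assumptions. Summing patient counts upward is a simplification: a patient on two drugs would be counted twice at the directory level.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str, int, float]  # trust, directory, drug, patients, cost


def build_hierarchy(rows: Iterable[Row]) -> List[dict]:
    """Aggregate leaf rows into root/trust/directory/drug nodes."""
    totals: Dict[tuple, dict] = defaultdict(lambda: {"value": 0, "cost": 0.0})
    for trust, directory, drug, patients, cost in rows:
        path = ("root", trust, directory, drug)
        # Add this leaf's totals to every ancestor, including the root.
        for depth in range(1, len(path) + 1):
            totals[path[:depth]]["value"] += patients
            totals[path[:depth]]["cost"] += cost

    nodes = []
    for path in sorted(totals):
        agg = totals[path]
        parent = path[:-1]
        parent_value = totals[parent]["value"] if parent else agg["value"]
        nodes.append(
            {
                "ids": " - ".join(path),
                "parents": " - ".join(parent),
                "labels": path[-1],
                "value": agg["value"],
                "cost": agg["cost"],
                # Proportion of the parent's patients; the root is its own parent.
                "colour": agg["value"] / parent_value if parent_value else 0.0,
            }
        )
    return nodes


example_nodes = build_hierarchy(
    [
        ("T1", "D1", "A", 10, 100.0),
        ("T1", "D1", "B", 30, 300.0),
        ("T1", "D2", "A", 60, 600.0),
    ]
)
for node in example_nodes:
    print(node["ids"], node["value"], node["colour"])
```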
- Mark Task 3.4 complete (KPIs implemented in apply_filters)
- Add apply_filters() method that queries SQLite with current filter state
- Handle initiated date filter (first intervention date range)
- Handle last seen date filter (last intervention date range)
- Handle drug and directorate multi-select filters
- Use CTE pattern for efficient patient-level date filtering
- Update KPI values (unique_patients, total_drugs, total_cost) on filter change
- Call apply_filters() from all filter event handlers
- Call apply_filters() after initial data load
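Sketch of the CTE pattern for patient-level date filtering: per-patient first and last dates are computed once in the CTE, then joined back to the detail rows. The interventions table, its columns, and the sample data are hypothetical; dates are ISO strings so that BETWEEN compares correctly as text.

```python
import sqlite3

KPI_QUERY = """
WITH patient_dates AS (
    SELECT patient_id,
           MIN(intervention_date) AS first_seen,
           MAX(intervention_date) AS last_seen
    FROM interventions
    GROUP BY patient_id
)
SELECT COUNT(DISTINCT i.patient_id) AS unique_patients,
       COUNT(DISTINCT i.drug)       AS total_drugs,
       COALESCE(SUM(i.cost), 0)     AS total_cost
FROM interventions AS i
JOIN patient_dates AS p ON p.patient_id = i.patient_id
WHERE p.last_seen BETWEEN ? AND ?
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interventions "
    "(patient_id TEXT, drug TEXT, intervention_date TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO interventions VALUES (?, ?, ?, ?)",
    [
        ("P1", "A", "2023-01-10", 100.0),
        ("P1", "A", "2024-05-01", 100.0),
        ("P2", "B", "2022-03-01", 50.0),  # last seen outside the window
    ],
)
kpis = conn.execute(KPI_QUERY, ("2024-01-01", "2024-06-30")).fetchone()
print(kpis)
```

Filtering on the patient's last_seen keeps all of that patient's rows, including interventions that fall before the window.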
- Add load_data() method to AppState that connects to SQLite database
- Populate available_drugs, available_directorates, available_indications from DB
- Detect latest date in dataset and set filter defaults accordingly
- Load KPI values: total_records, unique_patients, total_drugs, total_cost
- Add on_load handler to trigger data loading on page initialization
- Handle database errors gracefully with meaningful error messages
- Add datetime imports for date handling
- Add data state variables: last_updated, raw_data, latest_date_in_data
- Set last_seen date defaults (6 months ago to today)
- Add last_updated_display computed var for top bar
- Update top bar to show dynamic refresh timestamp
- Add chart_loading_skeleton() with animated bar chart and spinner
- Add chart_error_state() for displaying errors with guidance
- Add chart_empty_state() for when filters yield no results
- Add chart_ready_placeholder() for Phase 4 Plotly integration
- Rewrite chart_section() with 4-state rx.cond() logic
- Fix icon names (triangle-alert) and color references (SLATE_500)
This completes Phase 2 Layout Components.
- Create kpi_card() component with icon, value, label, and highlight option
- Create kpi_row() with 4 KPI cards: Unique Patients, Drug Types, Total Cost, Indication Match
- Add computed vars for formatted KPI display values
- Add placeholder KPI state variables (unique_patients, total_drugs, total_cost, indication_match_rate)
- Use design system tokens for styling with hover effects
- Responsive flex-wrap layout for smaller screens
- Add date_range_picker() component with enable/disable checkbox
- Add searchable_dropdown() component with search, select all, clear
- Implement filter_section() with layout for dates and multi-selects
- Add comprehensive state management in AppState:
  - Filter toggle states (initiated_filter_enabled, last_seen_filter_enabled)
  - Date values for both ranges
  - Dropdown visibility state
  - Selection state for drugs, indications, directorates
  - Search text state for filtering options
  - Event handlers for all filter interactions
  - Computed vars for filtered options and selection counts
- Style components using design tokens from styles.py
- Debounced handlers deferred to Phase 3.3 (Filter Logic)