Commit Graph

245 Commits

Author SHA1 Message Date
Andrew Charlwood b0a8a9de1c feat: merge asthma Search_Term variants in CLUSTER_MAPPING_SQL and drug mapping (Task 1.2)
Merge 'allergic asthma' and 'severe persistent allergic asthma' into
canonical 'asthma' in both CLUSTER_MAPPING_SQL (Snowflake CTE) and
load_drug_indication_mapping() (DimSearchTerm.csv loader).

- CLUSTER_MAPPING_SQL: 3 Cluster_IDs (AST_COD, eFI2_Asthma, SEVAST_COD) now
  all map to Search_Term = 'asthma'
- Added SEARCH_TERM_MERGE_MAP constant for reusable normalization
- load_drug_indication_mapping() applies merge at CSV load time
- urticaria (XSAL_COD) stays separate — not merged with asthma
- Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB,
  OMALIZUMAB, RESLIZUMAB
2026-02-05 22:56:29 +00:00
Andrew Charlwood c85aae4f6a docs: update progress.txt with Iteration 1 results (Task 1.2) 2026-02-05 22:48:46 +00:00
Andrew Charlwood 0779df78d1 feat: add drug-to-indication mapping from DimSearchTerm.csv (Task 1.2)
Add load_drug_indication_mapping() and get_search_terms_for_drug() to
diagnosis_lookup.py. Loads DimSearchTerm.csv to build bidirectional
lookup between drug name fragments and Search_Terms. Uses substring
matching for drug fragments (handles both exact names like ADALIMUMAB
and partial fragments like PEGYLATED). Handles duplicate Search_Terms
(e.g., diabetes appearing under two directorates) by combining fragments.
2026-02-05 22:48:09 +00:00
Andrew Charlwood 1c4d2c07ee docs: mark project complete - all tasks done, viewport testing blocked by env (Iteration 9) 2026-02-05 20:51:48 +00:00
Andrew Charlwood fed909481e docs: update CLAUDE.md with indication chart architecture and CLI docs (Task 5.2) 2026-02-05 20:50:01 +00:00
Andrew Charlwood 4884e0a8cc fix: recreate pathway_nodes with correct UNIQUE constraint and validate end-to-end (Task 5.1)
The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.

Validation: both chart types have all hierarchy levels (0-5),
all 12 date filters produce valid icicle charts, KPIs correct.
2026-02-05 20:43:01 +00:00
Andrew Charlwood 6331d44165 fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail
prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.

Also fixes directory charts only generating data for the first date filter.

Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
2026-02-05 20:10:12 +00:00
Andrew Charlwood 6f88a59978 feat: add chart type toggle for Directory/Indication views (Task 4.1, 4.2, 4.3)
- Add selected_chart_type state variable and set_chart_type() handler
- Add chart_type filter to load_pathway_data() WHERE clause
- Create segmented control toggle component in filter strip
- Add dynamic hierarchy label (Directorate vs Indication)
- Update chart title to include chart type prefix
2026-02-05 19:39:45 +00:00
Andrew Charlwood 2deaa2f6da docs: mark Task 3.1 complete - indication pipeline verified (Task 3.1)
Pipeline test results:
- 695 indication pathway nodes generated for all_6mo filter
- 92.8% GP diagnosis match rate (34,006/36,628 patients)
- 139 unique Search_Terms found
- Top indications: drug misuse, influenza, diabetes, sepsis, cardiovascular disease
- Full pipeline completes in ~10 minutes

Phase 3 complete, Phase 4 (Reflex UI) ready to begin.
2026-02-05 18:44:34 +00:00
Andrew Charlwood 0b5b462766 docs: update progress.txt with iteration 3, add new guardrails (Task 3.1) 2026-02-05 18:31:29 +00:00
Andrew Charlwood 22222fe9ca fix: resolve Snowflake column casing and UPID mapping issues (Task 3.1)
Three issues identified and fixed during Task 3.1 testing:

1. Snowflake column name casing:
   - Unquoted columns in Snowflake are returned as UPPERCASE
   - Fixed by aliasing columns with quoted names: AS "Search_Term"
   - Now correctly populates 139 unique Search_Terms (was 0)

2. Duplicate UPID index error:
   - indication_df_for_chart could have duplicate UPIDs
   - Added drop_duplicates(subset=['UPID']) before set_index()
   - Keeps first occurrence (DIAGNOSIS over FALLBACK)

3. Missing UPIDs in indication lookup:
   - Old code: built indication_df from unique PseudoNHSNoLinked only
   - Problem: patients with multiple UPIDs (multi-provider) were missing
   - Fixed: now builds indication_df from ALL unique UPIDs in df
   - Also handles NaN values in Directory column safely

Validation results from test run:
- 36,628 patients queried
- 34,006 (92.8%) had GP diagnosis matches
- 139 unique Search_Terms found
- Top 5: drug misuse (8602), influenza (6239), diabetes (2476)

Still to verify: full pathway processing after these fixes.
2026-02-05 18:30:23 +00:00
Andrew Charlwood f7166b38c8 docs: update progress.txt with iteration 2 completion (Task 1.2, 2.3) 2026-02-05 17:07:06 +00:00
Andrew Charlwood ad10b374cb feat: integrate Snowflake-direct indication lookup into CLI refresh (Task 1.2, 2.3)
Replace batch_lookup_indication_groups() with get_patient_indication_groups()
for indication chart processing. The new approach:

- Extracts unique PseudoNHSNoLinked values from HCD data
- Queries Snowflake directly using the cluster CTE
- Builds indication_df mapping UPID → Search_Term (matched) or Directory (fallback)
- Logs coverage statistics (diagnosis % vs fallback %)

This completes the integration of the new Snowflake-direct GP lookup approach.
2026-02-05 17:06:34 +00:00
Andrew Charlwood 1a817b8257 feat: add get_patient_indication_groups() for Snowflake-direct GP lookup (Task 1.1)
- Add CLUSTER_MAPPING_SQL constant embedding full snomed_indication_mapping_query.sql
- Add get_patient_indication_groups() function that queries Snowflake directly
- Uses QUALIFY ROW_NUMBER() to get most recent diagnosis per patient
- Returns DataFrame with PatientPseudonym, Search_Term, EventDateTime
- Handles edge cases: empty list, Snowflake unavailable
- Batch processing with configurable batch_size (default 500)
- Comprehensive logging for match statistics
2026-02-05 17:03:12 +00:00
Andrew Charlwood 99bab08402 docs: add guardrails for patient identifier and SNOMED code handling 2026-02-05 15:51:52 +00:00
Andrew Charlwood 843b4f23cc docs: update progress.txt with iteration 9 (Task 3.3 in progress)
Fixed two critical bugs preventing GP diagnosis matching:
1. SNOMED codes in scientific notation now converted to integers
2. Using PseudoNHSNoLinked (not PersonKey) for GP record lookup

Full refresh is running in background - next iteration should verify completion.
2026-02-05 15:51:17 +00:00
Andrew Charlwood 5b1569ed5c fix: correct patient identifier for GP diagnosis lookup (Task 3.3)
Two critical fixes for the indication-based pathway feature:

1. clean_snomed_code() now handles scientific notation (e.g., "1.06e+16")
   - CSV export from pandas/Excel converts large SNOMED codes to scientific notation
   - Without this fix, codes like "10629311000119108" were stored as "1.06e+16"
   - Now properly converts to full integer strings

2. batch_lookup_indication_groups() now uses PseudoNHSNoLinked instead of PersonKey
   - PersonKey is LocalPatientID (provider-specific like "J188448")
   - PseudoNHSNoLinked is the pseudonymised NHS number that matches PatientPseudonym in GP records
   - Without this fix, 0% of patients matched GP records
   - Test shows ~20% match rate for ADALIMUMAB patients with correct identifier
2026-02-05 15:49:24 +00:00
Andrew Charlwood b9f4041670 docs: update progress.txt with iteration 8 completion (Task 3.2) 2026-02-05 14:45:57 +00:00
Andrew Charlwood 8952156798 feat: integrate batch GP diagnosis lookup for indication charts (Task 3.2)
- Add batch_lookup_indication_groups() to diagnosis_lookup.py
  - Efficient batch Snowflake queries (500 patients per batch)
  - Returns UPID → Indication_Group mapping
  - Source tracking: DIAGNOSIS vs FALLBACK
- Update cli/refresh_pathways.py indication processing
  - Call batch_lookup_indication_groups() before chart generation
  - Build indication_df for process_indication_pathway_for_date_filter()
  - Log diagnosis coverage statistics
- Enables full --chart-type all functionality
2026-02-05 14:45:06 +00:00
Andrew Charlwood 50b8548688 docs: update progress.txt with iteration 7 completion (Task 3.1) 2026-02-05 14:39:35 +00:00
Andrew Charlwood 593d14c70f feat: add chart_type argument to refresh command (Task 3.1)
- Add --chart-type argument with choices: directory, indication, all
- Update insert_pathway_records to include chart_type column
- Update refresh_pathways to process multiple chart types
- Update logging to show chart type counts
- Indication chart processing deferred to Task 3.2 (GP diagnosis integration)
2026-02-05 14:38:57 +00:00
Andrew Charlwood 0d15000aa0 docs: update progress.txt with iteration 6 completion (Task 2.3) 2026-02-05 14:33:16 +00:00
Andrew Charlwood 7cbc648c6d feat: add indication pathway processing functions (Task 2.3)
- Add generate_icicle_chart_indication() to pathway_analyzer.py
  - Variant that uses indication_df instead of directory_df
  - Groups by Trust → Search_Term → Drug → Pathway
  - Accepts indication_df mapping UPID → Indication_Group

- Add process_indication_pathway_for_date_filter() to pathway_pipeline.py
  - Processes indication-based pathway for a single date filter
  - Uses generate_icicle_chart_indication() for hierarchy building

- Add extract_indication_fields() to pathway_pipeline.py
  - Extracts trust_name, search_term, drug_sequence from ids column
  - Similar to extract_denormalized_fields() but for indication charts

- Update convert_to_records() with chart_type parameter
  - Includes chart_type column in output records
  - Supports "directory" and "indication" values

- Add ChartType type alias (Literal["directory", "indication"])

- Update __all__ exports with new functions
2026-02-05 14:32:28 +00:00
Andrew Charlwood aabe4bf45d docs: update progress.txt with iteration 5 completion (Task 2.2) 2026-02-05 14:25:44 +00:00
Andrew Charlwood 19607d72b0 feat: add chart_type column to pathway_nodes schema (Task 2.2)
- Add chart_type column (TEXT NOT NULL DEFAULT 'directory')
- Update UNIQUE constraint to (date_filter_id, chart_type, ids)
- Add idx_pathway_nodes_chart_type index for filtering
- Add migrate_pathway_nodes_chart_type() function for existing databases
- Update initialize_database() to run migration automatically
- Existing rows default to 'directory' chart type
2026-02-05 14:24:57 +00:00
Andrew Charlwood 3db93a685b docs: update progress.txt with iteration 4 completion (Task 2.1) 2026-02-05 14:20:04 +00:00
Andrew Charlwood 506769470d feat: add get_directorate_from_diagnosis() function (Task 2.1)
- Added DirectorateAssignment dataclass for return type
- Added get_directorate_from_diagnosis() function to diagnosis_lookup.py
- Logic: Try diagnosis-based lookup first (direct SNOMED match)
- Returns FALLBACK source if no match found, letting caller handle fallback
- Extracts PatientPseudonym from UPID (last part after provider code)
- Updated __all__ exports with new dataclass and function
- Tested: function handles no-match cases correctly
2026-02-05 14:19:18 +00:00
Andrew Charlwood b44d22de2c feat: add direct SNOMED lookup functions (Task 1.3)
Add two new functions to diagnosis_lookup.py for direct SNOMED code matching:

- get_drug_snomed_codes(drug_name): Query ref_drug_snomed_mapping for all
  SNOMED codes mapped to a drug. Returns list of DrugSnomedMapping with
  snomed_code, snomed_description, search_term, primary_directorate.
  Tested: ADALIMUMAB returns 1320 mappings across 10 Search_Terms.

- patient_has_indication_direct(patient_pseudonym, mappings, connector):
  Query PrimaryCareClinicalCoding for exact SNOMED code matches.
  Returns most recent match by EventDateTime with DirectSnomedMatchResult.

Both functions follow existing patterns in the module and are exported
in __all__. The lookup is case-insensitive for drug names.
2026-02-05 14:14:55 +00:00
Andrew Charlwood 6d68b5eaa5 feat: add SNOMED mapping loader script (Task 1.2)
- Create data_processing/load_snomed_mapping.py with:
  - migrate_drug_snomed_mapping() for CSV to SQLite migration
  - get_drug_snomed_mapping_counts() for statistics
  - verify_drug_snomed_mapping_migration() for validation
  - clean_snomed_code() to remove trailing .0 from SNOMED codes
  - CLI interface: python -m data_processing.load_snomed_mapping

- Loaded 144,056 mappings from enriched CSV:
  - 707 unique drugs
  - 187 unique search terms
  - 21,265 unique SNOMED codes
2026-02-05 14:10:36 +00:00
Andrew Charlwood 9943e85761 feat: add ref_drug_snomed_mapping schema (Task 1.1)
- Add REF_DRUG_SNOMED_MAPPING_SCHEMA with 11 columns for direct SNOMED mapping
- Add 5 indexes for lookup performance (drug, cleaned_drug, snomed, search_term, composite)
- Add create_drug_snomed_mapping_table() helper function
- Update helper functions (drop, get_counts, verify_exists) to include new table
- Table is included in REFERENCE_TABLES_SCHEMA and created by migration
2026-02-05 14:06:31 +00:00
Andrew Charlwood fa72fb3098 docs: mark all tasks complete in IMPLEMENTATION_PLAN.md 2026-02-05 02:17:17 +00:00
Andrew Charlwood 139a71b752 docs: update progress.txt with iteration 17 completion (Task 5.6) 2026-02-05 02:16:28 +00:00
Andrew Charlwood 9b466b4e6c feat: add hover/focus states and clean up unused styles (Task 5.6)
- Add subtle hover states to KPI badges, dropdown triggers, tabs
- Add consistent focus rings for accessibility (2px Pale Blue)
- Update button styles with focus/active states
- Clean up unused styles: compact_kpi_* (Option B), unused imports
- All interactive elements now have appropriate hover/focus feedback
2026-02-05 02:16:01 +00:00
Andrew Charlwood 731db2d85f docs: update progress.txt with iteration 16 completion (Task 5.5) 2026-02-05 02:08:41 +00:00
Andrew Charlwood 754e98dbe5 feat: refine top bar with style helpers (Task 5.5)
- Use top_bar_style() for 48px height container
- Use logo_style() for 28px height logo (was 36px)
- Use top_bar_tab_style() for 28px height pills
- Simplify data freshness to single line
- Remove max_width constraint for full-width bar
- Lighter shadow (SM instead of MD)
2026-02-05 02:08:01 +00:00
Andrew Charlwood fc03e44ce2 docs: update progress.txt with iteration 15 completion (Task 5.4) 2026-02-05 02:04:42 +00:00
Andrew Charlwood ef2a109528 feat: full-width responsive chart layout (Task 5.4)
- Remove PAGE_MAX_WIDTH constraint from main_content()
- Update chart_display() with calc(100vh - 152px) height
- Update icicle_figure with autosize=True and reduced margins
- Update chart_section() with flex layout for height fill
- Update page_layout() with 100vh height
2026-02-05 02:03:55 +00:00
Andrew Charlwood 390328f2b4 docs: update progress.txt with iteration 14 completion (Task 5.3) 2026-02-05 01:59:35 +00:00
Andrew Charlwood 826dd1c022 feat: compact KPI badges integrated into filter strip (Task 5.3)
- Add kpi_badge() and kpi_badges() functions for inline pill-style KPIs
- Integrate KPI badges into filter_section() on the right side
- Remove separate kpi_row() from main_content() layout
- Zero extra vertical height - KPIs now share the filter strip row

Design: Follows Option A from DESIGN_SYSTEM.md (preferred approach)
2026-02-05 01:59:00 +00:00
Andrew Charlwood 645fe0ab6c docs: update progress.txt with iteration 13 completion (Task 5.2) 2026-02-05 01:54:13 +00:00
Andrew Charlwood d2bed71078 feat: compact filter section as single horizontal strip (Task 5.2)
- Redesign filter_section() as 48px horizontal strip
- Remove "Filters" header (saves vertical space)
- Compact initiated_filter_dropdown() and last_seen_filter_dropdown()
  - 32px height triggers via compact_dropdown_trigger_style()
  - Labels moved inside dropdown panels
- Compact searchable_dropdown() component
  - 32px trigger height, no external label
  - Reduced panel item height (150px max, was 200px)
  - Smaller search input (size="1"), tighter spacing
- All filters now in ONE row with divider separator

Target: filter section height ≤ 60px (from ~200px)
2026-02-05 01:53:38 +00:00
Andrew Charlwood c9654905be docs: update progress.txt with iteration 12 completion (Task 5.1) 2026-02-05 01:47:40 +00:00
Andrew Charlwood 0a68c2a5a5 feat: update design tokens for SaaS redesign (Task 5.1)
- Typography: Reduce sizes (Display 32→28, H1 24→18, H2 20→16, Caption 12→11)
- Spacing: Tighten scale by ~25% (SM 8→6, MD 12→8, LG 16→12, etc.)
- Shadows: Lighter values for modern feel
- Colors: Modernize semantic colors (#10B981 success, #EF4444 error)
- Layout: TOP_BAR_HEIGHT 64→48px, new FILTER_STRIP_HEIGHT 48px

New style helpers added:
- compact_kpi_card_style/value/label - 50% smaller KPI cards
- kpi_badge_style - inline pill variant for zero-height KPIs
- filter_strip_style - horizontal single-row container
- compact_dropdown_trigger_style - 32px height triggers
- chart_container_style/wrapper - full-width flex-grow
- top_bar_style/tab/logo - compact 48px top bar

All tokens verified via import and Reflex compile.
2026-02-05 01:46:58 +00:00
Andrew Charlwood 27d2d603c3 docs: update progress.txt with iteration 11 completion (Task 4.3 Documentation) 2026-02-05 00:57:48 +00:00
Andrew Charlwood 76e0d64820 docs: complete Task 4.3 Documentation
Update CLAUDE.md with new pathway data architecture:
- Add Pathway Data Architecture section with date filter table
- Update package structure with cli/ and pathway_pipeline.py
- Add CLI module and pathway pipeline documentation
- Update data flow diagrams (pre-computed vs legacy)
- Add pathway tables to database schema section
- Add CLI commands section with usage examples
- Add Breaking Changes section documenting:
  - Date filter changes (pickers -> dropdowns)
  - Data refresh model (real-time -> pre-computed)
  - State variable changes
  - Icicle chart enhancements

Mark all Task 4.3 subtasks complete in IMPLEMENTATION_PLAN.md
Update completion criteria status
2026-02-05 00:56:34 +00:00
Andrew Charlwood 49bf4cdf1b docs: update progress.txt with iteration 10 completion (Task 4.2 Performance) 2026-02-05 00:50:57 +00:00
Andrew Charlwood 870d2e6e0e feat: complete Task 4.2 Performance Testing - all targets met 2026-02-05 00:50:14 +00:00
Andrew Charlwood 58450d78fa docs: update progress.txt with iteration 9 completion (Task 4.1 E2E Validation) 2026-02-05 00:44:39 +00:00
Andrew Charlwood cabaa72e9d feat: complete Task 4.1 End-to-End Validation
All 5 validation tests pass:
- Hierarchy structure: 6 levels (Root→Trust→Directory→Drug→Pathway)
- Patient counts: 11,118 patients, £130.5M from root node
- Treatment statistics: average_spacing, cost_pp_pa populated
- Drug filtering: drug_sequence column for LIKE patterns
- Customdata: All 10 fields present and populated
2026-02-05 00:43:55 +00:00
Andrew Charlwood 7628c5fa20 docs: update progress.txt with iteration 8 completion (Task 3.3 UI Components) 2026-02-05 00:38:16 +00:00