107 Commits

Author SHA1 Message Date
Andrew Charlwood c93417f0e7 feat: return ALL GP matches with code_frequency in get_patient_indication_groups (Task 1.1)
- Replace QUALIFY ROW_NUMBER()=1 with GROUP BY + COUNT(*) to return all matching
  Search_Terms per patient instead of just the most recent
- Add earliest_hcd_date parameter to restrict GP codes to HCD data window
- Return code_frequency column (count of matching SNOMED codes per Search_Term)
  for use as tiebreaker in drug-aware indication matching
- Update empty DataFrame returns to match new column format
2026-02-05 23:01:01 +00:00
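The change from "most recent match only" to "all matches with a count" can be sketched as below. This is a minimal illustration using SQLite in place of Snowflake, not the project's query: the table name gp_matches is hypothetical, and reading earliest_hcd_date as a lower bound on EventDateTime is an assumption.

```python
import sqlite3


def indication_groups(conn, earliest_hcd_date):
    """Return every (patient, Search_Term) pair with its code_frequency."""
    sql = """
        SELECT PatientPseudonym,
               Search_Term,
               COUNT(*) AS code_frequency
        FROM gp_matches                  -- hypothetical table name
        WHERE EventDateTime >= ?         -- assumed meaning of the HCD data window
        GROUP BY PatientPseudonym, Search_Term
        ORDER BY PatientPseudonym, code_frequency DESC, Search_Term
    """
    return conn.execute(sql, (earliest_hcd_date,)).fetchall()


def demo_rows():
    """Build a tiny in-memory dataset and run the grouped query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gp_matches "
        "(PatientPseudonym TEXT, Search_Term TEXT, EventDateTime TEXT)"
    )
    conn.executemany(
        "INSERT INTO gp_matches VALUES (?, ?, ?)",
        [
            ("P1", "asthma", "2024-01-10"),
            ("P1", "asthma", "2024-03-02"),
            ("P1", "urticaria", "2024-02-15"),
            ("P1", "asthma", "2019-05-01"),  # falls before the window, so it is excluded
            ("P2", "diabetes", "2024-04-20"),
        ],
    )
    rows = indication_groups(conn, "2023-01-01")
    conn.close()
    return rows
```

With GROUP BY, patient P1 keeps both asthma and urticaria, and code_frequency is available as a tiebreaker, whereas a ROW_NUMBER() = 1 filter would have kept only one row per patient.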
Andrew Charlwood 4fed0e53df docs: update progress.txt with Iteration 2 results (Task 1.2) 2026-02-05 22:56:44 +00:00
Andrew Charlwood b0a8a9de1c feat: merge asthma Search_Term variants in CLUSTER_MAPPING_SQL and drug mapping (Task 1.2)
Merge 'allergic asthma' and 'severe persistent allergic asthma' into
canonical 'asthma' in both CLUSTER_MAPPING_SQL (Snowflake CTE) and
load_drug_indication_mapping() (DimSearchTerm.csv loader).

- CLUSTER_MAPPING_SQL: 3 Cluster_IDs (AST_COD, eFI2_Asthma, SEVAST_COD) now
  all map to Search_Term = 'asthma'
- Added SEARCH_TERM_MERGE_MAP constant for reusable normalization
- load_drug_indication_mapping() applies merge at CSV load time
- urticaria (XSAL_COD) stays separate — not merged with asthma
- Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB,
  OMALIZUMAB, RESLIZUMAB
2026-02-05 22:56:29 +00:00
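A merge map of this kind might look like the sketch below. The constant name SEARCH_TERM_MERGE_MAP and the two merged variants come from the commit message; the helper functions and the placeholder drug names are hypothetical, and how the real loader applies the map is not shown in the log.

```python
# Variants named in the commit message, mapped to the canonical term.
SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def normalize_search_term(term):
    """Return the canonical Search_Term, leaving unmapped terms unchanged."""
    return SEARCH_TERM_MERGE_MAP.get(term, term)


def merge_drug_lists(mapping):
    """Combine drug lists whose Search_Terms normalize to the same value.

    `mapping` is a hypothetical {search_term: [drug, ...]} structure.
    """
    merged = {}
    for term, drugs in mapping.items():
        merged.setdefault(normalize_search_term(term), set()).update(drugs)
    # Sort each list so the output is deterministic.
    return {term: sorted(drugs) for term, drugs in merged.items()}
```

Terms that are absent from the map, such as urticaria, pass through unchanged, which matches the note that urticaria stays separate.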
Andrew Charlwood c85aae4f6a docs: update progress.txt with Iteration 1 results (Task 1.2) 2026-02-05 22:48:46 +00:00
Andrew Charlwood 1c4d2c07ee docs: mark project complete - all tasks done, viewport testing blocked by env (Iteration 9) 2026-02-05 20:51:48 +00:00
Andrew Charlwood fed909481e docs: update CLAUDE.md with indication chart architecture and CLI docs (Task 5.2) 2026-02-05 20:50:01 +00:00
Andrew Charlwood 4884e0a8cc fix: recreate pathway_nodes with correct UNIQUE constraint and validate end-to-end (Task 5.1)
The UNIQUE constraint was UNIQUE(date_filter_id, ids) instead of
UNIQUE(date_filter_id, chart_type, ids), causing INSERT OR REPLACE
to overwrite directory chart root/trust nodes when indication nodes
were inserted. Dropped and recreated the table, re-ran full refresh.

Validation: both chart types have all hierarchy levels (0-5),
all 12 datasets (6 date filters × 2 chart types) produce valid icicle charts, KPIs correct.
2026-02-05 20:43:01 +00:00
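The overwrite can be reproduced in isolation with the sketch below. It assumes a cut-down pathway_nodes table with only the three key columns plus a hypothetical value column; the real table has more fields.

```python
import sqlite3


def count_after_inserts(unique_cols):
    """Insert one directory node and one indication node that share the same
    date_filter_id and ids, then report how many rows survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"""
        CREATE TABLE pathway_nodes (
            date_filter_id TEXT,
            chart_type     TEXT,
            ids            TEXT,
            value          INTEGER,   -- hypothetical payload column
            UNIQUE({unique_cols})
        )
        """
    )
    rows = [
        ("all_6mo", "directory", "root", 10),
        ("all_6mo", "indication", "root", 20),
    ]
    conn.executemany("INSERT OR REPLACE INTO pathway_nodes VALUES (?, ?, ?, ?)", rows)
    remaining = conn.execute("SELECT COUNT(*) FROM pathway_nodes").fetchone()[0]
    conn.close()
    return remaining
```

With UNIQUE(date_filter_id, ids) the second insert conflicts with the first and replaces it, leaving one row. Adding chart_type to the constraint lets both rows coexist.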
Andrew Charlwood 6331d44165 fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail
prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.

Also fixes directory charts only generating data for the first date filter.

Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
2026-02-05 20:10:12 +00:00
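The failure mode is easy to reproduce without pandas. The sketch below is a plain-Python analogue, not the project's prepare_data(): the provider mapping and row layout are hypothetical, and None stands in for the NaN that pandas Series.map produces for values missing from the mapping.

```python
import copy

# Hypothetical code-to-name mapping.
PROVIDER_NAMES = {"RX1": "Trust A"}


def prepare_data_in_place(rows):
    """Buggy variant: rewrites the caller's rows, so a second call looks up
    names that have already been mapped and gets None back."""
    for row in rows:
        row["Provider"] = PROVIDER_NAMES.get(row["Provider"])
    return rows


def prepare_data_copy(rows):
    """Fixed variant: works on a copy, so the caller's data is never touched."""
    rows = copy.deepcopy(rows)
    for row in rows:
        row["Provider"] = PROVIDER_NAMES.get(row["Provider"])
    return rows
```

Calling the copying variant twice on the same input gives the same result both times, which is the property the directory-then-indication call sequence relies on.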
Andrew Charlwood 6f88a59978 feat: add chart type toggle for Directory/Indication views (Task 4.1, 4.2, 4.3)
- Add selected_chart_type state variable and set_chart_type() handler
- Add chart_type filter to load_pathway_data() WHERE clause
- Create segmented control toggle component in filter strip
- Add dynamic hierarchy label (Directorate vs Indication)
- Update chart title to include chart type prefix
2026-02-05 19:39:45 +00:00
Andrew Charlwood 2deaa2f6da docs: mark Task 3.1 complete - indication pipeline verified (Task 3.1)
Pipeline test results:
- 695 indication pathway nodes generated for all_6mo filter
- 92.8% GP diagnosis match rate (34,006/36,628 patients)
- 139 unique Search_Terms found
- Top indications: drug misuse, influenza, diabetes, sepsis, cardiovascular disease
- Full pipeline completes in ~10 minutes

Phase 3 complete, Phase 4 (Reflex UI) ready to begin.
2026-02-05 18:44:34 +00:00
Andrew Charlwood 0b5b462766 docs: update progress.txt with iteration 3, add new guardrails (Task 3.1) 2026-02-05 18:31:29 +00:00
Andrew Charlwood f7166b38c8 docs: update progress.txt with iteration 2 completion (Task 1.2, 2.3) 2026-02-05 17:07:06 +00:00
Andrew Charlwood 1a817b8257 feat: add get_patient_indication_groups() for Snowflake-direct GP lookup (Task 1.1)
- Add CLUSTER_MAPPING_SQL constant embedding full snomed_indication_mapping_query.sql
- Add get_patient_indication_groups() function that queries Snowflake directly
- Uses QUALIFY ROW_NUMBER() to get most recent diagnosis per patient
- Returns DataFrame with PatientPseudonym, Search_Term, EventDateTime
- Handles edge cases: empty list, Snowflake unavailable
- Batch processing with configurable batch_size (default 500)
- Comprehensive logging for match statistics
2026-02-05 17:03:12 +00:00
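Batching a patient list before querying might be sketched as follows. The helper name iter_batches is hypothetical; only the batch_size default of 500 and the empty-list edge case come from the commit message.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items.

    An empty input yields nothing, so the caller can return an empty
    result without issuing any query.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Each yielded batch would then be bound into one query, keeping the size of any single IN list or temporary table bounded.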
Andrew Charlwood 843b4f23cc docs: update progress.txt with iteration 9 (Task 3.3 in progress)
Fixed two critical bugs preventing GP diagnosis matching:
1. SNOMED codes in scientific notation now converted to integers
2. Using PseudoNHSNoLinked (not PersonKey) for GP record lookup

Full refresh is running in background - next iteration should verify completion.
2026-02-05 15:51:17 +00:00
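One way to undo scientific notation is to parse the text exactly with Decimal, as in the sketch below. The function name snomed_to_int is hypothetical and this is not necessarily how the project performs the conversion. It also cannot restore digits that were already lost when a long code passed through a float.

```python
from decimal import Decimal, InvalidOperation


def snomed_to_int(raw):
    """Convert a SNOMED code given as text or number to an int.

    Returns None for values that are not whole numbers or cannot be parsed.
    """
    try:
        value = Decimal(str(raw).strip())
    except InvalidOperation:
        return None
    # Reject NaN/Infinity and anything with a fractional part.
    if not value.is_finite() or value != value.to_integral_value():
        return None
    return int(value)
```

Going through Decimal avoids a second float round trip, so a string such as "1.95967001E8" maps back to the exact integer it encodes.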
Andrew Charlwood b9f4041670 docs: update progress.txt with iteration 8 completion (Task 3.2) 2026-02-05 14:45:57 +00:00
Andrew Charlwood 50b8548688 docs: update progress.txt with iteration 7 completion (Task 3.1) 2026-02-05 14:39:35 +00:00
Andrew Charlwood 0d15000aa0 docs: update progress.txt with iteration 6 completion (Task 2.3) 2026-02-05 14:33:16 +00:00
Andrew Charlwood aabe4bf45d docs: update progress.txt with iteration 5 completion (Task 2.2) 2026-02-05 14:25:44 +00:00
Andrew Charlwood 3db93a685b docs: update progress.txt with iteration 4 completion (Task 2.1) 2026-02-05 14:20:04 +00:00
Andrew Charlwood 6d68b5eaa5 feat: add SNOMED mapping loader script (Task 1.2)
- Create data_processing/load_snomed_mapping.py with:
  - migrate_drug_snomed_mapping() for CSV to SQLite migration
  - get_drug_snomed_mapping_counts() for statistics
  - verify_drug_snomed_mapping_migration() for validation
  - clean_snomed_code() to remove trailing .0 from SNOMED codes
  - CLI interface: python -m data_processing.load_snomed_mapping

- Loaded 144,056 mappings from enriched CSV:
  - 707 unique drugs
  - 187 unique search terms
  - 21,265 unique SNOMED codes
2026-02-05 14:10:36 +00:00
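The trailing ".0" typically appears when a column of codes is read as floats and written back as text. A minimal guess at what a helper like clean_snomed_code() could do is shown below; the body is an assumption, only the function name and its purpose come from the commit message.

```python
def clean_snomed_code(code):
    """Return the code as text with a single trailing '.0' removed."""
    text = str(code).strip()
    if text.endswith(".0"):
        text = text[:-2]
    return text
```

Keeping the result as a string rather than an int preserves the value exactly as stored and leaves any non-numeric entries visible for later validation.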
Andrew Charlwood 9943e85761 feat: add ref_drug_snomed_mapping schema (Task 1.1)
- Add REF_DRUG_SNOMED_MAPPING_SCHEMA with 11 columns for direct SNOMED mapping
- Add 5 indexes for lookup performance (drug, cleaned_drug, snomed, search_term, composite)
- Add create_drug_snomed_mapping_table() helper function
- Update helper functions (drop, get_counts, verify_exists) to include new table
- Table is included in REFERENCE_TABLES_SCHEMA and created by migration
2026-02-05 14:06:31 +00:00
Andrew Charlwood 139a71b752 docs: update progress.txt with iteration 17 completion (Task 5.6) 2026-02-05 02:16:28 +00:00
Andrew Charlwood 731db2d85f docs: update progress.txt with iteration 16 completion (Task 5.5) 2026-02-05 02:08:41 +00:00
Andrew Charlwood fc03e44ce2 docs: update progress.txt with iteration 15 completion (Task 5.4) 2026-02-05 02:04:42 +00:00
Andrew Charlwood 390328f2b4 docs: update progress.txt with iteration 14 completion (Task 5.3) 2026-02-05 01:59:35 +00:00
Andrew Charlwood 645fe0ab6c docs: update progress.txt with iteration 13 completion (Task 5.2) 2026-02-05 01:54:13 +00:00
Andrew Charlwood c9654905be docs: update progress.txt with iteration 12 completion (Task 5.1) 2026-02-05 01:47:40 +00:00
Andrew Charlwood 27d2d603c3 docs: update progress.txt with iteration 11 completion (Task 4.3 Documentation) 2026-02-05 00:57:48 +00:00
Andrew Charlwood 49bf4cdf1b docs: update progress.txt with iteration 10 completion (Task 4.2 Performance) 2026-02-05 00:50:57 +00:00
Andrew Charlwood 58450d78fa docs: update progress.txt with iteration 9 completion (Task 4.1 E2E Validation) 2026-02-05 00:44:39 +00:00
Andrew Charlwood 7628c5fa20 docs: update progress.txt with iteration 8 completion (Task 3.3 UI Components) 2026-02-05 00:38:16 +00:00
Andrew Charlwood 8f2425a9ae docs: update progress.txt with iteration 7 completion (Task 3.2 Icicle Figure) 2026-02-05 00:31:10 +00:00
Andrew Charlwood fc3b3525c6 docs: update progress.txt with iteration 6 completion (Task 3.1 AppState) 2026-02-05 00:26:59 +00:00
Andrew Charlwood 0a13ba550e docs: update progress.txt with iteration 5 completion (Task 2.2 Pipeline Test) 2026-02-05 00:21:08 +00:00
Andrew Charlwood 8b65dfd9a8 docs: update progress.txt with iteration 4 completion (Task 2.1 CLI) 2026-02-04 23:30:50 +00:00
Andrew Charlwood 9bb4748588 docs: mark Task 1.3 complete (migration already handled by schema)
Task 1.3 (Create Migration Script) is satisfied by existing code:
- python -m data_processing.migrate creates all pathway tables
- pathway_date_filters auto-populated via INSERT OR REPLACE in schema
- Verified: fresh database creates all 3 tables with 6 date filters
2026-02-04 23:25:14 +00:00
Andrew Charlwood b48dbbc96a docs: update progress.txt with iteration 2 completion (Task 1.2 Pipeline) 2026-02-04 23:21:50 +00:00
Andrew Charlwood 5945649ae3 feat: add pathway pipeline module (Task 1.2)
Create data_processing/pathway_pipeline.py with:
- DateFilterConfig dataclass for date filter configuration
- DATE_FILTER_CONFIGS with 6 pre-defined combinations
- compute_date_ranges() for computing actual dates from config
- fetch_and_transform_data() for Snowflake fetch + transformations
- process_pathway_for_date_filter() using existing generate_icicle_chart()
- extract_denormalized_fields() to parse trust/directory/drugs from ids
- convert_to_records() for SQLite insertion
- process_all_date_filters() convenience function
2026-02-04 23:21:39 +00:00
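A date filter configuration and its range computation could be shaped roughly as below. Only the names DateFilterConfig and compute_date_ranges come from the commit message; the fields filter_id and window_months, the helper months_before, and the "look back N months from a reference date" rule are all assumptions made for illustration.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateFilterConfig:
    filter_id: str       # hypothetical field, e.g. "all_6mo"
    window_months: int   # hypothetical field: how far back the window reaches


def months_before(day, months):
    """Return the date `months` calendar months before `day`, clamping the
    day of month when the target month is shorter."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def compute_date_ranges(config, reference):
    """Return (start, end) for the filter, ending at the reference date."""
    return months_before(reference, config.window_months), reference
```

Passing the reference date in explicitly, rather than reading today's date inside the function, keeps the computation repeatable when a refresh is re-run.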
Andrew Charlwood f2717a2219 docs: update progress.txt with iteration 1 completion (Task 1.1 Schema) 2026-02-04 23:17:52 +00:00
Andrew Charlwood 85b3c20341 docs: update progress.txt with iteration 17 completion (Task 5.2 Debounce) 2026-02-04 19:26:26 +00:00
Andrew Charlwood f863e79299 docs: update progress.txt with iteration 16 completion (Task 5.3 Error Handling) 2026-02-04 19:21:31 +00:00
Andrew Charlwood 8c04e65ced docs: update progress.txt with correct commit hash 2026-02-04 19:16:58 +00:00
Andrew Charlwood 0dd99e6a42 docs: complete visual polish audit against DESIGN_SYSTEM.md (Task 5.1)
- Verified all design tokens match spec exactly
- Confirmed responsive behavior via flex_wrap patterns
- Audited hover states and transitions
- Validated chart colorscale uses design system palette
2026-02-04 19:16:47 +00:00
Andrew Charlwood b5afb5deb2 docs: update progress.txt with iteration 14 verification results 2026-02-04 19:09:02 +00:00
Andrew Charlwood fbf046eb8a docs: update progress.txt with iteration 13 completion 2026-02-04 18:59:38 +00:00
Andrew Charlwood 5267a9f4ef docs: update progress.txt with iteration 12 completion 2026-02-04 18:49:44 +00:00
Andrew Charlwood a749514889 docs: update progress.txt with iteration 11 completion 2026-02-04 18:45:59 +00:00
Andrew Charlwood f38ccfc128 feat: implement data loading from SQLite (Task 3.2)
- Add load_data() method to AppState that connects to SQLite database
- Populate available_drugs, available_directorates, available_indications from DB
- Detect latest date in dataset and set filter defaults accordingly
- Load KPI values: total_records, unique_patients, total_drugs, total_cost
- Add on_load handler to trigger data loading on page initialization
- Handle database errors gracefully with meaningful error messages
2026-02-04 14:11:03 +00:00
Andrew Charlwood a18de83c8d docs: update progress.txt with iteration 7 completion 2026-02-04 14:04:00 +00:00
Andrew Charlwood 80997cb0de docs: update progress.txt with iteration 6 completion 2026-02-04 13:59:20 +00:00