Commit Graph

6 Commits

Author SHA1 Message Date
admin fcbde7c689 Restructured src into a more logical hierarchy 2026-02-09 16:22:05 +00:00
Andrew Charlwood 76838887e6 refactor: reorganize repository to src/ layout
Move 6 packages (core, config, data_processing, analysis, visualization, cli)
into src/ to reduce root clutter. Merge tools/data.py into
data_processing/transforms.py. Move docs to docs/.

Path resolution via .pth file (setup_dev.py), pytest pythonpath config,
and sys.path bootstrap in rxconfig.py and CLI entry points.

Clean up pyproject.toml deps (remove stale pins, add snowflake-connector-python).
Fix tomllib import for Python 3.10 compatibility.

All 113 tests pass.
2026-02-06 12:03:48 +00:00
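
Note on 76838887e6: the sys.path bootstrap mentioned above might look roughly like the sketch below. The helper name bootstrap_src and the assumption that src/ sits directly under the repository root are illustrative only; the actual code in rxconfig.py and the CLI entry points is not shown in this log.

```python
import sys
from pathlib import Path


def bootstrap_src(root: Path) -> Path:
    """Put <root>/src at the front of sys.path so its packages import.

    Hypothetical helper: the name and the directory layout are assumptions.
    """
    src = (root / "src").resolve()
    # Skip the insert if the path is already present, so repeated
    # calls (for example from several entry points) do not add duplicates.
    if str(src) not in sys.path:
        sys.path.insert(0, str(src))
    return src
```

An entry point would call something like bootstrap_src(Path(__file__).resolve().parent) before importing any of the moved packages. The .pth file and the pytest pythonpath setting cover the same need for installed and test environments.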
Andrew Charlwood 6331d44165 fix: prevent DataFrame mutation in prepare_data() causing indication charts to fail
prepare_data() mapped Provider Code → Name in-place. When called for directory
charts first, then indication charts, the second call re-mapped already-mapped
values to NaN, silently dropping all data. Added df.copy() to prevent mutation.

Also fixes directory charts only generating data for the first date filter.

Results: 3,633 pathway nodes now generated (1,101 directory + 2,532 indication)
across all 12 datasets (6 date filters × 2 chart types).
2026-02-05 20:10:12 +00:00
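
Note on 6331d44165: the failure mode described above can be reproduced in miniature. The sketch below assumes pandas, and the column name, provider codes, and function names are made up for illustration; it is not the project's prepare_data().

```python
import pandas as pd

# Hypothetical code-to-name lookup; the real mapping is not shown in this log.
PROVIDER_NAMES = {"RX1": "Trust A", "RX2": "Trust B"}


def prepare_data_in_place(df: pd.DataFrame) -> pd.DataFrame:
    # Buggy variant: overwrites the caller's column. A second call then
    # looks up names (not codes) in the dict, and map() yields NaN.
    df["Provider Code"] = df["Provider Code"].map(PROVIDER_NAMES)
    return df


def prepare_data_copy(df: pd.DataFrame) -> pd.DataFrame:
    # Fixed variant: work on a copy so the caller's frame keeps its codes.
    out = df.copy()
    out["Provider Code"] = out["Provider Code"].map(PROVIDER_NAMES)
    return out
```

Calling prepare_data_in_place() twice on the same frame leaves the column entirely NaN, which matches the silent data loss the commit describes. The copying variant can be called any number of times with the same result.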
Andrew Charlwood 7cbc648c6d feat: add indication pathway processing functions (Task 2.3)
- Add generate_icicle_chart_indication() to pathway_analyzer.py
  - Variant that uses indication_df instead of directory_df
  - Groups by Trust → Search_Term → Drug → Pathway
  - Accepts indication_df mapping UPID → Indication_Group

- Add process_indication_pathway_for_date_filter() to pathway_pipeline.py
  - Processes indication-based pathway for a single date filter
  - Uses generate_icicle_chart_indication() for hierarchy building

- Add extract_indication_fields() to pathway_pipeline.py
  - Extracts trust_name, search_term, drug_sequence from ids column
  - Similar to extract_denormalized_fields() but for indication charts

- Update convert_to_records() with chart_type parameter
  - Includes chart_type column in output records
  - Supports "directory" and "indication" values

- Add ChartType type alias (Literal["directory", "indication"])

- Update __all__ exports with new functions
2026-02-05 14:32:28 +00:00
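
Note on 7cbc648c6d: a minimal sketch of how a Literal alias and a chart_type parameter can fit together. Only the alias definition comes from the commit message; the signature and body of convert_to_records() here are assumptions for illustration, since the real function is not shown in this log.

```python
from typing import Any, Literal, get_args

# Alias as described in the commit message.
ChartType = Literal["directory", "indication"]


def convert_to_records(
    rows: list[dict[str, Any]], chart_type: ChartType
) -> list[dict[str, Any]]:
    """Tag each output record with its chart type (hypothetical signature)."""
    # Literal is only enforced by type checkers, so validate at runtime too.
    if chart_type not in get_args(ChartType):
        raise ValueError(f"unsupported chart_type: {chart_type!r}")
    # Build new dicts rather than mutating the input rows.
    return [{**row, "chart_type": chart_type} for row in rows]
```

The runtime check is a design choice rather than something the commit mentions: it catches a misspelled chart type coming from an untyped caller such as a CLI argument.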
Andrew Charlwood adc1dbfc58 feat: complete Task 2.2 - test refresh pipeline with Snowflake data
Tested full refresh pipeline end-to-end with real Snowflake data:
- Fixed trust filter to read Name column from defaultTrusts.csv
- Fixed Decimal type handling in calculate_cost_per_patient_per_annum
- Fixed array handling in convert_to_records for average_administered
- Added required reference CSV files to data/ directory
- Configured Snowflake connection (account, warehouse, user)

Results:
- Snowflake fetch: 656,695 records in ~7s
- Transformations: 519,848 records after UPID/drug/directory
- Pathway nodes: 293 for all_6mo (8 trusts, 14 directories)
- Total processing time: ~6.2 minutes
2026-02-05 00:20:12 +00:00
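
Note on adc1dbfc58: the Decimal fix above is likely related to database drivers returning numeric columns as decimal.Decimal, which Python refuses to mix with float in arithmetic. The sketch below shows one way to handle that. The function name, parameters, and formula are hypothetical and are not the project's calculate_cost_per_patient_per_annum.

```python
from decimal import Decimal
from typing import Union

Number = Union[int, float, Decimal]


def annualised_cost_per_patient(
    total_cost: Number, patient_count: int, days_covered: int
) -> float:
    """Scale a cost to a per-patient, per-year figure (illustrative formula)."""
    if patient_count <= 0 or days_covered <= 0:
        raise ValueError("patient_count and days_covered must be positive")
    # Convert up front: Decimal * float raises TypeError, so do the
    # arithmetic in plain floats once the value leaves the database layer.
    return float(total_cost) / patient_count * (365.0 / days_covered)
```

Converting to float trades exactness for convenience. If exact currency arithmetic matters, keeping every operand as Decimal would be the alternative.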
Andrew Charlwood fdd33a67af Initial commit before Ralph loop 2026-02-04 13:04:29 +00:00