feat: add pathway pipeline module (Task 1.2)

Create data_processing/pathway_pipeline.py with:
- DateFilterConfig dataclass for date filter configuration
- DATE_FILTER_CONFIGS with 6 pre-defined combinations
- compute_date_ranges() for computing actual dates from config
- fetch_and_transform_data() for Snowflake fetch + transformations
- process_pathway_for_date_filter() using existing generate_icicle_chart()
- extract_denormalized_fields() to parse trust/directory/drugs from ids
- convert_to_records() for SQLite insertion
- process_all_date_filters() convenience function
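A minimal sketch of how a date-filter config and `compute_date_ranges()` could fit together. It is not the module's actual code. The field names (`filter_id`, `initiated_lookback_days`, `last_seen_lookback_days`) and the 3 x 2 grid are hypothetical. The commit only states that `DATE_FILTER_CONFIGS` holds six predefined combinations and does not say what they are, so the list is named `EXAMPLE_DATE_FILTER_CONFIGS` here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class DateFilterConfig:
    """Hypothetical fields: lookback windows relative to a reference date."""

    filter_id: str
    initiated_lookback_days: Optional[int]  # None = no lower bound on initiation date
    last_seen_lookback_days: Optional[int]  # None = no lower bound on last-seen date


# Hypothetical 3 x 2 grid that happens to yield six configs; the commit does not
# say which six combinations DATE_FILTER_CONFIGS actually contains.
EXAMPLE_DATE_FILTER_CONFIGS = [
    DateFilterConfig(f"init_{i or 'all'}_seen_{s or 'all'}", i, s)
    for i, s in product([None, 365, 730], [None, 183])
]


def compute_date_ranges(config: DateFilterConfig, today: date) -> dict:
    """Turn relative lookbacks into absolute (start, end) date pairs.

    A start of None means the range is unbounded on that side.
    """

    def start(days: Optional[int]) -> Optional[date]:
        return None if days is None else today - timedelta(days=days)

    return {
        "initiated": (start(config.initiated_lookback_days), today),
        "last_seen": (start(config.last_seen_lookback_days), today),
    }
```

The sketch passes `today` in explicitly instead of calling `date.today()` inside the function. This keeps the date arithmetic deterministic and easy to test.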
Andrew Charlwood
2026-02-04 23:21:27 +00:00
parent f2717a2219
commit 5945649ae3
3 changed files with 518 additions and 3 deletions
+3 -3
@@ -50,13 +50,13 @@ cd pathways_app && timeout 60 python -m reflex run 2>&1 | head -30
 - [x] Verify schema with: `python -c "from data_processing.schema import *"`
 ### 1.2 Create Pathway Pipeline Module
-- [ ] Create `data_processing/pathway_pipeline.py` with:
+- [x] Create `data_processing/pathway_pipeline.py` with:
 - `fetch_and_transform_data()` - Snowflake fetch + UPID/drug/directory transformations
 - `process_pathway_for_date_filter(df, date_filter_config)` - Single filter processing
 - `extract_denormalized_fields(ice_df)` - Extract trust, directory, drug_sequence from ids
 - `convert_to_records(ice_df, date_filter_id)` - Convert ice_df to list of dicts for SQLite
-- [ ] Integrate with existing `analysis/pathway_analyzer.py` functions
-- [ ] Verify: `python -c "from data_processing.pathway_pipeline import *"`
+- [x] Integrate with existing `analysis/pathway_analyzer.py` functions
+- [x] Verify: `python -c "from data_processing.pathway_pipeline import *"`
 ### 1.3 Create Migration Script
 - [ ] Create script to set up new tables in existing `data/pathways.db`
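The task list above names two helpers: `extract_denormalized_fields(ice_df)` and `convert_to_records(ice_df, date_filter_id)`. A rough, dependency-free sketch of the same idea is shown below. It is not the module's implementation. The real functions take `ice_df`, while this sketch uses plain dicts. Several details are hypothetical and chosen only for illustration:
- the `ids` layout `root - trust - directory - drug1 - drug2`
- the `" - "` separator
- the `"|"`-joined drug sequence
- the `pathway_nodes` table
- the helper names `parse_pathway_id`, `rows_to_records`, and `insert_records`

```python
import sqlite3
from typing import Any, Optional

ID_SEPARATOR = " - "  # assumption: levels in `ids` are joined with " - "


def parse_pathway_id(ids: str) -> dict[str, Optional[str]]:
    """Split an id assumed to look like 'root - trust - directory - drug1 - drug2'."""
    parts = ids.split(ID_SEPARATOR)
    return {
        "trust": parts[1] if len(parts) > 1 else None,
        "directory": parts[2] if len(parts) > 2 else None,
        # Every level after the directory is treated as the ordered drug sequence,
        # flattened to one "|"-joined string so it fits a single SQLite TEXT column.
        "drug_sequence": "|".join(parts[3:]) or None,
    }


def rows_to_records(rows: list[dict[str, Any]], date_filter_id: str) -> list[dict[str, Any]]:
    """Copy each row, add the parsed fields, and tag it with its date filter."""
    records = []
    for row in rows:
        record = dict(row)  # copy so the caller's rows are not mutated
        record.update(parse_pathway_id(row["ids"]))
        record["date_filter_id"] = date_filter_id
        records.append(record)
    return records


def insert_records(conn: sqlite3.Connection, records: list[dict[str, Any]]) -> None:
    """Bulk-insert records into a hypothetical pathway_nodes table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pathway_nodes ("
        "ids TEXT, value INTEGER, trust TEXT, directory TEXT, "
        "drug_sequence TEXT, date_filter_id TEXT)"
    )
    # Named placeholders are filled from each dict's matching keys.
    conn.executemany(
        "INSERT INTO pathway_nodes VALUES "
        "(:ids, :value, :trust, :directory, :drug_sequence, :date_filter_id)",
        records,
    )
    conn.commit()
```

Each record carries `date_filter_id`. This lets the results for every date filter live side by side in one table and be selected with a simple `WHERE` clause.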