chore: archive unused files and move legacy code to can_delete

archive/: unused reference files (no active code references):
- LookupSearchTermCleanedDrugName.csv, condition_directorate_mapping.csv
- na_directory_rows.csv (diagnostic output), ta-recommendations.xlsx
- snomed_indication_mapping_query.sql (source for embedded SQL)
- IMPROVEMENT_RECOMMENDATIONS.md, power query.pq

archive/can_delete/: legacy code and logs safe to remove:
- dashboard_gui.py (replaced by Reflex app)
- pathways_app_old.py.bak (old backup)
- Ralph loop iteration logs (iterations 2-8)

I'll start by reading the required files in order to understand the current state of the project.

Now let me check the recent git history and read the current state of the key file I'll need to modify.

Good. I've read all the required files. Let me assess the current state:

**Completed tasks:**

- Task 1.3: `load_drug_indication_mapping()` and `get_search_terms_for_drug()` (done in Iteration 1)

**Remaining tasks (Phase 1):**

- Task 1.1: Update `get_patient_indication_groups()` to return ALL matches with code_frequency
- Task 1.2: Merge asthma Search_Terms in CLUSTER_MAPPING_SQL and `load_drug_indication_mapping()`

**Task selection reasoning:**

progress.txt recommends either Task 1.2 (asthma merge: self-contained, testable locally) or Task 1.1 (Snowflake query update). I'll choose **Task 1.2** because:

1. It's fully self-contained and testable without Snowflake connectivity
2. It's a prerequisite for getting clean data from both the Snowflake query and the drug mapping
3. It's simpler and lower-risk, a good foundation before modifying the Snowflake query

Let me start by understanding what needs to change. I need to:

1. Update `CLUSTER_MAPPING_SQL` to merge 3 asthma Search_Terms into one `"asthma"` entry
2. Update `load_drug_indication_mapping()` to merge asthma variants when loading DimSearchTerm.csv
3. Verify the changes work correctly

First, let me check the asthma-related rows in DimSearchTerm.csv.

Now let me check the current CLUSTER_MAPPING_SQL entries for asthma. I can see them in the file I already read:

- `('allergic asthma', 'AST_COD')` (line 1201)
- `('asthma', 'eFI2_Asthma')` (line 1210)
- `('severe persistent allergic asthma', 'SEVAST_COD')` (line 1316)
- `('urticaria', 'XSAL_COD')` (line 1337)

The plan for CLUSTER_MAPPING_SQL is to map all 3 asthma Cluster_IDs to `Search_Term = 'asthma'`:

- `('asthma', 'AST_COD')` (was `allergic asthma`)
- `('asthma', 'eFI2_Asthma')` (stays the same)
- `('asthma', 'SEVAST_COD')` (was `severe persistent allergic asthma`)

`urticaria` stays as-is.

Now updating CLUSTER_MAPPING_SQL. I need to change `'allergic asthma'` → `'asthma'` and `'severe persistent allergic asthma'` → `'asthma'`. This means the CTE will have multiple rows mapping to `Search_Term = 'asthma'`, each with a different Cluster_ID (AST_COD, eFI2_Asthma, SEVAST_COD). That's fine: the UNION in AllIndicationCodes will simply yield more SNOMED codes mapped to `'asthma'`, which is what we want.

Good. CLUSTER_MAPPING_SQL now has 3 rows for `'asthma'`, with Cluster_IDs AST_COD, eFI2_Asthma and SEVAST_COD. The GP lookup will return `Search_Term = 'asthma'` for any of these clusters, while `urticaria` (`XSAL_COD`) stays separate.
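The claim that several Cluster_IDs sharing one Search_Term is harmless can be checked with a small local experiment. The sketch below uses Python's built-in `sqlite3` instead of Snowflake, and everything beyond the four `(Search_Term, Cluster_ID)` pairs is assumed: the `cluster_codes` table and its placeholder codes are hypothetical, and `SELECT DISTINCT` stands in for the deduplication that the `UNION` in AllIndicationCodes performs.

```python
import sqlite3

# Hypothetical cluster membership: placeholder codes, not real SNOMED concepts.
CLUSTER_CODES = [
    ("AST_COD", "code_a1"),
    ("AST_COD", "code_shared"),
    ("eFI2_Asthma", "code_e1"),
    ("eFI2_Asthma", "code_shared"),  # same code present in two asthma clusters
    ("SEVAST_COD", "code_s1"),
    ("XSAL_COD", "code_u1"),
]

# Simplified stand-in for CLUSTER_MAPPING_SQL after the merge: three
# Cluster_IDs now share Search_Term = 'asthma'; urticaria is untouched.
QUERY = """
WITH ClusterMapping(Search_Term, Cluster_ID) AS (
    VALUES ('asthma', 'AST_COD'),
           ('asthma', 'eFI2_Asthma'),
           ('asthma', 'SEVAST_COD'),
           ('urticaria', 'XSAL_COD')
)
SELECT DISTINCT m.Search_Term, c.snomed_code
FROM ClusterMapping AS m
JOIN cluster_codes AS c ON c.cluster_id = m.Cluster_ID
ORDER BY 1, 2
"""


def codes_by_search_term() -> dict[str, list[str]]:
    """Run the sketch query in memory and group the returned codes by Search_Term."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cluster_codes (cluster_id TEXT, snomed_code TEXT)")
        conn.executemany("INSERT INTO cluster_codes VALUES (?, ?)", CLUSTER_CODES)
        grouped: dict[str, list[str]] = {}
        for term, code in conn.execute(QUERY):
            grouped.setdefault(term, []).append(code)
        return grouped
    finally:
        conn.close()


if __name__ == "__main__":
    print(codes_by_search_term())
```

Every placeholder code from the three asthma clusters ends up under the single key `asthma`, with `code_shared` listed once even though two clusters contain it, while `urticaria` keeps only its own code.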

Now I need to update `load_drug_indication_mapping()` to merge the asthma variants. When loading DimSearchTerm.csv, rows with Search_Term = "allergic asthma" or "severe persistent allergic asthma" should have their drug fragments merged under a single "asthma" key.

I need to add a Search_Term normalization step in `load_drug_indication_mapping()`. The approach: define a mapping of asthma variant names → "asthma", and apply it when reading each row. This keeps the merge logic contained in one place.

Now I need to add the new constant, `SEARCH_TERM_MERGE_MAP`, to the `__all__` exports for potential use by other modules, and update the log message to note the merge.

Now let me validate the changes.

Syntax check passed. Now let me test the import and verify that the asthma merge works correctly.

All validations pass:

1. **Asthma merge works**: `asthma` now has the combined drug list: OMALIZUMAB, BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, RESLIZUMAB (from all 3 original rows)
2. **Variants removed**: `allergic asthma` and `severe persistent allergic asthma` no longer exist as separate entries
3. **Urticaria separate**: `urticaria` → `['OMALIZUMAB']`, correctly kept separate
4. **OMALIZUMAB maps correctly**: to both `asthma` and `urticaria` (not just one)
5. **Drug matching still works**: ADALIMUMAB maps to 7 indications, as expected
6. **Count changes**: 162 Search_Terms (was 164; 2 fewer because 3 asthma entries merged into 1)
7. **Fragment count unchanged**: 346 (no fragments lost, just redistributed)
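Most of these checks can be sketched as a reusable validation helper. This is a hypothetical reconstruction, not the validation script from the iteration (the log does not show it): `merge`, `terms_for_fragment`, `validate_merge`, and the toy `BEFORE` mapping are illustrative, and the fragment check assumes "fragment count" means distinct fragments across all terms. On the real data, the term-set check corresponds to 164 → 162 Search_Terms, because three asthma keys collapse into one.

```python
# Assumed variant -> canonical map (same idea as SEARCH_TERM_MERGE_MAP).
MERGE_MAP: dict[str, str] = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}

# Toy pre-merge mapping: placeholder drugs, not the real DimSearchTerm.csv data.
BEFORE: dict[str, list[str]] = {
    "allergic asthma": ["DRUG_A"],
    "asthma": ["DRUG_B", "DRUG_C"],
    "severe persistent allergic asthma": ["DRUG_A", "DRUG_D"],
    "urticaria": ["DRUG_A"],
    "other indication": ["DRUG_E"],
}


def merge(before: dict[str, list[str]], merge_map: dict[str, str]) -> dict[str, list[str]]:
    """Fold variant terms into their canonical term, de-duplicating fragments."""
    merged: dict[str, set[str]] = {}
    for term, frags in before.items():
        merged.setdefault(merge_map.get(term, term), set()).update(frags)
    return {term: sorted(frags) for term, frags in merged.items()}


def terms_for_fragment(mapping: dict[str, list[str]], fragment: str) -> list[str]:
    """Hypothetical reverse lookup: every Search_Term that lists the fragment."""
    return sorted(term for term, frags in mapping.items() if fragment in frags)


def validate_merge(
    before: dict[str, list[str]],
    after: dict[str, list[str]],
    merge_map: dict[str, str],
) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    # Variants must no longer exist as separate entries.
    leftover = sorted(v for v in merge_map if v in after)
    if leftover:
        problems.append(f"unmerged variants: {leftover}")
    # The term set shrinks only by the keys that were collapsed.
    expected_terms = {merge_map.get(t, t) for t in before}
    if set(after) != expected_terms:
        problems.append(f"term set mismatch: {sorted(set(after) ^ expected_terms)}")
    # No distinct fragment is lost or invented overall.
    if set().union(*before.values()) != set().union(*after.values()):
        problems.append("distinct fragment set changed")
    # Each canonical term holds exactly the union of its source rows' fragments.
    for term in expected_terms & set(after):
        sources = {f for t, fs in before.items() if merge_map.get(t, t) == term for f in fs}
        if set(after[term]) != sources:
            problems.append(f"fragments differ for {term!r}")
    return problems


if __name__ == "__main__":
    after = merge(BEFORE, MERGE_MAP)
    print(len(BEFORE), "->", len(after), "terms")    # prints: 5 -> 3 terms
    print(terms_for_fragment(after, "DRUG_A"))       # prints: ['asthma', 'urticaria']
    print(validate_merge(BEFORE, after, MERGE_MAP))  # prints: []
```

Returning a list of problems rather than asserting keeps the checks usable in a script or CI step, and a deliberately broken mapping (for example, one that re-adds `allergic asthma`) produces a non-empty list.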

Now let me also verify the CLUSTER_MAPPING_SQL changes by checking the specific lines.

CLUSTER_MAPPING_SQL now has 3 entries for `'asthma'`:

- `('asthma', 'AST_COD')` (was `allergic asthma`)
- `('asthma', 'eFI2_Asthma')` (unchanged)
- `('asthma', 'SEVAST_COD')` (was `severe persistent allergic asthma`)

And `('urticaria', 'XSAL_COD')` is untouched.

Now let me mark the task as complete in IMPLEMENTATION_PLAN.md.

Now let me commit the changes and update progress.txt.

I see many modified files from prior work. Let me check which changes in those files are mine.

The diff is clean and precise. Now let me update progress.txt and then commit both files together.

Now let me commit the changes.

Now let me update the commit hash in progress.txt.

**Iteration 2 complete.** Here's the summary:

**Task 1.2: Merge related asthma Search_Terms** (DONE)

Changes made:

1. **CLUSTER_MAPPING_SQL**: `'allergic asthma'` → `'asthma'` (AST_COD) and `'severe persistent allergic asthma'` → `'asthma'` (SEVAST_COD). Now 3 Cluster_IDs all map to `'asthma'`.
2. **SEARCH_TERM_MERGE_MAP**: New constant mapping the asthma variants → canonical `"asthma"`. Reusable by other modules.
3. **load_drug_indication_mapping()**: Applies the merge map at CSV load time. Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, OMALIZUMAB, RESLIZUMAB.
4. **urticaria** stays separate (XSAL_COD, OMALIZUMAB only).

All validation passed. Commits: `b0a8a9d`, `4fed0e5`.

**Next iteration**: Task 1.1 (update the Snowflake query to return ALL matches with code_frequency) or Task 2.1 (`assign_drug_indications` function).