chore: archive unused files and move legacy code to can_delete

archive/ - unused reference files (no active code references):
- LookupSearchTermCleanedDrugName.csv, condition_directorate_mapping.csv
- na_directory_rows.csv (diagnostic output), ta-recommendations.xlsx
- snomed_indication_mapping_query.sql (source for embedded SQL)
- IMPROVEMENT_RECOMMENDATIONS.md, power query.pq

archive/can_delete/ - legacy code and logs safe to remove:
- dashboard_gui.py (replaced by Reflex app)
- pathways_app_old.py.bak (old backup)
- Ralph loop iteration logs (iterations 2-8)
2026-02-06 01:01:02 +00:00
parent a31907aa1f
commit bb93c1673e
23 changed files with 121509 additions and 0 deletions
@@ -0,0 +1,99 @@
+
+
+I'll start by reading the required files in order to understand the current state of the project.
+Now let me check recent git history and read the current state of the key file I'll need to modify.
+Good. I've read all the required files. Let me assess the current state:
+
+**Completed tasks:**
+- Task 1.3: `load_drug_indication_mapping()` and `get_search_terms_for_drug()` (done in Iteration 1)
+
+**Remaining tasks (Phase 1):**
+- Task 1.1: Update `get_patient_indication_groups()` to return ALL matches with code_frequency
+- Task 1.2: Merge asthma Search_Terms in CLUSTER_MAPPING_SQL and `load_drug_indication_mapping()`
+
+**Task selection reasoning:**
+Progress.txt recommends either Task 1.2 (asthma merge: self-contained, testable locally) or Task 1.1 (Snowflake query update). I'll choose **Task 1.2** because:
+1. It's fully self-contained and testable without Snowflake connectivity
+2. It's a prerequisite for getting clean data from both the Snowflake query AND the drug mapping
+3. It's simpler and lower risk, which makes it a good foundation before modifying the Snowflake query
+
+Let me start by understanding what needs to change. I need to:
+1. Update `CLUSTER_MAPPING_SQL` to merge 3 asthma Search_Terms into one `"asthma"` entry
+2. Update `load_drug_indication_mapping()` to merge asthma variants when loading DimSearchTerm.csv
+3. Verify the changes work correctly
+
+First, let me check DimSearchTerm.csv to see the asthma-related rows.
+Let me check the asthma-related entries in DimSearchTerm.csv.
+Now let me check the current CLUSTER_MAPPING_SQL entries for asthma. I can see from the file I already read:
+- `('allergic asthma', 'AST_COD')` (line 1201)
+- `('asthma', 'eFI2_Asthma')` (line 1210)
+- `('severe persistent allergic asthma', 'SEVAST_COD')` (line 1316)
+- `('urticaria', 'XSAL_COD')` (line 1337)
+
+The plan for CLUSTER_MAPPING_SQL is to map all 3 asthma Cluster_IDs to `Search_Term = 'asthma'`:
+- `('asthma', 'AST_COD')` (was allergic asthma)
+- `('asthma', 'eFI2_Asthma')` (stays the same)
+- `('asthma', 'SEVAST_COD')` (was severe persistent allergic asthma)
+
+`urticaria` stays as-is.
+Now updating CLUSTER_MAPPING_SQL. I need to change `'allergic asthma'` → `'asthma'` and `'severe persistent allergic asthma'` → `'asthma'`. This means the CTE will have multiple rows mapping to `Search_Term = 'asthma'` with different Cluster_IDs (AST_COD, eFI2_Asthma, SEVAST_COD). That's fine: the UNION in AllIndicationCodes will just give us more SNOMED codes mapping to `'asthma'`, which is what we want.
+Good. Now the CLUSTER_MAPPING_SQL has 3 rows for `'asthma'` with Cluster_IDs: AST_COD, eFI2_Asthma, SEVAST_COD. The GP lookup will return `Search_Term = 'asthma'` for any of these clusters. `urticaria` with `XSAL_COD` stays separate.
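The many-to-one mapping described above can be illustrated with a small sketch. This is not the project's actual query: `sqlite3` stands in for Snowflake, and the `cluster_codes` table, the `lookup_search_terms` helper, and the code values are all hypothetical. Only the four `(Search_Term, Cluster_ID)` pairs come from the log.

```python
import sqlite3

# Hypothetical stand-in for the real query: sqlite3 replaces Snowflake, and
# the cluster_codes table and its code values are invented for illustration.
QUERY = """
WITH ClusterMapping (Search_Term, Cluster_ID) AS (
    VALUES
        ('asthma', 'AST_COD'),
        ('asthma', 'eFI2_Asthma'),
        ('asthma', 'SEVAST_COD'),
        ('urticaria', 'XSAL_COD')
)
SELECT cc.code, cm.Search_Term
FROM cluster_codes AS cc
JOIN ClusterMapping AS cm ON cm.Cluster_ID = cc.cluster_id
ORDER BY cc.code
"""


def lookup_search_terms(rows):
    """Load (cluster_id, code) rows and return (code, Search_Term) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cluster_codes (cluster_id TEXT, code TEXT)")
        conn.executemany("INSERT INTO cluster_codes VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Placeholder codes, one per cluster; real SNOMED codes are not shown here.
SAMPLE_ROWS = [
    ("AST_COD", "code-001"),
    ("eFI2_Asthma", "code-002"),
    ("SEVAST_COD", "code-003"),
    ("XSAL_COD", "code-004"),
]

RESULT = lookup_search_terms(SAMPLE_ROWS)
```

Under these assumptions, codes from all three asthma clusters resolve to the single `'asthma'` term, while the `XSAL_COD` code still resolves to `'urticaria'`.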
+
+Now I need to update `load_drug_indication_mapping()` to merge the asthma variants. When loading DimSearchTerm.csv, rows with Search_Term = "allergic asthma" or "severe persistent allergic asthma" should have their drug fragments merged under a single "asthma" key.
+I need to add a Search_Term normalization step in `load_drug_indication_mapping()`. The approach: define a mapping of asthma variant names → "asthma", and apply it when reading each row. This keeps the merge logic contained in one place.
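A minimal sketch of that normalization step, assuming a CSV with `Search_Term` and `Drug_Fragments` columns and `|`-separated fragments. The column names, the delimiter, the `load_mapping` helper, and the placeholder drug names are assumptions for illustration; only the `SEARCH_TERM_MERGE_MAP` name and the two variant terms come from the log.

```python
import csv
import io
from collections import defaultdict

# Variant Search_Terms collapsed into one canonical term at load time.
SEARCH_TERM_MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}


def load_mapping(csv_text):
    """Return {Search_Term: [fragments]} with merge-map variants combined.

    Hypothetical helper: the real load_drug_indication_mapping() may use a
    different schema and different parsing rules.
    """
    merged = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["Search_Term"].strip().lower()
        # Normalize variants to the canonical term; other terms pass through.
        term = SEARCH_TERM_MERGE_MAP.get(term, term)
        for fragment in row["Drug_Fragments"].split("|"):
            fragment = fragment.strip().upper()
            # Skip blanks and avoid listing a fragment twice under one term.
            if fragment and fragment not in merged[term]:
                merged[term].append(fragment)
    return dict(merged)


# Placeholder data; the real DimSearchTerm.csv rows are not reproduced here.
SAMPLE_CSV = """Search_Term,Drug_Fragments
allergic asthma,DRUG_A
asthma,DRUG_B|DRUG_C
severe persistent allergic asthma,DRUG_D
urticaria,DRUG_A
"""

MAPPING = load_mapping(SAMPLE_CSV)
```

Applying the map on the key before accumulating fragments means downstream code never sees the variant names, and a drug such as `DRUG_A` can still appear under both `asthma` and `urticaria`.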
+Now I need to update the `__all__` exports to include the new constant, and update the log message to note the merge.
+
+Let me also update the `__all__` list to include `SEARCH_TERM_MERGE_MAP` for potential use by other modules.
+Now let me validate the changes.
+Syntax check passed. Now let me test the import and verify the asthma merge works correctly.
+All validations pass:
+
+1. **Asthma merge works**: `asthma` now has the combined drug list: OMALIZUMAB, BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, RESLIZUMAB (from all 3 original rows)
+2. **Variants removed**: `allergic asthma` and `severe persistent allergic asthma` no longer exist as separate entries
+3. **Urticaria separate**: `urticaria` → `['OMALIZUMAB']`, correctly kept separate
+4. **OMALIZUMAB maps correctly**: to both `asthma` and `urticaria` (not just one)
+5. **Drug matching still works**: ADALIMUMAB maps to 7 indications as expected
+6. **Count changes**: 162 Search_Terms (was 164; 2 fewer because 3 asthma entries merged into 1)
+7. **Fragment count unchanged**: 346 (no fragments lost, just redistributed)
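The count checks in the list above follow from simple arithmetic: merging 3 terms into 1 removes 2 terms, and the fragment total stays the same as long as the merged rows share no fragments. A hedged sketch of such checks, using hypothetical helper names and placeholder data rather than the real 164-term mapping:

```python
# Hypothetical before/after mappings with placeholder fragments; the real
# loader reads DimSearchTerm.csv, which is not reproduced here.
MERGE_MAP = {
    "allergic asthma": "asthma",
    "severe persistent allergic asthma": "asthma",
}

BEFORE = {
    "allergic asthma": ["DRUG_A"],
    "asthma": ["DRUG_B", "DRUG_C"],
    "severe persistent allergic asthma": ["DRUG_D"],
    "urticaria": ["DRUG_A"],
}

AFTER = {
    "asthma": ["DRUG_A", "DRUG_B", "DRUG_C", "DRUG_D"],
    "urticaria": ["DRUG_A"],
}


def expected_term_reduction(before, merge_map):
    """How many terms should disappear once variants are merged."""
    canonical_terms = {merge_map.get(term, term) for term in before}
    return len(before) - len(canonical_terms)


def total_fragments(mapping):
    """Total number of drug fragments across all terms."""
    return sum(len(fragments) for fragments in mapping.values())


def check_merge(before, after, merge_map):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    leftovers = [term for term in merge_map if term in after]
    if leftovers:
        problems.append(f"variants still present: {leftovers}")
    if len(before) - len(after) != expected_term_reduction(before, merge_map):
        problems.append("unexpected change in term count")
    # Assumes the merged rows share no fragments, so nothing is deduplicated.
    if total_fragments(before) != total_fragments(after):
        problems.append("fragment total changed")
    return problems


REDUCTION = expected_term_reduction(BEFORE, MERGE_MAP)
PROBLEMS = check_merge(BEFORE, AFTER, MERGE_MAP)
```

With the real data the same arithmetic would give 164 - 162 = 2 for the term reduction, matching item 6 above.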
+
+Now let me also verify the CLUSTER_MAPPING_SQL changes by checking the specific lines.
+CLUSTER_MAPPING_SQL now has 3 entries for `'asthma'`:
+- `('asthma', 'AST_COD')` (was `allergic asthma`)
+- `('asthma', 'eFI2_Asthma')` (unchanged)
+- `('asthma', 'SEVAST_COD')` (was `severe persistent allergic asthma`)
+
+And `('urticaria', 'XSAL_COD')` is untouched. 
+
+Now let me mark the task in the IMPLEMENTATION_PLAN.md.
+Now let me commit the changes and update progress.txt.
+I see many modified files from prior work. Let me check what my specific changes are in the files I modified.
+The diff is clean and precise. Now let me update progress.txt and then commit both files together.
+Now let me commit the changes.
+Now let me update the commit hash in progress.txt.
+**Iteration 2 complete.** Here's the summary:
+
+**Task 1.2: Merge related asthma Search_Terms** - DONE
+
+Changes made:
+1. **CLUSTER_MAPPING_SQL**: `'allergic asthma'` → `'asthma'` (AST_COD) and `'severe persistent allergic asthma'` → `'asthma'` (SEVAST_COD). Now 3 Cluster_IDs all map to `'asthma'`.
+2. **SEARCH_TERM_MERGE_MAP**: New constant mapping asthma variants → canonical `"asthma"`. Reusable by other modules.
+3. **load_drug_indication_mapping()**: Applies merge map at CSV load time. Combined asthma drug list: BENRALIZUMAB, DUPILUMAB, INHALED, MEPOLIZUMAB, OMALIZUMAB, RESLIZUMAB.
+4. **urticaria** stays separate (XSAL_COD, OMALIZUMAB only).
+
+All validation passed. Commits: `b0a8a9d`, `4fed0e5`.
+
+**Next iteration**: Task 1.1 (update Snowflake query to return ALL matches with code_frequency) or Task 2.1 (assign_drug_indications function).
+