refactor: reorganize repository to src/ layout

Move 6 packages (core, config, data_processing, analysis, visualization, cli) into src/ to reduce root clutter. Merge tools/data.py into data_processing/transforms.py. Move docs to docs/. Path resolution via .pth file (setup_dev.py), pytest pythonpath config, and sys.path bootstrap in rxconfig.py and CLI entry points. Clean up pyproject.toml deps (remove stale pins, add snowflake-connector-python). Fix tomllib import for Python 3.10 compatibility. All 113 tests pass.
2026-02-06 12:03:48 +00:00
parent 1581b1d3dd
commit 76838887e6
40 changed files with 589 additions and 214 deletions
@@ -18,10 +18,11 @@ NHS High-Cost Drug Patient Pathway Analysis Tool - a web-based application that

 ```bash
 # Install dependencies
-pip install -r requirements.txt
-# OR with uv
 uv sync

+# One-time dev setup: adds src/ to Python path via .pth file
+uv run python setup_dev.py
+
 # Initialize/migrate the database (creates pathway tables)
 python -m data_processing.migrate

@@ -75,53 +76,53 @@ The refresh command:

 ```
 .
-├── core/                    # Core configuration and models
-│   ├── config.py           # PathConfig dataclass for file paths
-│   ├── models.py           # AnalysisFilters dataclass
-│   └── logging_config.py   # Structured logging setup
+├── src/                         # All application library code
+│   ├── core/                    # Foundation: paths, models, logging
+│   │   ├── config.py           # PathConfig dataclass for file paths
+│   │   ├── models.py           # AnalysisFilters dataclass
+│   │   └── logging_config.py   # Structured logging setup
+│   │
+│   ├── config/                  # Service configuration
+│   │   ├── __init__.py         # SnowflakeConfig + loader
+│   │   └── snowflake.toml      # Connection settings (co-located with loader)
+│   │
+│   ├── data_processing/         # Data layer
+│   │   ├── database.py         # SQLite connection management
+│   │   ├── schema.py           # Database schema (reference + pathway tables)
+│   │   ├── pathway_pipeline.py # Pipeline: Snowflake → SQLite
+│   │   ├── transforms.py       # Data transformations (UPID, drug names, directory)
+│   │   ├── loader.py           # FileDataLoader for CSV/Parquet files
+│   │   ├── reference_data.py   # Reference data migration
+│   │   ├── snowflake_connector.py  # Snowflake integration
+│   │   ├── cache.py            # Query result caching
+│   │   ├── data_source.py      # Data source fallback chain
+│   │   └── diagnosis_lookup.py # GP diagnosis lookup (SNOMED clusters)
+│   │
+│   ├── analysis/                # Analysis pipeline
+│   │   ├── pathway_analyzer.py # prepare_data, calculate_statistics, build_hierarchy
+│   │   └── statistics.py       # Statistical calculation functions
+│   │
+│   ├── visualization/           # Chart generation
+│   │   └── plotly_generator.py # create_icicle_figure, save_figure_html
+│   │
+│   └── cli/                     # CLI tools
+│       └── refresh_pathways.py # Data refresh command
 │
-├── cli/                     # Command-line interface tools
-│   ├── __init__.py
-│   └── refresh_pathways.py # CLI to refresh pre-computed pathway data
+├── pathways_app/                # Reflex web app (stays at root — framework requirement)
+│   ├── pathways_app.py         # AppState + page components
+│   └── components/             # Layout and navigation components
 │
-├── data_processing/         # Data layer
-│   ├── database.py         # SQLite connection management
-│   ├── schema.py           # Database schema (reference + pathway tables)
-│   ├── pathway_pipeline.py # Pathway processing pipeline (Snowflake → SQLite)
-│   ├── loader.py           # FileDataLoader for CSV/Parquet files
-│   ├── reference_data.py   # Reference data migration
-│   ├── snowflake_connector.py  # Snowflake integration
-│   ├── cache.py            # Query result caching
-│   ├── data_source.py      # Data source fallback chain (Snowflake/file)
-│   └── diagnosis_lookup.py # GP diagnosis lookup and drug-indication mapping
-│
-├── analysis/                # Analysis pipeline
-│   ├── pathway_analyzer.py # prepare_data, calculate_statistics, build_hierarchy
-│   └── statistics.py       # Statistical calculation functions
-│
-├── visualization/           # Chart generation
-│   └── plotly_generator.py # create_icicle_figure, save_figure_html
-│
-├── pathways_app/           # Reflex web application
-│   ├── pathways_app.py     # State class and page components
-│   └── components/         # Layout and navigation components
-│
-├── tools/                   # Legacy modules
-│   ├── dashboard_gui.py    # Original analysis engine (being refactored)
-│   └── data.py             # Data transformations (UPID, drug names, directory)
-│
-├── config/                  # Configuration files
-│   └── snowflake.toml      # Snowflake connection settings
-│
-├── data/                    # Reference data and database
-│   ├── pathways.db         # SQLite database (includes pathway_nodes)
-│   └── *.csv               # Reference data files
-│
-└── tests/                   # Test suite
-    ├── conftest.py         # Pytest fixtures
-    └── test_*.py           # Test modules
+├── tests/                       # Test suite (113 tests)
+├── data/                        # Reference data + SQLite DB
+├── docs/                        # Documentation
+├── assets/                      # Static assets (logo, favicon)
+├── archive/                     # Historical/deprecated
+└── logs/                        # Runtime logs
 ```

+**Path resolution**: `src/` is added to `sys.path` via a `.pth` file (created by `setup_dev.py`).
+All imports use package names directly: `from core import ...`, `from data_processing import ...`, etc.
+
 ### Pathway Data Architecture

 The application uses a pre-computed pathway architecture for performance:
@@ -252,16 +253,12 @@ The `AppState` class manages all application state:
 - Switching reloads pathway data from SQLite filtered by `chart_type`
 - Note: Directory filter only applies to directory charts (indication charts store Search_Terms in the directory column)

-### Legacy Modules (`tools/`)
+### Data Transformations (`data_processing/transforms.py`)

-Still used during transition:
-
- **tools/data.py** - Data transformation functions:
-  - `patient_id()` - Creates UPID = Provider Code (first 3 chars) + PersonKey
-  - `drug_names()` - Standardizes via drugnames.csv lookup
-  - `department_identification()` - 5-level fallback chain for directory assignment
-
- **tools/dashboard_gui.py** - Original analysis engine (being replaced by `analysis/` module)
+Core data transformation functions used by the pipeline:
+- `patient_id()` - Creates UPID = Provider Code (first 3 chars) + PersonKey
+- `drug_names()` - Standardizes via drugnames.csv lookup
+- `department_identification()` - 5-level fallback chain for directory assignment

 ### Data Flow

@@ -274,7 +271,7 @@ Still used during transition:
           │
           ▼ (fetch_and_transform_data)
    ┌──────────────────────────────────────────┐
-    │ Data Transformations (tools/data.py)     │
+    │ Data Transformations (data_processing/transforms.py)     │
    │   → patient_id() creates UPID            │
    │   → drug_names() standardizes names      │
    │   → department_identification() → Dir    │
@@ -461,7 +458,7 @@ Test coverage includes:

 ## Configuration

-### Snowflake Connection (`config/snowflake.toml`)
+### Snowflake Connection (`src/config/snowflake.toml`)

 ```toml
 [snowflake]
@@ -475,7 +472,7 @@ authenticator = "externalbrowser"  # Required for NHS SSO
 ### Logging

 Logs are written to `logs/` directory with structured format.
-Configure via `core/logging_config.py`.
+Configure via `src/core/logging_config.py`.

 ## Breaking Changes from Original App

@@ -519,13 +516,13 @@ The pre-computed pathway architecture introduces these changes:

 ### Adding New Analysis Features

-1. Add statistical functions to `analysis/statistics.py`
-2. Integrate into pipeline in `analysis/pathway_analyzer.py`
-3. Update visualization in `visualization/plotly_generator.py`
+1. Add statistical functions to `src/analysis/statistics.py`
+2. Integrate into pipeline in `src/analysis/pathway_analyzer.py`
+3. Update visualization in `src/visualization/plotly_generator.py`

 ### Adding New Reference Data

 1. Add CSV file to `data/` directory
-2. Define schema in `data_processing/schema.py`
-3. Create migration function in `data_processing/reference_data.py`
-4. Add path to `PathConfig` in `core/config.py`
+2. Define schema in `src/data_processing/schema.py`
+3. Create migration function in `src/data_processing/reference_data.py`
+4. Add path to `PathConfig` in `src/core/config.py`