# NHS High-Cost Drug Patient Pathway Analysis Tool A web-based application for analyzing secondary care patient treatment pathways. It processes clinical activity data to visualize hierarchical treatment patterns (Trust → Directory/Specialty → Drug → Patient pathway) as interactive Plotly icicle charts. ## Features - **Interactive Visualization**: Plotly icicle charts showing patient treatment hierarchies with cost and frequency statistics - **Multi-Source Data Loading**: CSV/Parquet files, SQLite database, or direct Snowflake integration - **GP Diagnosis Validation**: Validate patient indications against GP SNOMED codes via NHS Snowflake - **Modern Web Interface**: Browser-based UI using Reflex framework with NHS branding - **Flexible Filtering**: Filter by date range, NHS trusts, drugs, and medical directories - **Export Options**: Export charts as interactive HTML or data as CSV ## Requirements - Python 3.10 or higher - pip or uv package manager ### Optional (for Snowflake integration) - `snowflake-connector-python` package - Access to NHS Snowflake data warehouse with SSO authentication ## Installation ### Using pip ```bash # Clone the repository git clone cd patient-pathway-analysis # Install dependencies pip install -r requirements.txt ``` ### Using uv (recommended) ```bash # Install uv if not already installed pip install uv # Sync dependencies uv sync ``` ### Install with test dependencies ```bash pip install -e ".[test]" ``` ## Quick Start ### 1. Run the Web Application (Recommended) ```bash reflex run ``` Open http://localhost:3000 in your browser. ## Usage ### Web Interface (Reflex) 1. **Load Data**: On the home page, select your data source: - **SQLite Database**: Uses pre-loaded data from `data/pathways.db` - **File Upload**: Drag and drop a CSV or Parquet file - **Snowflake**: Fetch data directly from NHS Snowflake (requires configuration) 2. **Configure Filters**: - Set date range (Start Date, End Date, Last Seen After) - Navigate to Drug/Trust/Directory selection pages using the sidebar - Use search boxes to find and select items - Set minimum patient threshold to filter small groups 3. **Run Analysis**: Click "Run Analysis" to generate the icicle chart 4. **Export Results**: - **Export HTML**: Save the interactive chart as a standalone HTML file - **Export CSV**: Export the filtered data as a CSV file ### Data Migration To populate the SQLite database from CSV files: ```bash # Initialize database schema python -m data_processing.migrate # Load reference data from CSV files python -m data_processing.migrate --reference-data --verify # Load patient data from a CSV/Parquet file python -m data_processing.migrate --load-patient-data path/to/data.csv ``` ### Snowflake Configuration To use Snowflake integration, edit `config/snowflake.toml`: ```toml [connection] account = "your-account-identifier" warehouse = "your-warehouse" database = "DATA_HUB" schema = "CDM" authenticator = "externalbrowser" # NHS SSO authentication ``` ## Project Structure ``` . ├── core/ # Core configuration and models ├── data_processing/ # Data layer (SQLite, Snowflake, loaders) ├── analysis/ # Analysis pipeline (refactored from generate_graph) ├── visualization/ # Chart generation (Plotly) ├── pathways_app/ # Reflex web application ├── tools/ # Legacy modules (original analysis engine) ├── config/ # Configuration files ├── data/ # Reference data and SQLite database ├── docs/ # Additional documentation └── tests/ # Test suite ``` See `CLAUDE.md` for detailed architecture documentation. ## Documentation - [docs/USER_GUIDE.md](docs/USER_GUIDE.md) - End-user guide for using the web interface - [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) - Production deployment guide (Docker, nginx, cloud) - [CLAUDE.md](CLAUDE.md) - Technical architecture documentation for developers ## Deployment Quick production start: ```bash # Run in production mode reflex run --env prod ``` ## Running Tests ```bash # Run all tests python -m pytest tests/ -v # Run with coverage python -m pytest tests/ -v --cov=core --cov=data_processing --cov=analysis # Run only fast tests (exclude slow/integration) python -m pytest tests/ -v -m "not slow" ``` ## Reference Data Files The `data/` directory contains essential reference files: | File | Purpose | |------|---------| | `include.csv` | Drug filter list with default selections | | `defaultTrusts.csv` | NHS Trust list for filtering | | `directory_list.csv` | Medical specialties/directories | | `drugnames.csv` | Drug name standardization mapping | | `org_codes.csv` | Provider code to organization name mapping | | `drug_directory_list.csv` | Valid drug-to-directory mappings | | `drug_indication_clusters.csv` | Drug to SNOMED cluster mappings | | `ta-recommendations.xlsx` | NICE TA recommendations | ## Troubleshooting ### Reflex compilation errors If you encounter compilation errors when running `reflex run`: ```bash # Clear the build cache and restart rm -rf .web reflex run ``` ### Snowflake connection issues 1. Ensure `snowflake-connector-python` is installed: ```bash pip install snowflake-connector-python ``` 2. Check that `config/snowflake.toml` has the correct account identifier 3. For SSO authentication, a browser window will open automatically ### SQLite database not found If `data/pathways.db` doesn't exist, create it: ```bash python -m data_processing.migrate python -m data_processing.migrate --reference-data ``` ## Development ### Code Quality ```bash # Type checking python -m mypy core/ data_processing/ analysis/ --ignore-missing-imports # Run tests with coverage report python -m pytest tests/ -v --cov=core --cov=data_processing --cov-report=html ``` ### Adding New Reference Data 1. Add CSV file to `data/` directory 2. Define schema in `data_processing/schema.py` 3. Create migration function in `data_processing/reference_data.py` 4. Add path to `PathConfig` in `core/config.py` ## License Internal NHS use only. Not for distribution. ## Support For questions or issues, contact the Medicines Intelligence team.