# NHS High-Cost Drug Patient Pathway Analysis Tool
A web-based application for analyzing secondary care patient treatment pathways. It processes clinical activity data to visualize hierarchical treatment patterns (Trust → Directory/Specialty → Drug → Patient pathway) as interactive Plotly icicle charts.
## Features
- **Interactive Visualization**: Plotly icicle charts showing patient treatment hierarchies with cost and frequency statistics
- **Multi-Source Data Loading**: CSV/Parquet files, SQLite database, or direct Snowflake integration
- **GP Diagnosis Validation**: Validate patient indications against GP SNOMED codes via NHS Snowflake
- **Modern Web Interface**: Browser-based UI using Reflex framework with NHS branding
- **Flexible Filtering**: Filter by date range, NHS trusts, drugs, and medical directories
- **Export Options**: Export charts as interactive HTML or data as CSV
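
As a rough illustration of the hierarchy behind the icicle chart, the sketch below aggregates a few made-up records into the `ids`, `parents`, and `values` lists that a Plotly `go.Icicle` trace accepts. The record layout, the sample values, and the `build_icicle_nodes` helper are assumptions for illustration only, not the tool's actual pipeline, and the patient-pathway level is left out for brevity.

```python
from collections import defaultdict

# Hypothetical records: (trust, directory, drug, cost). Not real data.
RECORDS = [
    ("Trust A", "Rheumatology", "Adalimumab", 1200.0),
    ("Trust A", "Rheumatology", "Etanercept", 800.0),
    ("Trust A", "Dermatology", "Adalimumab", 500.0),
    ("Trust B", "Rheumatology", "Adalimumab", 300.0),
]


def build_icicle_nodes(records):
    """Aggregate cost at every level of Trust -> Directory -> Drug.

    Returns (ids, parents, values). Each id is the full path joined with
    " / "; the parent of a top-level node is the empty string.
    """
    totals = defaultdict(float)
    for *path, cost in records:
        # Add the cost to the node at each depth of the path.
        for depth in range(1, len(path) + 1):
            totals[" / ".join(path[:depth])] += cost

    ids = sorted(totals)
    parents = [node.rpartition(" / ")[0] for node in ids]
    values = [totals[node] for node in ids]
    return ids, parents, values


ids, parents, values = build_icicle_nodes(RECORDS)
for node, parent, value in zip(ids, parents, values):
    print(f"{node!r:45} parent={parent!r:28} cost={value:.2f}")
```

With Plotly installed, these lists could be passed to `go.Icicle(ids=ids, parents=parents, values=values, branchvalues="total")`, together with a `labels` list holding the last path segment of each id.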
## Requirements
- Python 3.10 or higher
- pip or uv package manager
### Optional (for Snowflake integration)
- `snowflake-connector-python` package
- Access to NHS Snowflake data warehouse with SSO authentication
## Installation
### Using pip
```bash
# Clone the repository
git clone <repository-url>
cd patient-pathway-analysis
# Install dependencies
pip install -r requirements.txt
```
### Using uv (recommended)
```bash
# Install uv if not already installed
pip install uv
# Sync dependencies
uv sync
```
### Install with test dependencies
```bash
pip install -e ".[test]"
```
## Quick Start
### Run the Web Application
```bash
reflex run
```
Open http://localhost:3000 in your browser.
## Usage
### Web Interface (Reflex)
1. **Load Data**: On the home page, select your data source:
   - **SQLite Database**: Uses pre-loaded data from `data/pathways.db`
   - **File Upload**: Drag and drop a CSV or Parquet file
   - **Snowflake**: Fetch data directly from NHS Snowflake (requires configuration)
2. **Configure Filters**:
   - Set date range (Start Date, End Date, Last Seen After)
   - Navigate to Drug/Trust/Directory selection pages using the sidebar
   - Use search boxes to find and select items
   - Set minimum patient threshold to filter small groups
3. **Run Analysis**: Click "Run Analysis" to generate the icicle chart
4. **Export Results**:
   - **Export HTML**: Save the interactive chart as a standalone HTML file
   - **Export CSV**: Export the filtered data as a CSV file
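
The filter settings above amount to a few predicates over the activity data. A minimal sketch, assuming hypothetical field names (`patient_id`, `trust`, `drug`, `date`) and assuming the minimum patient threshold counts distinct patients per drug; the application's real column names and grouping may differ, and the "Last Seen After" filter is not modelled here:

```python
from datetime import date

# Hypothetical activity rows; field names are illustrative only.
ROWS = [
    {"patient_id": "P1", "trust": "Trust A", "drug": "Drug X", "date": date(2024, 1, 10)},
    {"patient_id": "P2", "trust": "Trust A", "drug": "Drug X", "date": date(2024, 3, 5)},
    {"patient_id": "P3", "trust": "Trust B", "drug": "Drug Y", "date": date(2024, 2, 1)},
    {"patient_id": "P1", "trust": "Trust A", "drug": "Drug X", "date": date(2023, 6, 1)},
]


def apply_filters(rows, start, end, trusts=None, drugs=None, min_patients=1):
    """Keep rows inside [start, end] and the selected trusts/drugs, then
    drop drugs seen in fewer than `min_patients` distinct patients."""
    kept = [
        r for r in rows
        if start <= r["date"] <= end
        and (trusts is None or r["trust"] in trusts)
        and (drugs is None or r["drug"] in drugs)
    ]
    # Count distinct patients per drug among the rows that survived.
    patients_per_drug = {}
    for r in kept:
        patients_per_drug.setdefault(r["drug"], set()).add(r["patient_id"])
    return [r for r in kept if len(patients_per_drug[r["drug"]]) >= min_patients]


filtered = apply_filters(ROWS, date(2024, 1, 1), date(2024, 12, 31), min_patients=2)
print(sorted({r["drug"] for r in filtered}))
```

Passing `None` for `trusts` or `drugs` means "no restriction", which mirrors leaving a selection page untouched.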
### Data Migration
To populate the SQLite database from CSV files:
```bash
# Initialize database schema
python -m data_processing.migrate
# Load reference data from CSV files
python -m data_processing.migrate --reference-data --verify
# Load patient data from a CSV/Parquet file
python -m data_processing.migrate --load-patient-data path/to/data.csv
```
### Snowflake Configuration
To use Snowflake integration, edit `config/snowflake.toml`:
```toml
[connection]
account = "your-account-identifier"
warehouse = "your-warehouse"
database = "DATA_HUB"
schema = "CDM"
authenticator = "externalbrowser" # NHS SSO authentication
```
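
For reference, a file of this shape can be read and sanity-checked in a few lines of Python. This is a sketch rather than the tool's own loader: the `connection_params` helper and the decision to treat all five keys as required are assumptions. `tomllib` is part of the standard library from Python 3.11; on 3.10 the `tomli` package offers the same interface.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib

# Assumption: every key shown in the example config is required.
REQUIRED_KEYS = ("account", "warehouse", "database", "schema", "authenticator")

# Same shape as config/snowflake.toml, inlined so the sketch runs anywhere.
EXAMPLE_TOML = """
[connection]
account = "your-account-identifier"
warehouse = "your-warehouse"
database = "DATA_HUB"
schema = "CDM"
authenticator = "externalbrowser"
"""


def connection_params(toml_text):
    """Parse the [connection] table and fail early on missing keys."""
    section = tomllib.loads(toml_text).get("connection", {})
    missing = [key for key in REQUIRED_KEYS if not section.get(key)]
    if missing:
        raise ValueError(f"snowflake.toml is missing: {', '.join(missing)}")
    return {key: section[key] for key in REQUIRED_KEYS}


params = connection_params(EXAMPLE_TOML)
print(params["database"], params["authenticator"])
```

The key names match keyword arguments of `snowflake.connector.connect`, so the resulting dictionary could be unpacked into that call; a real connection may also need further settings such as `user`.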
## Project Structure
```
.
├── core/ # Core configuration and models
├── data_processing/ # Data layer (SQLite, Snowflake, loaders)
├── analysis/ # Analysis pipeline (refactored from generate_graph)
├── visualization/ # Chart generation (Plotly)
├── pathways_app/ # Reflex web application
├── tools/ # Legacy modules (original analysis engine)
├── config/ # Configuration files
├── data/ # Reference data and SQLite database
├── docs/ # Additional documentation
└── tests/ # Test suite
```
See `CLAUDE.md` for detailed architecture documentation.
## Documentation
- [docs/USER_GUIDE.md](docs/USER_GUIDE.md) - End-user guide for using the web interface
- [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) - Production deployment guide (Docker, nginx, cloud)
- [CLAUDE.md](CLAUDE.md) - Technical architecture documentation for developers
## Deployment
Quick production start (see [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) for Docker, nginx, and cloud deployments):
```bash
# Run in production mode
reflex run --env prod
```
## Running Tests
```bash
# Run all tests
python -m pytest tests/ -v
# Run with coverage
python -m pytest tests/ -v --cov=core --cov=data_processing --cov=analysis
# Run only fast tests (exclude slow/integration)
python -m pytest tests/ -v -m "not slow"
```
## Reference Data Files
The `data/` directory contains essential reference files:
| File | Purpose |
|------|---------|
| `include.csv` | Drug filter list with default selections |
| `defaultTrusts.csv` | NHS Trust list for filtering |
| `directory_list.csv` | Medical specialties/directories |
| `drugnames.csv` | Drug name standardization mapping |
| `org_codes.csv` | Provider code to organization name mapping |
| `drug_directory_list.csv` | Valid drug-to-directory mappings |
| `drug_indication_clusters.csv` | Drug to SNOMED cluster mappings |
| `ta-recommendations.xlsx` | NICE TA recommendations |
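
To make the role of a mapping file such as `drugnames.csv` concrete, the sketch below normalizes raw drug names through a lookup table. The column names (`raw_name`, `standard_name`), the sample rows, and the case-insensitive matching are all assumptions; the real file layout may differ.

```python
import csv
import io

# Hypothetical contents; the real drugnames.csv layout is not shown here.
SAMPLE_CSV = """raw_name,standard_name
ADALIMUMAB 40MG PEN,ADALIMUMAB
adalimumab (biosimilar),ADALIMUMAB
Etanercept 50mg,ETANERCEPT
"""


def load_mapping(csv_file):
    """Build a lookup keyed on the trimmed, upper-cased raw name."""
    return {
        row["raw_name"].strip().upper(): row["standard_name"].strip()
        for row in csv.DictReader(csv_file)
    }


def standardize(name, mapping):
    """Return the standard name, or the cleaned input when no mapping exists."""
    key = name.strip().upper()
    return mapping.get(key, key)


mapping = load_mapping(io.StringIO(SAMPLE_CSV))
print(standardize("  adalimumab 40mg pen ", mapping))
print(standardize("Unknown Drug", mapping))
```

Falling back to the cleaned input keeps unmapped names visible in the output instead of silently dropping them.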
## Troubleshooting
### Reflex compilation errors
If you encounter compilation errors when running `reflex run`:
```bash
# Clear the build cache and restart
rm -rf .web
reflex run
```
### Snowflake connection issues
1. Ensure `snowflake-connector-python` is installed:
   ```bash
   pip install snowflake-connector-python
   ```
2. Check that `config/snowflake.toml` has the correct account identifier
3. For SSO authentication, a browser window will open automatically
### SQLite database not found
If `data/pathways.db` doesn't exist, create it:
```bash
python -m data_processing.migrate
python -m data_processing.migrate --reference-data
```
## Development
### Code Quality
```bash
# Type checking
python -m mypy core/ data_processing/ analysis/ --ignore-missing-imports
# Run tests with coverage report
python -m pytest tests/ -v --cov=core --cov=data_processing --cov-report=html
```
### Adding New Reference Data
1. Add CSV file to `data/` directory
2. Define schema in `data_processing/schema.py`
3. Create migration function in `data_processing/reference_data.py`
4. Add path to `PathConfig` in `core/config.py`
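
The four steps can be pictured roughly as follows. Everything here is a stand-in: the `PathConfig` shape, the `ref_example` table, and `migrate_example` are hypothetical and do not reflect the actual contents of `core/config.py`, `data_processing/schema.py`, or `data_processing/reference_data.py`.

```python
import csv
import sqlite3
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PathConfig:
    """Step 4 (hypothetical shape): one attribute per reference file."""
    data_dir: Path

    @property
    def example_csv(self) -> Path:
        return self.data_dir / "example.csv"  # step 1: the new file


# Step 2 (hypothetical): schema for the new table.
EXAMPLE_SCHEMA = "CREATE TABLE IF NOT EXISTS ref_example (code TEXT PRIMARY KEY, name TEXT)"


def migrate_example(conn: sqlite3.Connection, paths: PathConfig) -> int:
    """Step 3 (hypothetical): load the CSV into its table, replacing old rows."""
    conn.execute(EXAMPLE_SCHEMA)
    with paths.example_csv.open(newline="", encoding="utf-8") as handle:
        rows = [(r["code"], r["name"]) for r in csv.DictReader(handle)]
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM ref_example")
        conn.executemany("INSERT INTO ref_example VALUES (?, ?)", rows)
    return len(rows)


# Demo against a temporary directory and an in-memory database.
with tempfile.TemporaryDirectory() as tmp:
    paths = PathConfig(Path(tmp))
    paths.example_csv.write_text("code,name\nA1,Alpha\nB2,Beta\n", encoding="utf-8")
    conn = sqlite3.connect(":memory:")
    loaded = migrate_example(conn, paths)
    names = [row[0] for row in conn.execute("SELECT name FROM ref_example ORDER BY code")]
    print(loaded, names)
```

Deleting existing rows before inserting makes the sketch safe to re-run; whether the real migration replaces or appends is not stated here.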
## License
Internal NHS use only. Not for distribution.
## Support
For questions or issues, contact the Medicines Intelligence team.