initial commit

Andrew Charlwood
2026-05-12 16:40:03 +01:00
commit 647d1bfa7f
38 changed files with 2715 additions and 0 deletions
@@ -0,0 +1,81 @@
# Data Sources And Join Patterns
This repo assumes access to the Norfolk and Suffolk Snowflake environment used for medicines optimisation analysis.
## Core Medicines Sources
`REPORTING_DATASETS_ICB.SCRATCHPAD."MEDS__UnifiedPrescribingTable"`
Use this for current prescribing analysis when available. It combines EMIS and TPP prescribing into a single shape with `PersonKey`, `SNOMEDCode`, `DateMedicationStart`, quantity, estimated price, source system, prescribing organisation, and current registered GP.
`NATIONAL.GPMED."MedicinesDispensedInPrimarycare"`
Use this for official dispensing activity. It is usually slower to refresh than prescribing but is better aligned to dispensing/payment concepts. Key columns include `ProcessingPeriodDate`, `PatientPseudonym`, `PaiddmdCode`, `PaidBNFCode`, `CostCentreODSCode`, `ItemCount`, `PaidQuantity`, and `TotalPaidGross`.
`DATA_HUB.DWH."DimMedicineAndDevice"`
Use this as the medicine reference table. It links SNOMED product codes to BNF, VTM, VMP, product descriptions, routes, strengths, and indicative price fields.
`DATA_HUB.DWH."DimOrganisationAndSite"`
Use this for practice names and hierarchy columns such as PCN, Place, Alliance, and INT. For Norfolk/Suffolk practice reports, the common filter is:
```sql
WHERE "OrganisationSubType" = 'GP Practice'
  AND "IsSiteActive" = 'Yes'
  AND "IsSiteNorfolkAndSuffolk" = 'Yes'
  AND "SiteCode" = "OrganisationCode"
```
`DATA_HUB.DWH."DimPerson"`
Use this for registered GP, age, demographic fields, and pseudonym-to-person links. For Suffolk-inclusive work, avoid old Norfolk-and-Waveney-only registration filters unless the report is explicitly Norfolk and Waveney only.
`DATA_HUB.PHM."PrimaryCareClinicalCoding"`
Use this for clinical coding events. Join on `PatientPseudonym`, then filter by `SNOMEDCode` and `EventDateTime`.
`DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes"`
Use maintained clinical coding clusters where possible. This is usually safer than searching SNOMED descriptions with text matching.
## Common Joins
Prescribing to medicine:
```sql
INNER JOIN DATA_HUB.DWH."DimMedicineAndDevice" med
    ON rx."SNOMEDCode" = med."ProductSnomedCode"
```
Prescribing to registered practice:
```sql
INNER JOIN DATA_HUB.DWH."DimOrganisationAndSite" gp
    ON rx."CurrentGeneralPractice" = gp."OrganisationCode"
    AND gp."SiteCode" = gp."OrganisationCode"
```
Dispensing to medicine:
```sql
INNER JOIN DATA_HUB.DWH."DimMedicineAndDevice" med
    ON gpm."PaiddmdCode" = med."ProductSnomedCode"
```
Clinical coding to maintained cluster:
```sql
INNER JOIN DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes" c
    ON cc."SNOMEDCode" = c."SNOMEDCode"
    AND c."Cluster_ID" = 'REPLACE_WITH_CLUSTER_ID'
```
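The joins above can be assembled into a single practice-level query. The following is a minimal sketch rather than one of the repo's validated templates: the `practices` CTE name, the example month, and the output aliases are illustrative assumptions, and it assumes `DateMedicationStart` is a date or timestamp column.

```sql
-- Sketch only: distinct patients per practice and product for one example month.
-- The date window and alias names are assumptions, not repo conventions.
WITH practices AS (
    -- Parent practice rows only, using the common Norfolk/Suffolk filter.
    SELECT "OrganisationCode"
    FROM DATA_HUB.DWH."DimOrganisationAndSite"
    WHERE "OrganisationSubType" = 'GP Practice'
      AND "IsSiteActive" = 'Yes'
      AND "IsSiteNorfolkAndSuffolk" = 'Yes'
      AND "SiteCode" = "OrganisationCode"
)
SELECT
    gp."OrganisationCode"          AS "PracticeCode",
    med."ProductDescription"       AS "ProductDescription",
    COUNT(DISTINCT rx."PersonKey") AS "PatientCount"
FROM REPORTING_DATASETS_ICB.SCRATCHPAD."MEDS__UnifiedPrescribingTable" rx
INNER JOIN DATA_HUB.DWH."DimMedicineAndDevice" med
    ON rx."SNOMEDCode" = med."ProductSnomedCode"
INNER JOIN practices gp
    ON rx."CurrentGeneralPractice" = gp."OrganisationCode"
WHERE rx."DateMedicationStart" >= TO_DATE('2025-07-01')
  AND rx."DateMedicationStart" <  TO_DATE('2025-08-01')
GROUP BY gp."OrganisationCode", med."ProductDescription"
ORDER BY "PatientCount" DESC
LIMIT 100;  -- quick check only; remove for a full extract
```

Filtering the practice table in a CTE first keeps the geography rules in one visible place, so the join itself only needs the practice code.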
## Choosing The Date Field
Use `ProcessingPeriodDate` for dispensing.
Use `DateMedicationStart` for prescribing activity windows.
Use `DateEventRecorded` as a conservative freshness marker for TPP prescribing extracts when checking whether the latest full month is safe to report.
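As an illustration of how these date fields can be used to check freshness before choosing a reporting period, the sketch below returns the latest date per source. It is not the repo's `latest_data_dates.sql`; the six-month TPP window, the `"Source"` and `"LatestDate"` aliases, and the assumption that each column is a date or timestamp are all illustrative.

```sql
-- Sketch only: latest available date per source, one row each.
SELECT
    'Dispensing (ProcessingPeriodDate)' AS "Source",
    MAX("ProcessingPeriodDate")::DATE   AS "LatestDate"
FROM NATIONAL.GPMED."MedicinesDispensedInPrimarycare"

UNION ALL

SELECT
    'Prescribing (DateMedicationStart)',
    MAX("DateMedicationStart")::DATE
FROM REPORTING_DATASETS_ICB.SCRATCHPAD."MEDS__UnifiedPrescribingTable"
WHERE "DateMedicationStart" <= CURRENT_DATE()  -- ignore future-dated rows

UNION ALL

SELECT
    'TPP extract (DateEventRecorded)',
    MAX("DateEventRecorded")::DATE
FROM PRIMARY_CARE.TPP."SRPrimaryCareMedication"
-- Assumed recent-window probe to limit the scan.
WHERE "DateEventRecorded" >= DATEADD(month, -6, CURRENT_DATE());
```

If the TPP date is earlier than the end of the month you intend to report, that month is probably not yet complete for prescribing.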
@@ -0,0 +1,54 @@
# Glossary For Medicines Analysts
This is a practical glossary for analysts working with medicines data. It is not clinical guidance.
## Medicine Coding
`dm+d`: Dictionary of medicines and devices. The source of SNOMED product codes used for medicines.
`SNOMEDCode`: A coded identifier. In medicines queries this is usually a dm+d product code. In clinical coding queries it is usually a clinical event code.
`VTM`: Virtual Therapeutic Moiety. Broad ingredient-level grouping, useful when you want all products for a medicine substance.
`VMP`: Virtual Medicinal Product. More specific product family, useful when strength or formulation matters.
`AMP`: Actual Medicinal Product. Branded or supplier-specific product.
`VMPP` / `AMPP`: Pack-level products.
`BNFCode`: British National Formulary hierarchy code. Useful for broad prescribing sections such as antibiotics or antidepressants.
## Activity Sources
`Prescribing`: Records from GP clinical systems showing prescriptions/issues. More current, but not the same as dispensed supply.
`Dispensing`: NHSBSA/GPMeds data showing items dispensed and paid. Better for payment-style reporting, usually with a lag.
`ProcessingPeriodDate`: The month attached to dispensing data.
`DateMedicationStart`: The prescribing date used in the unified prescribing table.
## People And Organisations
`PersonKey`: Internal person identifier used for linked analysis. Prefer this for patient counts where available.
`PatientPseudonym`: Pseudonymised patient identifier. Needed for some joins to clinical coding and dispensing.
`CurrentGeneralPractice`: Practice code attached to the patient or prescribing row.
`OrganisationCode` / `SiteCode`: Organisation and site identifiers. For practice-level reporting, use the parent practice row where `SiteCode = OrganisationCode` unless branch-level reporting is intended.
`PCN`, `Place`, `Alliance`, `INT`: Organisation hierarchy fields used for grouping practice outputs.
## Analysis Terms
`Cohort`: A defined set of patients meeting criteria.
`Denominator`: The population used to calculate a rate, for example registered patients at a practice.
`Numerator`: The count meeting the measure, for example patients prescribed a medicine.
`Quintile`: One of five ranked groups. When practices are ranked from low to high, quintile 5 is usually the highest-prescribing group.
`Long format`: Output where each indicator is a row rather than a separate column. This is useful for appending measures and charting.
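To make the numerator, denominator, and quintile terms concrete, here is a small self-contained sketch in Snowflake SQL. The practice codes and counts are invented purely for illustration, and `practice_counts` and `"RatePer1000"` are hypothetical names.

```sql
-- Sketch only: invented practice counts, not real data.
WITH practice_counts AS (
    SELECT
        column1 AS "PracticeCode",
        column2 AS "Numerator",    -- e.g. patients prescribed the medicine
        column3 AS "Denominator"   -- e.g. registered patients at the practice
    FROM VALUES
        ('P01', 12, 4000),
        ('P02', 30, 5200),
        ('P03', 45, 3900),
        ('P04', 18, 6100),
        ('P05', 60, 4800)
)
SELECT
    "PracticeCode",
    1000.0 * "Numerator" / NULLIF("Denominator", 0) AS "RatePer1000",
    -- Ranking low to high means quintile 5 holds the highest rates.
    NTILE(5) OVER (
        ORDER BY 1000.0 * "Numerator" / NULLIF("Denominator", 0)
    ) AS "Quintile"
FROM practice_counts
ORDER BY "Quintile", "PracticeCode";
```

`NULLIF` guards against a zero denominator, which would otherwise raise a division error.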
@@ -0,0 +1,42 @@
# SQL Style And Validation Guardrails
These notes are extracted from repeated medicines optimisation query work. They are intended to prevent common Snowflake and medicines-data errors.
## Snowflake Syntax
- Double-quote table and column identifiers, especially mixed-case columns such as `"PatientPseudonym"` and `"ProcessingPeriodDate"`.
- Quote aliases that will be consumed by Excel, Power BI, Python, or another SQL layer: `COUNT(*) AS "PatientCount"`.
- Use `CURRENT_DATE()` or `CURRENT_TIMESTAMP()` rather than T-SQL functions such as `GETDATE()`.
- Use `LIMIT` to cap rows during quick checks, for example `LIMIT 100`.
- Cast long-format output values to a consistent numeric type before `UNION ALL`.
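
Several of the points above can be seen together in one long-format query. This is a hedged sketch, not a repo template: the `"Source"`, `"Measure"`, and `"Value"` aliases, the `NUMBER(18, 2)` type, and the example month are assumptions.

```sql
-- Sketch only: every branch casts "Value" to the same numeric type
-- so the UNION ALL output has one consistent column definition.
SELECT
    'Prescribing'      AS "Source",
    'DistinctPatients' AS "Measure",
    CAST(COUNT(DISTINCT "PersonKey") AS NUMBER(18, 2)) AS "Value"
FROM REPORTING_DATASETS_ICB.SCRATCHPAD."MEDS__UnifiedPrescribingTable"
WHERE "DateMedicationStart" >= TO_DATE('2025-07-01')
  AND "DateMedicationStart" <  TO_DATE('2025-08-01')

UNION ALL

SELECT
    'Dispensing',
    'Items',
    CAST(SUM("ItemCount") AS NUMBER(18, 2))
FROM NATIONAL.GPMED."MedicinesDispensedInPrimarycare"
WHERE "ProcessingPeriodDate" = TO_DATE('2025-07-01')

UNION ALL

SELECT
    'Dispensing',
    'TotalPaidGross',
    CAST(SUM("TotalPaidGross") AS NUMBER(18, 2))
FROM NATIONAL.GPMED."MedicinesDispensedInPrimarycare"
WHERE "ProcessingPeriodDate" = TO_DATE('2025-07-01');
```

Column names come from the first branch, so only that branch needs quoted aliases.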
## Medicines Table Choices
- Use `REPORTING_DATASETS_ICB.SCRATCHPAD."MEDS__UnifiedPrescribingTable"` for current prescribing analysis when available.
- Use `NATIONAL.GPMED."MedicinesDispensedInPrimarycare"` for official dispensed/paid activity.
- Do not treat prescribing and dispensing as interchangeable. They answer related but different questions.
- Map prescribing SNOMED codes to BNF through `DATA_HUB.DWH."DimMedicineAndDevice"`.
- Use `"ProductDescription"` for medicine names from `DimMedicineAndDevice`. Do not assume a `"ProductName"` column.
## Geography And Denominators
- Keep the practice CTE visible in each report so a reviewer can see which practice filter was applied.
- For Norfolk/Suffolk GP practice outputs, use `DATA_HUB.DWH."DimOrganisationAndSite"` with `"IsSiteNorfolkAndSuffolk" = 'Yes'`.
- Only use older Norfolk-and-Waveney flags when the report is explicitly Norfolk and Waveney only.
- Be explicit about whether you are using organisation registered population, a counted active patient denominator, or another denominator.
## Clinical Coding
- Prefer maintained clusters in `DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes"` over text searching SNOMED descriptions.
- Use `DATA_HUB.PHM."PrimaryCareClinicalCoding"` for unified clinical coding across systems.
- Join clinical coding to patients through `PatientPseudonym`, then to `DimPerson` when person or practice fields are needed.
- Make lookback windows visible in `SET` variables or a date CTE.
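
A minimal sketch of a visible lookback window with a maintained cluster is shown below. The variable names `lookback_start` and `cluster_id`, the example date, and the `"PatientCount"` alias are assumptions; the cluster ID is a placeholder to be replaced.

```sql
-- Sketch only: the lookback window and cluster are set once, at the top.
SET lookback_start = '2025-05-01';           -- example date, adjust per report
SET cluster_id = 'REPLACE_WITH_CLUSTER_ID';  -- placeholder cluster

SELECT
    COUNT(DISTINCT cc."PatientPseudonym") AS "PatientCount"
FROM DATA_HUB.PHM."PrimaryCareClinicalCoding" cc
INNER JOIN DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes" c
    ON cc."SNOMEDCode" = c."SNOMEDCode"
    AND c."Cluster_ID" = $cluster_id
WHERE cc."EventDateTime" >= TO_DATE($lookback_start)
  AND cc."EventDateTime" <= CURRENT_TIMESTAMP();
```

Keeping the window in a `SET` variable means a reviewer can confirm the period without reading the whole query.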
## Validation Before Sharing
- Check latest data dates before selecting a period.
- Run a limited version first and inspect row counts.
- Check the output column list before handing results to someone else.
- Avoid `SELECT *` in audit/detail outputs.
- Do not commit patient pseudonyms, CSV exports, spreadsheets, images, or local tooling files.
@@ -0,0 +1,62 @@
# Template Validation Status
Last reviewed: 2026-05-12
## Scope
The reusable SQL templates in folders `01_medicine_lookups` through
`06_advanced_methods` were checked with the Snowflake MCP against live table
metadata and representative parameter values. Templates that use `SET`
variables were validated by substituting safe example literals into the final
`SELECT` statement before running `describe_query`.
The `00_copied_reference` folder and `docs/training_intro_snowflake_sql` are
historic/reference material. They are retained for learning and comparison,
not as the preferred starting point for new analysis.
## Live Metadata Checked
- `REPORTING_DATASETS_ICB.SCRATCHPAD."MEDS__UnifiedPrescribingTable"`
- `NATIONAL.GPMED."MedicinesDispensedInPrimarycare"`
- `DATA_HUB.DWH."DimMedicineAndDevice"`
- `DATA_HUB.DWH."DimOrganisationAndSite"`
- `DATA_HUB.DWH."DimPerson"`
- `DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes"`
- `DATA_HUB.PHM."PrimaryCareClinicalCoding"`
- `PRIMARY_CARE.TPP."SRPrimaryCareMedication"`
- `PRIMARY_CARE.TPP."SRPatient"`
## Query Compilation Checks
All files below passed Snowflake MCP `describe_query` validation:
- `01_medicine_lookups/medicine_reference_lookup.sql`
- `01_medicine_lookups/prescribing_by_vtm.sql`
- `01_medicine_lookups/prescribing_by_vmp.sql`
- `01_medicine_lookups/dispensing_by_vtm_or_vmp.sql`
- `01_medicine_lookups/prescribing_for_patient_pseudonym.sql`
- `02_prescribing_analysis/practice_level_bnf_prescribing_summary.sql`
- `02_prescribing_analysis/high_prescribing_practices_quintile_template.sql`
- `02_prescribing_analysis/prescribing_spend_by_patient_template.sql`
- `03_cohorts_and_clinical_coding/cluster_code_lookup.sql`
- `03_cohorts_and_clinical_coding/monthly_clinical_event_count_by_practice.sql`
- `03_cohorts_and_clinical_coding/prescribing_plus_clinical_code_cohort.sql`
- `04_rolling_and_pqs/rolling_period_generator.sql`
- `04_rolling_and_pqs/latest_data_dates.sql`
- `04_rolling_and_pqs/baseline_vs_evaluation_template.sql`
- `04_rolling_and_pqs/dual_source_long_format_measure_template.sql`
- `05_audit_detail/human_readable_prescribing_detail.sql`
- `06_advanced_methods/product_price_and_quantity_parsing_template.sql`
## Freshness Probe
The `latest_data_dates.sql` query was also executed with Snowflake MCP. At the
time of review it returned:
- `NATIONAL.GPMED."MedicinesDispensedInPrimarycare"` latest
`ProcessingPeriodDate`: `2025-07-01`
- `PRIMARY_CARE.TPP."SRPrimaryCareMedication"` latest `DateEventRecorded`
within the recent-window probe: `2026-03-28`
- `REPORTING_DATASETS_ICB.SCRATCHPAD."MEDS__UnifiedPrescribingTable"` latest
non-future `DateMedicationStart`: `2026-05-12`