Snowflake Reference
Essential database context for querying NHS data. Read this every iteration when working with Snowflake.
Snowflake MCP Server
Use mcp__snowflake-mcp__* functions to explore schema and test queries.
Schema Discovery (USE THESE FIRST)
- test_connection() - Verify connectivity
- list_databases() - List accessible databases
- list_schemas(database_name) - List schemas in a database
- list_tables(database, schema) - List tables with descriptions
- list_views(schema_name, database) - List views with descriptions
- describe_table(table_name, database) - Get detailed table schema
- describe_query(query, database) - Preview query output columns without execution
Query Execution
- read_data(query, database, max_rows) - Execute SELECT queries with row limits
- read_data_paginated(query, database, page_size, page) - Paginated results with total count
- read_data_pandas(query, database, max_rows, output_format) - Results in pandas-friendly formats
Async Query Support (long-running queries)
- execute_async(query, database) - Submit asynchronously, returns query_id
- get_query_status(query_id, database) - Check status
- get_async_results(query_id, database, max_rows) - Retrieve results
Usage Guidelines
- ALWAYS verify table structures and column names via MCP before writing queries
- Test with small result sets (LIMIT 20) before full execution
- Use describe_query to preview complex query outputs before running
- Use async queries for operations expected to take >30 seconds
Database Overview
| Database | Purpose |
|---|---|
| DATA_HUB | Analyst-curated data warehouse - primary source for most queries |
| PRIMARY_CARE | Raw extracts from EMIS and TPP clinical systems |
| NATIONAL | NHS England national datasets (SUS, ECDS, MHSDS, etc.) |
| FACTS_AND_DIMENSIONS_ALL_DATA | External reference data (BNF, SNOMED, QOF clusters) |
| REPORTING_DATASETS_ICB | Reporting outputs and analyst workspaces (includes SCRATCHPAD) |
Avoid: SYSTEM database.
Key Tables and Views
DATA_HUB.DWH (Dimensions)
| View | Purpose | Key Columns |
|---|---|---|
| DimMedicineAndDevice | Master medication/device reference | ProductSnomedCode, TherapeuticMoietySnomedCode (VTM), BNFParagraphCode, StrengthDescription, ProductDescription |
| DimPerson | Patient demographics | PatientPseudonym, PersonKey, CurrentGeneralPractice, IsCurrentNWRegistered, YearMonthBirth |
| DimSnomedCode | SNOMED code descriptions | SnomedCode, SnomedDescription |
| DimOrganisationAndSite | GP practices and NHS orgs | SiteCode, OrganisationName, OrganisationSubType, IsSiteNorfolkAndWaveney, IsSiteActive |
| DimDate | Date dimension | |
| DimCondition | Clinical conditions | Long-term condition flags |
| DimDeprivation | Deprivation rankings by area | |
CRITICAL: ProductDescription is the correct column for product names. ProductName does NOT exist. IsLatest does NOT exist in DimMedicineAndDevice.
DATA_HUB.CDM (Common Data Model)
| View | Purpose | Key Columns |
|---|---|---|
| Acute__Conmon__PatientLevelDrugs | HCD activity data | PseudoNHSNoLinked, InterventionDate, DrugName, Price Actual |
Note: PseudoNHSNoLinked in HCD data matches PatientPseudonym in GP data, so join on these two columns for patient linkage.
DATA_HUB.PHM (Population Health Management)
| View | Purpose | Key Columns |
|---|---|---|
| PrimaryCareClinicalCoding | Unified clinical coding (EMIS + TPP, no duplicates) | PatientPseudonym, SNOMEDCode, EventDateTime, NumericValue |
| PrimaryCareMedication | Unified medication data (EMIS + TPP, no duplicates) | PatientPseudonym, SNOMEDCode, DateMedicationStart, Quantity |
| ClinicalCodingClusterSnomedCodes | SNOMED codes grouped by cluster | ClusterId, SnomedCode |
| PersonCohort | Pre-defined patient cohorts | |
Prefer DATA_HUB.PHM unified views over raw PRIMARY_CARE tables.
Patient Identifiers
| Identifier | Source | Usage |
|---|---|---|
| PatientPseudonym | DATA_HUB, NATIONAL | Primary - use for most joins |
| PseudoNHSNoLinked | DATA_HUB.CDM (HCD data) | Links to PatientPseudonym |
| PersonKey | DATA_HUB.DWH.DimPerson | Integer key for person dimension |
Standard Join Patterns
-- HCD Activity to GP Diagnosis
FROM DATA_HUB.CDM."Acute__Conmon__PatientLevelDrugs" hcd
LEFT JOIN DATA_HUB.PHM."PrimaryCareClinicalCoding" pcc
ON hcd."PseudoNHSNoLinked" = pcc."PatientPseudonym"
-- Activity to Person Demographics
FROM DATA_HUB.CDM."Acute__Conmon__PatientLevelDrugs" hcd
INNER JOIN DATA_HUB.DWH."DimPerson" dp
ON hcd."PseudoNHSNoLinked" = dp."PatientPseudonym"
CRITICAL: Registered Population Filter
ALWAYS apply when counting patients:
WHERE dp."IsCurrentNWRegistered" = 'Yes'
AND dp."CurrentGeneralPractice" <> '*'
Without this filter, counts are inflated roughly 2x because they include deceased, deregistered, and out-of-area patients.
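As a minimal sketch of the filter inside a complete query, the count below restricts DimPerson to the registered population. It uses only columns listed in this reference; the output alias "RegisteredPatients" is illustrative, and column names should still be confirmed with describe_table before running.

```sql
-- Sketch: count distinct patients in the registered population.
-- "RegisteredPatients" is an illustrative alias, not an existing column.
SELECT COUNT(DISTINCT dp."PatientPseudonym") AS "RegisteredPatients"
FROM DATA_HUB.DWH."DimPerson" dp
WHERE dp."IsCurrentNWRegistered" = 'Yes'    -- registered-population filter
  AND dp."CurrentGeneralPractice" <> '*';   -- second half of the same filter
```

Running the same count without the WHERE clause is a quick way to see how much the filter changes the total.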
Query Development Patterns
Clinical Condition Detection (GP SNOMED Clusters)
-- Get all SNOMED codes for a clinical cluster
WITH cluster_codes AS (
    SELECT "SnomedCode"
    FROM DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes"
    WHERE "ClusterId" = 'RARTH_COD' -- Rheumatoid arthritis
)
-- Check if patient has condition
SELECT DISTINCT pcc."PatientPseudonym"
FROM DATA_HUB.PHM."PrimaryCareClinicalCoding" pcc
WHERE pcc."SNOMEDCode" IN (SELECT "SnomedCode" FROM cluster_codes)
  AND pcc."PatientPseudonym" IS NOT NULL
Available SNOMED Clusters for HCD Indications
- RARTH_COD (155 codes) - Rheumatoid arthritis
- PSORIASIS_COD (116 codes) - Psoriasis
- CROHNS_COD (93 codes) - Crohn's disease
- ULCCOLITIS_COD (62 codes) - Ulcerative colitis
- MS_COD (44 codes) - Multiple sclerosis
- DM_COD / DMTYPE1_COD / DMTYPE2AUDIT_COD - Diabetes
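Combining a cluster lookup with the HCD join pattern and the registered population filter, a query along these lines would count HCD patients per drug among those with a rheumatoid arthritis code in GP data. This is a sketch built only from the views and columns named in this reference; the CTE names and the "Patients" alias are illustrative, and it applies no date logic (for example, diagnosis before treatment).

```sql
-- Sketch: HCD drugs given to registered patients with an RA code in GP data.
WITH cluster_codes AS (
    SELECT "SnomedCode"
    FROM DATA_HUB.PHM."ClinicalCodingClusterSnomedCodes"
    WHERE "ClusterId" = 'RARTH_COD'            -- swap for another cluster ID
),
ra_patients AS (                               -- illustrative CTE name
    SELECT DISTINCT pcc."PatientPseudonym"
    FROM DATA_HUB.PHM."PrimaryCareClinicalCoding" pcc
    WHERE pcc."SNOMEDCode" IN (SELECT "SnomedCode" FROM cluster_codes)
      AND pcc."PatientPseudonym" IS NOT NULL
)
SELECT
    hcd."DrugName",
    COUNT(DISTINCT hcd."PseudoNHSNoLinked") AS "Patients"
FROM DATA_HUB.CDM."Acute__Conmon__PatientLevelDrugs" hcd
INNER JOIN ra_patients ra
    ON hcd."PseudoNHSNoLinked" = ra."PatientPseudonym"
INNER JOIN DATA_HUB.DWH."DimPerson" dp
    ON hcd."PseudoNHSNoLinked" = dp."PatientPseudonym"
WHERE dp."IsCurrentNWRegistered" = 'Yes'       -- registered population filter
  AND dp."CurrentGeneralPractice" <> '*'
GROUP BY hcd."DrugName"
ORDER BY "Patients" DESC
LIMIT 20;
```

Changing the 'RARTH_COD' literal to another cluster ID from the list above adapts the same shape to a different indication.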
Sample HCD Activity Query
SELECT
    hcd."PseudoNHSNoLinked" AS "PatientPseudonym",
    hcd."DrugName",
    hcd."InterventionDate",
    hcd."Provider Code",
    hcd."OrganisationName"
FROM DATA_HUB.CDM."Acute__Conmon__PatientLevelDrugs" hcd
WHERE hcd."InterventionDate" >= '2024-01-01'
LIMIT 20
Snowflake SQL Syntax
- Double-quote identifiers: "PatientPseudonym"
- Date literals: '2025-04-01'::DATE
- Date functions: DATEADD('MONTH', -3, date), DATEDIFF('YEAR', d1, d2), LAST_DAY(date)
- Boolean: TRUE/FALSE
- Row limits: use LIMIT N (Snowflake also accepts SELECT TOP N, but the examples here use LIMIT)
- COALESCE(), NULLIF(), GREATEST() work as expected
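The date functions above can be checked with a literal-only query that touches no tables. This is a small sketch; the column aliases are illustrative.

```sql
-- Sketch: date helpers evaluated on fixed literals (no table access needed).
SELECT
    '2025-04-01'::DATE                                      AS "PeriodStart",
    DATEADD('MONTH', -3, '2025-04-01'::DATE)                AS "ThreeMonthsEarlier",  -- 2025-01-01
    DATEDIFF('YEAR', '2020-06-15'::DATE, '2025-04-01'::DATE) AS "YearsBetween",       -- 5
    LAST_DAY('2025-04-01'::DATE)                            AS "MonthEnd";            -- 2025-04-30
```

DATEDIFF counts calendar-year boundaries crossed rather than completed years, so 2020-06-15 to 2025-04-01 yields 5 even though fewer than five full years have elapsed. Keep this in mind for age calculations.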
Troubleshooting
Column not found errors
- Use describe_table(table_name, database) to get actual column names
- Remember: Snowflake identifiers are case-sensitive when quoted
- Common mistakes: ProductName (wrong) vs ProductDescription (correct)
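The case-sensitivity point can be illustrated with the product name column. This sketch assumes default session settings and that the column was created with its mixed-case name, as the quoted examples in this reference imply.

```sql
-- Works: quoted identifier matches the stored mixed-case column name.
SELECT "ProductDescription"
FROM DATA_HUB.DWH."DimMedicineAndDevice"
LIMIT 20;

-- Expected to fail with an invalid identifier error: an unquoted name is
-- resolved as upper case (PRODUCTDESCRIPTION), which does not match.
-- SELECT ProductDescription
-- FROM DATA_HUB.DWH."DimMedicineAndDevice"
-- LIMIT 20;
```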
Empty results
- Check patient identifier filtering (IS NOT NULL)
- Check date ranges
- Test with LIMIT 20 first to see sample data
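One way to work through these checks is a profiling query that reports how many rows carry a patient identifier and what date range the data covers. This is a sketch against the HCD view using columns named in this reference; the aliases are illustrative.

```sql
-- Sketch: profile identifier completeness and date coverage before filtering.
SELECT
    COUNT(*)                        AS "TotalRows",
    COUNT(hcd."PseudoNHSNoLinked")  AS "RowsWithPatientId",  -- COUNT(col) skips NULLs
    MIN(hcd."InterventionDate")     AS "EarliestDate",
    MAX(hcd."InterventionDate")     AS "LatestDate"
FROM DATA_HUB.CDM."Acute__Conmon__PatientLevelDrugs" hcd;
```

If "RowsWithPatientId" is far below "TotalRows", or the requested date range falls outside the earliest and latest dates, that usually explains an empty result.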
Slow queries
- Add LIMIT during development
- Use describe_query to validate structure before execution
- Consider async execution for large result sets