Reference Databases
Helena uses thirteen reference databases for variant annotation and ACMG classification. All databases are stored locally on EU-based infrastructure in Helsinki, Finland. No variant data is sent to external APIs during processing.
Database versions are pinned for each deployment. Before a new version reaches production, it is validated to confirm that classification outcomes remain consistent with expected results. The current versions and their roles in ACMG classification are documented below.
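One way to express the validation step is as a regression check. A candidate set of database versions is run against a fixed truth set of variants with known expected classifications, and any variant whose call changes is reported. This is a minimal sketch under stated assumptions. The `classify` callable, the truth-set format, and the "block on any change" policy are all hypothetical illustrations, not Helena's actual validation harness.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a variant key and a classifier that runs the full
# pipeline under a candidate set of database versions.
VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
Classifier = Callable[[VariantKey], str]


def find_regressions(
    expected: Dict[VariantKey, str], classify: Classifier
) -> List[Tuple[VariantKey, str, str]]:
    """Return (variant, expected, observed) for each variant whose
    classification differs from the validated truth set."""
    regressions = []
    for variant, want in sorted(expected.items()):
        got = classify(variant)
        if got != want:
            regressions.append((variant, want, got))
    return regressions


def approve_release(expected: Dict[VariantKey, str], classify: Classifier) -> bool:
    # Hypothetical policy: promote the candidate versions only when every
    # truth-set variant keeps its expected classification.
    return not find_regressions(expected, classify)


if __name__ == "__main__":
    truth = {  # toy truth set
        ("chr1", 100, "A", "G"): "Benign",
        ("chr2", 200, "C", "T"): "Pathogenic",
    }
    print(approve_release(truth, lambda v: truth[v]))  # True
    print(find_regressions(truth, lambda v: "VUS"))
```

In practice a changed call might be routed to manual review rather than rejected outright. The sketch only shows the comparison itself.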
Zero External API Calls
During variant processing, Helena makes no external API calls. All reference databases are stored locally, and Ensembl VEP runs against a local cache, so no patient data leaves the server at any processing stage.
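The sketch below builds an offline VEP invocation and adds a simple guard against accidentally enabling a network-backed mode. `--offline`, `--cache`, `--dir_cache`, `--cache_version`, `--assembly`, `--vcf`, and `--database` are real VEP options. The file paths, the GRCh38 assembly, and the guard itself are assumptions for illustration, not Helena's actual launcher.

```python
import shlex
from typing import List

# VEP options that would make the run contact a database; this set is
# deliberately small and illustrative, not exhaustive.
NETWORK_FLAGS = {"--database"}


def build_vep_command(input_vcf: str, output_vcf: str, cache_dir: str) -> List[str]:
    """Assemble an offline VEP invocation (paths and assembly are assumptions)."""
    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",                   # write annotated output as VCF
        "--offline",               # no connections to the Ensembl database
        "--cache",                 # read annotations from the local cache
        "--dir_cache", cache_dir,
        "--cache_version", "113",  # matches Release 113 in the table
        "--assembly", "GRCh38",    # assumed genome build
    ]


def assert_offline(cmd: List[str]) -> None:
    """Refuse to run a command that is not explicitly offline."""
    if "--offline" not in cmd:
        raise ValueError("VEP must run with --offline")
    remote = NETWORK_FLAGS.intersection(cmd)
    if remote:
        raise ValueError(f"network-enabled options present: {sorted(remote)}")


if __name__ == "__main__":
    cmd = build_vep_command("sample.vcf", "sample.vep.vcf", "/data/vep_cache")
    assert_offline(cmd)
    print(shlex.join(cmd))
```

Building the command as a list (rather than a shell string) also avoids quoting problems with unusual file names.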
Database Summary
| Database | Version | Records | Primary Use | ACMG Criteria |
|---|---|---|---|---|
| gnomAD | v4.1.0 | ~759M variants | Population frequencies | BA1, BS1, BS2, PM2 |
| ClinVar | 2025-01 | ~4.1M variants | Clinical significance | PS1, PP5, BP6, ClinVar override |
| dbNSFP | 4.9c | ~80.6M sites | Functional predictions | PP3, BP4 (BayesDel_noAF) |
| SpliceAI | MANE R113 | All coding variants | Splice impact | PP3_splice, BP7 guard |
| gnomAD Constraint | v4.1.0 | ~18.2K genes | Gene-level tolerance | PVS1, PP2, BP1 |
| HPO | Latest release | ~320K associations | Gene-phenotype mapping | PP4 |
| ClinGen | Latest release | ~1.6K genes | Dosage sensitivity | BS1, BP2 |
| Orphanet | June 2025 | ~3.2K genes | Gene-disease-inheritance | BS1 (AD/AR/XLD), PVS1 gate |
| VCEP Specs | Latest release | ~60 genes | Gene-specific thresholds | BA1, BS1, PM2, PVS1 gate |
| ClinGen GDV | Latest release | ~3.4K gene-disease pairs | Gene-disease validity | PVS1 constraint gate, disease association gate |
| DECIPHER G2P | Latest release | ~2.4K gene-mechanism pairs | Molecular mechanism curation | PVS1 GoF/DN guard, disease association gate |
| GoFCards | Release 1.0 | 579 genes, 3.2K variants | Gain-of-function curation | PVS1 GoF guard |
| Ensembl VEP | Release 113 | All consequences | Variant effect prediction | PVS1, PM1, PM4, BP1, BP3, BP7 |
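Because several criteria draw on more than one database, it can help to read the table in reverse. For example, you might want to see which databases feed BS1 before approving a version change. The sketch below transcribes the table into a dictionary and inverts it. Qualifiers such as "(BayesDel_noAF)" and "(AD/AR/XLD)" are dropped for brevity. The data structure is illustrative only and is not how Helena stores this mapping.

```python
from collections import defaultdict
from typing import Dict, List

# ACMG criteria per database, transcribed from the summary table.
CRITERIA_BY_DATABASE: Dict[str, List[str]] = {
    "gnomAD": ["BA1", "BS1", "BS2", "PM2"],
    "ClinVar": ["PS1", "PP5", "BP6", "ClinVar override"],
    "dbNSFP": ["PP3", "BP4"],
    "SpliceAI": ["PP3_splice", "BP7 guard"],
    "gnomAD Constraint": ["PVS1", "PP2", "BP1"],
    "HPO": ["PP4"],
    "ClinGen": ["BS1", "BP2"],
    "Orphanet": ["BS1", "PVS1 gate"],
    "VCEP Specs": ["BA1", "BS1", "PM2", "PVS1 gate"],
    "ClinGen GDV": ["PVS1 constraint gate", "disease association gate"],
    "DECIPHER G2P": ["PVS1 GoF/DN guard", "disease association gate"],
    "GoFCards": ["PVS1 GoF guard"],
    "Ensembl VEP": ["PVS1", "PM1", "PM4", "BP1", "BP3", "BP7"],
}


def databases_by_criterion(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert database -> criteria into criterion -> databases,
    preserving the table's row order within each list."""
    index: Dict[str, List[str]] = defaultdict(list)
    for database, criteria in table.items():
        for criterion in criteria:
            index[criterion].append(database)
    return dict(index)


if __name__ == "__main__":
    index = databases_by_criterion(CRITERIA_BY_DATABASE)
    print(index["BS1"])  # ['gnomAD', 'ClinGen', 'Orphanet', 'VCEP Specs']
```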
Annotation Pipeline
Reference data is loaded into each variant record during Stage 4 of the processing pipeline. After annotation, every variant carries all reference columns directly, so no database lookups are needed during classification or clinical review. Annotation proceeds in this order:
1. gnomAD v4.1: Population allele frequencies, matched on chromosome, position, reference allele, and alternate allele. Loads 6 columns.
2. ClinVar: Clinical significance assertions, using the same positional match. Loads 7 columns, including review stars and disease associations.
3. dbNSFP 4.9c: Functional predictions from SIFT, AlphaMissense, MetaSVM, DANN, and BayesDel, plus conservation scores. Loads 9 columns, aggregating duplicate entries for the same variant.
4. gnomAD Constraint: Gene-level tolerance metrics, joined on gene symbol. Loads 4 columns: pLI, LOEUF, o/e LoF, and missense Z-score.
5. HPO: Gene-phenotype associations, joined on gene symbol with deduplication and aggregation. Loads 6 columns.
6. ClinGen: Dosage sensitivity scores, joined on gene symbol. Loads 2 columns: haploinsufficiency and triplosensitivity.
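The annotation steps above use two kinds of join. The first is an exact positional match on (chromosome, position, reference, alternate), used for gnomAD and ClinVar. The second is a gene-symbol join, where a source with many rows per gene (such as HPO) is first deduplicated and aggregated. The following is a minimal dict-based sketch of both. The record shape, the column names (`af`, `hpo_terms`), and the semicolon-joined aggregation are hypothetical. It also assumes variants are already normalized (left-aligned, multiallelics split) so that positional keys match.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: each variant is a dict with chrom/pos/ref/alt/gene.
VariantKey = Tuple[str, int, str, str]


def annotate_positional(variants: List[dict], reference: Dict[VariantKey, dict],
                        columns: Iterable[str]) -> List[dict]:
    """Exact match on (chrom, pos, ref, alt), as for gnomAD and ClinVar.
    Assumes both sides are already normalized; misses get None."""
    columns = list(columns)
    for v in variants:
        row = reference.get((v["chrom"], v["pos"], v["ref"], v["alt"]), {})
        for col in columns:
            v[col] = row.get(col)
    return variants


def aggregate_by_gene(rows: Iterable[dict], value_col: str) -> Dict[str, str]:
    """Deduplicate many rows per gene (e.g. one per HPO term) and collapse
    them into a single sorted, semicolon-joined string per gene symbol."""
    by_gene = defaultdict(set)
    for r in rows:
        by_gene[r["gene"]].add(r[value_col])
    return {gene: ";".join(sorted(values)) for gene, values in by_gene.items()}


def annotate_by_gene(variants: List[dict], gene_table: Dict[str, str],
                     column: str) -> List[dict]:
    """Gene-symbol join, as for constraint, HPO, and ClinGen; misses get None."""
    for v in variants:
        v[column] = gene_table.get(v["gene"])
    return variants


if __name__ == "__main__":
    variants = [  # toy data
        {"chrom": "chr17", "pos": 100, "ref": "G", "alt": "A", "gene": "GENE1"},
        {"chrom": "chr17", "pos": 101, "ref": "C", "alt": "T", "gene": "GENE1"},
    ]
    annotate_positional(variants, {("chr17", 100, "G", "A"): {"af": 1e-5}}, ["af"])
    terms = aggregate_by_gene(
        [{"gene": "GENE1", "term": "HP:A"}, {"gene": "GENE1", "term": "HP:A"},
         {"gene": "GENE1", "term": "HP:B"}], "term")
    annotate_by_gene(variants, terms, "hpo_terms")
    print(variants[1])
```

Writing the results onto each variant record mirrors the design described above, where classification reads only columns already present on the record.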
Ensembl VEP runs as a separate, earlier stage (Stage 3). It provides the consequence predictions and transcript selection that Stage 4 annotation builds on. SpliceAI scores are read from precomputed data during VEP annotation.
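SpliceAI reports four delta scores per allele and gene: acceptor gain, acceptor loss, donor gain, and donor loss. A common summary is their maximum. The sketch below parses the standard SpliceAI VCF INFO layout (`ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`). It then turns the maximum delta into the two signals named in the table: support for PP3_splice, and whether the BP7 guard passes (BP7 requires no predicted splice impact). The 0.2 and 0.1 cut-offs are assumptions that follow commonly used ClinGen SVI splicing guidance; this section does not state Helena's thresholds. Scores surfaced through a VEP plugin may also use a different field layout.

```python
from typing import Dict, List, Optional

DELTA_FIELDS = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")  # acceptor/donor gain/loss


def parse_spliceai(info_value: str) -> List[Dict[str, object]]:
    """Parse a SpliceAI INFO value of the form
    ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL,
    with one comma-separated entry per allele/gene pair."""
    records = []
    for entry in info_value.split(","):
        parts = entry.split("|")
        record: Dict[str, object] = {"allele": parts[0], "symbol": parts[1]}
        for name, value in zip(DELTA_FIELDS, parts[2:6]):
            record[name] = float(value)
        records.append(record)
    return records


def splice_evidence(info_value: str, gene: str,
                    pp3_min: float = 0.2,
                    bp7_max: float = 0.1) -> Optional[Dict[str, object]]:
    """Summarize splice impact for one gene using the maximum delta score.
    The 0.2 / 0.1 defaults are assumptions, not Helena's documented values."""
    deltas = [
        max(float(r[f]) for f in DELTA_FIELDS)
        for r in parse_spliceai(info_value)
        if r["symbol"] == gene
    ]
    if not deltas:
        return None  # no precomputed score for this gene
    best = max(deltas)
    return {
        "max_delta": best,
        "pp3_splice": best >= pp3_min,   # high enough to support PP3_splice
        "bp7_allowed": best <= bp7_max,  # low enough for the BP7 guard to pass
    }


if __name__ == "__main__":
    info = "A|GENE1|0.00|0.35|0.01|0.00|10|-2|5|0"  # toy values
    print(splice_evidence(info, "GENE1"))
    # {'max_delta': 0.35, 'pp3_splice': True, 'bp7_allowed': False}
```

Scores between the two cut-offs yield neither signal, which leaves the variant without splice-based evidence in either direction.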
In This Section
- gnomAD: Population allele frequencies from 807,162 individuals across 8 genetic ancestry groups.
- ClinVar: Clinical significance assertions from submitting laboratories worldwide.
- dbNSFP: Functional predictions and conservation scores for all possible coding SNVs.
- HPO: Gene-phenotype associations from the Human Phenotype Ontology.
- ClinGen: Gene dosage sensitivity curation from the Clinical Genome Resource.
- Ensembl VEP: Variant Effect Predictor for consequence annotation and transcript selection.
- SpliceAI Precomputed: Precomputed splice impact delta scores for all coding variants.
- Update Policy: How and when reference databases are updated, validated, and versioned.
- Orphanet / Orphadata: Gene-disease-inheritance annotations for ~3,200 genes covering ~6,100 rare diseases.
- VCEP Specifications: Gene-specific ACMG criteria from ClinGen Variant Curation Expert Panels.
For details on how these databases are combined during ACMG classification, see the Criteria Reference.