Helena

Changelog

Version history and release notes for the Helena platform.

v1.5.0 | March 2026 | Classification Engine v3.17.0

Classification (HELIX-CR-2026-023)

PVS1: G2P molecular mechanism integration. The DECIPHER Gene2Phenotype (G2P) database now provides the primary gain-of-function/dominant-negative (GoF/DN) guard for PVS1. The gof_genes_g2p view contains 198 pure GoF/DN monoallelic genes for which PVS1 is blocked.

PVS1: GOF_AD_GENES reduced from 36 to 14 fallback genes (genes not in G2P: 6 neurodegeneration/toxic aggregation, 3 somatic/neomorphic, 5 absent from G2P 2026-02-28 release). PRSS1 added to fallback.

PVS1: GNAS removed from GOF_AD_GENES (dual-mechanism: McCune-Albright = GoF, pseudohypoparathyroidism 1A = LoF). GNAS LoF variants now correctly receive PVS1.

PVS1: 49 dual-mechanism genes (including SCN5A, LMNA, KCNH2, KCNQ1, FGFR1) excluded from the GoF view, preserving PVS1 for legitimate LoF phenotypes.

PVS1: G2P confidence filter: only definitive, strong, moderate included. Limited confidence excluded.

Subtractive change: the guard can only remove PVS1 from GoF/DN genes, so it cannot create new false P/LP classifications.

Clinical trigger: PD2025_090 PRSS1 p.Gly177Ter (stop_gained in GoF gene).
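The guard described in this release can be sketched roughly as follows. This is a minimal illustration, not the engine's code: the function names, argument shapes, and placeholder gene symbols are assumptions, and only the names gof_genes_g2p, GOF_AD_GENES, PRSS1, and GNAS come from the notes above.

```python
# Hypothetical sketch of the GoF/DN guard for PVS1.
# The sets below are tiny placeholders, not the real 198-gene view
# or the 14-gene fallback list.

GOF_GENES_G2P = frozenset({"EXAMPLE_GOF1", "EXAMPLE_GOF2"})  # stand-in for the gof_genes_g2p view
GOF_AD_GENES = frozenset({"PRSS1"})  # fallback list; PRSS1 is named in the notes


def pvs1_blocked_by_gof_guard(gene, g2p_view=GOF_GENES_G2P, fallback=GOF_AD_GENES):
    """Return True when PVS1 should not be applied because the gene acts via GoF/DN."""
    # Primary guard: the G2P-derived view. Fallback: curated genes not covered by G2P.
    return gene in g2p_view or gene in fallback


def apply_guard(gene, criteria):
    """Subtractive step: may drop PVS1 (at any strength), never adds a criterion."""
    if pvs1_blocked_by_gof_guard(gene):
        return [c for c in criteria if not c.startswith("PVS1")]
    return list(criteria)
```

Because the step only filters the list of criteria, a gene outside both sets (GNAS after this release, for example) keeps whatever PVS1 evidence it already had.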

Reference Databases

DECIPHER G2P: Molecular mechanism data (g2p_gene_mechanism table, 2,372 records) added to reference_db alongside existing HPO enrichment data.

gof_genes_g2p view: 198 pure GoF/DN monoallelic genes for PVS1 guard. Dual-mechanism genes excluded.

New loader script: scripts/load_g2p_mechanism.py with dry-run, force flags, and 9-point validation.
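One way to picture how a "pure GoF/DN" gene set could be derived from mechanism records is shown below. The record fields, the mechanism labels, and the function name are assumptions made for illustration; the actual schema of g2p_gene_mechanism and the view definition are not given in these notes.

```python
from collections import defaultdict

# Confidence levels kept by the filter; "limited" is excluded per the notes.
INCLUDED_CONFIDENCE = {"definitive", "strong", "moderate"}
# Assumed mechanism labels; the real G2P vocabulary may differ.
GOF_DN_MECHANISMS = {"gain of function", "dominant negative"}


def pure_gof_dn_genes(records):
    """Return genes whose retained monoallelic records are all GoF/DN.

    records: iterable of dicts with the hypothetical keys
    gene, mechanism, confidence, allelic_requirement.
    """
    mechanisms = defaultdict(set)
    for rec in records:
        if rec["confidence"] not in INCLUDED_CONFIDENCE:
            continue  # drop limited-confidence entries
        if rec["allelic_requirement"] != "monoallelic":
            continue  # the view covers monoallelic genes only
        mechanisms[rec["gene"]].add(rec["mechanism"])
    # A gene with any non-GoF/DN mechanism (for example LoF) counts as
    # dual-mechanism and is left out, so PVS1 stays available for it.
    return {gene for gene, mechs in mechanisms.items() if mechs <= GOF_DN_MECHANISMS}
```

Whether biallelic records should also count toward the dual-mechanism check is a design question this sketch does not settle.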

v1.4.0 | March 2026 | Classification Engine v3.16.8

Classification

PVS1: GoF AD gene exclusion (HELIX-CR-2026-022). GOF_AD_GENES curated list (36 genes) blocks PVS1 for AD genes where disease mechanism is gain-of-function, dominant-negative, or toxic aggregation.

PVS1: ClinGen AD Definitive/Strong genes bypass pLI/LOEUF constraint gate (v3.16.6). Genes like TUBB1 with ClinGen AD Definitive now qualify for PVS1.

LP classification: Disease association gate for LP rules LP4, LP5, LP6 (v3.16.7, HELIX-CR-2026-021). Prevents LP in genes without known Mendelian disease mechanism.

ClinGen Gene-Disease Validity integration (v3.16.0): ar_lof_genes_clingen and disease_associated_genes_clingen reference tables replace Python constants.

Clinical triggers: PD2025_082 RAC1 frameshift (GoF gene), PD2025_090 discordance analysis.
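The ClinGen bypass of the constraint gate might look something like the sketch below. The function name, the way pLI and LOEUF are combined, and the default thresholds (pLI >= 0.9, LOEUF <= 0.35) are assumptions chosen for illustration; these notes do not state the engine's actual cut-offs or the other gate paths.

```python
BYPASS_VALIDITY = {"Definitive", "Strong"}


def passes_constraint_gate(pli, loeuf, clingen_ad_validity=None,
                           pli_min=0.9, loeuf_max=0.35):
    """Hypothetical constraint gate for PVS1 in autosomal dominant genes.

    pli / loeuf may be None when gnomAD constraint is unavailable.
    clingen_ad_validity is the ClinGen gene-disease validity label for an
    AD disease, or None. Thresholds are illustrative, not the engine's values.
    """
    # ClinGen AD Definitive/Strong genes skip the pLI/LOEUF check entirely.
    if clingen_ad_validity in BYPASS_VALIDITY:
        return True
    if pli is not None and pli >= pli_min:
        return True
    if loeuf is not None and loeuf <= loeuf_max:
        return True
    return False
```

Under these assumptions, a gene with weak constraint metrics but a ClinGen AD Definitive classification passes the gate, which mirrors the TUBB1 case mentioned above.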

Reference Databases

ClinGen Gene-Disease Validity: ar_lof_genes_clingen and disease_associated_genes_clingen tables added to reference_db.

Total reference databases: 14 (previously 13).

v1.3.0 | March 2026 | Classification Engine v3.15.0

Classification

PM1: Critical functional domains migrated from Python constant to reference database table (refdb.interpro_pfam_domains). HELIX-CR-2026-012.

PM1: Full InterPro Pfam catalog (27,481 entries) loaded. 49 domains marked critical across 14 categories.

PM1: CRITICAL_PFAM_DOMAINS converted from short names to InterPro-verified accession numbers (v3.14.0, HELIX-CR-2026-011). Fixes a bug that had resulted in 0 legitimate PM1 applications since the curated list was introduced.

PM1: Post-fix validation on HG2023_206: PM1 count went from 2 (false positive) to 1,139 (legitimate).

PVS1: ClinVar LOF evidence as fifth constraint gate path (v3.13.0, HELIX-CR-2026-010). Small AD genes with ClinVar P/LP LOF but uninformative gnomAD constraint now qualify.

PP3/BP4: Missense consequence guard (v3.12.1, HELIX-CR-2026-009). BayesDel path now requires missense_variant consequence.

Clinical triggers: HG2023_206 RYR1 (PM1 fix), GCM2 p.Arg131Ter (PVS1 + PP3 guard).
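The missense consequence guard for PP3/BP4 amounts to a precondition on the BayesDel path. A minimal sketch, with a hypothetical function name and input shape:

```python
def bayesdel_path_applies(consequences, bayesdel_score):
    """Return True only if BayesDel may be used for PP3/BP4 on this variant.

    consequences: iterable of VEP consequence terms for the variant.
    bayesdel_score: float, or None when no score is available.
    """
    # The guard: a score alone is not enough; the variant must be missense.
    return bayesdel_score is not None and "missense_variant" in set(consequences)
```

With this check, a stop_gained variant such as the GCM2 p.Arg131Ter trigger case would not pick up PP3 from a BayesDel score, leaving PVS1 as the relevant criterion for its truncating effect.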

Reference Databases

InterPro / Pfam added as 13th reference database (27,481 Pfam entries).

Loader script: scripts/load_interpro_pfam.py with InterPro API fetch and offline fallback.

Total reference databases: 13 (previously 12).

v1.2.0 | March 2026 | Classification Engine v3.12.0

HPO Enrichment (HELIX-REF-001)

Multi-Source HPO Enrichment Pipeline: gene-phenotype annotations expanded from 1 source (5,173 genes) to 6 sources (5,688 genes, +10%)

HPO Consortium: 320K records, 5,173 genes (primary source, curated gene-phenotype associations)

Orphanet disease-to-HPO: 168K records, 3,176 genes with clinical frequency data (via Orphadata en_product4.xml + en_product6.xml)

DECIPHER Gene2Phenotype (G2P): 43K records, 2,125 genes from 7 clinical panels (DD, Eye, Cardiac, Skin, Skeletal, Cancer, Ear)

Monarch Initiative: 151K records, 4,791 genes (HPO Consortium redistribution via Monarch KG)

ClinVar-MedGen: 245K records, 5,258 genes (P/LP variant -> MedGen CUI -> HPO chain mapping, largest contributor of new genes)

Manual clinical curation: common-disease genes outside rare-disease HPO scope (TBC1D4 with PMID evidence)

3,292 genes with increased HPO term coverage from cross-source aggregation

Source priority ordering: Orphanet > G2P > HPO Consortium > Monarch > ClinVar-MedGen

Low-confidence filtering: Orphanet modifying/candidate and G2P limited entries excluded from clinical view
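The source priority ordering and low-confidence filtering above can be illustrated with a small merge routine. The tuple layout, the confidence labels as strings, and the function name are assumptions; the real pipeline's schema is not described in these notes.

```python
# Priority order as stated in the notes (highest first).
SOURCE_PRIORITY = ["Orphanet", "G2P", "HPO Consortium", "Monarch", "ClinVar-MedGen"]
RANK = {name: i for i, name in enumerate(SOURCE_PRIORITY)}

# (source, confidence) pairs excluded from the clinical view.
LOW_CONFIDENCE = {("Orphanet", "modifying"), ("Orphanet", "candidate"), ("G2P", "limited")}


def merge_annotations(rows):
    """Collapse duplicate gene-term pairs, keeping the highest-priority source.

    rows: iterable of (gene, hpo_term, source, confidence) tuples (hypothetical layout).
    Returns a dict mapping (gene, hpo_term) to the winning source name.
    """
    best = {}
    for gene, term, source, confidence in rows:
        if (source, confidence) in LOW_CONFIDENCE:
            continue
        key = (gene, term)
        # Sources missing from the priority list sort last.
        rank = RANK.get(source, len(RANK))
        if key not in best or rank < RANK.get(best[key], len(RANK)):
            best[key] = source
    return best
```

A gene gains HPO coverage whenever any retained source contributes a term the others lack, which is how cross-source aggregation can raise per-gene term counts without changing classification logic.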

Classification

Classifier version bumped to v3.12.0

PP4 criterion evaluates against enriched HPO set (more genes trigger PP4)

No ACMG criteria logic changes -- only annotation data expanded

Reference Databases

DECIPHER G2P added as new reference database (7 clinical panels)

Monarch Initiative added as new reference database

MedGen HPO-OMIM mapping added as new reference database

HPO database entry updated to reflect multi-source enriched view

Total reference databases: 12 (previously 9)

v1.1.0 | March 2026 | Classification Engine v3.11.4

Classification

PVS1 disease association gate: requires established disease association before applying PVS1 (ClinVar P/LP, Orphanet, ClinGen HI, AR LoF list, or VCEP coverage)

BS1 constraint-implied AD fallback: uses AD threshold (0.1%) for LoF-constrained genes without inheritance data

BS1 cascade expanded from 5 to 8 levels with explicit Orphanet AR and AR_LOF_GENES priorities

PVS1 non-canonical splice exclusion: splice_donor_5th_base_variant, splice_donor_region_variant, splice_acceptor_5th_base_variant blocked from PVS1 (v3.11.4)

PVS1 NMD_transcript_variant exclusion fix: NMD_transcript_variant no longer blocks PVS1; NMD_escaping_variant now blocks it instead (v3.11.3)

De novo projection: prospective PS2 upgrade computation for VUS variants to guide trio testing (v3.11.0)

ClinVar low-confidence disease association gate: 1-star P/LP requires gene disease evidence (v3.10.0)

disease_genes_clinvar CTE circular reference fix: review_stars >= 2 filter (v3.10.1)

PVS1 last-exon NMD downgrade: last-exon truncating variants receive PVS1_Strong instead of PVS1 Very Strong (v3.9.0)

BP1 ClinVar pathogenic missense guard and PP3_Supporting label (v3.8.2)

PP3_Strong disease association gate (v3.8.1)
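Several of the PVS1 rules in this release (non-canonical splice exclusion, the NMD fix, and the last-exon downgrade) can be combined into one small decision function. This is a sketch under stated assumptions: the function name and inputs are hypothetical, the variant is assumed to already be a LoF candidate that passed the other gates, and treating any blocking term in the consequence set as decisive is a simplification.

```python
# Consequence terms that block PVS1, taken from the notes above.
NON_CANONICAL_SPLICE = {
    "splice_donor_5th_base_variant",
    "splice_donor_region_variant",
    "splice_acceptor_5th_base_variant",
}
BLOCKING_TERMS = NON_CANONICAL_SPLICE | {"NMD_escaping_variant"}


def pvs1_strength(consequences, in_last_exon):
    """Return "PVS1", "PVS1_Strong", or None for a LoF candidate variant."""
    terms = set(consequences)
    if terms & BLOCKING_TERMS:
        return None  # non-canonical splice or NMD-escaping: PVS1 not applied
    if in_last_exon:
        return "PVS1_Strong"  # last-exon truncation is downgraded one level
    # NMD_transcript_variant is deliberately absent from BLOCKING_TERMS.
    return "PVS1"
```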

Reference Databases

Orphanet/Orphadata documented as separate reference database (gene-disease-inheritance for 3,200 genes)

VCEP gene-specific specifications documented as separate reference database (~50-60 genes)

Total reference databases: 9 (previously 7)

v1.0.0 | February 2026 | Initial Release

Classification

ACMG/AMP 2015 classification with Bayesian point-based framework (Tavtigian 2018/2020)

19 of 28 ACMG criteria automated

BayesDel_noAF with ClinGen SVI-calibrated thresholds for PP3/BP4 (Pejaver 2022)

SpliceAI integration for PP3_splice with PVS1 double-counting prevention

ClinVar override logic with review star quality filtering

Gene-specific VCEP threshold support
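For readers unfamiliar with the point-based framework, the sketch below shows the published Tavtigian et al. 2020 scale: supporting, moderate, strong, and very strong evidence score 1, 2, 4, and 8 points, benign evidence counts negatively, and the total maps to a five-tier class. The function names and input format are illustrative, and the sketch omits everything else the engine layers on top (ClinVar override, gates, VCEP thresholds).

```python
# Points per evidence strength (Tavtigian et al. 2020).
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


def total_points(evidence):
    """Sum points for (direction, strength) pairs.

    direction is "pathogenic" or "benign"; benign evidence subtracts points.
    """
    total = 0
    for direction, strength in evidence:
        sign = 1 if direction == "pathogenic" else -1
        total += sign * POINTS[strength]
    return total


def classify(points):
    """Map a point total to a classification using the published cut-offs."""
    if points >= 10:
        return "Pathogenic"
    if points >= 6:
        return "Likely pathogenic"
    if points >= 0:
        return "VUS"
    if points >= -6:
        return "Likely benign"
    return "Benign"
```

For example, one very strong plus one supporting pathogenic criterion totals 9 points, which falls in the likely pathogenic range.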

Reference Databases

gnomAD v4.1 (759M variants, 807K individuals)

gnomAD Constraint v4.1 (18.2K genes)

ClinVar 2025-01

dbNSFP 4.9c

SpliceAI precomputed (Ensembl MANE Release 113)

HPO gene-phenotype associations

ClinGen dosage sensitivity

Ensembl VEP Release 113 with local offline cache

Phenotype Matching

Lin semantic similarity with HPO ontology graph

Five-tier clinical priority system

Gene-level deduplication and aggregation

Automatic HPO term extraction from free-text clinical descriptions
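Lin similarity scores two ontology terms as twice the information content (IC) of their most informative common ancestor, divided by the sum of their own ICs, where IC is typically the negative log of a term's annotation frequency. The sketch below uses a toy four-term graph with made-up IC values; the term names, data structures, and function names are placeholders, not real HPO identifiers or Helena's implementation.

```python
def ancestors(term, parents):
    """Return the term plus all of its ancestors in a DAG given as {child: [parents]}."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lin_similarity(a, b, parents, ic):
    """Lin similarity: 2 * IC(MICA) / (IC(a) + IC(b)), or 0.0 if undefined."""
    common = ancestors(a, parents) & ancestors(b, parents)
    denom = ic[a] + ic[b]
    if not common or denom == 0:
        return 0.0
    mica_ic = max(ic[t] for t in common)  # most informative common ancestor
    return 2.0 * mica_ic / denom


# Toy ontology: ROOT <- A <- {B, C}; IC values are invented for the example.
PARENTS = {"A": ["ROOT"], "B": ["A"], "C": ["A"]}
IC = {"ROOT": 0.0, "A": 1.0, "B": 2.0, "C": 2.0}
```

In this toy graph the siblings B and C share A as their most informative ancestor, so their similarity is 2 * 1.0 / (2.0 + 2.0) = 0.5, while any term compared with itself scores 1.0.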

Screening

Seven-component scoring algorithm (constraint, deleteriousness, phenotype, dosage, consequence, compound het, age relevance)

Six screening modes (diagnostic, neonatal, pediatric, proactive adult, carrier, pharmacogenomics)

Age-aware prioritization with curated gene lists

Clinical boosts for ethnicity, family history, sex-linked inheritance, consanguinity, pregnancy

Four-tier priority ranking with clinical actionability labels

AI Clinical Assistant

Conversational variant analysis with natural language database queries

Biomedical literature search (1M+ publications, local PubMed mirror)

Four-level adaptive clinical interpretation generation

PDF and DOCX report export with Helena branding

Genomics-aware visualization suggestions

On-premise LLM inference within EU infrastructure

Infrastructure

EU-based processing (Helsinki, Finland)

All databases stored and queried locally

Zero external API calls during variant processing

GDPR-compliant data handling

DuckDB-based analytical pipeline