Classification (HELIX-CR-2026-023)
PVS1: G2P molecular mechanism integration. DECIPHER Gene2Phenotype (G2P) database now provides primary GoF/DN guard for PVS1. gof_genes_g2p view contains 198 pure GoF/DN monoallelic genes where PVS1 is blocked.
PVS1: GOF_AD_GENES reduced from 36 to 14 fallback genes (genes not in G2P: 6 neurodegeneration/toxic aggregation, 3 somatic/neomorphic, 5 absent from G2P 2026-02-28 release). PRSS1 added to fallback.
PVS1: GNAS removed from GOF_AD_GENES (dual-mechanism: McCune-Albright = GoF, pseudohypoparathyroidism 1A = LoF). GNAS LoF variants now correctly receive PVS1.
PVS1: 49 dual-mechanism genes (e.g. SCN5A, LMNA, KCNH2, KCNQ1, FGFR1) excluded from GoF view, preserving PVS1 for legitimate LoF phenotypes.
PVS1: G2P confidence filter: only definitive, strong, moderate included. Limited confidence excluded.
Subtractive change: it can only remove PVS1 from GoF/DN genes, so it cannot create false P/LP classifications.
Clinical trigger: PD2025_090 PRSS1 p.Gly177Ter (stop_gained in GoF gene).
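A minimal sketch of how the guard described in the entries above could be derived and applied, assuming G2P mechanism records are available as simple tuples. The record fields, the mechanism and allelic labels, the placeholder gene names and the function names are illustrative assumptions. They are not the actual g2p_gene_mechanism schema or the gof_genes_g2p view definition.

```python
from typing import Iterable, NamedTuple, Set


class G2PRecord(NamedTuple):
    gene: str
    mechanism: str   # hypothetical labels: "gof", "dn", "lof"
    confidence: str  # "definitive", "strong", "moderate", "limited"
    allelic: str     # hypothetical labels: "monoallelic", "biallelic"


# Per the changelog: limited-confidence entries are excluded.
ACCEPTED_CONFIDENCE = {"definitive", "strong", "moderate"}
GOF_LIKE = {"gof", "dn"}


def pure_gof_genes(records: Iterable[G2PRecord]) -> Set[str]:
    """Genes with a GoF/DN monoallelic record and no LoF record.

    Assumption: limited-confidence records are ignored for both the
    GoF/DN side and the LoF side.
    """
    gof, lof = set(), set()
    for r in records:
        if r.confidence not in ACCEPTED_CONFIDENCE:
            continue
        if r.mechanism in GOF_LIKE and r.allelic == "monoallelic":
            gof.add(r.gene)
        elif r.mechanism == "lof":
            lof.add(r.gene)
    # Dual-mechanism genes keep PVS1 for their LoF phenotypes.
    return gof - lof


def pvs1_blocked(gene: str, g2p_gof: Set[str], fallback: Set[str]) -> bool:
    """G2P view is the primary guard; the curated list is the fallback."""
    return gene in g2p_gof or gene in fallback


records = [
    G2PRecord("GENE_X", "gof", "definitive", "monoallelic"),
    G2PRecord("GNAS", "gof", "strong", "monoallelic"),
    G2PRecord("GNAS", "lof", "definitive", "monoallelic"),
    G2PRecord("GENE_Y", "dn", "limited", "monoallelic"),
]
view = pure_gof_genes(records)
fallback = {"PRSS1"}  # one of the 14 fallback genes named in the changelog
```

In this toy data, GNAS drops out of the view because it carries both a GoF and a LoF record, while PRSS1 is blocked only through the fallback set.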
Reference Databases
DECIPHER G2P: Molecular mechanism data (g2p_gene_mechanism table, 2,372 records) added to reference_db alongside existing HPO enrichment data.
gof_genes_g2p view: 198 pure GoF/DN monoallelic genes for PVS1 guard. Dual-mechanism genes excluded.
New loader script: scripts/load_g2p_mechanism.py with dry-run, force flags, and 9-point validation.
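A hedged sketch of how dry-run and force flags are commonly wired into a loader of this kind. The flag spellings (--dry-run, --force), the function names and the skip-if-present behavior are assumptions for illustration. The changelog does not document the actual interface of scripts/load_g2p_mechanism.py, and the 9-point validation is not reproduced here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Load G2P mechanism records (illustrative sketch)"
    )
    # Hypothetical flag: validate the input without writing anything.
    parser.add_argument("--dry-run", action="store_true",
                        help="run validation only, write nothing")
    # Hypothetical flag: reload even if the table is already populated.
    parser.add_argument("--force", action="store_true",
                        help="replace an existing table")
    return parser


def plan_load(args: argparse.Namespace, table_exists: bool) -> str:
    """Decide what the loader would do; returns an action label."""
    if args.dry_run:
        return "validate-only"
    if table_exists and not args.force:
        return "skip"
    return "load"
```

Keeping the decision in a pure function such as plan_load makes the flag behavior testable without touching the reference database.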
Classification
PVS1: GoF AD gene exclusion (HELIX-CR-2026-022). GOF_AD_GENES curated list (36 genes) blocks PVS1 for AD genes where disease mechanism is gain-of-function, dominant-negative, or toxic aggregation.
PVS1: ClinGen AD Definitive/Strong genes bypass pLI/LOEUF constraint gate (v3.16.6). Genes like TUBB1 with ClinGen AD Definitive now qualify for PVS1.
LP classification: Disease association gate for LP rules LP4, LP5, LP6 (v3.16.7, HELIX-CR-2026-021). Prevents LP in genes without known Mendelian disease mechanism.
ClinGen Gene-Disease Validity integration (v3.16.0): ar_lof_genes_clingen and disease_associated_genes_clingen reference tables replace Python constants.
Clinical triggers: PD2025_082 RAC1 frameshift (GoF gene), PD2025_090 discordance analysis.
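The ClinGen bypass of the pLI/LOEUF constraint gate can be sketched as below. The function name and the numeric cutoffs are placeholders chosen for illustration, since the changelog does not state the thresholds, and only the constraint-metric and ClinGen paths are shown.

```python
from typing import Optional

# Per the changelog: these ClinGen AD classifications bypass the gate.
CLINGEN_BYPASS = {"Definitive", "Strong"}


def passes_constraint_gate(
    pli: Optional[float],
    loeuf: Optional[float],
    clingen_ad_class: Optional[str],
    pli_min: float = 0.9,     # placeholder cutoff, not from the changelog
    loeuf_max: float = 0.35,  # placeholder cutoff, not from the changelog
) -> bool:
    """True if the gene may proceed to PVS1 evaluation."""
    if clingen_ad_class in CLINGEN_BYPASS:
        return True  # curated gene-disease validity outranks constraint
    if pli is not None and pli >= pli_min:
        return True
    if loeuf is not None and loeuf <= loeuf_max:
        return True
    return False
```

With these placeholder cutoffs, a gene with uninformative constraint (for example pLI 0.0, LOEUF 1.2) fails the gate on metrics alone but passes when its ClinGen AD classification is Definitive, which is the TUBB1 situation the entry describes.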
Reference Databases
ClinGen Gene-Disease Validity: ar_lof_genes_clingen and disease_associated_genes_clingen tables added to reference_db.
Total reference databases: 14 (previously 13).
Classification
PM1: Critical functional domains migrated from Python constant to reference database table (refdb.interpro_pfam_domains). HELIX-CR-2026-012.
PM1: Full InterPro Pfam catalog (27,481 entries) loaded. 49 domains marked critical across 14 categories.
PM1: CRITICAL_PFAM_DOMAINS converted from short names to InterPro-verified accession numbers (v3.14.0, HELIX-CR-2026-011). Fixes a bug that had allowed 0 legitimate PM1 applications since the curated list was introduced.
PM1: Post-fix validation on HG2023_206: PM1 count rose from 2 (false positive) to 1,139 (legitimate).
PVS1: ClinVar LOF evidence as fifth constraint gate path (v3.13.0, HELIX-CR-2026-010). Small AD genes with ClinVar P/LP LOF but uninformative gnomAD constraint now qualify.
PP3/BP4: Missense consequence guard (v3.12.1, HELIX-CR-2026-009). BayesDel path now requires missense_variant consequence.
Clinical triggers: HG2023_206 RYR1 (PM1 fix), GCM2 p.Arg131Ter (PVS1 + PP3 guard).
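A small sketch of why keying the PM1 check on accession numbers rather than short names matters, assuming variant annotations carry Pfam accessions. The set contents and function name are illustrative: PF00069 (protein kinase domain) stands in for the 49 critical domains, which the changelog does not enumerate.

```python
from typing import Iterable, Set

# Illustrative stand-in for the curated critical-domain accessions.
CRITICAL_PFAM_ACCESSIONS: Set[str] = {"PF00069"}


def pm1_applies(domain_accessions: Iterable[str],
                critical: Set[str] = CRITICAL_PFAM_ACCESSIONS) -> bool:
    """True if any overlapping domain is in the critical set.

    Version suffixes such as "PF00069.29" are stripped before lookup.
    """
    return any(acc.split(".")[0] in critical for acc in domain_accessions)
```

A short name such as "Pkinase" never equals an accession such as "PF00069", so a list keyed on one form cannot match annotations supplied in the other.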
Reference Databases
InterPro / Pfam added as 13th reference database (27,481 Pfam entries).
Loader script: scripts/load_interpro_pfam.py with InterPro API fetch and offline fallback.
Total reference databases: 13 (previously 12).
HPO Enrichment (HELIX-REF-001)
Multi-Source HPO Enrichment Pipeline: gene-phenotype annotations expanded from 1 source (5,173 genes) to 6 sources (5,688 genes, +10%)
HPO Consortium: 320K records, 5,173 genes (primary source, curated gene-phenotype associations)
Orphanet disease-to-HPO: 168K records, 3,176 genes with clinical frequency data (via Orphadata en_product4.xml + en_product6.xml)
DECIPHER Gene2Phenotype (G2P): 43K records, 2,125 genes from 7 clinical panels (DD, Eye, Cardiac, Skin, Skeletal, Cancer, Ear)
Monarch Initiative: 151K records, 4,791 genes (HPO Consortium redistribution via Monarch KG)
ClinVar-MedGen: 245K records, 5,258 genes (P/LP variant -> MedGen CUI -> HPO chain mapping, largest contributor of new genes)
Manual clinical curation: common-disease genes outside rare-disease HPO scope (TBC1D4 with PMID evidence)
3,292 genes with increased HPO term coverage from cross-source aggregation
Source priority ordering: Orphanet > G2P > HPO Consortium > Monarch > ClinVar-MedGen
Low-confidence filtering: Orphanet modifying/candidate and G2P limited entries excluded from clinical view
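The source priority ordering and low-confidence filtering above can be sketched as a simple merge. The row layout, the confidence labels other than those named in the changelog, the placeholder gene names and the choice to rank unlisted sources last are assumptions for illustration, not the pipeline's actual schema.

```python
from typing import Dict, Iterable, Tuple

# Priority order as stated in the changelog (highest first).
SOURCE_PRIORITY = ["Orphanet", "G2P", "HPO Consortium", "Monarch", "ClinVar-MedGen"]
RANK = {source: i for i, source in enumerate(SOURCE_PRIORITY)}

# Entries excluded from the clinical view, per the changelog.
LOW_CONFIDENCE = {("Orphanet", "modifying"), ("Orphanet", "candidate"), ("G2P", "limited")}

Row = Tuple[str, str, str, str]  # (gene, hpo_id, source, confidence)


def merge_annotations(rows: Iterable[Row]) -> Dict[Tuple[str, str], str]:
    """Keep one source per (gene, HPO term): the highest-priority one."""
    best: Dict[Tuple[str, str], str] = {}
    for gene, hpo_id, source, confidence in rows:
        if (source, confidence) in LOW_CONFIDENCE:
            continue
        key = (gene, hpo_id)
        # Assumption: sources missing from the list rank last.
        rank = RANK.get(source, len(SOURCE_PRIORITY))
        if key not in best or rank < RANK.get(best[key], len(SOURCE_PRIORITY)):
            best[key] = source
    return best


rows = [
    ("GENE_A", "HP:0001250", "Monarch", "curated"),
    ("GENE_A", "HP:0001250", "Orphanet", "assessed"),
    ("GENE_A", "HP:0000001", "G2P", "limited"),
    ("GENE_B", "HP:0001250", "ClinVar-MedGen", "mapped"),
]
merged = merge_annotations(rows)
```

Here the Orphanet record wins for GENE_A, the limited G2P record is dropped, and GENE_B is retained only because ClinVar-MedGen contributes it.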
Classification
Classifier version bumped to v3.12.0
PP4 criterion evaluates against enriched HPO set (more genes trigger PP4)
No ACMG criteria logic changes -- only annotation data expanded
Reference Databases
DECIPHER G2P added as new reference database (7 clinical panels)
Monarch Initiative added as new reference database
MedGen HPO-OMIM mapping added as new reference database
HPO database entry updated to reflect multi-source enriched view
Total reference databases: 12 (previously 9)
Classification
PVS1 disease association gate: requires established disease association before applying PVS1 (ClinVar P/LP, Orphanet, ClinGen HI, AR LoF list, or VCEP coverage)
BS1 constraint-implied AD fallback: uses AD threshold (0.1%) for LoF-constrained genes without inheritance data
BS1 cascade expanded from 5 to 8 levels with explicit Orphanet AR and AR_LOF_GENES priorities
PVS1 non-canonical splice exclusion: splice_donor_5th_base_variant, splice_donor_region_variant, splice_acceptor_5th_base_variant blocked from PVS1 (v3.11.4)
PVS1 NMD_transcript_variant exclusion fix: NMD_transcript_variant no longer blocks PVS1; NMD_escaping_variant now blocks instead (v3.11.3)
De novo projection: prospective PS2 upgrade computation for VUS variants to guide trio testing (v3.11.0)
ClinVar low-confidence disease association gate: 1-star P/LP requires gene disease evidence (v3.10.0)
disease_genes_clinvar CTE circular reference fix: review_stars >= 2 filter (v3.10.1)
PVS1 last-exon NMD downgrade: last-exon truncating variants receive PVS1_Strong instead of PVS1 (Very Strong) (v3.9.0)
BP1 ClinVar pathogenic missense guard and PP3_Supporting label (v3.8.2)
PP3_Strong disease association gate (v3.8.1)
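Several of the PVS1 entries above combine into a small decision function, sketched here. The blocked consequence terms come from the changelog. The LoF consequence set, the boolean inputs and the order of checks are assumptions for illustration and do not describe the classifier's real control flow.

```python
from typing import Iterable, Optional

# Consequences that block PVS1 according to the changelog.
PVS1_BLOCKED = {
    "splice_donor_5th_base_variant",
    "splice_donor_region_variant",
    "splice_acceptor_5th_base_variant",
    "NMD_escaping_variant",
}

# Assumed set of LoF consequences (standard VEP terms).
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def pvs1_strength(consequences: Iterable[str],
                  has_disease_association: bool,
                  is_last_exon: bool) -> Optional[str]:
    """Return "PVS1", "PVS1_Strong", or None if PVS1 does not apply."""
    terms = set(consequences)
    if not has_disease_association:
        return None  # disease association gate
    if terms & PVS1_BLOCKED:
        return None  # non-canonical splice or NMD-escaping
    if not terms & LOF_CONSEQUENCES:
        return None  # not a predicted loss-of-function variant
    if is_last_exon:
        return "PVS1_Strong"  # last-exon truncation downgrade
    return "PVS1"
```

Note that NMD_transcript_variant appears in neither set, so a stop_gained variant on an NMD transcript still receives PVS1 in this sketch, matching the v3.11.3 fix.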
Reference Databases
Orphanet/Orphadata documented as separate reference database (gene-disease-inheritance for 3,200 genes)
VCEP gene-specific specifications documented as separate reference database (~50-60 genes)
Total reference databases: 9 (previously 7)
Classification
ACMG/AMP 2015 classification with Bayesian point-based framework (Tavtigian 2018/2020)
19 of 28 ACMG criteria automated
BayesDel_noAF with ClinGen SVI-calibrated thresholds for PP3/BP4 (Pejaver 2022)
SpliceAI integration for PP3_splice with PVS1 double-counting prevention
ClinVar override logic with review star quality filtering
Gene-specific VCEP threshold support
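For orientation, the published point-based framework (Tavtigian 2020) can be sketched as follows. This reflects the published point values and class boundaries, not necessarily how this classifier implements them. The function name and strength labels are illustrative, and stand-alone criteria such as BA1 and the ClinVar override logic are left out.

```python
from typing import Iterable

# Points per evidence strength in the published framework.
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


def classify(pathogenic: Iterable[str], benign: Iterable[str]) -> str:
    """Sum pathogenic points, subtract benign points, map to a class."""
    total = sum(POINTS[s] for s in pathogenic) - sum(POINTS[s] for s in benign)
    if total >= 10:
        return "Pathogenic"
    if total >= 6:
        return "Likely pathogenic"
    if total >= 0:
        return "VUS"
    if total >= -6:
        return "Likely benign"
    return "Benign"
```

For example, one very strong criterion plus one moderate criterion gives 8 + 2 = 10 points and reaches Pathogenic, while the very strong criterion alone (8 points) yields Likely pathogenic.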
Reference Databases
gnomAD v4.1 (759M variants, 807K individuals)
gnomAD Constraint v4.1 (18.2K genes)
ClinVar 2025-01
dbNSFP 4.9c
SpliceAI precomputed (Ensembl MANE Release 113)
HPO gene-phenotype associations
ClinGen dosage sensitivity
Ensembl VEP Release 113 with local offline cache
Phenotype Matching
Lin semantic similarity with HPO ontology graph
Five-tier clinical priority system
Gene-level deduplication and aggregation
Automatic HPO term extraction from free-text clinical descriptions
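Lin similarity between two ontology terms is twice the information content (IC) of their most informative common ancestor divided by the sum of the two terms' IC. The sketch below applies that formula to a toy graph. The term identifiers, IC values and function names are made up for illustration and are not real HPO data or the product's implementation.

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lin_similarity(a: str, b: str,
                   parents: Dict[str, List[str]],
                   ic: Dict[str, float]) -> float:
    """Lin similarity: 2 * IC(MICA) / (IC(a) + IC(b))."""
    common = ancestors(a, parents) & ancestors(b, parents)
    denom = ic[a] + ic[b]
    if not common or denom == 0:
        return 0.0
    mica_ic = max(ic[t] for t in common)  # most informative common ancestor
    return 2 * mica_ic / denom


# Toy ontology: T2 and T3 are both children of T1, which is a child of ROOT.
toy_parents = {"T1": ["ROOT"], "T2": ["T1"], "T3": ["T1"]}
toy_ic = {"ROOT": 0.0, "T1": 1.0, "T2": 2.0, "T3": 3.0}
```

In the toy graph, T2 and T3 share T1 as their most informative common ancestor, so their similarity is 2 * 1.0 / (2.0 + 3.0), and any term compared with itself scores 1.0 unless its IC is zero.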
Screening
Seven-component scoring algorithm (constraint, deleteriousness, phenotype, dosage, consequence, compound het, age relevance)
Six screening modes (diagnostic, neonatal, pediatric, proactive adult, carrier, pharmacogenomics)
Age-aware prioritization with curated gene lists
Clinical boosts for ethnicity, family history, sex-linked inheritance, consanguinity, pregnancy
Four-tier priority ranking with clinical actionability labels
AI Clinical Assistant
Conversational variant analysis with natural language database queries
Biomedical literature search (1M+ publications, local PubMed mirror)
Four-level adaptive clinical interpretation generation
PDF and DOCX report export with Helena branding
Genomics-aware visualization suggestions
On-premise LLM inference within EU infrastructure
Infrastructure
EU-based processing (Helsinki, Finland)
All databases stored and queried locally
Zero external API calls during variant processing
GDPR-compliant data handling
DuckDB-based analytical pipeline