
HPO

The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities and their associations with genes and diseases. In Helena, HPO data is aggregated from six curated sources into an enriched gene-phenotype view covering 5,688 genes and approximately 927,000 associations. Compared with HPO Consortium data alone, this multi-source approach increases gene coverage by 10% and provides richer phenotype profiles for clinical matching.

Database Details

Version: v3.12.0 (Multi-Source Enriched)

Records: ~927,000 records from 6 sources, 5,688 genes

Sources: HPO Consortium, Orphanet, DECIPHER G2P, Monarch, ClinVar-MedGen, Manual Curation

Producer: Helena Bioinformatics (aggregated from multiple public sources)

Role in Classification and Analysis

HPO data supports three analysis functions:

ACMG PP4 Criterion

When patient HPO terms are provided, PP4 triggers if at least 3 patient HPO terms match the gene's HPO profile, or if at least 2 terms match for a highly specific gene (one with 5 or fewer total HPO associations). A triggered PP4 contributes supporting-level evidence of pathogenicity.
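The threshold rule above can be sketched in a few lines. This is a minimal illustration, not Helena's implementation: the function name `pp4_triggered` and its parameters are hypothetical, and a "match" is assumed here to mean exact HPO ID equality, whereas the production logic may use ontology-aware matching.

```python
def pp4_triggered(patient_terms, gene_terms, specific_gene_max_terms=5):
    """Return True if the PP4 term-overlap rule is met.

    Assumption: a term "matches" only when the HPO IDs are identical.
    """
    patient = set(patient_terms)
    gene = set(gene_terms)
    matches = len(patient & gene)

    # A gene is treated as highly specific when it has 5 or fewer HPO terms.
    highly_specific = 0 < len(gene) <= specific_gene_max_terms

    # Highly specific genes need 2 matching terms; all others need 3.
    required = 2 if highly_specific else 3
    return matches >= required
```

The gene terms could be obtained by splitting the `hpo_ids` column on semicolons before calling the function.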

Phenotype Matching Service

The dedicated phenotype matching module uses HPO semantic similarity to compute overlap between patient phenotype and gene-associated phenotypes, producing clinical tiers (Tier 1 through Tier 4) for variant prioritization.

Screening Prioritization

When no patient HPO terms are available, the hpo_count field serves as a proxy for clinical relevance. Genes associated with more phenotypes receive higher screening scores, reflecting broader clinical significance.

Data Deduplication

HPO terms are aggregated from 6 sources using a fixed source priority (Orphanet > DECIPHER G2P > HPO Consortium > Monarch > ClinVar-MedGen > Manual Curation). When the same HPO term is reported for a gene by more than one source, it is deduplicated so that each unique phenotype is counted once per gene. Low-confidence entries (Orphanet modifying or candidate associations, and G2P limited-confidence entries) are excluded from the clinical view. Both hpo_ids and hpo_names are sorted by HPO ID to maintain a reliable 1:1 correspondence between identifiers and names.
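The deduplication step can be illustrated with a short sketch. It assumes that the source priority decides which source's record is kept when a term appears more than once for a gene; that interpretation, the record layout, and the names `SOURCE_PRIORITY` and `deduplicate` are assumptions for illustration. The low-confidence filter is omitted.

```python
# Source labels follow the Sources list; highest priority first.
SOURCE_PRIORITY = [
    "Orphanet", "DECIPHER G2P", "HPO Consortium",
    "Monarch", "ClinVar-MedGen", "Manual Curation",
]
RANK = {name: i for i, name in enumerate(SOURCE_PRIORITY)}


def deduplicate(records):
    """Collapse (gene, hpo_id, hpo_name, source) tuples into gene profiles."""
    best = {}
    for gene, hpo_id, hpo_name, source in records:
        key = (gene, hpo_id)
        kept = best.get(key)
        # Keep the record from the highest-priority (lowest-rank) source.
        if kept is None or RANK[source] < RANK[kept[1]]:
            best[key] = (hpo_name, source)

    # Build IDs and names in the same HPO-ID order to keep them aligned 1:1.
    profiles = {}
    for gene, hpo_id in sorted(best):
        profile = profiles.setdefault(gene, {"ids": [], "names": []})
        profile["ids"].append(hpo_id)
        profile["names"].append(best[(gene, hpo_id)][0])

    return {
        gene: {
            "hpo_ids": ";".join(p["ids"]),
            "hpo_names": ";".join(p["names"]),
            "hpo_count": len(p["ids"]),
        }
        for gene, p in profiles.items()
    }
```

Because identifiers and names are appended in a single pass over the sorted keys, the two semicolon-separated strings cannot drift out of alignment.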

Columns Loaded (6)

HPO data is joined on gene symbol. Each variant inherits the complete HPO profile of its associated gene.
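As a rough illustration of the gene-level join, the sketch below attaches a gene's HPO profile to every variant in that gene. The variant field `gene_symbol`, the helper `annotate_variants`, and the dictionary-based lookup are hypothetical; only the six column names come from this page.

```python
HPO_COLUMNS = [
    "hpo_ids", "hpo_names", "hpo_count",
    "hpo_frequency_data", "hpo_disease_ids", "hpo_gene_id",
]


def annotate_variants(variants, gene_profiles):
    """Left-join HPO columns onto variants by gene symbol.

    Variants in genes without an HPO profile get None in every HPO column.
    """
    annotated = []
    for variant in variants:
        profile = gene_profiles.get(variant["gene_symbol"], {})
        hpo_fields = {col: profile.get(col) for col in HPO_COLUMNS}
        annotated.append({**variant, **hpo_fields})
    return annotated
```

Two variants in the same gene receive identical HPO fields, which is why the annotations describe the gene rather than the individual variant.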

hpo_ids (VARCHAR)

Semicolon-separated HPO term identifiers associated with the gene (e.g., "HP:0001250;HP:0001263;HP:0002069"). Sorted by HPO ID to maintain 1:1 correspondence with hpo_names.

hpo_names (VARCHAR)

Semicolon-separated HPO term names corresponding to hpo_ids (e.g., "Seizure;Global developmental delay;Bilateral tonic-clonic seizure"). Sorted in the same order as hpo_ids.

hpo_count (INTEGER)

Number of unique HPO terms associated with the gene. Used in screening mode as a proxy for clinical breadth when no patient HPO terms are available.

hpo_frequency_data (VARCHAR)

Frequency of each phenotype in the associated condition, when available. Not all gene-phenotype associations have frequency data.

hpo_disease_ids (VARCHAR)

OMIM or Orphanet disease identifiers that link the gene to each HPO phenotype.

hpo_gene_id (VARCHAR)

HPO internal gene identifier used for cross-referencing within the ontology.
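Because hpo_ids and hpo_names share the same ordering, a consumer can pair them positionally. The helper below is a minimal sketch; the name `pair_terms` is hypothetical, and it assumes that term names never contain a semicolon.

```python
def pair_terms(hpo_ids, hpo_names):
    """Split the two semicolon-separated columns into (id, name) pairs."""
    ids = hpo_ids.split(";") if hpo_ids else []
    names = hpo_names.split(";") if hpo_names else []

    # A length mismatch means the 1:1 correspondence has been broken.
    if len(ids) != len(names):
        raise ValueError(
            f"hpo_ids has {len(ids)} entries but hpo_names has {len(names)}"
        )
    return list(zip(ids, names))


pairs = pair_terms(
    "HP:0001250;HP:0001263",
    "Seizure;Global developmental delay",
)
```

Checking the lengths before zipping surfaces a misaligned row as an error instead of silently truncating it.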

Limitations

HPO coverage varies by disease. Well-studied conditions have comprehensive phenotype profiles, while rare diseases may have minimal HPO annotation.

HPO terms are associated at the gene level, not the variant level. Different variants in the same gene may produce different phenotypes.

Frequency data for phenotype-disease associations is incomplete. The absence of frequency data does not mean the phenotype is rare.

HPO is primarily focused on rare diseases. Common complex conditions may have less comprehensive ontology coverage.

Phenotype matching depends on accurate HPO term selection by the clinician. Overly broad or imprecise terms reduce matching specificity.

Reference

Gargano MA, et al. "The Human Phenotype Ontology in 2024: phenotypes around the world." Nucleic Acids Research. 2024;52(D1):D1333-D1346. PMID: 37953324.

For details on phenotype-based analysis, see the Phenotype Matching section.