Data Protection Impact Assessment

Last updated: March 2026 (v1.1)

This document summarizes the Data Protection Impact Assessment (DPIA) conducted by Helena Bioinformatics for the Helena platform pursuant to GDPR Article 35. A DPIA is mandatory here because the platform processes genetic data, a special category of personal data, on a large scale, which constitutes high-risk processing (Article 35(3)(b)).

1. Processing Description

Nature of processing: automated analysis of genetic variant data (VCF files) uploaded by clinical genetics laboratories. Processing includes variant annotation against population and clinical databases, automated ACMG/AMP classification, phenotype-genotype correlation, biomedical literature mining, AI-assisted clinical summarization, and generation of clinical interpretation reports. All AI models run locally on dedicated infrastructure, and no genomic data is sent to external AI services or APIs.

Scope: the platform processes whole-exome and whole-genome sequencing data containing thousands to millions of genetic variants per patient sample. Associated phenotype data (HPO terms) and clinical context are also processed.

Context: the platform serves as a clinical decision support tool for qualified geneticists. It does not make autonomous clinical decisions. All outputs, including AI-generated summaries, require professional review and validation.

Purpose: to reduce variant interpretation time from days to minutes, improving laboratory throughput and consistency while maintaining clinical accuracy.

2. Necessity and Proportionality

Necessity: processing genetic variant data is essential to the core function of the platform. The service cannot be provided without processing VCF files. Phenotype data is necessary for clinical correlation and prioritization of variants.

Proportionality: we process only the minimum data necessary. VCF files are received in pseudonymized form (sample IDs only, no patient names). We do not request or store directly identifying patient information. Phenotype data is limited to standardized HPO codes relevant to the clinical question. Data retention is time-limited with automatic deletion.

Legal basis: Helena Bioinformatics operates in a dual capacity depending on the processing activity. When processing genomic data on behalf of a laboratory for clinical variant interpretation, Helena acts as a Data Processor under the Data Processing Agreement (Article 28 GDPR). The Data Controller (the laboratory) relies on explicit consent (Article 9(2)(a)) or the healthcare provision exemption (Article 9(2)(h)) as the legal basis for processing special category data. When processing de-identified data for internal research and development purposes, including algorithm validation and platform improvement, Helena acts as an independent Data Controller with legitimate interests (Article 6(1)(f)) as the legal basis, in accordance with the applicable Data Use Agreement. In all cases, the stricter conditions of the governing agreements apply.

3. Risk Assessment

Risk 1: Unauthorized access to genetic data

Severity: High. Genetic data is immutable and uniquely identifying. Likelihood: Low. Mitigated by dedicated servers (not multi-tenant cloud), TLS 1.3 encryption in transit, AES-128 symmetric encryption at application level (Fernet with HMAC-SHA256 integrity verification), full-disk encryption at infrastructure level, role-based access control, network firewall rules, and comprehensive audit logging. Residual risk: Low.
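
The integrity step mentioned above can be illustrated in isolation. The following is a minimal sketch, not the platform's implementation: it shows only how an HMAC-SHA256 tag is attached to an already-encrypted payload and verified before any decryption is attempted, which is the encrypt-then-MAC pattern that Fernet follows. The function names seal and open_sealed and the 32-byte signing key are assumptions made for illustration. A production system would normally rely on a vetted Fernet implementation rather than hand-rolled code.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag


def seal(signing_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to an already-encrypted payload.

    Hypothetical helper: the AES encryption step itself is out of scope here.
    """
    tag = hmac.new(signing_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(signing_key: bytes, sealed: bytes) -> bytes:
    """Verify the tag and return the ciphertext; raise ValueError if it was altered."""
    if len(sealed) < TAG_LEN:
        raise ValueError("payload too short to contain a tag")
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(signing_key, ciphertext, hashlib.sha256).digest()
    # compare_digest runs in constant time, so it does not reveal
    # how many leading bytes of the tag matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return ciphertext


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    sealed = seal(key, b"opaque-ciphertext-bytes")
    print(open_sealed(key, sealed) == b"opaque-ciphertext-bytes")
```

Verifying the tag before decrypting means a modified payload is rejected without the decryption routine ever processing attacker-controlled data.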

Risk 2: Data breach during transit

Severity: High. Likelihood: Very Low. All data transmission uses TLS 1.3 encryption. VCF files are uploaded directly to our EU servers. No genomic data transits through non-EU jurisdictions. Residual risk: Very Low.
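
As an illustration of how a TLS 1.3 floor can be enforced on the uploading side, the sketch below builds a client context with Python's standard ssl module. It is an assumption-laden example rather than a description of the platform's upload client: the function name make_tls13_client_context is hypothetical, and the server-side configuration is not shown.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and refuses anything below TLS 1.3.

    Hypothetical helper for illustration only.
    """
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Reject handshakes that would negotiate TLS 1.2 or older.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


if __name__ == "__main__":
    ctx = make_tls13_client_context()
    print(ctx.minimum_version.name)
```

A context built this way can be passed to standard-library HTTP clients, so a connection to an endpoint that only offers older protocol versions fails at the handshake instead of silently downgrading.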

Risk 3: Re-identification of pseudonymized data

Severity: High. Genetic data is inherently identifying. Likelihood: Very Low. We receive only sample IDs, not patient identifiers. Our staff cannot link sample IDs to individuals. Access is restricted to automated processing pipelines. Residual risk: Very Low.

Risk 4: Incorrect variant classification leading to clinical harm

Severity: High. Incorrect classification could affect patient care. Likelihood: Low. The platform is explicitly positioned as a decision support tool, not a diagnostic device. All outputs must be reviewed by qualified geneticists. The platform follows established ACMG/AMP guidelines, references validated databases (ClinVar, gnomAD), and is regularly validated against clinical benchmarks. Residual risk: Low (mitigated by mandatory human review).

Risk 5: Excessive data retention

Severity: Medium. Likelihood: Very Low. Automatic deletion after configurable retention period (default 90 days). Data Controllers can request immediate deletion. Deletion processes are logged and auditable. Residual risk: Very Low.
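
The retention rule above can be sketched as a simple selection step. This is a minimal illustration under stated assumptions: the StoredSample record, its field names, and the expired_samples function are hypothetical, and only the 90-day default comes from this DPIA. Actual deletion and audit logging are not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 90  # default retention period stated in this DPIA


@dataclass(frozen=True)
class StoredSample:
    sample_id: str         # pseudonymized sample ID (hypothetical field name)
    uploaded_at: datetime  # timezone-aware upload timestamp


def expired_samples(samples, now, retention_days=DEFAULT_RETENTION_DAYS):
    """Return the samples whose retention period has fully elapsed at `now`."""
    cutoff = now - timedelta(days=retention_days)
    return [s for s in samples if s.uploaded_at <= cutoff]


if __name__ == "__main__":
    now = datetime(2026, 3, 31, tzinfo=timezone.utc)
    samples = [
        StoredSample("S-001", datetime(2025, 12, 1, tzinfo=timezone.utc)),
        StoredSample("S-002", datetime(2026, 3, 1, tzinfo=timezone.utc)),
    ]
    for sample in expired_samples(samples, now):
        print(sample.sample_id)
```

Passing the current time as an argument, instead of reading the clock inside the function, keeps the selection deterministic and makes the rule easy to test against a configured retention period.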

Risk 6: Sub-processor non-compliance

Severity: Medium. Likelihood: Very Low. Hetzner Online GmbH (infrastructure provider) maintains ISO 27001 certification and provides physical hosting only without logical data access. DPA in place with Hetzner. Vercel Inc. (frontend hosting) processes only website delivery assets and user interface resources; no genomic data, patient identifiers, or clinical information is processed by or accessible to Vercel. Standard Contractual Clauses apply for any data transfers outside the EU. No sub-processors outside the EU process genomic data. Residual risk: Very Low.

Risk 7: AI model misuse or data leakage through AI processing

Severity: High. Likelihood: Very Low. All AI and machine learning models operate exclusively on dedicated local GPU infrastructure owned and managed by Helena Bioinformatics within the same EU datacenter. No genomic or clinical data is transmitted to external AI services, cloud-based language models, or third-party APIs. AI processing is subject to the same access controls, audit logging, and encryption as all other platform operations. Residual risk: Very Low.

4. Measures to Address Risks

The following technical and organizational measures are implemented:

Technical measures: dedicated EU-based servers (Helsinki, Finland) with no multi-tenant sharing; TLS 1.3 encryption in transit; AES-128 symmetric encryption at application level with HMAC-SHA256 integrity verification, plus full-disk encryption at infrastructure level; role-based access control with principle of least privilege and organization-level data isolation; JWT authentication with automatic session expiration; comprehensive audit trails for all data operations; automated data deletion based on configurable retention policies; network isolation with firewall restricting non-essential traffic; bcrypt password hashing (work factor 12); API key management with SHA-256 hashing and constant-time comparison; local-only AI inference with no external API calls; and regular vulnerability assessments.
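
The API key measure listed above (SHA-256 hashing with constant-time comparison) can be sketched with the Python standard library. This is a minimal example under stated assumptions, not the platform's code: the function names issue_api_key and verify_api_key and the key length are hypothetical. The idea is that only the hash is stored, so a leaked database does not expose usable keys.

```python
import hashlib
import hmac
import secrets


def issue_api_key() -> tuple[str, str]:
    """Return (plaintext_key, stored_hash).

    The plaintext is shown to the client once; only the hash would be persisted.
    Hypothetical helper for illustration.
    """
    plaintext = secrets.token_urlsafe(32)  # high-entropy random key
    stored_hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, stored_hash


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # compare_digest avoids timing differences that depend on where the strings diverge.
    return hmac.compare_digest(presented_hash, stored_hash)


if __name__ == "__main__":
    key, key_hash = issue_api_key()
    print(verify_api_key(key, key_hash))
    print(verify_api_key("not-the-key", key_hash))
```

A fast hash such as SHA-256 is reasonable here because the keys are long random values; user-chosen passwords need a deliberately slow function, which is why the measures above list bcrypt for passwords.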

Organizational measures: Data Processing Agreements with all laboratory partners; confidentiality obligations for all personnel; documented security incident response procedures with 24-hour notification to Data Controllers; regular review and updating of security measures; sub-processor management with Data Controller notification; staff training on data protection obligations; and designated data protection contact.

5. Regulatory Compliance

The Helena platform is designed and operated in compliance with applicable EU and national data protection and cybersecurity regulations. In addition to GDPR (Regulation (EU) 2016/679), the platform's security architecture addresses the requirements of the NIS2 Directive (Directive (EU) 2022/2555) as transposed into Bulgarian national law through the amendments to the Cybersecurity Act (State Gazette No. 17, 13 February 2026). While Helena Bioinformatics may not itself be classified as an essential or important entity under NIS2, its infrastructure and security measures are designed to support laboratory partners who hold such status. The measures described in this DPIA, including encryption at rest and in transit, access control, audit logging, incident response procedures, and sub-processor management, are aligned with the risk management obligations set out in NIS2 Article 21.

6. Conclusion

This DPIA concludes that the Helena platform can process genetic data with an acceptable level of residual risk, provided all identified measures are maintained and regularly reviewed. The key factors supporting this conclusion are: data is received only in pseudonymized form; all processing occurs within the EU on dedicated infrastructure; encryption is applied both in transit and at rest; AI models operate exclusively on local infrastructure with no external data transmission; the platform functions as decision support requiring mandatory human review; and data retention is time-limited with automatic deletion.

This DPIA will be reviewed annually or when significant changes are made to the processing activities, infrastructure, or regulatory landscape.

For questions regarding this assessment, contact us at privacy@helena.bio.

Version History

v1.1 - March 2026: clarified dual legal basis (Processor and Controller roles); corrected encryption specification (AES-128 Fernet at application level, full-disk encryption at infrastructure level); added Risk 7 covering AI processing and local-only inference; added Vercel as sub-processor with scope clarification; added Section 5 on NIS2 regulatory compliance; updated DPO reference to designated data protection contact; general editorial improvements.

v1.0 - February 2026: initial publication.