SAFEWORDS

Ethical, Privacy-Preserving and Trustworthy Language Technologies within HumanAIze

Background and Rationale

The rapid evolution of Large Language Models (LLMs) has transformed natural language processing and generative AI applications. However, their development and deployment raise critical challenges related to data protection, bias amplification, explainability, and regulatory compliance. The European Union has positioned itself at the forefront of trustworthy AI through the GDPR and the AI Act, establishing a robust regulatory framework that demands accountability, transparency, and protection of fundamental rights.

SAFEWORDS is a strategic component of the HumanAIze project, a coordinated initiative aimed at building next-generation human-centred, multilingual and trustworthy LLMs for Europe. SAFEWORDS provides the ethical, legal, and data governance infrastructure that ensures full alignment between technological innovation and European values.

Objectives

  • Develop a comprehensive data governance framework for multilingual and multimodal AI training.
  • Design and validate privacy-preserving anonymisation pipelines compliant with GDPR and AI Act requirements.
  • Establish methodologies for bias detection and mitigation across datasets and models.
  • Ensure the integration of legal, ethical, and sustainability principles throughout the AI lifecycle.
  • Strengthen transparency, explainability, and accountability in LLM development.

Expected Impact

SAFEWORDS strengthens Europe’s capacity to develop trustworthy, human-centred and regulation-compliant AI systems, contributing to:
  • Implementation of the AI Act and GDPR in advanced AI development
  • Reduction of legal and ethical risks in generative AI
  • Increased public trust in AI technologies
  • Sustainable and energy-efficient model development
  • Strategic European autonomy in language technologies

Approach

SAFEWORDS operates across the HumanAIze architecture:

  • In WP2, it codifies ethical, legal, and sustainability guidelines governing AI development.

  • In WP3, it leads privacy-preserving dataset curation, anonymisation, and fairness evaluation.

  • In collaboration with WP4 and WP5, it ensures that base models and aligned models comply with governance constraints and European regulatory standards.

  • In WP6, it contributes to evaluation and real-world validation, particularly in administrative-legal and biomedical domains.

The project integrates advanced anonymisation techniques, PII detection systems, bias auditing tools, and benchmark datasets with human annotations to evaluate factuality, safety, and non-discrimination.


Innovation

SAFEWORDS moves beyond compliance-by-design by operationalising regulation into technical protocols, measurable indicators, and enforceable governance mechanisms. It combines:

  • Legal and AI expertise

  • Privacy-preserving technologies

  • Fairness auditing methodologies

  • Sustainability-aware training strategies

This interdisciplinary integration ensures scalable and responsible AI infrastructures for multilingual European contexts.
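To make the "measurable indicators" concrete, one standard fairness metric used in bias audits is the demographic parity gap: the largest difference in positive-outcome rate between any two demographic groups. This is only one of many possible indicators, and the thresholds applied to it are policy choices; a sketch:

```python
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-outcome rate between any two groups.

    `predictions` are binary model outcomes (0/1); `groups` are the
    corresponding group labels. A gap near 0 indicates similar treatment
    on this metric; acceptable thresholds are a governance decision.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

# Group "a" receives positive outcomes at 2/3, group "b" at 1/3:
gap = demographic_parity_gap([1, 0, 1, 1, 0, 0],
                             ["a", "a", "a", "b", "b", "b"])
print(round(gap, 3))  # Prints: 0.333
```

Turning such metrics into enforceable governance mechanisms then means fixing, in advance, which metrics are computed, over which group definitions, and what gap triggers mitigation.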

European Added Value

By embedding governance, privacy and fairness at the core of AI innovation, SAFEWORDS ensures that HumanAIze delivers LLMs that are not only technically advanced, but also aligned with democratic values, fundamental rights, and the European digital strategy.

SAFEWORDS exemplifies Europe’s model of innovation: competitive, ethical, transparent and socially responsible.