Master's qualifying exam of student Marcos Rezende, 28/11/2024, at 14:00.

Title: PRIVACY-PRESERVING CLUSTERED FEDERATED LEARNING WITH HOMOMORPHIC ENCRYPTION RESILIENT TO BYZANTINE ATTACKS
Date: 28/11/2024
Time: 14:00
Link: In person
Abstract:

Medical Named Entity Recognition (MNER) and Medical Image Classification (MIC) are critical tasks for extracting and
organizing medical knowledge from unstructured data, which is essential for health monitoring and clinical decision-making.
Despite advances in Deep Learning (DL), particularly Large Language Models (LLMs), these tasks still face significant
challenges because labeled data is scarce and dispersed across institutions, and is often protected by privacy regulations.
Federated Learning (FL) offers a promising solution by enabling collaborative training of a shared model over decentralized data, but it remains
vulnerable to Byzantine adversaries.

This work introduces FedHE, a secure and privacy-preserving FL protocol designed to withstand Byzantine adversaries
in the context of MNER and MIC tasks. FedHE aims to defend the FL system against two main threats: inference
and poisoning attacks. The protocol uses Homomorphic Encryption (HE) to aggregate client updates securely: participants
encrypt their local updates before sending them to the coordination server, which aggregates the encrypted updates without decrypting
them, keeping the local models confidential from Byzantine attackers. Poisoning attacks can be mitigated through a clustering-based
aggregation algorithm that identifies and isolates malicious clients based on their local updates. The proposed protocol extends the Flower
framework with HE-based aggregation functions and a client-side framework for working with HE, enabling researchers to develop and test secure FL
protocols with minimal effort.
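
The encrypted aggregation step described above can be illustrated with a minimal sketch. The abstract does not name the HE scheme, the key distribution, or the encoding that FedHE uses, so everything below is an assumption made for illustration: a toy Paillier cryptosystem (additively homomorphic) with deliberately small Mersenne primes, a fixed-point scale of 10^6, and the hypothetical helper names `encrypt`, `add_encrypted`, `decrypt`, and `aggregate`. It is not the FedHE implementation and does not use the Flower API.

```python
import math
import random

# Toy Paillier key material (assumption: FedHE's actual scheme is not stated).
# These Mersenne primes are far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                                # private key: LAM^-1 mod N
SCALE = 10**6                                       # fixed-point precision (assumed)


def encrypt(value: float) -> int:
    """Client side: encrypt one fixed-point-encoded weight, generator g = N + 1."""
    m = round(value * SCALE) % N          # negative values wrap around modulo N
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:            # the random mask must be invertible mod N
        r = random.randrange(1, N)
    return (1 + m * N) * pow(r, N, N_SQ) % N_SQ


def add_encrypted(c1: int, c2: int) -> int:
    """Server side: multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N_SQ


def decrypt(cipher: int) -> float:
    """Key holder: recover the plaintext and undo the fixed-point encoding."""
    m = (pow(cipher, LAM, N_SQ) - 1) // N * MU % N
    if m > N // 2:                        # map the upper half back to negatives
        m -= N
    return m / SCALE


def aggregate(encrypted_updates: list) -> list:
    """Sum the clients' encrypted update vectors element-wise without decrypting."""
    total = encrypted_updates[0]
    for update in encrypted_updates[1:]:
        total = [add_encrypted(a, b) for a, b in zip(total, update)]
    return total


# Three hypothetical clients, each holding a three-parameter local update.
client_updates = [[0.5, -1.25, 2.0], [1.5, 0.25, -1.0], [-0.5, 1.0, 0.5]]
encrypted = [[encrypt(w) for w in update] for update in client_updates]
encrypted_sum = aggregate(encrypted)      # the server never sees plaintext weights
average = [decrypt(c) / len(client_updates) for c in encrypted_sum]
print(average)  # [0.5, 0.0, 0.5]
```

In this sketch the private key sits in module scope for brevity; in a real deployment it would have to be kept away from the aggregating server, and a production-grade HE library with properly sized parameters would replace the toy arithmetic.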

Examination committee: Prof. Dr. Rodrigo César Pedrosa - UFOP; Prof. Dr. Pedro Henrique Lopes - UFOP

The preliminary version of this thesis focuses on the MNER task and evaluates the feasibility of HE in the context of FL.
Clustering techniques to mitigate poisoning attacks will be evaluated in future work.
The preliminary results yield several findings on the resource consumption and performance of FedHE.
HE introduces significant memory and bandwidth overheads, especially with larger models, making it impractical for
state-of-the-art models such as $BERT_{blue}$. Compact models such as $BERT_{tiny}$ and $BERT_{mini}$ are
more feasible in terms of resource usage and still achieve competitive performance. Although encryption increases computational
cost, this cost is not a bottleneck in the overall training process. Importantly, HE does not degrade
model performance, and the encryption noise may even improve generalization.

PPGCC
