318 results sorted by ID
Secure Vault scheme in the Cloud Operating Model
Rishiraj Bhattacharyya, Avradip Mandal, Meghna Sengupta
Cryptographic protocols
The rising demand for data privacy in cloud-based environments has led to the development of advanced mechanisms for securely managing sensitive information. A prominent solution in this domain is the "Data Privacy Vault," a concept that is being provided commercially by companies such as Hashicorp, Basis Theory, Skyflow Inc., VGS, Evervault, Protegrity, Anonomatic, and BoxyHQ. However, no existing work has rigorously defined the security notions required for a Data Privacy Vault or proven...
How To Think About End-To-End Encryption and AI: Training, Processing, Disclosure, and Consent
Mallory Knodel, Andrés Fábrega, Daniella Ferrari, Jacob Leiken, Betty Li Hou, Derek Yen, Sam de Alfaro, Kyunghyun Cho, Sunoo Park
Applications
End-to-end encryption (E2EE) has become the gold standard for securing communications, bringing strong confidentiality and privacy guarantees to billions of users worldwide. However, the current push towards widespread integration of artificial intelligence (AI) models, including in E2EE systems, raises some serious security concerns.
This work performs a critical examination of the (in)compatibility of AI models and E2EE applications. We explore this on two fronts: (1) the integration of...
ClusterGuard: Secure Clustered Aggregation for Federated Learning with Robustness
Yulin Zhao, Zhiguo Wan, Zhangshuang Guan
Applications
Federated Learning (FL) enables collaborative model training while preserving data privacy by avoiding the sharing of raw data. However, in large-scale FL systems, efficient secure aggregation and dropout handling remain critical challenges. Existing state-of-the-art methods, such as those proposed by Liu et al. (UAI'22) and Li et al. (ASIACRYPT'23), suffer from prohibitive communication overhead, implementation complexity, and vulnerability to poisoning attacks. Alternative approaches that...
Orion's Ascent: Accelerating Hash-Based Zero Knowledge Proof on Hardware Platforms
Florian Hirner, Florian Krieger, Constantin Piber, Sujoy Sinha Roy
Implementation
Zero-knowledge proofs (ZKPs) are cryptographic protocols that enable one party to prove the validity of a statement without revealing the underlying data. Such proofs have applications in privacy-preserving technologies and verifiable computations. However, slow proof generation poses a significant challenge in the wide-scale adoption of ZKP. Orion is a recent ZKP scheme with linear prover time. It leverages coding theory, expander graphs, and Merkle hash trees to improve computational...
Private Neural Network Training with Packed Secret Sharing
Hengcheng Zhou
Applications
We present a novel approach for training neural networks that leverages packed Shamir secret sharing scheme. For specific training protocols based on Shamir scheme, we demonstrate how to realize the conversion between packed sharing and Shamir sharing without additional communication overhead. We begin by introducing a method to locally convert between Shamir sharings with secrets stored at different slots. Building upon this conversion, we achieve free conversion from packed sharing to...
Quantum One-Time Protection of any Randomized Algorithm
Sam Gunn, Ramis Movassagh
Foundations
The meteoric rise in power and popularity of machine learning models dependent on valuable training data has reignited a basic tension between the power of running a program locally and the risk of exposing details of that program to the user. At the same time, fundamental properties of quantum states offer new solutions to data and program security that can require strikingly few quantum resources to exploit, and offer advantages outside of mere computational run time. In this work, we...
FLock: Robust and Privacy-Preserving Federated Learning based on Practical Blockchain State Channels
Ruonan Chen, Ye Dong, Yizhong Liu, Tingyu Fan, Dawei Li, Zhenyu Guan, Jianwei Liu, Jianying Zhou
Applications
\textit{Federated Learning} (FL) is a distributed machine learning paradigm that allows multiple clients to train models collaboratively without sharing local data. Numerous works have explored security and privacy protection in FL, as well as its integration with blockchain technology. However, existing FL works still face critical issues. \romannumeral1) It is difficult to achieving \textit{poisoning robustness} and \textit{data privacy} while ensuring high \textit{model accuracy}....
DMM: Distributed Matrix Mechanism for Differentially-Private Federated Learning using Packed Secret Sharing
Alexander Bienstock, Ujjwal Kumar, Antigoni Polychroniadou
Applications
Federated Learning (FL) has gained lots of traction recently, both in industry and academia. In FL, a machine learning model is trained using data from various end-users arranged in committees across several rounds. Since such data can often be sensitive, a primary challenge in FL is providing privacy while still retaining utility of the model. Differential Privacy (DP) has become the main measure of privacy in the FL setting. DP comes in two flavors: central and local. In the former, a...
Sparrow: Space-Efficient zkSNARK for Data-Parallel Circuits and Applications to Zero-Knowledge Decision Trees
Christodoulos Pappas, Dimitrios Papadopoulos
Cryptographic protocols
Space-efficient SNARKs aim to reduce the prover's space overhead which is one the main obstacles for deploying SNARKs in practice, as it can be prohibitively large (e.g., orders of magnitude larger than natively performing the computation). In this work, we propose Sparrow, a novel space-efficient zero-knowledge SNARK for data-parallel arithmetic circuits with two attractive features: (i) it is the first space-efficient scheme where, for a given field, the prover overhead increases with a...
Polynomial Time Cryptanalytic Extraction of Deep Neural Networks in the Hard-Label Setting
Nicholas Carlini, Jorge Chávez-Saab, Anna Hambitzer, Francisco Rodríguez-Henríquez, Adi Shamir
Attacks and cryptanalysis
Deep neural networks (DNNs) are valuable assets, yet their public accessibility raises security concerns about parameter extraction by malicious actors. Recent work by Carlini et al. (Crypto’20) and Canales- Martínez et al. (Eurocrypt’24) has drawn parallels between this issue and block cipher key extraction via chosen plaintext attacks. Leveraging differential cryptanalysis, they demonstrated that all the weights and biases of black-box ReLU-based DNNs could be inferred using a polynomial...
Can KANs Do It? Toward Interpretable Deep Learning-based Side-channel Analysis
Kota Yoshida, Sengim Karayalcin, Stjepan Picek
Attacks and cryptanalysis
Recently, deep learning-based side-channel analysis (DLSCA) has emerged as a serious threat against cryptographic implementations. These methods can efficiently break implementations protected with various countermeasures while needing limited manual intervention. To effectively protect implementation, it is therefore crucial to be able to interpret \textbf{how} these models are defeating countermeasures. Several works have attempted to gain a better understanding of the mechanics of these...
Communication Efficient Secure and Private Multi-Party Deep Learning
Sankha Das, Sayak Ray Chowdhury, Nishanth Chandran, Divya Gupta, Satya Lokam, Rahul Sharma
Applications
Distributed training that enables multiple parties to jointly train
a model on their respective datasets is a promising approach to
address the challenges of large volumes of diverse data for training
modern machine learning models. However, this approach immedi-
ately raises security and privacy concerns; both about each party
wishing to protect its data from other parties during training and
preventing leakage of private information from the model after
training through various...
HierNet: A Hierarchical Deep Learning Model for SCA on Long Traces
Suvadeep Hajra, Debdeep Mukhopadhyay
Attacks and cryptanalysis
In Side-Channel Analysis (SCA), statistical or machine learning methods are employed to extract secret information from power or electromagnetic (EM) traces. In many practical scenarios, raw power/EM traces can span hundreds of thousands of features, with relevant leakages occurring over only a few small segments. Consequently, existing SCAs often select a small number of features before launching the attack, making their success highly dependent on the feasibility of feature selection....
Mario: Multi-round Multiple-Aggregator Secure Aggregation with Robustness against Malicious Actors
Truong Son Nguyen, Tancrède Lepoint, Ni Trieu
Cryptographic protocols
Federated Learning (FL) enables multiple clients to collaboratively train a machine learning model while keeping their data private, eliminating the need for data sharing. Two common approaches to secure aggregation (SA) in FL are the single-aggregator and multiple-aggregator models. This work focuses on improving the multiple-aggregator model.
Existing multiple-aggregator protocols such as Prio (NSDI 2017), Prio+ (SCN 2022), Elsa (S&P 2023) either offer robustness only in the...
Hard-Label Cryptanalytic Extraction of Neural Network Models
Yi Chen, Xiaoyang Dong, Jian Guo, Yantian Shen, Anyu Wang, Xiaoyun Wang
Attacks and cryptanalysis
The machine learning problem of extracting neural network parameters has been proposed for nearly three decades. Functionally equivalent extraction is a crucial goal for research on this problem. When the adversary has access to the raw output of neural networks, various attacks, including those presented at CRYPTO 2020 and EUROCRYPT 2024, have successfully achieved this goal. However, this goal is not achieved when neural networks operate under a hard-label setting where the raw output...
PIGEON: A Framework for Private Inference of Neural Networks
Christopher Harth-Kitzerow, Yongqin Wang, Rachit Rajat, Georg Carle, Murali Annavaram
Cryptographic protocols
Privacy-Preserving Machine Learning (PPML) is one of the most relevant use cases for Secure Multiparty Computation (MPC). While private training of large neural networks such as VGG-16 or ResNet-50 on state-of-the-art datasets such as ImageNet is still out of reach due to the performance overhead of MPC, GPU-based MPC frameworks are starting to achieve practical runtimes for private inference. However, we show that, in contrast to plaintext machine learning, the usage of GPU acceleration for...
Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models
Elijah Pelofske, Vincent Urias, Lorie M. Liebrock
Applications
Generative Pre-Trained Transformer models have been shown to be surprisingly effective at a variety of natural language processing tasks -- including generating computer code. However, in general GPT models have been shown to not be incredibly effective at handling specific computational tasks (such as evaluating mathematical functions).
In this study, we evaluate the effectiveness of open source GPT models, with no fine-tuning, and with context introduced by the langchain and localGPT...
Client-Aided Privacy-Preserving Machine Learning
Peihan Miao, Xinyi Shi, Chao Wu, Ruofan Xu
Cryptographic protocols
Privacy-preserving machine learning (PPML) enables multiple distrusting parties to jointly train ML models on their private data without revealing any information beyond the final trained models. In this work, we study the client-aided two-server setting where two non-colluding servers jointly train an ML model on the data held by a large number of clients. By involving the clients in the training process, we develop efficient protocols for training algorithms including linear regression,...
Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models
Aydin Abadi, Vishnu Asutosh Dasu, Sumanta Sarkar
Applications
Deduplication is a vital preprocessing step that enhances machine learning model performance and saves training time and energy. However, enhancing federated learning through deduplication poses challenges, especially regarding scalability and potential privacy violations if deduplication involves sharing all clients’ data. In this paper, we address the problem of deduplication in a federated setup by introducing a pioneering protocol, Efficient Privacy-Preserving Multi-Party Deduplication...
Securely Training Decision Trees Efficiently
Divyanshu Bhardwaj, Sandhya Saravanan, Nishanth Chandran, Divya Gupta
Cryptographic protocols
Decision trees are an important class of supervised learning algorithms. When multiple entities contribute data to train a decision tree (e.g. for fraud detection in the financial sector), data privacy concerns necessitate the use of a privacy-enhancing technology such as secure multi-party computation (MPC) in order to secure the underlying training data. Prior state-of-the-art (Hamada et al.) construct an MPC protocol for decision tree training with a communication of $\mathcal{O}(hmN\log...
AITIA: Efficient Secure Computation of Bivariate Causal Discovery
Truong Son Nguyen, Lun Wang, Evgenios M. Kornaropoulos, Ni Trieu
Cryptographic protocols
Researchers across various fields seek to understand causal relationships but often find controlled experiments impractical. To address this, statistical tools for causal discovery from naturally observed data have become crucial. Non-linear regression models, such as Gaussian process regression, are commonly used in causal inference but have limitations due to high costs when adapted for secure computation. Support vector regression (SVR) offers an alternative but remains costly in an...
Improved Multi-Party Fixed-Point Multiplication
Saikrishna Badrinarayanan, Eysa Lee, Peihan Miao, Peter Rindal
Cryptographic protocols
Machine learning is widely used for a range of applications and is increasingly offered as a service by major technology companies. However, the required massive data collection raises privacy concerns during both training and inference. Privacy-preserving machine learning aims to solve this problem. In this setting, a collection of servers secret share their data and use secure multi-party computation to train and evaluate models on the joint data. All prior work focused on the scenario...
FASIL: A challenge-based framework for secure and privacy-preserving federated learning
Ferhat Karakoç, Betül Güvenç Paltun, Leyli Karaçay, Ömer Tuna, Ramin Fuladi, Utku Gülen
Applications
Enhancing privacy in federal learning (FL) without considering robustness can create an open door for attacks such as poisoning attacks on the FL process. Thus, addressing both the privacy and security aspects simultaneously becomes vital. Although, there are a few solutions addressing both privacy and security in the literature in recent years, they have some drawbacks such as requiring two non-colluding servers, heavy cryptographic operations, or peer-to-peer communication topology. In...
MaSTer: Maliciously Secure Truncation for Replicated Secret Sharing without Pre-Processing
Martin Zbudila, Erik Pohle, Aysajan Abidin, Bart Preneel
Cryptographic protocols
Secure multi-party computation (MPC) in a three-party, honest majority scenario is currently the state-of-the-art for running machine learning algorithms in a privacy-preserving manner. For efficiency reasons, fixed-point arithmetic is widely used to approximate computation over decimal numbers. After multiplication in fixed-point arithmetic, truncation is required to keep the result's precision. In this paper, we present an efficient three-party truncation protocol secure in the presence of...
CoGNN: Towards Secure and Efficient Collaborative Graph Learning
Zhenhua Zou, Zhuotao Liu, Jinyong Shan, Qi Li, Ke Xu, Mingwei Xu
Applications
Collaborative graph learning represents a learning paradigm where multiple parties jointly train a graph neural network (GNN) using their own proprietary graph data. To honor the data privacy of all parties, existing solutions for collaborative graph learning are either based on federated learning (FL) or secure machine learning (SML). Although promising in terms of efficiency and scalability due to their distributed training scheme, FL-based approaches fall short in providing provable...
Let Them Drop: Scalable and Efficient Federated Learning Solutions Agnostic to Client Stragglers
Riccardo Taiello, Melek Önen, Clémentine Gritti, Marco Lorenzi
Applications
Secure Aggregation (SA) stands as a crucial component in modern Federated Learning (FL) systems, facilitating collaborative training of a global machine learning model while protecting the privacy of individual clients' local datasets. Many existing SA protocols described in the FL literature operate synchronously, leading to notable runtime slowdowns due to the presence of stragglers (i.e. late-arriving clients).
To address this challenge, one common approach is to consider stragglers as...
BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning
Songze Li, Yanbo Dai
Applications
In a federated learning (FL) system, decentralized data owners (clients) could upload their locally trained models to a central server, to jointly train a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing misclassification to a target class when encountering attacker-defined triggers. Existing backdoor defenses show inconsistent performance under different system and adversarial settings, especially when the malicious...
Breaking Indistinguishability with Transfer Learning: A First Look at SPECK32/64 Lightweight Block Ciphers
Jimmy Dani, Kalyan Nakka, Nitesh Saxena
Attacks and cryptanalysis
In this research, we introduce MIND-Crypt, a novel attack framework that uses deep learning (DL) and transfer learning (TL) to challenge the indistinguishability of block ciphers, specifically SPECK32/64 encryption algorithm in CBC mode (Cipher Block Chaining) against Known Plaintext Attacks (KPA). Our methodology includes training a DL model with ciphertexts of two messages encrypted using the same key. The selected messages have the same byte-length and differ by only one bit at the binary...
Fast, Large Scale Dimensionality Reduction Schemes Based on CKKS
Haonan Yuan, Wenyuan Wu, Jingwei Chen
Applications
The proliferation of artificial intelligence and big data has resulted in a surge in data demand and increased data dimensionality. This escalation has consequently heightened the costs associated with storage and processing. Concurrently, the confidential nature of data collected by various institutions, which cannot be disclosed due to personal privacy concerns, has exacerbated the challenges associated with data analysis and machine learning model training. Therefore, designing a secure...
$\mathsf{OPA}$: One-shot Private Aggregation with Single Client Interaction and its Applications to Federated Learning
Harish Karthikeyan, Antigoni Polychroniadou
Applications
Our work aims to minimize interaction in secure computation due to the high cost and challenges associated with communication rounds, particularly in scenarios with many clients. In this work, we revisit the problem of secure aggregation in the single-server setting where a single evaluation server can securely aggregate client-held individual inputs. Our key contribution is the introduction of One-shot Private Aggregation ($\mathsf{OPA}$) where clients speak only once (or even choose not to...
An Efficient and Extensible Zero-knowledge Proof Framework for Neural Networks
Tao Lu, Haoyu Wang, Wenjie Qu, Zonghui Wang, Jinye He, Tianyang Tao, Wenzhi Chen, Jiaheng Zhang
Applications
In recent years, cloud vendors have started to supply paid services for data analysis by providing interfaces of their well-trained neural network models. However, customers lack tools to verify whether outcomes supplied by cloud vendors are correct inferences from particular models, in the face of lazy or malicious vendors. The cryptographic primitive called zero-knowledge proof (ZKP) addresses this problem. It enables the outcomes to be verifiable without leaking information about the...
Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models
Elijah Pelofske, Vincent Urias, Lorie M. Liebrock
Implementation
Generative pre-trained transformers (GPT's) are a type of large language machine learning model that are unusually adept at producing novel, and coherent, natural language. Notably, these technologies have also been extended to computer programming languages with great success. However, GPT model outputs in general are stochastic and not always correct. For programming languages, the exact specification of the computer code, syntactically and algorithmically, is strictly required in order to...
Secure Latent Dirichlet Allocation
Thijs Veugen, Vincent Dunning, Michiel Marcus, Bart Kamphorst
Applications
Topic modelling refers to a popular set of techniques used to discover hidden topics that occur in a collection of documents. These topics can, for example, be used to categorize documents or label text for further processing. One popular topic modelling technique is Latent Dirichlet Allocation (LDA). In topic modelling scenarios, the documents are often assumed to be in one, centralized dataset. However, sometimes documents are held by different parties, and contain privacy- or...
Computational Attestations of Polynomial Integrity Towards Verifiable Machine Learning
Dustin Ray, Caroline El Jazmi
Applications
Machine-learning systems continue to advance at a rapid pace, demonstrating remarkable utility in various fields and disciplines. As these systems continue to grow in size and complexity, a nascent industry is emerging which aims to bring machine-learning-as-a-service (MLaaS) to market. Outsourcing the operation and training of these systems to powerful hardware carries numerous advantages, but challenges arise when privacy and the correctness of work carried out must be ensured. Recent...
Blind-Folded: Simple Power Analysis Attacks using Data with a Single Trace and no Training
Xunyue Hu, Quentin L. Meunier, Emmanuelle Encrenaz
Attacks and cryptanalysis
Side-Channel Attacks target the recovery of key material in cryptographic implementations by measuring physical quantities such as power consumption during the execution of a program. Simple Power Attacks consist in deducing secret information from a trace using a single or a few samples, as opposed to differential attacks which require many traces. Software cryptographic implementations now all contain a data-independent execution path, but often do not consider variations in power...
Two-Party Decision Tree Training from Updatable Order-Revealing Encryption
Robin Berger, Felix Dörre, Alexander Koch
Cryptographic protocols
Running machine learning algorithms on encrypted data is a way forward to marry functionality needs common in industry with the important concerns for privacy when working with potentially sensitive data. While there is already a growing field on this topic and a variety of protocols, mostly employing fully homomorphic encryption or performing secure multiparty computation (MPC), we are the first to propose a protocol that makes use of a specialized encryption scheme that allows to do secure...
Scoring the predictions: a way to improve profiling side-channel attacks
Damien Robissout, Lilian Bossuet, Amaury Habrard
Attacks and cryptanalysis
Side-channel analysis is an important part of the security evaluations of hardware components and more specifically of those that include cryptographic algorithms. Profiling attacks are among the most powerful attacks as they assume the attacker has access to a clone device of the one under attack. Using the clone device allows the attacker to make a profile of physical leakages linked to the execution of algorithms. This work focuses on the characteristics of this profile and the...
NodeGuard: A Highly Efficient Two-Party Computation Framework for Training Large-Scale Gradient Boosting Decision Tree
Tianxiang Dai, Yufan Jiang, Yong Li, Fei Mei
Cryptographic protocols
The Gradient Boosting Decision Tree (GBDT) is a well-known machine learning algorithm, which achieves high performance and outstanding interpretability in real-world scenes such as fraud detection, online marketing and risk management. Meanwhile, two data owners can jointly train a GBDT model without disclosing their private dataset by executing secure Multi-Party Computation (MPC) protocols. In this work, we propose NodeGuard, a highly efficient two party computation (2PC) framework for...
Fully Homomorphic Training and Inference on Binary Decision Tree and Random Forest
Hojune Shin, Jina Choi, Dain Lee, Kyoungok Kim, Younho Lee
This paper introduces a new method for training decision trees and random forests using CKKS homomorphic encryption (HE) in cloud environments, enhancing data privacy from multiple sources. The innovative Homomorphic Binary Decision Tree (HBDT) method utilizes a modified Gini Impurity index (MGI) for node splitting in encrypted data scenarios. Notably, the proposed training approach operates in a single cloud security domain without the need for decryption, addressing key challenges in...
Single Trace is All It Takes: Efficient Side-channel Attack on Dilithium
Zehua Qiao, Yuejun Liu, Yongbin Zhou, Yuhan Zhao, Shuyi Chen
Attacks and cryptanalysis
As we enter 2024, the post-quantum cryptographic algorithm Dilithium, which emerged from the National Institute of Standards and Technology post-quantum cryptography competition, has now reached the deployment stage. This paper focuses on the practical security of Dilithium. We performed practical attacks on Dilithium2 on an STM32F4 platform. Our results indicate that an attack can be executed with just two signatures within five minutes, with a single signature offering a 60% probability of...
A Decentralized Federated Learning using Reputation
Olive Chakraborty, Aymen Boudguiga
Applications
Nowadays Federated learning (FL) is established as one of the best techniques for collaborative machine learning. It allows a set of clients to train a common model without disclosing their sensitive and private
dataset to a coordination server. The latter is in charge of the model aggregation. However, FL faces some problems, regarding the security of updates, integrity of computation and the availability of a server.
In this paper, we combine some new ideas like clients’ reputation with...
SNOW-SCA: ML-assisted Side-Channel Attack on SNOW-V
Harshit Saurabh, Anupam Golder, Samarth Shivakumar Titti, Suparna Kundu, Chaoyun Li, Angshuman Karmakar, Debayan Das
Attacks and cryptanalysis
This paper presents SNOW-SCA, the first power side-channel analysis (SCA) attack of a 5G mobile communication security standard candidate, SNOW-V, running on a 32-bit ARM Cortex-M4 microcontroller. First, we perform a generic known-key correlation (KKC) analysis to identify the leakage points. Next, a correlation power analysis (CPA) attack is performed, which reduces the attack complexity to two key guesses for each key byte. The correct secret key is then uniquely identified utilizing...
Theoretical Explanation and Improvement of Deep Learning-aided Cryptanalysis
Weixi Zheng, Liu Zhang, Zilong Wang
Attacks and cryptanalysis
At CRYPTO 2019, Gohr demonstrated that differential-neural distinguishers (DNDs) for Speck32/64 can learn more features than classical cryptanalysis's differential distribution tables (DDT). Furthermore, a non-classical key recovery procedure is devised by combining the Upper Confidence Bound (UCB) strategy and the BayesianKeySearch algorithm. Consequently, the time complexity of 11-round key recovery attacks on Speck32/64 is significantly reduced compared with the state-of-the-art results...
Accelerating Training and Enhancing Security Through Message Size Optimization in Symmetric Cryptography
ABHISAR, Madhav Yadav, Girish Mishra
This research extends Abadi and Andersen's exploration of neural networks using secret keys for information protection in multiagent systems. Focusing on enhancing confidentiality properties, we employ end-to-end adversarial training with neural networks Alice, Bob, and Eve. Unlike prior work limited to 64-bit messages, our study spans message sizes from 4 to 1024 bits, varying batch sizes and training steps. An innovative aspect involves training model Bob to approach a minimal error value...
Helium: Scalable MPC among Lightweight Participants and under Churn
Christian Mouchet, Sylvain Chatel, Apostolos Pyrgelis, Carmela Troncoso
Implementation
We introduce Helium, a novel framework that supports scalable secure multiparty computation (MPC) for lightweight participants and tolerates churn. Helium relies on multiparty homomorphic encryption (MHE) as its core building block. While MHE schemes have been well studied in theory, prior works fall short of addressing critical considerations paramount for adoption such as supporting resource-constrained and unstably connected participants. In this work, we systematize the requirements of...
Train Wisely: Multifidelity Bayesian Optimization Hyperparameter Tuning in Side-Channel Analysis
Trevor Yap Hong Eng, Shivam Bhasin, Léo Weissbart
Implementation
Side-Channel Analysis (SCA) is critical in evaluating the security of cryptographic implementations. The search for hyperparameters poses a significant challenge, especially when resources are limited. In this work, we explore the efficacy of a multifidelity optimization technique known as BOHB in SCA. In addition, we proposed a new objective function called $ge_{+ntge}$, which could be incorporated into any Bayesian Optimization used in SCA. We show the capabilities of both BOHB and...
Creating from Noise: Trace Generations Using Diffusion Model for Side-Channel Attack
Trevor Yap, Dirmanto Jap
Implementation
In side-channel analysis (SCA), the success of an attack is largely dependent on the dataset sizes and the number of instances in each class. The generation of synthetic traces can help to improve attacks like profiling attacks.
However, manually creating synthetic traces from actual traces is arduous. Therefore, automating this process of creating artificial traces is much needed.
Recently, diffusion models have gained much recognition after beating another generative model known as...
Zero-Knowledge Proofs of Training for Deep Neural Networks
Kasra Abbaszadeh, Christodoulos Pappas, Jonathan Katz, Dimitrios Papadopoulos
Cryptographic protocols
A zero-knowledge proof of training (zkPoT) enables a party to prove that they have correctly trained a committed model based on a committed dataset without revealing any additional information about the model or the dataset. An ideal zkPoT should offer provable security and privacy guarantees, succinct proof size and verifier runtime, and practical prover efficiency. In this work, we present \name, a zkPoT targeted for deep neural networks (DNNs) that achieves all these goals at once. Our...
SALSA FRESCA: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors
Samuel Stevens, Emily Wenger, Cathy Yuanchen Li, Niklas Nolte, Eshika Saxena, Francois Charton, Kristin Lauter
Attacks and cryptanalysis
Learning with Errors (LWE) is a hard math problem underlying post-quantum cryptography (PQC) systems for key exchange and digital signatures, recently standardized by NIST. Prior work [Wenger et al., 2022; Li et al., 2023a;b] proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods—better pre-processing, angular embeddings and model...
Perceived Information Revisited II: Information-Theoretical Analysis of Deep-Learning Based Side-Channel Attacks
Akira Ito, Rei Ueno, Naofumi Homma
Attacks and cryptanalysis
Previous studies on deep-learning-based side-channel attacks (DL-SCAs) have shown that traditional performance evaluation metrics commonly used in DL, like accuracy and F1 score, are not effective in evaluating DL-SCA performance. Therefore, some previous studies have proposed new alternative metrics for evaluating the performance of DL-SCAs. Notably, perceived information (PI) and effective perceived information (EPI) are major metrics based on information theory. While it has been...
IrisLock: Iris Biometric Key Derivation with 42 bits of security
Sohaib Ahmad, Sixia Chen, Luke Demarest, Benjamin Fuller, Caleb Manicke, Alexander Russell, Amey Shukla
Applications
Despite decades of effort, a chasm exists between the theory and practice of device-level biometric authentication. Deployed authentication algorithms rely on data that overtly leaks private information about the biometric; thus systems rely on externalized security measures such as trusted execution environments. The authentication algorithms have no cryptographic guarantees.
This is frustrating given the research that has developed theoretical tools, known as fuzzy extractors, that...
Starlit: Privacy-Preserving Federated Learning to Enhance Financial Fraud Detection
Aydin Abadi, Bradley Doyle, Francesco Gini, Kieron Guinamard, Sasi Kumar Murakonda, Jack Liddell, Paul Mellor, Steven J. Murdoch, Mohammad Naseri, Hector Page, George Theodorakopoulos, Suzanne Weller
Applications
Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data, avoiding direct data exchange. However, state-of-the-art FL solutions to identify fraudulent financial transactions exhibit a subset of the following limitations. They (1) lack a formal security definition and proof, (2) assume prior freezing of suspicious customers’ accounts by financial institutions (limiting the solutions’ adoption), (3) scale poorly,...
SuperFL: Privacy-Preserving Federated Learning with Efficiency and Robustness
Yulin Zhao, Hualin Zhou, Zhiguo Wan
Applications
Federated Learning (FL) accomplishes collaborative model training without the need to share local training data. However, existing FL aggregation approaches suffer from inefficiency, privacy vulnerabilities, and neglect of poisoning attacks, severely impacting the overall performance and reliability of model training. In order to address these challenges, we propose SuperFL, an efficient two-server aggregation scheme that is both privacy preserving and secure against poisoning attacks. The...
Too Hot To Be True: Temperature Calibration for Higher Confidence in NN-assisted Side-channel Analysis
Seyedmohammad Nouraniboosjin, Fatemeh Ganji
Attacks and cryptanalysis
The past years have witnessed a considerable increase in research efforts put into neural network-assisted profiled side-channel analysis (SCA). Studies have also identified challenges, e.g., closing the gap between metrics for machine learning (ML) classification and side-channel attack evaluation. In fact, in the context of NN-assisted SCA, the NN’s output distribution forms the basis for successful key recovery. In this respect, related work has covered various aspects of integrating...
CL-SCA: Leveraging Contrastive Learning for Profiled Side-Channel Analysis
Annv Liu, An Wang, Shaofei Sun, Congming Wei, Yaoling Ding, Yongjuan Wang, Liehuang Zhu
Attacks and cryptanalysis
Side-channel analysis based on machine learning, especially neural networks, has gained significant attention in recent years. However, many existing methods still suffer from certain limitations. Despite the inherent capability of neural networks to extract features, there remains a risk of extracting irrelevant information. The heavy reliance on profiled traces makes it challenging to adapt to remote attack scenarios with limited profiled traces. Besides, attack traces also contain...
Single-Trace Side-Channel Attacks on CRYSTALS-Dilithium: Myth or Reality?
Ruize Wang, Kalle Ngo, Joel Gärtner, Elena Dubrova
Attacks and cryptanalysis
We present a side-channel attack on CRYSTALS-Dilithium, a post-quantum secure digital signature scheme, with two variants of post-processing. The side-channel attack exploits information leakage in the secret key unpacking procedure of the signing algorithm to recover the coefficients of the polynomials in the secret key vectors ${\bf s}_1$ and ${\bf s}_2$ by profiled deep learning-assisted power analysis. In the first variant, one half of the coefficients of ${\bf s}_1$ and ${\bf s}_2$ is...
Regularized PolyKervNets: Optimizing Expressiveness and Efficiency for Private Inference in Deep Neural Networks
Toluwani Aremu
Applications
Private computation of nonlinear functions, such as Rectified Linear Units (ReLUs) and max-pooling operations, in deep neural networks (DNNs) poses significant challenges in terms of storage, bandwidth, and time consumption. To address these challenges, there has been a growing interest in utilizing privacy-preserving techniques that leverage polynomial activation functions and kernelized convolutions as alternatives to traditional ReLUs. However, these alternative approaches often suffer...
CompactTag: Minimizing Computation Overheads in Actively-Secure MPC for Deep Neural Networks
Yongqin Wang, Pratik Sarkar, Nishat Koti, Arpita Patra, Murali Annavaram
Cryptographic protocols
Secure Multiparty Computation (MPC) protocols enable secure evaluation of a circuit by several parties, even in the presence of an adversary who maliciously corrupts all but one of the parties. These MPC protocols are constructed using the well-known secret-sharing-based paradigm (SPDZ and SPD$\mathbb{Z}_{2^k}$), where the protocols ensure security against a malicious adversary by computing Message Authentication Code (MAC) tags on the input shares and then evaluating the circuit with these...
The Need for MORE: Unsupervised Side-channel Analysis with Single Network Training and Multi-output Regression
Ioana Savu, Marina Krček, Guilherme Perin, Lichao Wu, Stjepan Picek
Attacks and cryptanalysis
Deep learning-based profiling side-channel analysis has gained widespread adoption in academia and industry due to its ability to uncover secrets protected by countermeasures. However, to exploit this capability, an adversary must have access to a clone of the targeted device to obtain profiling measurements and know secret information to label these measurements. Non-profiling attacks avoid these constraints by not relying on secret information for labeled data. Instead, they attempt all...
Model Stealing Attacks On FHE-based Privacy-Preserving Machine Learning through Adversarial Examples
Bhuvnesh Chaturvedi, Anirban Chakraborty, Ayantika Chatterjee, Debdeep Mukhopadhyay
Attacks and cryptanalysis
Classic MLaaS solutions suffer from privacy-related risks since the user is required to send unencrypted data to the server hosting the MLaaS. To alleviate this problem, a thriving line of research has emerged called Privacy-Preserving Machine Learning (PPML) or secure MLaaS solutions that use cryptographic techniques to preserve the privacy of both the input of the client and the output of the server. However, these implementations do not take into consideration the possibility of...
An End-to-End Framework for Private DGA Detection as a Service
Ricardo Jose Menezes Maia, Dustin Ray, Sikha Pentyala, Rafael Dowsley, Martine De Cock, Anderson C. A. Nascimento, Ricardo Jacobi
Applications
Domain Generation Algorithms (DGAs) are used by malware to generate pseudorandom domain names to establish communication between infected bots and Command and Control servers. While DGAs can be detected by machine learning (ML) models with great accuracy, offering DGA detection as a service raises privacy concerns when requiring network administrators to disclose their DNS traffic to the service provider.
We propose the first end-to-end framework for privacy-preserving classification as a...
Formal Analysis of Non-profiled Deep-learning Based Side-channel Attacks
Akira Ito, Rei Ueno, Rikuma Tanaka, Naofumi Homma
Attacks and cryptanalysis
This paper formally analyzes two major non-profiled deep-learning-based side-channel attacks (DL-SCAs): differential deep-learning analysis (DDLA) by Timon and collision DL-SCA by Staib and Moradi. These DL-SCAs leverage supervised learning in non-profiled scenarios. Although some intuitive descriptions of these DL-SCAs exist, their formal analyses have been rarely conducted yet, which makes it unclear why and when the attacks succeed and how the attack can be improved. In this paper, we...
LLM for SoC Security: A Paradigm Shift
Dipayan Saha, Shams Tarek, Katayoon Yahyaei, Sujan Kumar Saha, Jingbo Zhou, Mark Tehranipoor, Farimah Farahmandi
Applications
As the ubiquity and complexity of system-on-chip (SoC) designs increase across electronic devices, the task of incorporating security into an SoC design flow poses significant challenges. Existing security solutions are inadequate to provide effective verification of modern SoC designs due to their limitations in scalability, comprehensiveness, and adaptability. On the other hand, Large Language Models (LLMs) are celebrated for their remarkable success in natural language understanding,...
Evaluating GPT-4’s Proficiency in Addressing Cryptography Examinations
Vasily Mikhalev, Nils Kopal, Bernhard Esslinger
Applications
In the rapidly advancing domain of artificial intelligence, ChatGPT, powered by the GPT-4 model, has emerged as a state-of-the-art interactive agent, exhibiting substantial capabilities across various domains. This paper aims to assess the efficacy of GPT-4 in addressing and solving problems found within cryptographic examinations. We devised a multi-faceted methodology, presenting the model with a series of cryptographic questions of varying complexities derived from real academic...
PERFORMANCE EVALUATION OF MACHINE LEARNING ALGORITHMS FOR INTRUSION DETECTION SYSTEM
Sudhanshu Sekhar Tripathy, Bichitrananda Behera
Implementation
The escalation of hazards to safety and hijacking of digital networks are among the strongest perilous difficulties that must be addressed in the present day. Numerous safety procedures were set up to track and recognize any illicit activity on the network's infrastructure. IDS are the best way to resist and recognize intrusions on internet connections and digital technologies. To classify network traffic as normal or anomalous, Machine Learning (ML) classifiers are increasingly utilized. An...
Polynomial Time Cryptanalytic Extraction of Neural Network Models
Isaac A. Canales-Martínez, Jorge Chavez-Saab, Anna Hambitzer, Francisco Rodríguez-Henríquez, Nitin Satpute, Adi Shamir
Attacks and cryptanalysis
Billions of dollars and countless GPU hours are currently
spent on training Deep Neural Networks (DNNs) for a variety of tasks.
Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box
implementations. Many versions of this problem have been studied over
the last 30 years, and the best current attack on ReLU-based deep neural
networks was presented at Crypto’20 by Carlini, Jagielski, and Mironov.
It...
Too Close for Comfort? Measuring Success of Sampled-Data Leakage Attacks Against Encrypted Search
Dominique Dittert, Thomas Schneider, Amos Treiber
Attacks and cryptanalysis
The well-defined information leakage of Encrypted Search Algorithms (ESAs) is predominantly analyzed by crafting so-called leakage attacks. These attacks utilize adversarially known auxiliary data and the observed leakage to attack an ESA instance built on a user's data. Known-data attacks require the auxiliary data to be a subset of the user's data. In contrast, sampled-data attacks merely rely on auxiliary data that is, in some sense, statistically close to the user's data and hence...
High-precision RNS-CKKS on fixed but smaller word-size architectures: theory and application
Rashmi Agrawal, Jung Ho Ahn, Flavio Bergamaschi, Ro Cammarota, Jung Hee Cheon, Fillipe D. M. de Souza, Huijing Gong, Minsik Kang, Duhyeong Kim, Jongmin Kim, Hubert de Lassus, Jai Hyun Park, Michael Steiner, Wen Wang
Cryptographic protocols
A prevalent issue in the residue number system (RNS) variant of the Cheon-Kim-Kim-Song (CKKS) homomorphic encryption (HE) scheme is the challenge of efficiently achieving high precision on hardware architectures with a fixed, yet smaller, word-size of bit-length $W$, especially when the scaling factor satisfies $\log\Delta > W$.
In this work, we introduce an efficient solution termed composite scaling. In this approach, we group multiple RNS primes as $q_\ell:= \prod_{j=0}^{t-1}...
More Insight on Deep Learning-aided Cryptanalysis
Zhenzhen Bao, Jinyu Lu, Yiran Yao, Liu Zhang
Attacks and cryptanalysis
In CRYPTO 2019, Gohr showed that well-trained neural networks could perform cryptanalytic distinguishing tasks superior to differential distribution table (DDT)-based distinguishers. This suggests that the differential-neural distinguisher (ND) may use additional information besides pure ciphertext differences. However, the explicit knowledge beyond differential distribution is still unclear. In this work, we provide explicit rules that can be used alongside DDTs to enhance the effectiveness...
Experimenting with Zero-Knowledge Proofs of Training
Sanjam Garg, Aarushi Goel, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Guru-Vamsi Policharla, Mingyuan Wang
Cryptographic protocols
How can a model owner prove they trained their model according to the correct specification? More importantly, how can they do so while preserving the privacy of the underlying dataset and the final model? We study this problem and formulate the notion of zero-knowledge proof of training (zkPoT), which formalizes rigorous security guarantees that should be achieved by a privacy-preserving proof of training.
While it is theoretically possible to design zkPoT for any model using generic...
Practical Privacy-Preserving Machine Learning using Fully Homomorphic Encryption
Michael Brand, Gaëtan Pradel
Cryptographic protocols
Machine learning is a widely-used tool for analysing large datasets, but increasing public demand for privacy preservation and the corresponding introduction of privacy regulations have severely limited what data can be analysed, even when this analysis is for societal benefit.
Homomorphic encryption, which allows computation on encrypted data, is a natural solution to this dilemma, allowing data to be analysed without sacrificing privacy.
Because homomorphic encryption is computationally...
Constant-Round Private Decision Tree Evaluation for Secret Shared Data
Nan Cheng, Naman Gupta, Aikaterini Mitrokotsa, Hiraku Morita, Kazunari Tozawa
Cryptographic protocols
Decision tree evaluation is extensively used in machine learning to construct accurate classification models. Often in the cloud-assisted communication paradigm cloud servers execute remote evaluations of classification models using clients’ data. In this setting, the need for private decision tree evaluation (PDTE) has emerged to guarantee no leakage of information for the client’s input nor the service provider’s trained model i.e., decision tree. In this paper, we propose a private...
SIGMA: Secure GPT Inference with Function Secret Sharing
Kanav Gupta, Neha Jawalkar, Ananta Mukherjee, Nishanth Chandran, Divya Gupta, Ashish Panwar, Rahul Sharma
Cryptographic protocols
Secure 2-party computation (2PC) enables secure inference that offers protection for both proprietary machine learning (ML) models and sensitive inputs to them. However, the existing secure inference solutions suffer from high latency and communication overheads,
particularly for transformers. Function secret sharing (FSS) is a recent paradigm for
obtaining efficient 2PC protocols with a preprocessing phase.
We provide SIGMA, the first end-to-end system for secure transformer inference...
A Note on “Secure Quantized Training for Deep Learning”
Marcel Keller, Ke Sun
Implementation
Keller and Sun (ICML'22) have found a gap in the accuracy between floating-point deep learning in cleartext and secure quantized deep learning using multi-party computation. We have discovered that this gap is caused by a bug in the implementation of max-pooling. In this note, we present updated figures to support this conclusion. We also add figures for another network on CIFAR-10.
Collaborative Privacy-Preserving Analysis of Oncological Data using Multiparty Homomorphic Encryption
Ravit Geva, Alexander Gusev, Yuriy Polyakov, Lior Liram, Oded Rosolio, Andreea Alexandru, Nicholas Genise, Marcelo Blatt, Zohar Duchin, Barliz Waissengrin, Dan Mirelman, Felix Bukstein, Deborah T. Blumenthal, Ido Wolf, Sharon Pelles-Avraham, Tali Schaffer, Lee A. Lavi, Daniele Micciancio, Vinod Vaikuntanathan, Ahmad Al Badawi, Shafi Goldwasser
Applications
Real-world healthcare data sharing is instrumental in constructing broader-based and larger clinical data sets that may improve clinical decision-making research and outcomes. Stakeholders are frequently reluctant to share their data without guaranteed patient privacy, proper protection of their data sets, and control over the usage of their data. Fully homomorphic encryption (FHE) is a cryptographic capability that can address these issues by enabling computation on encrypted data without...
REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption
Aikata Aikata, Ahmet Can Mert, Sunmin Kwon, Maxim Deryabin, Sujoy Sinha Roy
Implementation
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they suffer from practical problems associated with...
zkDL: Efficient Zero-Knowledge Proofs of Deep Learning Training
Haochen Sun, Tonghe Bai, Jason Li, Hongyang Zhang
Applications
The recent advancements in deep learning have brought about significant changes in various aspects of people's lives. Meanwhile, these rapid developments have raised concerns about the legitimacy of the training process of deep neural networks. To protect the intellectual properties of AI developers, directly examining the training process by accessing the model parameters and training data is often prohibited for verifiers.
In response to this challenge, we present zero-knowledge deep...
An End-to-end Plaintext-based Side-channel Collision Attack without Trace Segmentation
Lichao Wu, Sébastien Tiran, Guilherme Perin, Stjepan Picek
Attacks and cryptanalysis
Side-channel Collision Attacks (SCCA) constitute a subset of non-profiling attacks that exploit information dependency leaked during cryptographic operations. Unlike traditional collision attacks, which seek instances where two different inputs to a cryptographic algorithm yield identical outputs, SCCAs specifically target the internal state, where identical outputs are more likely. In CHES 2023, Staib et al. presented a Deep Learning-based SCCA (DL-SCCA), which enhanced the attack...
Provably Secure Blockchain Protocols from Distributed Proof-of-Deep-Learning
Xiangyu Su, Mario Larangeira, Keisuke Tanaka
Cryptographic protocols
Proof-of-useful-work (PoUW), an alternative to the widely used proof-of-work (PoW), aims to re-purpose the network's computing power. Namely, users evaluate meaningful computational problems, e.g., solving optimization problems, instead of computing numerous hash function values as in PoW. A recent approach utilizes the training process of deep learning as ``useful work''. However, these works lack security analysis when deploying them with blockchain-based protocols, let alone the informal...
End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets
Gauri Gupta, Krithika Ramesh, Anwesh Bhattacharya, Divya Gupta, Rahul Sharma, Nishanth Chandran, Rijurekha Sen
Applications
Privacy-preserving machine learning (PPML) promises to train
machine learning (ML) models by combining data spread across
multiple data silos. Theoretically, secure multiparty computation
(MPC) allows multiple data owners to train models on their joint
data without revealing the data to each other. However, the prior
implementations of this secure training using MPC have three limitations: they have only been evaluated on CNNs, and LSTMs have
been ignored; fixed point approximations...
Falkor: Federated Learning Secure Aggregation Powered by AES-CTR GPU Implementation
Mariya Georgieva Belorgey, Sofia Dandjee, Nicolas Gama, Dimitar Jetchev, Dmitry Mikushin
Cryptographic protocols
We propose a novel protocol, Falkor, for secure aggregation for Federated Learning in the multi-server scenario based on masking of local models via a stream cipher based on AES in counter mode and accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and has reduced clients/servers communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major...
Differential Privacy for Free? Harnessing the Noise in Approximate Homomorphic Encryption
Tabitha Ogilvie
Applications
Homomorphic Encryption (HE) is a type of cryptography that allows computing on encrypted data, enabling computation on sensitive data to be outsourced securely. Many popular HE schemes rely on noise for their security. On the other hand, Differential Privacy seeks to guarantee the privacy of data subjects by obscuring any one individual's contribution to an output. Many mechanisms for achieving Differential Privacy involve adding appropriate noise. In this work, we investigate the extent to...
Efficient FHE-based Privacy-Enhanced Neural Network for AI-as-a-Service
Kwok-Yan Lam, Xianhui Lu, Linru Zhang, Xiangning Wang, Huaxiong Wang, Si Qi Goh
Applications
AI-as-a-Service has emerged as an important trend for supporting
the growth of the digital economy. Digital service providers make
use of their vast amount of user data to train AI models (such as
image recognitions, financial modelling and pandemic modelling
etc.) and offer them as a service on the cloud. While there are convincing advantages for using such third-party models, the fact that
users need to upload their data to the cloud is bound to raise serious
privacy concerns,...
A Comparison of Multi-task learning and Single-task learning Approaches
Thomas Marquet, Elisabeth Oswald
Attacks and cryptanalysis
In this paper, we provide experimental evidence for the benefits of multi-task learning in the context of masked AES implementations (via the ASCADv1-r and ASCADv2 databases). We develop an approach for comparing single-task and multi-task approaches rather than comparing specific resulting models: we do this by training many models with random hyperparameters (instead of comparing a few highly tuned models). We find that multi-task learning has significant practical advantages that make it...
FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models
Songze Li, Duanyi Yao, Jin Liu
Cryptographic protocols
In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients’...
Implementing and Optimizing Matrix Triples with Homomorphic Encryption
Johannes Mono, Tim Güneysu
Implementation
In today’s interconnected world, data has become a valuable asset, leading to a growing interest in protecting it through techniques such as privacy-preserving computation. Two well-known approaches are multi-party computation and homomorphic encryption with use cases such as privacy-preserving machine learning evaluating or training neural networks. For multi-party computation, one of the fundamental arithmetic operations is the secure multiplication in the malicious security model and by...
Blockchain Large Language Models
Yu Gai, Liyi Zhou, Kaihua Qin, Dawn Song, Arthur Gervais
Applications
This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the...
SAFEFL: MPC-friendly Framework for Private and Robust Federated Learning
Till Gehlhar, Felix Marx, Thomas Schneider, Ajith Suresh, Tobias Wehrle, Hossein Yalame
Implementation
Federated learning (FL) has gained widespread popularity in a variety of industries due to its ability to locally train models on devices while preserving privacy. However, FL systems are susceptible to i) privacy inference attacks and ii) poisoning attacks, which can compromise the system by corrupt actors. Despite a significant amount of work being done to tackle these attacks individually, the combination of these two attacks has received limited attention in the research community.
To...
Squirrel: A Scalable Secure Two-Party Computation Framework for Training Gradient Boosting Decision Tree
Wen-jie Lu, Zhicong Huang, Qizhi Zhang, Yuchen Wang, Cheng Hong
Applications
Gradient Boosting Decision Tree (GBDT) and its variants are widely used in industry, due to their strong interpretability. Secure multi-party computation allows multiple data owners to compute a function jointly while keeping their input private. In this work, we present Squirrel, a two-party GBDT training framework on a vertically split dataset, where two data owners each hold different features of the same data samples. Squirrel is private against semi-honest adversaries, and no sensitive...
Neural Network Quantisation for Faster Homomorphic Encryption
Wouter Legiest, Furkan Turan, Michiel Van Beirendonck, Jan-Pieter D'Anvers, Ingrid Verbauwhede
Applications
Homomorphic encryption (HE) enables calculating
on encrypted data, which makes it possible to perform privacy-
preserving neural network inference. One disadvantage of this
technique is that it is several orders of magnitudes slower than
calculation on unencrypted data. Neural networks are commonly
trained using floating-point, while most homomorphic encryption
libraries calculate on integers, thus requiring a quantisation of the
neural network. A straightforward approach would be to...
Force: Highly Efficient Four-Party Privacy-Preserving Machine Learning on GPU
Tianxiang Dai, Li Duan, Yufan Jiang, Yong Li, Fei Mei, Yulian Sun
Cryptographic protocols
Tremendous efforts have been made to improve the efficiency of secure Multi-Party Computation (MPC), which allows n ≥ 2 parties to jointly evaluate a target function without leaking their own private inputs. It has been confirmed by previous research that Three-Party Computation (3PC) and outsourcing computations to GPUs can lead to huge performance improvement of MPC in computationally intensive tasks such as Privacy-Preserving Machine Learning (PPML). A natural question to ask is whether...
Flamingo: Multi-Round Single-Server Secure Aggregation with Applications to Private Federated Learning
Yiping Ma, Jess Woods, Sebastian Angel, Antigoni Polychroniadou, Tal Rabin
Applications
This paper introduces Flamingo, a system for secure aggregation of data across a large set of clients. In secure aggregation, a server sums up the private inputs of clients and obtains the result without learning anything about the individual inputs beyond what is implied by the final sum. Flamingo focuses on the multi-round setting found in federated learning in which many consecutive summations (averages) of model weights are performed to derive a good model. Previous protocols, such as...
Secure Floating-Point Training
Deevashwer Rathee, Anwesh Bhattacharya, Divya Gupta, Rahul Sharma, Dawn Song
Cryptographic protocols
Secure 2-party computation (2PC) of floating-point arithmetic is improving in performance and recent work runs deep learning algorithms with it, while being as numerically precise as commonly used machine learning (ML) frameworks like PyTorch. We find that the existing 2PC libraries for floating-point support generic computations and lack specialized support for ML training. Hence, their latency and communication costs for compound operations (e.g., dot products) are high. We provide novel...
AI Attacks AI: Recovering Neural Network architecture from NVDLA using AI-assisted Side Channel Attack
Naina Gupta, Arpan Jati, Anupam Chattopadhyay
Attacks and cryptanalysis
During the last decade, there has been a stunning progress in the domain of AI with adoption in both safety-critical and security-critical applications. A key requirement for this is highly trained Machine Learning (ML) models, which are valuable Intellectual Property (IP) of the respective organizations. Naturally, these models have become targets for model recovery attacks through side-channel leakage. However, majority of the attacks reported in literature are either on simple embedded...
SALSA PICANTE: a machine learning attack on LWE with binary secrets
Cathy Li, Jana Sotáková, Emily Wenger, Mohamed Malhou, Evrard Garcelon, Francois Charton, Kristin Lauter
Attacks and cryptanalysis
Learning with Errors (LWE) is a hard math problem underpinning many proposed post-quantum cryptographic (PQC) systems. The only PQC Key Exchange Mechanism (KEM) standardized by NIST is based on module~LWE, and current publicly available PQ Homomorphic Encryption (HE) libraries are based on ring LWE. The security of LWE-based PQ cryptosystems is critical, but certain implementation choices could weaken them. One such choice is sparse binary secrets, desirable for PQ HE schemes for efficiency...
Orca: FSS-based Secure Training and Inference with GPUs
Neha Jawalkar, Kanav Gupta, Arkaprava Basu, Nishanth Chandran, Divya Gupta, Rahul Sharma
Cryptographic protocols
Secure Two-party Computation (2PC) allows two parties to compute any function on their private inputs without revealing their inputs to each other. In the offline/online model for 2PC, correlated randomness that is independent of all inputs to the computation, is generated in a preprocessing (offline) phase and this randomness is then utilized in the online phase once the inputs to the parties become available. Most 2PC works focus on optimizing the online time as this overhead lies on the...
FssNN: Communication-Efficient Secure Neural Network Training via Function Secret Sharing
Peng Yang, Zoe Lin Jiang, Shiqi Gao, Hongxiao Wang, Jun Zhou, Yangyiye Jin, Siu-Ming Yiu, Junbin Fang
Cryptographic protocols
Privacy-preserving neural network based on secure multi-party computation (MPC) enables multiple parties to jointly train neural network models without revealing sensitive data. In privacy-preserving neural network, the high communication costs of securely computing non-linear functions is the primary performance bottleneck. For commonly used non-linear functions, such as ReLU, existing work adopts an offline-online computation paradigm and utilizes distributed comparison function (DCF) to...
Cognitive Cryptography using behavioral features from linguistic-biometric data
Jose Contreras
Implementation
This study presents a proof-of-concept for a cognitive-based authentication system that uses an individual's writing style as a unique identifier to grant access to a system. A machine learning SVM model was trained on these features to distinguish between texts generated by each user. The stylometric feature vector was then used as an input to a key derivation function to generate a unique key for each user. The experiment results showed that the developed system achieved up to 87.42\%...
Quantum-Safe Protocols and Application in Data Security of Medical Records
Adrian-Daniel Stefan, Ionut-Petrisor Anghel, Emil Simion
Cryptographic protocols
The use of traditional cryptography based on symmetric keys has been replaced with the revolutionary idea discovered by Diffie and Hellman in 1976 that fundamentally changed communication systems by ensuring a secure transmission of information over an insecure channel. Nowadays public key cryptography is frequently used for authentication in e-commerce, digital signatures and encrypted communication. Most of the public key cryptosystems used in practice are based on integer factorization...
Autoencoder-enabled Model Portability for Reducing Hyperparameter Tuning Efforts in Side-channel Analysis
Marina Krček, Guilherme Perin
Attacks and cryptanalysis
Hyperparameter tuning represents one of the main challenges in deep learning-based profiling side-channel analysis. For each different side-channel dataset, the typical procedure to find a profiling model is applying hyperparameter tuning from scratch. The main reason is that side-channel measurements from various targets contain different underlying leakage distributions. Consequently, the same profiling model hyperparameters are usually not equally efficient for other targets. This paper...
The rising demand for data privacy in cloud-based environments has led to the development of advanced mechanisms for securely managing sensitive information. A prominent solution in this domain is the "Data Privacy Vault," a concept that is being provided commercially by companies such as Hashicorp, Basis Theory, Skyflow Inc., VGS, Evervault, Protegrity, Anonomatic, and BoxyHQ. However, no existing work has rigorously defined the security notions required for a Data Privacy Vault or proven...
End-to-end encryption (E2EE) has become the gold standard for securing communications, bringing strong confidentiality and privacy guarantees to billions of users worldwide. However, the current push towards widespread integration of artificial intelligence (AI) models, including in E2EE systems, raises some serious security concerns. This work performs a critical examination of the (in)compatibility of AI models and E2EE applications. We explore this on two fronts: (1) the integration of...
Federated Learning (FL) enables collaborative model training while preserving data privacy by avoiding the sharing of raw data. However, in large-scale FL systems, efficient secure aggregation and dropout handling remain critical challenges. Existing state-of-the-art methods, such as those proposed by Liu et al. (UAI'22) and Li et al. (ASIACRYPT'23), suffer from prohibitive communication overhead, implementation complexity, and vulnerability to poisoning attacks. Alternative approaches that...
Zero-knowledge proofs (ZKPs) are cryptographic protocols that enable one party to prove the validity of a statement without revealing the underlying data. Such proofs have applications in privacy-preserving technologies and verifiable computations. However, slow proof generation poses a significant challenge in the wide-scale adoption of ZKP. Orion is a recent ZKP scheme with linear prover time. It leverages coding theory, expander graphs, and Merkle hash trees to improve computational...
We present a novel approach for training neural networks that leverages packed Shamir secret sharing scheme. For specific training protocols based on Shamir scheme, we demonstrate how to realize the conversion between packed sharing and Shamir sharing without additional communication overhead. We begin by introducing a method to locally convert between Shamir sharings with secrets stored at different slots. Building upon this conversion, we achieve free conversion from packed sharing to...
The meteoric rise in power and popularity of machine learning models dependent on valuable training data has reignited a basic tension between the power of running a program locally and the risk of exposing details of that program to the user. At the same time, fundamental properties of quantum states offer new solutions to data and program security that can require strikingly few quantum resources to exploit, and offer advantages outside of mere computational run time. In this work, we...
\textit{Federated Learning} (FL) is a distributed machine learning paradigm that allows multiple clients to train models collaboratively without sharing local data. Numerous works have explored security and privacy protection in FL, as well as its integration with blockchain technology. However, existing FL works still face critical issues. \romannumeral1) It is difficult to achieving \textit{poisoning robustness} and \textit{data privacy} while ensuring high \textit{model accuracy}....
Federated Learning (FL) has gained lots of traction recently, both in industry and academia. In FL, a machine learning model is trained using data from various end-users arranged in committees across several rounds. Since such data can often be sensitive, a primary challenge in FL is providing privacy while still retaining utility of the model. Differential Privacy (DP) has become the main measure of privacy in the FL setting. DP comes in two flavors: central and local. In the former, a...
Space-efficient SNARKs aim to reduce the prover's space overhead which is one the main obstacles for deploying SNARKs in practice, as it can be prohibitively large (e.g., orders of magnitude larger than natively performing the computation). In this work, we propose Sparrow, a novel space-efficient zero-knowledge SNARK for data-parallel arithmetic circuits with two attractive features: (i) it is the first space-efficient scheme where, for a given field, the prover overhead increases with a...
Deep neural networks (DNNs) are valuable assets, yet their public accessibility raises security concerns about parameter extraction by malicious actors. Recent work by Carlini et al. (Crypto’20) and Canales- Martínez et al. (Eurocrypt’24) has drawn parallels between this issue and block cipher key extraction via chosen plaintext attacks. Leveraging differential cryptanalysis, they demonstrated that all the weights and biases of black-box ReLU-based DNNs could be inferred using a polynomial...
Recently, deep learning-based side-channel analysis (DLSCA) has emerged as a serious threat against cryptographic implementations. These methods can efficiently break implementations protected with various countermeasures while needing limited manual intervention. To effectively protect implementation, it is therefore crucial to be able to interpret \textbf{how} these models are defeating countermeasures. Several works have attempted to gain a better understanding of the mechanics of these...
Distributed training that enables multiple parties to jointly train a model on their respective datasets is a promising approach to address the challenges of large volumes of diverse data for training modern machine learning models. However, this approach immedi- ately raises security and privacy concerns; both about each party wishing to protect its data from other parties during training and preventing leakage of private information from the model after training through various...
In Side-Channel Analysis (SCA), statistical or machine learning methods are employed to extract secret information from power or electromagnetic (EM) traces. In many practical scenarios, raw power/EM traces can span hundreds of thousands of features, with relevant leakages occurring over only a few small segments. Consequently, existing SCAs often select a small number of features before launching the attack, making their success highly dependent on the feasibility of feature selection....
Federated Learning (FL) enables multiple clients to collaboratively train a machine learning model while keeping their data private, eliminating the need for data sharing. Two common approaches to secure aggregation (SA) in FL are the single-aggregator and multiple-aggregator models. This work focuses on improving the multiple-aggregator model. Existing multiple-aggregator protocols such as Prio (NSDI 2017), Prio+ (SCN 2022), Elsa (S&P 2023) either offer robustness only in the...
The machine learning problem of extracting neural network parameters has been proposed for nearly three decades. Functionally equivalent extraction is a crucial goal for research on this problem. When the adversary has access to the raw output of neural networks, various attacks, including those presented at CRYPTO 2020 and EUROCRYPT 2024, have successfully achieved this goal. However, this goal is not achieved when neural networks operate under a hard-label setting where the raw output...
Privacy-Preserving Machine Learning (PPML) is one of the most relevant use cases for Secure Multiparty Computation (MPC). While private training of large neural networks such as VGG-16 or ResNet-50 on state-of-the-art datasets such as ImageNet is still out of reach due to the performance overhead of MPC, GPU-based MPC frameworks are starting to achieve practical runtimes for private inference. However, we show that, in contrast to plaintext machine learning, the usage of GPU acceleration for...
Generative Pre-Trained Transformer models have been shown to be surprisingly effective at a variety of natural language processing tasks -- including generating computer code. However, in general GPT models have been shown to not be incredibly effective at handling specific computational tasks (such as evaluating mathematical functions). In this study, we evaluate the effectiveness of open source GPT models, with no fine-tuning, and with context introduced by the langchain and localGPT...
Privacy-preserving machine learning (PPML) enables multiple distrusting parties to jointly train ML models on their private data without revealing any information beyond the final trained models. In this work, we study the client-aided two-server setting where two non-colluding servers jointly train an ML model on the data held by a large number of clients. By involving the clients in the training process, we develop efficient protocols for training algorithms including linear regression,...
Deduplication is a vital preprocessing step that enhances machine learning model performance and saves training time and energy. However, enhancing federated learning through deduplication poses challenges, especially regarding scalability and potential privacy violations if deduplication involves sharing all clients’ data. In this paper, we address the problem of deduplication in a federated setup by introducing a pioneering protocol, Efficient Privacy-Preserving Multi-Party Deduplication...
Decision trees are an important class of supervised learning algorithms. When multiple entities contribute data to train a decision tree (e.g. for fraud detection in the financial sector), data privacy concerns necessitate the use of a privacy-enhancing technology such as secure multi-party computation (MPC) in order to secure the underlying training data. Prior state-of-the-art (Hamada et al.) construct an MPC protocol for decision tree training with a communication of $\mathcal{O}(hmN\log...
Researchers across various fields seek to understand causal relationships but often find controlled experiments impractical. To address this, statistical tools for causal discovery from naturally observed data have become crucial. Non-linear regression models, such as Gaussian process regression, are commonly used in causal inference but have limitations due to high costs when adapted for secure computation. Support vector regression (SVR) offers an alternative but remains costly in an...
Machine learning is widely used for a range of applications and is increasingly offered as a service by major technology companies. However, the required massive data collection raises privacy concerns during both training and inference. Privacy-preserving machine learning aims to solve this problem. In this setting, a collection of servers secret share their data and use secure multi-party computation to train and evaluate models on the joint data. All prior work focused on the scenario...
Enhancing privacy in federal learning (FL) without considering robustness can create an open door for attacks such as poisoning attacks on the FL process. Thus, addressing both the privacy and security aspects simultaneously becomes vital. Although, there are a few solutions addressing both privacy and security in the literature in recent years, they have some drawbacks such as requiring two non-colluding servers, heavy cryptographic operations, or peer-to-peer communication topology. In...
Secure multi-party computation (MPC) in a three-party, honest majority scenario is currently the state-of-the-art for running machine learning algorithms in a privacy-preserving manner. For efficiency reasons, fixed-point arithmetic is widely used to approximate computation over decimal numbers. After multiplication in fixed-point arithmetic, truncation is required to keep the result's precision. In this paper, we present an efficient three-party truncation protocol secure in the presence of...
Collaborative graph learning represents a learning paradigm where multiple parties jointly train a graph neural network (GNN) using their own proprietary graph data. To honor the data privacy of all parties, existing solutions for collaborative graph learning are either based on federated learning (FL) or secure machine learning (SML). Although promising in terms of efficiency and scalability due to their distributed training scheme, FL-based approaches fall short in providing provable...
Secure Aggregation (SA) stands as a crucial component in modern Federated Learning (FL) systems, facilitating collaborative training of a global machine learning model while protecting the privacy of individual clients' local datasets. Many existing SA protocols described in the FL literature operate synchronously, leading to notable runtime slowdowns due to the presence of stragglers (i.e. late-arriving clients). To address this challenge, one common approach is to consider stragglers as...
In a federated learning (FL) system, decentralized data owners (clients) could upload their locally trained models to a central server, to jointly train a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing misclassification to a target class when encountering attacker-defined triggers. Existing backdoor defenses show inconsistent performance under different system and adversarial settings, especially when the malicious...
In this research, we introduce MIND-Crypt, a novel attack framework that uses deep learning (DL) and transfer learning (TL) to challenge the indistinguishability of block ciphers, specifically SPECK32/64 encryption algorithm in CBC mode (Cipher Block Chaining) against Known Plaintext Attacks (KPA). Our methodology includes training a DL model with ciphertexts of two messages encrypted using the same key. The selected messages have the same byte-length and differ by only one bit at the binary...
The proliferation of artificial intelligence and big data has resulted in a surge in data demand and increased data dimensionality. This escalation has consequently heightened the costs associated with storage and processing. Concurrently, the confidential nature of data collected by various institutions, which cannot be disclosed due to personal privacy concerns, has exacerbated the challenges associated with data analysis and machine learning model training. Therefore, designing a secure...
Our work aims to minimize interaction in secure computation due to the high cost and challenges associated with communication rounds, particularly in scenarios with many clients. In this work, we revisit the problem of secure aggregation in the single-server setting where a single evaluation server can securely aggregate client-held individual inputs. Our key contribution is the introduction of One-shot Private Aggregation ($\mathsf{OPA}$) where clients speak only once (or even choose not to...
In recent years, cloud vendors have started to supply paid services for data analysis by providing interfaces of their well-trained neural network models. However, customers lack tools to verify whether outcomes supplied by cloud vendors are correct inferences from particular models, in the face of lazy or malicious vendors. The cryptographic primitive called zero-knowledge proof (ZKP) addresses this problem. It enables the outcomes to be verifiable without leaking information about the...
Generative pre-trained transformers (GPT's) are a type of large language machine learning model that are unusually adept at producing novel, and coherent, natural language. Notably, these technologies have also been extended to computer programming languages with great success. However, GPT model outputs in general are stochastic and not always correct. For programming languages, the exact specification of the computer code, syntactically and algorithmically, is strictly required in order to...
Topic modelling refers to a popular set of techniques used to discover hidden topics that occur in a collection of documents. These topics can, for example, be used to categorize documents or label text for further processing. One popular topic modelling technique is Latent Dirichlet Allocation (LDA). In topic modelling scenarios, the documents are often assumed to be in one, centralized dataset. However, sometimes documents are held by different parties, and contain privacy- or...
Machine-learning systems continue to advance at a rapid pace, demonstrating remarkable utility in various fields and disciplines. As these systems continue to grow in size and complexity, a nascent industry is emerging which aims to bring machine-learning-as-a-service (MLaaS) to market. Outsourcing the operation and training of these systems to powerful hardware carries numerous advantages, but challenges arise when privacy and the correctness of work carried out must be ensured. Recent...
Side-Channel Attacks target the recovery of key material in cryptographic implementations by measuring physical quantities such as power consumption during the execution of a program. Simple Power Attacks consist in deducing secret information from a trace using a single or a few samples, as opposed to differential attacks which require many traces. Software cryptographic implementations now all contain a data-independent execution path, but often do not consider variations in power...
Running machine learning algorithms on encrypted data is a way forward to marry functionality needs common in industry with the important concerns for privacy when working with potentially sensitive data. While there is already a growing field on this topic and a variety of protocols, mostly employing fully homomorphic encryption or performing secure multiparty computation (MPC), we are the first to propose a protocol that makes use of a specialized encryption scheme that allows to do secure...
Side-channel analysis is an important part of the security evaluations of hardware components and more specifically of those that include cryptographic algorithms. Profiling attacks are among the most powerful attacks as they assume the attacker has access to a clone device of the one under attack. Using the clone device allows the attacker to make a profile of physical leakages linked to the execution of algorithms. This work focuses on the characteristics of this profile and the...
The Gradient Boosting Decision Tree (GBDT) is a well-known machine learning algorithm, which achieves high performance and outstanding interpretability in real-world scenes such as fraud detection, online marketing and risk management. Meanwhile, two data owners can jointly train a GBDT model without disclosing their private dataset by executing secure Multi-Party Computation (MPC) protocols. In this work, we propose NodeGuard, a highly efficient two party computation (2PC) framework for...
This paper introduces a new method for training decision trees and random forests using CKKS homomorphic encryption (HE) in cloud environments, enhancing data privacy from multiple sources. The innovative Homomorphic Binary Decision Tree (HBDT) method utilizes a modified Gini Impurity index (MGI) for node splitting in encrypted data scenarios. Notably, the proposed training approach operates in a single cloud security domain without the need for decryption, addressing key challenges in...
As we enter 2024, the post-quantum cryptographic algorithm Dilithium, which emerged from the National Institute of Standards and Technology post-quantum cryptography competition, has now reached the deployment stage. This paper focuses on the practical security of Dilithium. We performed practical attacks on Dilithium2 on an STM32F4 platform. Our results indicate that an attack can be executed with just two signatures within five minutes, with a single signature offering a 60% probability of...
Nowadays Federated learning (FL) is established as one of the best techniques for collaborative machine learning. It allows a set of clients to train a common model without disclosing their sensitive and private dataset to a coordination server. The latter is in charge of the model aggregation. However, FL faces some problems, regarding the security of updates, integrity of computation and the availability of a server. In this paper, we combine some new ideas like clients’ reputation with...
This paper presents SNOW-SCA, the first power side-channel analysis (SCA) attack of a 5G mobile communication security standard candidate, SNOW-V, running on a 32-bit ARM Cortex-M4 microcontroller. First, we perform a generic known-key correlation (KKC) analysis to identify the leakage points. Next, a correlation power analysis (CPA) attack is performed, which reduces the attack complexity to two key guesses for each key byte. The correct secret key is then uniquely identified utilizing...
At CRYPTO 2019, Gohr demonstrated that differential-neural distinguishers (DNDs) for Speck32/64 can learn more features than classical cryptanalysis's differential distribution tables (DDT). Furthermore, a non-classical key recovery procedure is devised by combining the Upper Confidence Bound (UCB) strategy and the BayesianKeySearch algorithm. Consequently, the time complexity of 11-round key recovery attacks on Speck32/64 is significantly reduced compared with the state-of-the-art results...
This research extends Abadi and Andersen's exploration of neural networks using secret keys for information protection in multiagent systems. Focusing on enhancing confidentiality properties, we employ end-to-end adversarial training with neural networks Alice, Bob, and Eve. Unlike prior work limited to 64-bit messages, our study spans message sizes from 4 to 1024 bits, varying batch sizes and training steps. An innovative aspect involves training model Bob to approach a minimal error value...
We introduce Helium, a novel framework that supports scalable secure multiparty computation (MPC) for lightweight participants and tolerates churn. Helium relies on multiparty homomorphic encryption (MHE) as its core building block. While MHE schemes have been well studied in theory, prior works fall short of addressing critical considerations paramount for adoption such as supporting resource-constrained and unstably connected participants. In this work, we systematize the requirements of...
Side-Channel Analysis (SCA) is critical in evaluating the security of cryptographic implementations. The search for hyperparameters poses a significant challenge, especially when resources are limited. In this work, we explore the efficacy of a multifidelity optimization technique known as BOHB in SCA. In addition, we proposed a new objective function called $ge_{+ntge}$, which could be incorporated into any Bayesian Optimization used in SCA. We show the capabilities of both BOHB and...
In side-channel analysis (SCA), the success of an attack is largely dependent on the dataset sizes and the number of instances in each class. The generation of synthetic traces can help to improve attacks like profiling attacks. However, manually creating synthetic traces from actual traces is arduous. Therefore, automating this process of creating artificial traces is much needed. Recently, diffusion models have gained much recognition after beating another generative model known as...
A zero-knowledge proof of training (zkPoT) enables a party to prove that they have correctly trained a committed model based on a committed dataset without revealing any additional information about the model or the dataset. An ideal zkPoT should offer provable security and privacy guarantees, succinct proof size and verifier runtime, and practical prover efficiency. In this work, we present \name, a zkPoT targeted for deep neural networks (DNNs) that achieves all these goals at once. Our...
Learning with Errors (LWE) is a hard math problem underlying post-quantum cryptography (PQC) systems for key exchange and digital signatures, recently standardized by NIST. Prior work [Wenger et al., 2022; Li et al., 2023a;b] proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods—better pre-processing, angular embeddings and model...
Previous studies on deep-learning-based side-channel attacks (DL-SCAs) have shown that traditional performance evaluation metrics commonly used in DL, like accuracy and F1 score, are not effective in evaluating DL-SCA performance. Therefore, some previous studies have proposed new alternative metrics for evaluating the performance of DL-SCAs. Notably, perceived information (PI) and effective perceived information (EPI) are major metrics based on information theory. While it has been...
Despite decades of effort, a chasm exists between the theory and practice of device-level biometric authentication. Deployed authentication algorithms rely on data that overtly leaks private information about the biometric; thus systems rely on externalized security measures such as trusted execution environments. The authentication algorithms have no cryptographic guarantees. This is frustrating given the research that has developed theoretical tools, known as fuzzy extractors, that...
Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data, avoiding direct data exchange. However, state-of-the-art FL solutions to identify fraudulent financial transactions exhibit a subset of the following limitations. They (1) lack a formal security definition and proof, (2) assume prior freezing of suspicious customers’ accounts by financial institutions (limiting the solutions’ adoption), (3) scale poorly,...
Federated Learning (FL) accomplishes collaborative model training without the need to share local training data. However, existing FL aggregation approaches suffer from inefficiency, privacy vulnerabilities, and neglect of poisoning attacks, severely impacting the overall performance and reliability of model training. In order to address these challenges, we propose SuperFL, an efficient two-server aggregation scheme that is both privacy preserving and secure against poisoning attacks. The...
The past years have witnessed a considerable increase in research efforts put into neural network-assisted profiled side-channel analysis (SCA). Studies have also identified challenges, e.g., closing the gap between metrics for machine learning (ML) classification and side-channel attack evaluation. In fact, in the context of NN-assisted SCA, the NN’s output distribution forms the basis for successful key recovery. In this respect, related work has covered various aspects of integrating...
Side-channel analysis based on machine learning, especially neural networks, has gained significant attention in recent years. However, many existing methods still suffer from certain limitations. Despite the inherent capability of neural networks to extract features, there remains a risk of extracting irrelevant information. The heavy reliance on profiled traces makes it challenging to adapt to remote attack scenarios with limited profiled traces. Besides, attack traces also contain...
We present a side-channel attack on CRYSTALS-Dilithium, a post-quantum secure digital signature scheme, with two variants of post-processing. The side-channel attack exploits information leakage in the secret key unpacking procedure of the signing algorithm to recover the coefficients of the polynomials in the secret key vectors ${\bf s}_1$ and ${\bf s}_2$ by profiled deep learning-assisted power analysis. In the first variant, one half of the coefficients of ${\bf s}_1$ and ${\bf s}_2$ is...
Private computation of nonlinear functions, such as Rectified Linear Units (ReLUs) and max-pooling operations, in deep neural networks (DNNs) poses significant challenges in terms of storage, bandwidth, and time consumption. To address these challenges, there has been a growing interest in utilizing privacy-preserving techniques that leverage polynomial activation functions and kernelized convolutions as alternatives to traditional ReLUs. However, these alternative approaches often suffer...
Secure Multiparty Computation (MPC) protocols enable secure evaluation of a circuit by several parties, even in the presence of an adversary who maliciously corrupts all but one of the parties. These MPC protocols are constructed using the well-known secret-sharing-based paradigm (SPDZ and SPD$\mathbb{Z}_{2^k}$), where the protocols ensure security against a malicious adversary by computing Message Authentication Code (MAC) tags on the input shares and then evaluating the circuit with these...
Deep learning-based profiling side-channel analysis has gained widespread adoption in academia and industry due to its ability to uncover secrets protected by countermeasures. However, to exploit this capability, an adversary must have access to a clone of the targeted device to obtain profiling measurements and know secret information to label these measurements. Non-profiling attacks avoid these constraints by not relying on secret information for labeled data. Instead, they attempt all...
Classic MLaaS solutions suffer from privacy-related risks since the user is required to send unencrypted data to the server hosting the MLaaS. To alleviate this problem, a thriving line of research has emerged called Privacy-Preserving Machine Learning (PPML) or secure MLaaS solutions that use cryptographic techniques to preserve the privacy of both the input of the client and the output of the server. However, these implementations do not take into consideration the possibility of...
Domain Generation Algorithms (DGAs) are used by malware to generate pseudorandom domain names to establish communication between infected bots and Command and Control servers. While DGAs can be detected by machine learning (ML) models with great accuracy, offering DGA detection as a service raises privacy concerns when requiring network administrators to disclose their DNS traffic to the service provider. We propose the first end-to-end framework for privacy-preserving classification as a...
This paper formally analyzes two major non-profiled deep-learning-based side-channel attacks (DL-SCAs): differential deep-learning analysis (DDLA) by Timon and collision DL-SCA by Staib and Moradi. These DL-SCAs leverage supervised learning in non-profiled scenarios. Although some intuitive descriptions of these DL-SCAs exist, their formal analyses have been rarely conducted yet, which makes it unclear why and when the attacks succeed and how the attack can be improved. In this paper, we...
As the ubiquity and complexity of system-on-chip (SoC) designs increase across electronic devices, the task of incorporating security into an SoC design flow poses significant challenges. Existing security solutions are inadequate to provide effective verification of modern SoC designs due to their limitations in scalability, comprehensiveness, and adaptability. On the other hand, Large Language Models (LLMs) are celebrated for their remarkable success in natural language understanding,...
In the rapidly advancing domain of artificial intelligence, ChatGPT, powered by the GPT-4 model, has emerged as a state-of-the-art interactive agent, exhibiting substantial capabilities across various domains. This paper aims to assess the efficacy of GPT-4 in addressing and solving problems found within cryptographic examinations. We devised a multi-faceted methodology, presenting the model with a series of cryptographic questions of varying complexities derived from real academic...
The escalation of hazards to safety and hijacking of digital networks are among the strongest perilous difficulties that must be addressed in the present day. Numerous safety procedures were set up to track and recognize any illicit activity on the network's infrastructure. IDS are the best way to resist and recognize intrusions on internet connections and digital technologies. To classify network traffic as normal or anomalous, Machine Learning (ML) classifiers are increasingly utilized. An...
Billions of dollars and countless GPU hours are currently spent on training Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box implementations. Many versions of this problem have been studied over the last 30 years, and the best current attack on ReLU-based deep neural networks was presented at Crypto’20 by Carlini, Jagielski, and Mironov. It...
The well-defined information leakage of Encrypted Search Algorithms (ESAs) is predominantly analyzed by crafting so-called leakage attacks. These attacks utilize adversarially known auxiliary data and the observed leakage to attack an ESA instance built on a user's data. Known-data attacks require the auxiliary data to be a subset of the user's data. In contrast, sampled-data attacks merely rely on auxiliary data that is, in some sense, statistically close to the user's data and hence...
A prevalent issue in the residue number system (RNS) variant of the Cheon-Kim-Kim-Song (CKKS) homomorphic encryption (HE) scheme is the challenge of efficiently achieving high precision on hardware architectures with a fixed, yet smaller, word-size of bit-length $W$, especially when the scaling factor satisfies $\log\Delta > W$. In this work, we introduce an efficient solution termed composite scaling. In this approach, we group multiple RNS primes as $q_\ell:= \prod_{j=0}^{t-1}...
In CRYPTO 2019, Gohr showed that well-trained neural networks could perform cryptanalytic distinguishing tasks superior to differential distribution table (DDT)-based distinguishers. This suggests that the differential-neural distinguisher (ND) may use additional information besides pure ciphertext differences. However, the explicit knowledge beyond differential distribution is still unclear. In this work, we provide explicit rules that can be used alongside DDTs to enhance the effectiveness...
How can a model owner prove they trained their model according to the correct specification? More importantly, how can they do so while preserving the privacy of the underlying dataset and the final model? We study this problem and formulate the notion of zero-knowledge proof of training (zkPoT), which formalizes rigorous security guarantees that should be achieved by a privacy-preserving proof of training. While it is theoretically possible to design zkPoT for any model using generic...
Machine learning is a widely-used tool for analysing large datasets, but increasing public demand for privacy preservation and the corresponding introduction of privacy regulations have severely limited what data can be analysed, even when this analysis is for societal benefit. Homomorphic encryption, which allows computation on encrypted data, is a natural solution to this dilemma, allowing data to be analysed without sacrificing privacy. Because homomorphic encryption is computationally...
Decision tree evaluation is extensively used in machine learning to construct accurate classification models. Often in the cloud-assisted communication paradigm cloud servers execute remote evaluations of classification models using clients’ data. In this setting, the need for private decision tree evaluation (PDTE) has emerged to guarantee no leakage of information for the client’s input nor the service provider’s trained model i.e., decision tree. In this paper, we propose a private...
Secure 2-party computation (2PC) enables secure inference that offers protection for both proprietary machine learning (ML) models and sensitive inputs to them. However, the existing secure inference solutions suffer from high latency and communication overheads, particularly for transformers. Function secret sharing (FSS) is a recent paradigm for obtaining efficient 2PC protocols with a preprocessing phase. We provide SIGMA, the first end-to-end system for secure transformer inference...
Keller and Sun (ICML'22) have found a gap in the accuracy between floating-point deep learning in cleartext and secure quantized deep learning using multi-party computation. We have discovered that this gap is caused by a bug in the implementation of max-pooling. In this note, we present updated figures to support this conclusion. We also add figures for another network on CIFAR-10.
Real-world healthcare data sharing is instrumental in constructing broader-based and larger clinical data sets that may improve clinical decision-making research and outcomes. Stakeholders are frequently reluctant to share their data without guaranteed patient privacy, proper protection of their data sets, and control over the usage of their data. Fully homomorphic encryption (FHE) is a cryptographic capability that can address these issues by enabling computation on encrypted data without...
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they suffer from practical problems associated with...
The recent advancements in deep learning have brought about significant changes in various aspects of people's lives. Meanwhile, these rapid developments have raised concerns about the legitimacy of the training process of deep neural networks. To protect the intellectual properties of AI developers, directly examining the training process by accessing the model parameters and training data is often prohibited for verifiers. In response to this challenge, we present zero-knowledge deep...
Side-channel Collision Attacks (SCCA) constitute a subset of non-profiling attacks that exploit information dependency leaked during cryptographic operations. Unlike traditional collision attacks, which seek instances where two different inputs to a cryptographic algorithm yield identical outputs, SCCAs specifically target the internal state, where identical outputs are more likely. In CHES 2023, Staib et al. presented a Deep Learning-based SCCA (DL-SCCA), which enhanced the attack...
Proof-of-useful-work (PoUW), an alternative to the widely used proof-of-work (PoW), aims to re-purpose the network's computing power. Namely, users evaluate meaningful computational problems, e.g., solving optimization problems, instead of computing numerous hash function values as in PoW. A recent approach utilizes the training process of deep learning as ``useful work''. However, these works lack security analysis when deploying them with blockchain-based protocols, let alone the informal...
Privacy-preserving machine learning (PPML) promises to train machine learning (ML) models by combining data spread across multiple data silos. Theoretically, secure multiparty computation (MPC) allows multiple data owners to train models on their joint data without revealing the data to each other. However, the prior implementations of this secure training using MPC have three limitations: they have only been evaluated on CNNs, and LSTMs have been ignored; fixed point approximations...
We propose a novel protocol, Falkor, for secure aggregation for Federated Learning in the multi-server scenario based on masking of local models via a stream cipher based on AES in counter mode and accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and has reduced clients/servers communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major...
Homomorphic Encryption (HE) is a type of cryptography that allows computing on encrypted data, enabling computation on sensitive data to be outsourced securely. Many popular HE schemes rely on noise for their security. On the other hand, Differential Privacy seeks to guarantee the privacy of data subjects by obscuring any one individual's contribution to an output. Many mechanisms for achieving Differential Privacy involve adding appropriate noise. In this work, we investigate the extent to...
AI-as-a-Service has emerged as an important trend for supporting the growth of the digital economy. Digital service providers make use of their vast amount of user data to train AI models (such as image recognitions, financial modelling and pandemic modelling etc.) and offer them as a service on the cloud. While there are convincing advantages for using such third-party models, the fact that users need to upload their data to the cloud is bound to raise serious privacy concerns,...
In this paper, we provide experimental evidence for the benefits of multi-task learning in the context of masked AES implementations (via the ASCADv1-r and ASCADv2 databases). We develop an approach for comparing single-task and multi-task approaches rather than comparing specific resulting models: we do this by training many models with random hyperparameters (instead of comparing a few highly tuned models). We find that multi-task learning has significant practical advantages that make it...
In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients’...
In today’s interconnected world, data has become a valuable asset, leading to a growing interest in protecting it through techniques such as privacy-preserving computation. Two well-known approaches are multi-party computation and homomorphic encryption with use cases such as privacy-preserving machine learning evaluating or training neural networks. For multi-party computation, one of the fundamental arithmetic operations is the secure multiplication in the malicious security model and by...
This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the...
Federated learning (FL) has gained widespread popularity in a variety of industries due to its ability to locally train models on devices while preserving privacy. However, FL systems are susceptible to i) privacy inference attacks and ii) poisoning attacks, which can compromise the system by corrupt actors. Despite a significant amount of work being done to tackle these attacks individually, the combination of these two attacks has received limited attention in the research community. To...
Gradient Boosting Decision Tree (GBDT) and its variants are widely used in industry, due to their strong interpretability. Secure multi-party computation allows multiple data owners to compute a function jointly while keeping their input private. In this work, we present Squirrel, a two-party GBDT training framework on a vertically split dataset, where two data owners each hold different features of the same data samples. Squirrel is private against semi-honest adversaries, and no sensitive...
Homomorphic encryption (HE) enables calculating on encrypted data, which makes it possible to perform privacy- preserving neural network inference. One disadvantage of this technique is that it is several orders of magnitudes slower than calculation on unencrypted data. Neural networks are commonly trained using floating-point, while most homomorphic encryption libraries calculate on integers, thus requiring a quantisation of the neural network. A straightforward approach would be to...
Tremendous efforts have been made to improve the efficiency of secure Multi-Party Computation (MPC), which allows n ≥ 2 parties to jointly evaluate a target function without leaking their own private inputs. It has been confirmed by previous research that Three-Party Computation (3PC) and outsourcing computations to GPUs can lead to huge performance improvement of MPC in computationally intensive tasks such as Privacy-Preserving Machine Learning (PPML). A natural question to ask is whether...
This paper introduces Flamingo, a system for secure aggregation of data across a large set of clients. In secure aggregation, a server sums up the private inputs of clients and obtains the result without learning anything about the individual inputs beyond what is implied by the final sum. Flamingo focuses on the multi-round setting found in federated learning in which many consecutive summations (averages) of model weights are performed to derive a good model. Previous protocols, such as...
Secure 2-party computation (2PC) of floating-point arithmetic is improving in performance and recent work runs deep learning algorithms with it, while being as numerically precise as commonly used machine learning (ML) frameworks like PyTorch. We find that the existing 2PC libraries for floating-point support generic computations and lack specialized support for ML training. Hence, their latency and communication costs for compound operations (e.g., dot products) are high. We provide novel...
During the last decade, there has been a stunning progress in the domain of AI with adoption in both safety-critical and security-critical applications. A key requirement for this is highly trained Machine Learning (ML) models, which are valuable Intellectual Property (IP) of the respective organizations. Naturally, these models have become targets for model recovery attacks through side-channel leakage. However, majority of the attacks reported in literature are either on simple embedded...
Learning with Errors (LWE) is a hard math problem underpinning many proposed post-quantum cryptographic (PQC) systems. The only PQC Key Exchange Mechanism (KEM) standardized by NIST is based on module~LWE, and current publicly available PQ Homomorphic Encryption (HE) libraries are based on ring LWE. The security of LWE-based PQ cryptosystems is critical, but certain implementation choices could weaken them. One such choice is sparse binary secrets, desirable for PQ HE schemes for efficiency...
Secure Two-party Computation (2PC) allows two parties to compute any function on their private inputs without revealing their inputs to each other. In the offline/online model for 2PC, correlated randomness that is independent of all inputs to the computation, is generated in a preprocessing (offline) phase and this randomness is then utilized in the online phase once the inputs to the parties become available. Most 2PC works focus on optimizing the online time as this overhead lies on the...
Privacy-preserving neural network based on secure multi-party computation (MPC) enables multiple parties to jointly train neural network models without revealing sensitive data. In privacy-preserving neural network, the high communication costs of securely computing non-linear functions is the primary performance bottleneck. For commonly used non-linear functions, such as ReLU, existing work adopts an offline-online computation paradigm and utilizes distributed comparison function (DCF) to...
This study presents a proof-of-concept for a cognitive-based authentication system that uses an individual's writing style as a unique identifier to grant access to a system. A machine learning SVM model was trained on these features to distinguish between texts generated by each user. The stylometric feature vector was then used as an input to a key derivation function to generate a unique key for each user. The experiment results showed that the developed system achieved up to 87.42\%...
The use of traditional cryptography based on symmetric keys has been replaced with the revolutionary idea discovered by Diffie and Hellman in 1976 that fundamentally changed communication systems by ensuring a secure transmission of information over an insecure channel. Nowadays public key cryptography is frequently used for authentication in e-commerce, digital signatures and encrypted communication. Most of the public key cryptosystems used in practice are based on integer factorization...
Hyperparameter tuning represents one of the main challenges in deep learning-based profiling side-channel analysis. For each different side-channel dataset, the typical procedure to find a profiling model is applying hyperparameter tuning from scratch. The main reason is that side-channel measurements from various targets contain different underlying leakage distributions. Consequently, the same profiling model hyperparameters are usually not equally efficient for other targets. This paper...