Security researchers have developed the first functional defense mechanism capable of protecting against “cryptanalytic” attacks used to “steal” the model parameters that define how an AI system works.
“AI systems are valuable intellectual property, and cryptanalytic parameter extraction attacks are the most efficient, effective, and accurate way to ‘steal’ that intellectual property,” says Ashley Kurian, first author of a paper on the work and a Ph.D. student at North Carolina State University. “Until now, there has been no way to defend against those attacks. Our technique effectively protects against these attacks.”
“Cryptanalytic attacks are already happening, and they’re becoming more frequent and more efficient,” says Aydin Aysu, corresponding author of the paper and an associate professor of electrical and computer engineering at NC State. “We need to implement defense mechanisms now, because implementing them after an AI model’s parameters have been extracted is too late.”
At issue are cryptanalytic parameter extraction attacks. Parameters are the essential information that describes an AI model; in effect, they are what allow an AI system to perform its tasks. Cryptanalytic parameter extraction attacks are a purely mathematical way of determining a given AI model’s parameters, allowing a third party to recreate the AI system.
“In a cryptanalytic attack, someone submits inputs and looks at outputs,” Aysu says. “They then use a mathematical function to determine what the parameters are. So far, these attacks have only worked against a type of AI model called a neural network. However, many – if not most – commercial AI systems are neural networks, including large language models such as ChatGPT.”
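To make that setup concrete, here is a minimal Python sketch of the attacker’s vantage point: a model that can only be queried as a black box, and a collection of input-output pairs gathered from it. The toy model, its sizes, and the function names below are hypothetical placeholders, not the systems studied in the paper; the published attacks then recover the hidden parameters mathematically from pairs like these rather than by retraining a copy.

```python
import numpy as np

def black_box_model(x):
    """Stand-in for a deployed model the attacker can only query (hypothetical)."""
    hidden_w = np.array([[1.5, -0.7], [0.3, 2.0]])   # secret parameters
    out_w = np.array([0.8, -1.2])
    return np.maximum(0.0, x @ hidden_w) @ out_w

rng = np.random.default_rng(1)
queries = rng.normal(size=(1000, 2))                  # attacker-chosen inputs
responses = np.array([black_box_model(q) for q in queries])

# From (queries, responses), a cryptanalytic attack solves for the
# secret parameters directly, using only mathematics.
```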
So, how do you defend against a mathematical attack?
The new defense mechanism relies on a key insight about cryptanalytic parameter extraction attacks: while analyzing them, the researchers identified a core principle that every such attack relies on. To understand what they learned, you have to understand the basic architecture of a neural network.
The fundamental building block of a neural network model is called a “neuron.” Neurons are arranged in layers and are used in sequence to assess and respond to input data. Once the data has been processed by the neurons in the first layer, the outputs of that layer are passed to a second layer. This process continues until the data has been processed by the entire system, at which point the system determines how to respond to the input data.
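As a rough illustration of that layered structure, the short Python sketch below builds a tiny network by hand: each layer of neurons takes weighted sums of its inputs and passes its outputs to the next layer. The layer sizes and random weights are arbitrary stand-ins, not anything from the paper.

```python
import numpy as np

def layer(inputs, weights, biases):
    """One layer of neurons: weighted sums of the inputs, then a ReLU activation."""
    return np.maximum(0.0, inputs @ weights + biases)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                                        # input data

# Two layers processed in sequence, as described above.
h1 = layer(x,  rng.normal(size=(3, 4)), rng.normal(size=4))   # first layer
h2 = layer(h1, rng.normal(size=(4, 2)), rng.normal(size=2))   # second layer
print(h2)                                                     # the network's response
```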
“What we observed is that cryptanalytic attacks focus on differences between neurons,” says Kurian. “And the more different the neurons are, the more effective the attack is. Our defense mechanism relies on training a neural network model in a way that makes neurons in the same layer of the model similar to each other. You can do this in only the first layer or in multiple layers, and you can apply it to all of the neurons in a layer or only to a subset of them.”
“This approach creates a ‘barrier of similarity’ that makes it difficult for attacks to proceed,” says Aysu. “The attack essentially doesn’t have a path forward. However, the model still functions normally in terms of its ability to perform its assigned tasks.”
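The paper’s exact training procedure is not spelled out here, but one plausible way to make neurons in a layer resemble one another is to add a similarity penalty to the usual training loss. The PyTorch sketch below does this for the first layer of a toy network; the penalty term, the network, the placeholder data, and the value of `lam` are illustrative assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn

# Toy two-layer network; the defended layer is the first one (an assumption for this sketch).
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

def similarity_penalty(layer: nn.Linear) -> torch.Tensor:
    """Penalize how far each neuron's weight vector is from the layer's average neuron."""
    w = layer.weight                        # shape: (neurons, inputs)
    mean = w.mean(dim=0, keepdim=True)      # the "average" neuron in this layer
    return ((w - mean) ** 2).mean()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
lam = 0.1                                   # regularization strength (hypothetical value)

x = torch.randn(32, 4)                      # placeholder training batch
y = torch.randint(0, 3, (32,))

for _ in range(100):
    optimizer.zero_grad()
    # Task loss plus the similarity penalty on the first layer's neurons.
    loss = criterion(model(x), y) + lam * similarity_penalty(model[0])
    loss.backward()
    optimizer.step()
```

Trading off `lam` against task accuracy is what would keep the model performing normally while flattening the neuron-to-neuron differences the attacks depend on.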
In proof-of-concept testing, the researchers found that AI models that incorporated the defense mechanism had an accuracy change of less than 1%.
“Sometimes a model that was retrained to incorporate the defense mechanism was slightly more accurate, sometimes slightly less accurate – but the overall change was minimal,” Kurian says.
“We also tested how well the defense mechanism worked,” says Kurian. “We focused on models that had their parameters extracted in less than four hours using cryptanalytic techniques. After retraining to incorporate the defense mechanism, we were unable to extract the parameters with cryptanalytic attacks that lasted for days.”
As part of this work, the researchers also developed a theoretical framework that can be used to quantify the success probability of cryptanalytic attacks.
“This framework is useful, because it allows us to estimate how robust a given AI model is against these attacks without running such attacks for days,” says Aysu. “There is value in knowing how secure your system is – or isn’t.”
“We know this mechanism works, and we’re optimistic that people will use it to protect AI systems from these attacks,” says Kurian. “And we are open to working with industry partners who are interested in implementing the mechanism.”
“We also know that people trying to circumvent security measures will eventually find a way around them – hacking and security are engaged in a constant back and forth,” says Aysu. “We’re hopeful that there will be sources of funding moving forward that allow those of us working on new security efforts to keep pace.”
The paper, “Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks,” will be presented at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) being held Dec. 2-7 in San Diego, California.
This work was done with support from the National Science Foundation under grant 1943245.