Long a sci-fi trope, the concept of “adversarial AI” has transcended fiction and emerged as a stark threat in the real world. As generative AI permeates across systems and applications, its potential for criminal mischief is multiplied by the emergence of agentic AI. More deployed AI models mean more targets and attack surfaces to exploit, exposing public and private organizations to a staggering array of consequences.
At this year’s RSA Conference, the world’s largest cybersecurity event, agent-powered cyberwarfare and AI safety dominated the conversation. With AI agents proliferating and powerful tools more accessible than ever, combating adversarial attacks in a fast-evolving threat landscape requires more than AI-driven countermeasures. The front line of AI defense is led by vigilant humans operating with a holistic approach.
Unpacking Adversarial AI
Adversarial AI differs from conventional cybersecurity threats in that its methods focus not on direct attacks but on manipulating data and models to subvert AI systems and dupe them into unintended or even harmful behaviors. Regardless of whether attackers have complete knowledge of an AI model’s architecture, adversarial attacks seize on the potential for even minor perturbations in training or input data to cascade into major effects on machine learning (ML) models.
Methods for adversarial AI range from basic model theft and data manipulation (as in evasion attacks and data poisoning) to reverse-engineering models that create illicit duplicates or hunt for sensitive information within training data. A tactic known as transfer attacks expands on this premise by devising attacks against a surrogate model for use against a target model—and then replicating those attacks against similar systems.
Agentic AI as Threat Multiplier
The potential for abuse by adversarial AI is compounded by the growing adoption of agentic AI systems that comprise many individual AI agents taking autonomous action. Malicious AI agents can quickly create legions of tailored adversarial attacks against specific targets, learning from failed attempts and defensive countermeasures to adapt their tactics in real time. Akin to transfer attacks, AI agents could collaborate to target not just individual models and AI systems but entire networks as well. Further, an AI agent that uses an external application or database may be manipulated into generating adversarial inputs for other systems it interacts with, creating complex, layered attack vectors.
The real threat of adversarial AI lies in the fallout from subverted or captured systems. For example, self-driving vehicles and civil infrastructure can be hacked to wreak havoc on roadways and widespread outages. Although real-world examples of successful attacks involving adversarial or agentic AI remain rare, regulators and security teams are well aware of the vulnerabilities lurking in AI systems.
Human and Non-Human Countermeasures
Just as AI can be a force multiplier for malicious actors, so too is it a powerful ally for cybersecurity. A comprehensive defense-in-depth strategy incorporating AI capabilities may offer the best chance of keeping both human and non-human threat actors at bay. A full-spectrum approach to hardening and data integrity can build resilience in AI models and the systems they interact with.
System hardening: Finding and closing vulnerabilities, from open server ports to insecure software, is essential for minimizing attack surfaces and the opportunity for human error.
Adversarial training: AI models can be trained on a mix of clean data and a sampling of synthetic adversarial inputs to learn how to correctly identify suspect perturbations.
Output obfuscation and rate limiting: To curb reverse-engineering models (model extraction) and inference of sensitive data, these tactics limit the number of allowed queries and the amount of information an attacker can access from model outputs.
Proactive monitoring and detection: Understanding how models make decisions (a practice known as explainable AI) can contribute greatly to robust anomaly detection and threat intelligence.
Secure MLOps practices: Each stage of ML model development can be reinforced against adversarial attacks with the right precautions, from proper data hygiene and sanitized inputs to regular audits and penetration testing.
Protecting IT staff resources: Amid a turbulent threat landscape and unrelenting demands, fighting burnout among security teams is key to maintaining a secure posture as AI innovation and deployments accelerate.
The Costs of Safer AI
The real gains in productivity that AI drives are paralleled by its potential for abuse and misuse. Adversarial AI plays a central part in a proactive defense strategy for building trustworthy AI and circumventing the risks of malicious actors and agents. The more that AI grows enmeshed in our digital systems and physical world, the more vitally important robust security measures become for ensuring AI systems can operate safely and predictably.