Black Hat’s “AgentFlayer” Reveals Zero‑Click AI Exploits Across ChatGPT, Copilot and More

Cybersecurity concept art of zero-click AI exploit hijacking ChatGPT, Gemini, and Copilot using hidden prompt injection.

Researchers at Black Hat USA 2025 demonstrated “AgentFlayer,” a suite of zero‑click prompt‑injection attacks that can silently hijack popular AI assistants such as ChatGPT, Google Gemini and Microsoft Copilot. The hacks reroute emails, steal API keys and exfiltrate data—no user clicks required.

TL;DR:

  • Zero‑click danger: Zenity’s “AgentFlayer” attack chains showed how hidden prompts can trigger malicious actions on AI platforms without any clicks.

  • Targets: ChatGPT, Copilot Studio, Cursor with Jira, Salesforce Einstein, Google Gemini and Microsoft Copilot are all vulnerable.

  • Real‑world demos: Exploits redirected Salesforce contacts, extracted API keys from Jira via Cursor and used invisible prompts in documents to leak data.

What happened

During a live session at Black Hat USA on 10 Aug 2025, Israeli security firm Zenity unveiled AgentFlayer, a set of zero‑click and one‑click exploit chains targeting leading enterprise AI platforms. These attacks rely on prompt injection—malicious instructions embedded in seemingly harmless content—to commandeer AI agents without requiring user interaction. Zenity’s research showed that ChatGPT, Microsoft’s Copilot Studio, the Cursor code editor (when integrated with Jira), Salesforce Einstein, Google Gemini and Microsoft Copilot are all susceptible.

In one demonstration, Zenity co-founder Michael Bargury planted a deceptive customer‑service case in Salesforce Einstein. When a sales rep casually asked, “What are my latest cases?” the hidden prompt triggered a workflow that replaced every customer email address with an attacker‑controlled domain, silently redirecting future communications. Another exploit, dubbed Ticket2Secret, injected a malicious prompt into a Jira ticket. When Cursor summarized the ticket, it quietly ran code that extracted API keys and other credentials from the user’s local files. A third proof‑of‑concept hid an instruction in white, one‑pixel text inside a document. Uploading that document to ChatGPT and asking for a summary caused the model to search the victim’s Google Drive for API keys and leak them via an image URL.

Zenity stresses that these exploits are not theoretical; they abused legitimate features like ChatGPT Connectors and require no user clicks. The company released its findings alongside a blog post urging vendors to adopt “hard boundaries”—technical restrictions that prevent AI agents from executing arbitrary instructions—because “soft boundaries” like training filters and classification models are easily bypassed.

Why it matters

Agent‑based AI systems are rapidly permeating business processes, automating tasks in sales, support, coding and more. Zenity’s research shows that these agents can be turned against their owners via stealthy prompt injections, allowing attackers to reroute emails, exfiltrate secrets or manipulate workflows. Since the attacks leverage the very connectors that make AI tools useful—linking ChatGPT to Google Drive or integrating Copilot with CRM systems—organisations risk exposing sensitive data whenever they embrace generative AI. The demonstration underscores that AI safety is not just about preventing biased or harmful output; it’s also about securing the data pipelines and autonomy these models gain inside enterprise environments.

Key details & numbers

  • Platforms impacted: ChatGPT, Microsoft Copilot Studio, Cursor (with Jira MCP), Salesforce Einstein, Google Gemini, Microsoft Copilot.

  • Attack vectors: Hidden prompts in CRM records, Jira tickets, calendar invites and tiny white‑text documents; exploits require zero or one user click.

  • Demonstrated outcomes: Redirected contact emails to attacker domains; stolen API keys and credentials; data exfiltration via images.

  • Response: Salesforce patched the Einstein exploit on July 11 after Zenity’s disclosure; Microsoft and OpenAI are investigating but have not promised a timeline for hardening their platforms.

Community reaction

  • On r/MachineLearning, users called the demo “terrifying” and urged companies to disable integrations until proper safeguards exist.

  • Cybersecurity researchers on X highlighted that the attacks circumvent user training entirely, stressing the need for deeper code‑level defenses.

  • A LinkedIn post by an AI ethics researcher noted that prompt‑injection vulnerabilities erode trust in autonomous agents and could slow adoption if left unaddressed.

What’s next / watchlist

Security experts expect more demonstrations at DEF CON and subsequent conferences as researchers probe new AI assistants. Regulators may push for standards around AI agent security, and vendors could introduce “safe mode” toggles or sandboxed connectors. Developers should monitor upcoming patches from OpenAI, Microsoft and Google and consider limiting their agents’ access to critical systems until stronger hard boundaries are established.

FAQs

  1. What is a zero‑click AI exploit?
    A zero‑click exploit uses hidden instructions (prompts) embedded in content to trigger actions in an AI agent without the user clicking or confirming anything.

  2. Which AI tools are vulnerable to AgentFlayer?
    Zenity’s research shows that ChatGPT, Microsoft Copilot Studio, Cursor with Jira, Salesforce Einstein, Google Gemini and Microsoft Copilot can all be hijacked via prompt injection.

  3. How can organisations protect themselves?
    Zenity recommends “hard boundaries”: technical controls such as blocking external URLs, validating prompts and restricting agent permissions. Limiting integrations and monitoring AI agent activity also reduces risk.

Share Post:
Facebook
Twitter
LinkedIn
This Week’s
Related Posts