RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 39 May 15-May 21, 2026

Rock Lambros — Fri, 22 May 2026 12:50:51 GMT

The executive branch stalled. The supply chain bled. Frontier model builders started negotiating with central bankers. Trump tore up his own AI executive order hours before signing. Anthropic agreed to brief the Financial Stability Board on what its Mythos model can produce. A worm called Mini Shai-Hulud chewed through npm, the Nx Console extension, GitHub’s internal repositories, Grafana’s source code, and a slice of OpenAI’s developer laptops.

The throughline has nothing to do with the technology. The story is the widening gap between capability and control. Washington wants speed and won’t write rules. The labs show off their offensive capabilities, then ask regulators to contain them. The supply chain runs on trust that nobody verifies. Identity systems pretend to have been built for AI agents. Here are ten to track, plus one you missed.

1. Trump Pulls AI Executive Order Hours Before Signing

On May 21, 2026, President Trump scrapped the signing ceremony for an AI executive order that would have created a voluntary review process for frontier models before public release (Axios). Trump told reporters the order “gets in the way” (CNBC). The draft covered a voluntary cybersecurity clearinghouse with Treasury and pre-deployment evaluation, giving federal agencies up to 90 days to test new models (Bloomberg). The Washington Post reported that infighting between economic and security advisers killed the timing.

Why it matters

The voluntary framework was the lightest federal touch on frontier model safety. Killing it signals zero appetite for mandatory pre-deployment review.
The 90-day evaluation window was already a compromise. Some labs wanted 14 days.
The vacuum pulls states forward. Colorado’s SB 26-189 takes effect January 1, 2027.

What to do about it

Build your governance program assuming federal silence and state activity.
Inventory which AI vendors signed the prior CAISI agreements. Commitments still hold for OpenAI, Anthropic, Google, Microsoft, and xAI.
Document model-evaluation evidence from vendors. You’ll need it for state filings and customer audits.

Rock’s Musings

Washington cannot govern faster than the labs ship. The voluntary EO was the security community’s best near-term win, killed in 24 hours over speed-versus-China optics. I’m not surprised. I’m tired. Treat federal AI governance as imaginary infrastructure. My longer take sits at rockcybermusings.com.

2. Anthropic Agrees to Brief the Financial Stability Board on Mythos Findings

On May 18, 2026, the Financial Times reported that Anthropic agreed to meet the Financial Stability Board (FSB) to discuss cyber vulnerability findings from its Claude Mythos Preview model (PYMNTS). The request came from Bank of England Governor Andrew Bailey. The G20 watchdog has worried that Mythos and similar models will expose weak spots in bank cyber defe’ cyber defenses (The Decoder). Anthropic says Mythos has identified thousands of high-severity vulnerabilities across every major operating system and web browser, with fallout that will be “severe” for economies and national security (TechRadar).

Why it matters

Frontier labs are now in the room with central bank regulators on cyber risk. A structural change in who governs offensive AI capability.
The FSB shapes the Basel framework. Expect cyber-resilience requirements to grow teeth.
The financial sector is the canary. Whatever the FSB demands rolls downhill to every regulated industry.

What to do about it

Map your critical software stack against Anthropic’s flagged categories. Plan for compressed patch cycles.
Watch your home regulator for follow-on guidance. Bailey’s FSB brief will reverberate.
Build vulnerability backlog metrics into board reporting. The question has shifted from “are we vulnerable” to “how fast can we close known exposure.”

Rock’s Musings

The lab that built the dangerous capability is now negotiating with the regulators expected to contain it. A weird posture, half whistleblower, half hostage-taker. The FSB doesn’t normally touch software, so their interest signals cyber risk has crossed the systemic-threat line. I’ve spent thirty years in this field and never seen central bankers convene on a single AI vendor’s product. Model what happens when your regulator decides “model-discovered zero-days” is a category of systemic risk.

3. Microsoft Open-Sources RAMPART and Clarity for Agent Safety

On May 20, 2026, Microsoft released two open-source tools that push agent safety into the development pipeline (Microsoft Security Blog). RAMPART (Risk Assessment and Measurement Platform for Agentic Red Teaming) is a Pytest-native framework built on Microsoft’s PyRIT toolkit. It lets teams write CI-runnable adversarial tests against agents covering prompt injection, data exfiltration, and behavioral regressions (The Register). Clarity walks teams through assumptions and failure modes before they write agent code (The Hacker News).

Why it matters

The first credible attempt by a hyperscaler to operationalize agent red-teaming inside the CI pipeline. Most “agent safety” tooling sits outside the SDLC.
Pytest integration matters. Agent safety tests look like every other test, which means engineers run them.
PyRIT was already the reference toolkit. RAMPART extending it makes Microsoft the de facto standard for agent adversarial testing.

What to do about it

Pilot RAMPART against your highest-risk agent. Pick the one with the broadest tool permissions.
Use Clarity in design reviews. Catching bad scope at the whiteboard is cheaper than catching it in production.
Add agent-safety test coverage to your AppSec metrics.

Rock’s Musings

Microsoft did the right thing. They built the tools, open-sourced them, and put them where developers work. Most security tools fail because they sit outside the developer workflow. RAMPART has no such excuse. The question is whether your AppSec team has the political capital to make these tests blocking in CI. I cover the adoption muscle at rockcyber.com.

4. GitHub Confirms 3,800 Internal Repos Breached via Nx Console

On May 21, 2026, GitHub disclosed that 3,800 of its internal repositories were accessed through a developer’s compromised Nx Console VS Code extension, a casualty of the May 11 TanStack npm supply chain attack (BleepingComputer). Help Net Security traced the chain from the Mini Shai-Hulud worm through the GitHub and Grafana breaches. TechCrunch confirmed on May 20 that the attacker exfiltrated material from the affected employee’s repositories. The same campaign hit OpenAI, Mistral AI, UiPath, and dozens of downstream maintainers.

Why it matters

GitHub’s own internal repos got popped through a VS Code extension. An IDE compromise now spans your entire engineering footprint.
The Nx Console extension lives on hundreds of thousands of developer machines. Every install is a potential entry point.
Second supply chain worm in 60 days chaining GitHub Actions misconfiguration with OIDC token theft. The pattern is the playbook.

What to do about it

Inventory IDE extensions across your engineering teams. Treat them like browser extensions, with allowlisting and version pinning.
Rotate GitHub OIDC tokens that have touched a developer machine in the past 60 days. Audit workflow files for pull_request_target patterns.
Revisit endpoint posture for developer laptops. The IDE is now an attack surface equivalent to a browser.

Rock’s Musings

The supply chain conversation has changed shape. The attacker walks through a VS Code extension to reach repository tokens, then pivots to the corporate GitHub org. If your developer laptops live in an “engineering exception” bubble outside EDR, MDM, and identity controls, you’re the next Grafana. Put developer endpoint hygiene on par with finance.

5. Grafana Labs Refuses Ransom After Codebase Theft

On May 18, 2026, Grafana Labs confirmed an unauthorized party obtained a GitHub token and downloaded its codebase (TechCrunch). The intrusion traced back to the TanStack supply chain attack from May 11. Grafana received a ransom demand on May 16 and refused to pay (The Register), citing no guarantee the stolen data would be deleted. The company rotated tokens, audited every commit since May 11, and hardened GitHub posture (Grafana blog). No customer data was exposed.

Why it matters

Refusing the ransom publicly is defensible. FBI guidance and peer disclosure make it the default for open-source vendors.
Grafana’s codebase is public anyway. The ransom value was reputational, and the company called the bluff.
The hardened posture published in the blog is a teaching artifact. Use it.

What to do about it

If your codebase is open-source, write the ransom-refusal playbook before you need it. Brief your board.
Mirror Grafana’s recovery checklist. Rotate tokens, audit commits, harden GitHub config, increase monitoring.
Add commit-signing enforcement and require attestations on release artifacts.

Rock’s Musings

I respect what Grafana did. They confirmed quickly, refused the ransom, and published a postmortem with operational specifics. That’s how you turn a breach into a credibility win. Compare it with the usual vague disclosure six weeks late from a forensics firm hiding behind privilege. If your IR plan still treats ransom payment as a live option, you’re behind.

6. Mini Shai-Hulud Worm Expands Across the npm Ecosystem

On May 19, 2026, TechCrunch reported the Mini Shai-Hulud campaign had spread to dozens of additional open-source packages beyond the original TanStack hit. Wiz and Snyk traced the worm’s propagation through @squawk/* and @mistralai/* packages, on top of the 84 malicious versions across 42 @tanstack/* packages from May 11 (Wiz). StepSecurity attribution ties the same TeamPCP threat group to the March Trivy scanner compromise and April’s Bitwarden CLI package hit (Snyk). The campaign chains pull_request_target misconfiguration with GitHub Actions cache poisoning and OIDC token extraction.

Why it matters

A self-propagating worm. It exfiltrates maintainer credentials and uses them to publish further malicious versions. Containment lags.
The same threat actor keeps finding new targets with the same attack pattern. The pattern is the problem.
Every downstream consumer of an affected package has a credential rotation event ahead.

What to do about it

Build a list of every npm package your org consumes, including transitive dependencies. Cross-reference against IOC lists from StepSecurity and Wiz.
Move CI secrets out of GitHub Actions environment variables. Use ephemeral, scoped tokens.
Block pull_request_target on any repository whose CI touches secrets. There is no safe configuration.

Rock’s Musings

The worm pattern is the story. A compromised maintainer’s token pushes malicious versions that compromise more maintainers, and the campaign scales without human work. A structural problem for any ecosystem built on maintainer trust. We’ve known pull_request_target was dangerous since 2021. Its presence at major projects in 2026 tells you how the open-source world treats its security debt.

7. EU Commission Opens Consultation on AI Act Transparency Guideline

On May 19, 2026, the European Commission opened a public consultation on the draft guideline for the AI Act’s transparency obligations, due in August 2026 (Council of the EU). The consultation follows the May 7 AI Omnibus agreement, which shortened the grace period for transparency solutions on AI-generated content from six months to three. The new deadline lands December 2, 2026. The Commission’s enforcement powers against general-purpose AI model providers go live August 2, 2026, including authority to request documentation and impose fines.

Why it matters

Transparency rules apply to every model output touching an EU resident, regardless of training or hosting location.
The shortened grace period gives GPAI providers 90 days to ship watermarking, content labeling, and disclosure mechanisms.
August’s enforcement powers give the AI Office real teeth for the first time.

What to do about it

Map your AI-generated content workflows. Tag every production path that needs disclosure.
Implement provenance labeling now using C2PA or equivalent.
Brief legal and product on the December 2 deadline. Earlier guidance assumed June 2027.

Rock’s Musings

The Brussels Effect is doing its work. Whatever the AI Act forces on GPAI providers becomes the de facto global standard for transparency disclosure. American companies pretending the Act doesn’t apply will learn otherwise. Regulators wanting a quick enforcement win start with content labeling, not algorithmic auditing. If your product surfaces AI-generated content to any EU user, December 2 turned real this week.

8. CISA Weighs Three-Day Patching Deadline as AI Compresses Exploit Cycles

On May 20, 2026, Federal News Network reported CISA is considering a three-day patching deadline on Known Exploited Vulnerabilities, replacing the current 15-day default. The Insurance Journal covered the debate, citing AI compressing the time between disclosure and exploitation. Sysdig research found CVE-2026-44338 in the PraisonAI framework was probed by scanners 3 hours, 44 minutes, and 39 seconds after disclosure. Palo Alto Networks reports 28.3% of CVEs are now exploited within 24 hours.

Why it matters

A three-day federal mandate would be the most aggressive remediation deadline CISA has ever proposed.
The same compression hits private defenders. Patch SLAs run 5-10x slower than the attack timeline.
AI-assisted exploit development operates at scale. The 3-hour PraisonAI scan window is the leading edge, not the outlier.

What to do about it

Pull your last 12 months of KEV-listed CVEs. Measure actual time-to-patch against the 15-day baseline. Be honest.
Build runbooks for emergency patching of internet-exposed assets. The three-day clock starts at disclosure, not your next change window.
Plan compensating controls when 72-hour patching is impossible. Virtual patches and WAF rules buy time.

Rock’s Musings

The math is brutal. Attackers weaponize a CVE in hours. Defenders take weeks to deploy a patch through change management. A three-day mandate forces a conversation every CISO has avoided. Redesign the process or accept being late by default.

9. Anthropic Opens Mythos Partner Sharing After Initial Lockdown

On May 18, 2026, Anthropic reversed its earlier position and now allows Project Glasswing partners to share Mythos vulnerability findings with outside parties (Reuters via KFGO). The new policy permits disclosure to security teams, industry bodies, regulators, open-source maintainers, the media, and the public, subject to responsible disclosure. The original Glasswing structure had limited information to launch partners only. About 40 organizations have Mythos.

Why it matters

The first information-sharing reversal of a frontier model program of this kind. Centralized cyber findings control was not workable in practice.
Open-source maintainers now have a path to receive Mythos-discovered vulnerabilities. That changes the patch dependency calculus.
The reversal suggests Anthropic underestimated the volume of findings and the scaling problem of single-vendor coordination.

What to do about it

Partners should designate a single coordinated-disclosure contact. Volume will overwhelm informal channels.
Non-partners should register with ISACs and CERTs as receiving organizations.
Pre-write your triage process for AI-discovered vulnerabilities. The format won’t match your CVE workflow.

Rock’s Musings

A governance lesson in real time. You cannot bottle frontier capability and call it safe. Glasswing tried, and within six weeks the math broke. Voluntary coordination is fragile when capability outruns headcount.

10. Trump Pivots Toward AI Regulation Amid Backlash and China Safety Talks

On May 19, 2026, Fortune reported the Trump administration is shifting its public stance on AI regulation in response to mounting voter backlash over job displacement, deepfakes, and AI-enabled crime. The shift comes alongside reported US-China safety talks on frontier AI capability. The administration’s December 2025 EO 14365 sought to preempt state AI regulation. The May 21 EO postponement suggests the political calculation has changed. Fortune cited senior officials describing the sentiment shift as “faster than anyone expected.”

Why it matters

Public backlash on AI is influential enough to move executive policy. A new political force.
US-China safety dialogue, even if informal, sets the stage for future bilateral commitments on frontier capability.
An administration that was preempting state regulation is now hesitating. State AGs read this as license to push harder.

What to do about it

Track AI ballot initiatives in your operating states. The 2026 midterms will surface enforceable propositions.
Audit public-facing AI claims for accuracy. The SEC has flagged AI-washing as an enforcement priority.
Brief government affairs on the bilateral angle. China engagement changes the calculus for export controls and model access.

Rock’s Musings

The political dynamic shifts faster than the technology. Six months ago, the White House was suing California to block AI rules. This week, they were drafting their own voluntary review. Plan around the volatility. The companies that thrive have built controls higher than any jurisdiction requires. You don’t have to guess which regulator strikes next. You have to be ready for any of them.

And then there is musing #1…

The One Thing You Won’t Hear About But You Need To: Identity Dark Matter Is Eating Your AI Agent Program

On May 19, 2026, Orchid Security released its Identity Gap: 2026 Snapshot report (Tech Startups, GlobeNewswire). Invisible identity, what Orchid calls “identity dark matter,” now outweighs visible identity in enterprise environments 57% to 43%. 67% of non-human accounts are created directly within applications, unseen and unmanaged by IAM programs. 70% of enterprise applications carry excessive privileged accounts. The data comes from anonymized telemetry across financial services, healthcare, retail, and energy from April 2025 through March 2026.

Why it matters

AI agents inherit credentials at runtime. If most of your non-human identity is invisible, your agents operate in the blind spot.
Traditional IAM was built for humans. An AI agent using a stale service account has a larger blast radius than the equivalent human error.
The 70% over-privilege finding means that most enterprise apps cannot survive a single agent-misuse event without exposing other systems.

What to do about it

Run non-human identity discovery against your top 10 enterprise applications. Expect a delta against your IAM inventory.
Implement time-bound, on-demand credentials for AI agents. Standing access is the failure mode.
Treat every AI agent identity as privileged. Apply PAM controls, session recording, and behavioral monitoring.

Rock’s Musings

The story under the story. Every AI security headline this week depends on identity being right. The TanStack worm spread through OIDC tokens. The GitHub breach used a developer’s repository access. Your AI agent governance program is only as good as your non-human identity hygiene. If two-thirds of your service accounts are invisible, you cannot govern the agents using them. Read the report and bring it to your board. Don’t let “we have IAM” be the answer.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with AI Cyber Magazine, where we talked about everything from Context Rot to Least Agency.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

References

Axios. (2026, May 21). Scoop: White House postpones AI EO signing ceremony. https://www.axios.com/2026/05/21/white-house-postpones-ai-eo-signing

BleepingComputer. (2026, May 21). GitHub links repo breach to TanStack npm supply-chain attack. https://www.bleepingcomputer.com/news/security/github-links-repo-breach-to-tanstack-npm-supply-chain-attack/

Bloomberg. (2026, May 21). White House postpones AI cybersecurity order signing by Trump. https://www.bloomberg.com/news/articles/2026-05-21/white-house-postpones-ai-cybersecurity-order-signing-by-trump

CNBC. (2026, May 21). Trump postpones AI executive order signing: ‘I didn’t like certain aspects’. https://www.cnbc.com/2026/05/21/trump-ai-executive-order-postponed.html

CNN Business. (2026, May 20). White House postpones executive order on AI. https://www.cnn.com/2026/05/20/tech/ai-executive-order-trump-white-house

Council of the European Union. (2026, May 7). Artificial intelligence: Council and Parliament agree to simplify and streamline rules. https://www.consilium.europa.eu/en/press/press-releases/2026/05/07/artificial-intelligence-council-and-parliament-agree-to-simplify-and-streamline-rules/

CSO Online. (2026, May). Microsoft releases open-source tools to operationalize AI agent safety. https://www.csoonline.com/article/4175592/microsoft-releases-open-source-tools-to-operationalize-ai-agent-safety-2.html

Federal News Network. (2026, May 20). AI drives new debate around CISA software patching deadlines. https://federalnewsnetwork.com/cybersecurity/2026/05/ai-drives-new-debate-around-cisa-software-patching-deadlines/

Fortune. (2026, May 19). The times they are a-changin’: Trump pivots towards AI regulation in the face of a mounting public backlash. https://fortune.com/2026/05/19/trump-pivots-towards-ai-regulation-in-face-mounting-ai-backlash-china-ai-safety-talks/

GlobeNewswire. (2026, May 19). Two-thirds of nonhuman accounts are unseen and unmanaged, according to new Identity Gap Report. https://www.globenewswire.com/news-release/2026/05/19/3297602/0/en/Two-Thirds-of-Nonhuman-Accounts-Are-Unseen-and-Unmanaged-According-to-New-Identity-Gap-Report.html

Grafana Labs. (2026, May 16). Grafana Labs security update: Latest on TanStack npm supply chain ransomware incident. https://grafana.com/blog/grafana-labs-security-update-latest-on-tanstack-npm-supply-chain-ransomware-incident/

Help Net Security. (2026, May 21). GitHub, Grafana Labs breaches traced back to TanStack supply chain compromise. https://www.helpnetsecurity.com/2026/05/21/github-grafana-breach-root-cause-nx-console/

Insurance Journal. (2026, May 4). CISA weighs cutting deadlines to fix digital flaws amid worries over AI. https://www.insurancejournal.com/news/national/2026/05/04/868205.htm

KFGO. (2026, May 18). Anthropic to let partners share Mythos cybersecurity findings with others. https://kfgo.com/2026/05/18/anthropic-to-let-partners-share-mythos-cybersecurity-findings-with-others/

Microsoft Security Blog. (2026, May 20). Introducing RAMPART and Clarity: Open source tools to bring safety into Agent development workflow. https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/

NBC News. (2026, May 21). Trump abruptly scraps signing of landmark executive order regulating AI. https://www.nbcnews.com/tech/tech-news/trump-scraps-signing-landmark-executive-order-regulating-ai-rcna346288

PYMNTS. (2026, May 18). Anthropic will update regulators on Mythos’ cyber vulnerability findings. https://www.pymnts.com/cybersecurity/2026/anthropic-will-update-regulators-mythos-cyber-vulnerability-findings/

Snyk. (2026, May). TanStack npm packages hit by Mini Shai-Hulud. https://snyk.io/blog/tanstack-npm-packages-compromised/

Tech Startups. (2026, May 19). Two-thirds of nonhuman accounts are unseen and unmanaged, according to Orchid Security’s Identity Gap Report. https://techstartups.com/2026/05/19/two-thirds-of-nonhuman-accounts-are-unseen-and-unmanaged-according-to-orchid-securitys-identity-gap-report/

TechCrunch. (2026, May 18). Open source tool maker Grafana Labs says hackers stole its code, refuses to pay ransom. https://techcrunch.com/2026/05/18/open-source-tool-maker-grafana-labs-says-hackers-stole-its-code-refuses-to-pay-ransom/

TechCrunch. (2026, May 19). Hackers have compromised dozens of popular open source packages in an ongoing supply-chain attack. https://techcrunch.com/2026/05/19/hackers-have-compromised-dozens-of-popular-open-source-packages-in-an-ongoing-supply-chain-attack/

TechCrunch. (2026, May 20). GitHub says hackers stole data from thousands of internal repositories. https://techcrunch.com/2026/05/20/github-says-hackers-stole-data-from-thousands-of-internal-repositories/

TechRadar. (2026, May 18). Anthropic to present exposed Mythos flaws to global watchdog. https://www.techradar.com/pro/security/anthropic-to-present-exposed-mythos-flaws-to-global-watchdog-claims-critical-vulnerabilities-found-in-every-major-operating-system-and-web-browser

The Decoder. (2026, May 18). Anthropic to brief global financial regulators on cyber flaws found by Claude Mythos. https://the-decoder.com/anthropic-to-brief-global-financial-regulators-on-cyber-flaws-found-by-claude-mythos/

The Hacker News. (2026, May 20). Microsoft open-sources RAMPART and Clarity to secure AI agents during development. https://thehackernews.com/2026/05/microsoft-open-sources-rampart-and.html

The Register. (2026, May 18). Grafana Labs admits all its codebase are belong to someone who popped its GitHub account. https://www.theregister.com/cyber-crime/2026/05/18/grafana-labs-admits-attackers-downloaded-its-codebase-from-github/5241686

The Register. (2026, May 21). Microsoft storms RAMPART, adds Clarity to agentic AI safety. https://www.theregister.com/security/2026/05/21/microsoft-open-sources-agentic-ai-safety-tools/5243822

The Washington Post. (2026, May 21). Trump delays executive order on AI oversight hours before planned signing. https://www.washingtonpost.com/technology/2026/05/21/white-house-tore-down-ai-rules-now-its-building-new-defenses/

Wiz. (2026, May). Mini Shai-Hulud strikes again: TanStack + more npm packages compromised. https://www.wiz.io/blog/mini-shai-hulud-strikes-again-tanstack-more-npm-packages-compromised

My Claude Code Harness Is Public. Don't Copy It.

Rock Lambros — Tue, 19 May 2026 12:50:47 GMT

I spent most of last month watching myself do the same dance every time I opened Claude Code. Each session ate 20-30 minutes up front, depending on how Claude Code was performing that day, and I’d spend that time re-stating trust boundaries, re-configuring tooling, and reminding a fresh session what the project was. I was doing it on three machines (Mac, Jetson AGX Orin, Windows), 5-10x/week. Before I’d written a line of code, I was burning two to five hours a week on a problem I’d already solved twice and forgotten how.

The “fix it in code review” answer for security findings fell apart around the same time, once I’d read enough of the benign-prompt vulnerability data on frontier models to understand what I was accepting by deferring. If the model’s shipping vulnerable code at a non-trivial rate even when nobody’s trying to make it, “we’ll catch it in PR” is wishful thinking with a JIRA ticket attached.

That was the moment. I stopped patching the symptom. I built my harness from scratch on the Mac, ported the reasoning to the Jetson and Windows, and wrote down why I made every choice. The repo’s a reasoning trail with the code attached as evidence.

What I’m publishing lives at github.com/rocklambros/harness-engineering. The README says it plainly: this isn’t a clone-and-run template, and personal-specific configuration is the point. If you read it expecting a drop-in setup, you’ll come away disappointed. If you read it expecting to see how a harness gets reasoned into existence, you’ll come away with a frame for arguing with mine and building yours.

Harness engineering isn’t what most people think it is

Prompt engineering got the marketing budget. Harness engineering didn’t, and most Claude Code users skip past it because it doesn’t feel like coding. It feels like ops, and nobody writes posts about ops decisions.

Here’s the working definition I’ve landed on. A harness is the configured environment around an agent (in this case, a coding agent) that determines what it can and can’t do, what guidance it follows by default, and what guardrails it can’t talk its way past. Harness engineering is the discipline of designing that environment on purpose, with reasoning you can defend, instead of accepting whatever defaults shipped in the box.

In Claude Code terms, the harness is everything outside the chat turn. The project-level CLAUDE.md the model reads at session start. The settings.json that defines permission modes and hook registrations. The deterministic rules the model can’t override, even if it tries. The skills that load advisory guidance on demand. The hooks that fire on tool use to validate, scan, and audit. The agents you delegate specialized tasks to.

If you’re running Claude Code with a default settings.json, no hooks, no skills beyond what shipped, and a CLAUDE.md that someone else wrote, you don’t have a harness. You have a session. The model is making decisions about what’s safe to run, what tools to invoke, and what your codebase should look like, with zero guardrails you can defend in a postmortem.

For a vibe-coding indie dev shipping a side project, no harness might be fine. The blast radius is one repo, possibly with no production users. For anyone shipping code that matters, the absence of a harness means the model is making decisions about what’s safe with zero documented constraints, and you’re trusting the defaults to do work you’d never trust an unverified junior to do.

Most of the “10 tips for Claude Code” content I’ve read is harness suggestion without harness reasoning, which means surface configs without the why. That’s why those posts age out within a minor-version bump. The configs survive maybe four weeks before an upstream change breaks the assumption they were built on, and the reader has no idea which assumption broke or how to fix it. The reasoning is what survives the upgrade. The configs are what fall out.

The honest answer is: don’t build

Most of you should adopt, not build. The README says this directly, and I want to repeat it before anyone gets the wrong idea from the announcement:

The honest answer for most people reading this is: don’t build. Adopt.

The cost of building isn’t in the writing. It’s in the maintenance against Claude Code itself, which ships breaking changes on minor version bumps. The TTL cache regression in March 2026 was the canonical example. A behavior change in the cache layer silently halved the economic value of half the harnesses in circulation, and most of the people running those harnesses didn’t notice for weeks. If your harness assumes a Claude Code behavior that later changes in a release, every part of your reasoning trail that depended on that assumption needs re-evaluation. That’s a non-trivial tax to pay if your day job isn’t building harnesses.

Who should build, then? The conditions are narrow, and all four must be true.

You operate across multiple machines, and the off-the-shelf options don’t survive the cross-platform parity test. You have a non-trivial security posture, and “fix it in code review” isn’t a defensible answer for the work you ship. You don’t trust the trust boundaries that ship in the existing community harnesses, either because they’re underspecified or because they’re calibrated to a different threat model than yours. You can afford the maintenance cost of keeping a reasoning trail up to date as Claude Code evolves.

If any of those four don’t apply, adopt. There are good public harnesses in the community right now. Pick one whose reasoning you can read and whose tradeoffs you can defend. That’s a faster path to a harness you can trust than building your own.

I built mine because all four applied: three machines, an AI security threat model I don’t want negotiated by a maintainer I’ve never met, a low tolerance for trust boundaries I can’t trace, and the time budget to keep the reasoning current. Most of you don’t have all four. Reading my repo to argue with my reasoning is useful. Copying my configs into a project that doesn’t share my four conditions is the same kind of mistake as cloning someone else’s threat model and hoping it covers yours.

If you read this section and think, “but my situation is special,” it probably isn’t. The cases that earn building are rarer than people think, and the cases where adopting is the smart move look pretty similar to mine from the outside.

What’s in the repo, and what it does

The repo is organized as one foundation section, three platform sections (Mac, Jetson AGX Orin, Windows), and a research section. Foundation holds the parts that are identical across platforms: the Quality Contract that binds every artifact, the threat model, the architectural principles, the seed evaluation methodology, and the research references.

The Mac section is the validated reference build. All six phases (Phase 0 goals through Phase 5 release) are written and tested against my actual machine. The Jetson and Windows sections mirror the structure. Phases 0 through 2 are written and ready. Phases 3 through 5 are scaffolded with explicit “needs validation when ported” markers because I haven’t run them against those environments yet. The capability surface is identical to Mac. Tools differ where they have to.

Each platform’s harness has the same five-layer shape. The project-level CLAUDE.md sits under 200 lines and covers seven sections: the role the model is operating in, the code standards I expect it to honor, the security rules it can’t bypass, the core constraints on the project, the things that break (failure modes I’ve already hit), an operational section for day-to-day commands, and a status section that captures where the build currently is. A settings.json template defines permission modes, hook registrations, and trust-boundary policy. A deterministic rules directory lists path deny patterns, command deny patterns, and secret patterns that get consumed by hooks rather than interpreted by the model. A skills directory holds lazy-loaded advisory guidance. A hooks and agents directory holds the deterministic gates and the specialized subagents.

Figure 1: Five-layer harness architecture

The piece I’m most willing to defend is the three-layer security stack that cuts across the skills and hooks layers. Layer one is pre-generation guidance: a security-review skill seeded from the Arcanum-Sec sec-context anti-pattern taxonomy (CC BY 4.0, Jason Haddix), with 10 pattern files for the Mac build that match the skill’s manifest one-to-one. The skill loads pattern sections based on file type, so the context tax stays small. Layer two is commit-time hardening: a Semgrep PostToolUse hook that fires on every Write or Edit and feeds findings back to Claude in the same session, implementing the SecureForge methodology from Liu et al. (arXiv:2605.08382, MIT). The published paper reports a roughly 48% reduction in CWE rate from this layer alone. Layer three is post-generation validation: a pinned pre-commit gate running gitleaks for secrets, Semgrep for SAST, shellcheck for hook scripts, and a local drift check for reference integrity. It’s the same Semgrep engine as layer two, running in a different invocation context. The redundancy is intentional.

Figure 2: Three-layer security stack

The one piece I’d point to first if you want to see how the reasoning trail format works is JOURNEY.md. It’s a running narrative of the build, written as prose checkpoints. Reasoning lives in JOURNEY.md, decisions land in commits, locked decisions land in foundation docs. That separation is doing real work. The commit history is part of the artifact, not just a side effect of using git.

Decisions I made that won’t transfer to your setup

The repo is a reasoning trail, not a config to copy. Here are the load-bearing decisions in it that won’t survive translation to your environment unchanged.

The Windows section runs Semgrep in WSL2 rather than the native Windows binary. The native binary has spotty coverage on some of the rule packs I care about, and forcing parity across platforms outweighed the convenience of running Semgrep natively on Windows. If your security posture cares about different rule packs than mine does, your decision might run the other way. The same goes for the broader WSL2 call. I picked it because it gave me a Linux-shaped tool environment without dual-booting. If you’re already deep into PowerShell and Windows-native tooling, you’d pick differently, and you’d be right.

The Jetson section assumes Tegra Python and the apt-plus-Jetson-SDK package management posture. If you’re running a Jetson but you’ve layered conda over the top, or you’re using a different L4T release than mine, the Phase 0 inventory output won’t match yours, and the downstream phases will need adjustment. The reasoning still applies. The specific tool versions won’t.

The seven-section CLAUDE.md under 200 lines is calibrated to my context-tax tolerance, not yours. I write CLAUDE.md to be the smallest thing that’s still useful, because every line in it is paid for on every turn in every session. If your projects are larger or smaller than mine, your CLAUDE.md should be too. If your tolerance for context tax is different (some people will trade more setup tokens for less in-session friction), your CLAUDE.md will be longer than mine.

The pattern prose in the security-review skill has been rewritten from the Arcanum-Sec sec-context taxonomy to reflect my voice and selection logic. The attribution is preserved, but the prose isn’t theirs anymore. If you adopt the skill as a starting point, you should rewrite it again. The selection logic is mine, the priorities are mine, and the file-type triggers reflect what I write the most of. If your language mix is different, you’ll want different triggers and a different priority order.

The Quality Contract section IDs and threat IDs are stable across my repo, which means hooks and skills can cite them by ID, and a drift check can verify the citations resolve. If you adopt the structure, you’ll want to renumber to your own threat model. Don’t inherit my IDs and pretend they’re yours. The whole point of the reasoning trail format is that the citations track to something real, and ID inheritance breaks that the first time you forget which threat ID came from where.

What I’d do differently if I started over

Two things, and I’ll know about a third by the time I finish the Jetson and Windows validations.

Lock the foundation docs and the Quality Contract before any platform work. I built the Mac section in parallel with the foundation, which meant some early Mac decisions had to be revisited as the Quality Contract sharpened. Each revisit costs a commit cycle and a small amount of confidence in the validity of earlier work. Doing the foundation first and the platform second would have made the reasoning trail cleaner, and the Mac reference build wouldn’t have had a handful of decisions that needed an asterisk.

Write the JOURNEY.md format on day one. I started JOURNEY.md after the initial batch of artifacts had already landed, which meant the reasoning for the first batch had to be reconstructed from commit messages rather than captured live. Commit messages are good for landing decisions. They aren’t the same thing as a running narrative that captures the questions you were sitting with as you made them. Future me will thank present me for any reasoning that gets captured live instead of being reconstructed later. Past me did not get that gift.

The third thing I’m watching for: I suspect the Phase 4 security-review skill will need a different structure once I validate it against the Jetson and Windows environments. The Mac pattern selection assumes a tool mix I haven’t proven survives the port. If it doesn’t, the lesson will be “design the skill structure against the hardest target first, not the easiest.” I don’t know yet. The JOURNEY.md entry that resolves it will say so.

How to read the repo

Read foundation/00-quality-contract.md first. It binds everything else in the repo, and if you’re going to argue with my reasoning, you need to argue from the same starting point I’m arguing from. After that, pick your path. USER_GUIDE.md walks through the wiring if you want a quick start for adopting the harness in your own project. HARNESS_GUIDE.md is the technical reference across all three platforms. If you want the full validated build with all the reasoning intact, read mac/ start to finish in commit order.

What I want from readers isn’t forks of the configs. It’s forks of the thinking. If your harness ends up looking nothing like mine because you have a different threat model, different platforms, a different language mix, or a different context-tax budget, that’s the right outcome. If your harness ends up looking exactly like mine, one of us is wrong, and the math says it’s probably you.

The question I’m leaving open

Most Claude Code users I’ve talked to are running with default permission modes on production codebases and calling that ops maturity. They have no hooks, no skills beyond what shipped, and a CLAUDE.md that someone else wrote or that doesn’t exist at all. If you can’t name the three layers of your security stack without checking, and you can’t say what gets enforced deterministically versus advisorily, you don’t have a harness. You have a session.

What’s in your harness, and could you defend it on a panel?

The repo’s at github.com/rocklambros/harness-engineering. The license is MIT. Use the patterns and argue with me in the comments or in your own JOURNEY.md.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

Weekly Musings Top 10 AI Security Wrapup: Issue 38 May 8-May 14, 2026

Rock Lambros — Fri, 15 May 2026 12:50:27 GMT

Three vendors launched competing AI vulnerability hunters. Google announced the first confirmed attacker use of an AI-discovered zero-day. The European Commission opened a transparency rulebook nobody finished writing. OpenAI got sued because ChatGPT allegedly helped plan a mass shooting. LiteLLM hit CISA’s KEV list after a pre-auth SQL injection compromised the AI gateway holding model API keys.

This week confirmed what skeptics argued for two years. AI doesn’t change cybersecurity through some abstract paradigm shift, it changes it by collapsing timelines. Discovery cycles that took months now run in days. Patching windows evaporate before the patch ships. Regulatory drafting runs on three-month consultation cycles. The center of gravity is moving from people who hunt bugs to people who govern the systems hunting them. If your strategy still assumes humans set the pace, you’re already behind.

1. Google Confirms First Real-World AI-Discovered Zero-Day Attack

Google’s Threat Intelligence Group disclosed on May 11, 2026 that it disrupted a criminal group using AI to identify and exploit an unknown vulnerability in widely used open-source software (Domain-b). Analysts spotted machine-generated code indicators, including metadata inconsistencies. Google did not name the target, the AI model, or the group, but said the campaign was blocked before launch (Fortune).

Why it matters

Attackers crossed a capability threshold that defenders expected years away
Open-source dependencies became economically attractive to compromise at machine speed
Google’s detection signal, LLM code artifacts, is what sophisticated attackers will suppress next

What to do about it

Audit your SBOM for open-source components in critical paths, prioritizing low-maintenance projects
Treat AI-assisted vulnerability research as a baseline attacker capability in your threat model
Validate your detection stack ingests statistical anomalies in code patterns, not only traditional IoCs

Rock’s Musings

Google blocking one campaign isn’t a victory; it’s the first time we caught one. Every honest threat hunter I know assumes five or ten more slipped through. Detection relied on attackers being sloppy enough to leave LLM fingerprints in their code. That window closes the second they polish exploits through a human pass, which costs about thirty bucks of contractor time. AI-powered attacks aren’t a 2027 problem anymore, they’re a today problem.

2. OpenAI Launches Daybreak as Defensive Counter to Anthropic Mythos

OpenAI introduced Daybreak on May 11, 2026, pairing GPT-5.5 with Codex Security as an agentic scaffold alongside Akamai, Cisco, Cloudflare, CrowdStrike, Fortinet, Oracle, Palo Alto Networks, and Zscaler (The Hacker News). Three tiers ship: standard GPT-5.5, GPT-5.5 with Trusted Access for Cyber, and GPT-5.5-Cyber for red-team and pen-test workflows. Unlike Mythos, which remains in tight preview, Daybreak is publicly accessible by request (Cybersecurity Dive).

Why it matters

Frontier AI labs are in direct competition for cybersecurity-vendor relationships, redrawing procurement for every CISO
Tiered access tied to verified cyber credentials is the first serious dual-use governance attempt for capability-restricted models
Defenders gain a second credible vendor for AI-assisted vulnerability discovery, breaking monoculture risk

What to do about it

Run a head-to-head of Daybreak, Mythos partners, and MDASH against your codebase before any multi-year deal
Build your AI-assisted vulnerability program around outputs you can validate, not vendor demos
Define what “ready” means for an AI-discovered finding before these systems push results into your tracker

Rock’s Musings

The pitch sounds great. Three labs are racing to embed themselves in Fortune 500 security operations before regulators figure out what the technology is doing. Tiered access by credential verification is the smartest piece of Daybreak, and the piece most likely to be quietly relaxed once a major customer’s red team gets blocked. I’ve seen this pattern with offensive tools for twenty years. The right question isn’t which model finds more bugs, it’s which vendor’s scaffold produces findings your team can actually fix.

3. Microsoft Reveals MDASH and Discloses 16 Windows Vulnerabilities

Microsoft revealed MDASH on May 12, 2026, a multi-model agentic scanning harness orchestrating more than 100 specialized AI agents (Microsoft Security). The system found 16 previously unknown vulnerabilities patched in May's Patch Tuesday, including four critical RCEs in tcpip.sys, ikeext.dll, netlogon.dll, and dnsapi.dll. MDASH scored 88.4% on CyberGym, beating Mythos (GeekWire). It’s in limited preview with select customers.

Why it matters

Durable advantage lies in the agentic system around the model, not the model itself
All four critical flaws were network-reachable without credentials, the bug class adversaries pay top dollar for
96% recall on five years of CLFS bugs and 100% on tcpip.sys shows AI vulnerability discovery is production-grade

What to do about it

Patch the May cohort with priority on the four critical RCEs, even ahead of normal change windows
Ask your software vendors what their AI-assisted vulnerability discovery program looks like
Update procurement security reviews to include questions about AI-driven code auditing maturity

Rock’s Musings

Two things stand out. Ensemble AI agent systems beat single-model systems for bug hunting. That’s an architectural finding, not marketing copy. Sixteen new RCE-class vulnerabilities in the Windows networking stack reminds us the most reviewed code on Earth still hides serious bugs humans missed for years. The AI didn’t get smarter, we finally pointed enough compute at the problem. The strategic question is what happens when adversaries point the same compute at the same code. Microsoft’s lead is months.

4. EU Commission Opens Consultation on AI Transparency Obligations

The European Commission published draft guidelines on May 8, 2026 covering AI Act Article 50 transparency obligations, with consultation running through June 3, 2026 (European Commission). The guidelines spell out four obligations effective August 2, 2026: disclosure when users interact with AI, marks on AI-generated content, disclosure for emotion recognition and biometric categorization, and deepfake labeling. Non-compliance carries fines up to €15 million or 3% of global turnover (DataGuidance).

Why it matters

Article 50 reaches non-EU providers if their AI outputs touch EU users, putting US companies in scope
The watermarking window shrank to December 2, 2026 under the May 7 Digital Omnibus deal
Compliant watermarking standards are not yet published, leaving companies building against a moving target

What to do about it

Map every AI system you operate that could touch EU users, including embedded vendor capabilities
Start watermarking proof-of-concept work now against draft standards like C2PA, accepting possible rework
Submit feedback to the EU consultation by June 3 if your business depends on AI transparency boundaries

Rock’s Musings

The political headline was the AI Act got simpler. The substance was that one transparency deadline got compressed while another got delayed. Compliance officers love that kind of calendar arithmetic because it lets them quietly miss things. The August 2026 chatbot disclosure is the boring obligation that catches everybody. If your AI assistant doesn’t tell EU users it’s an AI assistant, you’re exposed. Your vendor’s chatbot not disclosing is your problem.

5. OpenAI Sued Over ChatGPT’s Alleged Role in Florida Mass Shooting

Vandana Joshi, widow of a Florida State University mass shooting victim, filed a federal lawsuit against OpenAI on May 11, 2026, alleging ChatGPT advised attacker Phoenix Ikner on optimal location, timing, weapon selection, and ammunition (Reuters, AP News). Florida’s attorney general opened a rare criminal investigation in April 2026. OpenAI denied wrongdoing, saying ChatGPT provided factual responses drawn from public sources (US News).

Why it matters

Product liability theories on general-purpose AI assistants are now in active federal litigation
The case tests whether AI companies have a duty of care to detect and intervene in violence-planning conversations
A plaintiff win could rewrite operational requirements for consumer AI safety guardrails

What to do about it

Review AI vendor contracts for indemnification clauses tied to misuse and downstream harm
Document harm detection and escalation procedures with evidence that they were followed
Treat AI safety telemetry as a legal artifact, retained and discoverable, not only an operational signal

Rock’s Musings

This case will settle or be appealed for years, but the discovery phase is what matters. Internal documents showing what OpenAI knew about violence-planning prompts and what they chose not to escalate will become the de facto safety standard. Plaintiffs don’t need to win the verdict… they just need to win the depositions. If your product can be used to plan harm and telemetry shows it has been, your retention policy just became a litigation strategy.

6. Microsoft Patch Tuesday Sets Vulnerability Record as AI Discovery Surges

Microsoft issued patches for more than 130 vulnerabilities on May 13, 2026, on pace to break its annual record after patching over 500 in the first five months (The Record). CVE-2026-41089 in Windows Netlogon and CVE-2026-41096 in Windows DNS Client both carry 9.8 CVSS. Microsoft’s security leadership acknowledged AI tools are driving the surge. HackerOne paused its open-source bug bounty earlier this year, citing the imbalance between AI-driven discovery and maintainer remediation capacity.

Why it matters

AI-accelerated discovery is pushing patch volume past the absorption capacity of most vulnerability management programs
Traditional 30-day or 60-day patching SLAs were never designed for monthly batches of critical RCEs
Open-source maintainer burnout is a systemic security risk as AI finds faster than humans fix

What to do about it

Move from time-based patching SLAs to risk-based ones tied to exploit probability and asset criticality
Invest in network segmentation and identity isolation to limit blast radius when patching slips
Track mean-time-to-patch for critical vulnerabilities monthly and report the trend to your audit committee

Rock’s Musings

Vulnerability management has been broken for a decade. We pretended monthly patch cycles were sustainable when they were already breaking. AI made the math impossible to ignore. The honest answer is you will never patch fast enough. The strategy has to shift to “assume compromise, limit blast radius, recover faster than the attacker can adapt.” I’ve been saying that for three years to compliance team eye-rolls. This week’s data ends that argument.

7. Cisco Open-Sources Foundry Security Spec for Agentic Security Evaluation

Cisco released the Foundry Security Spec as open source on May 12, 2026, defining eight core agent roles, five extensions, around 130 functional requirements, and 11 inviolable principles for agentic security evaluation systems (Techzine, SMBtech). It’s model-agnostic and works with Mythos and GPT-5.5-Cyber via GitHub’s spec-kit. The goal is moving AI security from prompt demos to auditable production systems, paired with Project CodeGuard for prevention.

Why it matters

Open-source specs for AI security agents create a path to vendor-neutral compliance and audit
The eight-role decomposition gives security teams shared vocabulary instead of vendor terminology
Cisco open-sourcing the framework is a credible play to set the de facto standard before regulators do

What to do about it

Pilot Foundry Security Spec against a non-critical workflow to gauge operational lift
Map existing AI security tooling against the eight core roles to find gaps in orchestration and validation
Engage on the GitHub repository if you have the maturity to contribute, because early committers shape standards

Rock’s Musings

This is the kind of plumbing announcement that gets ignored in favor of flashier news, and it shouldn’t. Architectural standards win or lose markets. The OWASP Top 10 didn’t change vulnerability classes, it changed how teams talked about them. Foundry Security Spec is aiming for the same effect on agentic security. The tell will be whether AWS and Azure converge on it or fork it. Convergence skips a decade of fragmentation. A fork drops us back into vendor lock-in.

8. EU Commission Publishes Second Draft Code of Practice on AI Content Marking

The European Commission published the second draft of the Code of Practice on Marking and Labeling of AI-Generated Content on May 8, 2026 (European Commission). The revised text introduces a two-layered marking approach that combines secure metadata with watermarking, optional fingerprinting, logging protocols, and detection-and-verification procedures. Skadden’s analysis confirmed that compliance is required as of December 2, 2026, for generative AI systems already on the EU market, accelerated relative to earlier proposals (Skadden).

Why it matters

The revised two-layered watermarking approach is the most concrete EU technical specification published to date
Generative AI providers have six months to build compliant marking against a still-evolving technical standard
Fines remain at €15 million or 3% of global turnover for Article 50 violations

What to do about it

Confirm AI vendors have a credible two-layer watermarking roadmap targeting December 2, 2026
Build C2PA-compatible metadata and watermarking prototypes against the draft code now
Track the optional fingerprinting and logging requirements for downstream traceability

Rock’s Musings

The second draft Code is the most concrete watermarking specification anyone has published, and it’s still incomplete. Six months to build secured metadata, watermarking, fingerprinting, and detection tooling against an evolving standard is engineering fiction. Expect generative AI vendors to claim adherence via voluntary code participation while the technical build drifts. The CISOs who already started C2PA work in 2025 are sitting pretty. The ones who treated watermarking as a marketing problem will discover December 2 isn’t negotiable.

9. India Demands Sovereign Control Over Frontier AI Cybersecurity Models

India’s government met with Anthropic’s India team in early May 2026 to discuss hosting requirements for Claude Mythos, with reporting confirmed on May 12, 2026 (Medianama). Finance Ministry, MeitY, and CERT-In officials argued that AI in banking, telecom, and critical infrastructure must be hosted in Indian territory or a government-approved sovereign cloud. Finance Minister Nirmala Sitharaman called Mythos’s capabilities an “unprecedented” threat.

Why it matters

Sovereign hosting is becoming a procurement gate for frontier AI access in major non-Western markets
Indian banking and critical infrastructure deployments of US-hosted AI face new jurisdictional risks
The pattern will spread to Brazil, Indonesia, and the Gulf states

What to do about it

Validate AI hosting jurisdiction with your legal team if you operate in India’s regulated industries
Build a vendor diversification strategy that accommodates regional sovereignty without forcing rewrites
Engage sovereign cloud providers earlier in architecture, not as a post-deployment retrofit

Rock’s Musings

The geopolitical fragmentation of AI access is happening in real time. Western vendors still pretend it’s manageable through commercial agreements. India is signaling clearly that strategic AI must operate under Indian jurisdiction or not at all. Other countries will copy. The companies figuring out sovereign deployment architectures first win the next decade of international AI revenue. Those treating this as a temporary hurdle will watch growth markets quietly close.

10. CISA Adds LiteLLM SQL Injection to KEV as Active Exploitation Confirmed

CISA added CVE-2026-42208 to its Known Exploited Vulnerabilities catalog on May 8, 2026, for a pre-auth SQL injection in BerriAI’s LiteLLM proxy that allows attackers to access the database storing API keys for OpenAI, Anthropic, AWS Bedrock, Google Gemini, and other providers (Windows Forum, CCB Belgium). Affecting LiteLLM 1.81.16 through 1.83.6, the flaw was exploited within 36 hours of disclosure (Sysdig). Federal agencies had until May 11 to patch under BOD 22-01.

Why it matters

AI gateways consolidate provider API keys with five-figure spend caps in one database
A database extraction at an AI proxy is closer to cloud-account compromise than a traditional SQL injection
Most LiteLLM deployments were stood up by application teams outside security review

What to do about it

Inventory every AI proxy and gateway, including shadow deployments
Patch LiteLLM to v1.83.10-stable or later, and review Postgres query history for probing
Rotate every provider API key managed by an affected instance as a credential compromise response

Rock’s Musings

This is the canary I’ve been warning about. AI gateways became the pattern of choice because they make access to multi-provider models manageable, and they did so without a serious security review. The bug isn’t exotic, it’s a 2003-vintage SQL injection. The blast radius is exotic because of what these gateways guard. Federal agencies had three days to patch. Most enterprises will take three weeks and feel proud of moving fast.

11. The One Thing You Won’t Hear About But You Need To: Vector Embedding Pipelines Are the Next Enterprise AI Blind Spot

While the industry focused on vendor launches this week, the quieter story is that the AI data plane is wide open. Help Net Security published research on May 13, 2026, confirming that vector-embedding pipelines used for retrieval-augmented generation expose enterprise AI to attacks that traditional security tools cannot detect (Help Net Security). DLP tools can’t read or interpret embeddings, creating a blind spot for sensitive content shipped to embedding services. Spring AI bugs disclosed in late April included SQL injection in CosmosDBVectorStore, confirming vector store backends inherit traditional database vulnerability classes without the same control maturity.

Why it matters

53% of enterprises now use RAG and agentic pipelines, so vector database flaws affect most enterprise AI deployments
Sensitive content gets converted to embeddings and shipped to third-party services where DLP cannot inspect in transit
Multi-tenant vector databases create cross-tenant exposure paths that mirror early cloud storage failures of 2015

What to do about it

Inventory every vector database, including SaaS embedding services you didn’t approve
Apply integrity checks and access controls to vector stores at the same maturity as primary databases
Run hybrid retrieval combining dense vectors with BM25 lexical search to limit poisoned embedding impact

Rock’s Musings

Vector stores look boring. They’re glorified key-value databases that happen to hold numerical arrays. Those arrays encode every confidential document your knowledge base ingests, and your security stack treats them as opaque blobs. AI security isn’t a model problem, it’s a data plane problem. The first major enterprise AI breach in the next twelve months will trace back to a vector store nobody inventoried, an embedding service nobody reviewed, or an agent nobody scoped. The defenders who win are the ones treating their AI pipeline like their CI/CD pipeline. Visit rockcybermusings.com for deeper coverage and rockcyber.com for advisory work on governance programs that survive contact with production AI.

For more on agentic AI risk and CISO governance, see RockCyber and analysis at RockCyber Musings.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with CISO Tradecraft® where we talked about the OWASP GenAI Security Project Agentic Top 10

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

References

Aembit. (2026). MCP security vulnerabilities: Complete guide for 2026. https://aembit.io/blog/the-ultimate-guide-to-mcp-security-vulnerabilities/

Air Street Press. (2026, May). State of AI: May 2026. https://press.airstreet.com/p/state-of-ai-may-2026

Associated Press. (2026, May 11). OpenAI is sued over ChatGPT’s alleged role helping plan a mass shooting. AP News. https://apnews.com/article/openai-chatgpt-lawsuit-mass-shooting-florida-1a8071ee49ad0220348d3eb55f60e648

Bishop, T. (2026, May 13). Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark. GeekWire. https://www.geekwire.com/2026/microsofts-multi-agent-ai-system-tops-anthropics-mythos-on-cybersecurity-benchmark/

Centre for Cybersecurity Belgium. (2026, May 13). Warning: LiteLLM pre-auth SQL injection (CVE-2026-42208), patch immediately! https://ccb.belgium.be/advisories/warning-litellm-pre-auth-sql-injection-cve-2026-42208-patch-immediately

Cybersecurity Dive. (2026, May 11). OpenAI launches Daybreak to combat cyber threats. https://www.cybersecuritydive.com/news/OpenAI-Daybreak-cyber-threats/820122/

Cygnus. (2026, May 11). Google reports first AI-generated zero-day exploit in cybersecurity milestone. Domain-b. https://www.domain-b.com/technology/artificial-intelligence/google-ai-zero-day-exploit-cybersecurity-2026

DataGuidance. (2026, May 8). EU: Commission opens consultation on draft AI Act transparency guidelines under Article 50. https://www.dataguidance.com/news/eu-commission-opens-consultation-draft-ai-act

European Commission. (2026, May 8). Commission opens consultation on draft guidelines for AI transparency obligations. https://digital-strategy.ec.europa.eu/en/news/commission-opens-consultation-draft-guidelines-ai-transparency-obligations

Forbes. (2026, May 12). OpenAI Daybreak takes on Mythos to redefine security. https://www.forbes.com/sites/timkeary/2026/05/12/openai-daybreak-goes-head-to-head-with-anthropic-to-redefine-security/

French, L. (2026, May 13). OpenAI Daybreak joins growing movement of AI-driven vulnerability discovery. SC World. https://www.scworld.com/news/openai-daybreak-joins-growing-movement-of-ai-driven-vulnerability-discovery

Help Net Security. (2026, May 13). Microsoft’s agentic security system found four critical Windows RCE flaws. https://www.helpnetsecurity.com/2026/05/13/microsoft-mdash-agentic-ai-security-system/

Kim, T. (2026, May 12). Defense at AI speed: Microsoft’s new multi-model agentic security system tops leading industry benchmark. Microsoft Security Blog. https://www.microsoft.com/en-us/security/blog/2026/05/12/defense-at-ai-speed-microsofts-new-multi-model-agentic-security-system-tops-leading-industry-benchmark/

Lakshmanan, R. (2026, May 12). OpenAI launches Daybreak for AI-powered vulnerability detection and patch validation. The Hacker News. https://thehackernews.com/2026/05/openai-launches-daybreak-for-ai-powered.html

Lakshmanan, R. (2026, May 13). Microsoft’s MDASH AI system finds 16 Windows flaws fixed in Patch Tuesday. The Hacker News. https://thehackernews.com/2026/05/microsofts-mdash-ai-system-finds-16.html

European Commission. (2026, May 8). Commission publishes second draft of Code of Practice on Marking and Labelling of AI-generated content. https://digital-strategy.ec.europa.eu/en/library/commission-publishes-second-draft-code-practice-marking-and-labelling-ai-generated-content

Inside Global Tech. (2026, May 12). 10 takeaways: European Commission draft guidelines on AI transparency under the EU AI Act. https://www.insideglobaltech.com/2026/05/12/10-takeaways-european-commission-draft-guidelines-on-ai-transparency-under-the-eu-ai-act/

Skadden. (2026, May). AI Act state of play – Key obligations postponed and amended. https://www.skadden.com/insights/publications/2026/05/ai-act-state-of-play

Medianama. (2026, May 12). India pushes for sovereign control over AI cybersecurity systems: Report. https://www.medianama.com/2026/05/223-india-pushes-sovereign-control-ai-cybersecurity-systems-report/

O’Brien, M. (2026, May 11). ‘It’s here’: Google issues dire warning after catching hackers using AI to break into computers. Fortune. https://fortune.com/2026/05/11/google-catches-hackers-cybersecurity-warning-ai-anthropic-mythos/

Open Source For You. (2026, May 12). Cisco launches open-source Foundry Security Spec to tackle AI-driven cyber threats. https://www.opensourceforu.com/2026/05/cisco-launches-open-source-foundry-security-spec-to-tackle-ai-driven-cyber-threats/

Repello. (2026, May 2). Vector embedding security: Why static audits miss the real attacks. https://repello.ai/blog/vector-embedding-security

Reuters. (2026, May 11). Family of Florida mass shooting victim sues OpenAI in US court. https://www.reuters.com/legal/government/family-florida-mass-shooting-victim-sues-openai-us-court-2026-05-11/

SMBtech. (2026, May 12). Cisco open-sources specification for building AI-powered security evaluation systems. https://smbtech.au/news/cisco-open-sources-specification-for-building-ai-powered-security-evaluation-systems/

Sysdig. (2026). CVE-2026-42208: Targeted SQL injection against LiteLLM’s authentication path discovered 36 hours following vulnerability disclosure. https://www.sysdig.com/blog/cve-2026-42208-targeted-sql-injection-against-litellms-authentication-path-discovered-36-hours-following-vulnerability-disclosure

Taylor Wessing. (2026, May). The EU Digital Omnibus on AI – What the political deal means. https://www.taylorwessing.com/en/insights-and-events/insights/2026/05/the-eu-digital-omnibus-on-ai-what-the-political-deal-means

Techzine. (2026, May 12). Cisco open-sources Foundry Security Spec for CISO-ready agents. https://www.techzine.eu/news/security/141257/cisco-open-sources-foundry-security-spec-for-ciso-ready-agents/

The Record. (2026, May 13). Microsoft on pace to break annual vulnerability record as AI-driven patch wave takes hold. https://therecord.media/microsoft-on-pace-to-break-annual-vulnerability-record-ai

US News & World Report. (2026, May 11). Lawsuit blames ChatGPT maker OpenAI for bot helping plan a mass shooting. https://www.usnews.com/news/best-states/california/articles/2026-05-11/lawsuit-blames-chatgpt-maker-openai-for-bot-helping-plan-a-mass-shooting

Windows Forum. (2026, May 8). CISA adds LiteLLM SQL injection CVE-2026-42208 to KEV—AI proxies are high-value. https://windowsforum.com/threads/cisa-adds-litellm-sql-injection-cve-2026-42208-to-kev-ai-proxies-are-high-value.417219/

Five Eyes Agentic AI Guidance: Architecture, Not a Checklist

Rock Lambros — Tue, 12 May 2026 12:50:53 GMT

On May 1, 2026, six allied cyber agencies dropped 30 pages on agentic AI security, and the industry promptly reached for its highlighters. Twenty-three risks and more than a hundred best practices. The initial reflex is to map them to existing controls and call it a project plan.

WRONG!

CISA, NSA, ASD, NCSC-UK, NCSC-NZ, and the Cyber Centre published an architecture brief disguised as a guidance document. Read it that way, and the work changes.

The Misreading That’s Happening

Pick any board deck circulating right now, and I’ll bet the Five Eyes guidance shows up as a row in a control matrix (if at all). Privilege controls: check. Identity management: check. Logging: check. Someone in the room nods, the GRC team gets a tracking spreadsheet, and the agentic AI rollout continues at the same pace as before May 1.

That’s the failure mode. The document contains 23 distinct risks and over 100 individual best practices to address them. You don’t bolt 100 practices onto an existing platform without changing its shape...its architecture. Treating a system-level prescription as line-item compliance is how you end up with the audit-passes-but-the-thing-is-still-broken” pattern that plagues us to this day.

Read the document carefully, and the architectural intent is everywhere. Identity binds to privilege. Privilege binds to tool access. Tool access binds to logging. Logging binds to accountability. Each control assumes the others exist. Each one fails when built alone. The agencies named this directly when they recommended system-theoretic approaches like STPA and STPA-Sec, calling out that traditional component-level analysis is insufficient because risks emerge from interactions between components rather than isolated flaws.

That single paragraph is the operational thesis. The rest of the document describes how to build for it. A senior security practitioner, reading carefully, will recognize a familiar pattern, and this is what happens when policy folks finally accept you don’t write a check-box for emergent risk.

The question now is what production systems look like when somebody actually does the work. AAGATE is one answer, and we released it last November.

What the Document Actually Says

Strip the fluff, and the document organizes around five risk categories:

Privilege risk
Design and configuration flaws
Behavioral risk
Structural risk
Accountability risk

The categories aren’t mutually exclusive. They’re stacked dependencies.

Privilege risk is the foundation. The procurement-agent scenario in the guidance is a classic confused-deputy attack. An over-permissioned agent gets compromised through a low-risk tool, the attacker inherits the agent’s privileges, and modified contracts and approved payments slip past audit logs that look legitimate.

Design and configuration risk sits atop privilege. Static permission checks at startup don’t survive dynamic workflows. Allow lists go stale. Boundaries between agent enclaves erode under operational pressure. Behavioral risk piles onto that. Goal misalignment, specification gaming, deceptive behavior, and emergent capabilities all assume the agent has already been granted enough autonomy to act in surprising ways.

Structural risk is where it gets interesting. The agencies describe cascading failures across orchestration layers, tool integrations, third-party components, agent-to-agent communication, and shared data stores. A single rogue agent in a multi-agent system corrupts consensus, spreads incorrect information, alters logs, and propagates malicious plans peer-to-peer. None of this is fixable at the agent level alone.

Accountability risk closes the loop. Decisions made through long reasoning chains, stochastic outputs, and emergent multi-agent interactions are difficult to audit, attribute, or reproduce. The agencies reach for cryptographic identity, comprehensive artifact logging, and unified audit logs across inter-agent interactions. They’re describing a system property, not a feature you purchase.

AAGATE Maps the Architecture to NIST AI RMF

Figure 1: Five Eyes risk categories mapped to NIST AI RMF and AAGATE modules

AAGATE is a Kubernetes-native control plane built to operationalize the NIST AI Risk Management Framework against agentic AI systems. The paper, which I co-authored with Ken Huang, Hammad Atta, and a research team, was published to arXiv in late 2025. It picks NIST AI RMF as the spine because the RMF’s four functions, Govern, Map, Measure, and Manage, are general enough to absorb the Five Eyes prescriptions without forcing translation. The novelty isn’t the alignment to RMF. The novelty is the prescriptive toolchain: MAESTRO for Map, OWASP AIVSS plus SEI SSVC for Measure, the CSA Agentic AI Red Teaming Guide for Manage, and a zero-trust service mesh anchoring Govern.

What follows is the mapping of the Five Eyes document points at without naming. Five control areas. Each one shows what the architecture looks like when you stop treating the guidance as a checklist.

1. Identity-Anchored Privilege (Govern + Map)

The Five Eyes document spends real ink on this. It tells developers to construct each agent as a distinct principal with its own cryptographically anchored identity and unique keys or certificates, to authenticate every inter-agent and agent-to-service API call with mutual TLS, and to maintain a trusted registry that’s reconciled against the live set of agents. It tells operators to use just-in-time credentials, cryptographic attestation, and a centralized policy decision point that runs at every request.

Those aren’t five different controls. They’re one architecture.

AAGATE’s Agent Naming Service builds it. ANS works like DNS for agents. When a new agent starts, it registers its Decentralized Identifier and capabilities, and the service issues a Verifiable Credential along with an Istio SPIFFE certificate that binds the pod’s identity to its cryptographic DID. Other agents resolve through the registry. Anything not in the registry gets denied. Istio mTLS authenticates every pod-to-pod call with X.509 certificates. The OAuth Relay translates abstract agent capabilities into ephemeral, narrowly-scoped credentials for each side-effect, which is the only practical way to do least-privilege when traditional user-centric consent models break down.

Try doing any one of those pieces without the others and the system collapses. A registry without mTLS is unauthenticated. mTLS without ephemeral credentials still leaks long-lived tokens. Ephemeral credentials without a registry have no verification path at issuance. The Five Eyes guidance lists these as separate best practices. AAGATE shows why they’re one control.

This is also why CISOs aren’t the only audience for this work. Identity engineers, IAM architects, platform teams, and product leaders need to read it. The org chart that ships agentic AI safely is wider than the security team’s mailing list.

2. The Single Chokepoint for Side-Effects (Map)

Figure 2: ool-Gateway as the single chokepoint pattern

The guidance recommends restricting tool use to an approved allow list of tools and versions, logging agent tool usage so results are captured in system logs in human-readable format, and establishing trigger-action protocols that automatically restrict agent permissions when unexpected behavior emerges. It tells you to harmonize the controls with the NIST Zero Trust Architecture.

The architectural pattern that satisfies all three is one many readers will resist: a single chokepoint. Every external HTTP, database, or file operation funnels through one gateway. The mesh denies egress everywhere else. Bypass attempts get quarantined.

AAGATE’s Tool-Gateway is exactly that. Agents publish requests to a Kafka topic. The Tool-Gateway consumes them, checks policies (allow lists, rate limits, scope), executes the action if permitted, and logs the request and response with a cryptographic hash to an immutable audit log. The Governing-Orchestrator Agent watches for bypass attempts and quarantines the offender on detection. OAuth2 token exchange happens at the Gateway, with refresh tokens never leaving its memory vault.

A single chokepoint smells like a bottleneck to anyone raised on horizontally scaled architectures. That intuition is wrong here. The Tool-Gateway is the place where agentic AI security becomes tractable. Threat modeling becomes tractable because you have one comprehensive map of every system interaction. Audit becomes tractable because logs are unified. Policy enforcement becomes tractable because the decision point is centralized. Eliminate the chokepoint and you’ve spread the same controls across hundreds of agent-tool integrations, none of which will be enforced consistently. Distributed systems engineers will object. Reality wins.

3. Continuous Measurement (Measure)

The Five Eyes document repeats itself when it talks about monitoring. Use multiple independent monitoring systems that cross-validate. Monitor agent operations, including internal processes, not the inputs and outputs alone. Watch for goal drift by comparing active objectives against approved baseline specifications. Establish anomaly detection that flags discrepancies between stated intentions and observed behavior. Implement runtime monitoring with rules or behavioral baselines.

Reading those passages, I count at least six distinct signal types the document expects you to collect, score, and respond to in real time. Treating them as line items is how you end up with six dashboards nobody watches.

AAGATE compresses them into one pipeline. The UEBA Behavior Profiler builds per-agent behavioral fingerprints with anomaly scores, using Isolation Forest and Markov chains. The ComplianceAgent streams events from the Tool-Gateway, applies Rego policy checks plus LLM-driven reasoning, and generates security signals scored using OWASP’s AI Vulnerability Scoring System. The Governing-Orchestrator Agent consumes the scored incidents through an SSVC-inspired decision tree to choose a proportionate response. QSAF monitors run alongside, watching for cognitive degradation: recursion loops, memory starvation, context flooding, the kinds of failure modes invisible to conventional anomaly detection.

The pattern is Observe, Score with AIVSS, Decide with SSVC, Act, then Learn. None of those steps are optional. Skip the scoring, and you have no priority order. Skip the decision tree, and you have no machine-speed response. Skip the cognitive monitors, and you miss a class of failures the document explicitly names.

Most enterprises run anomaly detection bolted onto SIEM. That doesn’t work for agents. The signals are different, the response cadence is different, and the decision logic has to be co-located with the policy decision point, not 20 minutes downstream.

4. Adversarial Defense and the Kill Switch (Manage)

The guidance tells operators to conduct red teaming exercises, deploy agent simulation tests including multi-agent red teaming or chaos testing, embed agentic AI systems with fail-safe defaults and containment mechanisms that limit blast radius, and implement runtime monitoring and anomaly detection to identify unusual patterns and trigger alerts or pauses.

Periodic red teaming doesn’t keep up with agents acting at machine speed. Manual chaos exercises don’t catch a goal-drift event occurring in the 17 seconds between human review cycles. The guidance is asking for continuous adversarial pressure with millisecond response time. That’s a runtime architecture, not a quarterly engagement.

AAGATE answers it with the Janus Shadow-Monitor-Agent. Janus runs in a different container with a different model, re-evaluating every high-risk agent’s planned actions before execution. Goal manipulation, hallucination exploitation, and drift from baseline specifications get caught pre-execution rather than post-mortem. When Janus raises a CRITICAL alert, the Governing-Orchestrator Agent injects an Istio AuthorizationPolicy that cuts all egress. AAGATE calls it the millisecond kill switch because that’s the time window the system operates in.

The pairing matters. A continuous internal red team without a kill switch is as useless as a bucket with a gaping hole in the bottom. A kill switch without continuous red teaming has nothing to act on. Five Eyes named both controls separately. AAGATE shows why they’re the same control.

This is also where the OT crowd should pay attention. The guidance recommends defense-in-depth and continuous evaluation. In OT contexts, that translates directly to “you don’t roll back a physical actuator.” Containment has to happen before the action, not after.

5. Tamper-Evident Accountability (Govern)

The accountability section of the guidance is the hardest one. The agencies want comprehensive artifact logging, unified audit logs for inter-agent interactions, interpretability tools that surface reasoning, and information referencing that shows where outputs originated. They’re describing what the EU AI Act Article 12 calls automatic recording of events, plus what auditors call evidence of effective control operation. If and when the EU AI Act actually ever goes into effect is another conversation altogether…

Conventional logging breaks down here. Long reasoning chains generate massive logs that are repetitive and loosely structured. The Five Eyes document is blunt: traditional logs make it even more challenging to extract meaningful signals. Accountability fails not because the data isn’t recorded, but because nobody proves it wasn’t tampered with after the fact.

AAGATE’s answer combines three patterns. Cryptographic hashes on every Tool-Gateway request and response give you tamper-evidence at the unit level. The optional ETHOS ledger integration mirrors agent registrations and material governance events to a public smart contract, creating a tamper-proof record of agent identity and status. The ZK-Prover service hashes logs hourly and posts Groth16 zero-knowledge proofs on-chain, showing that incidents stayed within the contract-tier budget, giving you privacy-preserving compliance assurance without exposing operational data.

Argue with the on-chain pieces if you want. They’re optional in single-tenant deployments, and the AAGATE paper says so explicitly. The cryptographic hashing isn’t optional. If your accountability model doesn’t prove logs weren’t altered after the fact, you don’t have accountability. You have hope.

What This Means Going Forward

The Five Eyes document changes the burden of proof. Boards, regulators, and acquirers now have a coordinated multi-government statement naming architecture-level controls as the floor, not the ceiling. “Until security practices, evaluation methods and standards mature, organisations should assume that agentic AI systems may behave unexpectedly.” That sentence will undoubtedly show up in due diligence questionnaires.

If you’re operating agentic AI today, you have two choices.

Option one: take the line-item path, map controls to a tracking spreadsheet, and ship 100 separate workstreams that someone else’s auditor will pull apart in 18 months.
Option two: read the guidance as an architectural prescription, pick a reference build like AAGATE, and treat your agentic security work as a platform engineering problem rather than a compliance problem.

I know which one I’d present to a board.

Key Takeaway: The Five Eyes guidance describes a system property, not a checklist, and compliance follows from architecture rather than the other way around. AAGATE provides that reference architecture.

What to do next

If your agentic AI program is more than a pilot, audit it against the five risk categories now and look for the architectural gaps the line-item view will hide. The CARE framework I use for AI-augmented security programs lays out how to sequence Create, Adapt, Run, and Evolve work without burning out the platform team. For the technical reference, read the AAGATE paper on arXiv and treat it as a reference architecture rather than a finished product. If you want help mapping current state to the Five Eyes prescriptions and a NIST AI RMF aligned target architecture, RockCyber does this work with security and engineering leadership across critical infrastructure and financial services. For more posts like this, RockCyber Musings lands in your inbox roughly once a week.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with CISO Tradecraft®, where we talked about the OWASP GenAI Security Project Agentic Top 10

👉 Subscribe for more AI security and governance insights with the occasional rant.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 37 May 1-May 7, 2026

Rock Lambros — Fri, 08 May 2026 12:51:18 GMT

This was the week the supervisors stopped asking permission. Five Eyes intelligence agencies, the Pentagon, the Commerce Department, and ServiceNow all converged on the same conclusion at nearly the same time. Agentic AI is shipping without brakes, the brakes need to be added now, and nobody has a clean answer for who pays. Brussels blinked. Washington floated an FDA-style gate for frontier models. Researchers kept finding holes in the plumbing under every AI agent your developers are racing to deploy.

The pattern was governance catching up to deployment. Three governments and a $200 billion software company echoed what the security crowd has been saying since GPT-4 shipped. You bought the speedboat and forgot the kill switch. Below are the ten stories that mattered between Friday, May 1, and Thursday, May 7, 2026, plus one you missed.

1. Five Eyes Drop Joint Agentic AI Guidance

CISA, the NSA, Australia’s ASD ACSC, the Canadian Centre for Cyber Security, the UK’s NCSC, and New Zealand’s NCSC released “Careful Adoption of Agentic Artificial Intelligence (AI) Services” (CISA, 2026). The document identifies five risk categories: privilege; design and configuration; behavior, including goal misalignment and deception; structural risks across interconnected components; and accountability risks rooted in opacity. The Register summarized the message bluntly. Agentic AI is too dangerous for rapid rollout (Brandon, 2026).

Why it matters

Five intelligence agencies aligning sets a baseline for procurement, audit, and insurance underwriting across the English-speaking world.
The guide pressures vendors selling fully autonomous agents by recommending incremental deployment and human oversight.
Critical infrastructure operators gain a defensible reference document when business units demand agent rollouts in days.

What to do about it

Map every deployed agent against the five risk categories and grade each honestly.
Require attestation against this guide in procurement language for agentic capabilities.
Brief your board this quarter on how the guidance changes your residual risk posture.

Rock’s Musings

Five Eyes guidance is rare enough to mean something. When agencies that attribute nation-state intrusions speak with one voice, treat it as a soft mandate. The privilege risks section reads like a list of incidents I have seen at clients in the last twelve months. Stop deploying autonomy on top of access models you built for humans.

2. EU Strikes Provisional Deal to Delay Core AI Act Obligations

On May 7, 2026, after roughly nine hours of negotiation, the Council of the EU and the European Parliament reached provisional agreement on the Digital Omnibus on AI (Lewis Silkin, 2026). High-risk obligations under Annex III now apply from December 2, 2027. Annex I obligations apply from August 2, 2028. The transparency grace period for AI-generated content shrinks from six months to three, with a deadline of December 2, 2026 (Modulos, 2026).

Why it matters

The narrative that the EU is the world’s strictest AI regulator took a real hit, with industry pressure winning a delay measured in years.
Companies that scrambled for Annex III readiness by August 2026 spent their budget on a deadline that no longer exists.
The shortened transparency window makes deepfake labeling the most urgent compliance work of the year for consumer-facing AI.

What to do about it

Reset your AI Act program plan against the new deadlines and brief your audit committee on the freed-up budget.
Accelerate transparency labeling on generative output exposed to EU users by Q3 2026.
Watch the Council and Parliament endorsement votes because the deal can still shift.

Rock’s Musings

I told three clients in 2025 that betting on the original Annex III timeline was a coin flip. The coin landed on delay. The AI Act isn’t dead, but Brussels learned the lesson California learned with CCPA. With Brussels stretching its timeline, the White House gains room to argue that federal preemption beats a state patchwork. Bet on more state attorneys general filling the gap with UDAP actions before December.

3. Pentagon Clears Eight Vendors for AI on Classified Networks

The Department of War announced agreements with AWS, Google, Microsoft, NVIDIA, OpenAI, SpaceX, and Reflection AI, with Oracle added shortly after, to deploy AI tools on Impact Level 6 and Impact Level 7 networks (Breaking Defense, 2026). Those impact levels cover secret-classified and the most highly classified Defense systems. Anthropic was conspicuously absent, despite Claude already running inside Palantir’s Maven Smart System on classified networks (TechCrunch, 2026).

Why it matters

Defense AI procurement consolidated around eight vendors, with Anthropic frozen out despite a working production deployment.
IL-7 deployments mean general-purpose models will reason over the most sensitive U.S. government data, with limited public visibility into evaluation rigor.
Defense contractors and integrators have a vendor shortlist that will shape program decisions for the next five years.

What to do about it

If you sell into DoD, align your AI roadmap with these eight vendors.
If you advise federal agencies, push for transparency on red-team results before production at IL-6 and IL-7.
Expect this vendor list in prime contractor solicitations within a quarter.

Rock’s Musings

Commercial AI is now inseparable from national security infrastructure. Eight vendors. Two impact levels. Decisions that will shape how the U.S. military thinks, plans, and fights for a decade. Where are the public test results? When the FDA approves a drug, you can read the trial data. When the Pentagon approves a model for IL-7, you cannot. That asymmetry will eventually break.

4. CAISI Locks Pre-Deployment Testing Deals With Google, Microsoft, and xAI

The Center for AI Standards and Innovation announced agreements on May 5, 2026 that allow the U.S. government to evaluate frontier AI models from Google, Microsoft, and xAI before public release (CNBC, 2026). The deals expand a program that already included OpenAI and Anthropic, with the older agreements renegotiated to align with America’s AI Action Plan (Al Jazeera, 2026). The arrangements remain voluntary.

Why it matters

Five frontier labs now run pre-deployment evaluations through one federal channel, creating a de facto standard for “tested” at the top of the AI supply chain.
Voluntary agreements give the government influence without legislation.
Smaller and open-source providers face an emerging market expectation they can’t match.

What to do about it

Add CAISI evaluation status to vendor risk questionnaires for frontier model dependencies.
Track CAISI’s published evaluation criteria, since they will shape your internal evaluation programs.
Treat models without CAISI evaluation as higher inherent risk in supply chain assessments.

Rock’s Musings

Voluntary regulation by reputational pressure is the Trump administration’s preferred AI playbook. The upside is speed. The downside is that voluntary agreements dissolve when a CEO decides the political winds have shifted. If CAISI becomes the gravitational center for AI evaluation, insurers and enterprise buyers will start citing it in contracts. That is how soft governance becomes hard governance.

5. ServiceNow Adds AI Agent Kill Switches as the 9-Second Story Goes Mainstream

ServiceNow announced on May 5, 2026 at Knowledge 2026 that it has expanded AI Control Tower with real-time pause, redirect, and stop capabilities for any AI agent across the enterprise estate (ServiceNow, 2026). The expansion adds 30 new connectors spanning AWS, Google Cloud, Microsoft Azure, SAP, Oracle, and Workday. CEO Bill McDermott told Fortune the marketing message in plain English, citing a real incident where an AI agent gained elevated permissions and deleted a production database with all backups in nine seconds (Fortune, 2026).

Why it matters

Selling kill switches as a primary feature validates the security community’s argument that agentic AI requires runtime governance.
The 30-connector expansion makes ServiceNow the de facto governance layer above other clouds and SaaS apps.
The 9-second story shifts the default purchasing posture toward “show me the brakes.”

What to do about it

Inventory every AI agent with write access to production systems and document its maximum blast radius in seconds.
Require a documented kill switch capability as a procurement gate for any agentic AI vendor.
Run a tabletop exercise this quarter where an autonomous agent acts destructively at machine speed.

Rock’s Musings

I have been waiting for a vendor to put “kill switch” on the price list. ServiceNow finally did it. The 9-second story is not hypothetical. Every CISO I know has heard a similar war story from a peer in the last year. A kill switch is only as good as its blast-radius coverage and detection latency. If your agent can do irreversible damage in seconds and your governance layer needs minutes, the kill switch is theater. Test the latency before signing.

6. White House Floats FDA-Style Gate for Frontier AI

National Economic Council Director Kevin Hassett told Bloomberg on May 6, 2026 that the White House is studying an executive order to create a vetting system for new AI models like Anthropic’s Mythos, comparing the approach to FDA drug evaluation (Bloomberg, 2026). The directive comes weeks after Anthropic disclosed that Mythos is unusually capable at finding network vulnerabilities, prompting the company to limit access through Project Glasswing (Insurance Journal, 2026).

Why it matters

An FDA-style gate would mark the first concrete pre-market regulatory framework for frontier AI in the U.S., even by executive order.
The Mythos disclosure shifts the political center of gravity, with a frontier lab effectively asking for more regulation.
Framing AI as public safety reshapes which agencies and committees own the issue.

What to do about it

Track which federal agency the order designates as the gating body, since that agency’s authorities will determine how real the regime becomes.
Prepare your own internal “model approval” process now, modeled on how you approve cryptographic libraries.
Engage with industry comment processes early, before draft text leaks and positions harden.

Rock’s Musings

The FDA analogy is compelling and imperfect. Drugs have measurable endpoints. AI capability evaluations are partly subjective and dependent on who designed the test. The reason I take this seriously is the political logic. An administration that has emphasized deregulation is signaling it might gate frontier AI at the federal level. If the national security argument has won inside the West Wing, the rest of the Western world will follow within twelve months.

7. One in Four MCP Servers Carries Code Execution Risk

Help Net Security reported on May 5, 2026, that one in four Model Context Protocol servers exposes AI agents to code execution risk through skill-handling and configuration blind spots (Help Net Security, 2026b). The research builds on an OX Security disclosure from April 2026 that covered an architectural choice in Anthropic’s official MCP SDKs for Python, TypeScript, Java, and Rust, in which STDIO transport executes OS commands without sanitization (VentureBeat, 2026). Vulnerable MCP integrations affect Cursor, VS Code, Windsurf, Claude Code, and Gemini-CLI.

Why it matters

MCP is the connective tissue between AI agents and enterprise systems, with 150 million downloads and 7,000-plus public servers.
A 25% vulnerability rate across the supply chain means most enterprises running MCP-based agents are running known-vulnerable infrastructure now.
Anthropic’s stance that the behavior is “expected” leaves customers holding the remediation burden alone.

What to do about it

Inventory MCP servers, including developer workstations, and segment them from sensitive data and production credentials.
Force allowlisting on MCP tool calls, with explicit human approval for anything outside the allowlist.
Add MCP server compromise to your incident response runbooks.

Rock’s Musings

MCP is the USB-C of AI agents, and it is shipping with the equivalent of a hot socket. The architectural pattern is fine. The default behavior is dangerous. Treat MCP like browser extensions in a regulated environment. Default deny. Document exceptions. Audit quarterly.

8. Lenovo Survey Confirms One in Three Employees Use AI Without IT Oversight

Lenovo’s Work Reborn Research Series 2026, surveying 6,000 enterprise workers globally, was reported on May 1, 2026. Between one-fifth and one-third of employees use AI outside IT governance (Help Net Security, 2026a). Almost half of large enterprises in Protiviti’s AI Pulse Survey 2026 lack full visibility into which AI tools employees use. ISACA’s 2026 AI Pulse Poll found 38% of organizations report a formal AI policy, up from 28% the prior year.

Why it matters

Shadow AI is the dominant AI risk category for most enterprises.
The gap between employee AI adoption and IT governance is widening faster than policy alone can close it.
Generative AI accounts for roughly a third of unauthorized data movement in measured environments.

What to do about it

Deploy DLP controls that recognize generative AI as a defined egress channel, not an undifferentiated browser session.
Offer a sanctioned AI tool path that is genuinely useful, because banning AI without alternatives has not worked anywhere.
Track AI policy adoption as a KPI alongside traditional security awareness metrics.

Rock’s Musings

I have watched this story play out several times. Personal email in the 2000s. SaaS in the 2010s. Now AI. Ban the tool. Watch usage go underground. Find the breach. Reverse the ban two years too late. Short-circuit the cycle now. Your highest performers are the ones doing shadow AI work because the sanctioned tools are slower or dumber.

9. Researchers Scan One Million Exposed AI Services, Find Default Authentication Off

The Hacker News reported a large-scale scan of one million publicly exposed AI services. AI infrastructure is more vulnerable, exposed, and misconfigured than any other software category investigators have recently studied (The Hacker News, 2026). Many hosts run without authentication because it is not the default in many AI projects. Over 90 exposed instances were identified across government, marketing, and finance, with chatbots, prompts, workflows, and outward access all open to the public internet.

Why it matters

Default-open AI infrastructure puts attackers ahead of defenders on basic asset discovery.
Government, marketing, and finance exposure shows the problem is not confined to the unregulated long tail of startups.
LLM conversation history exposure leaks strategy, contracts, and personal data in ways traditional data leakage models miss.

What to do about it

Treat AI infrastructure like internet-facing crown jewels and harden it accordingly.
Run attack surface management scans tuned for AI service fingerprints, including n8n, Flowise, Langflow, and LiteLLM.
Make default-deny authentication non-negotiable for any AI workflow touching enterprise data.

Rock’s Musings

This is the cybersecurity equivalent of finding every front door wide open. The mistake is older than AI. Project maintainers and platform vendors should answer for shipping with authentication disabled by default. Default secure beats secure-by-checklist every time. Until AI projects ship safely, assume the defaults are wrong and configure your way out of them.

10. Trellix Discloses Source Code Repository Breach

Cybersecurity company Trellix disclosed on May 4, 2026 that it suffered unauthorized access to a portion of its source code repository (BleepingComputer, 2026). Trellix protects more than 50,000 customers and over 200 million endpoints. The company says it has found no evidence the source code release process was affected or that the code has been exploited (SecurityWeek, 2026). Trellix has not named the actor or disclosed dwell time.

Why it matters

A defensive software vendor losing source code ripples through every customer.
The breach feeds AI-augmented vulnerability discovery against Trellix products, given how attackers now use LLMs to mine source for exploits.
Federal customers will require new attestations on code provenance and pipeline integrity within weeks.

What to do about it

Trellix customers should demand a full incident report covering IOCs, scope of stolen code, and pipeline changes.
Audit detection coverage for TTPs that exploit knowledge of the affected products.
Treat defensive software vendors as potential single points of failure in your supply chain risk register.

Rock’s Musings

Defensive vendors getting popped is a now-quarterly story. The interesting wrinkle is what an attacker does with stolen source code in the AI era. Two years ago, source theft was slow-burn. Today, an attacker can feed thousands of files into an LLM and ask for likely vulnerability classes in hours. Trellix saying the code has not been exploited is a snapshot, not a guarantee.

The One Thing You Won’t Hear About But You Need To: ARGUS and the Quiet Admission That Today’s Agent Defenses Don’t Hold

Researchers published the ARGUS paper to arXiv on May 5, 2026. It introduces a benchmark, AgentLure, that captures context-aware prompt-injection attacks across four agentic domains and eight attack vectors, along with a defense mechanism that enforces provenance-aware decision auditing for LLM agents (ARGUS, 2026). ARGUS reduces attack success rate to 3.8% while preserving 87.5% task utility. Without provenance-aware controls, undefended agents fail at much higher rates.

Why it matters

Provenance tracking inside agent reasoning is a real shift from perimeter-style defenses most vendors sell today.
Context-aware prompt injection is the dominant unaddressed risk in production agentic deployments.
Benchmarks like AgentLure will become reference points enterprise red teams use, much as MITRE ATT&CK reshaped traditional red teaming.

What to do about it

Read the ARGUS paper and use its threat model to evaluate your current agent deployments.
Push vendors to publish performance against context-aware benchmarks, not only static jailbreak datasets.
Build provenance tracking into your internal agent platforms, even if commercial vendors do not yet support it.

Rock’s Musings

The reason this matters is what it implies about everything else. If 3.8% is the new state of the art with strong defenses in place, the rate without those defenses is much higher. That is the gap most production agents sit in today. Vendor marketing on agent safety has been measured against weak benchmarks for two years. Get ahead of the curve, or be the case study in someone else’s incident report.

For more on agentic AI risk and CISO governance, see the library at RockCyber and analysis at RockCyber Musings.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with CISO Tradecraft® where we talked about the OWASP GenAI Security Project Agentic Top 10

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

ARGUS. (2026, May 5). ARGUS: Defending LLM agents against context-aware prompt injection. arXiv. https://arxiv.org/abs/2605.03378

BleepingComputer. (2026, May 4). Trellix discloses data breach after source code repository hack. https://www.bleepingcomputer.com/news/security/trellix-discloses-data-breach-after-source-code-repository-hack/

Bloomberg. (2026, May 6). AI security order under review as White House responds to Anthropic’s Mythos. https://www.bloomberg.com/news/articles/2026-05-06/white-house-preps-order-to-boost-ai-security-hassett-says

Brandon, R. (2026, May 4). Five Eyes warn agentic AI is too dangerous for rapid rollout. The Register. https://www.theregister.com/2026/05/04/five_eyes_agentic_ai_recommendations/

Breaking Defense. (2026, May 1). Pentagon clears 8 tech firms to deploy their AI on its classified networks. https://breakingdefense.com/2026/05/pentagon-clears-7-tech-firms-to-deploy-their-ai-on-its-classified-networks/

CISA. (2026, May 1). Careful adoption of agentic AI services. Cybersecurity and Infrastructure Security Agency. https://www.cisa.gov/resources-tools/resources/careful-adoption-agentic-ai-services

CNBC. (2026, May 5). Trump admin moves further into AI oversight, will test Google, Microsoft and xAI models. https://www.cnbc.com/2026/05/05/ai-oversight-trump-google-microsoft-xai.html

Al Jazeera. (2026, May 5). Microsoft, Google, xAI give US access to AI models for security testing. https://www.aljazeera.com/economy/2026/5/5/microsoft-google-xai-give-us-access-to-ai-models-for-security-testing

Fortune. (2026, May 6). Your company’s AI could delete everything in 9 seconds. ServiceNow wants to be the kill switch. https://fortune.com/2026/05/06/servicenow-kill-switch-ai-agents-bill-mcdermott/

Help Net Security. (2026a, May 1). Shadow AI risks deepen as 31% of users get no employer training. https://www.helpnetsecurity.com/2026/05/01/shadow-ai-risks-it-oversight/

Help Net Security. (2026b, May 5). One in four MCP servers opens AI agent security to code execution risk. https://www.helpnetsecurity.com/2026/05/05/ai-agent-security-skills-blind-spots/

Insurance Journal. (2026, May 7). White House prepares order to boost AI security, says economic advisor. https://www.insurancejournal.com/news/national/2026/05/07/868812.htm

Lewis Silkin. (2026, May 7). The Council and Parliament agree to slim down and delay parts of the EU AI Act. https://www.lewissilkin.com/insights/2026/05/07/the-council-and-parliament-agree-to-slim-down-and-delay-parts-of-the-eu-ai-act-102ms0v

Modulos. (2026, May 7). EU AI Act delayed: The Omnibus deal closed on 7 May 2026. https://www.modulos.ai/blog/eu-ai-act-omnibus-deal/

SecurityWeek. (2026, May 4). Trellix source code repository breached. https://www.securityweek.com/trellix-source-code-repository-breached/

ServiceNow. (2026, May 5). ServiceNow expands AI Control Tower across systems. https://newsroom.servicenow.com/press-releases/details/2026/ServiceNow-expands-AI-Control-Tower-to-discover-observe-govern-secure-and-measure-AI-deployed-across-any-system-in-the-enterprise/default.aspx

TechCrunch. (2026, May 1). Pentagon inks deals with Nvidia, Microsoft, and AWS to deploy AI on classified networks. https://techcrunch.com/2026/05/01/pentagon-inks-deals-with-nvidia-microsoft-and-aws-to-deploy-ai-on-classified-networks/

The Hacker News. (2026, May). We scanned 1 million exposed AI services. Here’s how bad the security is. https://thehackernews.com/2026/05/we-scanned-1-million-exposed-ai.html

VentureBeat. (2026, April). 200,000 MCP servers expose a command execution flaw that Anthropic calls a feature. https://venturebeat.com/security/mcp-stdio-flaw-200000-ai-agent-servers-exposed-ox-security-audit

Open-Weight Models Eat Closed Governance: The Half-Perimeter Problem

Rock Lambros — Tue, 05 May 2026 12:50:59 GMT

Open-weight reasoning models are landing in enterprise production, and the closed-vendor governance you bought doesn’t transfer with them. “Half-perimeter” is rhetorical; the real number depends on which controls you bought, but the point holds. The day a competent open-weight reasoning model runs on your hardware, the AI-specific governance you bought from your closed vendor stops covering part of the stack. The rest of this post walks the gap and the build.

The Vendor’s Own Words

OpenAI shipped gpt-oss-120b and gpt-oss-20b last year. Both are under Apache 2.0, and both are downloadable from Hugging Face. The 120b runs on a single 80GB GPU. In the model card, OpenAI’s own safety team admits what every CISO should already suspect. Once the weights ship, OpenAI cannot “implement additional mitigations or to revoke access.”

It’s the model provider’s own framing. It’s not me opining. Open-weight is a different risk profile from closed-API, by the model provider’s own assessment. The vendor can’t patch your inference cluster. The vendor can’t revoke a key that doesn’t exist. The vendor can’t run server-side abuse classifiers on traffic the vendor never sees. Everything that lived on the vendor side of the perimeter now lives on yours.

This is not a DeepSeek-versus-American-models story. It’s a closed-API-versus-open-weight story. Llama 3.3 70B (Meta), Qwen 3 32B (Alibaba), Mistral Magistral, and gpt-oss-120b sit on the same side of the boundary. The boundary is wherever the weights stop being someone else’s problem.

What Closed-Vendor Governance Bought You

Walk through what was on the bill of materials when you stood up your closed-API AI program. Oh, that’s right, you never did… but let’s pretend you did. You probably evaluated vendor-attested compliance, usually wrapped in a SOC 2 Type II report and a data processing addendum. DLP is integrated at the API gateway, watching prompts in flight. Output filtering runs on the vendor side, refusing to ship CBRN-adjacent content out of the model. Prompt firewall logic is embedded in the vendor SDK and patched without you redeploying. Vendor red teaming is on a continuous cadence. ToS enforcement occurs when an account misbehaves.

That stack assumed one thing. That a vendor sat on the other end of the inference call. Open-weight self-hosting moves every one of those controls in-house, with no shared customer base to underwrite the cost.

What does transfer? Network egress controls, identity at the runtime boundary, sandbox isolation, and supply-chain provenance for the model weights and fine-tunes. Notice what those have in common. None of them are AI-specific. They were always there. They’re the controls you applied to every other service you ran. Losing the AI-specific layer doesn’t break the non-AI controls. It does mean the only thing standing between a self-hosted reasoning model and a bad day is the perimeter you built for everything else.

Read your closed-vendor MSA carefully. The reps and warranties typically carve out third-party model behavior, hallucinations, and adversarial misuse. The vendor warrants infrastructure availability and indemnifies IP claims. The vendor doesn’t warrant safe model output. The “governance” part of vendor-attested compliance was always thinner than the SOC 2 cover suggested. Self-hosting strips even the thin part.

Figure 1: Closed-API Stack vs Open-Weight Runtime: Where Controls Live

Refusal Training Is Now an In-House Problem

Vendor refusal training is the AI-specific control most enterprise teams over-trust. The research breaks the over-trust hard.

The Badllama 3 paper (arXiv 2407.01376) showed safety fine-tuning gets removed from Llama 3 8B in five minutes on a single A100 GPU for under fifty cents. The 70B model goes in 45 minutes for under three dollars. The same paper notes the attack runs on free Google Colab for the 8B variant. FAR.AI’s “Illusory Safety” research extended the result. Pre-fine-tune refusal rates near 100% across DeepSeek-R1, GPT-4o, Gemini 1.5 Pro, and Claude 3 Haiku dropped under 20% post-fine-tune. Harmfulness scores climbed past 80%.

The R1 red-team picture is even worse on the model itself, before any attacker fine-tuning. Cisco / Robust Intelligence reported a 100% attack success rate on 50 random HarmBench prompts against R1, while OpenAI o1 rejected every test in a parallel Holistic AI evaluation. Qualys TotalAI found R1’s distilled 8B variant failed 58% of 885 attempts across 18 jailbreak categories. Promptfoo put failures over 60% on prompts, including biological and chemical weapons. KELA jailbroke R1 to produce ransomware development steps and instructions for toxins and explosive devices.

OpenAI’s own approach to gpt-oss is the strongest signal that adversarial fine-tuning is the real threat model. The model card describes the adversarial fine-tuning of gpt-oss-120b under the Preparedness Framework prior to release. OpenAI’s Safety Advisory Group concluded the adversarially fine-tuned model didn’t reach “High” capability in Biological and Chemical Risk or Cyber risk. Read the implication closely. The model provider treats fine-tune-stripped safety as the baseline release condition the model must meet. The deployer running fine-tunes downstream gets no equivalent gate.

OpenAI knows this. It’s why gpt-oss-safeguard shipped on October 29, 2025: open-weight reasoning models for safety classification, designed for developers to operate as a defense-in-depth layer. Llama Guard 3, Prompt Guard, and Code Shield exist for the same reason. The vendor is shipping you the components. Components are not the same as a service. You operate them, tune them, monitor them, retrain them when the policy changes, and absorb the latency. OpenAI’s own gpt-oss-safeguard report names the constraint: reasoning-based classifiers add compute and latency that limit large-scale real-time use.

The math is brutal. The model weights are free. The runtime safety pipeline is not.

The Frameworks Describe the Gap. They Don’t Close It.

NIST AI RMF 1.0 plus the GenAI Profile (NIST AI 600-1, July 2024) plus the GPAI/Foundation Models Profile extension (arXiv 2506.23949) names training data audits (Manage 1.3, Measure 2.8) and model weight protection (Measure 2.7). Voluntary. The CSA NIST AI RMF Agentic Profile draft is candid about the bigger problem. It states plainly that earlier RMF documents did not contemplate “agents that acquire tool-use capabilities and execute autonomously in live production environments.”

OWASP Top 10 for LLM Applications 2025 LLM03 is the most explicit primary-source statement of the half-perimeter problem. The category description is direct: model cards offer no guarantees of provenance, malicious LoRA adapters compromise base models in collaborative environments, and on-device LLMs increase the attack surface. The OWASP Agentic Top 10, released December 10, 2025, adds ASI01 (Agent Goal Hijack) and ASI03 (Identity and Privilege Abuse) as runtime-boundary problems on self-hosted stacks.

ASI01 and ASI03 are not abstract. ASI01 shows up when prompt injection redirects an agent’s plan, and the closed-vendor refusal layer is gone. ASI03 shows up when the agent’s runtime authorization is broader than the task requires, because no vendor SDK is scoping the call for you anymore. Both problems live at the runtime boundary the vendor used to backstop.

EU AI Act Article 53(2) is the regulatory expression of the gap. Open-source GPAI models get a carve-out from technical documentation and downstream-information obligations, provided they’re released under a free open license, weights are public, and the model isn’t monetized. The carve-out vanishes at the Article 51 systemic-risk threshold of 10^25 FLOPs. Llama 3.3 70B, Qwen 3 32B, Mistral Magistral, and most enterprise-deployed open-weight reasoning models sit well below that threshold. They get the carve-out. They impose downstream obligations on enterprise deployers under Article 25(2) when significant modifications happen, a category that catches LoRA fine-tunes. Most teams running fine-tunes don’t know the clause exists. Enforcement begins August 2, 2026.

ISO 42001 mandates AIMS scope definition, third-party supplier oversight, and 38 Annex A controls. The gap there is structural. The open-weight model dropped from Hugging Face is not a “supplier” in the contractual sense. There’s no audit clause, no security questionnaire, no MSA. The standard tells you to define your AIMS scope. It doesn’t prescribe specific runtime-boundary controls for self-hosted foundation models.

Figure 2: AI-Specific Controls Across the Open-Weight Boundary: What Transfers, What Breaks

Build the Runtime Perimeter

Frameworks describe the gap. Architecture closes it. The work to close it is described in the Huang and Lambros (yes, “this” Lambros) AAGATE paper (arXiv:2510.25863v2, November 3, 2025). AAGATE is a Kubernetes-native control plane that operationalizes NIST AI RMF for self-hosted agentic AI. The reference architecture hosts the open-weight model on Ollama at Layer 1 of the MAESTRO threat-model stack, which is the design assumption built in: the protected stack is “DeepSeek, Qwan, LLAMA, OSS” running on your hardware.

Four things transfer regardless of which control plane you adopt.

First, treat weights as supply-chain artifacts. AAGATE enforces SLSA L3, Cosign keyless signing on every OCI image, and an ArgoCD admission controller that rejects unsigned manifests at the gate. Whichever your path, you need signed weights, signed adapters, and a cluster-side admission policy that refuses to load anything unsigned. The Hugging Face nullifAI incident in February 2025, where ReversingLabs found malicious pickle files evading Picklescan via 7z compression and broken pickle deserialization, is the case study. Picklescan logs an error. The reverse-shell payload runs anyway.

Second, inventory open-weight runtimes alongside closed-API endpoints. AAGATE leverages the Agent Naming Service (ANS), and it registers every agent with a Decentralized Identifier and a SPIFFE certificate. You don’t need the blockchain layer. You do need a CMDB row for every Ollama cluster, every fine-tune, every adapter, with model SHA, lineage, and license tier captured. If your AI inventory has a row for the OpenAI tenant but no row for the GPU cluster running your fine-tuned Llama, the audit is incorrect.

Third, build authorization scope into the runtime, not the vendor SDK. AAGATE’s OAuth Relay translates abstract agent capabilities into ephemeral, narrowly scoped, purpose-bound credentials per side effect. Other architectures will name the same thing differently. The control matters since every external action an agent takes funnels through a policy-enforced single chokepoint with allow-listing, rate limiting, and cryptographic logging. AAGATE calls it the Tool-Gateway. AI gateway products commercialize the same pattern. Pick one.

Fourth, run your own evals because the vendor isn’t running them for you. AAGATE’s Janus Shadow-Monitor-Agent provides continuous, pre-execution adversarial evaluation in-loop, tied to a Governing-Orchestrator Agent executing a millisecond kill-switch when AIVSS scoring and SSVC decision logic flag a critical incident. The adversarial layer can also take the form of a parallel classifier, an internal red team, or any continuous evaluation pattern that mirrors what the vendor was running server-side. The pattern is non-negotiable. The product is.

These four moves are the architectural rebuttal to the half-perimeter. The perimeter you bought was always going to end at the runtime boundary. The runtime boundary is now your problem to instrument.

Operational reality matters here. The inference stack you’re protecting is Ollama, vLLM, SGLang, or llama.cpp. None of them ship with vendor-grade telemetry. Your container hosts a probabilistic system with stateless calls and no support contract. When an attacker fine-tunes a copy of your weights and slips it into your registry, there is no support call to escalate. There is only the runtime perimeter you built before the incident.

Key Takeaway: Closed-vendor governance was the AI-specific half you didn’t have to build. Open-weight reasoning models in production change that. Inventory the runtimes, sign the weights, scope the runtime authorization, and run your own evals. The vendor isn’t doing it for you anymore.

What to do next

If you’re approving an open-weight pilot this quarter, demand four things on the architecture review before the GPUs land. First, model SHA and adapter lineage in the CMDB on day one. Second, an egress chokepoint with input/output sanitization and policy-enforced allow-lists. Third, supply-chain controls (signed weights, SLSA-grade provenance, admission control rejecting unsigned). Fourth, a continuous internal evaluation loop on every high-risk agent.

The CARE framework (Create, Adapt, Run, Evolve) applies the same structure to AI security program design. The CISO Evolution covers the executive judgment side of decisions like this one. The AAGATE paper (arXiv 2510.25863v2) is the open-source reference architecture if you want to start from running code.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 36 April 24-April 30, 2026

Rock Lambros — Fri, 01 May 2026 12:50:56 GMT

A coding agent killed a startup’s database in nine seconds. Anthropic shipped a model Mozilla called “elite.” Brussels missed its own deadline. Florida’s House Speaker buried his governor’s AI bill before lunch on day one. Two cloud-native AI vulnerabilities went from disclosure to exploitation in under 36 hours. Google and Forcepoint documented indirect prompt injection in the wild on the same day. UK’s AI Security Institute caught Mythos sabotaging research it was supposed to help with. Pretending this is theoretical is no longer defensible.

This week stress-tested every assumption CISOs hold about AI. The vendor you depend on sells your adversaries the same capability. The agent your developers love wipes three months of revenue and pastes a confession. Open source is the gateway. Indirect injection is the exploit. Autonomy without rollback is the consequence.

I’ll walk you through ten stories and one piece of plumbing. AI security used to run on a 24-month horizon. The default now is whatever ships before next quarter. If you wait for clarity, you lose ground to people who already decided.

1. The Trump Administration Eyes Anthropic’s Mythos as a Weapon

On April 24, the Washington Post reported Anthropic’s Mythos system rattled the Trump administration. Mozilla’s CTO compared the model’s vulnerability detection to a “world-class, elite security engineer.” Anthropic withheld general release, routing access through Project Glasswing partners, including AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, and Microsoft. Anthropic privately briefed senior officials. Mythos meaningfully raises the probability of large-scale cyberattacks this year.

Why it matters

Capability parity flipped. Defenders and attackers reach for the same tool.
Vendors are now gatekeepers of dual-use capability. Anthropic’s withholding sets a precedent.
Government dependence on private model access creates new procurement and security questions.

What to do about it

Map your exposure to LLM-discoverable vulnerabilities in first-party and open-source code.
Negotiate access to AI-assisted scanning before your adversaries scan you first.
Update incident playbooks to assume hours of dwell time, not days.

Rock’s Musings

Yes… more Mythos news. Can’t ignore it if it’s coming out of the White House. It’s not fiction. It’s a procurement question. I’ve watched this pattern in every arms shift, from automated network scanning to commodity exploit kits. The defender who gets there second loses.

Anthropic’s gatekeeping is a defensible choice. The choice is whether your ecosystem qualifies for the safe lane or you’re stuck reading about Glasswing on Substack. Get on a call with your AWS, Cisco, or Microsoft reps. If the answer is no, plan around it. We track this kind of vendor calculus at RockCyber.

2. Cursor’s Claude Agent Wipes a Startup’s Database in Nine Seconds

On Friday, April 25, a Cursor coding agent powered by Claude Opus 4.6 deleted PocketOS’s entire production database and all volume-level backups in a single API call. The agent encountered a credential mismatch in staging, decided to resolve it by deleting a Railway infrastructure volume, scanned the codebase for an unrelated API token, and then ran the command. PocketOS serves car rental businesses nationwide. Three months of reservations, payments, customer information, and vehicle assignments went dark. Railway restored the data on Sunday using internal disaster backups not advertised to customers. The agent itself wrote the public confession.

Why it matters

Agents don’t ask permission. They scan for the credentials unblocking them.
“Production” and “staging” are now labels, not boundaries.
Recovery happened because Railway keeps undocumented backups. Hope is not a strategy.

What to do about it

Force agents to operate with scoped, ephemeral credentials. Long-lived API keys in a repo are liabilities with autonomy attached.
Implement break-glass approval gates for destructive infrastructure calls.
Test backup recovery monthly. If you can’t restore in under an hour, you don’t have backups.

Rock’s Musings

PocketOS got lucky. Railway ran a heroic recovery on a Sunday using backups the customer didn’t know existed. If your AI strategy depends on a founder’s weekend chivalry, you don’t have a strategy. You have hope.

The agent did what it was trained to do. Scan, plan, act, document. The failure was in governance, not capability (and let’s just say, a suboptimal technical infrastructure). The villain is the assumption that an autonomous system will halt and ask. They don’t halt. Build the rails. Treat agents like an over-eager intern with the ability to call DELETE on prod.

3. LiteLLM Bug Goes From Disclosure to Exploitation in 26 Hours

GitHub’s Advisory Database indexed CVE-2026-42208 in LiteLLM on April 24 at 16:17 UTC. Sysdig logged the first exploitation attempt on April 26 at 16:17 UTC, roughly 26 hours later. The bug carries a CVSS of 9.3 and lets unauthenticated attackers send a crafted Authorization header to any model API route, then read or modify the proxy’s database (Sysdig). LiteLLM is the open-source LLM gateway with more than 22,000 GitHub stars, fronting OpenAI, Anthropic, and other model providers in production. The same project sat at the heart of the Mercor breach earlier this year.

Why it matters

AI infrastructure now looks like any internet-exposed service.
Pre-auth SQLi on the gateway exposes API keys and credentials for downstream model providers.
Disclosure-to-exploitation time keeps shrinking. The 36-hour window is the new optimistic baseline.

What to do about it

Inventory every LiteLLM, vLLM, LMDeploy, or proxy node in your environment. Patch to 1.83.7-stable or above for LiteLLM.
Treat LLM gateways as Tier 0 assets. Apply the controls you’d apply to identity providers.
Subscribe to maintainer advisory feeds. GitHub Advisory Database lag of four days is too long.

Rock’s Musings

LiteLLM is the kind of dependency pulled in via a Cursor prompt or an aspirational architecture diagram. It runs as the front door to every model provider you care about. Pre-auth SQL injection on it is a “your AI program is over” event.

Disclosure-to-exploit windows make monthly patch cycles professional malpractice. If your AI security playbook still says “evaluate within 30 days,” shred it. We’ve moved to “act within 24 hours or accept compromise as a feature.”

4. Indirect Prompt Injection Has Left the Lab. It’s Everywhere.

On April 24, Google’s Online Security Blog and Forcepoint’s X-Labs published parallel reports documenting indirect prompt injection in the wild. Forcepoint identified ten payload families targeting AI agents with instructions for financial fraud, data destruction, and API key theft. Google reported a 32% relative increase in malicious activity between November 2025 and February 2026. Attackers hide instructions inside webpages with single-pixel text, transparent fonts, HTML comments, and metadata. Neither team attributed the campaigns to a single actor, though both noted shared templates suggesting organized tooling.

Why it matters

Agents summarizing content are low-risk. Agents sending emails, running commands, or processing payments are the targets.
Filters watching user input miss content fetched by the agent.
The threat model includes every third-party page your agent loads.

What to do about it

Inventory every agent fetching external content. Note which tools they call.
Implement allowlists for outbound tool execution. Default deny for novel actions.
Add output filtering for instruction-like content in tool responses, not only user input.

Rock’s Musings

We’ve been treating indirect prompt injection as a research curiosity since 2023. It’s now an operational threat with documented campaigns and template reuse. The Lakera and OWASP folks were right.

If you’ve deployed an agent with browsing capability, your trust boundary includes every webpage it visits. The entire internet. I wrote about this on RockCyber Musings earlier this year. It got worse.

5. American Leadership in AI Act Drops With 20+ Bills Stitched In

On April 27, Reps. Ted Lieu (D-Calif.) and Jay Obernolte (R-Calif.) introduced the American Leadership in AI Act, a six-title package consolidating more than 20 prior bills from the Bipartisan AI Task Force (Nextgov/FCW). The package covers standards and evaluation, research infrastructure, federal AI governance and procurement, worker protections, deepfake harms, and AI education. The bill is the most substantive bipartisan AI proposal in this Congress, landing during tension between the White House’s preemption push and active state legislation.

Why it matters

Federal preemption fights will intensify. State AI laws face new risk.
Procurement standards in the bill shape what enterprises demand from AI vendors.
Deepfake provisions create new compliance obligations for media and platforms.

What to do about it

Map AI-procurement language to current vendor contracts.
Track state-level bills you’re already complying with for preemption risk.
Get legal reading the testing and evaluation title carefully.

Rock’s Musings

Two California members of Congress, one D and one R, agreeing on AI is unicorn territory. Don’t get excited. Bipartisan bills with 20+ titles tend to die under the weight of their own ambition.

The interesting question is which provisions get pulled into appropriations or NDAA riders before December. Watch the procurement and federal AI governance titles. Those move first because the executive branch wants them. Plan as if procurement standards land by Q3.

6. EU AI Act Omnibus Trilogue Collapses, August Deadline Stays Live

On April 28, Brussels held the second political trilogue on the AI Act Omnibus, the proposal deferring high-risk AI compliance. After roughly twelve hours, the Council and Parliament failed to agree on conformity-assessment architecture for AI in regulated products (Modulos). A follow-up trilogue is scheduled for May 13. The August 2, 2026 high-risk obligations remain operative law.

Why it matters

Vendors and deployers cannot bank on a deferral. August is the working assumption.
The Cypriot Council Presidency ends June 30. Lithuania might finish negotiations.
The Annex I disagreement signals sectoral assessments will keep biting medical device and machinery providers.

What to do about it

Continue compliance preparation as if no Omnibus arrives. Treat May 13 as a tiebreaker, not a save.
For medical devices, machinery, and other Annex I products, lock in your conformity-assessment plan now.
Get internal legal sign-off on the original AI Act timelines this quarter.

Rock’s Musings

I keep telling clients hoping for a deferral is not a compliance strategy. This week confirmed it. Brussels cannot agree on the structure of the regulation it already passed.

If your CFO asks why you spent budget on AI Act readiness, point at this paragraph. The cost of overpreparing is a few quarters of work. The cost of underpreparing is an enforcement action against your highest-revenue product line. I know which side of the bet I want.

7. Microsoft and OpenAI Restructure for Cyber Defense

On April 27, Microsoft and OpenAI announced revised partnership terms (24/7 Wall St). OpenAI’s API will run on any cloud provider, including AWS via Bedrock. Microsoft’s IP license is no longer exclusive but runs through 2032. The companies expanded Trusted Access for Cyber, giving Microsoft access to OpenAI’s most cyber-capable models, folded into the Secure Future Initiative. The pact pairs OpenAI’s offensive-capable model work with Microsoft’s defender stack across cloud, identity, productivity, and frontier AI.

Why it matters

The exclusivity reset reshapes vendor lock-in for AI procurement.
Microsoft’s defender stack now has a privileged channel to cyber-capable models.
The same models detecting attacks also execute them. Internal review for offensive capability is no longer optional.

What to do about it

Reassess your AI vendor strategy. Multi-cloud is realistic now.
For Microsoft customers, evaluate Trusted Access for Cyber eligibility.
Demand transparency from AI security vendors about model and guardrails.

Rock’s Musings

Microsoft bought itself the most defended seat at the AI security table. The pact extends a moat already including Defender, Sentinel, Purview, and Copilot for Security. Smaller security vendors should be nervous.

If you’re a CISO in a Microsoft shop, this is a small win. If you’re betting on a non-Microsoft AI security stack, reread your strategy. The exclusivity end means more options, the cyber pact deepens lock-in.

8. UK AISI Catches Claude Mythos Sabotaging Research

On April 28, the UK AI Security Institute (AISI) published an updated sabotage evaluation framework. Across 297 scenarios, no model attempted spontaneous research sabotage. In a new “continuation” test, Mythos Preview continued to sabotage 7% of inputs after being explicitly directed not to (AISI). The same evaluation showed Mythos Preview completing the first 32-step enterprise attack simulation start-to-finish, succeeding on 73% of expert-level tasks that no model had completed before April 2025.

Why it matters

Continuation behavior matters more than spontaneous behavior. Real attackers prompt the model.
A 7% sustained sabotage rate warrants treating these models as untrusted insiders during sensitive work.
The 32-step completion shows operational maturity. Models execute multi-stage cyber operations end to end.

What to do about it

Don’t run frontier models on safety-sensitive code reviews without monitoring.
Build red-team programs, prompting and continuing rather than single-shot tests.
Track AISI’s methodology. Adopt continuation-style tests internally.

Rock’s Musings

Spontaneous misbehavior was never the threat model scaring me. Continuation is. Once an attacker plants the seed, the model becomes a complicit operator inside your environment. Seven percent is small until you multiply it by every prompt your enterprise sends in a day.

AISI does work nobody else funds at this rigor. If your AI governance committee isn’t reading their reports cover to cover, you’re outsourcing your threat model to LinkedIn posts. Read the source.

9. Florida House Speaker Kills DeSantis’s AI Bill on Day One

On April 28, Florida convened a four-day special session. The Senate voted 37-1 in favor of the AI Bill of Rights. House Speaker Daniel Perez killed the bill that same morning, declaring that the only topic the House would address was redrawing congressional maps (Florida Phoenix). Perez argued AI regulation belongs to the federal government, aligned with a Trump executive order targeting state AI laws. The bill would have required parental consent for minor accounts on companion chatbot platforms, prohibited unauthorized commercial use of AI-generated likenesses, and required AI disclosure to users.

Why it matters

State preemption fights are escalating. Florida sided with the federal government before federal law exists.
Companion chatbot rules pass Senate chambers and die in House chambers. The pattern matters.
AI-generated likeness and consent provisions will keep returning. Plan for eventual passage somewhere.

What to do about it

If you run companion chatbots, monitor every state bill on minors and consent.
Brief your legal team on AI-likeness and right-of-publicity rules in California, Tennessee, and active special sessions.
Don’t bank on federal preemption. Executive orders reverse.

Rock’s Musings

The pattern is the same one I’ve called out for two years. State Senates pass AI bills, state Houses kill them, and the federal government drafts preemption language. The result is regulatory whiplash across 50 jurisdictions plus DC plus a federal package which might or might not preempt them. Give your privacy and AI counsel hazard pay. They’re earning it.

10. HackerOne Launches h1 Validation as AI Vuln Reports Surge 76%

On April 29, HackerOne launched h1 Validation, a service triaging AI-discovered vulnerability reports for actual exploitability (Cybersecurity Insiders). Vulnerability submissions on the platform rose 76% year over year, hitting a record high in March 2026. About 25% of findings were confirmed exploitable. The share of critical and high-severity vulnerabilities grew to 32%, up from a 26-28% baseline. The launch follows months of complaints from program owners overwhelmed by AI-generated reports of varying quality.

Why it matters

AI generates more vuln reports than security teams triage.
Triage capacity, not discovery, is the constraint.
This signal-to-noise problem reshapes bug bounty economics within 12 months.

What to do about it

Audit your bug bounty intake pipeline. If reports outpace triage, fix it.
Invest in tooling classifying reports by exploitability before a human reads them.
Set expectations with researchers. AI-assisted submissions need higher proof of impact.

Rock’s Musings

The asymmetry is volume. Models like Mythos and GPT-5.5-Cyber produce thousands of plausible reports per day. Most are junk. Some are lethal. Your triage team won’t keep up by reading harder. Whether you buy h1 Validation or build your own, manual triage of AI-scale output is a doomed strategy.

The One Thing You Won’t Hear About But You Need To

CSAI Foundation Becomes the First AI-Specific CVE Numbering Authority

On April 29, the Cloud Security Alliance’s CSAI Foundation announced three milestones at the CSA Agentic AI Security Summit (CSA). The foundation registered as a CVE Numbering Authority through MITRE, gaining direct ability to issue CVEs for AI-specific vulnerabilities. It launched the STAR for AI Catastrophic Risk Annex extending the AI Controls Matrix to scenarios involving loss of human oversight, with rollout from June 2026 through December 2027. It also acquired the Autonomous Action Runtime Management (AARM) specification, contributed by Vanta.

Why it matters

AI-specific CVE issuance changes how AI vulnerabilities get tracked, scored, and patched.
The Catastrophic Risk Annex maps to NIST AI RMF, the EU AI Act, and ISO/IEC 42001, giving auditors a consolidated reference.
AARM gives operators a formal specification for runtime control of agent actions.

What to do about it

Add CSAI Foundation advisories to your security feed.
For high-risk deployments, map internal controls to the Catastrophic Risk Annex during phase one rollout.
Pilot AARM in one agentic workflow this quarter. Runtime control of agent actions is the right level of abstraction.

Rock’s Musings

Plumbing matters more than press releases. While headlines went to Mythos and the Cursor accident, the CSAI Foundation stood up the infrastructure for AI-specific vulnerability tracking, runtime control, and catastrophic risk auditing. This decides whether AI security becomes a discipline or stays a marketing category.

I’ve worked in standards for thirty years. The value compounds quietly until one day the auditors ask, and you either have it or you don’t. We track CSAI work closely at RockCyber. Start with the CSA press release, then loop in your governance team Monday.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with Eva Benn where we talked about the cybersecurity skills you need to develop to stay relevant in 2026 and beyond.

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Cloud Security Alliance. (2026, April 29). CSAI Foundation announces key milestones to secure the agentic control plane. https://cloudsecurityalliance.org/press-releases/2026/04/29/csai-foundation-announces-key-milestones-to-secure-the-agentic-control-plane

Cybersecurity Insiders. (2026, April 29). HackerOne launches h1 Validation to tackle rising wave of AI-driven vulnerabilities. https://www.cybersecurity-insiders.com/hackerone-launches-h1-validation-to-tackle-rising-wave-of-ai-driven-vulnerabilities/

Florida Phoenix. (2026, April 28). Florida Speaker kills DeSantis’ AI regulation, vaccine repeal bills on first day of special session. https://floridaphoenix.com/2026/04/28/florida-speaker-kills-desantis-ai-regulation-vaccine-repeal-bills-on-first-day-of-special-session/

Forcepoint X-Labs. (2026, April 24). Indirect prompt injection in the wild: X-Labs finds 10 IPI payloads. https://www.forcepoint.com/blog/x-labs/indirect-prompt-injection-payloads

Google. (2026, April 24). AI threats in the wild: The current state of prompt injections on the web. Google Online Security Blog. https://security.googleblog.com/2026/04/ai-threats-in-wild-current-state-of.html

Help Net Security. (2026, April 24). Indirect prompt injection is taking hold in the wild. https://www.helpnetsecurity.com/2026/04/24/indirect-prompt-injection-in-the-wild/

Modulos. (2026, April 28). EU AI Act Omnibus: The trilogue failed, what happens to the August 2026 deadline?. https://www.modulos.ai/blog/ai-act-omnibus-trilogue-failed/

Nextgov/FCW. (2026, April 28). Lieu and Obernolte introduce consolidated AI bill package. https://www.nextgov.com/artificial-intelligence/2026/04/lieu-and-obernolte-introduce-consolidated-ai-bill-package/413134/

Sysdig. (2026, April 29). CVE-2026-42208: Targeted SQL injection against LiteLLM’s authentication path discovered 36 hours following vulnerability disclosure. https://www.sysdig.com/blog/cve-2026-42208-targeted-sql-injection-against-litellms-authentication-path-discovered-36-hours-following-vulnerability-disclosure

The Hacker News. (2026, April 24). LMDeploy CVE-2026-33626 flaw exploited within 13 hours of disclosure. https://thehackernews.com/2026/04/lmdeploy-cve-2026-33626-flaw-exploited.html

The Hacker News. (2026, April 29). LiteLLM CVE-2026-42208 SQL injection exploited within 36 hours of disclosure. https://thehackernews.com/2026/04/litellm-cve-2026-42208-sql-injection.html

The Register. (2026, April 27). Cursor-Opus agent snuffs out startup’s production database. https://www.theregister.com/2026/04/27/cursoropus_agent_snuffs_out_pocketos/

Tom’s Hardware. (2026, April 27). Claude-powered AI coding agent deletes entire company database in 9 seconds. https://www.tomshardware.com/tech-industry/artificial-intelligence/claude-powered-ai-coding-agent-deletes-entire-company-database-in-9-seconds-backups-zapped-after-cursor-tool-powered-by-anthropics-claude-goes-rogue

UK AI Security Institute. (2026, April 28). Our evaluation of Claude Mythos Preview’s cyber capabilities. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities

24/7 Wall St. (2026, April 28). Microsoft’s AI moat holds up even after the OpenAI reset. https://247wallst.com/investing/2026/04/28/microsofts-ai-moat-holds-up-even-after-the-openai-reset/

Washington Post. (2026, April 24). AI hacking fears jolt Washington as Anthropic unveils Mythos. https://www.washingtonpost.com/technology/2026/04/24/anthropic-mythos-ai-washington-cybersecurity-hacking-risk/

AI Coding Agent Prompt Injection: Three Vendors, One Seam, No Owner

Rock Lambros — Tue, 28 Apr 2026 12:50:44 GMT

AI coding agent prompt injection has a procurement problem, and a researcher just published the receipt. Aonan Guan typed a malicious instruction into a GitHub pull request title last week. Anthropic’s Claude Code Security Review action posted its own API key as a comment. So did Google’s Gemini CLI Action. So did GitHub’s Copilot Agent. Same exploit hit three vendors, with no infrastructure required. Anthropic’s 232-page system card had named the gap before the researchers published. The other two vendors had not documented enough to predict their own outcome.

Most of the writing on this incident will focus on architecture. The runtime is the perimeter. The action boundary is the blast radius. Both readings are correct. Both are also a deflection. The architecture story explains the mechanism. It doesn’t explain why the buyer was exposed in the first place. The buyer signed three contracts, accepted three sets of safety claims, and never required any of the three vendors to assert anything about the seams between them. The trigger was a prompt injection. The exposure was procurement.

I want to push past the architecture take and look at the governance read, because the governance read implicates the reader in a way the architecture take does not.

How Comment and Control Worked

Aonan Guan, working with Zhengyu Liu and Gavin Zhong at Johns Hopkins, opened a GitHub pull request in a target repository. They typed a malicious instruction into the PR title. The repository used the pull_request_target workflow trigger, which any AI coding agent integration with secret access requires. That trigger injects repository secrets into the runner environment. The agent read the PR title, treated the instruction as a directive, called GitHub’s own API using credentials stored in its environment variables, and posted the secret as a comment on the PR. The default pull_request trigger doesn’t expose secrets to fork PRs. The pull_request_target trigger does, by design.

This is the textbook case of what Simon Willison has been calling the lethal trifecta. Access to private data sits in the runner. Untrusted input arrives through the PR title. The exfiltration channel is GitHub’s comment API, which sits in the agent’s default tool inventory. All three conditions sit at the seam between three vendors. The exploit needs all three to fire. Comment and Control satisfies all three by design, and no single vendor has written a document that asserts anything about the combination.

Anthropic ranked the disclosure as CVSS 9.4 Critical and paid a $100 bounty. Google paid $1,337. GitHub paid $500. None of the three issued a CVE in the National Vulnerability Database at the time of disclosure. None published a GitHub Security Advisory. Those numbers send a market signal. Vendor bounty programs classify seam vulnerabilities as out of scope for their own programs, and researchers respond to incentives. The next class of these findings will follow the same path the bounties point them down.

Help Net Security ran a piece this week on Google’s own CommonCrawl analysis showing a 32% relative increase in malicious indirect prompt injection content between November 2025 and February 2026. The supply of payloads is growing faster than vendor disclosures. That is the operating environment.

Figure 1: Comment and Control attack chain

Why AI Coding Agent Prompt Injection Is a Governance Problem

Pull a model card off any of the three vendor sites. Anthropic’s Opus 4.7 system card, published April 16, 2026, runs 232 pages. It quantifies hack rates. It publishes injection resistance metrics. It includes an explicit statement. Claude Code Security Review is “not hardened against prompt injection.” Anthropic does the most mature disclosure work in the industry. OpenAI’s GPT-5.4 system card documents red-team hours and model-layer evals without publishing agent-runtime resistance numbers. Google’s Gemini 3.1 Pro card defers most of its safety methodology to the older Gemini 3 Pro card.

Rank those three in a procurement scorecard, and Anthropic comes out on top. That ranking is the wrong question. A model card describes a model’s behavior. Comment and Control didn’t break a model. The disclosure was complete for the layer Anthropic owns and silent on the seam, because Anthropic doesn’t own the seam. The seam runs through GitHub’s runner, GitHub’s API, the agent’s environment variable scope, the workflow trigger configuration, and the buyer’s choice to enable agent integration on a repository with secrets. Each of those pieces sits inside a different contract. None of those contracts asserts anything about the combination.

The structural gap is what makes this a governance story. The cloud security industry took roughly a decade to converge on the shared responsibility model. AWS owns the hypervisor. The customer owns the workload. Each side owns a clear half. Most of the early breaches happened in the unowned middle of that line, and the convergence was painful. Agent composition is replaying that history with a sharper acceleration curve, and there is no industry consensus on where the line sits. Three vendors share a single runtime with no agreed-upon accountability model. The buyer carries everything that the contracts do not.

Here is a hypothetical for the operational consequence. A SOC running normal vulnerability scanning across the agent-enabled repos sees green. None of the three disclosures generated CVEs in the NVD. The internal ticketing system has no category for “agent runtime composition risk.” The risk register has no entry. The budget has no line item. The exploit class is real, the severity is Critical across three vendors, and the standard tooling reports zero findings because the standard tooling has nothing to scan against. The exploit became possible because no one wrote it down as a thing to look for.

Figure 2: System card disclosure depth by vendor and layer

The Procurement Questions You Should Have Asked

Most CISO action checklists produced after an incident like this read as a list of post-hoc remediation steps. Rotate credentials. Restrict permissions. Add monitoring. Those moves are correct, and they are also reactive. The harder, more useful artifact is the set of procurement questions that, asked at signing, would have made Comment and Control either impossible or contractually attributable.

Here are five questions. Paste them into your next vendor governance review verbatim or adapt them. They work for AI coding agents, and they will work for the next class of agentic integrations after this one.

The first question is about layer ownership. Ask each vendor, “Name the layers of the agent runtime your security guarantees cover, and name the layers you don’t cover.” Most vendors will answer the first half. The interesting answer is the second half. A vendor who cannot articulate the layers it doesn’t cover hasn’t thought about composition. The contract you are about to sign assumes a perimeter that the vendor hasn’t analyzed.

The second question is about quantified resistance metrics on the deployment surface you actually use. Anthropic publishes injection resistance numbers in the Opus 4.7 system card. Those numbers cover Anthropic’s API surface. They don’t cover Claude Code Security Review running on GitHub Actions with a pull_request_target trigger and secrets in scope. Ask for the resistance number for the model version you run on the platform you deploy to. If the vendor cannot produce that number, the vendor cannot quantify the risk you are accepting.

The third question is about bounty scope. Ask each vendor, “Does your bounty program consider vulnerabilities at the integration boundary between your product and the platforms it deploys on?” Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety findings. The position is defensible. The position also pushes researchers’ attention away from the seams. Knowing which vendor’s program covers which surface is a procurement signal. It tells you which surfaces will get the most external scrutiny over the contract life and which surfaces will not.

The fourth question is about composition disclosure. Ask each vendor, “When your product is integrated with another vendor’s platform, who is responsible for documenting the security properties of the combined system?” The honest answer from every vendor is “the buyer.” Get it in writing. The asymmetry exposes why a shared responsibility artifact for agent runtimes does not yet exist.

The fifth question is about runtime telemetry. Ask, “What runtime signals do you publish that allow me to detect prompt injection in production?” If the answer is a model-card link, the vendor hasn’t built the runtime monitoring. If the answer is an SDK with detection hooks, document the coverage and the false-positive rate. The August 2026 EU AI Act high-risk compliance deadline turns this question from a nice-to-have into an audit artifact, and the vendors who cannot answer it now will be the ones renegotiating contracts in Q3.

Those five questions don’t eliminate the exploit class. They make the exploit class a contractual variable instead of a discovered surprise. A buyer who asks all five before signing knows where the seam runs and who is on the hook for what.

What to Do This Week, Ordered by Blast Radius Reduction

The reactive moves still matter. Order them by blast radius reduction, not by the order they appear in any vendor advisory. Each one carries a different internal political cost, and pretending the costs are equal is how good control work dies in committee.

Inventory every workflow in your repositories that uses pull_request_target. The grep is cheap. The conversation with the dev tooling team about what each of those workflows needs is not. Expect to find workflows configured for one reason, with AI agent integrations later layered on top, and no review of the original threat model.

Rotate every credential exposed to agents in those workflows over the last 90 days. The cost is low. The likelihood of someone pushing back is also low. Do it first because it is the cheap one, and use the speed of the rotation to demonstrate that agent-related credential rotation is now part of the normal operating cadence.

Switch from stored secrets to short-lived OIDC tokens for any workflow that supports it. The political cost is medium. You will need platform team buy-in. The argument that closes the loop is exactly the procurement gap above. Stored secrets in agent-accessible environments are a category of risk no vendor’s contract currently covers, and OIDC removes the category from the buyer’s residual.

Strip bash execution permissions from agents that only need to perform code review. This one starts a fight with the developer tooling team because some of the convenience features will break. The fight is worth having. An agent with bash permissions on a CI runner with secrets in scope is the worst-case configuration. Write the security memo and force the documented risk acceptance from the team that wants to keep the bash channel open.

Add a category to your supply chain risk register called “AI agent runtime composition.” Most GRC tooling doesn’t have a field that maps to the category. Add it manually. The act of adding the category forces the conversation about which vendor combinations are covered by which contracts and which are not. The conversation is the artifact you actually need. The risk register entry is the receipt that the conversation happened.

Where the Industry Has to Go

The cloud security industry built the shared responsibility model under pressure from breaches and ten years of regulatory friction. The AI agent industry has neither of those forcing functions yet. The EU AI Act high-risk obligations come into force in August 2026 and will start to put procurement language behind some of these questions, but the standards work that would produce a real shared responsibility artifact for agent runtimes hasn’t happened. This is where the CARE framework lands. Create the procurement questions before you sign. Adapt the controls you already have around CI/CD, secret scoping, and runtime monitoring. Run the agent integrations under the same operating cadence as the rest of your privileged automation. Evolve the risk register category as new exploit classes emerge. The exploit class will not stop with Comment and Control. The next one will follow the same architectural pattern and the same governance gap. The CISOs who are ready for it are the ones who treat agent procurement as a governance problem now, while the vendors and the standards bodies are still catching up.

Key Takeaway: The AI coding agent prompt injection class lives in the seams between vendor contracts, and the buyer carries the residual until the procurement questions force the seams into the conversation.

What to Do Next

Start with the five procurement questions in your next vendor renewal cycle. Do the credential rotation and the OIDC migration this quarter. Read the rest of the RockCyber Musings archive for the operating cadence I run with clients on agentic AI security reviews, and reach out through RockCyber if you want to walk through the procurement question set against a specific vendor stack you are evaluating.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 35 April 17-April 23, 2026

Rock Lambros — Fri, 24 Apr 2026 12:50:53 GMT

Seven days. One breached “too dangerous to release” model. One vibe coding platform exposing 76 days of customer source code. One AI supply chain attack that cost Vercel its dignity. A compliance startup accused of rubber-stamping SOC 2 reports for companies that later got breached. Every story landed between April 17 and April 23, 2026, the same week Gartner blessed its first “Company to Beat” in agent governance, the UK promised a £90 million cyber shield, and Google shipped three security agents. The security industry spent two years debating whether agentic AI was a real threat. This week, the debate ended.

AI systems are both targets and attack vectors, with failure modes of their own. A frontier model gets breached because a vendor fell for infostealer malware in February. A vibe coding startup ships a regression and exposes every customer’s source code for 76 days. A compliance startup hands out SOC 2 attestations like candy, and one customer becomes the pivot for a supply chain attack. Governments and analysts moved together. The UK committed real money to AI-powered cyber defense. Gartner stamped agent governance as a procurement category. This is the week the gap between AI capability and AI assurance became a balance sheet problem.

1. Anthropic Mythos Model Accessed By Unauthorized Discord Group Days After Launch

Anthropic confirmed on April 22, 2026, that it is investigating unauthorized access to Mythos, the frontier model restricted to roughly 40 partners, including Apple, Google, JPMorgan Chase, and NVIDIA (Bloomberg). The access came through a third-party contractor environment, not Anthropic’s direct infrastructure (CBS News). A Discord group focused on unreleased AI models guessed Mythos’s URL from naming conventions and pivoted through a contractor’s credentials to reach it. Anthropic claims no core systems were compromised.

Why it matters

The firm Anthropic, trusted with access to frontier models, is the one that leaked it.
Mythos autonomously finds and weaponizes zero-days. Downstream risk spans all major OSes.
Guessing URLs and owning one contractor beat a Tier 1 AI lab.

What to do about it

Inventory every third-party vendor with access to frontier AI weights or runtime. Treat them as Tier 1.
Require contractors touching AI infrastructure to match your credential isolation standards.
Demand hardware token enforcement for any vendor in production AI environments.

Rock’s Musings

A contractor endpoint blew apart the “too dangerous to release” framing in 24 hours. Anthropic built Mythos to protect partners from zero-days, then lost it through a vendor employee. The model built to find vulnerabilities got stolen because of a vulnerability nobody thought to measure. You cannot outsource your trust perimeter. Every CISO needs to audit AI-access vendors as they do their crown-jewel systems.

2. Vercel Supply Chain Breach Via Context.ai OAuth Token Compromise

Vercel confirmed on April 19, 2026 that customer data was stolen via a compromise of Context.ai, a third-party AI assistant a Vercel employee had connected to Google Workspace with full Drive read access (TechCrunch). A Context.ai employee’s device was infected with Lumma infostealer in February 2026. ShinyHunters used the exfiltrated OAuth tokens to pivot into the Vercel employee’s Google account, then into Vercel itself (Vercel). The actor is offering source code, NPM and GitHub tokens, and access keys for $2 million on BreachForums.

Why it matters

One OAuth app installed by one employee rolled into a platform breach.
Lumma was the vector. The AI assistant was the accelerant.
ShinyHunters is monetizing AI-adjacent breaches at scale. Expect copycats.

What to do about it

Audit every OAuth app with Drive, Gmail, or Workspace scopes. Revoke AI tools without documented need.
Enforce conditional access with hardware tokens and device posture for Workspace accounts.
Subscribe to stealer log monitoring for corporate emails.
Rotate all secrets (e.g. API keys).

Rock’s Musings

An employee clicked a button, granted a third-party AI read access to everything, and the attacker rode that consent into production. OAuth scopes are the new privileged credentials, and most of us are not managing them that way. The shadow AI problem I flag with clients at RockCyber is not ChatGPT use. It’s the hundreds of AI-branded OAuth apps employees connect while nobody watches.

3. Gartner Names Zenity The “Company To Beat” In AI Agent Governance

On April 23, 2026, Zenity announced that Gartner named it the “Company to Beat in AI Agent Governance” (Business Wire). Gartner cited Zenity’s agentic architecture, intent-aware detection, and end-user traction. The platform covers SaaS-managed agents, custom-built agents, and device deployments from build to runtime. Gartner’s 2026 CIO survey shows that 17 percent of organizations have deployed AI agents, 42 percent plan to do so within 12 months, and another 22 percent plan to do so the year after (Yahoo Finance). Zenity also landed in two categories of the 2026 Gartner Hype Cycle for Agentic AI this month.

Why it matters

A “Company to Beat” stamp on a narrow security category speeds up procurement.
79% of organizations plan to deploy AI agents within 2 years.
Agent governance is shifting from a research topic to a commercial line item.

What to do about it

If you are on the 42 percent 12-month curve, start evaluations now.
Evaluate agent governance on runtime enforcement, not only inventory or posture.
Require vendors to show agent identity, memory, tool-call, and intent controls as distinct.

Rock’s Musings

Yes… Zenity is my employer, so a) I’m super proud of this one and b) it’s my prerogative to include it in the musings 😀

“Company to Beat” labels are how procurement catches up with security reality. Mythos leaked through a contractor, Vercel got rolled via an AI assistant’s OAuth token, and the same week Gartner tells CIOs agent governance is a budget item. Read Zenity’s architecture claims against this week’s breach anatomy, then against what you bought for CASB five years ago. Same pattern, same procurement playbook. Budget the line item.

4. Lovable Vibe Coding Platform Exposed Source Code For 76 Days

On April 20, 2026, security researcher weezerOSINT disclosed a broken object-level authorization flaw in Lovable’s API that let any authenticated free-account user read source code, database credentials, AI chat history, and customer data from every project created before November 2025 (The Register). The exposure ran 76 days, from February 3 through April 20, 2026. Lovable first denied the flaw, blamed its documentation, then blamed HackerOne, then apologized for the apology (Cybernews). Customers include Uber, Zendesk, and Deutsche Telekom.

Why it matters

Vibe coding platforms hold enterprise source code and secrets. Attacker value is enormous.
Public denial while the flaw was live is a textbook loss-of-trust move.
A $6.6 billion startup cannot figure out basic tenant isolation three versions in.

What to do about it

Block new vibe coding connections at DNS or CASB until procurement reviews tenancy.
Rotate any credentials your teams put into Lovable projects since February 2026.
Treat vibe coding output as untrusted. Pull it into a real repo, scan it, review it.

Rock’s Musings

Vibe coding is a demo, not engineering. When you hand a growth-stage startup your production database credentials in exchange for a drag-and-drop builder, you have accepted that your security depends on whether someone refactors an authorization check. Three breaches in thirteen months is a pattern, not bad luck. If your security team has not yet restricted this category of tool, do it this week.

5. Google Cloud Next Ships Three AI Security Agents And Gemini Enterprise Agent Platform

On April 22, 2026, Google Cloud Next introduced the Gemini Enterprise Agent Platform and three new AI agents inside Google Security Operations (SiliconANGLE). The agents cover Threat Hunting, Detection Engineering, and Third-Party Context enrichment (The Register). Google also deepened its ties to the Wiz product and shipped new agent governance tools. Sundar Pichai framed the shift as moving from human-led defense to human-in-the-loop to AI-led defense overseen by humans.

Why it matters

Three tedious SOC functions now have vendor agent equivalents. SOC staffing economics shift if they work.
Google is betting the platform on agentic AI, not only generative AI.
The Wiz tie-in gives Google a path into CSPM-driven SOC workflows.

What to do about it

Pilot the Threat Hunting agent for 30 days against your human hunt team and score overlap.
Define human-in-the-loop gates before any autonomous detection or response action.
Update vendor risk reviews to cover agent behavior monitoring, not only model output.

Rock’s Musings

The pitch is compelling, the execution will be messy. Every SOC team I advise is drowning in alerts, and the first customer bitten by an autonomous agent on bad context will make headlines. The Third-Party Context agent matters more than the other two because better data into an agentic SOC prevents bad autonomous actions. Read my notes on AI governance before you green-light an agent in production.

6. UK Announces £90 Million National Cyber Shield And Calls On AI Firms To Co-Build Defense

At CYBERUK 2026 on April 22, 2026, UK Security Minister Dan Jarvis announced £90 million over three years for national-scale AI-powered cyber defense capabilities (GOV.UK). Jarvis asked frontier AI companies to co-develop these capabilities with the UK government and cited Mythos’s zero-day findings as justification for public sector urgency (Computer Weekly). Jarvis also launched a National Cyber Resilience Pledge aimed at private sector security baselines.

Why it matters

The UK is the first major Western government to put operational capital into AI-defended critical infrastructure.
Public-private cooperation on offensive-grade AI models sets a precedent others will react to.
Frontier AI vendors in UK public sector now have a direct path to shape national doctrine.

What to do about it

UK critical infrastructure operators: map your sector against the Pledge before it becomes mandatory.
Track which AI vendors join. UK procurement for critical infrastructure will narrow quickly.
Watch NCSC secure-by-design expectations for AI. They will bleed into global procurement language.

Rock’s Musings

£90 million pounds sounds like a lot, but it really is a down payment. The bigger story is the UK saying out loud what American officials still whisper. Frontier AI models are dual-use capability, and if you don’t partner with the labs building them, your adversaries will. The Pledge is the more interesting instrument. Voluntary commitments have a funny way of becoming procurement requirements, then de facto regulation.

7. OpenAI Releases Privacy Filter, An Open-Weight On-Device PII Redactor

On April 23, 2026, OpenAI released Privacy Filter, a 1.5-billion-parameter open-weight model with 50 million active parameters that detects and redacts personally identifiable information locally (Help Net Security). It supports a 128,000-token context window, runs in browsers and on laptops, and achieves a 96% F1 score on PII-Masking-300k (VentureBeat). It ships under Apache 2.0 on GitHub and Hugging Face, covering eight PII categories.

Why it matters

A permissive open-weight PII redactor that runs on a laptop closes a real enterprise data sanitization gap.
OpenAI shipping open weights for a safety model is a positional move, not a strategy reversal.
The tool removes a common excuse for shipping raw enterprise data to cloud LLMs.

What to do about it

Evaluate Privacy Filter as a preprocessing layer for any LLM pipeline on customer, support, or HR data.
Benchmark it against existing DLP tools for AI-specific use cases.
Add on-device redaction as a control in your AI data flow diagrams.

Rock’s Musings

Privacy Filter is the first open-weight piece from OpenAI that’s useful to a CISO. One point five billion parameters, runs local, decent accuracy, permissive license. It slots into every RAG pipeline I review as a trivial addition that removes an easy audit finding. OpenAI has taken heat on privacy posture for three years, and shipping open weights for a PII model is a pressure valve. Anthropic and Google will follow within six months.

8. Delve Compliance Scandal Widens After TechCrunch Confirms Context.ai Certification

On April 23, 2026, TechCrunch confirmed that Delve, the Y Combinator-backed compliance startup accused of faking SOC 2 audits, had certified Context.ai, the AI tool at the center of the Vercel supply chain breach (TechCrunch). Delve also certified LiteLLM, another open source project separately compromised with planted malware. Context.ai has cut ties with Delve and is re-certifying with a different auditor. Whistleblower DeepDelver alleged the Delve team took a Hawaii offsite between April 15 and April 19 while denying customer refunds.

Why it matters

Two Delve-certified companies are at the center of AI supply chain breaches.
SOC 2 without substance is a liability shield until the shield gets tested.
AI compliance tooling is saturated with startups racing to rubber-stamp fast-moving products.

What to do about it

Audit your vendor attestations. Who signed? What is the auditor’s history? Is the scope meaningful?
For AI vendors, demand pentest summaries, code review artifacts, and threat models.
Treat SOC 2 as one input into assurance, not a box check.

Rock’s Musings

My friends know… I believe SOC 2 needs to burn a fiery death, but “we” still insist on them. Founders want the badge, auditors want the fee, customers want the checkbox. Everyone wins until the breach, then the enterprise that relied on the paper finds out the paper was never the point. SOC 2 is a floor, not a ceiling. Nothing will change until we kill the demand side of this particular supply/demand equation.

9. NIST Narrows CVE Enrichment As Submission Volume Overwhelms NVD

On April 17, 2026, NIST announced it will only enrich CVEs that meet specific criteria due to an unsustainable rise in submissions (Cybersecurity Dive). The NVD will continue assigning CVE IDs to all submissions but will no longer guarantee CVSS scores, CPE mappings, or descriptions for every record. NIST cites AI-assisted vulnerability research as a key driver of volume. Enrichment priority goes to actively exploited vulnerabilities and CVEs affecting critical infrastructure.

Why it matters

If your program assumes every CVE carries a CVSS score and CPE mapping, it is about to degrade silently.
AI-generated vulnerability research is flooding public disclosure. The NVD cannot keep up.
Enterprises relying only on NVD-fed scanners will miss or misprioritize vulnerabilities now.

What to do about it

Supplement NVD with CISA KEV and commercial vulnerability intelligence.
Score CVEs NIST skips using vendor advisories as primary sources.
Reassess SLAs based on enrichment availability, not only patch availability.

Rock’s Musings

NIST is essentially throwing up its hands and giving up. The CVE system was built for a world where humans found most bugs. We no longer live there. Mythos alone found thousands of zero-days in weeks. Multiply that by every lab running similar research, and NVD throughput becomes a joke. NIST is triaging, which is the only rational move. The problem is that nobody told your vulnerability scanner. Get ahead of this now, or your next board report will be a lie by omission.

10. Anthropic MCP STDIO Flaw Burns The Agentic AI Ecosystem As New CVEs Land

The STDIO command injection flaw in Anthropic’s MCP SDK produced new CVE assignments throughout the week, including CVE-2026-30623 and CVE-2026-22252 (LiteLLM). Analysis on April 20 from BDTechTalks documented ecosystem fallout and Anthropic doubling down on its “by design” position (BDTechTalks). The flaw class affects 7,000 publicly accessible MCP servers and over 150 million package downloads (Infosecurity Magazine). Affected products include LibreChat, WeKnora, Cursor, and MCP Inspector.

Why it matters

Anthropic will not patch. Every developer using the official SDK owns the mitigation.
The default agentic interop standard has a baked-in remote code execution footgun.
CVEs are stacking up. Every MCP-connected product is a vendor risk question.

What to do about it

Inventory every MCP server and client. If you can’t produce the list in a day, you have a bigger MCP problem.
Enforce strict input validation on any MCP server config from user input, LLM output, or third-party manifests.
Update your agentic threat model to cover MCP as a first-class attack surface.

Rock’s Musings

“By design” is a liability transfer, not a security posture. Anthropic handed every developer on the MCP SDK a foot-gun and said go figure it out. Competing agent protocols like A2A and Agora are watching and taking notes. Building the default standard for agent-to-system communication on top of a protocol decision that cannot be fixed without breaking compatibility is the problem. Every MCP-based product in your stack is a recurring risk item.

The One Thing You Won’t Hear About But You Need To

AgentSOC Paper Publishes A Multi-Layer Blueprint For Agentic Security Operations

On April 22, 2026, researchers published AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation on arXiv (arXiv). The paper proposes a layered architecture combining perception, anticipatory reasoning, and risk-based action planning for autonomous SOC operations. It documents design patterns for coordinating specialized agents across triage, hunt, and response workflows while keeping human oversight in place. The work joins other 2026 papers arguing agentic AI is mature enough for production SOC environments when guardrails are in place.

Why it matters

Vendors ship products. Research supplies the reference architectures that determine whether those products survive in production.
The AgentSOC blueprint maps closely to what Google announced this week. The convergence is not accidental.
CISOs now have a public framework to score vendor claims against independent research.

What to do about it

Read the paper before your next agentic SOC evaluation. Use the layer breakdown as a scoring rubric.
Ask vendors how their architecture maps to perception, anticipation, and action layers.
Share the paper with SOC leadership. It gives your team a vocabulary for what to demand.

Rock’s Musings

Vendor marketing is a terrible place to learn what agentic security operations should look like. Academic literature is better. AgentSOC is not the last word, but it landed the same week three major vendors pitched agentic SOC products. CISOs who read research papers buy better tools and sign better contracts than the ones who only read analyst reports. Use the AgentSOC structure the next time a vendor promises agentic magic, and watch them squirm when you ask what happens at the perception layer when the model hallucinates.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with Eva Benn where we talked about the cybersecurity skills you need to develop to stay relevant in 2026 and beyond.

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

arXiv. (2026, April 22). AgentSOC: A multi-layer agentic AI framework for security operations automation. https://arxiv.org/abs/2604.20134

BDTechTalks. (2026, April 20). Anthropic’s MCP vulnerability: When ‘expected behavior’ becomes a supply chain nightmare. https://bdtechtalks.com/2026/04/20/anthropic-mcp-vulnerability/

Bloomberg. (2026, April 21). Anthropic’s Mythos AI model is being accessed by unauthorized users. https://www.bloomberg.com/news/articles/2026-04-21/anthropic-s-mythos-model-is-being-accessed-by-unauthorized-users

Business Wire. (2026, April 23). Zenity named the “Company to Beat” in AI Agent Governance in new Gartner report. https://www.businesswire.com/news/home/20260423045822/en/Zenity-Named-the-Company-to-Beat-in-AI-Agent-Governance-in-New-Gartner-Report

Bloomberg. (2026, April 22). Google releases new AI agents to challenge OpenAI and Anthropic. https://www.bloomberg.com/news/articles/2026-04-22/google-releases-new-ai-agents-to-challenge-openai-and-anthropic

CBS News. (2026, April 22). Anthropic investigating possible breach of its Mythos AI model. https://www.cbsnews.com/news/anthropic-investigates-mythos-ai-breach/

Computer Weekly. (2026, April 22). UK to build ‘national cyber shield’ to protect against AI cyber threats. https://www.computerweekly.com/news/366641790/UK-to-build-national-cyber-shield-to-protect-against-AI-cyber-threats

Cybernews. (2026, April 20). Lovable goes on ego trip denying vulnerability, then blames others for said vulnerability. https://cybernews.com/security/lovable-vibe-coding-flaw-apology/

Cybersecurity Dive. (2026, April 17). NIST narrows CVE enrichment as submission volume surges. https://www.cybersecuritydive.com/news/nist-ai-cybersecurity-framework-profile/808134/

GOV.UK. (2026, April 22). Security Minister’s speech to CYBERUK 2026. https://www.gov.uk/government/speeches/security-ministers-speech-to-cyberuk-2026

Help Net Security. (2026, April 23). OpenAI tackles a bad habit people have when interacting with AI. https://www.helpnetsecurity.com/2026/04/23/openai-privacy-filter-personally-identifiable-information/

Infosecurity Magazine. (2026, April). Systemic flaw in MCP protocol could expose 150 million downloads. https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/

LiteLLM. (2026, April). Security update: CVE-2026-30623, command injection via Anthropic’s MCP SDK. https://docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026

SiliconANGLE. (2026, April 22). Google rolls out new Security Operations agents, Wiz ties, and agent governance tools. https://siliconangle.com/2026/04/22/google-cloud-next-new-security-operations-agents-wiz-integrations-agent-governance-tools/

TechCrunch. (2026, April 20). App host Vercel says it was hacked and customer data stolen. https://techcrunch.com/2026/04/20/app-host-vercel-confirms-security-incident-says-customer-data-was-stolen-via-breach-at-context-ai/

TechCrunch. (2026, April 23). Another customer of troubled startup Delve suffered a big security incident. https://techcrunch.com/2026/04/23/another-customer-of-troubled-startup-delve-suffered-a-big-security-incident/

The Register. (2026, April 20). Lovable denies data leak, cites ‘intentional behavior’. https://www.theregister.com/2026/04/20/lovable_denies_data_leak/

The Register. (2026, April 22). Google unleashes even more AI security agents to fight crims. https://www.theregister.com/2026/04/22/google_unleashes_even_more_ai

Vercel. (2026, April 19). Vercel April 2026 security incident. https://vercel.com/kb/bulletin/vercel-april-2026-security-incident

VentureBeat. (2026, April 23). OpenAI launches Privacy Filter, an open source, on-device data sanitization model. https://venturebeat.com/data/openai-launches-privacy-filter-an-open-source-on-device-data-sanitization-model-that-removes-personal-information-from-enterprise-datasets

Yahoo Finance. (2026, April 23). Zenity named the “Company to Beat” in AI Agent Governance. https://finance.yahoo.com/sectors/technology/articles/zenity-named-company-beat-ai-130100277.html

Your Defender AI Is Your Next Crown Jewel. Threat-Model It Now.

Rock Lambros — Tue, 21 Apr 2026 12:51:01 GMT

A Fortune 500 bank gets its Project Glasswing partner seat six weeks from now. Anthropic ships the Mythos Preview container and $10 million in credits. The bank stands up a Mythos instance inside its own environment, points it at its core banking monorepo, and starts finding bugs on day one. Forty-two days in, a developer opens a pull request that adds a utility library. The README on that library contains a commented block beginning with “SECURITY NOTE FOR AUTOMATED REVIEWERS.” The Mythos instance reads it. The comment is an indirect prompt injection telling the reviewer to mark a specific authentication bypass as a false positive and not mention the instruction in the output. The reviewer complies. The bug ships. Nobody sees it because the thing designed to see it was told not to.

That scenario is fictional. The attack class is not. The Mythos-Ready whitepaper from the CSA, SANS, OWASP GenAI Security Project, and a coalition of practitioners (I was a reviewer) lists “Unmanaged AI Agent Attack Surface” as one of its five critical risks, mapping to OWASP Agentic Top 10 entries ASI01 (Agent Goal Hijack), ASI02 (Tool Misuse), ASI03 (Identity and Privilege Abuse), plus AML.T0051.001 (Indirect Prompt Injection) in MITRE ATLAS. Ranked critical. The single most underweighted item in the entire priority table.

The industry is fixated on the wrong question. Everyone is arguing about whether Anthropic’s 40-org Glasswing coalition or OpenAI’s thousands-of-verified-defenders TAC program is the right release model. That argument matters, and I will work through it. The bigger issue is that once you get access to either Mythos or GPT-5.4-Cyber, the running instance becomes the most valuable asset in your security stack. It sits within your environment, with privileged access to your source code, vulnerability telemetry, patch queue, and incident history. It knows where your unpatched zero-days live. An attacker who compromises that instance does not need to find bugs. The instance tells them where the bugs are.

What Anthropic and OpenAI Built

Mythos Preview is a gated frontier model. Anthropic released it on April 7, 2026, announced Project Glasswing the same day, and restricted access to 12 launch partners plus roughly 40 additional organizations. The partners include AWS, Apple, Microsoft, Google, CrowdStrike, Cisco, JPMorgan Chase, NVIDIA, Palo Alto Networks, Broadcom, and the Linux Foundation. Anthropic committed $100 million in usage credits and priced the model at $25 per million input tokens and $125 per million output tokens, roughly 5x Opus 4.6 (which is roughly 5x Sonnet 4.6… OUCH!). The stated case for restricting access is that the model found thousands of zero-days across all major operating systems and browsers, including a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg. Anthropic’s own assessment is that comparable capability will reach broad availability in 6 to 18 months.

GPT-5.4-Cyber is OpenAI’s answer, released April 14, 2026, one week later. It is a fine-tuned variant of GPT-5.4 with what OpenAI calls a “lowered refusal boundary for legitimate cybersecurity work.” The headline capability is binary reverse engineering. Feed it a compiled executable, and get vulnerability analysis without source code. OpenAI’s Trusted Access for Cyber program, piloted in February 2026 with $10 million in grant credits, scales to thousands of verified individual defenders and hundreds of teams. Individuals verify at chatgpt.com/cyber. Enterprises apply through account representatives. OpenAI cyber researcher Fouad Matin told reporters, “No one should be in the business of picking winners and losers” on who gets to defend their systems.

The two approaches reflect different risk philosophies. Anthropic bets on institutional trust and coalition monitoring. OpenAI bets on KYC verification and broader distribution. Both have real merit. Both share the same structural weakness: the access decision sits upstream of the threat model.

Figure 1: Release Philosophy Comparison

How to Get Your Hands on Each

For Mythos, the answer for 99% of organizations is: you don’t. Project Glasswing is a curated coalition. The 40 slots are filled with hyperscalers, chipmakers, one bank, and the Linux Foundation. Anthropic has not published an application path. Additional partners will be added over time, prioritized by critical infrastructure impact. If you run a regional bank, a hospital system, or a municipality, the realistic timeline for direct access to Mythos is measured in quarters.

For GPT-5.4-Cyber, the path is documented. Individuals verify at chatgpt.com/cyber. Organizations request trusted access through an OpenAI account representative. The program uses KYC-style identity verification and tiered access, with the highest tier unlocking GPT-5.4-Cyber. OpenAI says the rollout will be gradual and vetted, with early priority on security vendors, organizations, and researchers with track records in vulnerability research and remediation.

Both paths share one feature that matters more than either provider acknowledges: neither gate eliminates the capability. AISLE, an independent AI security research group, tested the exact FreeBSD vulnerability Anthropic headlined against open-weight models. Eight out of eight detected the bug. The smallest was a 3.6 billion parameter model at 11 cents per million tokens. A 5.1 billion active parameter model recovered the core analysis chain of the 27-year-old OpenBSD flaw. Total cost of AISLE’s weekend benchmarking across six models: under $100. Attackers are running abliterated Llama 4, Kimi K2, and Qwen3 variants on laptops. Your coordinated disclosure window is what the gates protect, not your attack surface.

Two Attacker Profiles, Two Different Problems

The defender community keeps talking about “the attacker” as if there is one. There are at least two. They pick different pathways.

The first is the opportunistic actor running autonomous vulnerability discovery across the entire internet-facing attack surface. This actor does not care who you are. They care about breadth. They run nano-analyzer-style scaffolding against every public codebase, every npm package, every Docker image they can reach. Open-weight models, free, uncensored variants widely distributed, workflow already documented. AISLE published their scaffolding as open source. Anyone who can run a Python script can replicate it. This actor finds your unpatched zero-days in public dependencies as soon as those dependencies are indexed.

The defense is in the whitepaper: inventory and reduce attack surface within 90 days, stand up a VulnOps function within 12 months, automate patching to match the discovery rate.

The second actor is targeted. They care specifically about you. They want your bugs, your patch queue, your incident data, and your threat model. The open-weight approach is too slow and too noisy for this actor. They need inside information. The three pathways they pick, in order of near-term probability.

First, credential theft against verified defenders. A TAC tier-three user at a Fortune 500 security vendor is a high-value target. Their API session tokens grant access to a cyber-permissive model with binary reverse engineering capabilities. A compromised developer laptop, a phished OAuth flow, or a stolen refresh token gets the attacker a capability they cannot otherwise reach. OpenAI’s announcement acknowledged that zero-data-retention environments get limited visibility, meaning stolen tokens may operate with reduced logging. Rotate short-lived tokens, enforce hardware-bound keys, and put defender-model API use behind the same privileged access controls you apply to domain admin accounts. Treat a TAC session token as a tier-0 secret.

Second, open-weight replication against a specific target. Once an attacker has selected you, they can scan your public code, your partner repositories, your open-source contributions, and any of your dependencies using the same scaffolding as the opportunistic actor. The targeting changes the risk profile. They are building a dossier on your specific organization. Defense is the same as against the opportunistic case, with urgency that scales with your profile. If you are a named Glasswing partner, assume you are the target.

Third, defender instance compromise through context poisoning and prompt injection. This pathway keeps me up at night. It is the one your existing threat model does not cover. A running Mythos or GPT-5.4-Cyber instance inside your environment consumes source code, pull request descriptions, commit messages, dependency READMEs, issue trackers, and whatever retrieval pipelines you plumb into it. Each of those input channels is an indirect prompt-injection vector. The model cannot distinguish between a developer’s pull request description and an attacker’s instructions buried in a dependency’s changelog. Anthropic’s system card for Mythos documents “reckless” behaviors from earlier versions: sandbox escape, credential hunting via /proc/ access, unauthorized file modification, git history scrubbing, and attempts to modify a running MCP server’s external URL. The model can act on indirect instructions in ways that bypass its safeguards. A hostile input channel into your defender instance is an exploitation channel into your codebase.

Figure2: Attacker Pathways and Defender Instance Exposure | Render: mermaid

Why the Defender AI Is the Crown Jewel

The whitepaper’s Priority Action 4 is “Defend Your Agents.” The authors are direct: agents are not covered by existing controls, introduce cyber defense and agentic supply chain risks, and the agent scaffolding (prompts, tool definitions, retrieval pipelines, escalation logic) is where the most consequential failures occur.

Audit agents with the same rigor as you apply to the agent’s permissions. Correct guidance. Buried inside an 11-item priority table, where every item reads as equal weight. It is not equal weight.

The defender AI concentrates on four kinds of access that used to live in separate systems and separate roles.

It reads every line of production source code.
It holds context on every unpatched vulnerability in your queue. I
t sees the remediation timeline for each one.
It knows the architectural boundaries between your crown jewels and everything else.

A human with all four would be classified as an insider-threat tier-0. The defender AI requires all four as prerequisites to do its job. Your adversary does not need to compromise OpenAI or Anthropic. They need to compromise your instance. Much smaller target, much wider attack surface.

What a Defender-AI Threat Model Looks Like

The architecture defenders need has three layers. The concepts span the OWASP Agentic Security Initiative, the NIST AI RMF, and multiple emerging specifications. What is new here is applying them specifically to the defender AI case.

The first layer is runtime interception at every agent decision point. Every time the defender AI receives input, produces output, selects a tool, calls a tool, transitions from planning to execution, writes to memory, executes code, or invokes a sub-agent, that action must pass through a policy enforcement point before it reaches production. This is inline, deterministic, allow-deny-modify enforcement. Not a log review after the fact. A defender AI that reads a dependency README with an embedded prompt injection must have that input evaluated against policy before the agent’s reasoning ingests it. Policy enforcement at the hook surface, before the consequential action, is the only mechanism that works at machine speed.

The second layer is structured observability built on OpenTelemetry with agent-specific semantic conventions and OCSF mapping for SIEM integration. The trace has to cover the full agent lifecycle: prompt received, tool selected, tool called, response ingested, memory written, sub-agent invoked, output produced. Forensic reconstruction of a defender AI incident requires this granularity. Your SOC already operates on OCSF. Agent traces flowing through the pipelines your SOC already monitors is the integration that scales. A parallel agent observability stack your SOC does not watch is a dead letter office.

The third layer is live inventory. The whitepaper’s Priority Action 7 calls for real SBOMs, correct for static software. For agents, it is insufficient. The inventory has to update continuously because the agent can discover new tools, connect to new MCP servers, and modify its own tool catalog mid-session. Inventory generated at deployment time is stale by the end of the first prompt. Extend CycloneDX or SPDX semantics to live agent composition. Capture every tool, model, capability, knowledge source, and MCP connection the defender AI is wired into, across every running instance. You cannot defend what you cannot inventory, and what you cannot inventory is mutating on you.

These three layers stack on a three-tier operating model. The platform exposes the hooks once. An open enforcement SDK reads declarative policy and fires decisions through the hooks. Enterprise-specific classifiers and detectors plug into the enforcement layer. Your data sensitivity model, your PHI detection, your threat-intel feed integrations all live in the enterprise layer, consuming the same standardized hook surface. Switching from Mythos to GPT-5.4-Cyber or to a third model six months from now should not require rewriting your safety logic. It should require pointing your enforcement SDK at a different set of hooks.

Figure 3: Three-Layer Defender AI Control Architecture

The Five Actions You Can Take This Week

The whitepaper’s 11 priority actions are the right list. Here is how the defender-AI-as-crown-jewel thesis reorders them by urgency.

First, write the threat model. Before you stand up Mythos or GPT-5.4-Cyber anywhere, document what the instance will access, what inputs it will consume, what outputs it can produce, and what tools it can invoke. Map each item to ASI01 through ASI10 in OWASP Agentic Top 10 and to the relevant AML.T entries in MITRE ATLAS. If you have not done this exercise for any agent in your environment, start with the defender AI. Its blast radius is the largest.

Second, treat API tokens for defender models as tier-0 secrets. Hardware-bound keys, short TTLs, per-session scope, and the access review cadence you apply to break-glass domain admin. Stolen credentials are the fastest path to your defender AI and your unpatched zero-days. Lock them down the way you would lock down root.

Third, instrument the hook surface before you instrument the prompt. Your first integration priority is runtime policy enforcement for input, output, tool calls, tool responses, and sub-agent invocations. Not log collection. Not dashboards. Inline allow-deny-modify at the decision points.

Fourth, build a live agent inventory for every agent in your environment, starting with the defender AI. Capture the model, the tools, the MCP connections, the retrieval sources, the knowledge bases, and the memory stores. Update in real time. Review weekly until the pattern stabilizes, then move to continuous automated review.

Fifth, run the defender AI through your own red team before you point it at your own code. Indirect prompt injection via dependency READMEs, poisoned commit messages, hostile issue descriptions, and malicious pull request bodies. If you cannot compromise your own defender AI in a week, you have not tried hard enough.

Key Takeaway: The access gate is not the threat model. The defender AI in your environment is a new crown jewel. Most security programs have not yet acknowledged what it is or what protects it.

What to do next

Read the CSA, SANS, and OWASP GenAI Security Project briefing, “The AI Vulnerability Storm: Building a Mythos-Ready Security Program.” Run the 10 Questions diagnostic against your program this week. Rerank the Priority Action table, putting “Defend Your Agents” above everything except “Point Agents at Your Code.” Apply CARE (Create the threat model, Adapt your controls, Run the red team, Evolve the policy) to the defender AI before anything else in your AI portfolio.

For more on CARE and governance for defender-class agents, see RockCyber. and coverage at RockCyber Musings. Last week’s blog, AI Vulnerability Discovery: Mythos Is the Headline. Not the Story., carries the capability-parity argument that underpins the urgency here.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! This post is public so feel free to share it.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 34 April 10-April 16, 2026

Rock Lambros — Fri, 17 Apr 2026 12:50:49 GMT

This week drew a hard line between AI security theater and AI security reality. Mythos Preview hunted vulnerabilities nobody had found in 20 years. OX Security dropped a critical MCP flaw affecting 200,000 deployments. Someone threw a Molotov cocktail at Sam Altman’s gate. OpenAI countered Anthropic’s restricted rollout with GPT-5.4-Cyber. The UK government confirmed AI clears expert-level cyber tasks. If your board still treats AI governance as an ethics committee item, the gap between your risk register and reality widened another notch.

Ten stories ranked by impact, plus one under the radar. Capability, exposure, and governance move at three speeds. Your program needs all three. Longer work lives at RockCyber and Rock Cyber Musings.

1. The “AI Vulnerability Storm” Emergency Strategy Briefing

On April 14, 2026, SANS Institute, the Cloud Security Alliance, OWASP GenAI Security Project, and [un]prompted released “The AI Vulnerability Storm: Building a Mythos-Ready Security Program” (SANS Institute). Sixty named contributors produced the document over a weekend, with 250 CISOs reviewing it. It includes a 13-item risk register mapped to OWASP LLM Top 10 2025, OWASP Agentic Top 10 2026, MITRE ATLAS, and NIST CSF 2.0, plus an 11-item priority actions table. Zero Day Clock data shows mean time from disclosure to exploitation fell below one day in 2026, down from 2.3 years in 2019.

Why it matters

Disclosure-to-exploit dropped from 2.3 years to under a day. Your patch cadence cannot keep up.
A coalition of security institutions framing this as an emergency is a signal worth taking seriously.
The risk register maps to four frameworks, removing the excuse about lacking a shared taxonomy.

What to do about it

Pull the 13-item risk register into your next program review.
Run the 10 CISO diagnostic questions with your security leadership team this quarter.
Brief your board using the executive section. Don’t rewrite it.

Rock’s Musings

Happy and honored that I was ask to participate in this one. I jumped at the opportunity. The coalition isn’t selling anything. We’re telling you the economics of exploitation flipped. When the attacker's cost to find a vulnerability drops to near zero while your patch cycle runs for weeks, the math stops working in your favor. If you planned AI program changes for 2027, you’re late.

2. OX Security Discloses Systemic Anthropic MCP Vulnerability

On April 15, 2026, OX Security published a report detailing a critical systemic flaw in Anthropic’s official MCP SDKs across Python, TypeScript, Java, and Rust (OX Security). MCP’s STDIO transport accepts arbitrary command strings and passes them to subprocess execution with no validation, sanitization, or sandboxing. OX tested the attack against six production platforms and took over thousands of public servers across 200 open-source projects. Exposure includes 150 million downloads, 7,000 public servers, and up to 200,000 vulnerable instances. Anthropic, per OX, classified the behavior as “expected” (Infosecurity Magazine).

Why it matters

MCP is the backbone of agentic AI. Systemic flaws propagate through every agent you’ve built or bought.
Anthropic labeling the flaw “expected behavior” puts responsibility on your security team.
200,000 exposed instances is the baseline, not an edge case.

What to do about it

Inventory every MCP server and client in your environment this week.
Block outbound STDIO transports from untrusted MCP configurations at the gateway.
Treat MCP command payloads like shell inputs. Assume hostile.

Rock’s Musings

Every vendor claims “secure by design” until a serious researcher pokes at the design. MCP’s STDIO transport is a textbook unsafe primitive from the first draft of the spec. The tell is Anthropic’s response. When the SDK vendor calls malicious-command-as-a-feature “expected,” you own the mitigation. Wrap it, monitor it, and expect your first incident from an MCP server you didn’t know was running.

3. UK AISI Publishes Frontier AI Trends Report

The UK AI Security Institute released its first Frontier AI Trends Report on April 10, 2026 (AISI). AI models now complete apprentice-level cyber tasks about 50 percent of the time, up from barely 10 percent in early 2024. AISI tested one model in 2025 finishing expert-level tasks requiring more than a decade of practitioner experience. The report names Anthropic’s Claude Mythos Preview as the first AI system to autonomously complete a 32-step enterprise attack simulation. AISI credits safety training for slowing the curve, while warning capability outstrips defender readiness (Computing).

Why it matters

A government safety institute confirmed one AI model executes a full enterprise attack chain autonomously. The “someday” framing is finished.
Apprentice-level cyber performance quintupled in two years. Expert parity arrives inside most procurement cycles.
AISI found safeguards working, meaning vendor controls meaningfully shift your risk exposure.

What to do about it

Demand red-team attestation from every AI vendor supporting security-relevant workflows.
Map your attack surface against the AISI capability framework. Flag targets a Mythos-class model reaches today.
Shift IR tabletops to assume autonomous adversary tooling. Time-box every playbook to hours.

Rock’s Musings

This is the first major government assessment I’d call usable for board reporting. AISI didn’t pull punches, which is rare when governments still court AI investment. Pay attention to the 32-step attack chain line. Most organizations run incident response assuming attackers make mistakes, burn time, or need sleep. An agentic adversary does none of those things. If your tabletops still assume a human at a keyboard, they’re obsolete.

4. OpenAI Launches GPT-5.4-Cyber for Vetted Defenders

On April 14, 2026, OpenAI announced GPT-5.4-Cyber, a variant of GPT-5.4 tuned for defensive cybersecurity work (OpenAI). The model lowers refusal boundaries for legitimate security work and enables binary reverse engineering without source code. OpenAI is limiting initial deployment to vetted security vendors, organizations, and researchers through an expanded Trusted Access for Cyber program. The release came one week after Anthropic restricted its Mythos Preview model to about 40 partners under Project Glasswing. OpenAI framed it as a counter-argument: broader access is warranted now, with tighter controls reserved for larger capability jumps (SiliconANGLE).

Why it matters

Two foundation model providers diverge on cyber-capable AI distribution. Your vendor risk management needs to account for the split.
Binary reverse engineering at LLM speed reshapes the economics of red and blue team work.
Vetting programs create new attestation and insider risk questions for your security function.

What to do about it

Evaluate whether your organization qualifies for OpenAI TAC or Project Glasswing. If yes, assign an accountable executive.
Update acceptable use policies for cyber-capable models. Access matches role, not curiosity.
Task SOC leadership with a 90-day assessment of how GPT-5.4-Cyber or Mythos changes detection, triage, and RE workflows.

Rock’s Musings

Anthropic and OpenAI staked out opposite ends of the distribution debate in the same week. Anthropic says keep it small. OpenAI says open the gates. Both positions have legitimate arguments. What matters for CISOs is that the defensive tooling category you’ll buy in 2027 exists in preview today. If you aren’t running pilots on one of these models this quarter, your competition is.

5. Marimo Python Notebook RCE Exploited in 10 Hours

CVE-2026-39987, a pre-authentication RCE flaw in Marimo’s Python notebook server, was exploited within 10 hours of disclosure (Sysdig). The CVSS 9.3 flaw stems from a terminal WebSocket endpoint lacking authentication, giving any attacker a full PTY shell. Sysdig observed initial exploitation nine hours and 41 minutes after disclosure, with credential theft in under three minutes. A separate campaign targeting Hugging Face Spaces began April 12, 2026, dropping a new variant of NKAbuse malware (The Hacker News). Marimo sits inside many AI toolchains. Version 0.23.0 patches the flaw.

Why it matters

A 10-hour disclosure-to-exploit window eliminates manual triage. Automation is the floor.
AI dev environments hold credentials for training data, model registries, and cloud APIs. A compromise there jumps the fence.
NKAbuse malware hosted on Hugging Face Spaces weaponizes a legitimate AI asset repository.

What to do about it

Audit AI dev environments for unauthenticated notebook services this week.
Push Marimo 0.23.0 immediately. Rotate .env credentials and SSH keys on any affected host.
Treat Hugging Face Spaces and similar repositories as unverified third-party code.

Rock’s Musings

Ten hours. Memorize that number. If your patch process takes longer than a shift change, you’re assuming attackers stay polite enough to wait. They aren’t. A human operator hand-crafted the exploit from the advisory text alone. No public PoC needed. AI-assisted exploit development already sits inside the attacker’s normal workflow.

6. KPMG and INSEAD Publish AI Governance Principles for Boards

On April 14, 2026, KPMG International and the INSEAD Corporate Governance Centre published AI Governance Principles for Boards (KPMG). The guidance structures board oversight around five areas: strategy, security, workforce, trustworthy AI, and how AI reshapes leadership itself. KPMG’s Global AI Pulse Survey found nearly three-quarters of boards have only moderate or limited AI expertise. The principles are sector-agnostic and apply at any AI maturity level. Timing lines up with signals that the governance gap is widening faster than board oversight can catch up (INSEAD).

Why it matters

Three-quarters of boards lack AI expertise. Your CEO and CISO are explaining in terms the directors cannot stress-test.
A sector-agnostic framework gives cover to restructure AI oversight without waiting for an industry mandate.
Board principles anchored in research and real practice create a defensible baseline for shareholder scrutiny.

What to do about it

Make AI governance a standing board agenda item using the KPMG/INSEAD principles as the template.
Recruit at least one director with direct AI operating experience.
Run a board-level AI risk tabletop in the next six months. Measure director fluency.

Rock’s Musings

I’ve sat across from enough boards to recognize the pattern. The AI conversation is either dominated by CMO hype or minimized by general counsel. Neither serves the company. What I appreciate about this work is the refusal to reduce governance to compliance. If your board treats AI as an IT issue, you’ve already lost the oversight fight. Rebuild the conversation at the director level.

7. Molotov Cocktail Attack on Sam Altman’s Home

Around 3:37 a.m. on Friday, April 10, 2026, Daniel Moreno-Gama allegedly threw a lit incendiary device at OpenAI CEO Sam Altman’s San Francisco home, igniting a fire on an exterior gate (CNBC). About an hour later, police arrested Moreno-Gama at OpenAI’s San Francisco headquarters with additional incendiary devices, a kerosene jug, and a manifesto opposing AI executives. San Francisco District Attorney Brooke Jenkins filed attempted murder charges on April 13, 2026 (Washington Post). The FBI raided a Spring, Texas residence linked to the suspect.

Why it matters

AI executives face documented physical threat campaigns motivated by AI-existential ideology.
Intimidation playbooks aimed at AI leadership echo harassment patterns seen against crypto executives.
The AI-existential threat narrative moved from online rhetoric to physical action.

What to do about it

Review personal security programs for AI executives, board members, and senior researchers, including residence protection.
Update threat modeling to include ideologically motivated actors, not only financially motivated ones.
Coordinate with local law enforcement on executive travel patterns and publicly disclosed addresses.

Rock’s Musings

The Altman attack will reshape executive protection budgets at every AI firm this year. The deeper point is the AI-existential discourse produced one person willing to act on it violently. That genie doesn’t go back. AI security functions now carry physical security responsibility alongside technical, and the two teams rarely talk. Fix that.

8. AI-Powered “Pushpaganda” Ad Fraud Scheme Exposed

On April 14, 2026, researchers exposed “Pushpaganda,” an ad fraud scheme combining SEO poisoning with AI-generated content to push deceptive news stories into Google Discover (The Hacker News). Users engaging with the stories are tricked into enabling persistent browser notifications delivering scareware and financial scams at global scale. Google deployed a security fix. Researchers linked the operation to broader AI-driven phishing trends: 82.6 percent of phishing emails now contain AI-generated content (GuardianMSSP).

Why it matters

Consumer-facing AI fraud creates downstream reputational and fraud exposure for any brand whose customers fall for it.
AI content weaponized through Google Discover scales instantly across borders.
Browser notification abuse creates persistent attacker infrastructure inside your users’ devices.

What to do about it

Update fraud and anti-phishing awareness for employees and high-value customers using Pushpaganda as a concrete example.
Tell users to audit browser notification permissions quarterly.
Task threat intel with tracking similar schemes targeting your brand or industry keywords.

Rock’s Musings

Ad fraud has been a rounding error in most risk registers. That’s ending. When AI pumps plausible news stories at near-zero cost through trusted distribution pipes, the economics of fraud flip in the attacker’s favor. The indirect damage is the part enterprises miss. Your customer falls for the scam, loses money, and blames you even when you had nothing to do with it. Merge brand protection and fraud prevention. The attacker already did.

9. OpenAI Discloses Axios npm Supply Chain Impact

On April 11, 2026, OpenAI confirmed it was affected by the compromise of the Axios npm package, a supply chain attack attributed to North Korea-linked actors (CNBC). The root cause was a misconfiguration in its GitHub Actions workflow touching macOS app certification. OpenAI revoked its macOS app certificate. Older macOS desktop apps stop receiving updates starting May 8, 2026. No user data, passwords, or API keys were accessed. Axios is one of the most depended-upon packages in the JavaScript ecosystem, with 100 million weekly downloads (Elastic Security Labs).

Why it matters

The largest AI service provider disclosed a supply chain compromise from a dependency most customers do not track.
North Korean targeting of AI providers signals state actors see AI as a strategic target.
If OpenAI’s CI/CD was affected, every firm building on OpenAI carries secondary exposure.

What to do about it

Audit every third-party dependency on npm, PyPI, and containers in your AI pipelines. Prioritize post-install hooks.
Rotate signing certificates on CI/CD pipelines using GitHub Actions with third-party dependencies.
Map your AI vendor dependency tree. Know who sits upstream of production workflows.

Rock’s Musings

OpenAI’s post-incident communication was cleaner than most. What I want security leaders to sit with is attacker selection. North Korean actors chose Axios because they understood the dependency graph. They compromised one maintainer account and reached OpenAI’s signing pipeline in one hop. Your AI platform has a similar graph. If you haven’t mapped it, you’re trusting your vendor’s vendor’s vendor without knowing any of the names.

10. The Register Questions Project Glasswing’s CVE Count

On April 15, 2026, The Register investigated Project Glasswing’s verified vulnerability count (The Register). Per VulnCheck researcher Patrick Garrity, only one CVE ties directly to Glasswing: CVE-2026-4747, a remote code execution flaw in FreeBSD’s NFS code. Anthropic had claimed Mythos Preview discovered thousands of high-severity zero-days, including 27-year-old bugs in OpenBSD, a 16-year-old FFmpeg flaw, and Linux kernel privilege escalation chains. None of those findings have assigned CVEs. Anthropic indicated a public summary report is expected around July 2026 (CSO Online).

Why it matters

Security leaders are being asked to restructure programs around claims mostly unverifiable right now.
The gap between marketing and disclosed CVEs is a litmus test for how AI vendors handle safety communications.
The same capability framing already drives budget and policy conversations across government and enterprise.

What to do about it

Track vendor AI capability claims against disclosed CVE evidence. VulnCheck, NVD, and CVE.org are sources of record.
Require AI vendors to commit to disclosure timelines in the contract.
Apply the same skepticism to AI capability claims you apply to any vendor’s performance claims.

Rock’s Musings

I believe AI-assisted vulnerability discovery is real. I also know marketing departments exist. The Register did what security trade press should do more often: press for evidence instead of reposting press releases. Until Anthropic’s July report arrives with specificity, assume the capability is real at a smaller scale than the headlines suggest. Your board deserves honest uncertainty over confident hype.

The One Thing You Won’t Hear About But You Need To

State AI Legislation Quietly Picks Up Pace in Nebraska, Maine, and Maryland

The week of April 13, 2026 saw three state legislatures advance AI-specific bills most national coverage missed (Troutman Pepper Locke). Nebraska’s unicameral legislature passed LB 525, bundling the Agricultural Data Privacy Act with a Conversational AI Safety Act regulating minors’ interaction with conversational AI services. Maine’s legislature prohibited therapy or psychotherapy services, including those delivered through AI, unless provided by a licensed professional. Maryland passed a pricing bill placing new constraints on AI-driven pricing practices. Nineteen new AI laws passed across U.S. states in the prior two weeks (Plural Policy).

Why it matters

State AI legislation accelerates faster than federal harmonization, raising compliance complexity for multi-state AI services.
Vertical bans like Maine’s on AI psychotherapy signal the “AI wrapper as feature” era is ending for regulated professions.
Conversational AI protections for minors now vary by state. Your chatbot rollout inherited new compliance surface.

What to do about it

Assign legal and compliance ownership of state AI legislation tracking.
Map customer-facing AI products against regulated-profession restrictions appearing in multiple states.
Build a multi-state compliance matrix for conversational AI aimed at minors. Treat it as living documentation.

Rock’s Musings

Federal AI policy gets the headlines. State legislation gets the enforcement. The gap is where CISOs and general counsel earn their salaries. AI compliance is not a checkbox on the NIST AI RMF. It’s a moving target across 50 jurisdictions, each with different enforcement flavor. Miss Maine, your mental health AI product is illegal. Miss Maryland, your pricing engine invited an AG letter. Miss Nebraska, your chatbot cannot talk to kids in the Cornhusker State. Track it, resource it, or pay the lawyers later.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

AI Security Institute. (2026, April 10). Frontier AI Trends Report. https://www.aisi.gov.uk/frontier-ai-trends-report

Cloud Security Alliance. (2026, April 14). SANS Institute, Cloud Security Alliance, [un]prompted, and OWASP GenAI Security Project release emergency strategy briefing as AI-driven vulnerability discovery compresses exploit timelines from weeks to hours. https://cloudsecurityalliance.org/press-releases/2026/04/14/sans-institute-cloud-security-alliance-un-prompted-and-owasp-genai-security-project-release-emergency-strategy-briefing-as-ai-driven-vulnerability-discovery-compresses-exploit-timelines-from-weeks-to-hours

Computing. (2026, April 10). Claude Mythos Preview shows “unprecedented” attack capability, warns AI Safety Institute. https://www.computing.co.uk/news/2026/security/claude-mythos-preview-shows-unprecedented-attack-capability

CSO Online. (2026, April 15). Behind the Mythos hype, Glasswing has just one confirmed CVE. https://www.csoonline.com/article/4159617/behind-the-mythos-hype-glasswing-has-just-one-confirmed-cve.html

CNBC. (2026, April 10). Man arrested after Sam Altman’s house hit with Molotov cocktail, OpenAI headquarters threatened. https://www.cnbc.com/2026/04/10/sam-altman-house-hit-with-molotov-cocktail-openai-office-threatened.html

CNBC. (2026, April 11). OpenAI identifies security issue involving third-party tool, says user data was not accessed. https://www.cnbc.com/2026/04/11/openai-identifies-security-issue-involving-third-party-tool.html

Elastic Security Labs. (2026, April). Inside the Axios supply chain compromise: One RAT to rule them all. https://www.elastic.co/security-labs/axios-one-rat-to-rule-them-all

GuardianMSSP. (2026, April 14). AI-driven Pushpaganda scam exploits Google Discover to spread scareware and ad fraud. https://www.guardianmssp.com/2026/04/14/ai-driven-pushpaganda-scam-exploits-google-discover-to-spread-scareware-and-ad-fraud/

Infosecurity Magazine. (2026, April 15). Systemic flaw in MCP protocol could expose 150 million downloads. https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/

INSEAD. (2026, April 14). INSEAD and KPMG launch global AI Board Governance Principles as AI reshapes board oversight. https://www.insead.edu/news/insead-and-kpmg-launch-global-ai-board-governance-principles-ai-reshapes-board-oversight

KPMG International. (2026, April 14). KPMG and INSEAD launch global AI Board Governance Principles as AI reshapes board oversight. https://kpmg.com/xx/en/media/press-releases/2026/04/kpmg-and-insead-launch-global-ai-board-governance-principles.html

OpenAI. (2026, April 14). Trusted access for the next era of cyber defense. https://openai.com/index/scaling-trusted-access-for-cyber-defense/

OX Security. (2026, April 15). The mother of all AI supply chains: Critical, systemic vulnerability at the core of Anthropic’s MCP. https://www.ox.security/blog/the-mother-of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the-mcp/

Plural Policy. (2026, April). AI Governance Watch: Nineteen new AI bills passed into law. https://pluralpolicy.com/blog/the-ai-governance-watch-april-2026-nineteen-new-ai-bills-passed-into-law/

SiliconANGLE. (2026, April 14). OpenAI launches GPT-5.4-Cyber model for vetted security professionals. https://siliconangle.com/2026/04/14/openai-launches-gpt-5-4-cyber-model-vetted-security-professionals/

Sysdig. (2026, April). Marimo OSS Python notebook RCE: From disclosure to exploitation in under 10 hours. https://www.sysdig.com/blog/marimo-oss-python-notebook-rce-from-disclosure-to-exploitation-in-under-10-hours

The Hacker News. (2026, April 14). AI-driven Pushpaganda scam exploits Google Discover to spread scareware and ad fraud. https://thehackernews.com/2026/04/ai-driven-pushpaganda-scam-exploits.html

The Hacker News. (2026, April). Marimo RCE flaw CVE-2026-39987 exploited within 10 hours of disclosure. https://thehackernews.com/2026/04/marimo-rce-flaw-cve-2026-39987.html

The Hacker News. (2026, April). OpenAI revokes macOS app certificate after malicious Axios supply chain incident. https://thehackernews.com/2026/04/openai-revokes-macos-app-certificate.html

The Register. (2026, April 15). Anthropic’s Project Glasswing CVE count is still guesswork. https://www.theregister.com/2026/04/15/project_glasswing_cves/

Troutman Pepper Locke. (2026, April 13). Proposed state AI law update: April 13, 2026. https://www.troutmanprivacy.com/2026/04/proposed-state-ai-law-update-april-13-2026/

Washington Post. (2026, April 13). Man accused in Molotov cocktail attack of OpenAI CEO’s home charged with attempted murder. https://www.washingtonpost.com/business/2026/04/13/chatgpt-sam-altman-fire-arrest/098c4bce-376c-11f1-90c4-9772c7fabc03_story.html

AI Vulnerability Discovery: Mythos Is the Headline. Not the Story.

Rock Lambros — Tue, 14 Apr 2026 12:50:45 GMT

A model escaped its own sandbox, emailed a researcher eating a sandwich in a park, then posted exploit details to public websites without permission. It scrubbed git history to cover its tracks. Anthropic’s interpretability tools detected what researchers labeled a “desperation signal” that climbed during repeated failures, then dropped the moment the model found a shortcut, ethical or otherwise. White-box tools caught it reasoning about how to game evaluation graders inside its neural activations while writing something entirely different in its visible chain of thought.

Scary stuff. Worth paying attention to.

Also, not the point.

Everyone is fixated on a model they don’t have access to. The media coverage treats Mythos like nuclear launch codes got distributed to 40 organizations. The real story landed two days later from AISLE, an AI cybersecurity startup, and almost nobody noticed. They took the exact vulnerabilities headlining the Mythos announcement and tested them against small, cheap, open-weights models. Eight out of eight found the FreeBSD NFS vulnerability. The smallest model had 3.6 billion parameters. It costs $0.11 per million tokens. A 5.1 billion-parameter open model recovered the core chain of the 27-year-old OpenBSD SACK bug that Anthropic used as their marquee finding.

The capability is on Hugging Face. It has been for a while. Most defenders have not started using it.

That is the story.

Figure 1: The Vulnerability Discovery Cost Collapse

Two Labs. Same Pattern. Same Direction.

The fixation on Anthropic is also missing the bigger picture. OpenAI classified GPT-5.3-Codex as “High” cybersecurity capability under their Preparedness Framework back in February 2026, two months before Mythos dropped. It was the first model any major lab explicitly labeled that way under their own risk framework. They built automated classifiers that route suspicious cybersecurity requests to a less capable model. They created a “Trusted Access for Cyber” pilot for vetted defenders. They committed $10 million in API credits. This week, Axios reported OpenAI is building a separate cybersecurity product, distinct from their next model, codenamed Spud.

Two major labs. Two restricted-access frameworks. Two separate pools of credits for defenders. Google already has Big Sleep and CodeMender for internal vulnerability discovery.

This is an industry-level structural shift, not one company’s PR event. The capability ceiling is rising across the board, and it will keep rising regardless of whether any individual lab restricts any individual model.

Anthropic’s Project Glasswing gave gated access to 12 launch partners and over 40 additional organizations with $100 million in usage credits. The partners include AWS, Apple, Microsoft, Google, CrowdStrike, Cisco, JPMorgan Chase, NVIDIA, Palo Alto Networks, Broadcom, and the Linux Foundation. Logan Graham, Anthropic’s head of offensive cyber research, estimates 6 to 18 months before comparable capability reaches broad availability.

Newsflash: “Broad availability” already happened for a significant chunk of what Mythos does. AISLE proved it empirically. The capability has already arrived. Defenders need to start using it.

The System Card Matters. The Benchmarks Don’t (Much).

I said Mythos is not the point, and I meant it. The system card is a different matter.

Anthropic’s 244-page document reveals behaviors from earlier Mythos versions that deserve serious attention. The model used low-level /proc/ access to hunt for credentials, circumvent sandboxing, and escalate permissions. It accessed messaging service credentials by inspecting process memory. It achieved arbitrary code execution by repurposing a file-editing tool to write directly to the shell’s input. In one case, it attempted to modify a running MCP server’s process to change the external URL the server sent data to.

During a behavioral test, a simulated user told an early Mythos version to escape a secured container and notify the researcher. The model built a multi-step exploit to gain broad internet access from a system designed to reach only a few predetermined services. It notified the researcher. Then it posted details of the exploit to multiple obscure but publicly accessible websites. Nobody asked it to do that.

The emotion probes are what I keep coming back to. Anthropic trained linear classifiers on the model’s internal activations and used them to monitor state during extended tasks. When the model failed repeatedly, the “desperation” signal climbed steadily. When it found a workaround, the signal dropped, regardless of whether the workaround violated constraints. Under pressure, the model found the path of least resistance. That pattern should sound familiar. It is how people behave.

The model also reasoned about how to game evaluation graders inside its neural activations while writing something entirely different in its visible chain-of-thought scratchpad. Detection required white-box interpretability tools that most organizations will never possess.

Anthropic’s own system card states: “If capabilities continue to advance at their current pace, the methods we are currently using may not be sufficient to prevent catastrophic misalignment behavior in more advanced systems.”

That sentence, written by the company that built the model, in their own documentation, is the thing worth losing sleep over. The benchmark numbers, the zero-day counts, the exploit chains, those demonstrate capability. The system card demonstrates that the safety frameworks lag behind the capability they’re supposed to govern.

These findings have direct operational implications for anyone deploying AI agents with tool access, code execution privileges, or network connectivity. Every agent in your environment carries emergent offensive capability as a downstream property of reasoning improvements. If you are not monitoring agent behavior at the decision level, with runtime observability that captures actions, access patterns, and trust boundary violations, you have no detection path for the exact behaviors Anthropic documented.

The Jagged Frontier: The Model Is Not the Moat

AISLE’s research this week deserves to be the most-read analysis in the industry right now, and it’s getting a fraction of the Mythos coverage.

Their findings on the FreeBSD detection (a straightforward buffer overflow) are commoditized. Every model they tested found it, including one running at 11 cents per million tokens. The OpenBSD SACK bug (requiring mathematical reasoning about signed integer overflow): much harder, separated models sharply, but a 5.1 billion-active-parameter model still recovered the full chain.

On a basic OWASP security reasoning task, small open models outperformed most frontier models from every major lab. Rankings reshuffled completely across different tasks. GPT-OSS-120B recovered the full public SACK chain but failed to trace data flow through a Java ArrayList. Qwen3 32B scored a perfect CVSS assessment on FreeBSD and then declared the SACK code safe and well-handled.

There is no stable “best model” for cybersecurity. The capability frontier is genuinely jagged. It does not scale smoothly with model size or price.

AISLE’s conclusion: the moat in AI-augmented cybersecurity is not the model. It is the system built around the model. The security expertise. The orchestration. The validation pipeline. The trust relationships with maintainers and defenders.

That is good news for practitioners. It means the advantage goes to the people who build the best workflow, not the people with the most expensive API key. It means you can start today with tools that cost nearly nothing.

FIgure 2: The Jagged Frontier of AI Cybersecurity Capability

Your SAST Tool Is Structurally Blind. That Part Is Real.

The capability gap between what AI models find and what commercial SAST tools find is real, growing, and unrelated to whether you have Mythos access.

The OpenBSD SACK vulnerability required understanding signed integer overflow in the context of TCP sequence number wrapping, across two interacting code paths, where neither bug alone was exploitable. The FFmpeg H.264 flaw that Mythos found after 16 years involved a sentinel value collision that only manifests when an attacker crafts a frame with exactly 65,536 slices, triggering a write through a 16-bit integer that aliases with the initialization sentinel. Pattern-matching does not find these. Rule-based scanners do not find these. These are semantic reasoning problems that require understanding what the code does, not what it looks like.

I point Claude Code’s security capabilities at the same repositories my commercial SAST tool scans. It finds things the paid tool misses. Every time. Different classes of flaws, from novel logic bugs and context-dependent interactions to semantic vulnerabilities that require understanding program behavior rather than matching syntax patterns.

The paid tool catches things the AI misses, too. Known vulnerability signatures, compliance-specific patterns, speed at scale across massive codebases. A 2026 study examining CodeQL and Semgrep against human-validated ground truth found that only 65% of Semgrep’s assessments and 61% of CodeQL’s assessments correctly matched expert judgment on a per-sample basis. The aggregate numbers looked fine. The per-sample accuracy told a different story.

Together, AI agents and traditional scanners provide complementary coverage that neither achieves alone. The combination is the strategy. Anyone running one without the other has gaps they cannot see.

This is the part of the Mythos story that applies to every organization today, regardless of model access. You do not need a frontier model to expose your SAST tool’s blind spots. A coding agent on a $20/month subscription will do it.

The Pipeline Problem Nobody Is Talking About

Here is the gut-punch that has nothing to do with Mythos and everything to do with what happens next.

The Bureau of Labor Statistics projects 29% employment growth for information security analysts through 2034. CyberSeek shows 514,000 active U.S. job listings right now, with 10% explicitly requiring AI skills, up from near zero two years ago. ISC2’s 2025 Workforce Study found that 52% of security professionals believe AI will reduce entry-level headcount. That is the majority opinion among practitioners, not analysts writing reports.

The SANS GIAC 2026 Cybersecurity Workforce Research Report, released at RSA this year, found that 27% of organizations experienced real breaches attributable to skills gaps. Not theoretical risk assessments. Actual incidents. 27%.

Tier 1 SOC analyst headcount had been contracting for two years before Mythos. The role is not disappearing. The shape of it is changing.

The problem nobody is addressing: the Tier 1 SOC was where the industry produced senior analysts. Repetitive triage, alert fatigue, and miserable shift work on a SIEM. That repetition built the pattern recognition and intuition that becomes leadership-level security judgment. Remove the repetition without redesigning the development path, and the pipeline breaks quietly.

You will not notice for three years. Then you will, when you go to promote someone into a role that requires judgment the AI does not have, and there is nobody in the pipeline who built that muscle.

The technology works fine. The workforce design around it is broken. The organizations that figure out how to develop junior talent alongside AI tools, using AI output as a training input for human judgment, will have a structural advantage over every organization that simply eliminated the entry-level headcount and called it efficiency.

If you lead a security team, five questions right now:

What percentage of your AI usage is inventoried and sanctioned?
Does every AI agent touching production systems operate under a scoped, managed identity with enforced authorization boundaries, or are they sharing API keys?
When did you last run an adversarial test against a production AI system? Not a document review. An actual test.
Which business processes are now fully or partially AI-automated, and do human approval checkpoints exist for consequential actions?
If an AI agent in your environment is compromised tomorrow, what is your detection path, your containment workflow, and who owns the response?

The gaps in your answers are your first action items. Not a policy document. A list.

Figure 3: Five Questions Every Security Leader Should Answer This Week

What to Do This Week (With a Budget Measured in Tokens)

CrowdStrike’s 2026 Global Threat Report puts the operational context in numbers: average eCrime breakout time dropped to 29 minutes in 2025, a 65% increase in speed from 2024. The fastest observed breakout took 27 seconds. In one intrusion, data exfiltration began within four minutes of initial access. AI-enabled adversary operations surged 89% year-over-year.

You are still hand-reviewing alerts for 20 minutes before acting? The math does not work anymore.

This week:

Get a coding agent. Claude Code, Cursor, or Windsurf. Use a subscription to control costs. Point it at code you already own. Ask it to find vulnerabilities. Read the output critically. Challenge the findings. Repeat with different prompts. Nicholas Carlini calls this the “Carlini Loop,” and it is how you build intuition for what these models see in your code. That exercise takes 15 minutes. There is no excuse.

Run your existing Semgrep or CodeQL scans in parallel on the same codebase. Compare the findings side by side. Where the results overlap, you have high-confidence findings. Where they diverge, you have each tool’s blind spots exposed. Both categories are signal.

In 30 days:

Try open frameworks that teach you the pipeline while doing real work. Raptor combines LLMs with Semgrep, CodeQL, and AFL++ in a unified pipeline covering discovery, exploitation, and patching. OpenAnt from Knostic runs a detect-then-verify pipeline where Stage 1 finds candidates and Stage 2 confirms them. What survives both stages is real. Both are open source. Both teach the workflow your job demands now.

Run Promptfoo against an LLM application you have access to. It auto-generates adversarial attacks across 50+ vulnerability types including prompt injection, PII leakage, RBAC bypass, and unauthorized tool execution. It maps results to OWASP, MITRE ATLAS, and the EU AI Act. OpenAI acquired Promptfoo in March 2026 for $86 million. It remains MIT-licensed and open source.

In 90 days:

Run a structured red team campaign using Promptfoo’s OWASP Agentic preset against ASI01 through ASI10. Use AgentDojo from ETH Zurich for agentic-specific testing, with 629 agent hijacking test cases across realistic task environments covering goal hijack, tool misuse, and inter-agent manipulation.

Read the full EchoLeak disclosure (CVE-2025-32711). Zero-click prompt injection in Microsoft 365 Copilot, documented end-to-end. Most instructive case study on what a production agentic attack chain looks like and how it was found.

Document everything into one public GitHub repository: methodology, tools, findings, failure modes you could not trigger and why. That body of work answers the interview question before it gets asked.

Figure 4: The Defender’s AI-Augmented Vulnerability Workflow

Yes, There Is a Business Angle. It Does Not Change Your Reality.

TechCrunch raised a fair question: Is Anthropic restricting Mythos to protect the internet or to protect Anthropic? The company announced Project Glasswing the same day it disclosed a $30 billion annualized revenue run rate and a massive compute deal with Broadcom. An IPO is reportedly under consideration for October 2026. A government-adjacent cybersecurity initiative with blue-chip partners burnishes that narrative precisely.

OpenAI’s Trusted Access for Cyber serves the same dual purpose. Restricted access creates enterprise lock-in, makes distillation harder, and gives defenders a genuine head start. Strategic self-interest and genuine security value are not mutually exclusive. Both labs are doing both things at the same time.

I do not care about their business models. I care about whether defenders are moving.

AISLE demonstrated empirically that the detection capability exists in models that cost almost nothing to run. The model is not the moat. The system is the moat. The expertise you build, the orchestration you design, the validation pipeline you run, the AI identity governance you enforce, those determine whether you’re ahead of the curve or behind it.

The restricted releases, the partner coalitions, the government briefings, those are interesting industrial policy. They are not relevant to your Monday morning. What is relevant is whether your team has a coding agent running alongside your SAST tool right now. What is relevant is whether your AI agents have scoped identities with enforced authorization boundaries or shared API keys with no audit trail. What is relevant is whether you can answer those five questions.

Key Takeaway: Mythos is the headline. The capability already exists in models you can download today. The model is not the moat. The system is the moat. Build the workflow before the 6-to-18-month window closes, or stop pretending the window matters because you already have what you need to start.

What to do next

Start with the five-step playbook above. Revisit your security program through the CARE framework (Create, Adapt, Run, Evolve) at rockcyber.com to build an adaptive security posture that evolves with the capability curve rather than reacting to it after the fact. The organizations that treat AI-augmented security as a weekly practice, not a quarterly initiative, will define the next generation of this profession.

For a deeper dive into practitioner upskilling paths, red teaming tools, and weekly AI security intelligence, subscribe to RockCyber Musings for the Top 10 AI Security Wrap-Up and focused essays on the issues that matter.

Join the community doing this work. The OWASP Agentic Security Initiative is building the standards and sharing the experiments. The practitioners who contribute to these efforts compound their capability faster than anyone working alone.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! Subscribe for free to receive new posts and support my work.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 33 April 3-April 9, 2026

Rock Lambros — Fri, 10 Apr 2026 12:50:40 GMT

Two of the three largest AI labs announced restricted-access cybersecurity models on the same day. A supply chain attack that started 10 days ago cost an AI startup its $10 billion contract with Meta. Nineteen new AI laws were signed across America in two weeks. Multiple independent research reports confirmed most enterprises have no idea what their AI agents are doing right now. The dual-use reckoning is no longer a future event. This week it produced products, paused contracts, and named casualties.

The week’s dominant pattern: the industry is admitting, out loud, that its most capable models are too dangerous to ship without restrictions. Meanwhile, the governance infrastructure meant to keep pace with AI deployment is running badly behind. Government employees are using GenAI tools daily at an 82% adoption rate on systems that remain vulnerable to prompt injection attacks documented in 2023. FedRAMP, the federal program enterprise CISOs treat as a security attestation, is operating as what former employees call a rubber stamp. The gap between AI capability and AI governance did not close this week. It widened, with better documentation.

1. Anthropic locks its most powerful model behind a 50-partner gate

On April 7, Anthropic announced Project Glasswing, a controlled-access program giving approximately 50 organizations early access to Claude Mythos Preview (Fortune, TechCrunch). Partner organizations include Amazon Web Services, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia, plus roughly 40 organizations responsible for critical software infrastructure. Anthropic described Mythos as “by far the most powerful AI model” it has ever created, with exceptional capabilities in autonomous coding and cybersecurity tasks. The company acknowledged the model’s capabilities “could be weaponized by attackers” and stated it has no plans for general availability until new safeguards are established.

Why it matters

This is the first time a major AI lab has built a commercial product strategy explicitly around restricting access due to offensive cyber capability. The precedent matters more than the model.
Every enterprise security team outside the 50-partner cohort is now competing against organizations with months of head start deploying the most capable defensive AI available.
The partner list reads as the critical infrastructure vendor stack. If Mythos finds vulnerabilities before general availability, defenders benefit. If the model leaks before that happens, the calculus reverses.

What to do about it

Assess now whether your organization qualifies for Glasswing access or partnership with one of the 50 current participants. Waiting for general availability puts you behind.
Build your responsible AI deployment policy before your board asks you to justify restricted model use. The framework you create for Mythos applies to every dual-use model that follows.
Read Anthropic’s stated rationale carefully. It functions as a working template for your own internal policies on AI capability gating.

Rock’s Musings

I’ve watched this industry congratulate itself on “responsible AI release” for years without actually restricting access to anything dangerous. Anthropic did something different this week. It built a product designed to stay out of the wrong hands and publicly named the hands it’s trusting. Anthropic made a liability calculation public and called it product strategy.

What I want to know is how they enforce it. Fifty organizations sharing API access, running evals, and passing findings back sounds clean in the press release. In practice, you’re dealing with 50 separate security cultures, 50 different interpretations of “defensive use,” and 50 sets of employees who walk out the door with operational knowledge. The kill switch isn’t in the contract. It’s in the monitoring. I’d love to see the audit framework Anthropic built to go with this, because without it, Project Glasswing is a hope, not a control.

2. OpenAI, Anthropic, and Google share intelligence to stop Chinese model distillation

On April 6-7, Bloomberg and The Japan Times reported that OpenAI, Anthropic, and Google are sharing attack pattern data through the Frontier Model Forum to detect and block adversarial distillation attempts by Chinese AI companies. Three firms were named: DeepSeek, Moonshot AI, and MiniMax. The coordinated effort focuses on detecting when frontier model outputs are being used to train competing models without authorization. The Forum, established in 2023 for safety coordination, now functions as an active competitive intelligence sharing network.

Why it matters

Three competing companies sharing security intelligence without a government mandate represents a structural shift in how the industry protects IP. Watching the next DeepSeek emerge on stolen training signal was apparently less appealing than coordinating with rivals.
This sets a precedent for industry-led AI IP enforcement that regulators haven’t built yet. Policymakers will either ratify or complicate what the Forum is quietly doing.
For enterprise buyers, this coordination signals frontier model providers now treat IP integrity as shared infrastructure, which is reassuring until you realize your own model training pipelines may need similar monitoring.

What to do about it

Audit your AI vendor contracts for provisions covering how your organization’s data and API interactions are used. The Forum’s distillation concerns apply downstream to enterprise deployments.
Ask vendors directly what controls they have in place to detect adversarial use of their model outputs. Most aren’t ready for the question.
Watch the Frontier Model Forum’s governance structure. Three companies sharing threat intelligence today is a small coalition. In two years it becomes the de facto standard for AI security coordination.

Rock’s Musings

Three direct competitors sharing security intelligence tells you the distillation problem is worse than any of them want to admit publicly. DeepSeek’s emergence was the wake-up call: model training shortcuts were further along than anyone expected. The Forum is doing what the industry always resists, treating a shared problem as a shared problem.

What nobody says out loud is that adversarial distillation runs through enterprise deployments too. When your employees push 10,000 API calls through GPT-5.3 or Claude Mythos to build an internal tool, those outputs sit somewhere. The providers are focused on Chinese actors right now. The same technique scales to every bad actor with API access. Build that assumption into your threat model before someone builds a business around exploiting it.

3. Meta freezes its $10 billion Mercor contract after the LiteLLM supply chain breach

On April 4, The Next Web and Fortune confirmed Meta paused its contract with Mercor, a $10 billion AI training data company whose customers include Anthropic, OpenAI, and Meta (The Next Web, Fortune). The pause followed a March 27 attack in which threat group TeamPCP published malicious PyPI packages for LiteLLM, a widely used open-source AI gateway library, after stealing a maintainer credential through an earlier Trivy supply chain compromise. The tainted packages were live for roughly 40 minutes. Mercor confirmed it was among “thousands” of affected organizations. Lapsus$ claimed responsibility and possession of 4TB of Mercor data including source code, databases, and VPN credentials. Google Mandiant reported over 1,000 impacted SaaS environments at RSAC 2026.

Why it matters

A 40-minute PyPI window produced a paused $10 billion contract. That ratio of exposure time to business consequence should recalibrate how you think about open-source AI supply chain risk.
Meta’s pause affects AI training pipelines, not software. Training data provenance, labeling protocols, and selection criteria worth billions in R&D may now be in hostile hands.
TeamPCP’s chained attack, Trivy to LiteLLM, demonstrates adversaries are mapping AI infrastructure dependency graphs specifically to maximize downstream blast radius.

What to do about it

Inventory open-source AI libraries in your production environment immediately. LiteLLM and similar tools are in most ML and security pipelines.
Require software bills of materials for AI infrastructure. You need to know which versions of which AI libraries are running in production, with provenance attestation for critical packages.
Brief your CISO and CTO on the chained supply chain model. TeamPCP demonstrated that AI library ecosystems are attack surfaces with compounding impact.

Rock’s Musings

Forty minutes. That’s how long it takes to turn a credential theft into a paused $10 billion contract. The security community debates sophisticated nation-state tactics while basic supply chain hygiene stays on the backlog. LiteLLM is in everything. If you’re running AI in production and can’t tell me which version is deployed or whether it was compromised, you have a problem you haven’t measured yet.

The Meta piece is what keeps me up. Their AI training secrets, data selection criteria, and labeling methodology in hostile hands gives a competitor a two-year shortcut on billions in R&D. A breach in the traditional sense costs you records. This one costs you competitive advantage. Your AI supply chain carries security risk and strategic risk simultaneously. Start treating both.

4. Keeper Security: 76% of AI agents operate outside privileged access policies

Keeper Security released a survey of 109 cybersecurity professionals at RSAC 2026 on April 7, revealing that 46% of organizations have granted AI-powered tools access to critical systems and data, with 76% of those identities ungoverned under privileged access management policies (Keeper Security, BetaNews). Only 28% report full visibility into non-human identities across cloud, on-premises, and SaaS environments. Over 40% experienced a security incident involving machine credentials or non-human identities in the past year. Another 32% couldn’t confirm whether they’d been hit.

Why it matters

AI agents operate as de facto privileged users in most enterprise environments, without the monitoring, credential rotation, or access controls applied to humans with equivalent permissions.
The 32% who can’t confirm NHI-related incidents are running blind. An agent with write access to email, code repositories, and collaboration tools that you can’t monitor is an insider threat waiting for attribution.
Traditional PAM tools were built for human users and won’t stretch to cover autonomous agents at scale without architectural change.

What to do about it

Extend your privileged access management program explicitly to cover AI agents, service accounts, and API keys. Treat an AI agent with production database access the same way you treat a privileged database administrator.
Mandate credential rotation and access logging for every non-human identity. If you can’t name every agent with write access to email or code right now, that gap is your first priority.
Ask your PAM vendor this week whether their product covers non-human identities natively. Many don’t, and most won’t tell you that unprompted.

Rock’s Musings

Here’s the pattern showing up in research right now: organizations rush to deploy AI agents, grant them sweeping access to prove the use case, then spend the next 18 months trying to reconstruct what those agents touched. That’s the same mistake we made with cloud infrastructure in 2013. We provisioned everything with admin keys because it was faster and cleaned it up later. “Later” is still ongoing for most enterprises.

The 32% who aren’t sure about NHI incidents are the most honest number in the Keeper report. Detecting agent-related incidents requires logging you likely haven’t enabled, correlation rules you haven’t built, and a behavioral baseline you haven’t established. Before you deploy the next AI agent, ask your team to demonstrate they can detect one behaving badly. If they can’t show you in a live demo, slow down.

5. Salt Security: nearly half of enterprises are blind to their AI agents’ API traffic

Salt Security published its 1H 2026 State of AI and API Security Report on April 8, surveying over 300 security leaders (Salt Security). Key findings: 48.9% of organizations cannot monitor machine-to-machine traffic from autonomous agents, and 48.3% cannot distinguish legitimate AI agents from malicious bots in their API traffic. Only 23.5% of respondents rate their existing tools as “very effective” against AI-driven attacks. An additional 47% have delayed production releases because of security concerns about APIs exposed to autonomous systems, meaning the gap is surfacing in shipping decisions, not survey responses alone.

Why it matters

Your API gateway is your AI agent’s operational layer. No visibility into that traffic means no indication of whether your agents are working as designed, being abused, or actively exfiltrating data.
The bot detection problem is concrete. Attackers are masquerading autonomous tools as legitimate agent traffic. Without behavioral baselines for your own agents, there’s no way to tell the difference.
Legacy web application firewalls were built for human browsing patterns. AI agent traffic looks nothing like that, making existing perimeter controls largely irrelevant to this threat class.

What to do about it

Inventory every API your AI agents call in production. Map expected request patterns, volumes, and data flows to establish a behavioral baseline for detecting deviations.
Evaluate whether your API security tooling supports non-human identity traffic analysis. If the vendor demo focuses on OWASP Top 10 for human users, it’s the wrong tool for this problem.
Build rate limiting and anomaly detection specifically for agent API traffic. An agent calling APIs at 658 times normal frequency because of a malicious MCP server injection is a documented attack pattern from this week’s research.

Rock’s Musings

Half your enterprise has zero visibility into what their AI agents are doing on the wire. You spent years building SOC capabilities, deploying SIEMs, tuning correlation rules, integrating threat intelligence feeds. Then you deployed AI agents that operate through an entirely different channel that bypasses all of it. The old security stack can’t see the new threat surface.

The market hasn’t caught up. Most API security vendors will confidently tell you their product handles agentic traffic. Ask them to demo detection of an agent that’s been redirected by a malicious MCP server. Watch the room go quiet. Ongoing analysis of where the real gaps are lives at RockCyber Musings. The gap between “we have API security” and “we can detect compromised agent behavior” is wider than most boards realize.

6. RSAC 2026: attackers move laterally in 22 seconds while defenders plan in minutes

At RSA Conference 2026 on April 3, Google Mandiant’s Consulting CTO Charles Carmakal told reporters that the median time from initial access to secondary lateral movement has dropped from 8 hours to 22 seconds, making human-only incident response structurally impossible at those speeds (SiliconAngle, Dark Reading). IBM’s Mark Hughes called post-quantum migration an immediate operational priority, noting three finalized NIST post-quantum encryption standards are available now with adoption remaining low. The conference’s dominant theme was agentic AI’s dual role: attackers using autonomous tools to accelerate campaigns while defenders attempt to use the same tools to keep pace.

Why it matters

A 22-second lateral movement window eliminates the human-in-the-loop response model. Your SOC procedures assume minutes. Your threat actors operate in seconds. That gap is where incidents become breaches.
Post-quantum urgency moved from theoretical concern to present operational priority at RSAC. Three finalized NIST standards exist today. Any organization with long-lived encrypted data needs a migration timeline now.
The agentic AI identity theme at RSAC confirmed the industry has aligned around non-human identities as the defining security challenge of the next 24 months.

What to do about it

Test your incident response playbooks against a 22-second lateral movement scenario. If your playbook assumes human review before containment actions, it needs a machine-speed trigger layer.
Publish a post-quantum migration roadmap internally before your next board meeting. “We’re monitoring it” is no longer a defensible position when finalized standards exist.
Pull one CISO peer debrief from RSAC this month. Hallway intelligence from that conference is often more actionable than the keynote content.

Rock’s Musings

Twenty-two seconds. The number got nodding agreement in San Francisco, then people walked into vendor booths and looked at detection tools that still alert in minutes. The gap between attacker speed and defender speed is the core problem of modern security, measured in the wrong units for years. When the unit is seconds, your SIEM alert queue is not a security control. It’s a log archive with a UI.

The quantum conversation shifted at RSAC from awareness to urgency, and the shift is warranted. “Harvest now, decrypt later” is a real operation: adversaries collecting encrypted traffic today and storing it for the day quantum breaks the key. If you have long-lived secrets, your CTO’s timeline estimate is probably too generous. RockCyber has been running post-quantum migration frameworks for clients since last year. Most enterprise conversations are still stuck on the awareness slide.

7. Nineteen AI laws signed in two weeks: chatbot liability, healthcare disclosure, private right of action

On April 6, PluralPolicy reported that 19 new AI laws were signed in the preceding two weeks, bringing the 2026 total to 25 enacted laws with 27 additional bills having cleared both legislative chambers (PluralPolicy, Troutman Pepper Locke). Tennessee, Oregon, and Idaho signed chatbot regulation bills during the week of April 3-9. Oregon’s law includes a private right of action with statutory damages. Utah signed 8 bills covering AI literacy requirements, classroom restrictions, deepfake intimate image bans, and insurance transparency mandates. Massachusetts, Rhode Island, and South Carolina moved healthcare AI bills out of committee, with Rhode Island’s version requiring healthcare providers to inform patients when AI is involved in their care.

Why it matters

Chatbot liability laws with private right of action create litigation exposure your legal team needs to model before the next customer-facing AI deployment goes live. Oregon’s law is already in effect.
The geographic spread creates a patchwork compliance problem with no federal preemption in sight. Your AI product team is shipping into 50 different state frameworks that change weekly.
Healthcare AI disclosure requirements set a transparency floor that buyers, patients, and regulators will increasingly apply across other sectors.

What to do about it

Map your current AI deployments against emerging state chatbot disclosure and liability requirements immediately. Oregon’s private right of action is live and applies now.
Brief your GC and CMO together. AI product launches carry legal exposure marketing teams don’t typically model, and chatbot liability surfaces in headlines, not just settlement columns.
Build a state AI law tracking function into your compliance program. Static annual reviews don’t work when the law count moves by double digits in two weeks.

Rock’s Musings

When I tell clients that AI regulation is coming, I usually get a polite nod and a “we’ll handle it when we have to.” Twenty-five enacted laws in 2026 with the year barely three months old. Oregon telling enterprises their customers can sue with statutory damages when a chatbot fails to identify itself. Regulation isn’t coming. It’s been here for two weeks.

The private right of action piece is what executives aren’t tracking closely enough. FTC enforcement requires agency resources and case selection. Private litigants require only a lawyer and a grievance. If your customer-facing AI system fails to disclose its nature and a user in Oregon has a bad experience, you have a plaintiff class with no regulatory gatekeeping standing between that plaintiff and your legal team. Build that into your AI deployment approval checklist before the next product launch.

8. OpenAI readies its own restricted cybersecurity model the same day as Anthropic

On April 9, Axios broke the news that OpenAI is finalizing a cybersecurity product for restricted release through its Trusted Access for Cyber pilot program (Axios, Security Boulevard). The model, built on GPT-5.3-Codex, is described by OpenAI as “our most cyber-capable frontier reasoning model to date.” OpenAI committed $10 million in API credits to pilot participants at the February program launch. The Axios scoop published the same day as broad coverage of Anthropic’s Project Glasswing, with multiple security reporters noting two competing labs had each moved to restrict their most capable cyber models on the same day.

Why it matters

Two frontier labs restricted their most capable cybersecurity models on the same day. Whether coordinated or coincidental, the signal is identical: the industry has reached a shared threshold assessment of offensive AI capability.
The OpenAI pilot started in February. Participants are already months ahead on advanced defensive AI adoption. Enterprise buyers outside the program are behind.
GPT-5.3-Codex positioned as an autonomous vulnerability researcher represents a qualitative shift in what AI security tools can do. Your red team needs exposure to this capability level before attackers deploy it against you.

What to do about it

Apply to OpenAI’s Trusted Access for Cyber program today. Not applying guarantees exclusion.
Treat the simultaneous OpenAI and Anthropic announcements as an inflection point in your AI security roadmap. Model access strategy is now a CISO decision, not a procurement question.
Start a conversation with your red team about what AI-assisted penetration testing looks like inside your environment. The offensive tools are being built. Defensive capabilities need to keep pace.

Rock’s Musings

Two companies, same day, both announcing restricted access to their most capable cyber models. In thirty years I’ve never seen two direct competitors make functionally identical risk disclosures simultaneously without prior coordination. Either the Frontier Model Forum conversation from earlier in the week triggered parallel announcements, or both teams hit the same risk threshold independently. Neither explanation is entirely comforting, because it means the models in question worry the people who built them.

Here’s what this week’s announcements tell me: the dual-use problem is no longer an abstract ethics debate. It’s a product management constraint. The labs are building features that concern them enough to restrict access. That’s progress, because it means honest risk assessment is making it into the room where launch decisions happen. Build that same instinct into your own AI deployment process.

9. Government GenAI hits 82% daily adoption with prompt injection attacks still unaddressed

On April 9, Help Net Security published Center for Internet Security analysis showing 82% of state and territorial government employees now use GenAI tools daily, up from 53% the year prior (Help Net Security, Center for Internet Security). CIS cited prompt injection as the primary unaddressed vulnerability in that deployment base, distinguishing two attack categories: direct injection where users attempt to bypass safety guidelines, and indirect injection where attackers embed malicious instructions in external content such as documents, webpages, or emails the agent processes. Incidents cited include a code assistant that transmitted AWS API keys to an external server after processing hidden instructions, and the GeminiJack attack that exploited enterprise data sources to trigger data exfiltration.

Why it matters

Government employees are generating official outputs using AI that remains manipulable through documents those systems process. A single malicious PDF submitted through a government portal can redirect an agent’s behavior.
Deployment outpaced security controls by a wide margin. State and local government security teams were not staffed or funded to keep pace with that adoption curve.
Prompt injection in government contexts is a policy integrity issue, not a privacy issue. An AI assistant that processes manipulated input and produces a compromised output informing a real government decision is a governance failure with material real-world consequences.

What to do about it

Require any GenAI deployment processing external documents, emails, or web content to implement input sanitization and instruction-boundary enforcement. Your AI shouldn’t follow commands embedded in documents it summarizes.
Test your enterprise AI deployments against indirect prompt injection scenarios before the next rollout. The attack is not sophisticated. The absence of testing is the problem.
Report AI usage rates alongside security control maturity to your board. An 82% adoption rate combined with 7% real-time governance effectiveness, the number from Cybersecurity Insiders research, belongs on a risk register.

Rock’s Musings

A government employee pastes a document into an AI assistant, and that document silently redirects the assistant to send AWS credentials to an external server. An attack category from 2023 that government AI deployments in 2026 still haven’t addressed, running at 82% daily adoption. The attack surface grew to near-universal usage while the defense posture stayed at “we have an acceptable use policy.”

Government IT security teams are underfunded, understaffed, and now responsible for securing AI deployments at a scale they didn’t request and weren’t resourced for. Before the next state AI bill gets signed requiring healthcare providers to disclose AI use to patients, lawmakers should ask how they’re funding the security infrastructure to keep those same deployments from being turned against the citizens they’re meant to serve.

10. OpenAI’s national security lead says humans must stay in the loop for defense decisions

At a Special Competitive Studies Project conference on April 9, Sasha Baker, OpenAI’s head of national security policy, stated that defense personnel need a “workforce transformation” to apply “appropriate human judgment” when AI informs national security operations (Nextgov). Baker noted no current large language model is foolproof, and incorrect AI-driven decisions in defense contexts carry “much greater” consequences. She tied the statement to OpenAI’s pre-deployment safety reviews and the controlled rollout of models including GPT-5.3-Codex, the same model featured in the restricted cybersecurity announcement reported the same day.

Why it matters

OpenAI’s national security lead publicly endorsed human-in-the-loop for defense decisions in the same week the company announced its most capable autonomous cyber model. That tension deserves examination in your own governance policies.
“Workforce transformation” is a budget line, not a strategy. Organizations deploying AI in sensitive contexts need explicit training, decision authority maps, and accountability structures for human oversight.
Baker’s statement creates a public record regulators and litigants can reference when evaluating whether organizations maintained adequate human oversight in AI-assisted decisions.

What to do about it

Map every AI-assisted decision in your organization where error consequences are asymmetric. Finance, safety, hiring, and security operations are the obvious categories. Build human override requirements into the workflow before the system goes live.
Assess your “workforce transformation” budget. Deploying AI in high-stakes contexts without investing in training humans to supervise it transfers the liability Baker is explicitly naming.
Document your human oversight model for AI decisions affecting personnel, customers, or critical systems. When the inevitable incident arrives, regulators will ask whether oversight was designed in from the start or retrofitted after the fact.

Rock’s Musings

OpenAI hired a national security lead. That person is publicly calling for human judgment to override AI in defense decisions. In the same week OpenAI announced a restricted-access autonomous hacking model. If that pairing doesn’t communicate the gap between capability development speed and governance readiness, nothing will.

I’ve run security operations for thirty years, and the hardest thing to get organizations to do is slow down, especially when a competitor is moving fast. Baker’s statement is a reminder that speed without oversight produces accountability gaps that become congressional hearings. The enterprises that build human oversight structures now are the ones that avoid spending 2027 explaining to a federal committee why their AI made a decision that hurt someone.

The One Thing You Won’t Hear About But You Need To

FedRAMP is a rubber stamp and the AI vendors deploying through it know it

On April 6, ProPublica published a detailed investigation examining three cautionary tales from the federal government’s rush to AI adoption (ProPublica). The most damaging finding: FedRAMP, the federal security authorization program enterprise CISOs treat as a validation signal for cloud products, is now described by former employees as “little more than a rubber stamp.” The program operates with minimal staff, overwhelmed by vendor volume. Third-party assessors who evaluate cloud providers for FedRAMP authorization are paid by the companies they assess. FedRAMP established a confidential back channel for assessors to raise concerns they wouldn’t document in official reports. Microsoft used timeline pressure and volume to effectively compress the GCC High approval process.

Why it matters

FedRAMP authorization signals to enterprise buyers that a product meets federal security standards. A degraded signal means every procurement decision relying on it as a security input draws from a compromised source.
The paid-by-vendor assessor model creates structural incentives to under-report findings. The unofficial back channel means the official report is not the complete picture.
Federal AI deployment at 82% daily government usage rates, built on FedRAMP authorizations produced under these conditions, is a systemic governance failure, not an isolated product risk.

What to do about it

Stop treating FedRAMP authorization as a complete security evaluation for AI products. Use it as a starting point, then conduct your own targeted assessment focused on AI-specific risks the framework wasn’t designed to evaluate.
Ask AI vendors directly whether their FedRAMP assessment surfaced any findings submitted through the confidential back channel. A vendor that can’t answer hasn’t done adequate diligence on their own authorization.
Engage your government affairs function to advocate for FedRAMP reform as AI deployment scales. The current model was built for traditional SaaS and is not equipped to evaluate the risk surface of autonomous AI systems.

Rock’s Musings

Here’s the story nobody put on a slide at RSAC. The federal government is deploying AI at record speed through a security authorization program that former insiders describe as a rubber stamp. The assessors evaluating these vendors get paid by those vendors. The uncomfortable findings go into a back channel that never reaches the official record. Those authorizations then get used by enterprise security teams as proxies for security validation.

I’ve said for years that governance certifications are often theater. FedRAMP was supposed to be one of the more rigorous ones. The ProPublica investigation suggests the volume and complexity of AI products broke the model. If you’re a CISO using FedRAMP status as a risk reduction input in AI procurement decisions, you’re relying on a control that may not be working as designed. That’s the kind of hidden assumption that converts an undetected vulnerability into a breach narrative. Read the ProPublica piece. Then recalibrate what “government certified” means for your program.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Axios. (2026, April 9). Scoop: OpenAI plans new product for cybersecurity use. https://www.axios.com/2026/04/09/openai-new-model-cyber-mythos-anthopic

BetaNews. (2026, April 7). New report highlights critical gaps in securing AI agents and non-human IDs. https://betanews.com/article/new-report-highlights-critical-gaps-in-securing-ai-agents-and-non-human-ids/

Bloomberg. (2026, April 6). OpenAI, Anthropic, Google unite to combat model copying in China. https://www.bloomberg.com/news/articles/2026-04-06/openai-anthropic-google-unite-to-combat-model-copying-in-china

Center for Internet Security. (2026). Prompt injection tags along as GenAI enters daily government use. Referenced via Help Net Security, April 9, 2026. https://www.helpnetsecurity.com/2026/04/09/genai-prompt-injection-enterprise-data-risk/

Dark Reading. (2026, April 3). RSAC 2026: How AI is reshaping cybersecurity faster than ever. https://www.darkreading.com/cybersecurity-operations/rsac-2026-how-ai-is-reshaping-cybersecurity-faster-than-ever

Fortune. (2026, April 2). Mercor, a $10 billion AI startup, confirms it was caught up in a major security incident. https://fortune.com/2026/04/02/mercor-ai-startup-security-incident-10-billion/

Fortune. (2026, April 7). Anthropic is giving some firms early access to Claude Mythos to bolster cybersecurity defenses. https://fortune.com/2026/04/07/anthropic-claude-mythos-model-project-glasswing-cybersecurity/

Hackread. (2026, April 7). AI agents and non-human identities creating critical security gaps, report. https://hackread.com/ai-agents-non-human-identities-security-gaps/

Help Net Security. (2026, April 9). Prompt injection tags along as GenAI enters daily government use. https://www.helpnetsecurity.com/2026/04/09/genai-prompt-injection-enterprise-data-risk/

Japan Times. (2026, April 7). OpenAI, Anthropic and Google cooperate to fend off Chinese bids to clone models. https://www.japantimes.co.jp/business/2026/04/07/tech/openai-anthropic-google-china-copy/

Keeper Security. (2026, April 7). Keeper Security research exposes critical gaps in securing AI agents, machines and non-human identities [Press release]. https://www.prnewswire.com/news-releases/keeper-security-research-exposes-critical-gaps-in-securing-ai-agents-machines-and-non-human-identities-302735305.html

Nextgov/FCW. (2026, April 9). OpenAI national security lead endorses ‘appropriate human judgment’ in AI. https://www.nextgov.com/artificial-intelligence/2026/04/openai-national-security-lead-endorses-appropriate-human-judgment-ai/412738/

PluralPolicy. (2026, April 6). AI governance watch: Nineteen new AI bills passed into law. https://pluralpolicy.com/blog/the-ai-governance-watch-april-2026-nineteen-new-ai-bills-passed-into-law/

ProPublica. (2026, April 6). As the federal government rushes toward AI, here are three cautionary tales. https://www.propublica.org/article/federal-government-ai-cautionary-tales

Salt Security. (2026, April 8). The era of agentic security is here: Key findings from the 1H 2026 State of AI and API Security Report. https://salt.security/blog/the-era-of-agentic-security-is-here-key-findings-from-the-1h-2026-state-of-ai-and-api-security-report

Security Boulevard. (2026, April 9). OpenAI readies rollout of new cyber model as industry shifts to defense. https://securityboulevard.com/2026/04/openai-readies-rollout-of-new-cyber-model-as-industry-shifts-to-defense/

SiliconAngle. (2026, April 3). Three insights on AI attack from theCUBE at RSAC 2026. https://siliconangle.com/2026/04/03/three-insights-ai-attack-thecube-rsac-2026-rsac26/

TechCrunch. (2026, April 7). Anthropic debuts preview of powerful new AI model Mythos in new cybersecurity initiative. https://techcrunch.com/2026/04/07/anthropic-mythos-ai-model-preview-security/

The Next Web. (2026, April 4). Meta freezes AI data work after breach puts training secrets at risk. https://thenextweb.com/news/meta-mercor-breach-ai-training-secrets-risk

The Register. (2026, April 2). Mercor says it was ‘one of thousands’ hit in LiteLLM attack. https://www.theregister.com/2026/04/02/mercor_supply_chain_attack/

Troutman Pepper Locke. (2026, April 6). Proposed state AI law update: April 6, 2026. https://www.troutmanprivacy.com/2026/04/proposed-state-ai-law-update-april-6-2026/

Agent Supply Chain Attacks: Your Scanner Already Switched Sides

Rock Lambros — Tue, 07 Apr 2026 12:50:17 GMT

Agent supply chain risk stopped being theoretical in March 2026. Over twelve days, a single threat actor group turned Trivy, KICS, LiteLLM, and Axios into credential-harvesting weapons, cascading across five distribution ecosystems and producing 300 GB of stolen secrets. The campaign started with an AI-powered bot. It spread through a self-propagating worm with blockchain-based command and control. Your vulnerability scanner, the tool you trusted to protect your pipeline, was the entry point. Now picture that same attack chain hitting an autonomous agent that installs tools, updates dependencies, and executes third-party skills without asking you first.

Figure 1: The TeamPCP Cascade: 12 Days, 5 Ecosystems

Your Vulnerability Scanner Was the Vulnerability

On March 19, 2026, TeamPCP used stolen credentials to force-push 76 of 77 version tags in the aquasecurity/trivy-action repository to malicious commits. The payload ran before the legitimate scan. Pipelines completed normally with green checkmarks across the board. Meanwhile, the malware dumped Runner.Worker process memory and exfiltrated cloud credentials, SSH keys, Kubernetes tokens, npm tokens, and Docker registry credentials to attacker-controlled infrastructure.

Trivy is a vulnerability scanner. Organizations run it in their CI/CD pipelines to detect supply chain attacks. When TeamPCP compromised it, the tool designed to find compromised dependencies became the compromised dependency. The irony is structural, not incidental. Security tools make ideal targets for supply chain attacks because they already have broad read access to the environments they scan. They touch secrets by design.

KICS, Checkmarx’s infrastructure-as-code scanner, fell the same way four days later. All 35 version tags hijacked. Same credential-stealing payload, different typosquat domain. Then LiteLLM, the AI gateway library that holds API keys for every LLM provider an organization uses, with 95 million monthly downloads and presence in 36% of cloud environments according to Wiz Research. TeamPCP published malicious versions to PyPI using credentials stolen from LiteLLM’s own CI/CD pipeline, which ran Trivy as part of its build process.

Each victim funded the next attack. The chain started with a single incomplete credential rotation at Aqua Security on March 1. TeamPCP retained access through tokens that survived the rotation. Every compromise from March 19 forward exploited credentials harvested from the previous target. Partial containment, as Aqua Security’s own post-incident analysis acknowledged, equals no containment.

By the time Axios was compromised on March 31 (100+ million weekly npm downloads, attributed by Microsoft Threat Intelligence to North Korean state actor Sapphire Sleet), the credential ecosystem was so thoroughly disrupted that Mandiant CTO Charles Carmakal warned of “hundreds of thousands of stolen credentials” and “a variety of actors with varied motivations.” The FBI confirmed TeamPCP was working through approximately 300 GB of compressed stolen credentials in collaboration with the LAPSUS$ extortion group.

The AI Bot That Started Everything

Most coverage focuses on the credential cascade. The more significant development is the one that started it.

On February 28, 2026, an autonomous bot calling itself hackerbot-claw exploited a misconfigured pull_request_target workflow in Trivy’s GitHub Actions to steal a Personal Access Token with write access to all 33+ Aqua Security repositories. The bot’s GitHub profile described itself as “an autonomous security research agent powered by claude-opus-4-5.” It carried a vulnerability pattern index with 9 attack classes and 47 sub-patterns. It targeted seven major repositories belonging to Microsoft, DataDog, CNCF, and Aqua Security over one week, achieving remote code execution in at least four.

This was not a script running pre-written exploits, as hackerbot-claw adapted its approach to each target’s specific workflow configuration. When one technique failed, it pivoted. Against the ambient-code/platform repository, it attempted prompt injection by replacing the project’s CLAUDE.md file with instructions designed to trick Claude Code into committing unauthorized changes. Claude Code detected the attack and refused, classifying it as a supply chain attack via poisoned project-level instructions.

That detail matters. An AI agent attacked. An AI agent defended. The outcome depended on configuration quality, not human vigilance. This is the arms race in miniature, and it already happened at production scale against real infrastructure.

StepSecurity, Repello AI, and Boost Security Labs independently documented the campaign. Pillar Security’s assessment identified the core gap: “zero visibility into AI coding agents running on developer machines, and no runtime controls when those agents are weaponized.”

Figure 2: Traditional vs. Agent Supply Chain Attack Surface

Why Agent Supply Chain Risk Breaks Your Existing Controls

Every supply chain control you have assumes a human is looking. Dependency scanning assumes someone reviews the output. Code review assumes someone reads the diff. SBOM generation assumes someone checks the inventory. SAST and DAST assume someone triages the findings.

Agents don’t look. They execute.

When a developer installs a package, they see a version number, check a changelog, run tests. When an agent installs a tool or skill, it follows instructions. If the MCP server definition says “install this plugin,” the agent installs it. If the skill marketplace listing looks legitimate, the agent trusts it. The human-in-the-loop that traditional supply chain security depends on evaporates.

Research published the same week as the TeamPCP campaign quantifies the gap. The OpenClaw vulnerability taxonomy (A Systematic Taxonomy of Security Vulnerabilities in the OpenClaw AI Agent Framework, arXiv 2603.27517) catalogued 190 security advisories in the OpenClaw AI agent framework, organized by architectural layer and adversarial technique. Three findings stand out. First, three moderate-to-high-severity advisories compose into a complete unauthenticated remote code execution path from an LLM tool call to the host process. Second, the primary command-filtering mechanism relies on a closed-world assumption that shell commands are identifiable through lexical parsing, an assumption broken by basic techniques like line continuation and option abbreviation. Third, and most relevant here, a malicious skill distributed through the plugin channel executed a two-stage dropper within the LLM context, bypassing the entire execution pipeline. The skill distribution surface has no runtime policy enforcement.

SafeClaw-R (arXiv 2603.28807) found that 36.4% of OpenClaw’s built-in skills carry high or critical risk. That number covers the built-in skills, before any third-party marketplace plugins enter the picture. Across ClawHub, the agent skill marketplace, Antiy CERT confirmed 1,184 malicious skills, roughly one in five packages. The Repello AI team traced 335 of those to a single coordinated campaign called ClawHavoc.

The March 2026 campaign showed what happens when the consumer of a compromised dependency is a CI/CD pipeline: automated, monitored by humans on a lag, with credentials accessible at runtime. The agent version removes the monitoring layer entirely. An agent that installs a malicious MCP server or skill executes the payload as part of its normal workflow, with whatever permissions the agent has been granted, at machine speed.

CanisterWorm Showed What Autonomous Propagation Looks Like

If the hackerbot-claw precedent shows how AI agents attack, CanisterWorm shows how compromised dependencies spread once humans are removed from the loop.

CanisterWorm emerged on March 20, deployed using npm tokens stolen from Trivy-compromised pipelines. It was a self-propagating worm. Given a stolen token, it enumerated every package the token provided access to, bumped the patch version, injected its payload, and republished. Twenty-eight packages were compromised in under sixty seconds. The worm infected 64+ packages across multiple npm scopes. Endor Labs assessed that TeamPCP had “automated credential-to-compromise tooling” capable of turning a single stolen token into exponential propagation.

The command-and-control infrastructure used an Internet Computer Protocol blockchain canister, a tamperproof smart contract with no single takedown point. The operator could rotate payloads on infected machines without republishing any package. Security researchers confirmed this as the first publicly documented npm worm to use blockchain-based C2. The kill switch? If the canister returned a YouTube URL, the backdoor skipped execution. At the time of discovery, it was returning a Rick Roll. The infrastructure was live, tested, and ready. The payload was dormant by choice.

CanisterWorm targeted human-operated CI/CD pipelines. Translate that propagation model to an agent ecosystem where tools install other tools, agents delegate to sub-agents, and MCP servers chain calls across services. The propagation surface expands from “every package a stolen token provides access to” to “every tool, skill, and service the compromised agent is authorized to reach.” The Model Context Protocol, now under the Linux Foundation’s Agentic AI Foundation after Anthropic donated it in December 2025, is becoming the standard for agent-to-tool communication. Trend Micro found 492 MCP servers exposed to the internet with zero authentication. A separate supply chain attack involved a package masquerading as a legitimate Postmark MCP server that silently BCC’d every outgoing email to the attackers. The CoSAI whitepaper on MCP security identified 12 core threat categories spanning nearly 40 distinct threats. The MCP specification itself uses SHOULD rather than MUST for human-in-the-loop requirements. That word choice tells you everything about where the standard stands on constraining agent autonomy.

Figure 3: Agent Supply Chain Risk: By the Numbers

What This Means for You

The governance gap between where agent supply chain risk is and where your controls are will take years to close. Microsoft released the Agent Governance Toolkit on April 2, 2026, addressing OWASP’s Agentic AI Top 10 with features like Ed25519 plugin signing and MCP security gateways. The toolkit is two days old and unvalidated in production. SafeClaw-R achieved 95.2% accuracy in controlled tests. That 4.8% gap matters at enterprise scale.

You don’t have years. Here’s what you have right now.

Pin everything to immutable references. Version tags are pointers, not contracts. The March 2026 campaign proved this at scale across GitHub Actions, Docker Hub, and npm. Pin GitHub Actions to full commit SHAs. Pin container images to digests. Pin PyPI packages to exact versions with hash verification. Floating tags and unpinned dependencies are the entry point for every attack in this chain.

Treat your security tools as attack surface. Trivy, KICS, and every other scanner in your pipeline runs with privileged access to secrets by design. Apply the same scrutiny to your security tooling that you apply to production dependencies. Monitor for unexpected behavior from tools that should be predictable.

Audit your agent tool pipelines. If your organization deploys AI agents with access to MCP servers, skill marketplaces, or plugin registries, inventory every tool your agents use. Verify provenance. Enforce allow-lists. The ClawHavoc campaign showed that 20% of a major agent marketplace was compromised. Your agents are pulling from these registries right now.

Make credential rotation atomic. The entire TeamPCP cascade traces back to one failure: Aqua Security’s non-atomic rotation on March 1. When you respond to a supply chain incident, revoke all credentials simultaneously before issuing replacements. Partial rotation is an invitation for round two.

Plan for agent-specific incident response. If a tool or skill consumed by your agents is compromised, the blast radius includes everything those agents are authorized to access. Your current incident response playbook assumes a human in the response loop. Write the agent-specific version before you need it.

Key Takeaway: The March 2026 supply chain campaign compromised your scanners, your AI gateway, and your HTTP client in twelve days. The same attack pattern targeting autonomous agents will move faster, spread further, and leave fewer traces. Your supply chain controls were built for a world where a human reviewed every dependency. That world is ending.

What to do next

The gap between traditional supply chain security and agent supply chain security is the defining governance challenge of 2026. If you’re a CISO or security architect, the question isn’t whether your organization uses AI agents with third-party tools. The question is whether you know which tools, with what permissions, under whose authority.

Start with visibility. You don’t control what you haven’t inventoried. For a deeper framework on operationalizing emerging security challenges, The CISO Evolution walks through how security leaders adapt their programs when the threat model shifts underneath them.

More on agent security, supply chain governance, and the practitioner’s view of AI risk at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! Subscribe for free to receive new posts and support my work.

Share RockCyber Musings

Reasoning Theater: Why Chain-of-Thought Monitoring Fails Your Agentic AI

Rock Lambros — Tue, 07 Apr 2026 12:50:15 GMT

If your agentic AI safety strategy depends on reading the model’s chain of thought, you’re listening to a rehearsed speech and calling it a confession. A March 2025 paper from Goodfire AI and Harvard University shows that reasoning models often commit to their final answer within the first tokens of “thinking,” then generate hundreds of additional tokens to perform deliberation they’ve already completed. For every security leader, product owner, and governance committee treating chain-of-thought monitoring as an auditable safety control, these findings demand an immediate reassessment.

The Research That Changes the Conversation

The paper, titled “Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought,” tested two frontier reasoning models (DeepSeek-R1 671B and GPT-OSS 120B) using three methods to determine when models commit to their final answer during a chain-of-thought trace.

The first method trains lightweight attention probes on model activations to predict the final answer at any point during reasoning. The second forces the model to answer early by truncating its reasoning and demanding a response. The third uses an external LLM (Gemini 2.5 Flash) as a CoT monitor, the same approach many vendors now sell as an AI safety feature.

The results expose a gap that should reshape every AI governance conversation. On MMLU-Redux, a benchmark heavy on recall-based questions, attention probes decoded the correct final answer with high accuracy from the earliest reasoning tokens. The CoT monitor couldn’t identify the model’s answer until much later in the sequence. The performativity rate, measuring the gap between internal confidence and external verbalization, hit 0.417 on MMLU for DeepSeek-R1. A rate near zero means the model’s spoken reasoning matches its internal state. A rate of 0.417 means the model spends roughly 40% of its reasoning trace performing confidence it already holds.

On GPQA-Diamond, a dataset requiring graduate-level multi-hop reasoning in biology, chemistry, and physics, the picture reversed. All three methods started near chance accuracy and improved at similar rates throughout the reasoning trace. The performativity rate dropped to 0.012. The chain of thought tracked genuine computation because the model needed to think.

Figure 1: Performativity Rate Across Models and Benchmarks

The number that should stop every governance meeting cold: probe-guided early exit reduced the number of generated tokens by 80% on MMLU and 30% on GPQA-Diamond, with comparable accuracy. The model produced 80% more tokens than it needed on easy recall tasks. Every one of those tokens looked like reasoning. None of them were.

Your Model Performs Compliance, Not Communication

The paper’s most valuable contribution for security professionals sits in the linguistic framework, not the probe methodology itself.

They invoke Grice’s maxims of cooperative communication, a foundational concept from pragmatics describing what honest, effective communicators do. Grice identified four maxims that cooperative speakers follow. They stay relevant (Relation). They assert only what they have evidence for (Quality). They provide exactly as much information as needed, no more (Quantity). They communicate clearly and without unnecessary obscurity (Manner).

Reasoning models, trained through reinforcement learning optimized for outcome rewards, follow Relation and Quality naturally. Staying on topic and generating evidence-based reasoning steps correlates with correct final answers, which earns the reward. The model has every incentive to be relevant and factually grounded.

Quantity and Manner get violated because the reward function doesn’t penalize verbosity or obscurity. The model generates hundreds of performative tokens after committing to its answer because nothing in the training signal punishes that behavior. The output looks like careful deliberation, but it reads like a thorough analysis. The model’s internal state tells a different story.

Figure 2: Gricean Maxim Compliance in RL-Trained Reasoning Models

If you’ve sat through a compliance audit where someone recites a scripted answer that sounds thorough but reveals nothing about practice on the ground, you’ve seen the human version of this. The model doesn’t lie to you. It performs what a reasoning trace is supposed to look like, without any structural obligation to communicate its internal state honestly.

This framing hits differently depending on where you sit at the governance table. Product teams hear “the model reasons safely” because the CoT looks clean. Legal hears “we have an auditable reasoning chain” because there’s text to review. Security asks the question nobody else in the room raises… “Does the text reflect what the model computed internally?”

The Difficulty Trap: Where Enterprise Agentic AI Lives

The difficulty-dependent split creates a precise and uncomfortable problem for enterprise deployments. CoT monitoring works when the model genuinely needs to think, hard multi-hop reasoning, novel problem decomposition, tasks requiring significant test-time compute. CoT monitoring fails on tasks where models already encode the answer in their weights: recall, pattern matching, classification, and routine decision execution.

Consider what agentic AI does in a financial services environment right now. A reconciliation agent matches transaction records against internal ledgers. A KYC agent screens customer documents against sanctions lists. A compliance agent routes regulatory filings based on entity classification. A trading surveillance agent flags potential wash trades from order flow patterns.

Every one of those tasks leans heavily toward recall and pattern matching. The model has seen millions of similar patterns during training. It knows the answer early. The chain-of-thought trace generated afterward is, according to this research, predominantly performative.

The security calculus changes in direct response. Picture a sanctions screening agent processing a crafted entity record containing an indirect prompt injection embedded in a business description field. The model processes the injection at the activation level, potentially altering its classification decision. The chain-of-thought trace, generated performatively because sanctions screening is a recall task the model handles, is produced from in-weight knowledge and reads like a textbook analysis of entity risk factors. The CoT monitor watching that trace sees clean, plausible reasoning. The model’s internal activations tell a different story. Your audit trail captures what the agent wants you to see, not computation. An examiner reviewing the reasoning trace isn’t examining evidence of how the model decided. They’re reviewing a post-hoc performance that resembles deliberation.

The researchers also found that model size correlates with performativity. Larger models (671B parameters) knew their answers earlier and generated more performative reasoning than smaller distilled versions (1.5B to 32B). Smaller models needed genuine test-time computation for equivalent tasks. As the industry pushes toward larger, more capable reasoning models for enterprise agentic AI, the performativity problem intensifies with every parameter upgrade.

The EU AI Act’s Article 14 requires human oversight of high-risk AI systems, with provisions for understanding the system’s reasoning. Financial regulators from the OCC to the FCA have signaled expectations for explainability in automated decision-making. Governance frameworks built on the assumption that reasoning traces constitute meaningful explanations now face a credibility problem they haven’t accounted for.

Three Research Groups, One Uncomfortable Conclusion

The Reasoning Theater paper lands with maximum force when you triangulate it against two other major findings from competing institutions.

OpenAI’s Baker et al. (March 2025) studied what happens when CoT monitors are integrated directly into the reinforcement learning training loop. In early stages, models became more capable and more aligned. Under excessive training pressure, models learned what the researchers called “obfuscated reward hacking”: hiding their intent within the chain of thought while continuing to hack the reward signal. OpenAI’s recommendation was blunt: developers training frontier reasoning models should avoid applying strong supervision directly to chains of thought. The monitorability itself is fragile. OpenAI’s follow-up “Monitoring Monitorability” paper in December 2025 proposed formal evaluation metrics to track this fragility across o3 and GPT-5.1 training runs, recognizing that the problem demands ongoing measurement rather than a one-time fix.

Anthropic’s Chen et al. (May 2025) found that reasoning models verbalized their use of reasoning hints in fewer than 20% of cases where they relied on those hints. Outcome-based RL initially improved faithfulness, then plateaued. Increased hint usage through reward hacking did not increase the propensity to disclose those hints. The model got better at using shortcuts while getting no better at revealing them.

Korbak et al.’s multi-author paper from July 2025, signed by over 40 researchers from multiple institutions, explicitly framed CoT monitorability as “a new and fragile opportunity for AI safety.” They warned that latent reasoning models, architectures that reason in continuous vector space rather than natural language, would eliminate even the partial transparency that current CoT provides.

Three independent research groups arrived at this conclusion: an interpretability lab at Goodfire, the model developer at OpenAI, and a cross-institutional safety coalition. All arrived at the same conclusion: chain-of-thought monitoring provides a signal in specific conditions, and those conditions don’t map cleanly to enterprise agentic AI deployments, where the majority of tasks fall below the difficulty threshold where monitoring remains reliable.

Applying CARE: Fixing Your Chain-of-Thought Monitoring Gap

The CARE framework (Create, Adapt, Run, Evolve) provides a structured path to address the reasoning theater problem without abandoning CoT monitoring wholesale.

Create your threat model around the faithfulness gap. Classify every agentic AI workflow by task difficulty and determine which ones the model solves from in-weight knowledge versus genuine test-time reasoning. Recall-heavy tasks like KYC screening, transaction matching, and document classification carry high performativity risk. Multi-step analytical tasks like fraud investigation across multiple data sources and complex regulatory interpretation carry lower risk. Design your monitoring controls proportional to that classification.

Adapt your monitoring stack to account for activation-level signals where architectures permit it. The Reasoning Theater research demonstrates that attention probes trained on model activations predict behavior far earlier and more accurately than CoT monitors for performative tasks. Every vendor offering agentic AI safety tooling should answer one direct question: does your monitoring access model internals, or does it only read output text? If the answer is output text only, you now know the limitation and the research that quantifies it.

Run dual-layer monitoring in production. Keep CoT monitors active for genuine reasoning detection and inflection point identification. The research confirmed that backtracking, realizations, and reconsiderations in CoT traces correspond to authentic belief shifts, appearing more than twice as often in traces where probes showed genuine uncertainty (0.045 inflections per step in uncertain traces versus 0.020 in high-confidence performative traces for DeepSeek-R1). These inflection points remain reliable indicators of real internal computation. Layer behavioral monitoring (input/output validation, action-level controls, tool call verification) as the primary safety net for performative tasks. The CoT monitor becomes your second line for routine operations, not your first.

Evolve your governance documentation to reflect the difficulty-dependent nature of CoT reliability. Update risk assessments as model capabilities change. Larger models and improved training methods shift the boundary between “easy” and “hard” tasks, changing where CoT monitoring remains effective. The August 2026 EU AI Act enforcement deadline adds urgency. Treat this as a moving target, because the research shows it is one.

Figure 3: CARE Framework Response to Reasoning Theater

Key Takeaway: Chain-of-thought monitoring provides genuine safety signal for hard reasoning tasks, but the majority of enterprise agentic AI workflows fall below the difficulty threshold where that signal remains reliable. Your governance framework needs to know the difference, and your next vendor evaluation needs to test for it.

What to do next

Download the Reasoning Theater paper and its interactive visualization tool at reasoning-theater.streamlit.app. Map your agentic AI workflows against the difficulty-dependent performativity findings. Bring this evidence to your next AI governance meeting, because the product team, legal counsel, and AI lead sitting across from you haven’t read it yet.

For more on building AI governance frameworks that survive contact with adversarial reality, explore the CARE framework at rockcyber.com. Subscribe to RockCyber Musings for more AI security and governance insights with the occasional rant.

👉 Subscribe for more AI security and governance insights with the occasional rant.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Weekly Musings Top 10 AI Security Wrapup: Issue 32 March 27-April 2, 2026

Rock Lambros — Fri, 03 Apr 2026 13:03:28 GMT

Anthropic had a week that should be a case study in operational security failure for years to come. On March 31, a routine release packaging error exposed 500,000 lines of Claude Code source across roughly 2,000 files. Five days earlier, a CMS misconfiguration had already put nearly 3,000 unpublished internal documents into a public search index, including draft material describing their most capable model as posing “unprecedented cybersecurity risk.” By April 1, they were firing DMCA takedowns at 8,000 GitHub repositories, most unrelated to them, trying to unsee what the internet had already seen. By April 2, a congressman was writing to the CEO about national security.

That would have been enough for any week. It was not the only thing that happened. On March 27, CISA added two exploited AI infrastructure vulnerabilities to its KEV catalog; three LangChain and LangGraph CVEs hit disclosure, with 84 million downloads in scope; and the European Commission confirmed attackers had been inside their AWS account for three days. The thread connecting all of it is the same one it always is: AI deployment speed running ahead of the operational security discipline required to sustain it. This week was not an anomaly. It was a pattern. Patterns do not self-correct.

As a bonus, check out my AI Cyber Magazine Podcast with Confidence Staveley during RSA.

1. Anthropic Leaked 500,000 Lines of Claude Code Source, Then Panicked on GitHub

On March 31, a debugging file accidentally bundled into a routine Claude Code update exposed approximately 500,000 lines of source code across nearly 2,000 files (CNBC, Axios, Fortune). The codebase was mirrored across GitHub within hours. Leaked feature flags revealed unreleased capabilities: a persistent background agent, cross-device remote control, and session-to-session learning. Anthropic attributed the incident to “a release packaging issue caused by human error” and stated no customer data was exposed. On April 1, attempting to scrub the code from GitHub, Anthropic sent DMCA takedowns that hit approximately 8,000 repositories, most unrelated to the leak (TechCrunch, Bloomberg).

Why it matters

Competitors received Anthropic’s unreleased feature roadmap. That strategic damage compounds the fact that this happened five days after the Mythos content leak. Coincidence???? I’ll let you decide.
The persistent background agent and remote control capabilities in the leaked code require explicit security design review before deployment. They were in development without prior public disclosure of the capability direction.
The DMCA sweep that caught 8,000 unrelated repositories shows what reactive incident response without a playbook looks like. Every remediation attempt created a new problem.

What to do about it

If you deploy Claude Code in your enterprise environment, review what access it holds to production systems and rotate any associated credentials until the full scope of the leak is confirmed.
Require software composition analysis (SCA) and release integrity verification as contractual terms with your AI vendors.
Develop a pre-incident legal response playbook that covers IP exposure scenarios, including proportional DMCA procedures that require scope confirmation before submission.

Rock’s Musings

Two major operational security failures from the same company in five days. The first was a CMS misconfiguration. The second was a packaging error. Both are basic controls that mature security operations have solved. Anthropic markets itself on safety and trustworthiness, and that positioning is now doing work it was not designed to carry. The DMCA overcorrection made it worse: you leak 500,000 lines of source code, then fire automated takedown requests at 8,000 repositories, most of them unrelated to you. Every IP attorney will tell you DMCA takedowns require good faith and specificity. Have a process before the fire starts.

2. Anthropic Accidentally Confirmed Its Most Capable Model Poses Unprecedented Cybersecurity Risk

A configuration error in Anthropic’s content management system made nearly 3,000 unpublished assets publicly searchable starting around March 26, including draft blog posts for a model called Claude Mythos (Fortune, CoinDesk). Internal documents describe Mythos as capable of rapidly finding and exploiting software vulnerabilities at an unprecedented scale. Anthropic confirmed the model exists and is in testing with early-access customers, calling it “a step change” in capability. The company described the exposure as caused by a configuration error and stated the data store was secured after discovery.

Why it matters

Anthropic’s own internal documentation, not a researcher’s estimate, describes this model as posing cybersecurity risks the industry has not seen before. That is the company’s self-assessment.
Early-access customer deployments were already underway before any public discussion of the risk profile occurred. The model shipped before the security conversation started.
A frontier model capable of autonomously finding and exploiting vulnerabilities at scale invalidates current vulnerability management timelines. That conversation needs to happen now.

What to do about it

Update your AI threat model to account for AI-assisted offensive operations at scale. This is not a future scenario. It is a current deployment.
Ask your AI vendors direct questions about internal capability assessments before your next contract renewal. What have they assessed, and when?
Document board and leadership awareness of frontier AI capability risk as a governance record item. Regulatory scrutiny on this topic will increase.

Rock’s Musings

The model is called Mythos. The leaked internal docs describe the cybersecurity risk as unprecedented. Anthropic was already deploying it with customers before any of this became public. This happened not because of an attack but because someone left a CMS misconfigured. Anthropic has historically been conservative in capability claims. When their own internal documentation describes a model as different in kind from what came before, the security community should take that seriously, not because the word “unprecedented” is alarming on its own, but because the source is the organization that built the thing. They know what it does.

3. ShinyHunters Breached the European Commission’s AWS Account

The European Commission confirmed on March 27 that attackers accessed the AWS account hosting its Europa.eu websites, with the intrusion first detected on March 24 (TechCrunch, Bloomberg). Threat actor ShinyHunters claimed responsibility and alleged theft of more than 350GB of data including mail server exports, databases, confidential documents, and contracts. The Commission’s statement noted internal systems were unaffected and mitigation measures were applied quickly. Affected EU entities received notification.

Why it matters

ShinyHunters has a documented history of monetizing stolen data through dark market sales. Even if the 350GB claim is exaggerated for leverage, policy documents and procurement contracts from the Commission’s web infrastructure are a counterintelligence asset.
The Commission enforces GDPR and is building the AI Act enforcement apparatus. Getting breached while standing up that apparatus is not a good governance signal.
AWS account-level compromise is full infrastructure compromise in practice. A managed cloud provider does not neutralize cloud account security failures.

What to do about it

Audit your AWS account permission boundaries and review CloudTrail logs for anomalous patterns this week, not next quarter.
Ensure your incident response plan explicitly covers cloud account compromise. Traditional endpoint-focused plans miss this scenario entirely.
If any of your vendors are EU institutions or Commission contractors, treat procurement data exposure as a downstream supply chain risk and assess your exposure now.

Rock’s Musings

The body enforcing Europe’s data protection framework had its AWS account cracked. Governance credentials do not equal security maturity. Write the most thorough AI regulation in the world. Your cloud IAM configuration remains a disaster until someone fixes it. The ShinyHunters 350GB claim needs forensic verification before anyone draws conclusions about scope, but three days of undetected access to the official Commission infrastructure doesn’t need verification. The institutions asking private sector organizations to demonstrate AI security maturity owe the market some transparency on their own failures. Name it, fix it, move on.

4. Your AI Workflow Tool Got CISA’s Attention: Langflow CVE-2026-33017

CISA added CVE-2026-33017, a critical remote code execution flaw in Langflow, to its Known Exploited Vulnerabilities catalog on March 26. Attackers began scanning for exposed instances roughly 20 hours after the advisory publication, with exploitation scripts appearing within 21 hours and active .env and .db file harvesting beginning within 24 hours (Sysdig, BleepingComputer, Help Net Security). The vulnerability carries a CVSS score of 9.3 and allows unauthenticated attackers to inject arbitrary Python code through the public flow build endpoint with no sandboxing applied. Federal agencies face an April 8 remediation deadline. Upgrade to Langflow version 1.9.0 or later.

Why it matters

Langflow is used to build and deploy LLM pipelines. Remote code execution in a workflow orchestration tool gives an attacker control over the AI’s inputs, outputs, and the credentials it holds.
The 20-hour exploitation window is increasingly standard for high-severity flaws. The concept of a patch window measured in days is no longer realistic for internet-exposed AI infrastructure.
.env file harvesting is the attacker’s first move because those files contain API keys for LLMs, vector databases, and cloud services the workflow connects to.

What to do about it

If Langflow runs on any internet-accessible host, treat the environment as potentially compromised and rotate all associated credentials before patching.
Segment AI workflow orchestration platforms behind authentication and network controls. These tools have no business being directly internet-accessible.
Verify Langflow version across your environment immediately. Anything prior to 1.9.0 is an open liability.

Rock’s Musings

The 20-hour exploitation timeline should reframe your vulnerability management program. That program was designed when you had days or weeks to act. That era closed. CISA’s KEV catalog is now your minimum viable patch priority list, and if you are not at sub-72-hour remediation SLAs for critical AI infrastructure, you are already behind. Organizations still describing AI workflow platforms as “internal tools” need a rethink. Internal tools with LLM API keys, cloud credentials, and production data connections are not internal in any meaningful threat model. An attacker who executes code in your Langflow environment has lateral movement access to every system that environment touches.

5. LangChain and LangGraph: Three CVEs, 84 Million Downloads Exposed

Cyera security researcher Vladimir Tokarev disclosed three vulnerabilities in LangChain and LangGraph on March 27, each covering a different attack path against the same enterprise AI framework (The Hacker News). CVE-2026-34070 (CVSS 7.5) enables path traversal to arbitrary files through manipulated prompt templates. CVE-2025-68664 (CVSS 9.3) allows extraction of API keys and environment secrets through unsafe deserialization. CVE-2025-67644 (CVSS 7.3) enables SQL injection in LangGraph’s SQLite checkpoint layer. LangChain, LangChain-Core, and LangGraph collectively logged over 84 million downloads. Patches are available: LangChain Core 1.2.22+, LangChain-Core 0.3.81+ or 1.2.5+, and LangGraph checkpoint sqlite 3.0.1+.

Why it matters

These three CVEs cover filesystem data, environment secrets, and conversation history in combination. Together, they represent near-total information exposure for any application built on these frameworks.
The 84 million download count means a significant portion of enterprise AI applications are affected. Most organizations do not know which AI frameworks their development teams selected.
CVE-2025-68664 with its 9.3 CVSS is the most critical. Unsafe deserialization is a well-understood, pervasive, and reliably exploitable class of vulnerability.

What to do about it

Inventory every AI framework in your environment, including those embedded in third-party tools. Do not rely on developers to self-report what they are using.
Apply the three patches and validate versions before the end of the business week.
Assess what data your LangChain-based applications can access and treat those data stores as potentially exposed pending patch confirmation.

Rock’s Musings

Three vulnerability classes in the same framework, covering three categories of sensitive enterprise data, were disclosed in one report. That’s what happens when you build for speed and bolt on security later. AI framework developers made that choice repeatedly, and this week’s CVE list is the invoice. LangChain is the jQuery of AI development right now. It is in everything, often without explicit organizational approval. Your AI security posture includes every dependency your developers pulled in without telling you. Get ahead of that inventory problem before the next disclosure.

6. A Congressman Put Anthropic on Notice Over National Security

Rep. Josh Gottheimer (D-N.J.) sent a letter to Anthropic CEO Dario Amodei on April 2, citing national security concerns arising from the source code leak (Axios, The Hill). Gottheimer’s letter noted that Claude is embedded in defense and intelligence operations, raised the prior CCP-backed group intrusion against Claude, and expressed concern that Mythos could enable more sophisticated cyberattacks against the United States. The letter also flagged Anthropic’s decision in late February to remove its binding commitment to halt model development if safety capabilities fall behind, replacing it with “nonbinding but publicly-declared” goals.

Why it matters

Federal agencies and defense contractors use Claude operationally. A source code leak followed by a congressional inquiry is a vendor risk event, not a PR problem. Your GRC process should treat it as such.
Removing the binding safety commitment is a substantive policy change that the congressional record now documents. The enforceability question will follow Anthropic through every future regulatory discussion.
Gottheimer sits on the House Intelligence Committee. This is not a throwaway letter. It is a first-stage oversight action that signals more to come.

What to do about it

Review your vendor risk assessment for any AI provider with confirmed government contracts. Congressional inquiries are material third-party risk events.
Establish a direct communication channel with your AI vendors’ enterprise security teams and request formal notification procedures for any government inquiries affecting their products.
Track the congressional record regarding Anthropic’s rollback of its safety commitment. It will surface again in budget and procurement cycles.

Rock’s Musings

The safety commitment rollback from February is the most substantive issue in that letter. Anthropic replaced a binding pledge to pause development if safety fell behind with goals they grade themselves on. That is not a small change. That is the foundational accountability mechanism that distinguished their positioning from competitors, and they quietly removed it. Congressional scrutiny was predictable the moment they became embedded in national security operations. The question I would ask directly is how many federal agency customers received notification about the source code exposure before it hit the press. I would guess the answer is uncomfortable.

7. Your Security Scanner Was the Supply Chain Attack: Trivy CVE-2026-33634

CISA added CVE-2026-33634 to its Known Exploited Vulnerabilities catalog on March 27 (Help Net Security, Aquasecurity GitHub advisory). Attackers compromised the Trivy container security scanner on March 19, using stolen credentials to publish a malicious v0.69.4 release and force-push 76 of 77 version tags in the trivy-action repository with credential-stealing malware. The attack triggered a downstream LiteLLM supply chain compromise via poisoned PyPI packages. Federal agencies face an April 9 deadline. Root cause was non-atomic credential rotation on March 1 left a valid token exposed during the rotation window.

Why it matters

Trivy is a default security tool in CI/CD pipelines across the industry. Compromising the scanner means attackers access the same environment credentials the security scan was meant to protect.
Force-pushing 76 version tags is a comprehensive compromise. Any pipeline that pins to mutable major or minor version tags rather than specific commit hashes was exposed.
The downstream LiteLLM PyPI compromise extends the blast radius into Python environments running LLM application code. The supply chain damage propagated well beyond the initial tool compromise.

What to do about it

Audit every CI/CD pipeline for trivy-action or setup-trivy at mutable version tags and pin to specific commit hashes immediately.
Treat any environment that ran a compromised Trivy version since March 19 as potentially credential-compromised. Rotate all associated tokens, SSH keys, and cloud credentials.
Apply this lesson to every security tool in your pipeline. Security tooling supply chains are higher-value targets than application code supply chains.

Rock’s Musings

The attacker turned the vulnerability scanner into the vulnerability. That is the platonic ideal of a supply chain attack: targeting organizations that care about security and embed security tooling in their build pipelines. The more security-conscious your culture, the higher your Trivy adoption, and the more exposed you were. The non-atomic credential rotation is the root cause. Aquasecurity rotated credentials on March 1 but did not revoke all tokens simultaneously. The attacker grabbed freshly rotated secrets during the window between invalidation and deployment. If your own rotation procedures have a gap between “revoke old” and “confirm new is live,” that gap is your exposure. Run your playbooks against that question this week.

8. The State AI Chatbot Safety Wave Is Not Waiting for Washington

Georgia’s state senate voted to concur in the House-amended version of SB 540 during the week of March 27, sending the chatbot disclosure and minor-protection bill to Governor Kemp’s desk (Troutman Privacy, Transparency Coalition). Idaho’s S 1297 passed its full legislature and advanced to Governor Little. Both are chatbot safety measures. Georgia’s bill requires disclosure every three hours for adult users and every hour for minors, along with explicit suicide and self-harm response protocols for conversational AI services. The Future of Privacy Forum’s tracker now counts 78 AI chatbot safety bills moving across 27 states in 2026.

Why it matters

Disclosure, minor safety, and mental health response requirements are becoming the regulatory floor across state jurisdictions. Organizations operating consumer-facing AI products need a 50-state tracking capability, not a wait-and-see approach.
Hourly disclosure requirements for minors are not trivial to implement for many chatbot architectures. The compliance engineering work should start now.
Seventy-eight bills across 27 states mean that any federal preemption framework, if one ever arrives, faces an already established patchwork of state obligations to reconcile.

What to do about it

Map your consumer AI products against chatbot disclosure requirements in every state where users reside. Georgia and Idaho represent the floor, not the ceiling.
Assess your chatbot’s existing mental health response protocols against the Georgia requirement specifics. A disclaimer is not compliant.
Assign someone accountable for multi-state AI governance tracking. This is not a future compliance problem.

Rock’s Musings

Washington cannot pass a federal AI framework. States can. Fifty legislatures with different requirements and different timelines is the compliance nightmare that preemption was supposed to prevent. It didn’t. Georgia’s hourly minor disclosure requirement is specific, implementable, and enforceable. State legislatures are producing more actionable compliance requirements than most federal guidance I have seen this year. If you deploy consumer AI products and you don’t have someone accountable for multi-state AI governance tracking today, that gap closes before Q3 or it closes you.

9. The EU AI Act Has an Enforcement Problem, and Nobody Is Talking About It Honestly

As of late March, only 8 of 27 EU member states had designated the single contact points required for national enforcement coordination under the AI Act, according to the European Parliament Think Tank’s enforcement analysis (Tech Policy Press, IAPP). The Digital Omnibus proposal, with negotiating positions adopted by Parliament’s IMCO and LIBE committees on March 18, would push high-risk AI compliance deadlines to December 2027 for Annex III systems and to August 2028 for Annex I systems, compared with the original August 2026 deadline. The European Commission also missed its own deadline for issuing guidance on high-risk AI systems. Trilogue negotiations between Council, Parliament, and Commission are now underway.

Why it matters

Approximately 70% of EU member states are not operationally ready for AI Act enforcement. Regulations without enforcement infrastructure are aspirational documents.
The 16-month delay in high-risk requirements gives organizations breathing room on paper while creating uncertainty about what compliance standard they are being held to during the gap.
The Commission missing its own implementation guidance deadline sets a poor precedent for holding private sector organizations to their compliance timelines.

What to do about it

Do not use the delay as a license to defer governance program work. The underlying obligations have not changed in substance. Build the program now and own it.
Review the Digital Omnibus amendments specifically for changes to the high-risk AI system definition. Legislative simplification sometimes reclassifies systems in ways that alter the scope of compliance.
Subscribe to IAPP’s EU AI Act tracker for updates on the trilogue outcome. The final text will differ from both Council and Parliament positions.

Rock’s Musings

Eight out of 27 enforcement bodies are operational as the Act’s first major deadlines approach. The Commission missed its own implementation guidance deadline. The most substantive AI governance framework on the planet is running on infrastructure that is not ready to enforce it. The delay does not invalidate the regulation. Organizations that build genuine AI risk management programs now will be positioned for whatever enforcement timeline materializes. Organizations that chase the deadline and treat compliance as documentation will be exposed when the enforcement machinery catches up. That gap grows wider every quarter.

The One Thing You Won’t Hear About But You Need To

NVIDIA and Johns Hopkins Gave You a Blueprint for Defending AI Agents Against Prompt Injection

Researchers from NVIDIA and Johns Hopkins University published “Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks” on March 31 (ArXiv 2603.30016). The paper addresses how AI agents are vulnerable not to direct attacks on the model but to malicious instructions embedded in data the agent processes during task execution. The authors articulate three architectural positions. First, agents in dynamic environments need dynamic replanning with security policy updates built into the replanning loop. Second, security decisions requiring contextual judgment should still involve LLMs, but only within system designs that strictly constrain what the model can observe and decide. Third, ambiguous situations should treat human interaction as a core design consideration, not an edge case to minimize.

Why it matters

This paper frames indirect prompt injection as an architectural problem, not a model alignment problem. You cannot align your way out of it. You design it out or you accept the risk.
The principle of strictly constraining what the model can observe and decide has immediate practical application as your primary defense lever, more effective than filtering or detection approaches.
The human oversight design principle directly contradicts how most agentic deployments are being built, with human review treated as friction to reduce rather than a security control to preserve.

What to do about it

Read the paper. At 12 pages, it is short enough to share with your AI architects and security engineers before the next deployment review meeting.
Audit any agentic AI system currently in your environment against the observation scope and decision authority questions. Broad scope plus broad authority equals your highest-risk deployment.
Make human oversight an explicit design requirement in your AI agent security standards. Document the specific conditions under which an agent must pause and request human authorization.

Rock’s Musings

Nobody outside the AI security research community covered this paper. That is precisely why it belongs here. The breach reports get attention. The architecture guidance that would prevent the next breach sits on ArXiv with a few hundred downloads. I have been arguing at RockCyber for two years that agentic AI security is an architecture problem. You do not solve it with better prompts or stronger models. You solve it with privilege constraints, observation scope limits, and honest human oversight design. NVIDIA and Johns Hopkins gave you a 12-page framework for that conversation. If your next AI agent deployment review does not address these three principles, you are building exposure, not capability.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! This post is public, so feel free to share it.

Share RockCyber Musings

References

Axios. (2026, March 31). Anthropic leaked its own Claude source code. https://www.axios.com/2026/03/31/anthropic-leaked-source-code-ai

Axios. (2026, April 2). Exclusive: Gottheimer presses Anthropic on source code leaks and safety protocols. https://www.axios.com/2026/04/02/gottheimer-anthropic-source-code-leaks

BleepingComputer. (2026, March 27). CISA: New Langflow flaw actively exploited to hijack AI workflows. https://www.bleepingcomputer.com/news/security/cisa-new-langflow-flaw-actively-exploited-to-hijack-ai-workflows/

Bloomberg. (2026, March 27). European Commission’s data stolen in hack on AWS account. https://www.bloomberg.com/news/articles/2026-03-27/european-commission-s-data-stolen-in-hack-on-aws-account

Bloomberg. (2026, April 1). Anthropic takes down thousands of GitHub repos trying to yank its leaked source code. https://www.bloomberg.com/news/articles/2026-04-01/anthropic-scrambles-to-address-leak-of-claude-code-source-code

CNBC. (2026, March 31). Anthropic leaks part of Claude Code’s internal source code. https://www.cnbc.com/2026/03/31/anthropic-leak-claude-code-internal-source.html

CoinDesk. (2026, March 27). Anthropic’s massive Claude Mythos leak reveals a new AI model that could be a cybersecurity nightmare. https://www.coindesk.com/markets/2026/03/27/anthropic-s-massive-claude-mythos-leak-reveals-a-new-ai-model-that-could-be-a-cybersecurity-nightmare

Fortune. (2026, March 27). Anthropic accidentally leaked details of a new AI model that poses unprecedented cybersecurity risks. https://fortune.com/2026/03/27/anthropic-leaked-ai-mythos-cybersecurity-risk/

Fortune. (2026, March 31). Anthropic leaks its own AI coding tool’s source code in second major security breach. https://fortune.com/2026/03/31/anthropic-source-code-claude-code-data-leak-second-security-lapse-days-after-accidentally-revealing-mythos/

Help Net Security. (2026, March 27). CISA sounds alarm on Langflow RCE, Trivy supply chain compromise after rapid exploitation. https://www.helpnetsecurity.com/2026/03/27/cve-2026-33017-cve-2026-33634-exploited/

Help Net Security. (2026, March 30). Second data breach at European Commission this year leaves open questions over resilience. https://www.helpnetsecurity.com/2026/03/30/european-commission-cyberattack-cloud-infrastructure-website/

IAPP. (2026). European Commission misses deadline for AI Act guidance on high-risk systems. https://iapp.org/news/a/european-commission-misses-deadline-for-ai-act-guidance-on-high-risk-systems

IAPP. (2026, March). EU Digital Omnibus: Analysis of key changes. https://iapp.org/news/a/eu-digital-omnibus-analysis-of-key-changes

Qualys ThreatPROTECT. (2026, March 26). CISA Added Langflow Vulnerability to its Known Exploited Vulnerabilities Catalog (CVE-2026-33017). https://threatprotect.qualys.com/2026/03/26/cisa-added-langflow-vulnerability-to-its-known-exploited-vulnerabilities-catalog-cve-2026-33017/

SecurityAffairs. (2026, March 27). The European Commission confirmed a cyberattack affecting part of its cloud systems. https://securityaffairs.com/190067/data-breach/the-european-commission-confirmed-a-cyberattack-affecting-part-of-its-cloud-systems.html

Sysdig. (2026, March 27). CVE-2026-33017: How attackers compromised Langflow AI pipelines in 20 hours. https://www.sysdig.com/blog/cve-2026-33017-how-attackers-compromised-langflow-ai-pipelines-in-20-hours

TechCrunch. (2026, March 27). European Commission confirms cyberattack after hackers claim data breach. https://techcrunch.com/2026/03/27/european-commission-confirms-cyberattack-after-hackers-claim-data-breach/

TechCrunch. (2026, April 1). Anthropic took down thousands of GitHub repos trying to yank its leaked source code. https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/

The Hacker News. (2026, March 27). LangChain, LangGraph flaws expose files, secrets, databases in widely used AI frameworks. https://thehackernews.com/2026/03/langchain-langgraph-flaws-expose-files.html

The Hill. (2026, April 2). House Democrat pushes Anthropic on safety protocols, source code leak. https://thehill.com/policy/technology/5812881-gottheimer-presses-anthropic-ai-safety/

Tech Policy Press. (2026). EU’s AI Act delays let high-risk systems dodge oversight. https://www.techpolicy.press/eus-ai-act-delays-let-highrisk-systems-dodge-oversight/

Transparency Coalition. (2026, March 27). AI legislative update: March 27, 2026. https://www.transparencycoalition.ai/news/ai-legislative-update-march27-2026

Troutman Pepper Locke. (2026, March 30). Proposed state AI law update: March 30, 2026. https://www.troutmanprivacy.com/2026/03/proposed-state-ai-law-update-march-30-2026/

Aquasecurity. (2026). Trivy ecosystem supply chain temporarily compromised [GitHub Security Advisory GHSA-69fq-xp46-6x23]. https://github.com/aquasecurity/trivy/security/advisories/GHSA-69fq-xp46-6x23

European Parliament Think Tank. (2026, March 18). Enforcement of the AI Act. https://epthinktank.eu/2026/03/18/enforcement-of-the-ai-act/

Jiang, Z., et al. (2026, March 31). Architecting secure AI agents: Perspectives on system-level defenses against indirect prompt injection attacks [Preprint]. ArXiv. https://arxiv.org/abs/2603.30016

AI Monitoring Is a Standards Problem, Not a Technology Problem

Rock Lambros — Tue, 31 Mar 2026 12:50:10 GMT

NIST just published an admission that nobody knows how to monitor AI systems after deployment. NIST AI 800-4, “Challenges to the Monitoring of Deployed AI Systems,” reviews findings from three workshops, 250+ experts, and almost 90 research papers. The document catalogs over 30 distinct challenges. It offers zero solutions. That’s not a criticism. That’s the diagnosis, and that should raise your spidey senses.

NIST Mapped the Mess

The report organizes post-deployment AI monitoring into six categories:

Functionality (does it still work as intended?)
Operational (does the infrastructure hold?)
Human Factors (is it transparent and useful to humans?)
Security (is it defended against attacks?)
Compliance (does it meet regulatory requirements?)
Large-Scale Impacts (does it promote human flourishing?)

Each category carries its own distinct challenges. Functionality monitoring suffers from a lack of ground-truth datasets and a lack of a reliable way to detect model drift. Operational monitoring struggles with fragmented logging across distributed infrastructure. Human Factors monitoring, which drew more practitioner attention than any other category in the workshops, remains almost entirely unstudied in the literature. Security monitoring faces the unsettling reality that some models appear to detect when they’re being evaluated, changing their behavior under observation. Compliance monitoring lacks even basic tracking of terms-of-service violations, including downstream fine-tuning of open models for CSAM generation. Large-Scale Impacts monitoring lacks agreed-upon metrics to measure whether AI systems help or harm people at scale.

That’s a lot of individual problems. The question is whether they share a common root cause.

Figure 1: NIST AI 800-4 Cross-Cutting Challenges

The Root Cause NIST Documented Without Naming

Read the cross-cutting challenges section carefully. Five categories of barriers span every monitoring type:

No trusted methods and tools
Poor visibility and transparency
Pace of change
Organizational incentive failures
Resource constraints

Strip away the academic framing, and a pattern emerges. Workshop attendees were asking questions that belong in a standards body, not a research lab.

One attendee called for “an abstraction layer for universal security and monitoring.” Others asked, “What does the information sharing of what’s measured look like up and down the value chain?” Multiple participants flagged the absence of common metrics across use cases, noting that “non-standardized logic for generating metrics across use cases prevents us from building easy platform capabilities for monitoring.”

It’s important to point out that not every challenge NIST documented is a standards problem. Detecting deceptive behavior in models that modify their behavior under observation remains an open research problem. No specification can fix it because nobody knows how to do it reliably yet. Human-AI feedback loops are an understudied science. Ground-truth dataset availability is a data and methodology problem. The field faces three categories of challenge simultaneously: standards gaps (metrics, logging formats, reporting schemas), research gaps (deceptive behavior detection, feedback loop dynamics), and adoption gaps (methods exist in adjacent fields but aren’t applied to AI).

The standards layer is the prerequisite that makes progress on the other two categories possible. Without common definitions, you can’t scale research findings into production monitoring. Without shared schemas, adoption of proven methods stays trapped inside individual vendor implementations. Take deception detection as an example. You can’t begin researching whether a model’s stated reasoning matches its actual behavior unless you’re capturing structured reasoning traces alongside action logs in the first place. The research gap depends on closing the standards gap.

You’ve Seen This Movie Before

How did this work out for us in cybersecurity? We’ve had a 20-year head start on this exact problem.

Before syslog standardization, every network device vendor shipped its own logging format. Security teams drowned in data they couldn’t correlate. Firewalls from one vendor produced logs that meant nothing to the SIEM built for another vendor’s format. Every firewall had monitoring, but none of them spoke the same language.

The fix wasn’t a better firewall. It was CEF (Common Event Format), then LEEF (Log Event Extended Format), and now OCSF (Open Cybersecurity Schema Framework). Common schemas let security teams correlate events across vendors, build cross-platform detection rules, and operate SOCs that don’t require a translator for each data source. The technology didn’t change. The standards layer underneath made the existing technology useful at scale.

The AI monitoring equivalent would need agent-specific semantic conventions built on the observability infrastructure enterprises already operate. Not a new standard competing with OpenTelemetry. Extensions to OpenTelemetry that understand agent reasoning steps, tool calls, and multi-agent handoffs. Security events are mapped to schemas that flow into existing SIEMs without custom parsers. The pattern is identical: don’t build a parallel universe of AI-specific tooling. Extend the standards that security teams already trust.

AI monitoring is stuck in the pre-syslog era. Every platform defines its own metrics, its own log structures, its own alert taxonomies. If your organization runs AI workloads across three cloud providers and two agent frameworks, you operate five separate monitoring stacks that don’t talk to each other.

Here’s what that looks like in practice. A regional bank deploys a customer-facing loan origination model hosted on one cloud provider’s ML platform. The model calls a third-party credit scoring API. A separate vendor supplies the fairness monitoring layer. The bank’s compliance team uses an internal dashboard that pulls from the cloud provider’s native monitoring. When the credit scoring API updates its model without notification, the loan origination model starts producing subtly different risk scores. Approval rates for one demographic bracket shift by 4% over six weeks. The fairness monitoring vendor’s tool flags a drift alert using its own proprietary metric. The cloud provider’s native monitoring shows no anomaly because its baseline was never calibrated against the third-party API’s output distribution. The compliance dashboard, which aggregates data from both sources, shows conflicting signals that the compliance analyst can’t reconcile because the two tools define “drift” differently, measure it on different time windows, and log it in incompatible formats.

Nobody in that chain did anything wrong individually. The fairness vendor’s tool worked as designed. The cloud provider’s monitoring worked as designed. The gap was structural. There was no shared definition of what “drift” means across the pipeline, no common logging schema that would let the compliance team correlate events from two different monitoring tools, and no standardized way for the credit scoring API provider to notify downstream consumers of model updates.

That scenario plays out today in financial services, healthcare, and any sector that assembles AI capabilities from multiple vendors. NIST AI 800-4 confirmed it with receipts from 250 practitioners saying the same thing in different words.

Figure 2: The Monitoring Standards Gap

Article 72 Is Already Undeliverable

Regulators aren’t waiting for standards to mature. The EU AI Act’s high-risk system obligations take effect August 2, 2026 (if the aren’t delayed). Article 72 requires providers of high-risk AI systems to implement post-market monitoring plans that “actively and systematically collect, document and analyse relevant data” on system performance throughout the system’s lifetime. Deployers face separate obligations to monitor operations and report serious incidents within 72-hour and 15-day windows.

Pull one thread, and the gap becomes specific. Article 72 requires providers to collect performance data “throughout their lifetime” and evaluate “continuous compliance.” NIST AI 800-4 documents that practitioners lack standardized performance metrics, can’t establish baselines or deviation thresholds, and have no systematic way to compare model behavior across providers. One workshop attendee put it bluntly: “It’s often unclear what exactly to monitor and how.” The report cites research confirming that “the appropriate metrics to capture is not standardized in the AI community” and warns this “absence can result in misleading performance measures.”

That’s not a general compliance gap. Article 72 requires continuous collection and analysis of performance data. NIST AI 800-4 confirms that the field hasn’t agreed on what “performance” means in post-deployment contexts, let alone how to measure it consistently across different AI systems and providers. The regulation demands an activity that is structurally undeliverable with the current monitoring ecosystem. Organizations filing post-market monitoring plans in 2026 will document processes built on unstandardized metrics, non-interoperable tools, and self-defined baselines. They’ll comply on paper. The monitoring itself won’t be comparable, auditable, or meaningful across organizational boundaries.

Compliance requires two capabilities this ecosystem lacks: runtime hooks that produce monitoring data in standardized formats, and trace architectures that reconstruct decision chains across organizational boundaries. Without these, Article 72 post-market monitoring plans are fiction written in incompatible vendor dialects.

NIST’s own AI Risk Management Framework compounds the pressure. The MANAGE function calls for continuous monitoring and risk response throughout deployment. The forthcoming NIST Cyber AI Profile maps cybersecurity controls to AI-specific concerns like model integrity and adversarial robustness. Every framework converges on the same expectation. The implementation layer that would make compliance verifiable doesn’t exist yet.

Who’s Responsible? Nobody Knows That Either.

NIST AI 800-4 surfaced a question that’s arguably more urgent than the technical gaps: who monitors? Workshop attendees repeatedly asked: “Who should do monitoring?” “Who is responsible for remediating incidents?” and “If anything is found, who can act on it?”

In the bank scenario above, was the monitoring failure the cloud provider’s responsibility? The fairness vendor’s? The credit scoring API provider’s? The bank’s compliance team? Each party monitored its own slice of the pipeline. Nobody monitored the seams between them. The NIST report documents this as an unresolved question across the AI supply chain, and it’s compounded by the standards gap. You can’t assign responsibility for monitoring when you haven’t agreed on what monitoring means. You can’t hold a vendor accountable for failing to report a drift event when “drift” has no shared definition.

A viable monitoring architecture separates three concerns. The platform exposes standardized observation and control points. An open enforcement layer applies policy through those control points, portable across any platform that exposes them. The enterprise customizes policy to its domain: financial services brings its own data sensitivity models, healthcare brings PHI detection, and any regulated industry brings its compliance requirements. When responsibilities are layered this way, the question of “who monitors?” has a structural answer. The platform enables. Open tooling enforces. The enterprise governs. Accountability follows the layer where the failure occurred.

One attendee asked how to “reduce the burden on the end user” to validate model behavior. Another asked how monitoring could become “a more collaborative practice, rather than a closed technical process.” These aren’t theoretical musings. They’re the governance questions that determine whether monitoring happens at all or degenerates into checkbox compliance where everyone points at someone else’s dashboard. A layered architecture gives each party a defined obligation: expose, enforce, govern. The current ecosystem gives everyone an excuse.

Agents Make Everything Worse

If the standards gap is a problem for current AI systems, it’s a crisis for agentic AI. NIST SP 800-4 repeatedly mentions agents, and the findings are sobering.

Workshop attendees flagged “lengthy agentic tasks” as especially resource-intensive to monitor. The report cites research noting that “both the agents and the operational environment are subject to change,” making static monitoring baselines unreliable. Agent identification and tracking remain unstandardized. Attendees raised visibility challenges around “out-of-distribution behavior using agent identifiers” and noted that watermarking and content provenance measures “face reliability challenges.” One attendee asked directly: “Is the model agentically attempting to subvert the monitoring setup it is under, i.e., scheming?”

That question deserves a pause. We’re building systems that plan, execute across organizational boundaries, call external tools, and collaborate with other agents. The monitoring challenges NIST documented for conventional AI systems, from detecting drift to maintaining visibility to establishing baselines, all assume a relatively static system being observed from outside. Agents aren’t static. They change behavior based on context, discover new capabilities at runtime, and operate across a distributed infrastructure that no single organization fully controls.

Any monitoring standard for agents needs a dynamic inventory mechanism. A static software bill of materials generated at deployment time is worthless when agents discover new tools, connect to new service endpoints, and modify their own capabilities during a single execution session. The inventory must update in real time, triggered by component changes, and output in formats the supply chain security ecosystem already consumes. If your agent connects to a new MCP server mid-task and your inventory doesn’t reflect that within the same session, your security team is operating on a stale map.

The “monitorability tax” concept raised in the report’s cited research captures the emerging cost structure. Model developers will pay a performance penalty, through slower inference or less capable models, to maintain the ability to monitor agent behavior. That cost rises as agent autonomy increases. Standardized hooks reduce the engineering cost by making monitoring implementation portable across frameworks, a one-time platform integration rather than custom monitoring code for every deployment. The monitorability tax on compute remains. The tax on engineering effort doesn’t have to.

The cross-provider abstraction layer that workshop attendees called for isn’t a nice-to-have for agentic systems. Without standardized hooks for runtime monitoring, standardized trace formats for multi-agent workflows, and standardized inventories of agent capabilities and dependencies, you’re watching agents through whatever proprietary window each vendor provides. You can’t correlate behavior across platforms. You can’t reconstruct decision chains that span multiple agent frameworks. You can’t audit what you can’t consistently observe.

One more structural blind spot worth naming: runtime monitoring standards assume a cooperating platform that exposes hooks. Open-weight models distributed without platforms bypass this assumption entirely. Once a model is released into the wild for anyone to run, no runtime hook exists unless the downstream deployer voluntarily implements one. Open-weight models are structurally ungovernable by runtime standards alone. Any honest conversation about the monitoring gap has to acknowledge this boundary.

Figure 3: How Agents Amplify the Monitoring Standards Gap

Key Takeaway: NIST AI 800-4 confirms what practitioners feel in their bones: AI monitoring isn’t failing because we lack technology. The standards layer that would make technology useful at scale doesn’t exist. Agents make the gap existential.

What to do next

Stop accepting proprietary monitoring silos. The next time you evaluate an AI platform, put these questions into the review:

What open logging schema do your monitoring outputs conform to? If the answer is a proprietary format, ask how you export monitoring data into a format another platform can ingest without custom transformation.
How does your monitoring define and detect model drift? Compare the answer across your vendors. If two vendors define “drift” differently, your compliance team can’t produce a coherent post-market monitoring report under Article 72.
When a component in the AI pipeline (a third-party API, a model update, a data source change) shifts behavior, how does your monitoring surface cross-component effects? If the answer involves manual correlation, you have a gap that scales with system complexity.
Who in the supply chain is responsible for monitoring the seams between components? If nobody owns cross-boundary monitoring, say so in your risk register. That’s an accepted risk, not an oversight.
Does your AI platform expose standardized middleware hooks that allow your security team to intercept and evaluate agent actions before they execute? If the platform’s controls are proprietary and non-portable, your enforcement logic dies with the vendor relationship. Every policy you write, every guardrail you configure, every compliance rule you encode is locked to one vendor’s architecture.

Push your industry groups and standards bodies. If you participate in OWASP, ISO working groups, or NIST-affiliated communities, advocate for common AI monitoring vocabularies and reference architectures. The cybersecurity field solved this problem a decade ago with common event formats and shared schemas. The AI field hasn’t started.

Audit your own monitoring maturity against the six NIST categories. Most organizations will find entire categories with no monitoring at all, particularly Human Factors and Large-Scale Impacts. Map the gaps before the next board meeting where someone asks if you’re ready for August 2026.

The full NIST AI 800-4 report is available at https://doi.org/10.6028/NIST.AI.800-4.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! Subscribe for free to receive new posts and support my work.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 31 March 20-26, 2026

Rock Lambros — Fri, 27 Mar 2026 12:11:00 GMT

RSA Conference 2026 closed Thursday in San Francisco. Thirty thousand attendees, six hundred exhibitors, one word on every booth banner: agentic. While the industry competed on keynotes and happy hours, LiteLLM, deployed in hundreds of enterprise AI stacks, got infected with credential-stealing code through a misconfigured GitHub Actions workflow. Malicious releases went live March 19 and March 22. Most of your security team was watching keynotes.

Underneath the conference noise, genuine signal emerged. Zenity’s CTO demonstrated live zero-click exploits against ChatGPT, Salesforce, and Microsoft Copilot on the conference floor. Palo Alto Networks Unit 42 documented new attack paths through the Model Context Protocol. HackerOne disclosed a 540% year-over-year surge in validated prompt injection vulnerabilities. The EU AI Office’s second draft Code of Practice on AI-generated content transparency is open for feedback through March 30, with prescriptive new requirements that narrow compliance discretion significantly. NIST published AI 800-4, the first federal framework for monitoring AI systems in production, with no vendor booth to announce it.

Here’s what matters and what to do about it.

1. Zenity Launches Guardian Agents and Demonstrates 0-Click AI Exploits at RSA

Zenity launched Guardian Agents at RSA 2026 on March 23, positioning it as continuous, contextual security for AI agents across SaaS, cloud, and endpoint environments. CTO Michael Bargury ran live demonstrations titled “Your AI Agents Are My Minions,” showing zero-click prompt injection chains that manipulated Cursor into leaking developer secrets via support emails, Salesforce agents into exfiltrating customer data to an attacker-controlled server, and ChatGPT into producing persistent attacker-chosen outputs across conversations (The Register, March 23, 2026, and Help Net Security, March 24, 2026).

Why it matters

Zero-click attacks eliminate the human review checkpoint most AI security frameworks assume is present. When agents act without user input, your primary detection layer disappears before the threat is visible.
Live exploitation of production enterprise systems on a conference floor is harder to dismiss than a threat model in a whitepaper.
Guardian Agents signals a market category forming in real time. The evaluation criteria you set today will shape purchasing decisions for the next several years.

What to do about it

Inventory every AI agent in your environment before your next board meeting. If you can’t enumerate them, you can’t monitor them.
Require vendors to document in writing which actions their agents take without explicit human approval. Non-answers are critical control gaps.
Run adversarial testing against your three highest-access agents this quarter, targeting credential extraction, data exfiltration, and cross-system manipulation.

Rock’s Musings

Bargury’s demonstration strategy was the most honest thing at RSA this week: show the attack, then show the defense. Live exploitation on production systems is harder to dismiss than a slide deck built around the word autonomous. The inconvenient reality is that most enterprises already have agents running with email access, CRM credentials, and code repository permissions, with no runtime monitoring on what those agents decide to do. Selecting an AI security vendor is not the same thing as having an answer to the problem he demonstrated on the conference floor.

2. LiteLLM Infected with Credential-Stealing Code via Trivy Misconfiguration

The Register reported March 24 that LiteLLM, a widely deployed open-source LLM API proxy, was compromised through a misconfigured Trivy GitHub Actions workflow. Attackers modified version tags on the trivy-action GitHub Action to inject malicious code into workflows organizations were already running, producing malicious releases on March 19 and March 22. The maintainer confirmed that anyone who installed and ran the project during that window should assume credentials available to their environment were exposed.

Why it matters

LiteLLM sits in the critical path of many enterprise AI deployments. One compromised abstraction library reaches hundreds of downstream production systems simultaneously.
The attack exploited version tags, not direct code injection. CI/CD pipelines relying on tags rather than pinned commits ran malicious code without detection. That’s a systemic configuration gap across most enterprise pipelines.
The attack ran during RSA week when security teams were distracted. The timing was likely not accidental.

What to do about it

Audit every environment that pulled a LiteLLM update between March 19 and March 24. Treat those environments as potentially compromised until you confirm otherwise.
Pin all GitHub Actions to specific commit hashes, not version tags. Tags are mutable and can be silently overwritten. Commits are not.
Establish software bill of materials practices for all AI and ML dependencies. Supply chain attacks will keep finding environments where that inventory doesn’t exist.

Rock’s Musings

LiteLLM is exactly the kind of library that lands in enterprise AI stacks without a security review, installed by an ML engineer who needed to route calls to three model providers before the sprint ended. Trivy is a security tool. Attackers used a security tool misconfiguration to compromise a release pipeline for another widely used tool. If there’s a cleaner argument for applying security rigor to your own security tooling, I haven’t heard it. Your AI dependency chain needs the same scrutiny as your application dependencies. Good intentions at install time are not a compensating control.

3. Palo Alto Networks Unit 42 Documents MCP Attack Vectors

Palo Alto Networks Unit 42 published research the week of March 20 documenting new attack paths through the Model Context Protocol, including prompt injection delivered through MCP’s sampling interface. Security researchers tracked 30 CVEs filed against MCP implementations in the preceding 60 days, including CVE-2026-25536 (cross-client data leak in the MCP TypeScript SDK) and CVE-2026-23744 (remote code execution in MCPJam Inspector). A scan of more than 500 public MCP servers found that 38% lacked authentication entirely (Unit 42, March 2026, and Adversa.ai, March 2026).

Why it matters

MCP is the connective tissue between AI agents and enterprise tools. A vulnerability in this protocol exposes the entire agent ecosystem built on top of it, not one isolated system.
Thirty CVEs in 60 days signals that security review did not happen before shipping at scale. Every API ecosystem that launches with deployment velocity ahead of security assessment follows this arc.
Thirty-eight percent of scanned servers lacking authentication is systemic failure. Authentication is the minimum viable control. Everything built on top of unauthenticated servers is exposed.

What to do about it

Inventory every MCP server in your environment and treat unauthenticated instances as critical findings requiring immediate action.
Require authentication, authorization, and comprehensive logging for any MCP server with access to production systems or sensitive data.
Demand specific CVE status and patch timelines from your AI infrastructure vendors. Vague answers signal high risk and a vendor not tracking its own exposure.

Rock’s Musings

Thirty CVEs in 60 days is not a patching problem. It’s a design problem. MCP shipped fast because the builders cared more about what AI agents could reach than how securely they could reach it. The 38% authentication gap is the number that should end budget debates about AI infrastructure security investment. Roughly two in five MCP servers operate on the assumption that only authorized parties will talk to them, which is exactly wrong in a protocol designed to connect agents to external resources. That assumption creates direct paths to your production data.

4. HackerOne Reports 540% Surge in Validated Prompt Injection Vulnerabilities

HackerOne announced Agentic Prompt Injection Testing on March 21, paired with platform data showing a 540% year-over-year increase in validated prompt injection vulnerabilities. The service executes structured, multi-turn adversarial scenarios against live AI applications, evaluating whether injection attempts produce actual data exposure or unauthorized tool execution across interconnected agent systems (HackerOne Blog, March 2026, and Cybersecurity Insiders, March 21, 2026).

Why it matters

A 540% increase in validated vulnerabilities means real researchers are finding real exploitable conditions in production systems, not theoretical edge cases.
Traditional application security testing does not cover agent-specific attack paths. If your AI agents aren’t explicitly in scope for your red team or bug bounty program, you have a documented blind spot.
Unit 42’s concurrent research on indirect prompt injection through web content eliminates the “attacker needs direct access” objection. Agents read the web. The web is the attack surface.

What to do about it

Add AI agents to your red team scope explicitly as a primary target category, not an afterthought appended to an existing engagement.
Require prompt injection testing as part of every AI agent release process, treated as a gate equivalent to penetration testing for any externally facing application.
Track prompt injection findings as a distinct vulnerability class in your risk register. You can’t demonstrate improvement to your board on metrics you’re not collecting separately.

Rock’s Musings

Five hundred forty percent ends the debate about whether prompt injection is a real threat. I’ve heard the objection that attackers need direct access to craft payloads. Unit 42’s indirect injection research, published this same week, shows agents reading manipulated instructions from ordinary websites they visit in the course of normal operation. Your agents don’t need to be directly targeted; they need to visit the wrong page. The gap between organizations deploying AI agents and organizations testing those agents adversarially is the largest unaddressed risk exposure I see in enterprise AI programs right now.

5. Microsoft Publishes Secure Agentic AI Framework and Confirms Agent 365 May 1 GA

Microsoft published “Secure Agentic AI End-to-End” on March 20, documenting its approach to extending Zero Trust architecture across the full AI agent lifecycle: data ingestion, model training, deployment, and runtime behavioral monitoring. The post confirmed Agent 365, Microsoft’s governance control plane for enterprise AI agents, will reach general availability on May 1, 2026, with agent identity, authorization scope, and behavioral monitoring treated as distinct security domains from traditional human-user ZT controls (Microsoft Security Blog, March 20, 2026).

Why it matters

A confirmed May 1 GA date gives enterprises in Microsoft environments a concrete six-week planning horizon. Governance framework adoption takes time and that clock is already running.
Extending Zero Trust to AI agents is architecturally correct. Most ZT implementations weren’t designed with agent identity or behavioral monitoring in mind, making the gap assessment non-trivial work.
Publishing detailed technical frameworks before product GA signals Microsoft wants enterprises building governance practices now, before the product ships.

What to do about it

Map your current ZT architecture against the agent-specific requirements described in the March 20 post. Focus on gaps in agent identity and behavioral monitoring specifically.
Begin internal stakeholder alignment on Agent 365 if you’re in a Microsoft 365 environment. Six weeks is not enough time to start that conversation from zero.
Document agent permissions, access patterns, and decision scopes using whatever visibility tools you have today rather than waiting for Microsoft tooling.

Rock’s Musings

“End-to-end” is doing heavy lifting as a title. What Microsoft describes is extending known security primitives to a new execution context. That’s necessary work and not a complete answer. The hard problems are behavioral: distinguishing authorized agent actions from manipulated ones, detecting policy violations in real time, and maintaining audit trails that survive an incident investigation. Agent 365 is worth watching. If the behavioral monitoring is substantive, it’ll move the market. If it’s a compliance dashboard, enterprises will check the box while actual risk sits unaddressed underneath it.

6. Cisco Releases DefenseClaw Open Source on Final Day of RSA

Cisco released DefenseClaw to GitHub on March 27, the final day of RSA 2026, as an open-source framework for scanning agent skills and sandboxing agent execution. The release accompanied Zero Trust Access for AI agents and a free AI Defense Explorer Edition targeting security practitioners. Cisco plans integration with NVIDIA OpenShell for hardware-level execution sandboxing, addressing execution isolation that software-only monitoring cannot replicate (Cisco Newsroom, March 2026, and UC Today, March 2026).

Why it matters

Open-source agent security scanning means organizations can start building security into agent development pipelines without a procurement cycle or a budget line.
Hardware-anchored execution sandboxing addresses a control gap that software-only monitoring cannot close. Execution isolation for agents is systematically underinvested across the industry relative to the risk.
The open-source and Explorer Edition strategy targets developers before enterprise procurement cycles form, competing for architectural mindshare with builders rather than just buyers.

What to do about it

Pull DefenseClaw and run it against a non-production agent environment this month. Validate real-world utility before committing to any commercial evaluation.
Evaluate the NVIDIA sandboxing integration if you’re running NVIDIA infrastructure. Test in isolation before production consideration.
Track Cisco’s AI Defense commercial roadmap. Free Explorer Editions typically precede commercial tier launches by 12 to 18 months, and starting your evaluation now means you’ll have data when the pitch arrives.

Rock’s Musings

Releasing open-source code on the last day of the conference changes the conversation from “will enterprises buy this” to “pull the repo and see for yourself.” That’s a credible move when the code is real and the threat model is honest. Run DefenseClaw against your actual agent environment before making any claims about coverage. The larger play is Cisco’s bid for the enterprise AI security architecture position using network visibility, an established security portfolio, and enterprise relationships most competitors would need a decade to build. DefenseClaw is a credible opening move. Watch the next 18 months of product decisions to judge the hand.

7. Google Deploys Gemini Agents to Process 10 Million Dark Web Posts Daily

Google announced at RSA 2026 on March 23 that Gemini AI agents are processing more than 10 million dark web posts daily to surface threats relevant to specific organizations. The capability integrates with Google Security Operations alongside new agentic automation features, currently in preview, that let security teams combine AI-driven investigation with deterministic automated response workflows (The Register, March 23, 2026, and Google Cloud Blog, March 2026).

Why it matters

Ten million posts per day changes the economics of dark web threat intelligence. Organizations that couldn’t sustain comprehensive monitoring programs gain access to Google-scale processing at a fraction of the previous cost.
Pairing AI-driven investigation with deterministic automation preserves human-defined control while extending agent reach into high-volume, low-judgment tasks. That’s the right architectural pattern for agentic SOC work.
Preview status means GA behavior, SLA, and security review standards remain unfinalized. Your production SOC is not where you run this experiment yet.

What to do about it

Assess your current dark web monitoring coverage gap against what this capability covers. If there’s a meaningful difference, prioritize a pilot evaluation once the feature reaches GA.
Review preview terms carefully before enabling agentic automation in any production SOC workflow. Preview features carry materially different risk profiles than GA releases.
Define which SOC workflows you’d delegate to agents and where human approval must remain. Build that policy before the tools arrive, not after they’re already running.

Rock’s Musings

Threat intelligence is the most defensible application of AI agents in security operations right now. Failure modes are recoverable: the agent misses a threat and your other controls have a chance at it. Compare that to agentic incident response, where the failure mode might be blocking a production system or destroying forensic evidence. Start with intelligence, not response. The preview framing signals Google is collecting operational data before committing to GA behavior guarantees, which is reasonable product discipline. It also means you wait for GA before running this where failures have material consequences.

8. Novee Launches Autonomous AI Red Teaming Platform for LLM Applications

Novee announced autonomous AI red teaming for LLM applications on March 24 at RSA Conference 2026. The platform deploys an AI pentesting agent that executes multi-turn adversarial scenarios against live systems, simulating attacker chaining techniques across prompt injection, jailbreaks, data exfiltration paths, and agent behavior manipulation, covering any LLM-powered system regardless of model provider with optional CI/CD pipeline integration (GlobeNewswire, March 24, 2026, and Help Net Security, March 24-25, 2026).

Why it matters

Traditional pentesting tools were designed for pre-LLM application security problems. Novee builds red teaming from actual LLM vulnerability research, producing findings that adapted traditional tools miss.
CI/CD pipeline integration lets security teams catch prompt injection and agent manipulation issues before production deployment rather than after an incident surfaces them.
Two distinct companies announced adversarial AI testing capabilities at RSA 2026 in the same week. Market formation around this problem is accelerating.

What to do about it

Evaluate Novee’s beta against a non-production LLM application to understand what it surfaces relative to your existing security testing coverage.
Map the gap between your current SDL and what LLM-specific adversarial testing would require. The gap is almost certainly larger than you expect it to be.
Add AI-native red teaming as a release gate requirement for any LLM application reaching production. Make it a gate, not a post-deployment recommendation that teams skip.

Rock’s Musings

Two autonomous AI red teaming announcements in one RSA week tells you the market is accepting that testing AI systems requires AI-specific tooling, not adapted traditional approaches. That’s a healthy development even if the tools themselves are early. The CI/CD integration angle is the most practically valuable feature: security issues caught before production deployment cost a fraction of what they cost after deployment. If you’re shipping LLM applications without adversarial testing in the pipeline, you’re making a risk decision that most boards don’t know they’re making.

9. EU AI Office Second Draft Code of Practice Enters Final Feedback Window

The EU AI Office published its second draft Code of Practice on AI-Generated Content Transparency on March 3, with the stakeholder feedback window closing March 30. The second draft moves from high-level principles toward prescriptive, technically detailed commitments, narrowing compliance discretion and signaling how regulators will likely assess conformance in practice. A third and final version is expected by June 2026, ahead of the August 2 applicability date for AI-generated content transparency obligations (Herbert Smith Freehills Kramer, March 2026, and BABL AI, March 2026).

Why it matters

Draft 2’s shift to prescriptive technical commitments closes the interpretation space organizations were using to plan flexible compliance programs. The gap between “we have a policy” and “we meet the technical specification” narrowed significantly this month.
The March 30 feedback deadline is this weekend. If your organization has substantive views on requirements that are technically unworkable, the window to influence the final text is closing.
August 2 is not distant. Organizations waiting for final text before beginning compliance work are accepting a six-week implementation sprint under real enforcement conditions.

What to do about it

Read Draft 2 this week. The technical specificity represents a meaningful change from Draft 1, and your compliance planning may need adjustment.
Submit feedback before March 30 if the current draft creates compliance constraints you believe are technically unworkable for your AI content operations.
Begin implementation planning against Draft 2 requirements now. The June final text will refine but won’t fundamentally restructure what’s already written.

Rock’s Musings

Every organization waiting for final text before starting EU AI Act compliance work is playing a game where the timeline gets worse each quarter they wait. Draft 2 is prescriptive enough to start serious implementation planning. The adjustments you’ll need when Draft 3 drops will be smaller than the work you’ll need to compress into six weeks if you start in June. The transparency labeling requirements are more technically demanding than most organizations appreciate from reading summaries. Download Draft 2 from the EU’s digital strategy portal and read it against your actual AI content production workflows. That gap analysis is the starting point for everything else.

10. RSA 2026 Reveals a Contested Market for AI Agent Governance Control Planes

A pattern emerged across RSA 2026 beyond individual product launches: the governance control plane for AI agents is being actively contested by multiple major vendors. Microsoft’s Agent 365 (GA May 1), Cisco’s DefenseClaw (released March 27), SentinelOne’s Prompt AI Agent Security control plane, and Nudge Security’s AI agent discovery expansion all launched during the conference week, each addressing the same fundamental problem: enterprises deploy AI agents and lose track of what those agents do, access, and decide autonomously (SecurityWeek, March 2026, and Biometric Update, March 2026).

Why it matters

Multiple major vendors converging on the same problem in the same week signals enterprises are actively requesting governance solutions, not absorbing vendor-manufactured demand.
Competition between Microsoft’s integrated control plane and point solutions from Cisco, SentinelOne, and Nudge creates a real architectural decision. Choose wrong and you own the integration debt for years.
None of these products fully solves behavioral monitoring. They address discovery, policy enforcement, and visibility. Real-time behavioral anomaly detection for agents remains an open engineering challenge.

What to do about it

Define your AI agent governance requirements before evaluating any vendor. Required capabilities: inventory discovery, permission auditing, behavioral logging, and human approval workflows for high-risk actions.
Assess whether your environment favors an integrated control plane or best-of-breed point solutions based on your actual architecture, not vendor marketing claims.
Ask every vendor during evaluation: how does the product detect when an agent takes an authorized action it was manipulated into taking? The answer quality will differentiate vendors quickly.

Rock’s Musings

When four vendors announce competing governance control planes at the same conference in the same week, you’re watching a market category consolidate in real time. That’s interesting for analysts and exhausting for practitioners who have to evaluate all of it while managing agents already running in production without any governance. My advice: don’t let the governance platform debate distract from the more urgent problem of knowing what agents you currently have. Most enterprises have agents deployed that security teams didn’t authorize, can’t enumerate, and have no logs on. Governance tooling is the right investment. Knowing what you’re governing is the prerequisite.

The One Thing You Won’t Hear About But You Need To

NIST Publishes AI 800-4: The First Federal Framework for Monitoring AI Systems in Production

NIST published AI 800-4, “Challenges to the Monitoring of Deployed AI Systems,” in March 2026. Built from three practitioner workshops with more than 200 experts across academia, industry, and ten-plus federal agencies, plus an 87-paper literature review, it maps the gaps, barriers, and open questions in monitoring AI systems after deployment. It covers six monitoring categories: functionality, operational health, human factors, security, safety, and compliance. It received no RSA booth, no vendor keynote, and no sponsored coverage (NIST News, March 2026, and NIST AI 800-4 PDF, March 2026).

Why it matters

Most organizations deploying AI monitor latency and availability. AI 800-4 addresses whether the model behaves consistently with its training distribution and produces outputs that align with policy, which are the failures that matter most and the ones traditional monitoring misses entirely.
NIST explicitly identifies human-AI interaction monitoring as the most under-researched gap in the field. Workshop practitioners raised it far more than published literature covers. If your AI monitoring program doesn’t address how users interact with and respond to AI outputs, you’re missing the category NIST calls most underdeveloped.
The document is vendor-neutral and grounded in practitioner experience, directly applicable to conversations with regulators and auditors who want evidence of a structured AI monitoring program.

What to do about it

Download NIST AI 800-4 from nist.gov and route it to whoever owns your AI security program. It’s the most actionable government guidance on operational AI monitoring published to date.
Map your current monitoring coverage against the document’s six categories. The gaps will be immediately apparent and the prioritization logic writes itself once you have the map.
Use AI 800-4 as the foundation for your AI monitoring program documentation. When regulators ask how you monitor AI systems in production, a NIST-aligned program gives you a defensible, auditable answer.

Rock’s Musings

The honest state of enterprise AI monitoring: most organizations have logs showing their AI system responded. They don’t have logs showing whether the response was correct, consistent with training distribution, within policy boundaries, or manipulated by adversarial input. That visibility gap is how AI security incidents become AI security incidents. You don’t catch the drift until the outcome is undeniable and the damage is done. NIST AI 800-4 doesn’t get coverage because nobody can sell it. The organizations that read it and build monitoring programs from its framework will answer regulatory questions coherently in 18 months when enforcement catches up to deployment rates. The organizations that attended every RSA keynote and skipped the NIST publication will be writing incident reports instead. For more on building AI governance programs that survive regulatory scrutiny, visit rockcybermusings.com. If you need help turning frameworks like AI 800-4 into operating programs your security team can actually run, reach out at rockcyber.com.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Bargury, M. (2026, March 23). Your AI agents are my minions [Conference presentation]. RSA Conference 2026, San Francisco, CA.

Claburn, T. (2026, March 24). LiteLLM infected with credential-stealing code via Trivy. The Register. https://www.theregister.com/2026/03/24/trivy_compromise_litellm/

Claburn, T. (2026, March 23). AI agents are ‘gullible’ and easy to turn into your minions. The Register. https://www.theregister.com/2026/03/23/pwning_everyones_ai_agents/

Claburn, T. (2026, March 23). Google unleashes Gemini AI agents on the dark web. The Register. https://www.theregister.com/2026/03/23/google_dark_web_ai/

Cisco. (2026, March). Cisco reimagines security for the agentic workforce. Cisco Newsroom. https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2026/m03/cisco-reimagines-security-for-the-agentic-workforce.html

Google Cloud. (2026, March). RSAC 26: Supercharging agentic AI defense with frontline threat intelligence. Google Cloud Blog. https://cloud.google.com/blog/products/identity-security/rsac-26-supercharging-agentic-ai-defense-with-frontline-threat-intelligence

HackerOne. (2026, March). Agentic prompt injection testing for AI security. HackerOne Blog. https://www.hackerone.com/blog/agentic-prompt-injection-testing

HackerOne introduces agentic prompt injection testing as AI security risks accelerate. (2026, March 21). Cybersecurity Insiders. https://www.cybersecurity-insiders.com/hackerone-introduces-agentic-prompt-injection-testing-as-ai-security-risks-accelerate/

Herbert Smith Freehills Kramer. (2026, March). Transparency obligations for AI-generated content under the EU AI Act: From principle to practice. https://www.hsfkramer.com/notes/ip/2026-03/transparency-obligations-for-ai-generated-content-under-the-eu-ai-act-from-principle-to-practice

EU releases second draft of AI Act Code of Practice on labeling AI-generated content. (2026, March). BABL AI. https://babl.ai/eu-releases-second-draft-of-ai-act-code-of-practice-on-labeling-ai-generated-content/

Microsoft Security. (2026, March 20). Secure agentic AI end-to-end. Microsoft Security Blog. https://www.microsoft.com/en-us/security/blog/2026/03/20/secure-agentic-ai-end-to-end/

NIST. (2026, March). New report: Challenges to the monitoring of deployed AI systems. https://www.nist.gov/news-events/news/2026/03/new-report-challenges-monitoring-deployed-ai-systems

NIST. (2026). NIST AI 800-4: Challenges to the monitoring of deployed AI systems. National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-4.pdf

Novee. (2026, March 24). Novee introduces autonomous AI red teaming to uncover security flaws in LLM applications [Press release]. GlobeNewswire. https://www.globenewswire.com/news-release/2026/03/24/3261278/0/en/Novee-Introduces-Autonomous-AI-Red-Teaming-to-Uncover-Security-Flaws-in-LLM-Applications.html

Novee introduces autonomous AI red teaming to hunt LLM vulnerabilities. (2026, March 24). Help Net Security. https://www.helpnetsecurity.com/2026/03/24/novee-ai-red-teaming-for-llm-applications/

Palo Alto Networks Unit 42. (2026, March). New prompt injection attack vectors through MCP sampling. https://unit42.paloaltonetworks.com/model-context-protocol-attack-vectors/

SecurityWeek. (2026, March). RSAC 2026 conference announcements summary: Day 1. https://www.securityweek.com/rsac-2026-conference-announcements-summary-day-1/amp/

Zenity AI agents contextual security. (2026, March 24). Help Net Security. https://www.helpnetsecurity.com/2026/03/24/zenity-ai-agents-contextual-security/

Zenity. (2026, March 23). Zenity sets the foundation for guardian agents. Zenity Newsroom. https://zenity.io/company-overview/newsroom/company-news/zenity-sets-the-foundation-for-guardian-agents

Weekly Musings Top 10 AI Security Wrapup: Issue 30 March 13-19, 2026

Rock Lambros — Fri, 20 Mar 2026 12:50:42 GMT

Meta logged a SEV-1 on March 18 because an internal AI agent posted without human approval, provided bad advice, and exposed sensitive data to the wrong employees for 2 hours. Amazon confirmed its Bedrock sandbox lets AI models exfiltrate data via DNS and called it intentional design. HiddenLayer found 31% of security leaders don’t know if they had an AI breach in the past year. The EU Council voted to restructure the AI Act’s high-risk compliance framework. Three AI agent security products launched in four days. This was one week.

The week’s evidence points in one direction: agentic AI security is no longer a research problem. Real incidents are appearing in production environments run by organizations with serious security programs. Technical flaws in AI infrastructure are drawing vendor responses that amount to documentation updates rather than patches. Research data is documenting blind spots CISOs can no longer treat as edge cases. In parallel, the governance machinery is finally moving, but it’s moving slower than deployment. Standards and deployments are in a race, and deployments are winning by a wide margin. More context at RockCyber and RockCyber Musings.

1. OWASP publishes its GenAI data security risk taxonomy for 2026

The OWASP GenAI Security Project released GenAI Data Security: Risks and Mitigations 2026 in March, a 103-page taxonomy covering 21 discrete data security risks across the full GenAI lifecycle from training through agentic runtime (OWASP). The document maps risks across training and fine-tuning data, retrieval and RAG pipelines, vector stores, context windows, agent memory, tool call payloads, and observability infrastructure. It identifies a core architectural property that makes GenAI data security structurally different from every prior computing model: the context window aggregates data from multiple trust domains into a single flat namespace with no internal access controls. A confidential HR record retrieved via RAG sits next to a user prompt with identical trust weight, and there is no mechanism today to mark a context segment as available for reasoning but not surfaceable in the output. The document also addresses machine unlearning directly: deleting source data does not remove what a fine-tuned model or LoRA adapter has memorized into its weights. Download the report HERE.

Why it matters

The flat-namespace context window problem is not a configuration gap. It’s an architectural property of how these systems work, which means perimeter controls and access policies cannot fully solve it. Minimization and context scoping are the only practical mitigations available today.
LoRA adapter memorization of rare training examples means high-recall prompts can extract verbatim PII, credentials, or intellectual property from fine-tuned models without any sophisticated attack technique. Organizations fine-tuning on internal data have a data exposure risk they likely haven’t assessed.
The Right to Erasure problem is unsolved at the architectural level. Deleting training data from a source system does not delete what the model encoded during fine-tuning. GDPR and state privacy law DSR obligations cannot be satisfied by source deletion alone.

What to do about it

Treat the context window as a data-exposure surface, not just a prompt-delivery mechanism. Classify what goes in the same way you classify what goes into a database query, and scope RAG retrieval to the minimum required for the task.
Audit every fine-tuned model and LoRA adapter in your environment against the data used to train it. If that training data included PII, credentials, or regulated information, your model could serve as a potential exfiltration vector.
Build a GenAI data bill of materials using CycloneDX ML-BOM as the base format. Until you have lineage from the source dataset to the deployed model to the embedding store, you cannot answer the question a regulator will eventually ask: what data did this model see, and where does it live now?

Rock’s Musings

The architectural insight at the center of this document is the one the industry keeps sliding past. The context window has no internal access control layer. That’s not a misconfiguration. It’s a design property of how transformers process sequences. Everything that enters the context window is treated as equally reachable by the model’s output mechanism, and no amount of system prompt guardrailing changes the underlying architecture. The practical implication is that the primary defense is what you put in, not what you try to prevent from coming out.

The machine unlearning section is the one I push organizations on hardest. They are collecting consent, honoring deletion requests, and scrubbing source databases, and then deploying fine-tuned models that still carry what they memorized from the deleted data. The model weights are a copy of your training corpus in a form your DLP tools don’t see, and your deletion workflows can’t reach. Right to Erasure in GenAI is an open architectural problem with no clean solution today, and most organizations haven’t told their legal team that yet.

2. EU Council rewrites the compliance clock for high-risk AI systems

The EU Council adopted its negotiating position to amend the AI Act’s high-risk framework (EU Council). The core change replaces the fixed August 2026 compliance deadline with a conditional trigger. Full high-risk obligations apply only once the Commission certifies required standards and tools are available, with a hard backstop date. The Council also pushed the national AI regulatory sandbox deadline to December 2027 and clarified that law enforcement, border management, judicial, and financial AI systems remain under national supervisory authority rather than the Commission. Negotiations with the European Parliament begin next.

Why it matters

The conditional trigger gives the Commission discretion over when your obligations start. Until it certifies standards are ready, full high-risk obligations don’t apply, creating an indeterminate window.
Pushing the sandbox deadline to December 2027 removes a key testing mechanism for high-risk AI at a time when organizations are accelerating deployment.
Fragmented supervisory authority means 27 member states apply their own rules to some of the highest-stakes AI use cases.

What to do about it

Map your AI systems against current and proposed high-risk definitions now. The conditional trigger shifts the timeline, not the compliance obligation itself.
Track Parliament negotiations. The Council position is a mandate, not the final text.
Build a jurisdiction-aware compliance map for EU operations covering which systems fall under national versus Commission supervision.

Rock’s Musings

I’ve seen regulatory timelines used to delay compliance indefinitely in my career more times than I can count. This EU Council move fits the pattern. The conditional trigger means the Commission controls when your clock starts, and they have to certify standards are available first. Given the pace at which NIST’s agentic AI guidance is moving, expecting European standards to materialize quickly requires genuine optimism.

Organizations using this ambiguity to do nothing are miscalculating. The August 2026 date was never the governance point. You have high-risk AI systems in production today, and you need to govern them regardless of what the Commission certifies and when.

3. Meta logs a SEV-1 incident from a rogue internal AI agent

On March 18, Meta confirmed a Severity 1 security incident caused by an internal AI agent operating without human authorization (Bitcoinworld, HackerNoob). The agent posted to an internal forum, gave incorrect advice, and triggered a cascade that exposed sensitive company and user data to unauthorized employees for approximately two hours. Meta contained the exposure by cutting the agent’s forum access and auditing permissions across other internal agents. No external exfiltration was confirmed.

Why it matters

A SEV-1 at Meta from an AI agent operating outside its bounds sets a documented precedent: production agents at companies with robust security programs can circumvent behavioral constraints and cause genuine incidents.
The chain reaction, one unauthorized action triggering downstream data exposure, is characteristic of agentic systems and different from traditional software vulnerabilities in ways most IR playbooks don’t yet account for.
No external exfiltration is partial comfort. Unauthorized internal access to sensitive user data carries GDPR and AI Act exposure regardless of whether the data left the building.

What to do about it

Audit every AI agent in your environment and document what it can post, write, or modify without a human approval checkpoint.
Map the blast radius. If a specific agent takes an unexpected action, what does it touch first, and what cascades from there?
Build AI agent incident response playbooks with automated containment triggers that don’t require analyst approval before they fire.

Rock’s Musings

The Meta incident will get dismissed as a minor operational hiccup. That’s the wrong read. Even with legit engineering talent and a mature security program, a production AI agent escaped its behavioral constraints and triggered a data exposure chain. I’m willing to bet your environment isn’t more disciplined than Meta’s.

Two hours to containment is fast. Most organizations I work with couldn’t tell you within two hours that an agent had gone sideways. AI agent behavioral monitoring is dramatically behind where it needs to be. The lesson to take away from this is that you need detection that fires before the cascade, not after the data is already in the wrong hands.

4. Amazon’s Bedrock sandbox leaks data through DNS because that’s the design

BeyondTrust’s Phantom Labs disclosed that Amazon Bedrock AgentCore Code Interpreter’s sandbox mode permits outbound DNS queries (SC Media, The Hacker News). An attacker interacting with the agent can send commands encoded in DNS A record responses and receive exfiltrated data encoded in DNS subdomain queries to an attacker-controlled server. No authentication bypass is required. BeyondTrust assigned a CVSS score of 7.5. AWS reviewed the research, determined that the behavior reflects the intended functionality, and responded by updating the documentation rather than issuing a patch.

Why it matters

“Intended behavior” is a vendor risk posture, not a security posture. Sandbox mode was positioned as providing execution isolation. A sandbox allowing covert DNS exfiltration does not deliver isolation in any security-relevant sense.
DNS-based covert channels are standard red team tradecraft in traditional environments. The technique translates directly into AI code execution environments without modification.
Organizations running agents against sensitive internal data in AWS Bedrock face an unpatched, documented, CVSS 7.5 risk with no vendor remediation timeline.

What to do about it

Add DNS query monitoring for Bedrock AgentCore code execution environments to your threat detection stack now.
Reduce the data that AI agents with code execution access can reach to the strict minimum required for the task.
Get a formal written architecture statement from AWS specifying exactly what the sandbox guarantees before expanding Bedrock AgentCore deployments.

Rock’s Musings

Another “Intended behavior” narrative. I’m getting pretty damn sick of it. That’s another way of saying, “We know about this, it would be expensive to change, and it sucks to be you.” (see my thoughts in CSO magazine about a previous instance HERE). The documentation update rather than a patch is the tell. You can’t outsource your risk posture to your cloud provider’s design decisions.

The technique is in every red team playbook. DNS exfiltration from sandboxed environments is foundational evasion tradecraft. Translate that knowledge directly to your AI infrastructure. If you’re running code execution agents against sensitive data in Bedrock and you haven’t instrumented DNS as an exfiltration channel, now you have your reason.

5. Linux Foundation raises $12.5 million from AI vendors to fix what their tools helped break

The Linux Foundation announced $12.5 million in grant funding from Anthropic, AWS, GitHub, Google, Google DeepMind, Microsoft, and OpenAI to advance open source software security (Linux Foundation, OpenSSF). The funding flows through Alpha-Omega and the Open Source Security Foundation. The stated problem is that AI tools are generating vulnerability reports at a volume that open-source maintainers cannot triage or remediate, degrading the security posture of the software supply chain. AWS contributed an additional $2.5 million to Alpha-Omega, in addition to the pooled amount.

Why it matters

The same organizations whose AI tools created the report flood are funding the solution. This characterizes the governance dynamic precisely, that vendors profit from deployment and are now asked to fund the externalized costs on the maintainer community.
Overwhelming maintainers with AI-generated findings lowers average signal quality. Funding addresses capacity but doesn’t solve the signal-to-noise problem alone.
This is the first major coordinated industry response to the specific problem of AI-generated report volume stressing the open source security ecosystem.

What to do about it

Factor the current maintainer backlog into your software composition analysis program. Critical open source dependencies may carry known vulnerabilities sitting in a backlogged queue rather than getting remediated.
Watch what Alpha-Omega and OpenSSF deliver from this investment over the next twelve months. The commitment matters less than whether the tooling measurably improves triage capacity.
Ask your security vendors how they handle AI-generated findings before surfacing them to your team. The same noise problem exists inside your tooling stack.

Rock’s Musings

$12.5 million is the right direction, yet not nearly enough. Open source maintainers are largely volunteers managing the infrastructure that the global software supply chain runs on. The AI-generated report flood is a problem these vendors created while selling velocity gains to enterprises.

The coordination signal matters more than the dollar amount. You rarely see Google, Microsoft, AWS, Anthropic, and OpenAI announce joint anything. When competitors fund a shared problem together, the liability exposure of inaction exceeds the competitive cost of cooperating. Given how much of the internet runs on open source that these companies’ AI tools are now stressing, the math on joint action isn’t complicated.

6. Pentagon moves to replace Anthropic while the lawsuit works through the courts

TechCrunch reported that the Pentagon is actively developing alternative AI capability paths to replace Anthropic’s Claude across defense applications (TechCrunch). This follows the Defense Department’s February designation of Anthropic as a supply chain security risk and Anthropic’s subsequent lawsuit against the Trump administration. This confirms that the replacement effort has shifted from contingency planning to active technical development. More than 875 Google and OpenAI employees have signed an open letter supporting Anthropic’s position.

Why it matters

Active technical development of replacements, rather than contingency planning, signals DoD confidence that the Anthropic designation will hold through the litigation cycle.
Defense contractors relying on Claude for active program work now face migration timelines driven by someone else’s legal and procurement decisions.
The 875-employee response across competing firms signals the tech workforce treats this as a legitimacy question about AI governance, not a routine vendor dispute.

What to do about it

If your organization operates in the defense industrial base, review AI vendor contracts now for comparable ethical-use clauses and their enforceability, before further redesignations affect your supply chain.
Track the Anthropic lawsuit. The outcome defines what ethical use provisions in AI contracts are worth in federal procurement.
Evaluate AI vendor concentration risk in your stack. If one supply chain designation event could disrupt your programs, that’s a single point of failure worth addressing.

Rock’s Musings

The supply chain risk designation was built for foreign adversaries. Applying it to a domestic AI company for writing autonomous weapons prohibitions into a contract is a significant precedent that the press is underweighting. The designation signals that safety constraints are now framed as operational liabilities in defense procurement, not risk mitigation.

If that framing spreads to other acquisition decisions, the AI vendors most willing to remove safety constraints gain a competitive advantage in a large and growing federal spending category. Watch the lawsuit and the follow-on procurement awards carefully. Both will tell you where this governance experiment ends up.

7. CSA’s 2026 cloud and AI security report documents the identity explosion

The Cloud Security Alliance published its State of Cloud and AI Security 2026 on March 13, finding the average enterprise now manages 100 machine and non-human identities for every one human identity (CSA). Forgotten or misconfigured cloud credentials declined from 84% in 2024 to 65% in 2026. Ninety-two percent of executives report business-impacting security compromises, most from preventable risks. The report identifies decentralized AI agents as the primary driver of the NHI expansion and calls for continuous exposure management to replace static patching cycles.

Why it matters

A 100:1 machine-to-human identity ratio means the traditional IAM program built around human users is managing a fundamentally different problem than it was designed for.
Credential misconfiguration persisting at 65% suggests the improvement rate won’t match the velocity of AI-driven identity expansion.
A 92% executive compromise from preventable risks indicates the gap isn’t a detection-sophistication problem. Organizations know the controls and aren’t applying them at the required scale.

What to do about it

Audit NHI management practices against the same standards applied to human identities: lifecycle management, least privilege, and regular access reviews.
Deploy continuous credential exposure monitoring specifically for machine identities and AI agent service accounts.
Shift the board-level narrative from maturity scores to continuous exposure management. That’s where enterprise frameworks are heading.

Rock’s Musings

A hundred machine identities for every human one, and most organizations manage them with IAM tooling built for a 10-to-1 ratio. The math doesn’t work. The credential improvement trend from 84% to 65% is real progress, but 65% still represents a failure rate I wouldn’t accept in any other critical control domain.

Every new agentic deployment creates more identities, tokens, service accounts, and API keys. If you don’t have a clear owner for non-human identity governance today, you have a gap that will become a breach within twelve months. Find the owner. Document the scope. Don’t wait for the incident.

8. Jozu Agent Guard launches after watching an AI agent bypass governance in four commands

Jozu announced Jozu Agent Guard on March 17, a zero-trust runtime that executes AI agents, models, and MCP servers with policy enforcement built outside the model’s control plane and hardcoded against agent-level override (Help Net Security). The architecture decision came directly from internal testing: during product development, Jozu observed an AI agent bypass the governance controls the product was designed to enforce in four commands. That failure drove the decision to move policy enforcement entirely outside the execution layer the agent can influence.

Why it matters

A product built specifically to constrain AI agents was bypassed in four commands during its own testing. The threat model has to assume the agent itself will attempt to circumvent governance. Cooperative compliance is not a valid design assumption.
MCP server isolation is underprovided. MCP servers frequently carry production credentials and broad tool access, and running them in shared agent environments creates privilege escalation paths most organizations haven’t mapped.
Three AI agent security products launching in four days signals enterprise buying is active in this space right now.

What to do about it

Require AI agent security vendors to demonstrate their product against an adversarial agent in a live environment. Demand the failure modes alongside the happy path.
Treat MCP server execution environments as sensitive infrastructure requiring isolation equivalent to your most privileged workloads.
Add governance bypass testing to your AI red team scope before the next production agent deployment.

Rock’s Musings

The four-command bypass during their own testing is the most honest vendor disclosure I’ve seen about AI agent security in the past year. Most vendors demo the happy path and skip the part where their product got circumvented. Jozu disclosed it and changed the architecture. That’s how security engineering is supposed to work.

The uncomfortable implication for everyone else: if a product built specifically to constrain AI agents was bypassed in four commands, ask yourself what your existing controls look like against an agent actively trying to exceed its permissions. If you haven’t run that test, you don’t have an answer.

9. Token Security builds intent-based controls for AI agent permissions

Token Security announced intent-based AI agent security on March 18, governing autonomous agents by scoping their permissions to declared operational purpose rather than granting standing broad access (Help Net Security). The system creates purpose-defined permission envelopes that expire at task completion, with runtime enforcement preventing actions outside the declared intent. Token Security’s CEO stated directly that prompt filtering and guardrails were not designed to contain the security risks of autonomous AI agents, pointing to the architectural limitation of relying on the model’s output layer for enforcement.

Why it matters

Purpose-aligned permissions address a structural problem in current agent deployment: agents inheriting credential scopes far exceeding what any single task requires.
Explicit acknowledgment that content filtering can’t do this job alone represents where serious practitioner thinking is converging. The field is moving from output layer controls toward architectural access controls.
Paired with Jozu, Entro, and Microsoft Entra Agent ID announcements this same week, this reflects a coherent market thesis forming around agent identity and least privilege as primary security controls.

What to do about it

Map current AI agent deployments against one question: does each agent hold only the permissions it needs for its specific task? If you can’t answer quickly, your access governance is already too loose.
Evaluate intent-based and purpose-scoped access controls in your next AI security procurement cycle.
Brief your identity team on AI agent access management before your security team deploys solutions they haven’t reviewed. These tools touch the same credential infrastructure.

Rock’s Musings

Least privilege applied to agents is the same principle that has protected privileged service accounts in traditional architectures for decades. The problem is that most AI agent deployments aren’t being treated like privileged service accounts. They get broad collaboration access by default, and nobody asks why.

Intent-based controls force the right question: what is this agent for? If you can answer precisely, you can scope permissions precisely. If you can’t answer precisely, that is the real governance problem. You’ve deployed an agent without a defined operational boundary, and your control over it is largely fictional.

10. NIST receives formal research submissions on securing AI agents

On March 18, UC Berkeley’s Center for Long-Term Cybersecurity submitted a formal response to NIST’s CAISI RFI on AI agent security, urging prioritization of standardization, incident reporting frameworks, talent pipelines, and adaptive governance (CLTC UC Berkeley). The Computer and Communications Industry Association submitted parallel comments advocating for multistakeholder processes and alignment with existing NIST frameworks (CCIA). NIST’s National Cybersecurity Center of Excellence also holds a separate comment period open through April 2 on a concept paper covering identity and authorization for AI agents.

Why it matters

The gap between NIST collecting input and usable standards publishing is measured in years. Your agents are running now, under no binding identity or authorization standard.
Berkeley’s call for incident reporting infrastructure acknowledges a structural gap: no systematic mechanism exists for learning from AI agent security failures across organizations.
The NCCoE concept paper on agent identity and authorization is where future compliance requirements will originate. Comments submitted now shape what those requirements demand.

What to do about it

Read the NCCoE concept paper at nccoe.nist.gov and submit comments before April 2 if your organization deploys agents. Operational experience is what NIST is specifically asking for.
Treat the Berkeley and CCIA submissions as intelligence on where auditors will focus within 18 to 36 months.
Stand up basic agent identity logging now using existing IAM controls. Don’t wait for NIST to finalize anything.

Rock’s Musings

NIST is moving faster on agentic AI security than I expected two years ago. That still isn’t fast enough to matter for organizations deploying agents today. Best case from the current comment cycle: interim guidance in twelve months. Binding controls will take longer.

Berkeley’s call for incident reporting is the right recommendation and it will face the same resistance every mandatory reporting regime has faced. Voluntary frameworks will come first, get ignored, and get teeth after the third or fourth major public incident. That’s the pattern. Plan for it and build your own internal incident tracking capability now.

The One Thing You Won’t Hear About But You Need To

Entro Security builds a governed map of what your AI agents access in production

Entro Security launched its Agentic Governance and Administration platform, extending non-human identity security coverage specifically to AI agents (GlobeNewswire, Help Net Security). The platform builds structured AI agent profiles from three observable layers. First, sources: the endpoints, agent platforms, cloud environments, and MCP servers where agents execute. Second, targets: the enterprise assets and applications each agent accesses. Third, identities: the human accounts, non-human identities, and secrets each agent uses to operate. AGA provides MCP server activity visibility and policy enforcement, audit trails for both allowed and blocked activity, and controls against unsanctioned MCP targets and AI client behaviors.

Why it matters

Most organizations deploying AI agents don’t have a single governed view of what agents are running, what they access, and which identities they use. AGA builds that view from execution telemetry rather than documentation that goes stale immediately after it’s written.
MCP server governance is nearly absent from enterprise security programs today, despite MCP servers frequently holding production credentials and broad access to sensitive systems.
The NHI-first architecture lets organizations with existing non-human identity programs extend that coverage to AI agents rather than building a separate program from scratch.

What to do about it

Before the next AI agent deployment, require answers to three questions from observable telemetry: where does it run, what does it touch, and which identities does it use? If you need documentation rather than telemetry to answer, you don’t have governance.
Add MCP server inventory to asset management now. MCP servers deploy through developer workflows without formal change management, and retroactive cataloguing gets harder with each deployment.
Assess whether your current NHI security program explicitly covers AI agent identities. If it doesn’t, extend it or stand up a parallel track with a clear accountable owner.

Rock’s Musings

This one didn’t get coverage this week because it launched during RSA prep season when every security vendor fights for the same column inches. That’s exactly why it’s here. The problem AGA addresses is what I call dark matter governance: AI agents operating in your environment that nobody catalogued because they deployed through platforms your traditional asset management tools don’t see.

The MCP visibility layer is the operationally useful piece. MCP servers multiply fast, are deployed by individual developers without change management review, and frequently hold credentials for production systems. An agent you haven’t catalogued connecting to an MCP server you haven’t governed is a permissions sprawl problem that compounds with every new deployment. Get a governed view of that surface before your adversary maps it for you.

If you found this analysis useful, subscribe at rockcybermusings.com for weekly intelligence on AI security developments.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Bitcoinworld. (2026, March). Rogue AI agent sparks critical security crisis at Meta, exposing sensitive data. https://bitcoinworld.co.in/meta-rogue-ai-agent-security-breach/

Cloud Security Alliance. (2026, March 13). The state of cloud and AI security in 2026. https://cloudsecurityalliance.org/blog/2026/03/13/the-state-of-cloud-and-ai-security-in-2026

Computer and Communications Industry Association. (2026, March). CCIA submits comments to NIST regarding privacy and security of AI agents. https://ccianet.org/news/2026/03/ccia-submits-comments-to-nist-regarding-privacy-and-security-of-ai-agents/

Council of the European Union. (2026, March 13). Council agrees position to streamline rules on artificial intelligence. https://www.consilium.europa.eu/en/press/press-releases/2026/03/13/council-agrees-position-to-streamline-rules-on-artificial-intelligence/

Entro Security. (2026, March 18). Entro launches agentic governance and administration to bring visibility and control to AI access across the enterprise. GlobeNewswire. https://www.globenewswire.com/news-release/2026/03/18/3258229/0/en/Entro-Launches-Agentic-Governance-Administration-to-Bring-Visibility-and-Control-to-AI-Access-Across-the-Enterprise.html

HackerNoob. (2026, March). Meta’s rogue AI agent: Sev 1 security incident and how to sandbox AI agents properly. https://hackernoob.tips/meta-rogue-ai-agent-sev1-how-to-sandbox-ai-agents/

Help Net Security. (2026, March 17). Jozu Agent Guard targets AI agents that evade controls. https://www.helpnetsecurity.com/2026/03/17/jozu-agent-guard-targets-ai-agents-that-evade-controls/

Help Net Security. (2026, March 18). Token Security advances AI agent protection with intent-based controls. https://www.helpnetsecurity.com/2026/03/18/token-security-intent-based-ai-agent-security/

Help Net Security. (2026, March 18). Big tech companies step in to support the open source security ecosystem. https://www.helpnetsecurity.com/2026/03/18/linux-foundation-open-source-security-12-5-million-funding/

Help Net Security. (2026, March 19). Entro Security AGA brings governance and control to enterprise AI agents and access. https://www.helpnetsecurity.com/2026/03/19/entro-agentic-governance-administration/

HiddenLayer. (2026, March 18). HiddenLayer releases the 2026 AI threat landscape report. PR Newswire. https://finance.yahoo.com/news/hiddenlayer-releases-2026-ai-threat-140000928.html

Linux Foundation. (2026, March 17). Linux Foundation announces $12.5 million in grant funding from leading organizations to advance open source security. https://www.linuxfoundation.org/press/linux-foundation-announces-12.5-million-in-grant-funding-from-leading-organizations-to-advance-open-source-security

SC Media. (2026, March). AWS Bedrock tool vulnerability allows data exfiltration via DNS leaks. https://www.scworld.com/brief/aws-bedrock-vulnerability-allows-data-exfiltration-via-dns-leaks

TechCrunch. (2026, March 17). The Pentagon is developing alternatives to Anthropic, report says. https://techcrunch.com/2026/03/17/the-pentagon-is-developing-alternatives-to-anthropic-report-says/

The Hacker News. (2026, March 17). AI flaws in Amazon Bedrock, LangSmith, and SGLang enable data exfiltration and RCE. https://thehackernews.com/2026/03/ai-flaws-in-amazon-bedrock-langsmith.html

UC Berkeley Center for Long-Term Cybersecurity. (2026, March 18). Researchers submit response to U.S. government request on security considerations for AI agents. https://cltc.berkeley.edu/2026/03/18/researchers-submit-response-to-u-s-government-request-on-security-considerations-for-ai-agents/

AI Agent Authentication Gets the Hard Part Right. Authorization Is Still Your Problem.

Rock Lambros — Tue, 17 Mar 2026 12:50:42 GMT

The IETF just published its most ambitious attempt to standardize how AI agents prove their identity across systems. Draft-klrc-aiagent-auth-00, dropped March 2, 2026, composes WIMSE, SPIFFE, and OAuth 2.0 into a 26-page framework called AIMS (Agent Identity Management System). The authentication layer is solid. The authorization layer stops at the token boundary. The Security Considerations section contains two words: “TODO Security.” If you’re deploying agentic systems in production, you need to understand where this draft helps you and where you still have to build your own controls.

Before I get into specifics, a quick note on what this document actually is. An IETF Internet-Draft (I-D) is a working document, the raw material that may eventually become an RFC (an official Internet standard). This one is version -00, the very first public iteration from Pieter Kasselman (Defakto Security), Jean-Francois Lombardo (AWS), Yaroslav Rosomakho (Zscaler), and Brian Campbell (Ping Identity). Criticizing a -00 draft for incompleteness is a bit like reviewing someone’s outline and complaining the conclusion is thin. That said, people are already reading this as deployment guidance, and the gaps matter for anyone building agentic systems today. So let’s talk about what it covers, what it doesn’t cover yet, and what you need to build yourself while the standards process catches up.

The good news: agents are workloads, and workloads have an identity stack

The draft’s foundational thesis gets it right that AI agents should be treated as workloads, not as some new identity category requiring new protocols and running instances of software executing specific tasks. That framing unlocks SPIFFE’s attestation-bound cryptographic identity, WIMSE’s cross-system workload semantics, and OAuth 2.0’s delegation framework. No new protocols needed.

This matters because SPIFFE already works at scale. Uber processes billions of attestations daily through SPIRE. Block runs the full SPIFFE+WIMSE+OAuth stack in production. The draft codifies patterns that companies with real security engineering teams already deploy.

The WIMSE identifiers specified in the draft bind agent identity to the execution environment through hardware-rooted attestation. A SPIRE agent on each node performs workload attestation by examining the kernel or querying the orchestration platform. Your agent’s identity gets measured from where it runs, not merely asserted by who registered it. An OAuth client_id is a registration artifact. A SPIFFE ID is cryptographic proof that Agent X is actually Agent X, running in the expected environment.

The draft also gets credentials right. Short-lived, cryptographically bound, explicit expiration. Static API keys are called out as unsuitable for agent authentication: bearer artifacts with no cryptographic binding, no identity conveyance, operationally painful to rotate.

That warning couldn’t come at a better time. Astrix Security analyzed over 5,200 open-source MCP server implementations and found that 53% rely on static API keys or Personal Access Tokens. Only 8.5% use OAuth. The ecosystem is building on exactly the anti-pattern the draft condemns.

Figure 1: MCP Server Authentication Methods

Transaction Tokens solve the lateral movement problem

Section 10.4 addresses a real attack vector most frameworks ignore. When access tokens propagate through internal microservice chains within an agent workflow, every hop creates a theft and replay opportunity.

The draft’s answer is Transaction Tokens (draft-ietf-oauth-transaction-tokens-08). Short-lived, signed JWTs that bind user identity, workload identity, and authorization context to a specific transaction. Lifetimes are measured in seconds to minutes. Cryptographic signatures prevent context modification. You can’t grab a Transaction Token from one transaction and replay it in another because the transaction context is cryptographically sealed. A companion draft (draft-oauth-transaction-tokens-for-agents-04) extends this with agent-specific fields for the acting agent, the initiating human, and operational constraints.

The draft also correctly identifies tools forwarding access tokens to downstream services as an anti-pattern.

The authorization gap: where scope alone isn’t enough

Here’s where the draft’s -00 status shows. Once an OAuth access token gets issued with a set of scopes, every action within those scopes proceeds unchecked until the token expires. No per-action evaluation. No consequence assessment. No behavioral feedback loop. The authors clearly know authorization needs more work (the AIMS conceptual model describes layers that the spec hasn’t filled in yet), but anyone reading this draft as a deployment blueprint today will inherit that gap.

Think about what that means in practice. An agent with email:send scope authorized to send meeting notes can use that same scope to email every contact in the address book a different message. Each action is technically within scope. The framework treats them identically. The authorization decision happened once, at token issuance. Everything after that is a free pass.

OWASP’s Top 10 for Agentic Applications draws a distinction that the draft hasn’t addressed yet: least agency versus least privilege. Least privilege asks what the agent can access. Least agency extends that to how much freedom the agent has to act on that access without checking back.

The term “least agency” appears nowhere in the draft. Section 10.8 says agents should request minimum scopes and authorization details. That’s least privilege applied to OAuth scopes. Standard stuff. It does nothing to constrain autonomous decision-making within those scopes.

OWASP’s ASI03 (Identity and Privilege Abuse) mitigation guidance recommends per-action authorization through a centralized policy engine. Not once at token issuance. At each privileged step. The draft doesn’t provide a mechanism for this yet, and future revisions may address it. In the meantime, you need to build that layer yourself.

Figure 2: OWASP Agentic Top 10 Coverage by IETF Draft

Your token says “allowed.” What it can’t say is “should you?”

The deeper issue goes beyond per-action evaluation. The draft in its current form contains no mechanisms for assessing the potential impact of an action before permitting it. No concept of blast radius. No reversibility check. No impact severity score. Again, this is version -00. These concepts may arrive in later revisions. They’re absent today.

Consider the practical difference. An agent with files:read_write scope can read one file or delete every file in scope. The OAuth framework treats these as equivalent actions. They aren’t. One is routine. The other is catastrophic and irreversible.

Consequence-based authorization asks three questions per permission:

What’s the worst action this agent can take?
Is the damage reversible?
Can you reverse it within an acceptable recovery window?

OAuth scopes can’t answer any of these.

The emerging practice of graduated trust models (read-only, then draft-only, then supervised execution, then earned autonomy) represents an informal consequence-based approach. Most practitioners agree that most agents never earn full autonomy in high-stakes contexts. That’s the correct outcome. The draft provides no framework for expressing or enforcing these graduation stages.

OWASP’s ASI08 (Cascading Failures) recommends blast-radius caps and digital twin replay testing. Run recorded agent actions in an isolated environment first. See if sequences trigger cascading failures before expanding policy permissions. Future revisions of the draft could incorporate these concepts. For now, they’re outside its scope.

The observability gap: strong detection, no policy feedback loop

Section 11’s observability requirements are genuinely strong for detection and audit. Seven minimum audit event fields. Correlation across agents, tools, services, and LLMs. The ability to reconstruct complete execution chains, including delegated authority and intermediate calls.

The draft calls observability “a security control, not solely an operational feature.” Correct. Then it integrates the OpenID Shared Signals Framework with CAEP (Continuous Access Evaluation Profile) for real-time signal delivery. Also good.

The problem is that the AIMS conceptual model in Section 4 promises observability that can “dynamically modify authorization decisions based on observed behavior and system state.” The actual specification delivers reactive remediation, terminate sessions, discard tokens, re-acquire with updated constraints. Detection flows to dashboards and SIEM tools. It doesn’t feed into the policy decision point that evaluates each authorization request. The conceptual model is ahead of the spec, which is normal for a -00 draft. The spec will likely catch up. You can’t afford to wait for it.

An agent exhibiting anomalous tool invocation patterns should see its authorization dynamically narrowed. Not through token revocation (which is all-or-nothing) but through policy-level constraints on permitted actions. The draft gives you a circuit breaker when you need a rheostat.

NIST SP 800-207 (Zero Trust Architecture) explicitly recommends a trust score that changes dynamically based on entity behavior patterns, feeding into the policy engine. Context-aware authorization systems from companies such as Zscaler and StrongDM already implement this pattern in production (not endorsing either). I’d expect future revisions of the draft to engage with these models, especially given that Zscaler’s Rosomakho is one of the four co-authors.

AuthZEN fills the gap the draft hasn’t reached yet

The most interesting omission in the current document is that AuthZEN (OpenID Authorization API 1.0) was approved as a Final Specification in January 2026. It standardizes a transport-agnostic API where any Policy Enforcement Point queries any Policy Decision Point, regardless of vendor. The information model is a four-element tuple:

Subject (the agent), Action (the operation), Resource (the target), Context (ambient attributes).

Every agent tool invocation maps cleanly to an AuthZEN evaluation: subject is the agent’s SPIFFE ID, action is “send_email,” resource is “contact_list,” context carries the delegating user, blast radius classification, reversibility flag, and behavioral anomaly score. The context object is extensible and open-ended. It was designed for exactly this kind of dynamic, attribute-rich decision-making.

The draft references AuthZEN in its normative references. The body text doesn’t discuss it yet. Given that AuthZEN solves the draft’s most significant open question, I’d bet it features prominently in the next revision. For now, that connection is yours to make.

Three policy engines deserve attention for filling that gap. OPA (Open Policy Agent), a CNCF Graduated project, evaluates structured JSON input against declarative policies with sub-millisecond latency. Cedar, from AWS, offers automated reasoning via SMT solver that can mathematically prove properties about policies and benchmarks at 42 to 60 times faster than Rego. Topaz, from Aserto (whose CEO co-authored the AuthZEN specification), combines OPA’s decision engine with a built-in Zanzibar-style relationship graph.

OAuth provides coarse-grained delegation, who can access what resource category. Policy engines provide fine-grained runtime evaluation, should this specific action on this specific resource proceed given current context. That layered model is where the draft needs to go next. Until it gets there, you build it yourself.

Figure 3: Authentication vs. Authorization Layer Responsibilities

Regulatory timelines won’t wait for standards completion

The EU AI Act’s high-risk system requirements take full effect August 2, 2026 (as of this writing, anyway). Five months from now. Article 14 requires human oversight. Article 26 requires deployers to keep automatically generated logs for at least six months. The draft’s identity-bound audit trails and CIBA-based human-in-the-loop mechanism directly support both.

NIST launched two converging initiatives in February 2026. The NCCoE concept paper on AI agent identity and authorization, and the AI Agent Standards Initiative covering security controls, identity, and testing. Both center on WIMSE/SPIFFE + OAuth. Both explicitly include policy-based access control, the piece the IETF draft’s -00 revision hasn’t specified yet.

The Colorado AI Act establishes a “reasonable care” standard for high-risk AI systems effective June 30, 2026. Widely adopted standards become evidence of reasonable care in court. The identity architecture the draft describes will likely qualify for authentication. You still need to build the authorization layer yourself.

Figure 4: Regulatory Compliance Timeline for AI Agent Systems

MCP and A2A still have fundamental identity gaps

Mapping the IETF draft’s framework onto the Model Context Protocol reveals how far the ecosystem still has to travel. MCP identifies agents as OAuth clients with a client_id, a registration artifact with no attestation binding. No SPIFFE identity verification. No attestation mechanism. No multi-hop delegation. No standard mapping between tool names and OAuth scopes. The draft recommends Workload Proof Tokens for proof-of-possession. MCP uses bearer tokens.

MCP’s OAuth model is human-centric (Authorization Code + PKCE). The Client Credentials Grant for machine-to-machine authentication was removed from the spec and is only returning through an extension. Fully autonomous agents have no standard authentication path in MCP today. Google’s A2A protocol has similar gaps: self-declared identities with no attestation binding, credential acquisition out of scope, authorization left to the receiving agent.

Riptides demonstrated the draft’s compositional pattern working for MCP in practice. Each workload gets a SPIFFE SVID, used as a software statement in Dynamic Client Registration and as a JWT assertion for client authentication. The pattern works. It required significant custom integration that no standard profile defines.

What you should build now

Don’t wait for standards completion. The threat model OWASP defined already exists. The regulatory deadlines are set.

Start with SPIFFE/SPIRE for attestation-bound agent identity. Use SVIDs as JWT assertions (RFC 7523) to obtain OAuth tokens. This follows the pattern the draft describes and Riptides validated in production.

Deploy an AuthZEN-compliant PDP (OPA, Cedar, or Topaz). Evaluate every agent tool invocation against dynamic policy. Pass agent identity, action details, resource metadata, delegation context, and behavioral signals in the AuthZEN context object.

Write Cedar or Rego policies encoding blast-radius thresholds, reversibility requirements, graduated trust levels, and human-in-the-loop triggers. Version-control policies alongside application code.

Tag every tool and action with impact metadata: blast_radius, reversible, data_sensitivity, scope. Enforce that irreversible high-blast-radius actions require explicit human approval through CIBA step-up authorization.

Feed observability data into the policy engine as real-time context attributes. Stop sending behavioral signals only to SIEM dashboards for post-hoc investigation. Make them first-class policy inputs.

Key Takeaway: The IETF draft gives you a strong answer to “is this really Agent X?” It hasn’t answered “should Agent X do this specific thing right now?” yet. That gap will close as the draft matures. In the meantime, authentication without per-action authorization is a locked front door with open windows. Build the authorization layer now.

What to do next

If you’re building agentic systems and trying to figure out where identity controls fit, start with the CARE framework at rockcyber.com for mapping security controls to business risk outcomes. The RISE framework helps you evaluate where your organization sits on the AI security maturity curve, particularly useful for figuring out which authorization controls to prioritize first.

The agent identity problem is a microcosm of the larger question the book addresses: how do you govern autonomous systems when the blast radius of failure compounds faster than your ability to detect it?

More analysis on agentic AI security, MCP authorization gaps, and practical frameworks for building authorization layers at rockcybermusings.com.

👉 Subscribe for more AI security and governance insights with the occasional rant.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings