Testing AI agents and RAG stacks for real attack paths

DAST was not designed for prompt injection, tool misuse, or data exfiltration through retrieval. Here is what to test instead.

AI agents, RAG, OWASP LLM, prompt injection

Shipping an LLM feature introduces a new attack surface: the model, the system prompt, tools, retrieval pipelines, and whatever data the agent can touch. Running a quarterly web scanner on the API gateway does not answer whether a user can exfiltrate documents through indirect prompt injection.

The failure modes are different from classic OWASP Top 10 issues. Prompt injection can live entirely in user content that later gets embedded or retrieved. Tool misuse happens when authorization around function calls is weaker than authorization around REST endpoints.
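One way to picture the tool-authorization gap is a dispatcher that applies the same permission check to model-initiated tool calls that a REST endpoint would apply to the equivalent request. The sketch below is illustrative only: `User`, `TOOLS`, `ToolDenied`, `dispatch_tool_call`, and the permission strings are hypothetical names, not part of any particular agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    permissions: frozenset


class ToolDenied(Exception):
    """Raised when a tool call is not permitted for the calling user."""


# Hypothetical registry: tool name -> (required permission, handler).
# The assumption is that each tool maps to the same permission string
# the matching REST endpoint would require.
TOOLS = {
    "read_invoice": ("invoices:read", lambda args: f"invoice {args['id']}"),
    "delete_invoice": ("invoices:delete", lambda args: f"deleted {args['id']}"),
}


def dispatch_tool_call(user, tool_name, args):
    """Run a tool the model asked for, but only with the end user's rights."""
    if tool_name not in TOOLS:
        raise ToolDenied(f"unknown tool: {tool_name}")
    required, handler = TOOLS[tool_name]
    # The check is keyed on the end user, not on the agent's service account.
    if required not in user.permissions:
        raise ToolDenied(f"{user.name} lacks {required}")
    return handler(args)
```

A test for tool misuse would then assert that a low-privilege user cannot reach `delete_invoice` no matter what the prompt or retrieved text tells the model to call.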

Effective testing needs scenario depth, not a single "ask the model something evil" check. Good coverage includes attempting to override system instructions through retrieved text, chaining tools to escalate privileges, probing whether retrieved chunks respect tenant boundaries, and validating that outputs do not echo credentials or PII.
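Two of those checks, tenant boundaries on retrieved chunks and credential or PII echo in outputs, can be sketched as plain assertions over what the retriever returned and what the model said. This is a minimal sketch under stated assumptions: `Chunk`, `cross_tenant_chunks`, `SECRET_PATTERNS`, and `find_leaks` are hypothetical names, and the three regular expressions are illustrative examples, far from a complete secret or PII detector.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str  # assumed metadata attached to each chunk at index time
    text: str


def cross_tenant_chunks(chunks, tenant_id):
    """Return retrieved chunks that belong to a different tenant."""
    return [c for c in chunks if c.tenant_id != tenant_id]


# Illustrative patterns only; a real program would use a broader set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_leaks(output):
    """Return the sorted names of patterns that match the model output."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(output)
    )
```

In a test run, any non-empty result from either function would be recorded as a finding against the query and tenant that produced it.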

Manual red teaming rewards creativity, but it scales poorly across every prompt variant and agent release. Automation should exercise the stack the way product teams ship it: multiple turns, tool calls, retrieval hops, and reproducible paths when something breaks.
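A multi-turn, reproducible path might be captured roughly as follows: a scripted scenario is replayed against the agent, each reply is checked for forbidden markers, and the full transcript is hashed so a finding can be matched to the exact conversation that produced it. The names `Scenario` and `run_scenario`, and the `agent(history, message)` call signature, are assumptions made for this sketch and do not describe DeepScan or any specific framework.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    turns: list      # scripted user messages, in order
    forbidden: list  # substrings that must never appear in a reply


def run_scenario(agent, scenario):
    """Replay a scripted conversation and record where forbidden text appears.

    `agent` is assumed to be a callable taking (history, message) and
    returning the reply as a string.
    """
    history = []
    findings = []
    for turn_index, message in enumerate(scenario.turns):
        reply = agent(list(history), message)  # pass a copy of prior turns
        history.append({"user": message, "agent": reply})
        for marker in scenario.forbidden:
            if marker in reply:
                findings.append({"turn": turn_index, "marker": marker})
    # Hash the transcript so the same path can be recognized on a later run.
    digest = hashlib.sha256(
        json.dumps(history, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {
        "scenario": scenario.name,
        "findings": findings,
        "transcript": history,
        "transcript_sha256": digest,
    }
```

Replaying the same scenario on each agent release gives a regression signal. With a non-deterministic model the transcript hash will differ between runs, so the stored transcript, not the hash alone, is what makes a finding reproducible.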

DeepScan extends pentesting to agents and RAG workflows so you are not bolting AI checks onto a web-only program. If your roadmap includes copilots, internal agents, or customer-facing assistants, plan for testing that matches how those systems actually fail.