Agentic AI in Cybersecurity: Moving Toward Fully Autonomous Pentesting

Hot take: fully autonomous pentesting isn't a moonshot but an engineering discipline. The real question is: how do we make HITL the exception?


Hot take

Human‑in‑the‑loop (HITL) agentic AI¹ isn’t the destination. We should aim for fully autonomous pentesting inside clearly defined scopes, with strong policies and guardrails.

To be clear: full autonomy without guardrails is unsafe. The idea is full autonomy inside strict policy, rich telemetry, and fast rollback, with HITL used by exception for destructive or regulated actions.

There are two fundamental strategies to build an AI startup: you either bet the technology is going to get massively better or you bet the technology is about as good as it’s going to be… In the first world, you will be really happy when the models improve, and in the second world you will be really sad. - Sam Altman

I’m taking the first bet: design for full autonomy within clear bounds. Let agents run end‑to‑end pentest workflows without pausing for approvals when the policy, telemetry, and rollback guarantees are in place.

Why autonomy by default (and common misconceptions)

Common misconceptions:

A concrete example: an agent triages web recon findings, validates an SSRF in a sandbox, moves laterally only within canary accounts (isolated, monitored decoys) using short‑lived credentials, captures evidence, and leaves a replayable trace. It escalates only for destructive writes or when a policy‑based risk score trips a threshold.
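
A minimal sketch of that escalation logic, assuming a hypothetical `Action` record and a policy threshold; the names (`CANARY_ACCOUNTS`, `RISK_THRESHOLD`, `requires_escalation`) and values are illustrative, not from any particular framework:

```python
from dataclasses import dataclass

# Illustrative values; in practice these come from the engagement's policy.
CANARY_ACCOUNTS = {"canary-tenant-01", "canary-tenant-02"}
RISK_THRESHOLD = 0.7

@dataclass
class Action:
    kind: str           # e.g. "recon", "validate_ssrf", "lateral_move", "write"
    target_account: str
    destructive: bool   # does it modify or delete anything?
    risk_score: float   # policy-based score in [0, 1]

def requires_escalation(action: Action) -> bool:
    """Return True when a human must approve before the agent proceeds."""
    # Destructive writes are always gated, regardless of score.
    if action.destructive:
        return True
    # Lateral movement is autonomous only inside canary accounts.
    if action.kind == "lateral_move" and action.target_account not in CANARY_ACCOUNTS:
        return True
    # Everything else runs autonomously unless the risk score trips the threshold.
    return action.risk_score >= RISK_THRESHOLD

# Example: sandboxed SSRF validation inside a canary tenant runs without approval.
probe = Action("validate_ssrf", "canary-tenant-01", destructive=False, risk_score=0.3)
assert not requires_escalation(probe)
```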

Oversight model and caveats

Default to autonomy for low‑ and medium‑risk actions in scoped, reversible environments. Keep continuous oversight and a fast‑acting kill switch. Pull in HITL for destructive or irreversible operations and for explicit regulatory triggers. Always operate within legal authorizations and rules of engagement. This approach lines up with guidance from NIST AI RMF 1.0 and ENISA.
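
A rough sketch of that decision, assuming a simple risk label per action; `Oversight`, `oversight_mode`, and the kill‑switch flag are illustrative names, not anything prescribed by NIST or ENISA:

```python
import threading
from enum import Enum

class Oversight(Enum):
    AUTONOMOUS = "autonomous"        # agent acts; humans supervise via telemetry
    HITL_APPROVAL = "hitl_approval"  # a human must approve before execution

# Global kill switch: any operator can flip it and the agent takes no new actions.
KILL_SWITCH = threading.Event()

def oversight_mode(risk: str, reversible: bool, regulated: bool) -> Oversight:
    """Pick the oversight mode for one action. risk is 'low' | 'medium' | 'high'."""
    if KILL_SWITCH.is_set():
        raise RuntimeError("Kill switch engaged: no new actions")
    if regulated or not reversible or risk == "high":
        return Oversight.HITL_APPROVAL
    # Low- and medium-risk actions in reversible, scoped environments run autonomously.
    return Oversight.AUTONOMOUS
```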

Oversight modes:

Implementation playbook: guardrails and steps

Guardrails that keep you safe:

Here’s the playbook:

  1. Pick low‑risk candidates: recon triage, evidence capture, sandboxed exploit validation, canary‑scope lateral movement.
  2. Start read‑only/sim: dry runs in sandboxes and canary tenants; no external writes.
  3. Enforce policy‑bound scopes: rules of engagement, rate limits, geofences, and no‑touch lists (see the policy sketch after this list).
  4. Issue scoped, short‑lived credentials tied to intent, resources, and rate limits.
  5. Stage and promote: require success in sim/canary before production‑adjacent scopes.
  6. Instrument observability: attack graph traces, artifact logs, and per‑tool safety metrics.
  7. Gate destructive writes: require exception‑based approval on explicit triggers; wire a global kill switch.
  8. Close the loop: feed lessons into policies, tests, and SLOs; expand only when reliability is green.
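
To make step 3 concrete, here is a minimal sketch of a rules‑of‑engagement policy object; the class name, fields, CIDRs, and region tags are all assumptions for illustration, not a specific product's schema:

```python
from dataclasses import dataclass, field
import ipaddress
import time

@dataclass
class EngagementPolicy:
    """Illustrative rules-of-engagement policy for one scoped engagement."""
    allowed_cidrs: list[str]        # in-scope networks
    no_touch_hosts: set[str]        # explicit do-not-test list
    allowed_regions: set[str]       # simple geofence by region tag
    max_requests_per_minute: int
    _window: list[float] = field(default_factory=list)

    def permits(self, host_ip: str, region: str) -> bool:
        # No-touch list and geofence are hard stops.
        if host_ip in self.no_touch_hosts or region not in self.allowed_regions:
            return False
        # Target must fall inside an in-scope network.
        in_scope = any(
            ipaddress.ip_address(host_ip) in ipaddress.ip_network(cidr)
            for cidr in self.allowed_cidrs
        )
        if not in_scope:
            return False
        # Crude sliding-window rate limit.
        now = time.time()
        self._window = [t for t in self._window if now - t < 60]
        if len(self._window) >= self.max_requests_per_minute:
            return False
        self._window.append(now)
        return True

policy = EngagementPolicy(
    allowed_cidrs=["10.20.0.0/16"],
    no_touch_hosts={"10.20.0.5"},        # e.g. a production database server
    allowed_regions={"eu-west-1"},
    max_requests_per_minute=60,
)
assert policy.permits("10.20.4.7", "eu-west-1")
assert not policy.permits("10.20.0.5", "eu-west-1")  # the no-touch list wins
```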

In short: build for agents that act within constraints and escalate by exception, not tools that constantly wait for us.

Evidence: systems and frameworks

Across these systems and frameworks, the guardrails converge: strict scoping, staged execution (read‑only/sim first), short‑lived/least‑privilege credentials, policy‑based gating, and comprehensive audit trails.
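
As an illustration of two of those guardrails, a hedged sketch of short‑lived, least‑privilege credential issuance and an append‑only audit record; `issue_scoped_token`, `append_audit_record`, the TTL, and the file path are hypothetical choices, not taken from any of the cited systems:

```python
import hashlib
import json
import secrets
import time

def issue_scoped_token(intent: str, resources: list[str], ttl_seconds: int = 900) -> dict:
    """Mint a short-lived credential bound to one intent and an explicit resource list."""
    return {
        "token": secrets.token_urlsafe(32),
        "intent": intent,            # e.g. "validate_ssrf"
        "resources": resources,      # least privilege: only these targets
        "expires_at": time.time() + ttl_seconds,
    }

def append_audit_record(log_path: str, event: dict) -> str:
    """Append one JSON line per action; return its hash so traces stay replayable and tamper-evident."""
    line = json.dumps(event, sort_keys=True)
    with open(log_path, "a") as log:
        log.write(line + "\n")
    return hashlib.sha256(line.encode()).hexdigest()

cred = issue_scoped_token("validate_ssrf", ["https://canary.internal/api"])
append_audit_record(
    "engagement_audit.jsonl",
    {"action": "validate_ssrf", "credential_intent": cred["intent"], "ts": time.time()},
)
```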

Challenges we’re solving (and how)

Autonomy doesn’t ignore risk; it manages it well. Here’s what works in practice:

Footnotes

  1. By “agentic AI,” I mean systems that can plan and act via tools under explicit policies and telemetry. By “HITL,” I mean step‑gated approvals for each action rather than supervising by exception.

  2. “Human‑over‑the‑loop” (HoTL) refers to policy‑ and telemetry‑driven oversight where humans supervise outcomes and intervene by exception rather than approving each step in advance.