Insights #autonomous-gtm #ai-agents #maturity-model

The Autonomous GTM Maturity Model: from copilot to self-driving

A 5-level maturity model for autonomous GTM. Use the diagnostic to test whether your AI vendor is a copilot pretending to be an agent. Plus how to move up a level.

Stijn Van Daele Co-founder, Falora
16 min read
The Falora Autonomous GTM Maturity Model. Published as an open framework.

TL;DR

Every B2B vendor on earth now claims their product has “AI agents.” Almost none of them are autonomous. The five-level model below takes 90 seconds to apply. If your tool sits below Level 3, you bought a copilot and someone billed you for autonomy.

  • The model borrows the SAE driving-automation scale: a Level 0 manual baseline plus five levels of automation (1 Assisted, 2 Partial, 3 Conditional, 4 High, 5 Self-Driving).
  • McKinsey’s State of AI 2025 shows 78% of organisations use AI but only 21% have enterprise-wide gen-AI in B2B selling and only 23% are scaling agentic AI.
  • The 2026 frontier is Level 3: agents run full workflows, humans intervene only on flagged exceptions. This is where Falora positions.
  • Level 4 is achievable today in narrow domains; Level 5 is not yet shipping at production scale in B2B. In the EU, both run into Article 22 GDPR and EU AI Act Article 50 / Annex III constraints that require meaningful human oversight.

The five levels at a glance

| Level | Name | Human role | What the agent does | Typical example |
|-------|------|------------|---------------------|-----------------|
| 0 | Manual | Operator | Nothing | Marketer writing a campaign in Notion |
| 1 | Assisted | Decision-maker | Suggests copy, prompts, completions | ChatGPT in a tab, Grammarly |
| 2 | Partial | Supervisor of every step | Executes single atomic tasks | HubSpot Breeze, Apollo AI SDR draft, Lemlist warm-up |
| 3 | Conditional | Reviewer of exceptions | Runs full workflows; escalates flagged cases | Falora outbound motion, mature Clay + Claude/MCP rigs |
| 4 | High | Weekly strategist | Runs the GTM motion end-to-end with weekly review | Edge cases in PLG: Lovable’s growth motion, Vercel’s developer outreach |
| 5 | Self-Driving | Governor | Self-optimises strategy and budget allocation | None at production scale in B2B today |

This bottom-line-up-front (BLUF) table is the most-extracted unit of the article. Comparison tables with named levels and named examples are cited by ChatGPT and Perplexity 25.7% more often than prose alternatives (AirOps GEO study, October 2025). The same structure also lets your CMO send the model to your CRO and CFO without sending a 4,000-word essay.

Why this matters: the AI-washing wave of 2025–26

In the year between Q4 2024 and Q4 2025, every meaningful B2B GTM vendor relabelled its product as “AI agent” or “AI agentic”. HubSpot launched Breeze. Salesforce launched Agentforce. Outreach launched AI SDR. Apollo launched AI sales coach plus AI SDR. Demandbase launched Agentbase. 11x, Artisan and Regie all repositioned around “AI BDR replacement”. 6sense launched the AI Email Agent. Lemlist added agentic warm-up. The marketing budget assigned to the word “agent” exceeded the engineering budget assigned to building one.

Buyers cannot tell these apart. The model below exists to fix that, and to keep your CFO from buying a Level 2 product priced like a Level 4 one.

McKinsey’s State of AI 2025 tells the broader story. 78% of organisations use AI. 23% are scaling agentic AI. Only 21% have enterprise-wide gen-AI in B2B selling. The Salesforce State of Sales 2026 report shows 89% of revenue organisations use AI in some form, and AI users are 3.7× more likely to hit quota than non-users, but those numbers conflate Level 1 and Level 3, which mean very different things to a CFO.

McKinsey Europe’s State of Marketing 2026 puts a sharper edge on the EU situation: 94% of European marketing organisations have not advanced gen-AI maturity. The 6% that have are seeing 22% efficiency gains. The maturity gap is the cost of not having a model.

Level 0 to Level 1: where most B2B GTM teams actually live

Despite the demos, the median B2B scale-up in 2026 sits between Level 0 and Level 1. The marketer uses ChatGPT in a tab to write subject lines. The SDR uses Apollo to find a contact and writes the email themselves. The CRO opens a Looker dashboard once a week. Nothing in this picture is autonomous.

This is not a moral failing. It is the natural state of a system where 85% of enterprise sellers manage their book in spreadsheets (Poyar/Voje, State of B2B GTM 2025) and only 5% rely on the CRM as their day-to-day work surface. A team operating on spreadsheets cannot deploy agents because there is no system state for an agent to observe.

Moving from Level 0/1 to Level 2 is therefore not an AI question. It is a system-of-record question. The first agent you deploy presupposes that your CRM, your data warehouse and your activation layer are wired together and trustworthy. Skipping this step is what produces what we call agent theatre: the appearance of automation on top of a manual core.

Level 2: the copilot wave

Level 2 is where the bulk of the 2024–2025 vendor releases live. The agent executes single atomic tasks: generate this draft email, summarise this call, score this lead, find these 50 lookalikes. A human reviews each task before it ships.

HubSpot Breeze. Salesforce Agentforce. Outreach’s AI SDR. Apollo’s AI sales coach and AI SDR drafts. Lemlist’s agentic warm-up. 6sense’s AI Email Agent. These are all genuine Level 2 systems. They produce real efficiency gains (typically 20–35% time saved on the relevant atomic task) and they are useful. They are not, however, autonomous in any meaningful sense.

The trap at Level 2 is over-pricing. Many vendors price these tools at a per-seat rate that assumes the buyer believes they are getting a Level 3 or Level 4 system. The buyer is not getting one. Apply the maturity model before you sign. As Tomasz Tunguz, founding partner at Theory Ventures, observes:

“LLMs are deflationary for software.”

Level 2 features will move into the price floor of every adjacent tool within 18–24 months. Paying a premium for them today is paying for a feature that is about to become free.

Level 3: where the real frontier is in 2026

Level 3 is where Falora positions, and where the most interesting work in the category is happening. At Level 3, the agent runs a full workflow, not an atomic task, and the human intervenes only on flagged exceptions.

Concretely, a Level 3 outbound motion looks like this. A signal layer detects an account match (e.g. a new VP of Engineering hire at a target account). An orchestrator triggers a workflow. The agent enriches the contact, drafts a personalised opener referencing the new hire and the company’s known stack, schedules the send, monitors the reply, classifies the response, and either books a meeting via the rep’s calendar, schedules a follow-up sequence, or flags the conversation for human review based on confidence thresholds. The human sees only the flagged cases, typically 10–20% of conversations, not every step.

What makes Level 3 architecturally hard is the exception routing. The agent has to know what it does not know. That takes confidence thresholds, escalation rules, fallback prompts, and a retroactive feedback loop that turns flagged exceptions into training data. Without these, “agentic” is a marketing word over a Level 2 implementation.
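A minimal sketch of that exception routing, in Python. The intent labels, the 0.85 threshold and the names `ReplyClassification` and `route_reply` are illustrative assumptions, not a description of any vendor's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    BOOK_MEETING = "book_meeting"
    FOLLOW_UP = "follow_up"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class ReplyClassification:
    intent: str        # hypothetical labels: "interested", "not_now", anything else
    confidence: float  # classifier confidence in [0.0, 1.0]


# Hypothetical escalation rule: below this confidence the agent must not act alone.
CONFIDENCE_THRESHOLD = 0.85


def route_reply(reply: ReplyClassification,
                threshold: float = CONFIDENCE_THRESHOLD) -> Route:
    """Pick the next step for a classified reply, escalating when unsure."""
    if reply.confidence < threshold:
        return Route.HUMAN_REVIEW
    if reply.intent == "interested":
        return Route.BOOK_MEETING
    if reply.intent == "not_now":
        return Route.FOLLOW_UP
    # Unrecognised intent: the agent does not know what this is, so it escalates.
    return Route.HUMAN_REVIEW
```

The design choice worth noting is the default: anything the rules do not explicitly cover goes to a human, so a new kind of reply widens the review queue rather than triggering an unsupervised action.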

As Kieran Flanagan, CMO of Zapier, has been arguing for two years, the future-proof marketing organisation is the compound marketing org: a small team where every operator is fluent in agents, prompts and workflows, and where the agents themselves compound across channels. That is a Level 3 architecture by another name.

Level 4 and Level 5: what is possible, what is hype, what is coming

At Level 4, the agent runs the GTM motion end-to-end with only weekly human review. The human sets quarterly objectives, reviews dashboards weekly, and intervenes only when the system flags structural drift. This is achievable today in narrow domains: a PLG SaaS company with a constant signal feed and a tight feedback loop (think Lovable, Vercel’s dev outreach, certain Bessemer Cloud Centaurs) can run a Level 4 motion in production. It is rare in heterogeneous B2B sales-led motions.

At Level 5, the agent self-optimises strategy and budget allocation. Nobody is shipping this at production scale in B2B today. The closest analogues sit in performance-marketing optimisation (Meta and Google ad systems) where the agent reallocates budget across an immutable creative library. For B2B GTM, Level 5 hits two structural limits.

The first is the EU AI Act Article 50 and Annex III constraint. Profiling and decisioning systems that have legal or similarly significant effects on individuals are likely high-risk; transparency and human-oversight obligations apply. The second is Article 22 GDPR, which gives EU data subjects the right not to be subject to a decision based solely on automated processing if it produces legal or similarly significant effects. A Level 5 GTM system that decides who to call, when, and with what offer, without meaningful human review, is squarely in this perimeter.

Elena Verna, who runs growth at Lovable, frames the broader implication directly:

“The traditional growth playbook has been completely rewritten for AI companies.”

It has been rewritten because the underlying system can now behave. The constraint is not the model. It is the governance.

A 5-question diagnostic: which level is your stack today?

Apply each question to your current stack. Score 1 point for every “yes”. Total your score and read the band.

  1. Does your team have a single canonical view of which accounts are in-market this week, auto-updated daily?
  2. Does at least one outbound or lifecycle workflow run end-to-end without a human approving every individual step?
  3. Are the agent’s exception-flag rules written down, versioned, and updated by a named owner?
  4. Does the system retain feedback from every human intervention and use it to update prompts or scoring?
  5. Can your CFO see the cost-per-outcome of every active agent in less than 60 seconds?

0–1 points: Level 0 or 1. Your stack is not autonomous. Start with the data layer and one signal source.

2–3 points: Level 2. You have copilots in production. Move to Level 3 by collapsing them under one orchestrator.

4 points: Late Level 2 / early Level 3. You are in the architectural transition. The bottleneck is governance and exception routing.

5 points: Level 3. You have a real autonomous motion. Move toward Level 4 by widening the workflow scope and shortening the human-review cadence.
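The scoring bands above can be sketched as a small Python function. The function name `score_band` and the yes/no list input are assumptions for illustration; the band boundaries and labels come from the diagnostic itself.

```python
from typing import Sequence, Tuple


def score_band(answers: Sequence[bool]) -> Tuple[int, str]:
    """Score the 5-question diagnostic: 1 point per "yes", then map to a band."""
    if len(answers) != 5:
        raise ValueError("the diagnostic has exactly 5 questions")
    points = sum(1 for answer in answers if answer)
    if points <= 1:
        return points, "Level 0 or 1"
    if points <= 3:
        return points, "Level 2"
    if points == 4:
        return points, "Late Level 2 / early Level 3"
    return points, "Level 3"


# Example: "yes" to questions 1 and 2 only.
points, band = score_band([True, True, False, False, False])
print(points, band)
```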

How to actually move up a level

Across our portfolio of 18 GTM rebuilds at Stretch Innovation, three operational changes consistently move teams from one level to the next.

Consolidate tooling so one orchestrator owns workflow state. A 23-tool Frankenstack cannot host an autonomous workflow because no single component knows the global state. Choose one orchestrator (Falora, n8n + Clay, or a similar combination) and make it the source of truth for every step state. The other tools become services it calls.
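As a rough illustration of "one orchestrator owns workflow state", the Python sketch below keeps the record of completed steps in a single place and treats every other tool as a plain callable. The `Orchestrator` class, the step names and the dict payload are hypothetical; real orchestrators persist state durably rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "service" is any external tool the orchestrator calls: payload in, payload out.
Service = Callable[[dict], dict]


@dataclass
class Orchestrator:
    services: Dict[str, Service]
    # Single source of truth: workflow id -> list of completed step names.
    state: Dict[str, List[str]] = field(default_factory=dict)

    def run(self, workflow_id: str, steps: List[str], payload: dict) -> dict:
        done = self.state.setdefault(workflow_id, [])
        for step in steps:
            if step in done:
                continue  # already recorded as complete, so a re-run skips it
            payload = self.services[step](payload)
            done.append(step)
        return payload
```

Because the step record lives in the orchestrator and not in the individual tools, a re-run of the same workflow can resume without repeating a send or an enrichment call.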

Move from synchronous human-in-every-step to asynchronous human-on-exceptions. This is a cultural change, not a technical one. The team has to agree what “good enough to ship” looks like for the agent and what triggers a human review. Without that agreement, every “autonomous” workflow regresses to manual within 30 days.

Instrument every action with feedback data. Every send, every reply, every meeting booked, every flagged exception is a labelled training example. The system that captures these and uses them to update prompts, scoring or escalation rules compounds. The system that does not capture them stays at the level it shipped on.
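One way to picture this instrumentation, as a hedged Python sketch: log each action with the agent's decision and, where a human stepped in, the human's decision. The `ActionEvent` fields and the `override_rate` metric are assumptions chosen for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ActionEvent:
    workflow_id: str
    action: str                           # e.g. "send", "reply", "meeting_booked", "flagged_exception"
    agent_decision: str                   # what the agent did or proposed
    human_decision: Optional[str] = None  # set only when a human intervened


def labelled_examples(events: List[ActionEvent]) -> List[ActionEvent]:
    """Events a human reviewed: each one pairs an agent decision with a human label."""
    return [e for e in events if e.human_decision is not None]


def override_rate(events: List[ActionEvent]) -> float:
    """Share of human-reviewed events where the human changed the agent's decision."""
    reviewed = labelled_examples(events)
    if not reviewed:
        return 0.0
    overridden = sum(1 for e in reviewed if e.human_decision != e.agent_decision)
    return overridden / len(reviewed)
```

A falling override rate over time is one plausible signal that prompt, scoring or escalation updates are working; a flat one suggests the feedback is being captured but not used.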

The objections (steelman)

“Do I want autonomy?” Some teams genuinely do not. A high-touch enterprise motion with a 24-stakeholder buying group and a 14-month cycle does not benefit from a Level 4 agent; the constraint there is human relationship, not throughput. For those teams, Level 2 is sufficient and Level 3 is over-engineering. Be honest about which one you are.

“What about brand risk?” Real and worth taking seriously. The mitigation is governance, not abstinence. Define what the agent may say, what it may not, what it must escalate, and whose name appears in the signature. A Level 3 system with strong governance has lower brand risk than a Level 1 system where 14 SDRs send unreviewed cold emails.

“What about hallucinations?” A real risk for free-form generation. Less so for retrieval-augmented, source-cited workflows where the agent is constrained to ship statements grounded in your knowledge base, your CRM, your case studies. The architecture matters more than the model.

Conclusion

The five-level model exists because the language of “AI agents” has decoupled from the architecture of AI agents. A B2B CMO or Head of Growth in 2026 has to know which level they are buying, which level they are deploying, and which level is realistic for their next two quarters.

Most teams should aim for Level 3 in the next 12 months. That is the level where the operational unlock is real; where one engineer plus one platform genuinely replaces several SDRs and an agency retainer; and where EU constraints are still comfortably satisfied. Level 4 follows naturally once Level 3 is stable.

If you want to apply the diagnostic to your stack with an operator on our team, take the autonomous GTM scorecard →


About the author

Stijn Van Daele is co-founder of Falora and a partner at Stretch Innovation. He writes about GTM engineering, autonomous revenue and the EU AI Act on LinkedIn.

Frequently asked questions

What is the difference between an AI copilot and an AI agent?
A copilot suggests; an agent acts. A copilot generates a draft email for a human to send. An agent executes a multi-step workflow under defined rules and only escalates to a human on flagged exceptions. Most B2B GTM tools that claim 'agent' functionality in 2026 are still copilots.
What level is HubSpot Breeze, Salesforce Agentforce or Apollo's AI SDR?
On the Falora maturity scale, HubSpot Breeze, Salesforce Agentforce and Apollo's AI SDR sit at Level 2 (partial autonomy: agent executes single tasks under human supervision). They generate, draft and act on individual atomic steps but do not orchestrate end-to-end motions.
Is Level 5 self-driving GTM legal under the EU AI Act?
Level 5 self-optimisation involving solely-automated decisions about individuals likely falls under Article 22 GDPR (right not to be subject to fully automated decisions with legal or similarly significant effect) and may trigger Annex III high-risk classification under the EU AI Act. In practice, EU-deployed Level 5 systems require meaningful human oversight at the strategic layer.
How do I move my GTM stack up one maturity level?
Three changes consistently move teams up a level: (1) consolidate tooling so one orchestrator owns workflow state; (2) move from synchronous human-in-every-step to asynchronous human-on-exceptions; (3) instrument every action with feedback data so the agent can be evaluated and re-trained.
Why does the maturity model matter if I just want pipeline?
Because vendors price at one level and deliver at another. Knowing your current level, and the realistic next level for your stack, is the difference between a productive 12-month rebuild and a Frankenstack of half-finished automations that produce no measurable lift.
