What does LeafMesh actually do?

LeafMesh lets you run orchestrated agent teams inside your SaaS product. A front-line agent works alongside your internal agents — pricing, policy, inventory, ops — using your product to get each user's request done, governed and auditable.

How do customer-facing and internal agents work together?

The customer's agent turns a request into a goal, then negotiates with your internal agents that own pricing, policy, inventory, and operations. They reach an outcome within your rules — and anything past policy pauses for a human to approve.

Which SaaS categories is this built for?

We focus on HRMS, transport/logistics (TMS), and fintech platforms — products where customer requests need a real decision (approvals, routing, limits, fees) rather than just an answer.

Does it embed inside our product?

Yes. LeafMesh agents ship inside your product and connect to your existing systems (CRM, ERP, databases, APIs). Your users interact with agents in your own UI — not a separate tool.

How do we keep it safe and in-policy?

Every negotiation runs against your policies, keeps a full audit trail, and routes big or irreversible decisions to a human for one-tap approval. You stay in control of what agents can do.

How fast can we ship?

Your agent team is configured in YAML and validated before deploy, so you go from a scoped workshop to production in weeks — start with one high-value workflow, then expand.

Can it run over agents we've already built?

Yes. LeafMesh runs over agents built on LangGraph, CrewAI, AutoGen, and others — it's the operations layer that coordinates and governs them, no rebuild required.

Which models can the agents use?

It's model-agnostic — OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure Foundry, Google Vertex AI, DeepSeek, and local/self-hosted models, chosen per agent.

Right-Sizing AI Operations: A Decision Framework for 2026

How $1M to $5B businesses should approach the buy decision when 95% of pilots fail and 5% return 30×.

Classification	Framework	Reading time	Audience
Industry analysis	Buyer decision framework	~25 minutes	C-suite, Heads of Function, Procurement

§0 — TL;DR

Key Findings

The enterprise AI agent market is split. MIT's GenAI Divide study (Project NANDA, July 2025) found that 95% of generative-AI pilots delivered zero measurable P&L impact, while the surviving 5% accelerated revenue rapidly. McKinsey's 2026 State of AI confirms the gap: only 39% of organisations attribute any EBIT impact to AI, and of those, the vast majority report less than 5% of EBIT attributable. The buy decision is therefore not "AI yes or no" — it is "are we structured to land in the 5%, or the 95%?"
The mechanical differentiator between the two groups is not model quality. MIT's analysis attributes 84% of failures to leadership and integration issues rather than technology. McKinsey's parallel finding: workflow redesign is the #1 factor linked to measurable AI ROI — top performers are 2.75× more likely to redesign workflows than other firms (55% vs. 20%).
The buying-vehicle differentiator is published pricing with predictable scaling. Gartner forecasts that 40% of enterprise applications will embed task-specific AI agents by end of 2026 (up from <5% in 2025), and that 40%+ of agentic AI projects will be cancelled by end of 2027, driven by escalating costs, unclear business value, or inadequate risk controls. 90% of CIOs cite cost forecasting as their top AI deployment challenge. A flat $99/month self-serve plan with transparent $0.06/invocation overage removes the cost-forecasting surprise structurally.
The threshold above which self-serve fails is roughly 150,000 invocations/month. Below that line, the right vehicle is published-rate self-serve; above it, the right vehicle is a negotiated annual contract with volume discount, dedicated ops engineering, and custom SLA. The platform underneath does not change — only the contract does.
Year-1 ROI should be read as a band, not a number. Real enterprise AI deployments in 2024–2026 cluster in the 5×–60× envelope: Klarna's customer service deployment was estimated at ~6×, JPMorgan's research-presentation automation at ~10–15×, and reported SMB lead-qualification deployments at 20×–40×. Any number outside this band is either a true outlier or marketing math.

Strategic Recommendations

CEO — Treat the diagnostic as a 60-second internal litmus test before any AI infrastructure buying meeting. If your recommended volume is under 150K/month, you do not need a procurement process — you need a credit card.
CFO — Validate the ROI band against the published public benchmarks (Section 5, Table 5.1). Anything inside the 5×–60× envelope is defensible; anything outside is either a true outlier worth investigating or marketing math worth discounting.
COO — McKinsey's workflow-redesign finding is the operating instruction: the buy itself is the easy part — the work is redesigning the function around the agent, not bolting an agent onto the function as it stands. Pick the workflow that consumes the most senior-judgment time today; that's the pilot.
CIO — The self-serve plan is the same platform as the enterprise contract. No feature lockout, no migration penalty, no architectural difference. Approve security posture once at self-serve, re-approve only if custom-contract terms (residency, SLA) materially change the posture.
CHRO — Klarna 2025 is the cautionary tale: AI carries routine work, humans carry judgment. Frame internally as "reqs we will not file" in growth roles and "work nobody had bandwidth to do" in smaller orgs — and protect the human capability that the diagnostic does not automate.
CRO / Compliance — August 2, 2026 is the EU AI Act high-risk enforcement deadline (Article 9 risk management, Article 11 technical documentation, Article 12 event logging). Per-decision audit logging is not optional for any AI-mediated decision affecting employment, credit, education, or law enforcement contexts. Section 12 covers what that means for the buy decision.
Procurement — Below the 150K/month ceiling, there is no procurement process. Above it, lead the negotiation: volume discount, custom SLA, dedicated ops engineer, regional data residency. Skip the middle ground.

Bottom Line

There are only three inputs you need before the buy decision: your industry, your annual revenue, and your ambition setting. Everything else (recommended invocation volume, monthly cost, Year-1 net value, ROI band, and the self-serve-vs-custom routing) falls out of those three. That's the framework.

§1 — The 2026 Reality

Two numbers frame everything that follows. They are not in tension; they describe the same market from opposite ends.

95% of enterprise GenAI pilots deliver zero measurable P&L impact. MIT Project NANDA's GenAI Divide report (July 2025), based on 52 executive interviews, 153 leader surveys, and analysis of 300 public deployments, found that the vast majority of enterprise AI initiatives produced no measurable bottom-line outcome. The NANDA methodology has drawn pushback since publication — on sample composition, the definition of "measurable P&L impact," and survivorship in public reporting — but the directional finding is independently corroborated by McKinsey's 2026 State of AI: of the 39% of organisations attributing any EBIT impact to AI, most report under 5% of EBIT attributable. The exact failure share is debated; the order of magnitude is not. (MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025," July 2025; McKinsey "State of AI" 2026 release.)

The surviving 5% are running away with the category. Klarna's CS agent deployment alone is reported at $60M saved and the workload equivalent of 853 employees by Q3 2025. JPMorgan runs 450+ AI use cases in production daily — including agentic generation of investment-banking presentations in 30 seconds (vs. hours for a junior analyst) and a wealth-management AI assist credited with a 20% gross-sales increase during market volatility. Gartner's market estimate moved from $7.6B in 2025 to $10.8B in 2026, with a best-case 2035 projection of $450B. (Multiple sources — see Appendix B.)

The decision question is not whether the category is real. It is whether your organisation is structurally positioned to land in the 5%.

Table 1.1 — The Two-Group Market, Stated Honestly

Group	% of orgs	Outcome	Median impact	Root pattern
The 5% (success)	~5%	Measurable P&L impact, often >5% of EBIT in targeted functions	5×–60× Year-1 ROI on platform cost	Bought from a specialised vendor (67% success rate), redesigned the workflow around the agent, picked one painful function first
The 95% (zero P&L)	~95%	No measurable EBIT contribution; pilots stall before production	—	Bolted AI onto the existing process; built in-house (~33% success); scope was either too small to matter or too broad to ship

Source: MIT Project NANDA (July 2025); McKinsey State of AI 2026.

Chart 1.1 — The Buy-vs-Build Success Gap

Success rate for enterprise AI deployments by sourcing strategy. Buying from a specialised vendor succeeds roughly twice as often as building in-house on foundation-model APIs (67% vs ~33%).

[Chart: success rate by sourcing strategy, buy vs build]

Source: MIT Project NANDA — GenAI Divide (July 2025), buy-vs-build success rates analysed from 300 public enterprise deployments.

Sidebar

Why 84% of failures are leadership, not technology. MIT's analysis is specific: the failure mode is not that the models do not work. It is that organisations do not redesign the workflow the model is supposed to inhabit. The model is correct; the surrounding process is the same one designed for a human worker. The result is a model doing a fraction of the work, with all the original supervisory overhead still in place. McKinsey's parallel finding: top performers are 2.75× more likely to redesign workflows than other firms (55% vs. 20%).

Table 1.2 — Where the 5% Concentrate

Function class	Why it works at scale	Representative deployments
High-volume, low-judgment customer interactions	Routine query distribution permits aggressive deflection with human escalation as a clean fallback.	Klarna CS agent (2024–25, then hybrid 2025+)
Document synthesis at scale	Large input/output volume, low individual-decision cost, high aggregate hour savings.	JPMorgan investment-banking presentation generation; M&A memo drafting
Operational routing & enrichment	Schema-validated outputs, clear escalation rules, governance-aligned audit.	JPMorgan trade settlement, fraud detection at 450+ use cases
Lead qualification & response	Time-to-first-response is the metric, and minutes-to-seconds is the gap.	Category-level pattern across reported SMB deployments (Reinventing.ai 2026, Aalpha 2026) — no Klarna-scale named case yet at enterprise level
Internal back-office (AP/AR, recruiting screening)	Purely routine work, a straightforward governance posture, and ROI measurable in headcount avoided.	Reported mid-market deployments at 65% ticket reduction, 40% more meetings booked

Source: Composite case study analysis from Klarna (FintechWeekly, 2025), JPMorgan public disclosures (Emerj, 2026), AI Monk enterprise case studies (2025–26).

Bottom Line

The 5% who succeed share three properties: they bought rather than built (67% vs 33% success rate), they redesigned the workflow rather than retrofitting AI onto the existing one, and they picked one painful, well-defined function rather than attempting horizontal deployment. The diagnostic in Section 3 is calibrated to put you in that group.

§2 — Why the Buy Decision Is Different in 2026

Three structural shifts changed the buying landscape between 2024 and 2026. Each one moved a piece of the decision out of "should we?" and into "which vehicle?"

The 2026 buyer effectively chooses among three vehicles: a legacy per-seat SaaS product retrofitted with AI features; a specialised agent platform with published pricing and a governance substrate; or a custom build on direct foundation-model APIs (Anthropic, OpenAI, etc.). The third option is currently the most common pattern at enterprise scale — and is also where the 95% failure rate documented in §1 most heavily concentrates, primarily because the workflow redesign, the audit substrate, and the governance posture all have to be assembled in parallel with the AI itself. The three shifts below describe why the first option is fading, why the second is emerging, and why the third is structurally harder to land than its lower published cost suggests.

Shift one: per-seat pricing has structurally collapsed for agent-density categories. When a single AI agent does the work that previously required 10, 20, or 50 human seats, per-seat pricing inverts — the customer with more agents pays less in relative terms. The shift applies specifically to categories where the unit of work is an event, not a user (agent platforms, AI-assisted CS, document processing); per-seat remains the dominant model for CRM, HRIS, ERP and the rest of the seat-anchored stack. Within the agent-platform category, Gartner predicts at least 40% of enterprise SaaS spend will shift to usage-, agent-, or outcome-based pricing by 2030, with seat-based revenue share declining from 21% to 15%. Hybrid pricing (fixed base + variable consumption) is the dominant transition state, used by 43% of SaaS firms in 2025–26, projected to hit 61% by end of 2026. Firms using hybrid pricing report 38% higher revenue growth and 38% higher net revenue retention than pure-subscription firms. (Bessemer Venture Partners AI Pricing Playbook; Valueships AI Pricing 2026 analysis.)

Shift two: predictable scaling has become the procurement gate. Gartner reports 90% of CIOs cite cost forecasting as their top AI deployment challenge, and 78% of IT leaders report unexpected charges from consumption-based or AI pricing models. Cancellation rate among agentic AI projects is now forecast at 40%+ by end of 2027, primarily for scope-and-cost reasons rather than capability reasons. A published rate that scales linearly is the only model that survives a procurement-led review at mid-market and enterprise scale.

Shift three: for EU-exposed operators, the EU AI Act enforcement deadline (August 2, 2026) is real. High-risk AI systems (employment, credit, education, law enforcement) now require Article 11 technical documentation, Article 12 event logging, and Article 9 risk management. Fines under the Act reach €35M or 7% of global turnover (whichever is higher) for prohibited practices, and €15M or 3% for breaches of the high-risk obligations. "EU-exposed" includes any operator selling into the EU market, processing EU customer data, or running AI-mediated decisions that touch EU subjects, not just EU-headquartered firms. For organisations outside that perimeter, the deadline is a leading indicator rather than a hard cliff: the US, UK, and APAC regulatory direction is following the same per-decision audit substrate, on a slower timeline. The audit substrate is no longer optional infrastructure; it is a regulatory requirement that arrives whether the buyer is ready or not.

Table 2.1 — How the Three Shifts Change the Buy Vehicle

Shift	Old buying motion	New buying motion
Per-seat → usage / hybrid	Annual contract, fixed seat count, true-up at renewal	Self-serve credit card below a published ceiling; negotiated contract above it
Cost-forecast unpredictability	RFP, multi-vendor evaluation, locked seat license	Published rate, modeled in 60 seconds, no minimum commit
EU AI Act compliance	Annual audit of documented policy, sampling-based review	Per-decision event log, runtime governance, continuous attestation substrate

Source: Composite. Bessemer 2026 AI Pricing Playbook; Gartner Strategic Predictions 2026; EU AI Act Regulation 2024/1689.

Bottom Line

The 2026 buy decision is structurally simpler than the 2024 buy decision because the market shifted three pieces — pricing model, forecast predictability, and compliance substrate — out of "ambiguous" and into "settled." The remaining decision is which side of the 150K-invocation line your business sits on.

§3 — The Three Inputs That Determine Everything

The recommended deployment shape is determined by three inputs the buyer already knows: industry, annual revenue, and ambition. No CRM data, no operational telemetry, no internal benchmarking required.

Table 3.1 — The Three-Input Diagnostic Model

Input	Range	Drives
Industry	Goods / Mfg · SaaS / Software · Services / Staffing	Function set, base-rate band per function, invocation density per $M revenue
Annual revenue	$1M – $10B+	Absolute pool size, recommended invocation count, self-serve fit
Ambition (scenario)	Conservative · Base · Aggressive	Where in each function's lo–hi rate band the impact is computed

Table 3.2 — Invocation Density, Industry-Calibrated

Invocation density reflects multi-agent chains per business event. A typical insurance claim routes through 8–10 agents (including intake, classification, document analysis, fraud, liability, communication, audit); a support ticket 4–6; a recruiting funnel 5–8. Density translates to a per-$M-revenue, per-month invocation rate. The figures below are calibration choices grounded in agent-chain modelling against representative workflows in each category; they are not measured industry averages, since no comparable public dataset exists. They are the values the diagnostic uses; a buyer with internal telemetry should override them with their own measured rate.

Industry	Invocations / $M revenue / month	Why it lands here
Goods / Mfg	~500	Capital-intensive, fewer high-value-per-event decisions, longer chains, fewer interactive events
Services / Staffing	~1,000	People-and-decision-intensive, mid-density events, billable-hour structure surfaces routine work clearly
SaaS / Software	~1,200	Event-rich, frequent automated decisions per customer (support, infra, sales orchestration)
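For buyers who do have telemetry, the override can be sketched as below. This is a minimal illustration under stated assumptions: each agent a business event passes through counts as one invocation, the chain lengths use midpoints of the ranges quoted above, and the function name `measured_density` and the event counts in the usage line are hypothetical, not LeafCraft figures.

```python
def measured_density(
    events_per_month: dict[str, int],
    agents_per_event: dict[str, int],
    annual_revenue_musd: float,
) -> float:
    """Invocations per $M of annual revenue per month, from observed event volumes.

    Hypothetical model: every agent in an event's chain counts as one invocation.
    """
    invocations = sum(
        count * agents_per_event[kind] for kind, count in events_per_month.items()
    )
    return invocations / annual_revenue_musd


# Chain lengths: midpoints of the ranges above (claims 8-10 agents, tickets 4-6).
AGENTS_PER_EVENT = {"insurance_claim": 9, "support_ticket": 5}

# Hypothetical $40M firm handling 600 claims and 3,000 tickets a month.
density = measured_density(
    {"insurance_claim": 600, "support_ticket": 3_000}, AGENTS_PER_EVENT, 40
)
print(density)  # 510.0
```

The measured value then replaces the Table 3.2 default for the buyer's category in the volume formula.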

For businesses that don't sit cleanly in one category: hybrid models default to the dominant cost-pattern. Fintech and insurtech behave structurally like SaaS (event-rich, software-led decisioning); healthtech and edtech behave like Services (people-and-decision-intensive, regulated visibility); marketplaces and DTC commerce split along the operations-vs-tech-stack divide. Pick the category your dominant cost line resembles, and the diagnostic remains directionally correct. Above ~$100M revenue, the right move is to re-anchor against your own internal telemetry.

Chart 3.1 — Invocation Density by Industry

Monthly invocations generated per $M of annual revenue. SaaS leads because its decision-events are software-native; Services sits in the middle on billable-hour structure; Goods trails because capital-intensive operations produce fewer interactive events per dollar.

[Chart: monthly invocations per $M of annual revenue, by industry]

Source: LeafCraft calibration grounded in multi-agent chain modelling against representative workflows in each category — not a measured industry average.

Table 3.3 — Ambition Multiplier

Scenario	Multiplier	What it represents
Conservative	0.6×	Pilot one workflow; baseline volume; lower bound of each function's impact rate
Base	1.0×	Production roll-out across the routine-heavy functions; mid-band rates
Aggressive	1.4×	Full coverage including secondary functions; upper-band rates

Ambition has two effects in the model, not one. It drives invocation volume (above), and it drives where in each function's lo–hi rate band the per-function impact is computed (Conservative = floor of the band, Base = midpoint, Aggressive = ceiling). The two effects compound: an Aggressive setting picks up both more invocations and richer per-function impact.

The recommended invocation volume falls out of:

recommended_invocations_per_month
  = annual_revenue ($M) × invocation_density × ambition_volume_multiplier

ambition_volume_multiplier  — Conservative 0.6, Base 1.0, Aggressive 1.4
ambition_rate_position      — Conservative 0.0, Base 0.5, Aggressive 1.0
                              (position within each function's published lo–hi band)
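A minimal Python sketch of the two formulas above, assuming the category densities of Table 3.2 and the multipliers and band positions of Table 3.3. All names are hypothetical. The hybrid mapping encodes the fintech, insurtech, healthtech and edtech guidance given earlier in this section (marketplaces and DTC commerce split case by case, so they are left out), and reading `ambition_rate_position` as linear interpolation between a function's lo and hi rates is our interpretation of "floor / midpoint / ceiling".

```python
# Sketch of the volume formula and rate-band position from Tables 3.2 and 3.3.
# Function and constant names are hypothetical, not LeafCraft's code.

# Invocations per $M of annual revenue per month (Table 3.2).
INVOCATION_DENSITY = {"goods": 500, "services": 1_000, "saas": 1_200}

# Hybrid businesses default to the category whose cost pattern they resemble.
HYBRID_TO_CATEGORY = {
    "fintech": "saas",
    "insurtech": "saas",
    "healthtech": "services",
    "edtech": "services",
}

# scenario -> (ambition_volume_multiplier, ambition_rate_position)
AMBITION = {
    "conservative": (0.6, 0.0),
    "base": (1.0, 0.5),
    "aggressive": (1.4, 1.0),
}


def resolve_industry(industry: str) -> str:
    """Map a hybrid label onto one of the three calibrated categories."""
    key = industry.lower()
    return HYBRID_TO_CATEGORY.get(key, key)


def recommended_invocations_per_month(revenue_musd: float, industry: str, scenario: str) -> int:
    """annual_revenue ($M) x invocation_density x ambition_volume_multiplier."""
    density = INVOCATION_DENSITY[resolve_industry(industry)]
    volume_multiplier, _ = AMBITION[scenario]
    return round(revenue_musd * density * volume_multiplier)


def impact_rate(lo: float, hi: float, scenario: str) -> float:
    """Per-function rate: floor (0.0), midpoint (0.5) or ceiling (1.0) of the lo-hi band."""
    _, position = AMBITION[scenario]
    return lo + position * (hi - lo)


print(recommended_invocations_per_month(25, "services", "base"))      # 25000
print(recommended_invocations_per_month(500, "goods", "aggressive"))  # 350000
print(recommended_invocations_per_month(100, "fintech", "base"))      # 120000
```

Because the multiplier scales volume while the position picks where in each band the rate is read, an Aggressive setting raises both factors at once, which is the compounding described above.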

Table 3.4 — Worked Examples

Profile	Calculation	Monthly invocations	Self-serve cost ($99 + overage × $0.06)	Annual platform cost
$1M Services · Conservative (founder-led)	1 × 1,000 × 0.6	600 (well under base)	$99 flat	$1,188
$3M Services · Conservative	3 × 1,000 × 0.6	1,800 (under base)	$99 flat	$1,188
$25M Services · Base	25 × 1,000 × 1.0	25,000	$99 + 20,000 × $0.06 = $1,299	$15,588
$100M Services · Base	100 × 1,000 × 1.0	100,000	$5,799	$69,588
$100M SaaS · Base	100 × 1,200 × 1.0	120,000	$6,999	$83,988
$1B Services · Base	1,000 × 1,000 × 1.0	1,000,000	Above ceiling — custom contract	Negotiated
$500M Goods · Aggressive	500 × 500 × 1.4	350,000	Above ceiling — custom contract	Negotiated
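The cost column follows mechanically from the published rate. The sketch below, with hypothetical function names, works in integer cents so that totals such as $15,588 come out exactly, and treats anything above 150,000 invocations/month as a custom contract with no published price.

```python
from typing import Optional

BASE_FEE_CENTS = 9_900        # $99/month base fee
INCLUDED_INVOCATIONS = 5_000  # invocations covered by the base fee
OVERAGE_CENTS = 6             # $0.06 per invocation above the included amount
SELF_SERVE_CEILING = 150_000  # above this, the route is a negotiated contract


def monthly_cost_usd(invocations: int) -> Optional[float]:
    """Self-serve monthly cost in dollars; None means 'custom contract'."""
    if invocations > SELF_SERVE_CEILING:
        return None
    overage = max(0, invocations - INCLUDED_INVOCATIONS)
    return (BASE_FEE_CENTS + overage * OVERAGE_CENTS) / 100


def annual_cost_usd(invocations: int) -> Optional[float]:
    """Twelve months at the self-serve rate, or None above the ceiling."""
    monthly = monthly_cost_usd(invocations)
    return None if monthly is None else 12 * monthly


# Rows of Table 3.4.
for volume in (600, 1_800, 25_000, 100_000, 120_000, 350_000, 1_000_000):
    annual = annual_cost_usd(volume)
    label = "above ceiling, custom contract" if annual is None else f"${annual:,.0f}/yr"
    print(f"{volume:>9,} invocations/month: {label}")
```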

Bottom Line

The diagnostic is deliberately reductive. Three inputs you can answer from memory produce a monthly cost figure and a defensible business case. If the diagnostic requires a workshop to use, it is a sales process, not a diagnostic.

§4 — Volume Drives the Vehicle

The 150,000 invocations/month ceiling is not a sales preference. It is the operational point at which a flat published rate stops being the right contractual instrument.

Below the ceiling, the buyer's bill is predictable to the dollar, which structurally addresses the cost-forecasting concern 90% of CIOs rank first. Above the ceiling, the buyer should be negotiating volume discount, dedicated ops engineering, and custom SLA rather than paying a published per-invocation rate that scales linearly with no volume discount.

Table 4.1 — Self-Serve vs Custom: The 150K Line

Property	Self-serve (≤ 150K inv/mo)	Custom (> 150K inv/mo)
Pricing	Published: $99/mo base + $0.06/invocation overage	Negotiated annual contract
Buying motion	Credit card · 5 minutes	Sales engagement · 4–8 week scope
Volume discount	None — flat $0.06 overage	Yes — typically $0.01–$0.03/invocation at scale
Ops engineer	Shared support	Dedicated
SLA	Standard	Custom, signed
Data residency	Standard cloud regions	Region of your choice
Annual cost band	$1,188 → ~$105,600	$200K – $5M+
Platform underneath	Same	Same
Migration penalty if crossing the line	None — same data, same configuration, only the contract changes	—
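Why the line sits where it does is easiest to see by pricing the same volume both ways. The sketch below compares the published self-serve bill with the per-invocation component of a negotiated contract at the $0.01–$0.03 band quoted in the table above. It is deliberately partial: a real contract also carries base fees, SLA terms and dedicated ops engineering that are not modelled here, and the function names are hypothetical.

```python
PUBLISHED_BASE_CENTS = 9_900    # $99/month
PUBLISHED_INCLUDED = 5_000      # invocations included in the base
PUBLISHED_RATE_CENTS = 6        # $0.06 per overage invocation
NEGOTIATED_RATE_CENTS = (1, 3)  # typically $0.01-$0.03 per invocation at scale


def published_monthly_usd(invocations: int) -> float:
    """Bill if the published self-serve rate were applied at this volume."""
    overage = max(0, invocations - PUBLISHED_INCLUDED)
    return (PUBLISHED_BASE_CENTS + overage * PUBLISHED_RATE_CENTS) / 100


def negotiated_variable_usd(invocations: int) -> tuple[float, float]:
    """Per-invocation component only, at the low and high end of the negotiated band."""
    low, high = NEGOTIATED_RATE_CENTS
    return invocations * low / 100, invocations * high / 100


for volume in (150_000, 350_000, 1_000_000):
    low, high = negotiated_variable_usd(volume)
    print(
        f"{volume:>9,}/mo: published ${published_monthly_usd(volume):,.0f}, "
        f"negotiated variable ${low:,.0f}-${high:,.0f}"
    )
```

At 1,000,000 invocations/month the published rate would bill $59,799 against $10,000–$30,000 for the variable component of a negotiated contract; that gap is what the custom vehicle exists to capture.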

The crucial property of this model is continuity. A self-serve customer who grows past 150K invocations/month is not re-platforming, re-implementing, or re-training — only the contract changes. Same agents, same audit trail, same data, same governance policies.

Chart 4.1 — Monthly Platform Cost on the Self-Serve Plan

Monthly platform cost vs monthly invocation volume. The first 5,000 invocations are included in the $99 base; everything above scales linearly at $0.06/invocation. At ~150K invocations/month, the right vehicle becomes a negotiated annual contract — not because the platform changes, but because at that volume the buyer should be capturing volume discount, custom SLA, and dedicated ops engineering.

[Chart: self-serve monthly cost vs monthly invocation volume]

Source: LeafCraft published self-serve pricing — $99/mo base, 5,000 invocations included, $0.06/invocation overage.

Table 4.2 — Market Pricing Context (2026)

For comparison, here is where the $99 + $0.06/invocation model sits relative to published peer pricing in the agentic AI category.

Vendor	Base	Included	Variable rate	Notes
LeafCraft (LeafMesh ADK)	$99/mo	5,000 invocations	$0.06/inv	Full feature set included; flat rate up to 150K/mo; same platform on enterprise contract
LangChain (LangGraph Platform, Plus)	$39/mo	100,000 LangGraph calls	$0.001/node + $0.005/run	Plus plan only; production-grade requires Contact Sales
Lyzr Agent Studio	Not public — enterprise sales motion	—	Per-seat + variable	Annual contract typical; published pricing not available
OneReach.ai (GSX)	Not public — enterprise sales motion	—	Conversation-based	Annual contract; primarily voice/chat focus
IBM watsonx Orchestrate	$500/mo entry tier; enterprise tier negotiated	Capacity-based credits	Token-bundled	Bundled with broader IBM enterprise contracts
Microsoft Copilot Studio (agents)	$200/mo (25,000 messages)	25,000 messages	$0.01/message	Limited to Microsoft ecosystem; an agent message is not a multi-agent invocation
Agno (formerly Phidata) Cloud	Not public	—	—	Early-access only as of Q1 2026
Direct foundation-model APIs (Anthropic, OpenAI, etc.)	$0 platform	—	~$0.003–$0.015/call (model-dependent)	No governance substrate, no audit log, no agent registry — build burden falls on internal engineering; this is where the 95% failure rate concentrates
Typical enterprise contract (across category)	—	—	—	$100K–$350K/year contract typical for mid-market; $500K+ for enterprise SLA / residency

Source: Awesome Agents Pricing Comparison 2026; LangChain published pricing; Microsoft Copilot Studio published pricing (2026); IBM watsonx published tiers; Lyzr & OneReach published positioning; Anthropic / OpenAI published API pricing; Agno public statements (2026).

Sidebar

Why the $99 floor matters. Below this price point, the buying motion changes shape entirely. A $99/month decision is a credit-card purchase made by an individual buyer; a $500/month decision is a manager-approved buy; a $5,000/month decision triggers procurement review. By keeping the entry price under the procurement threshold, the platform allows organisations to pilot before they commit — which directly counters the MIT-documented failure pattern of overcommitting to scope before validation.

§5 — Year-1 ROI Is a Band, Not a Number

A single ROI multiple ("this investment returns 47.3× in Year 1") is the fastest way to lose a CFO. The number reads as marketing math regardless of how rigorous the underlying model is. Three things go wrong simultaneously: the decimal precision implies false certainty, the magnitude triggers scepticism toward returns that look unrealistic, and the buyer instinctively discounts everything else in the pitch.

A band, by contrast — "12×–18× in Year 1, where real enterprise AI deployments actually sit" — preserves three things at once: the rough magnitude (which is what matters), the inherent uncertainty (which the buyer already knows is there), and the comparison anchor (which makes the band defensible).

Table 5.1 — The 5×–60× Envelope, with Public Benchmarks

ROI multiple	What lives there	Public benchmark anchor
<2×	Generally too low for a defensible AI deployment	Often the "no measurable EBIT impact" zone in McKinsey 2026
2×–5×	Plausible but below median for the surviving 5%	Klarna early estimates (annualised $40M profit-improvement projection on opex cost base of similar size); roughly 3×–6×
5×–15×	Where most enterprise AI deployments land	JPMorgan investment-banking research automation, McKinsey-reported median enterprise outcomes
15×–30×	Where well-targeted, governance-bound deployments land	JPMorgan wealth-management 20% sales-increase ROI; Klarna Q3 2025 $60M-saved / 853-employee equivalent
30×–60×	Where small orgs and high-leverage workflows land	SMB reported deployments at $200–500/month replacing 2–3 hires; lead-qualification automation cases
>60×	Either a true outlier or marketing math	Treat with the same scepticism as a 10-bagger pitch

Source: Public Klarna disclosures (2024–25), JPMorgan AI roadmap (Emerj 2026), McKinsey State of AI 2026, AI Monk SMB case studies. Public benchmark ROIs above are point estimates derived from each company's own disclosures, not bands computed from the formula in Table 5.2 — they serve as the empirical anchor for where the 5×–60× envelope actually sits.

Table 5.2 — Diagnostic-Output ROI Bands by Profile

Each band below is computed from the diagnostic's underlying model and clamped to the 5×–60× envelope:

y1_roi_lo = max(5,  floor(y1_roi_multiple × 0.8))
y1_roi_hi = min(60, ceil (y1_roi_multiple × 1.1))

Profile	Total opportunity	Y1 net value	Recommended volume	Annual platform cost	Y1 ROI band
Services · $5M · Conservative	~$0.15M	~$0.04M	3,000/mo	$1.2K	25×–35×
Services · $25M · Base	~$0.75M	~$0.17M	25,000/mo	$15.6K	8×–13×
Services · $100M · Base	~$3.0M	~$0.7M	100,000/mo	$69.6K	8×–12×
SaaS · $100M · Base	~$4.0M	~$0.95M	120,000/mo	$84.0K	9×–13×
Goods · $500M · Base	~$11M	~$2.45M	250,000/mo	Custom	negotiated
Services · $1B · Base	~$30M	~$7.2M	1,000,000/mo	Custom	negotiated

Each row is computed strictly from the formulas above — y1_roi_lo = max(5, floor(roi × 0.8)), y1_roi_hi = min(60, ceil(roi × 1.1)). A buyer can reproduce these numbers from the three inputs (industry, revenue, scenario) and the methodology in Appendix A. ROI bands shown for self-serve fits only. Above the 150K/mo ceiling, both pricing and ROI shift into the custom-contract regime where the band reflects negotiated rates.
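The clamp can be sketched in a few lines of Python. One assumption is ours, not stated in the text: `y1_roi_multiple` is taken to be Y1 net value divided by annual platform cost. With the net values quoted for the $25M and $100M profiles, that reading reproduces their bands exactly; the $5M row's net value (~$0.04M) is rounded too coarsely to check this way. The function name is hypothetical.

```python
import math

ENVELOPE_FLOOR, ENVELOPE_CEILING = 5, 60  # the 5x-60x envelope of Table 5.1


def y1_roi_band(y1_net_value: float, annual_platform_cost: float) -> tuple[int, int]:
    """Year-1 ROI band per the clamp formulas above.

    Assumes y1_roi_multiple = Y1 net value / annual platform cost.
    At roughly 3.6x or below, and at 76.25x or above, the two clamps cross
    (lo > hi), which signals a profile outside the envelope, not a usable band.
    """
    roi = y1_net_value / annual_platform_cost
    lo = max(ENVELOPE_FLOOR, math.floor(roi * 0.8))
    hi = min(ENVELOPE_CEILING, math.ceil(roi * 1.1))
    return lo, hi


print(y1_roi_band(172_000, 15_588))  # (8, 13)  Services, $25M, Base
print(y1_roi_band(700_000, 69_588))  # (8, 12)  Services, $100M, Base
print(y1_roi_band(950_000, 83_988))  # (9, 13)  SaaS, $100M, Base
```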

Chart 5.1 — ROI Band Compresses as Revenue Grows

Year-1 ROI band (lower and upper bound) by annual revenue, Services profile, Base scenario. Small businesses see higher multiples because the $99 base, with 5,000 invocations included, barely registers against any productive opportunity. As revenue grows the band compresses and then flattens: once volume passes the included 5,000 invocations, platform cost grows roughly linearly with volume, as does absolute opportunity, so the ratio settles into a narrower range. The 5× envelope floor is the empirical threshold below which a deployment generally isn't worth pursuing.

[Chart: ROI band lower and upper bound vs annual revenue]

Source: Computed strictly from the diagnostic methodology in §5 — Services profile, Base scenario, no challenge-slider overrides.

Bottom Line

The band format is not a hedge. It is the honest representation of the actual uncertainty in any Year-1 AI deployment estimate. A vendor who shows you a single number is asking you to pretend that uncertainty does not exist — which is the same posture that produced the 95% failure rate.

§6 — Decision Matrix by Company Size

The right buying motion is determined more by company size than by industry. Below is the matrix that should drive your decision in 2026.

Table 6.1 — Recommended Motion by Revenue Band

Revenue band	Recommended motion	Who decides	Time to first value	Y1 ROI band
< $5M (Lean / founder-led)	Self-serve · base plan · pilot 1–2 workflows	Founder · CEO · 1st ops hire	1–3 weeks	10×–35× (wide — depends heavily on revenue + ambition within band)
$5M – $25M (Early-stage)	Self-serve · base plan · production pilot on the highest-volume routine workflow	COO · functional lead	3–6 weeks	8×–18×
$25M – $100M (Growth / mid-market)	Self-serve · base plan + scale invocations · production rollout across 2–4 functions	COO · CFO sponsor	4–8 weeks	8×–13×
$100M – $1B (Established mid-market)	Self-serve possible, custom contract worth exploring; production rollout across 4–6 functions	CFO · CIO · COO	8–12 weeks	8×–13× (self-serve) · negotiated (custom)
> $1B (Enterprise)	Custom contract — strategic engagement, dedicated ops engineer, regulated workflows	CFO · CIO · CRO/Compliance	12–20 weeks	Negotiated against contract value

Note: bands above are computed strictly from §5's methodology against representative profiles at the midpoint of each revenue band, Base scenario. The Conservative-vs-Aggressive setting within a band shifts the ROI by ~30% in either direction. Diagnostic outputs that fall outside the band — high or low — are not errors; they are either a genuine outlier or an unrealistic scenario setting and should be re-checked.
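Table 6.1 is a step function of revenue, which makes it easy to encode as a lookup. The sketch below is illustrative: the function name is hypothetical, the motion text is abbreviated from the table, and because the table does not say which band a boundary value (exactly $5M, $25M, $100M or $1B) belongs to, it assumes the higher one.

```python
from bisect import bisect_right

# Upper edges ($M) of the first four revenue bands in Table 6.1.
BAND_EDGES_MUSD = [5, 25, 100, 1_000]

# (revenue band, recommended motion, time to first value), abbreviated from Table 6.1.
MOTIONS = [
    ("< $5M", "self-serve, base plan, pilot 1-2 workflows", "1-3 weeks"),
    ("$5M-$25M", "self-serve, production pilot on highest-volume routine workflow", "3-6 weeks"),
    ("$25M-$100M", "self-serve with scaled invocations, rollout across 2-4 functions", "4-8 weeks"),
    ("$100M-$1B", "self-serve possible, custom contract worth exploring, 4-6 functions", "8-12 weeks"),
    ("> $1B", "custom contract, dedicated ops engineer, regulated workflows", "12-20 weeks"),
]


def recommended_motion(annual_revenue_musd: float) -> tuple[str, str, str]:
    """Return the Table 6.1 row for a revenue figure; boundary values go to the higher band."""
    return MOTIONS[bisect_right(BAND_EDGES_MUSD, annual_revenue_musd)]


band, motion, time_to_value = recommended_motion(40)
print(f"{band}: {motion} ({time_to_value})")
```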

Strategic Recommendations

The biggest single mistake at every scale is buying through the wrong vehicle for your size. A $5M founder running a procurement process for a $99 product wastes weeks on a credit-card decision. A $1B enterprise buying through self-serve misses every leverage point (volume pricing, SLA, data residency) that a custom contract would unlock.

§7 — Six Worked Scenarios

Each scenario shows how the diagnostic routes the buy for a representative organisation, calibrated against published 2026 benchmarks.

§7.1 — Lean Services Firm ($3M revenue, Conservative)

Profile: 12-person consulting practice. Founder-led. No formal ops function. Want to free senior consultant time for client work.

Diagnostic output:

Recommended volume: ~1,800 invocations/month (under the 5,000 included)
Monthly cost: $99 flat (no overage)
Annual cost: $1,188
Total opportunity: ~$90K/year (3% of revenue, weighted toward Conservative)
Y1 net value: ~$22K
Y1 ROI band: 15×–21×

Decision: Buy. The cost is irrelevant; the only question is which workflow to pilot first.
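The cost line is a flat fee because the recommended volume sits well inside the base plan. A worked check against the §3 volume formula and the published rate (the roughly 2.8× growth headroom is derived here, not part of the diagnostic output):

```latex
\begin{aligned}
\text{volume} &= 3 \times 1{,}000 \times 0.6 = 1{,}800 \ \text{invocations/month} \le 5{,}000 \ \text{included} \\
\text{monthly cost} &= \$99 \quad \text{(no overage)} \\
\text{growth before overage} &= \frac{5{,}000}{1{,}800} \approx 2.8\times
\end{aligned}
```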

Recommended pilot: Inbound proposal triage and first-draft response (BD function). Each inbound routes through 4–6 agents, generating ~150–250 invocations/month per active deal. Even at base volume, the consultant team gets ~4 hours/week back. This benchmarks against reported SMB lead-qualification deployments that compressed response time from hours to 60 seconds.
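The ROI band in §7.1's output can be reproduced from its two inputs. Below is a minimal Python sketch. It assumes the floor(0.8×)/ceil(1.1×) band from Appendix A, with both ends clamped into the 5×–60× envelope. The helper name `y1_roi_band` is hypothetical and is not part of the diagnostic.

```python
import math


def y1_roi_band(net_value_usd: float, annual_platform_cost_usd: float) -> tuple:
    """Hypothetical helper: Year-1 ROI band from net value and platform cost."""
    multiple = net_value_usd / annual_platform_cost_usd  # point ROI multiple
    lo = math.floor(multiple * 0.8)  # band extends 20% below the point estimate
    hi = math.ceil(multiple * 1.1)   # ...and 10% above it
    # Clamp both ends into the 5x-60x envelope so the band can never invert.
    lo = min(60, max(5, lo))
    hi = max(5, min(60, hi))
    return lo, hi


# §7.1: ~$90K opportunity x 25% capture - $0 services = $22.5K; $99 x 12 = $1,188
print(y1_roi_band(22_500, 1_188))  # (15, 21)
```

Passing the rounded ~$22K instead returns (14, 21). The published 15×–21× band therefore appears to be computed from the unrounded $22.5K.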

§7.2 — Growing Services Firm ($25M revenue, Base)

Profile: 80-person services firm. COO function exists; processes mostly manual. Pre-Series B mindset: wants to scale without scaling headcount proportionally.

Diagnostic output:

Recommended volume: ~25,000 invocations/month
Monthly cost: $99 + 20,000 × $0.06 = $1,299
Annual cost: $15,588
Total opportunity: ~$750K/year
Y1 net value: ~$172K
Y1 ROI band: 8×–13×

Decision: Buy. Pilot one function for 60 days, then expand if measured impact lands within ±20% of the diagnostic's estimate.

Recommended pilot: Recruiting / HR top-of-funnel. 60–70% of recruiter hours are sourcing and screening; agents can carry that band end-to-end. Volume builds quickly (~3,000 invocations/month per active hiring funnel). Aligns with the McKinsey workflow-redesign finding — the function is redesigned around the agent rather than retrofitted.
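The monthly cost line follows directly from self-serve pricing. This minimal sketch assumes the $99 base includes 5,000 invocations and each additional invocation costs $0.06, which is how the worked figures are computed (25,000 invocations billed as 20,000). Amounts are kept in integer cents to avoid floating-point drift. The constant and function names are hypothetical.

```python
BASE_FEE_CENTS = 9_900          # $99/month base plan
INCLUDED_INVOCATIONS = 5_000    # included in the base plan, per §7.1
OVERAGE_CENTS = 6               # $0.06 per invocation above the included volume
SELF_SERVE_CEILING = 150_000    # at or above this, §12 routes the buy to sales


def self_serve_monthly_cost_cents(invocations: int) -> int:
    """Hypothetical helper: monthly self-serve cost in cents."""
    if invocations >= SELF_SERVE_CEILING:
        raise ValueError("volume at or above the self-serve ceiling: custom contract")
    overage = max(0, invocations - INCLUDED_INVOCATIONS)
    return BASE_FEE_CENTS + overage * OVERAGE_CENTS


monthly = self_serve_monthly_cost_cents(25_000)
print(monthly / 100, 12 * monthly / 100)  # 1299.0 15588.0
```

The same function reproduces §7.1 ($99 for 1,800 invocations) and §7.3 ($6,999 for 120,000 invocations).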

§7.3 — Mid-Market SaaS ($100M revenue, Base)

Profile: 350-person SaaS company. Mature support function, growing CS function. Engineering productivity is the largest cost line.

Diagnostic output:

Recommended volume: ~120,000 invocations/month (below the 150K self-serve ceiling)
Monthly cost: $99 + 115,000 × $0.06 = $6,999
Annual cost: $83,988
Total opportunity: ~$4.0M/year
Y1 net value: ~$0.95M
Y1 ROI band: 9×–13×

Decision: Buy self-serve with a 12-month review planned. If sustained volume exceeds 150K invocations/month, transition to custom contract — same platform, negotiated rate.

Recommended sequence: Support (Tier 0 deflection, the function class where Klarna succeeded before overreaching) → Customer Success (retention signals) → Engineering productivity. Each function added compounds the platform's marginal value. Note: architect a clean human handoff from day one; the Klarna 2025 reversal is the warning.
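The 12-month review rule can be made mechanical. This sketch rests on one stated assumption: it reads "sustained" as the three most recent consecutive months above 150K. The document does not specify a window, so `months_required` is a hypothetical parameter to tune.

```python
SELF_SERVE_CEILING = 150_000


def should_move_to_custom(monthly_volumes: list, months_required: int = 3) -> bool:
    """True if each of the most recent `months_required` months exceeds the ceiling."""
    if len(monthly_volumes) < months_required:
        return False  # not enough history to call the volume sustained
    recent = monthly_volumes[-months_required:]
    return all(volume > SELF_SERVE_CEILING for volume in recent)


history = [118_000, 121_000, 126_000, 152_000, 158_000, 163_000]
print(should_move_to_custom(history))  # True
```

Requiring several consecutive months means a single spike does not trigger the transition, which matches the "sustained" wording of the decision.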

§7.4 — Established Manufacturer ($500M revenue, Base)

Profile: 1,800-person manufacturer, multi-site. Substantial back-office, complex supply chain. Regulated finance function. Procurement-led buying culture.

Diagnostic output:

Recommended volume: ~250,000 invocations/month (above self-serve ceiling)
Vehicle: Custom contract
Total opportunity: ~$11M/year
Y1 net value: ~$2.45M (gross opportunity × 25% capture − $0.3M custom-contract services cost)
Y1 ROI band: contract-dependent (typically 10×–20× at this scale)

Decision: Talk to sales. The published price is informational only; the contract is built around volume discount, dedicated ops engineer, custom SLA, and procurement-aligned terms.

Recommended pilot: Production planning + supply chain coordination first (largest pool and internal-visibility governance, so the fastest to deploy). AP/AR (finance / back-office) second, once governance controls are validated against internal audit requirements. EU AI Act compliance posture is material at this scale; see §11.
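The Y1 net value line applies the Appendix A capture rate and services-cost tiers. This is a minimal sketch, with amounts in $M and hypothetical function names. It assumes a custom contract takes the $0.3M tier regardless of revenue. That matches how the §7.4 figure is computed, but Appendix A does not state the precedence outright.

```python
def services_cost_musd(revenue_musd: float, custom_contract: bool) -> float:
    """Hypothetical helper: one-time services cost ($M) per the Appendix A tiers."""
    if custom_contract:
        return 0.300  # assumption: custom contracts override the revenue tiers
    if revenue_musd < 10:
        return 0.000
    if revenue_musd < 50:
        return 0.015
    if revenue_musd < 250:
        return 0.050
    return 0.100


def year_1_net_value_musd(total_opportunity_musd: float, revenue_musd: float,
                          custom_contract: bool, capture_rate: float = 0.25) -> float:
    """Y1 net value ($M): captured opportunity minus services cost, floored at zero."""
    captured = total_opportunity_musd * capture_rate
    return max(0.0, captured - services_cost_musd(revenue_musd, custom_contract))


print(round(year_1_net_value_musd(11.0, 500, custom_contract=True), 2))  # 2.45
```

The same call gives 10.95 for §7.5 ($45M opportunity) and 18.45 for §7.6 ($75M). These match the ~$11M and ~$18M+ figures in those scenarios.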

§7.5 — Enterprise Services ($1B revenue, Aggressive)

Profile: 4,500-person services firm. Substantial bench, billable utilisation is the controlling metric. Regulated client base. Multi-jurisdiction operations.

Diagnostic output:

Recommended volume: ~1,400,000 invocations/month (deeply enterprise scale)
Vehicle: Custom contract
Total opportunity: ~$45M/year
Y1 net value: ~$11M
Y1 ROI band: contract-dependent (typically 15×–30× at this scale, given billable-delivery compounding)

Decision: Strategic engagement. Dedicated ops engineer, data residency in multiple regions, custom SLA aligned to internal change-management cadence. Multi-year contract typical at this scale.

Recommended sequence: Billable delivery (utilisation + hours-per-job) first — the compounding case is structural; this is where the Klarna-style efficiency story meets the JPMorgan-style scale story. Bench reduction and back-office second. Recruiting / BD third.

§7.6 — Pre-IPO Enterprise SaaS ($1.5B revenue, Aggressive)

Profile: 6,000-person SaaS company. Multiple product lines, large enterprise customer base, regulated industries among customers, IPO timeline within 18 months.

Diagnostic output:

Recommended volume: ~2,500,000 invocations/month
Vehicle: Custom contract with multi-region residency
Total opportunity: ~$75M+/year
Y1 net value: ~$18M+
Y1 ROI band: 15×–25× (negotiated contract)

Decision: Strategic, multi-stakeholder engagement. Procurement leads the contract structure, the COO leads the deployment, and the CISO and GC lead the compliance posture. A multi-year term, multi-region residency, a custom SLA, and a dedicated ops engineer are table stakes at this scale.

Critical considerations:

EU AI Act compliance (§11) for any AI affecting employment, credit, education, or law-enforcement contexts in your customer base.
Audit substrate must be in place pre-IPO for S-1 disclosure posture.
Workflow-redesign focus (McKinsey 2026) — this is where the 55% vs 20% redesign-rate gap shows up at scale.
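Every recommended volume in §7 comes from the Appendix A invocation formula: revenue ($M) × industry density × ambition multiplier. The short sketch below recomputes all six scenarios and applies the 150K routing. It maps §7.4's manufacturer to the Goods density, which reproduces its 250,000 figure. The table and function names are hypothetical.

```python
DENSITY = {"Goods": 500, "SaaS": 1_200, "Services": 1_000}  # invocations per $M per month
AMBITION = {"Conservative": 0.6, "Base": 1.0, "Aggressive": 1.4}
SELF_SERVE_CEILING = 150_000


def recommended_invocations(revenue_musd: float, industry: str, ambition: str) -> int:
    """Monthly invocations per the Appendix A formula, rounded to a whole number."""
    return round(revenue_musd * DENSITY[industry] * AMBITION[ambition])


SCENARIOS = [
    ("§7.1", 3, "Services", "Conservative"),
    ("§7.2", 25, "Services", "Base"),
    ("§7.3", 100, "SaaS", "Base"),
    ("§7.4", 500, "Goods", "Base"),
    ("§7.5", 1_000, "Services", "Aggressive"),
    ("§7.6", 1_500, "SaaS", "Aggressive"),
]

for label, revenue, industry, ambition in SCENARIOS:
    volume = recommended_invocations(revenue, industry, ambition)
    vehicle = "self-serve" if volume < SELF_SERVE_CEILING else "custom contract"
    print(f"{label}: {volume:,} invocations/month -> {vehicle}")
```

§7.6 computes to 2,520,000, which the scenario rounds to ~2,500,000.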

§8 — The Klarna Lesson: When Aggressive Becomes Wrong

Klarna's 2024–2025 trajectory is the public case study every enterprise AI decision should be calibrated against. The arc compresses both the upside and the failure mode into 18 months.

Table 8.1 — The Klarna Timeline

Date	Event	Reported impact
Feb 2024	OpenAI-powered CS agent launched	75% of customer chats automated · 2.3M conversations in first month · equivalent of 700 agents
2024 (year)	Continued scaling	Projected $40M profit improvement in 2024
Spring 2025	Customer satisfaction drop surfaced	"Focused too much on efficiency. Result was lower quality" (CEO Sebastian Siemiatkowski)
Q3 2025	Operational efficiency continued	$60M saved · 853-employee-equivalent workload absorbed
Mid–late 2025	Hybrid model launched, human agents rehired	Uber-style flexible workforce model targeting students, parents, rural workers

Source: Klarna public statements; FintechWeekly (2025); Internative Klarna AI Reversal Postmortem; Solutions Review; Acefone.

What Klarna got right (and what the diagnostic captures):

Picked one workflow (CS) that was genuinely high-routine.
Measured impact in dollar terms ($60M saved is independently verifiable).
Maintained a public deployment posture that the rest of the market could learn from.

What Klarna got wrong (and what the diagnostic's structure warns against):

Over-rotated to AI for interactions where judgment, empathy, or compliance nuance mattered. Confident-but-wrong answers about policy, fees, or payment terms became a regulatory and brand issue simultaneously.
Treated the AI deployment as a labour-substitution exercise rather than a capability-augmentation exercise. The CEO's later framing, "focused too much on efficiency", is the diagnosis of the failure mode.

Table 8.2 — The Hybrid Architecture That Held Up

Interaction class	Vehicle	Why
Routine queries (order status, basic refunds, account info)	AI agent · full deflection	High volume, low judgment, clear escalation rules
Complex disputes, fraud claims, hardship cases	Human agent · AI assist	Judgment, empathy, compliance nuance matter
Policy interpretation / regulatory edge cases	Human agent · AI suppressed	Confident-but-wrong is a regulatory event
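In effect, Table 8.2 is a routing table. This minimal sketch shows one way to encode it, assuming interactions arrive already labelled with a class. The class labels and the fail-safe default for unrecognised classes are our assumptions, not part of the Klarna record.

```python
from enum import Enum


class Vehicle(Enum):
    AI_FULL_DEFLECTION = "AI agent, full deflection"
    HUMAN_WITH_AI_ASSIST = "Human agent, AI assist"
    HUMAN_AI_SUPPRESSED = "Human agent, AI suppressed"


# Hypothetical class labels mapped to the three tiers of Table 8.2.
ROUTING = {
    "order_status": Vehicle.AI_FULL_DEFLECTION,
    "basic_refund": Vehicle.AI_FULL_DEFLECTION,
    "account_info": Vehicle.AI_FULL_DEFLECTION,
    "dispute": Vehicle.HUMAN_WITH_AI_ASSIST,
    "fraud_claim": Vehicle.HUMAN_WITH_AI_ASSIST,
    "hardship_case": Vehicle.HUMAN_WITH_AI_ASSIST,
    "policy_interpretation": Vehicle.HUMAN_AI_SUPPRESSED,
    "regulatory_edge_case": Vehicle.HUMAN_AI_SUPPRESSED,
}


def route(interaction_class: str) -> Vehicle:
    # Unrecognised classes fail safe to the most conservative tier, because
    # confident-but-wrong on an unknown class is the failure mode to avoid.
    return ROUTING.get(interaction_class, Vehicle.HUMAN_AI_SUPPRESSED)


print(route("order_status").value)  # AI agent, full deflection
print(route("fee_waiver_request").value)  # Human agent, AI suppressed
```

Defaulting unknown classes to the human tier costs some deflection rate. In exchange, new interaction types never reach a customer without human review.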

Bottom Line

The durable Klarna lesson is to identify interactions where AI outperforms humans, automate those aggressively, and use the savings to invest in better human capability for interactions where judgment matters. The hybrid model is not a compromise; it is the architecture that performance data consistently supports. The diagnostic's "governed at X%" output is the structural representation of where that human handoff has to live.

Sidebar

Five Klarna-trap indicators a buyer can check against their own deployment. Each one is a signal the deployment is on the trajectory Klarna walked between 2024 and 2025 — and that a course-correction is cheaper now than in 12 months.

CSAT is trending down even as deflection rate trends up. The cost line is improving, the customer line is degrading — and the org is celebrating the cost line.
The AI is answering policy / regulatory / fee questions without an attached human escalation path. Confident-but-wrong on policy is a regulatory event waiting to surface.
There is no scheduled checkpoint to reassess what the AI should and shouldn't handle. Klarna ran the same deflection-aggressive model for ~18 months without re-scoping.
The narrative inside the org has shifted from "augmentation" to "replacement." Once the framing locks to labour-substitution, every interaction tier gets pushed toward AI regardless of whether judgment matters.
Senior judgment is leaving the org because the work it used to do is gone, but the work AI cannot do hasn't been redesigned yet. This is the irreversible step. The skill is the moat; once it walks, rehiring takes 2–3× longer than expected.
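The first indicator is the easiest to automate. This sketch flags it when CSAT has a negative least-squares trend while deflection rate has a positive one over the same monthly window. The window length and the use of a simple linear slope are assumptions, and the function names are hypothetical.

```python
def trend(values: list) -> float:
    """Ordinary least-squares slope of `values` against month index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points for a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def csat_deflection_divergence(csat: list, deflection_rate: list) -> bool:
    """Indicator 1: CSAT trending down while deflection rate trends up."""
    if len(csat) != len(deflection_rate):
        raise ValueError("series must cover the same months")
    return trend(csat) < 0 and trend(deflection_rate) > 0


csat = [4.5, 4.4, 4.4, 4.2, 4.1, 3.9]
deflection = [0.55, 0.60, 0.63, 0.68, 0.72, 0.75]
print(csat_deflection_divergence(csat, deflection))  # True
```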

§9 — When NOT to Buy

A diagnostic that never recommends not buying is a sales tool, not a decision framework. Below are the conditions under which the diagnostic's output should be discounted or ignored.

Table 9.1 — Conditions That Argue Against Buying Now

Condition	Why it matters	Recommended action
No identifiable routine workflow	The diagnostic assumes ≥45% routine share in at least one major function. If your operations are entirely judgment-led (e.g., bespoke consulting, custom manufacturing), the carried-pool math collapses.	Wait until at least one workflow standardises.
Pre-product-market-fit	The Y1 capture rate assumes operational stability. A business still iterating its core process will pay for AI-assisted execution of a process that disappears next quarter.	Wait until you can name the workflow that hasn't changed in 6 months.
Data privacy or jurisdiction blockers	Below the self-serve ceiling, data residency is limited to standard cloud regions. If customer data cannot be routed through standard cloud regions, the self-serve plan is not the right vehicle.	Talk to sales about a custom contract with regional residency.
Regulator says no	If your regulator requires pre-clearance for AI-mediated decisions in your function (some finance, some clinical), the diagnostic's "governed at X%" estimate is overruled by the regulator's actual posture.	Engage compliance before procurement. Custom contract with explicit policy artefacts.
Team has zero AI literacy	Self-serve assumes someone in the buying organisation can configure a workflow. If no one on the team can name what an agent is, the platform cost is fine but the implementation burden falls on services.	Budget for a 30-day services engagement; treat as a different decision.
Pilot would consume <500 invocations/month	Below this floor the math is real but the signal is too sparse to validate. The ROI band is statistically dominated by noise.	Pilot anyway — but treat the result as directional rather than measurable.
You are in the 95% failure pattern	If you cannot name (a) one function you will redesign around the agent, (b) the human you will free up, and (c) the metric you will move, you are in the 95%.	Wait until you can answer all three; then revisit.
"We'll wait until competitors prove it"	This is the FOMO-in-reverse argument, and it sounds prudent. It is not. The 5% who succeed already have an 18–24 month head start on operational learning, and that learning compounds. By the time the laggard cohort moves, the leaders have already redesigned the workflow and locked in the cost structure.	Run the diagnostic anyway. Even if you decide not to buy, the framework documents what your competitors are calibrating against.
"Let's build it ourselves on Anthropic/OpenAI first"	This is the third buying vehicle from §2 and is currently the most common pattern at enterprise scale. The API cost is genuinely lower; the build cost — audit substrate, agent registry, governance policy, observability, regression testing — is what produces the 95% failure rate, not the model.	Be specific about what you are building vs buying. If you are building the substrate alongside the agent, budget 6–12 months of platform-engineering time before measurable P&L impact. If you only want to build the agent, buy the substrate.

Bottom Line

The diagnostic is calibrated to the middle 80% of organisations that fit the profile. The 10% in the tails on either side — too small to generate signal, too irregular to standardise, too regulated to deploy without explicit pre-clearance — are not bad buyers; they are different buyers.
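Two rows of Table 9.1 are mechanical enough to encode as a pre-purchase gate: the three-question test for the 95% failure pattern and the 500-invocation signal floor. This is a minimal sketch. The field names, example values, and return strings are hypothetical, and the remaining rows still need human judgment.

```python
from dataclasses import dataclass
from typing import Optional

SIGNAL_FLOOR = 500  # invocations/month below which the ROI band is mostly noise


@dataclass
class PilotPlan:
    function_to_redesign: Optional[str]  # (a) function redesigned around the agent
    human_freed_up: Optional[str]        # (b) person whose time is freed
    metric_to_move: Optional[str]        # (c) metric the pilot must move
    expected_invocations_per_month: int


def gate(plan: PilotPlan) -> str:
    answers = (plan.function_to_redesign, plan.human_freed_up, plan.metric_to_move)
    if not all(answers):
        return "wait: cannot answer all three questions (95% failure pattern)"
    if plan.expected_invocations_per_month < SIGNAL_FLOOR:
        return "pilot anyway: treat the result as directional, not measurable"
    return "proceed to pilot"


plan = PilotPlan("AP invoice matching", "AP team lead", "invoice cycle time", 3_000)
print(gate(plan))  # proceed to pilot
```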

§10 — Reading the Diagnostic Like an Analyst

The diagnostic produces five numbers, one band, and a pilot recommendation. Below is what each means and how to use it.

Table 10.1 — Field-by-Field Interpretation

Field	What it answers	How to read it
Total opportunity	If every routine pool were fully captured, what would the annual impact be?	An upper bound. A directional ceiling on what's even worth discussing.
Year-1 net value	After capture rate (25%) and services cost, what lands on the P&L in the first 12 months?	The number the CFO will index against. Always lower than total opportunity.
Recommended invocations	What volume should we expect from realistic deployment across the functions above?	Drives the monthly cost. The signal of which vehicle (self-serve vs custom).
Monthly / annual platform cost	The published rate at the recommended volume.	The number that goes on the procurement form.
Year-1 ROI band	Net value ÷ Y1 platform cost, expressed as a range clamped to 5×–60×.	The number that survives the budget meeting. Read the band, not the midpoint.
Biggest single line	The function with the largest computed impact at your inputs.	The pilot workflow. The one you start with.

Sidebar

Field hygiene. If total opportunity exceeds 8% of revenue, treat it as an outlier and investigate: in the McKinsey distribution, 80%+ of organisations see zero EBIT impact, and the median for the 5% who succeed is in the 2–5% of EBIT range. If the Y1 ROI band tops out at 60× (the clamp), the unclamped upper bound is 60× or more, which means the point multiple is above roughly 54×. That result is meaningful at small scale and suspect everywhere else. If recommended invocations sit at or above the 150K ceiling, the routing has effectively decided for you that this is a custom-contract conversation.
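The three hygiene rules translate directly into checks a reviewer can run on any diagnostic output. This sketch uses hypothetical names. `roi_band` is the (lo, hi) pair, and money is in $M.

```python
def hygiene_flags(revenue_musd: float, total_opportunity_musd: float,
                  roi_band: tuple, recommended_invocations: int) -> list:
    """Return the field-hygiene warnings that apply to one diagnostic output."""
    flags = []
    if total_opportunity_musd > 0.08 * revenue_musd:
        flags.append("total opportunity above 8% of revenue: investigate as an outlier")
    if roi_band[1] == 60:
        flags.append("ROI band hits the 60x clamp: unclamped upper bound is 60x or more")
    if recommended_invocations >= 150_000:
        flags.append("volume at or above the 150K ceiling: custom-contract conversation")
    return flags


print(hygiene_flags(100, 4.0, (9, 13), 120_000))  # []
```

The §7.3 profile raises no flags. §7.4's 250,000 invocations raises only the routing flag.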

§11 — Compliance: The August 2026 Deadline

The EU AI Act's high-risk AI system enforcement begins August 2, 2026. For any organisation deploying AI in employment, credit, education, or law-enforcement contexts — directly or as a vendor — the compliance posture is no longer optional.

Table 11.0 — EU AI Act Enforcement Timeline

The deadline is not a single cliff. It is the largest of several enforcement gates that began rolling out in February 2025 and continue through 2027.

Date	What enforces	What it means for the buyer
Feb 2, 2025	Article 5 (prohibited AI practices); Article 4 (AI literacy obligations)	Already in force. Social-scoring, manipulative systems, untargeted biometric scraping are banned. Staff handling AI must be demonstrably AI-literate.
Aug 2, 2025	GPAI (general-purpose AI) provider obligations	Foundation-model providers (Anthropic, OpenAI, Mistral, etc.) face documentation, copyright, training-data disclosure obligations. Buyers building on these APIs inherit downstream documentation expectations.
Aug 2, 2026	High-risk AI system obligations — Articles 9, 10, 11, 12, 13, 14, 15	The big one. Risk management, training data governance, technical documentation, event logging, transparency, human oversight, accuracy/robustness — all required for high-risk systems in Annex III scope.
Aug 2, 2027	High-risk classification extended to safety components of regulated products (medical devices, machinery, toys, etc.)	Embedded AI in physical products comes under the same regime.

Source: EU AI Act (Regulation 2024/1689) Article 113 (Entry into force and application); European Commission AI Act enforcement schedule.
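Because the obligations phase in, the practical question for a given deployment is which gates are already live on a given date. This small sketch works over the dates in Table 11.0. The labels are abbreviated, and the function name is hypothetical.

```python
from datetime import date

# Application dates from Table 11.0 (Regulation 2024/1689, Article 113).
GATES = [
    (date(2025, 2, 2), "Article 5 prohibited practices; AI literacy"),
    (date(2025, 8, 2), "GPAI provider obligations"),
    (date(2026, 8, 2), "High-risk obligations (Articles 9-15, Annex III)"),
    (date(2027, 8, 2), "High-risk safety components of regulated products"),
]


def gates_in_force(on: date) -> list:
    """Labels of every enforcement gate whose application date has been reached."""
    return [label for start, label in GATES if on >= start]


for label in gates_in_force(date(2026, 8, 2)):
    print(label)
```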

Table 11.1 — Core EU AI Act Requirements for High-Risk Systems

Article	Requirement	Operational implication
Article 9	Risk management system	Documented, reviewed, updated risk processes; continuous post-market monitoring
Article 11	Technical documentation	Annex IV requirements: system architecture, training data, validation, monitoring plan
Article 12	Event logging	Automatic record-keeping of system operation — the audit substrate that makes per-decision evidence possible
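Annex III	High-risk classification scope	Includes employment, credit decisions, education access, and law enforcement contexts, among other listed areas such as biometrics, critical infrastructure, migration, and administration of justice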

Table 11.2 — Penalties

Violation	Maximum penalty
Non-compliance with Articles 9, 10, 11, 12 (high-risk obligations)	€15M or 3% of global annual turnover (whichever is higher)
Violation of Article 5 (prohibited AI practices)	€35M or 7% of global annual turnover (whichever is higher)
Provision of incorrect information	€7.5M or 1% of global annual turnover (whichever is higher)

Source: EU AI Act (Regulation 2024/1689); Tredence EU AI Act Compliance Guide 2026; Raconteur Technical Audit Guide.
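Each penalty tier is the higher of a fixed amount and a share of global annual turnover, so the binding term switches at a fixed turnover. That threshold is €500M for the high-risk and Article 5 tiers (€15M ÷ 3%, €35M ÷ 7%) and €750M for incorrect information. This is a minimal sketch with hypothetical tier keys. It does not model the Act's separate rule that SMEs and start-ups face the lower of the two amounts (Article 99(6)).

```python
# Fixed cap (EUR) and turnover share per Table 11.2 tier.
PENALTY_TIERS = {
    "prohibited_practice": (35_000_000, 0.07),
    "high_risk_obligation": (15_000_000, 0.03),
    "incorrect_information": (7_500_000, 0.01),
}


def max_penalty_eur(tier: str, global_turnover_eur: float) -> float:
    """Maximum fine: the higher of the fixed cap and the turnover-based cap."""
    fixed_cap, turnover_share = PENALTY_TIERS[tier]
    return max(fixed_cap, turnover_share * global_turnover_eur)


print(max_penalty_eur("high_risk_obligation", 2_000_000_000))  # 60000000.0
```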

Strategic Recommendations

Operational compliance readiness checklist for August 2, 2026:

Inventory every AI system — including "shadow AI" deployed by individual departments. EU AI Act compliance assumes centralised visibility, and most organisations underestimate their actual deployment footprint.
Risk-classify each system (Annex III scope: employment, credit, education, law enforcement contexts).
Article 11 technical documentation for high-risk systems — architecture, training data sources, validation methodology, monitoring plan.
Article 12 event logging in place at runtime — not a quarterly export, a per-decision log. This is what runtime-substrate platforms produce by default; bolt-on monitoring tools do not.
Article 9 risk management process with named owner, review cadence, post-market monitoring framework.
Conformity assessment and CE marking for high-risk systems.
EU database registration for high-risk AI systems.
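The Article 12 item calls for a per-decision log rather than a periodic export. This minimal sketch writes one append-only JSON-lines record using the audit-substrate fields listed in Appendix C: input, output, policy version, model version, human approvals, and downstream effect. The class, field, and function names are hypothetical. The schema illustrates the idea and is not a reading of what Article 12 legally requires.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """One agent decision, captured at runtime (hypothetical schema)."""
    agent_id: str
    input_summary: str
    output_summary: str
    policy_version: str
    model_version: str
    human_approvals: tuple = ()   # approver IDs; empty if fully autonomous
    downstream_effect: str = ""
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def to_log_line(record: DecisionRecord) -> str:
    """Serialise a record as one JSON line (tuples become JSON arrays)."""
    return json.dumps(asdict(record), sort_keys=True)


def append_record(log_path: str, record: DecisionRecord) -> None:
    """Append-only write: each decision adds a line and nothing is rewritten."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(to_log_line(record) + "\n")


rec = DecisionRecord("pricing-agent", "discount request 12%", "approved 10%",
                     "policy-v14", "model-2026-03", human_approvals=("ops-lead",))
print(to_log_line(rec))
```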

Bottom Line

The August 2026 deadline reframes the buy decision. For any organisation deploying AI in a high-risk context, the choice is no longer between an audit-capable platform and a non-audit one. It is between a platform that produces per-decision evidence at runtime and a multi-quarter retrofit programme to build that capability on top of a system that wasn't designed for it.

§12 — Recommended Next Steps by Role

Table 12.1 — One-Page Action by Role

Role	First action (this week)	Second action (this month)	Third action (this quarter)
CEO	Run the diagnostic with your CFO in the room. Five minutes.	If volume < 150K/mo: authorise the credit card. If volume ≥ 150K/mo: take the sales call.	Pick one workflow. Sponsor the pilot. McKinsey: workflow redesign is the #1 ROI determinant.
CFO	Validate the ROI band against the §5 reality check. Anchor the budget conversation in the band, not the midpoint.	Reclassify the line item: this is variable cost tied to operational volume, not a fixed software seat.	Set the 60-day review checkpoint with measured impact vs diagnostic.
COO	Identify the workflow that consumes the most senior-judgment hours today. That's the pilot candidate.	Run the pilot. Measure deflection rate and cycle time weekly. Redesign the workflow around the agent (not the other way around).	If pilot lands within ±20% of diagnostic, expand to function #2.
CIO	Confirm: self-serve is the same platform as enterprise contract. No migration penalty.	Approve security / data-handling posture once for self-serve. Re-approve at custom transition only if data residency or SLA changes materially.	Move the platform from "pilot infrastructure" to "production substrate" in the architecture map.
CHRO	Reframe internally: "reqs we will not file" in growth roles, "work we couldn't afford" in smaller orgs.	Audit function profiles affected by pilot. Where does senior judgment now concentrate?	Update career ladders to reward judgment-tier roles over volume-tier roles. Klarna 2025 is the warning.
CRO / Compliance	Inventory AI systems in scope of the EU AI Act (employment, credit, education, law enforcement contexts).	Confirm Article 9, 11, 12 readiness by August 2, 2026. Engage platform vendor on audit substrate posture.	Move from quarterly sampling to a continuous attestation cadence.
Procurement	Below 150K invocations/mo: there is no procurement process. Stand aside.	Above 150K: lead the negotiation — volume discount, custom SLA, dedicated ops engineer, residency.	Build the standard template for the next acquisition / function expansion.
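The CFO and COO rows share one expansion trigger: measured impact within ±20% of the diagnostic at the 60-day checkpoint. That trigger is a one-line relative-error test. This sketch measures the error relative to the diagnostic's estimate, which is our reading of the rule. The function name is hypothetical.

```python
def within_tolerance(measured: float, estimated: float, tolerance: float = 0.20) -> bool:
    """True if measured impact is within ±tolerance of the diagnostic's estimate."""
    if estimated <= 0:
        raise ValueError("estimate must be positive to compare against")
    return abs(measured - estimated) / estimated <= tolerance


# e.g. §7.2's $172.5K Y1 estimate against a hypothetical $150K annualised pilot result
print(within_tolerance(150_000, 172_500))  # True
```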

Appendix A — Methodology

The diagnostic's invocation count is derived from:

recommended_invocations_per_month
  = annual_revenue ($M)
  × invocation_density_for_industry  (Goods 500, SaaS 1,200, Services 1,000)
  × ambition_multiplier               (Conservative 0.6, Base 1.0, Aggressive 1.4)

The total opportunity is computed by summing per-function impacts:

impact_per_function
  = base_amount                       (rev / cogs / ga / rd × revenue)
  × interpolated_rate                 (between fn.lo and fn.hi based on ambition)
  × (routine_pct / 0.5)               (routine share — caller-overridable)
  × (automation_ceiling / 0.6)        (per error-tier: low 0.78, med 0.60, high 0.40)
  × governance_factor                 (per visibility tier — internal / customer / regulator)

total_opportunity = sum across functions of impact_per_function
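The per-function formula is a product of normalised factors. Routine share is scaled against a 50% baseline and the automation ceiling against the medium-error 0.60, so a function at both baselines contributes base_amount × rate × governance_factor. This sketch assumes Conservative, Base, and Aggressive sit at 0%, 50%, and 100% of the way from fn.lo to fn.hi; the appendix says only that the rate is interpolated based on ambition. The function profile below is entirely hypothetical.

```python
from dataclasses import dataclass

AUTOMATION_CEILING = {"low": 0.78, "med": 0.60, "high": 0.40}  # by cost-of-error tier
AMBITION_POSITION = {"Conservative": 0.0, "Base": 0.5, "Aggressive": 1.0}  # assumption


@dataclass
class FunctionProfile:
    name: str
    base_share: float         # share of revenue forming the base (rev / cogs / ga / rd)
    lo: float                 # impact rate at the conservative end
    hi: float                 # impact rate at the aggressive end
    routine_pct: float        # routine share of the function's pool
    error_tier: str           # "low", "med" or "high" cost of error
    governance_factor: float  # share of eligible work that actually moves to agents


def impact_per_function(revenue_musd: float, fn: FunctionProfile, ambition: str) -> float:
    base_amount = fn.base_share * revenue_musd
    rate = fn.lo + (fn.hi - fn.lo) * AMBITION_POSITION[ambition]
    return (base_amount * rate
            * (fn.routine_pct / 0.5)
            * (AUTOMATION_CEILING[fn.error_tier] / 0.6)
            * fn.governance_factor)


def total_opportunity(revenue_musd: float, functions: list, ambition: str) -> float:
    return sum(impact_per_function(revenue_musd, fn, ambition) for fn in functions)


support = FunctionProfile("Support", 0.10, 0.05, 0.15, 0.5, "med", 0.8)
print(round(impact_per_function(100, support, "Base"), 3))  # 0.8
```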

Year-1 net value is computed conservatively:

year_1_capture_rate = 0.25
services_cost ($M) =
  0.000  if revenue <   $10M
  0.015  if $10M ≤ revenue <  $50M
  0.050  if $50M ≤ revenue < $250M
  0.100  if revenue ≥ $250M and self-serve
  0.300  if custom contract

year_1_net_value = max(0, total_opportunity × year_1_capture_rate − services_cost)

Year-1 ROI band:

y1_roi_multiple = (year_1_net_value × 1,000,000) / annual_platform_cost
y1_roi_band     = { lo: min(60, max( 5, floor(y1_roi_multiple × 0.8))),
                    hi: max( 5, min(60, ceil (y1_roi_multiple × 1.1))) }

The 5× floor and 60× ceiling are not statistical bounds — they are the empirical range where public 2024–2026 enterprise AI deployments published their actual outcomes. The clamp prevents the diagnostic from emitting numbers that are mathematically possible but not credibly defensible against the public benchmark set.

Appendix B — Sources

Market sizing and adoption:

Gartner, "40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026" (Aug 2025).
Gartner, "Over 40% of Agentic AI Projects Will Be Cancelled by End of 2027" (June 2025).
Gartner, 2026 Hype Cycle for Agentic AI.
Gartner, Strategic Predictions for 2026.

Enterprise AI ROI and outcomes:

McKinsey, "The State of AI in 2025: Agents, Innovation, and Transformation."
McKinsey, "State of AI Trust in 2026: Shifting to the Agentic Era."
McKinsey, "The State of AI: How Organisations Are Rewiring to Capture Value."
MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025" (July 2025).
AI Monk, "12 Agentic AI Examples with Measurable ROI: Enterprise Case Studies 2025–2026."

Case studies:

Emerj, "Artificial Intelligence at JPMorgan Chase."
FintechWeekly, "Klarna Reverses Course on AI Customer Support, Resumes Human Hiring" (2025).
Internative, "Klarna's AI Reversal: A Postmortem in 3 Lessons" (2025).
Solutions Review, "Klarna's AI Layoffs Exposed the Missing Piece: Empathy" (2025).
Acefone, "Klarna's AI U-Turn: Key Learnings about Reliable AI Adoption in CS" (2025).

Pricing and market structure:

Bessemer Venture Partners, "The AI Pricing and Monetisation Playbook" (2026).
Valueships, "AI Pricing in 2026: SaaS Pricing Models That Actually Work."
MindStudio, "SaaS Pricing Is Breaking: Why Per-Seat Models Don't Survive the AI Agent Era."
Awesome Agents, "Agent Platform Pricing Compared 2026."
LangChain published pricing (2026).

Regulatory:

European Union, AI Act Regulation 2024/1689 (Annex III, Articles 9, 11, 12).
Tredence, "EU AI Act 2026 Compliance Guide for US Companies."
Raconteur, "EU AI Act Compliance: A Technical Audit Guide for the 2026 Deadline."
SecurePrivacy, "EU AI Act 2026: Key Compliance Requirements for Enterprises."

SMB-specific findings:

Reinventing.ai, "AI Agent Trends: ROI Pressure Pushes Enterprises to Orchestration, While SMB Adoption Accelerates" (March 2026).
Glivera, "Agentic AI for Small Business" (2026 guide).
Aalpha, "AI Agents for Small Businesses — In-Depth Guide" (2026).

Appendix C — Glossary

Invocation — One agent receiving an event, completing its task, and publishing its output. A single business workflow typically routes through 5–10 invocations end-to-end.
Invocation density — Industry-specific multiplier expressed as invocations per $M of annual revenue per month. Reflects multi-agent chains per business event in that industry.
Routine share — The portion of a function's pool that is rule-bound or pattern-bound enough for an agent to carry. Differs by function within an industry.
Automation ceiling — The maximum portion of routine work an agent can carry hands-off before human-in-the-loop is required. Depends on the cost of an error in that function (low / medium / high).
Governance factor — The portion of eligible routine work that actually moves to agents in a given workflow, given the controls that workflow needs and the controls the buyer can put in place.
Self-serve ceiling — 150,000 invocations / month. Below this, the published rate applies. Above this, a custom contract is required.
Year-1 capture rate — The portion of annual opportunity realistically captured in the first 12 months. Calibrated at 25% based on enterprise deployment ramp curves.
Year-1 ROI band — Year-1 net value divided by annual platform cost, expressed as a range clamped to 5×–60× (the range where real enterprise AI deployments actually land).
Services cost — One-time implementation engagement cost. Scales with company size: $0 (<$10M revenue) → $300K (custom contract).
Self-serve plan — The published subscription: $99/month including 5,000 invocations, plus $0.06 per additional invocation. Available below the 150K invocations/month ceiling.
Custom contract — Negotiated annual contract above the self-serve ceiling. Volume-discounted invocation pricing, dedicated ops engineer, custom SLA, data residency.
The 95% — MIT Project NANDA's finding that 95% of enterprise GenAI pilots show zero measurable P&L impact, primarily due to integration and workflow-redesign failures (84%) rather than technology failures.
The 5% — The minority of organisations achieving rapid, measurable impact. Characterised by buying from specialised vendors (67% success rate vs 33% for in-house build), redesigning workflows around the agent (55% vs 20% for non-top-performers), and starting with one painful, well-defined function rather than horizontal deployment.
Human-in-the-loop (HITL) — A workflow design where an agent's output is reviewed, approved, or modified by a human before it takes effect downstream. The complement of an "autonomous" workflow. Required wherever the cost of an agent error exceeds the cost of human review time.
Guardrail — A runtime constraint applied to an agent's output (or input) that blocks, rewrites, or escalates the action when the output violates a policy. Examples: PII redaction, output-class restriction, regulatory disclosure language, refusal to make policy interpretation statements. Guardrails are the operational implementation of Article 14 (human oversight) and Article 15 (accuracy/robustness) of the EU AI Act.
Policy artefact — A machine-readable document defining what an agent may and may not do in a specific workflow context, who can override which decisions, what audit information is captured per decision, and what fallback path applies when a decision exceeds the agent's authority. Policy artefacts are the substrate that makes Article 11 technical documentation and Article 12 event logging producible at scale.
Audit substrate — The persistent, per-decision record of every agent action: input, output, policy version, model version, intervening human approvals, downstream effect. The audit substrate is the layer that distinguishes a governed agent platform from a bolt-on monitoring tool — it is captured at runtime, not retrofitted from logs.
Agent-density category — A software category in which the unit of work is an event (a claim, a ticket, a deal, a query) rather than a user. Per-seat pricing inverts in agent-density categories because one agent does the work of many seats; usage- or invocation-based pricing dominates.

