BLOG POST

AI Oversight Isn’t a Yes-or-No Question 


Why the carriers deploying AI most effectively aren't asking "should a human approve this?" but rather how much oversight, at what point, and for what purpose.

The most common mistake insurance leaders make when deploying AI isn’t moving too fast or too slow. It’s treating oversight as a binary decision. 

Human review or no human review. Approve or automate. In or out of the loop. That framing turns governance into a philosophical debate rather than an engineering problem, and it’s why so many AI governance frameworks end up either so restrictive that they kill the business case, or so permissive that they create regulatory exposure. 

Human oversight exists on a spectrum. The strategic question is where on that spectrum each use case belongs. 

Introducing the Oversight Spectrum 

Datos Insights’ new brief, Human-AI Supervision Models for Insurers, identifies three anchor points on that spectrum. 

Human-in-the-loop (HITL) means a human actively reviews and approves each AI output before action is taken. An underwriter reviewing an AI-generated risk score before issuing a quote is HITL: high control, high latency, resource-intensive, and best suited to novel, complex, or high-stakes decisions. 

Human-on-the-loop (HOTL) means the AI operates autonomously within defined parameters while a human monitors aggregate performance and intervenes when thresholds are breached. An AIOps platform that auto-remediates routine infrastructure alerts while an SRE watches dashboards is HOTL: lower latency, scalable, but dependent on well-defined guardrails and monitoring infrastructure. 

Human-out-of-the-loop (HOOTL) means the AI operates fully autonomously within predefined boundaries. A chatbot handling routine policy inquiries end-to-end is HOOTL: highest throughput, narrowest scope, and demanding of robust testing before deployment. 

None of these is inherently right or wrong. Every carrier’s AI portfolio will, and should, include all three. 
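The three anchor points translate naturally into a routing decision in a deployment pipeline. Here is a minimal sketch, in Python, of what that routing might look like; the function and parameter names are illustrative, not part of any carrier's actual stack.

```python
from enum import Enum, auto


class OversightMode(Enum):
    HITL = auto()   # human approves each output before action
    HOTL = auto()   # AI acts; human monitors aggregates, intervenes on breach
    HOOTL = auto()  # AI acts fully autonomously within preset boundaries


def dispatch(output, mode, review_queue, act, monitor=None):
    """Route one AI output according to its position on the spectrum."""
    if mode is OversightMode.HITL:
        review_queue.append(output)   # park it until a human approves
        return "queued_for_review"
    if mode is OversightMode.HOTL:
        act(output)                   # act immediately...
        if monitor is not None:
            monitor(output)           # ...but feed the monitoring layer
        return "executed_monitored"
    act(output)                       # HOOTL: no human touchpoint at all
    return "executed_autonomous"
```

The point of the sketch is that the difference between the three modes is a few lines of routing logic; the hard part is deciding which mode each use case gets, which is what the rest of this post is about.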

Agentic AI Raises the Stakes 

The calibration challenge gets harder as AI systems get more capable. Agentic AI pursues goals independently, executes multistep workflows, coordinates with other agents, and adapts based on outcomes. It does not simply generate recommendations and wait. 

Consider what a mature agentic claims system might do: ingest a loss notice, validate coverage, assess damage from photos, detect fraud indicators, calculate reserves, and initiate payment, with no human touchpoint in the chain. The question is no longer whether to deploy something like this. It is what boundaries contain its autonomy. 

Most governance conversations focus on whether the AI is trustworthy enough. Agentic governance requires a different set of questions: what is this agent permitted to do, which systems can it access, and what circumstances trigger a human escalation path? 

Seven Dimensions, Not One 

Appropriate oversight doesn’t depend on a single factor. Our research identifies seven dimensions that together determine where a use case belongs on the spectrum. 

Regulatory exposure is the most obvious consideration. Underwriting and pricing decisions carry far more scrutiny than internal IT tooling. But decision reversibility is equally important and frequently overlooked: a coverage denial is hard to undo; a chatbot response is not. Error cost asymmetry determines how bad a mistake is and who absorbs it. Explainability requirements vary sharply by audience. A regulator, an underwriter, and a customer receiving an adverse decision each need something categorically different. 

Customer impact, volume and velocity, and organizational AI maturity round out the framework. A carrier early in its AI journey should operate more conservatively at every position on the spectrum than one with established monitoring infrastructure and a mature model governance program.

No single dimension drives the decision. The right oversight level emerges from the intersection of all seven. 

The Regulatory Dimension Adds Urgency 

Over 24 states have now adopted the NAIC Model Bulletin, which establishes baseline expectations for AI governance in insurance. California has proposed mandatory impact assessments for AI systems affecting coverage decisions. New York’s DFS continues to expand its cybersecurity and AI resilience requirements. 

Carriers operating across multiple jurisdictions face a compounding challenge: the same use case may require different oversight patterns depending on where it is deployed. Building governance frameworks flexible enough to handle jurisdictional variation, while keeping core AI infrastructure coherent, is increasingly a strategic priority rather than a compliance exercise. 

Where to Start 

For insurance CIOs beginning to formalize their oversight approach, the most productive first step is not choosing a governance framework. It is building a use-case inventory. 

Catalog your current and planned AI deployments. Map each one against the seven dimensions. Identify where actual oversight levels match appropriate oversight levels, and where they diverge. That gap analysis will surface both the underprotected use cases, where automation has outrun governance, and the overprotected ones, where unnecessary HITL requirements are killing the business case for automation entirely. 
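To make the gap analysis concrete, here is a toy version in Python. It assumes each use case is scored 1 (low) to 5 (high) on the seven dimensions; the dimension names, averaging rule, and thresholds are illustrative stand-ins, not the Datos Insights assessment tool itself.

```python
# Seven illustrative dimension keys; "ai_maturity_gap" is inverted so that
# a higher score always means more risk (less maturity, more oversight).
DIMENSIONS = ("regulatory_exposure", "reversibility", "error_cost",
              "explainability", "customer_impact", "volume_velocity",
              "ai_maturity_gap")


def suggested_level(scores):
    """Map a dict of dimension scores to a spectrum position."""
    risk = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    if risk >= 3.5:
        return "HITL"    # high-risk profile: review every output
    if risk >= 2.0:
        return "HOTL"    # moderate risk: monitor and intervene
    return "HOOTL"       # low risk: full autonomy within boundaries


def gap_report(inventory):
    """List use cases whose actual oversight diverges from the suggestion."""
    return [(name, actual, suggested_level(scores))
            for name, (actual, scores) in inventory.items()
            if actual != suggested_level(scores)]
```

Run against a real inventory, a report like this surfaces both directions of divergence: automated use cases that score as HITL (underprotected) and HITL use cases that score as HOOTL (overprotected).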

Treating oversight as a calibration problem rather than an all-or-nothing choice does not make governance easier. It does make it more honest and more likely to hold up under operational and regulatory scrutiny. 

For a deeper look at the oversight calibration framework, seven-dimension assessment tool, and regulatory landscape analysis, read our new brief, Human-AI Supervision Models for Insurers. Contact Datos Insights to discuss how leading carriers are building governance frameworks that enable speed without sacrificing accountability.