Six Months Into the Year: What’s Actually Working in AI (and What’s Not)

Travis Rehl’s midyear read: the winners aren’t chasing smarter models. They’re solving for cost, correctness, and trust at production scale.

Six months into the year, here’s what I think most people aren’t saying out loud: AI is more expensive than anyone planned for, and the model economics most companies are operating under don’t hold up in production.

Frontier models are powerful, but they’re not always the best business choice. Open-source models and small language models are increasingly good enough for a growing set of production workloads, especially when the task is narrow, the context is controlled, and the quality bar is clear. The hard part is knowing which workloads can move down the cost curve without losing reliability.

That’s where the real work is happening now: not chasing the smartest model, but building a model strategy by use case. Some workloads still need frontier-class reasoning. Others don’t. Companies that figure out the difference will scale AI with better economics. Companies that don’t are going to get caught off guard when their AI bill at scale doesn’t match the pilot math.

That’s the conversation that needs to happen. But it’s not the only one. Here’s what’s actually working, what isn’t, and what I’d focus on next.

What’s actually working in AI right now?

What’s working isn’t “AI everywhere.” It’s AI applied to repeatable, open-ended tasks where the inputs are messy but the outcome is clear.

Document processing is where I’d point first. Not because documents are trendy. Because the old approach is breaking down and agents are picking up the slack. We work with organizations that process high volumes of invoices, claims, forms, and contracts where the document formats shift constantly. OCR models can’t be tuned fast enough to keep up. You retrain for one layout and three more show up. So instead, we deploy agents that review documents, extract the right data, do the data entry, and handle the interpretation layer that OCR was never built for. These are repeatable tasks, but they’re open-ended enough that rigid automation fails. Agents succeed because they adapt to the document in front of them while still operating within a defined scope.

And agents are real. But the ones that work are built like employees with a job description, not interns with admin access. We’ve built agents on Amazon Bedrock AgentCore using Strands Agents that pay invoices, submit insurance claims, and delegate work to humans when they hit the edge of their authority. They have specific goals, defined requirements, and a curated set of tools. An agent that can do anything usually does the wrong thing. An agent that can do three things well, and knows when to hand off, actually ships.
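A minimal sketch of the "job description" idea, in plain Python. It does not use the Strands Agents or Amazon Bedrock AgentCore APIs. The `BoundedAgent` class, the `max_amount` authority limit, and the `pay_invoice` tool are all hypothetical names chosen for illustration. The point is only the shape: a curated tool set, a limit, and a hand-off path for everything else.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolResult:
    status: str  # "done" or "handed_off"
    detail: str


@dataclass
class BoundedAgent:
    """Hypothetical agent with a fixed tool set and an authority limit."""
    name: str
    tools: dict[str, Callable[[dict], str]]
    max_amount: float  # assumed authority limit, in dollars
    handoffs: list[dict] = field(default_factory=list)

    def act(self, tool_name: str, request: dict) -> ToolResult:
        # Anything outside the curated tool set goes to a human.
        if tool_name not in self.tools:
            return self._hand_off(request, f"'{tool_name}' is outside this agent's scope")
        # Anything above the authority limit also goes to a human.
        if request.get("amount", 0.0) > self.max_amount:
            return self._hand_off(request, f"amount exceeds limit of {self.max_amount:.2f}")
        return ToolResult("done", self.tools[tool_name](request))

    def _hand_off(self, request: dict, reason: str) -> ToolResult:
        # Record the request so a person can pick it up later.
        self.handoffs.append({"request": request, "reason": reason})
        return ToolResult("handed_off", reason)


def pay_invoice(request: dict) -> str:
    # Stand-in for a real payment integration.
    return f"paid invoice {request['invoice_id']}"


agent = BoundedAgent(name="ap-clerk", tools={"pay_invoice": pay_invoice}, max_amount=5000.0)
small = agent.act("pay_invoice", {"invoice_id": "INV-1", "amount": 1200.0})
large = agent.act("pay_invoice", {"invoice_id": "INV-2", "amount": 25000.0})
other = agent.act("delete_vendor", {"vendor_id": "V-9"})
```

Here the small invoice is paid, while the large invoice and the out-of-scope request both land in `agent.handoffs` for a person to review.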

What’s not working (and why it’s failing quietly)

The biggest failures I see midyear aren’t dramatic. They’re silent. And expensive.

The first is cost, and we’ve lived it directly. We’ve migrated customers from OpenAI to Anthropic on Amazon Bedrock to get better cost control within the AWS ecosystem. We’ve migrated customers from Anthropic Claude to open-source models like GLM 5.1 and Kimi K2.5 to reduce costs further when the use case allowed it. And we migrated DarcyIQ itself off frontier models onto custom open-source models we host internally, which reduced our AI costs by 70%. That’s not a projection. That’s a production system running right now at a fraction of the cost, because we did the work to figure out which workloads actually needed a frontier-class model and which ones didn’t. Most teams haven’t done that work yet, and the bill is coming.

The second failure mode is quieter: the demo gets applause, the pilot technically “works,” then adoption fades. Leaders assume the team lost interest. The real issue is that the organization never defined what a correct outcome looks like, or who is accountable for confirming it. AI introduces a new layer of operational work: reviewing outputs, validating decisions, correcting errors, and deciding when a result is ready to use. If that responsibility isn’t designed into the workflow, it defaults to the busiest people on the team. And when that happens, they’ll quietly route around the tool rather than take on another invisible burden.

This is also why broad “enablement” programs underperform. Teaching everyone how to write prompts is not an AI strategy. It creates scattered experiments, inconsistent quality, and a false sense of progress. The organization becomes proficient at trying things, but it doesn’t compound into a capability that leadership can rely on.

What is the midyear lesson?

If I had to summarize the difference between teams winning and teams stalling, it’s this: the winners don’t treat trust as a feeling. They build systems that confirm the work was done correctly at the point of action, not after the fact.

The common approach to AI trust is guardrails (external systems that check output after it’s generated) or human-in-the-loop endpoints where a person approves every action. Those work, but they don’t scale. Every guardrail is a negotiation. Every human approval is a bottleneck. Neither one makes the agent more trustworthy; they just catch it when it isn’t.

The path to more autonomous and trustworthy agents is different: build tools that deny bad behaviors at the point of interaction. When an agent calls a tool to submit an invoice, to file a claim, or to update a record, the tool itself evaluates the request and pushes back if the quality isn’t good enough.

We’ve seen this work at two levels. The basic level: an agent tries to submit an invoice missing required fields, and the tool rejects the submission and tells the agent what’s missing. The agent has to resolve it before proceeding. The more sophisticated level: an agent consumes content through a tool, and the tool runs a semantic comparison using embeddings to verify the content meets a quality threshold or aligns with expected output. If it falls below target, the tool denies the interaction. The tool becomes the quality gate: not a human reviewer, not an after-the-fact filter, but the interaction point itself.
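Both levels can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production implementation. The field names in `REQUIRED_FIELDS`, the `0.6` threshold, and the function names are hypothetical. `toy_embed` is a bag-of-words stand-in so the example runs without any external service; a real system would call an embedding model there instead.

```python
import math
from collections import Counter

# Hypothetical required fields for an invoice submission.
REQUIRED_FIELDS = ("vendor", "invoice_number", "amount", "due_date")


def submit_invoice(invoice: dict) -> dict:
    """Basic level: reject the call and tell the agent what is missing."""
    missing = [name for name in REQUIRED_FIELDS if not invoice.get(name)]
    if missing:
        return {"accepted": False, "reason": "missing required fields", "missing": missing}
    return {"accepted": True, "reason": "submitted", "missing": []}


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def accept_content(content: str, expected: str, threshold: float = 0.6) -> dict:
    """Semantic level: deny the interaction if similarity falls below the threshold."""
    score = cosine(toy_embed(content), toy_embed(expected))
    if score < threshold:
        return {"accepted": False, "score": score, "reason": "below quality threshold"}
    return {"accepted": True, "score": score, "reason": "meets quality threshold"}


incomplete = submit_invoice({"vendor": "Acme", "amount": 120.0})
complete = submit_invoice(
    {"vendor": "Acme", "invoice_number": "INV-7", "amount": 120.0, "due_date": "2025-09-30"}
)
expected = "water damage claim for kitchen floor"
on_topic = accept_content("claim for water damage to kitchen floor", expected)
off_topic = accept_content("quarterly marketing budget review", expected)
```

In both cases the tool returns a structured rejection rather than raising an error, so the agent gets something it can act on: the list of missing fields, or a score that tells it the content missed the target.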

This is what gives agents real autonomy without creating real risk. The agent can operate within its scope because the tools it interacts with enforce the boundaries. You don’t need to approve every action. You need to build tools that won’t accept bad ones.

What can leaders do in the second half of the year?

If the first half was about experimenting with models, the second half should be about two things: getting honest about cost, and building the architecture that makes correct outcomes repeatable and trust earned.

First, do the model math. Map every AI workload to the model that actually fits it. Not every use case needs a frontier model, and the ones that do need to justify the cost with production-grade outcomes. We’ve seen 70% cost reductions by being deliberate about this. The savings are real, but they require the discipline to evaluate each workload on its own terms, not default to the most capable model for everything.
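One way to picture that exercise is a small routing table: for each workload, pick the cheapest model that still clears the workload’s quality bar, then compare the total against running everything on the most capable model. The sketch below is hypothetical throughout. The model names, prices, token volumes, and evaluation scores are invented for illustration, and the resulting savings figure is arithmetic on those invented numbers, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    usd_per_million_tokens: float  # assumed blended price
    scores: dict[str, float]       # assumed eval score per workload, 0..1


@dataclass
class Workload:
    name: str
    monthly_tokens_millions: float
    min_quality: float  # the quality bar this workload must clear


def cheapest_fit(workload: Workload, models: list[Model]) -> Model:
    """Return the lowest-priced model whose score meets the workload's bar."""
    candidates = [m for m in models if m.scores.get(workload.name, 0.0) >= workload.min_quality]
    if not candidates:
        raise ValueError(f"no model meets the quality bar for {workload.name}")
    return min(candidates, key=lambda m: m.usd_per_million_tokens)


def monthly_cost(workload: Workload, model: Model) -> float:
    return workload.monthly_tokens_millions * model.usd_per_million_tokens


# Invented numbers, for illustration only.
MODELS = [
    Model("frontier", 15.0, {"invoice_extraction": 0.97, "contract_review": 0.95}),
    Model("open-weights-mid", 3.0, {"invoice_extraction": 0.96, "contract_review": 0.85}),
    Model("small", 0.5, {"invoice_extraction": 0.90, "contract_review": 0.70}),
]
WORKLOADS = [
    Workload("invoice_extraction", 300.0, 0.95),
    Workload("contract_review", 100.0, 0.93),
]

frontier = MODELS[0]
plan = {w.name: cheapest_fit(w, MODELS).name for w in WORKLOADS}
baseline = sum(monthly_cost(w, frontier) for w in WORKLOADS)
routed = sum(monthly_cost(w, cheapest_fit(w, MODELS)) for w in WORKLOADS)
savings = 1 - routed / baseline
```

With these made-up inputs, invoice extraction moves to the mid-tier model and contract review stays on the frontier model, because only the frontier model clears its bar. The scores are the part that takes real work: they have to come from evaluating each model on each workload’s own test set.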

Second, design your tools to be the trust layer. Don’t rely solely on guardrails and human approval loops. Build the quality enforcement into the tools your agents interact with. Make the tool the thing that says no when the output isn’t good enough. That’s how you get agents that can operate with real autonomy without creating real risk.

The second half of the year is where AI stops being a set of experiments and becomes part of how the business earns trust. The teams that get there first won’t be the ones with the smartest models. They’ll be the ones who solved for cost, built trust into the architecture, and owned the outcome from end to end.

Make the second half count

If you want AI to move from promising to dependable by year-end, Innovative Solutions can help you get the model economics right and build the quality controls that make AI outcomes repeatable and trusted. Let’s make the back half of the year the part that ships.

FAQ

What’s actually working in AI for businesses right now?

AI is working best in repeatable, open-ended tasks, especially document processing where formats shift constantly and OCR can’t keep up. Agents that review, interpret, and enter data are outperforming rigid automation because they adapt to the input while staying within a defined scope. Bounded agents with clear job descriptions, like those built on Amazon Bedrock AgentCore, are delivering real production value for invoice processing, claims submission, and handing work off to humans when it exceeds their authority.

Why do AI pilots succeed in demos but fail in production?

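Two reasons. First, the cost economics don’t hold. Frontier models are often more expensive than teams budget for, while open-source and smaller models can reduce cost when they’re matched to the right workload. Companies hit production scale and discover the math doesn’t work when they default to one model strategy for everything. Second, nobody defined what a correct outcome looks like or who is accountable for confirming it. The work of reviewing and correcting outputs defaults to the busiest people on the team, and they quietly route around the tool.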

What should leaders prioritize in the second half of the year to get real AI impact?

Two things: get honest about model costs by mapping each workload to the right model, not the most expensive one, and build trust into the architecture by making tools that enforce correct outcomes at the point of execution. That’s what turns AI from experimentation into a capability the business can stand behind.

