Why metrics matter more than models

The AI approach that creates systems that work perfectly and fail completely

In this issue:

  • CRAFT: The core problem of picking models before defining success

  • CODE: What to do when complexity scales faster than your engineering team

  • NEWS: Infrastructure goes sovereign, coding barriers disappear, and healthcare gets its humanity back

CRAFT NOTES
What we’re thinking about

Most teams approach AI from the wrong starting point.

They pick a model, build features, and then figure out how to measure success. But effective, production-grade AI should start with one fundamental question: what does "working" actually mean for your users?

Here's what we're seeing in the field: teams are spinning their wheels troubleshooting AI systems that are simultaneously "working" and completely failing. OpenAI's recent acknowledgment that even its most advanced models still hallucinate helps explain why so many teams struggle.

"Intelligent" systems start making decisions that technically work but miss the mark entirely. They complete tasks efficiently but choose the most expensive path every time. They give accurate answers to the wrong questions.

For an AI agent handling customer support, success might be resolution efficiency, not perfect grammar. For a code review assistant, it might be catching critical bugs, not finding every style inconsistency. Define these metrics early, and everything else becomes clearer.

The uncomfortable truth is that better AI performance often means exponentially higher costs. Your evaluation framework must help you understand this performance-cost relationship before you're deep in production optimization hell.
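
To make that concrete, here's a minimal sketch of what this can look like in practice: a handful of labeled cases scored against the metric you actually care about (resolution, not grammar) alongside what each answer cost. The names, fields, and numbers (EvalCase, resolution_rate, cost_per_resolution) are illustrative assumptions, not a prescribed framework.

    from dataclasses import dataclass

    @dataclass
    class EvalCase:
        """One labeled support conversation (hypothetical schema)."""
        issue: str
        resolved: bool      # ground-truth label, e.g. from human review
        cost_usd: float     # whatever your provider actually billed for this answer

    def evaluate(cases: list[EvalCase]) -> dict:
        """Roll per-case labels and costs up into the numbers that define 'working'."""
        resolved = [c for c in cases if c.resolved]
        total_cost = sum(c.cost_usd for c in cases)
        return {
            "resolution_rate": len(resolved) / len(cases),
            "cost_per_resolution": total_cost / max(len(resolved), 1),
        }

    # Compare two model configurations on the same cases before you're in production
    baseline = evaluate([EvalCase("reset password", True, 0.002),
                         EvalCase("refund request", False, 0.002)])
    print(baseline)  # e.g. {'resolution_rate': 0.5, 'cost_per_resolution': 0.004}

Run the same harness against a cheaper and a more capable configuration, and the performance-cost trade-off stops being a surprise you discover in production.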

The teams that treat evaluation as a core product capability, not a development checkpoint, are the ones building AI systems that actually drive value.

CODE LAB
What we’re discovering

"In the DevOps world, it is a best practice to use infrastructure as code to create and manage the resources in the cloud. What if we have multiple environments and multiple cloud providers?

To deal with this challenge, we use AI to generate part of the code to accelerate the development process, and we have patterns, templates, and modules that we can use for a variety of situations.

It allows us to deliver value in a shorter period of time by using AI and a good architecture to generate the relevant code we need."

 Edgar Peixoto, DevOps Engineer

Why this matters for your team: Your infrastructure complexity is scaling faster than your team can manage it, and that's where most AI implementations fall apart. The real win isn't faster code generation—it's reducing the cognitive load on your engineers so they can focus on solving business problems instead of wrestling with environment-specific configurations.

When your senior engineers spend less time translating requirements across different cloud providers, they have more bandwidth for the strategic work that actually moves your company forward. This is especially critical for mid-market teams where every engineering hour counts and you can't afford to have your best people stuck in necessary but “invisible” work.
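
Here's a minimal sketch of the pattern Edgar describes, with made-up module names and settings: the environment differences live in data, a shared template carries the boilerplate, and the generated blocks get reviewed like any other infrastructure change. Whether the template is filled in by a script or drafted with an AI assistant, the point is the same: engineers review output instead of retyping provider-specific plumbing.

    from string import Template

    # Hypothetical Terraform-style module block; names and variables are illustrative
    MODULE_TEMPLATE = Template("""\
    module "app_${env}_${provider}" {
      source        = "./modules/${provider}/web-app"   # hypothetical shared module
      environment   = "${env}"
      instance_size = "${size}"
      replicas      = ${replicas}
    }
    """)

    # The part that actually differs across environments and clouds, kept as plain data
    ENVIRONMENTS = {
        ("dev",  "aws"):   {"size": "small", "replicas": 1},
        ("prod", "aws"):   {"size": "large", "replicas": 3},
        ("prod", "azure"): {"size": "large", "replicas": 3},
    }

    def render_all() -> str:
        """Render one module block per (environment, provider) pair."""
        blocks = [MODULE_TEMPLATE.substitute(env=env, provider=provider, **cfg)
                  for (env, provider), cfg in ENVIRONMENTS.items()]
        return "\n".join(blocks)

    print(render_all())  # commit the output, then review it like any other IaC change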

AI NEWS
What we’re paying attention to

Infrastructure goes sovereign

Nvidia and OpenAI are backing a major UK infrastructure investment potentially worth billions, focused on data center development as part of the growing "sovereign AI" movement. Countries worldwide are courting major US AI players to boost their own national infrastructure and reduce dependency on foreign technologies. For IT leaders, this signals a fundamental shift in AI strategy: where your compute resources are located will increasingly matter for compliance, performance, and geopolitical stability. The trend toward data sovereignty means your AI infrastructure decisions today will determine your operational flexibility tomorrow, especially as governments prioritize onshoring critical AI capabilities.

The uncomfortable vibe-coding experiment that may send all the wrong messages

A writer with zero coding experience shipped features to 100 million Notion users in 30 minutes using AI-assisted coding. The uncomfortable reality: 30-40% of code at tech companies is now AI-generated, and teams are hiring only engineers "bullish on coding tools." For IT leaders, this signals a fundamental shift in technical hiring requirements and team composition. The question isn't whether AI democratizes coding—it's whether your organization can maintain quality standards when technical barriers disappear and the need for architectural judgment becomes paramount.

The healthiest AI use case ever?

AI “scribe” technology eliminates physician note-taking during patient visits, freeing doctors to go home on time instead of charting at night. Physician AI usage jumped from 38% to 66% in one year, with doctors reporting more face-to-face patient time and "wholesome conversations." For IT leaders, this demonstrates how AI wins when it removes administrative burden rather than replacing core expertise. The key outcome: physicians get their evenings back while patients get better attention during visits.

SKIP THE ENDLESS EXPLORATION, START BUILDING

Done with endless AI meetings? Ready for practical implementation?

Stay Connected

Stay connected with us on LinkedIn for more updates and exclusive content. If you know someone who may want to subscribe, please share this newsletter.

We’d love to hear your feedback—reply to this email with any thoughts or suggestions.

Let’s build together.