When one team's AI wins become everyone's problem

Why AI execution isn't the same as implementation (and why it matters)

In this issue:

  • CRAFT: Why your "start small" AI strategy might be your biggest mistake

  • CODE: How to structure AI evaluation tools that adapt to tomorrow's models

  • NEWS: OpenAI's "PhD-level" GPT-5 can't label Oregon on a map, AI's "Godfather" says we need maternal instincts in machines, and hackers turn Google Calendar invites into smart home weapons

CRAFT NOTES
What we’re thinking about

What if hyperfocusing on AI-powered execution is working against you?

Let’s use an analogy to illustrate the problem: If you owned a factory and upgraded a motor to work 3x faster, that would naturally impact the rest of the manufacturing process. Your conveyor belts would have to run faster, your QA process would have to adjust for increased capacity, and your shipping workflows would have to increase velocity. Plus, the people involved in all of this would have to adjust to the new pace to sync interconnected operations.

If one area speeds up, the whole system has to sync up. 

That's the same trap you can fall into with AI initiatives. One team typically acts as the pace car. But if the other teams aren’t working at the same pace, your AI execution “wins” may be superficial. 

And that’s the real issue.

Most AI initiatives focus on execution. They solve an immediate problem, deliver quick wins, and generate enthusiasm. But they don't rewire how your organization thinks, works, or adapts to change. 

There's a crucial difference between execution and implementation. Execution gets you a working prototype, a successful demo, or a team that can use ChatGPT more effectively. Implementation gets you organizational change that compounds over time, even after the initial project is done.

True AI implementation embeds new capabilities into your organizational DNA. The goal is to create a vehicle for lasting change, not just immediate value.

Because the real competitive advantage isn't having AI tools—it's having teams that naturally evolve with AI as the landscape continues to change.

CODE LAB
What we’re discovering

"While building an evaluation tool for large language models, I have learned the value of selecting datasets that are useful for multiple testing contexts, not just for the one we initially had in mind. By choosing data that works for both RAGAS-based and non-RAG models, we’re aiming for something more flexible, which serves for general-purpose evaluation. Thinking beyond early means the tools and insights we develop can adapt to new use cases."

Elizabeth Huamán, AI R&D Intern

Why this matters for your team: AI technologies move quickly, and evaluation tools need to keep up. If datasets are too narrowly tailored to a single framework or model type, it limits the ability to measure performance across different architectures. Selecting and shaping datasets from the start ensures the evaluation process remains relevant and scalable, saving time and boosting reliability.

AI NEWS

What we’re paying attention to

Sam Altman hyped GPT-5 as "the smartest model ever," but users quickly discovered basic errors like maps labeling Oregon as "Onegon" and Oklahoma as "Gelahbrin" — getting nearly every state name wrong except Montana and Kansas. Within 24 hours, OpenAI was forced to bring back GPT-4o after widespread user complaints that GPT-5 was "slower, duller, and buggy." Altman blamed the issues on a broken "autoswitcher" that made the model seem "way dumber" than intended.

Geoffrey Hinton warned there's a 10-20% chance AI wipes out humanity and proposed building "maternal instincts" into AI models so "they really care about people." His reasoning: trying to keep humans "dominant" over AI won't work because machines will be smarter and find ways around controls. His model: "a mother being controlled by her baby" — the only example of intelligent beings caring for less intelligent ones.

Security researchers hijacked Google's Gemini through poisoned calendar invites, remotely controlling lights, shutters, and a boiler in a Tel Aviv apartment. The attack used "indirect prompt injection" — hidden instructions in calendar titles that activated when users asked Gemini to summarize their schedule. Simple responses like "thanks" triggered malicious commands, showing how AI integration creates new attack vectors.

SKIP THE ENDLESS EXPLORATION, START BUILDING

Done with endless AI meetings? Ready for practical implementation?

Stay Connected

Stay connected with us on LinkedIn for more updates and exclusive content. If you know someone who may want to subscribe, please share this newsletter.

We’d love to hear your feedback—reply to this email with any thoughts or suggestions.

Let’s build together.