2025 Wrap
This will be the last post before this newsletter goes on a holiday break. Back in early January! Given the time of year, I want to reflect on the writing.
I started almost a year ago and with some interruptions (pneumonia, sick kid(s) at home, vacation) posted a short article every business day
That’s a total of 177 articles (not counting this one)
It’s been a lot of fun to come up with topics and get feedback from you, the dear readers
Topics have admittedly been all over the place: Some AI, some project management (agile, etc.), some general principles.
What have I experienced and learned?
If you make a commitment to daily writing, you subconsciously pick up more ideas! You read differently, you listen differently, you notice differently.
Daily really is easier than weekly or, god forbid, “whenever I can think of something” (i.e. never)
I find myself able to speak more clearly about topics I have written about. It feels less off-the-cuff and more internalized.
Stationery company Leuchtturm1917 has a slogan: “Thinking with the hand.” That’s what writing is, and it really works wonders. Working through an issue means writing about it, because then your jumbled thoughts get sorted out logically on paper.
What’s next?
I hope to keep this newsletter useful and make it even more useful by being sharper about the topics and themes I touch on, adding a specific lens. That means:
Fewer super-nerdy posts. (Maybe I need to find a separate outlet where I can rant about Rust versus C++, Tailwind versus Cube, or Modal versus AWS Lambda)
More business-y posts. What does all the AI stuff mean for people who have a business to run, or a department in a larger business?
Incidentally, if you feel that some posts were more relevant to you than others, please let me know!
Thank you!
Thanks for staying with me for these posts. Maybe they even inspire you to start a daily writing practice of your own. If you do, let me know so I can return the favour and be a reader!
To close out, I just have one favour to ask: If you enjoy these posts, why not convince a friend or two or three to check them out and sign up? Just point them to https://aicelabs.com/articles
Cheers and see you in ‘26
Have Better Problems
AI won’t solve all your problems, and that’s okay. Instead, if used correctly, it will let you trade one set of problems for another, hopefully better, set. This is true for any business initiative, whether it uses AI or not, but it’s worth internalizing:
By automating drudgery tasks, it lets you trade the problem of “my best people are wasting time on manual data extraction tasks” for the problems of “I have to find productive things for my team to do now that they have all that time freed up” and “I have to maintain this automation tool”.
By scaling up a process such as lead generation, you trade the problem of not having enough leads for the problem of having to follow through on all these great opportunities (but see the caveat below).
By having AI reduce the amount of work required to produce a given output (whether that’s images, ad copy, or code), you trade the problem of being bottlenecked on creation for the problem of being bottlenecked by taste and good judgment.
Just make sure the trade is in your favour:
Don’t trade the problem of not knowing what to write on LinkedIn for the problem of being viewed as a purveyor of slop.
Don’t trade the problem of taking too long to write code for the problem of drowning in low-quality, brittle, vibe-coded spaghetti code.
Don’t trade the problem of not having leads for the problem of being an annoying spammer.
Use good judgment. As long as you are upgrading your problems, you are making progress and coming out ahead.
So, what problem are you ready to upgrade?
Want AI for your business? Start here.
There are multiple ways to get your company started on its AI journey. The simplest: hand everyone access to ChatGPT, Microsoft Copilot, or whatever tools you're already using, turn on all the AI features, and call it a day. But let's focus on AI projects meant to solve one specific business pain. Where do you start so you can get your feet wet without drowning?
Here are the questions we'd work through with a client:
Is there a burning problem?
Something constantly draining your resources, or massively getting in the way? Why does it need solving, and what would "solved" look like and unlock in your business?
This avoids doing AI for AI's sake—applying it to something just because you can, or optimizing an area that doesn't need optimizing.
If you don't have such problems, don't be sad you can't throw AI at them. Be glad you don't have big problems. ("I really wish my house was on fire so I could use this cool firefighting robot I built…")
What have you considered so far?
Assuming you have a burning problem, what solutions have you attempted, and why didn't they work? Have you tried more people, off-the-shelf software, consultants? What were the outcomes?
If you haven't tried anything, we have to question whether it was actually a big problem. It's possible you haven't tried because, until AI, it was considered intractable—something you just lived with. Fair enough. If you have tried things and they've failed, that gives important intel into the assumptions, surprises, and unexpected twists. Even a failed attempt provides valuable learning.
Can we identify a small, self-contained piece that delivers some value?
Big burning problems that defy simple solutions often span multiple areas. Still, for an initial project, find an initiative with a small blast radius. Solving the entire problem might require a huge effort, but we can define something with limited scope and reduced features to prove out value and build momentum.
The pitfall: something that's only useful once all its many components have been built. Be suspicious of long projects with late payoff. Seek out short projects with early payoff, even if small. The 80/20 rule works in our favor—80 percent of value unlocked with 20 percent of effort. And after that, we don't slavishly work until 100% completion. We inspect, adapt, use feedback from a successful rollout to define a new project where we play the 80/20 game again.
If you're on board in principle but find this too generic, let's have a chat about how this could work for your business. And if you know someone curious about where between AI hype and AI doom there might be some value for them, send them my way.
Product Engineering - Back to the Basics
It's funny how advice goes in circles. Lately there's been lots written about "Product Engineers." Now that AI makes coding faster and cheaper, the advice goes, engineers must look beyond writing code and get involved in product decisions—understanding how features impact the user experience and the business.
Only, that sounds awfully similar to what agile software development was supposed to be about. Engineers were always meant to be problem-solvers first, with tools and methods coming second.
What happened? "Agile" got hollowed out and software engineering became software delivery. The product owner orders code and gets it delivered, with the delivery person having neither care nor clue about the final purpose—the way your DoorDash driver couldn't care less what food you ordered and why.
Building software well requires deep understanding of what it's going to be used for. This is especially true when building for non-technical stakeholders: you know the business purpose, but you can't weigh the technical trade-offs. That's the engineer's job—and it only works if they understand your problem as well as you do.
AI has sharpened this reality. When writing code gets cheaper, the bottleneck shifts to judgment: knowing what to build and why. The engineers who thrive now are the ones who were always thinking this way.
For founders, that's good news. The industry is rediscovering what good engineering looks like—which makes it easier to spot. When you're evaluating technical help, ask how they'd approach understanding your problem before writing a line of code. If the answer jumps straight to tools and frameworks, you're talking to a delivery driver.
The AI Toolbox, Unpacked
"AI" gets thrown around as if it's one thing. It's not. It's a grab bag of different tools, each with different strengths, limitations, and price tags. Here's a quick reference card.
Roughly ordered from simple/predictable to complex/probabilistic.
Rule-based Automation (sometimes called RPA - Robotic Process Automation): If-then logic. "When a support ticket contains 'refund request,' route it to the billing team." Or: "When an order exceeds €10,000, flag it for manager approval." Same input, same output, every time. Best for well-defined processes where inputs are already structured (form fields, database entries, spreadsheet rows) and rules are clear. (See the sketch after this list.)
Operations Research / Mathematical Optimization: Finding the mathematically best answer given constraints. Route planning, shift scheduling, inventory allocation. Old-school (1950s+), battle-tested, provably optimal.
Classical Statistics / Regression: Finding patterns in historical data to predict numbers. Demand forecasting, pricing, trend analysis. Highly explainable ("sales rise 12% when temperature drops below 10°C"). Your finance team can audit it.
Classical Machine Learning: Algorithms that learn from labeled examples. Show it 10,000 "fraud" and "not fraud" transactions, it learns to spot the difference. Good for classification and segmentation. Garbage in, garbage out.
Deep Learning: Neural networks. Especially powerful for images, audio, and sensor data—visual inspection, speech recognition, anomaly detection. Needs lots of data. Less explainable.
Traditional NLP: Purpose-built systems for specific language tasks—extracting names, dates, amounts from documents. Can be more reliable and efficient than LLMs for narrow, well-defined extraction.
Large Language Models (LLMs): The ChatGPT/Claude category. General-purpose reasoning about text. Summarization, drafting, Q&A, flexible extraction, translation. Probabilistic—same input can give different outputs. Can hallucinate. Best with human review.
Agentic AI: LLMs with tools and autonomy to execute multi-step workflows. The newest category. Powerful, but still earning its stripes in production.
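To make the deterministic end of the spectrum concrete, here is a minimal sketch of the rule-based routing example above. The function and label names are made up for illustration, not taken from any particular RPA product:

```python
# Deterministic routing: the same ticket and order always produce the same decision.
def route(ticket_text: str, order_total_eur: float) -> list[str]:
    actions = []
    if "refund request" in ticket_text.lower():
        actions.append("route_to_billing_team")
    if order_total_eur > 10_000:
        actions.append("flag_for_manager_approval")
    return actions or ["route_to_general_queue"]

print(route("Refund request for order #4711", order_total_eur=12_500))
# ['route_to_billing_team', 'flag_for_manager_approval']
```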
These tools sit on a spectrum from deterministic to probabilistic, narrow to general. The right choice depends on your problem. Sometimes that's an LLM. Sometimes it's a simple optimization model or plain automation. Knowing the landscape helps you ask better questions.
Tech isn’t a business goal
Seen on LinkedIn:
Client: I need AI
Junior AI Engineer: Cool! Let's build a multi-agent system.
Senior AI Engineer: What's your business goal?
Client: Query our SQL database in plain English.
Junior: Okay, multi-agent is the way to go!
Senior: One LLM + Schema will do.
The point: don't over-engineer. Ask about business goals first. So far so good.
But "query our SQL database in plain English" isn't a business goal. It's a technical capability that may or may not enable one.
The client has self-diagnosed (fine—it's their business) and then self-prescribed (dangerous when the remedy is outside their expertise). Our job is to ask deeper questions. What's the real problem behind the presenting problem? What's the business goal behind the ask for a technical capability?
Hypothetical continuation:
Client: Query our SQL database in plain English.
Senior: What would that unlock in your business?
Client: Our sales team wouldn't waste three hours each day waiting on data analysts to pull reports!
Senior: What sort of reports?
Client: Yesterday's sales data for each store, grouped by item category.
Senior: That's a simple dashboard. No AI required. We'll have it ready by tomorrow.
But what if instead:
Senior: What would that unlock in your business?
Client: Regional managers need to cross-reference sales, inventory, supplier lead times, and support tickets to make judgment calls. Every situation is different—last week someone had to figure out which stores could absorb inventory from a closing location.
Senior: Unpredictable queries across multiple domains. Multi-agent might actually be the right call here.
Simple isn't always best. The only way to know is to keep asking questions—especially business questions.
Secondary Concerns, Primary Mistakes
Here's a funny thing that happens when multiple stakeholders are involved in a purchasing decision. There's an overarching goal—solving a problem, enabling a capability. Then there's a list of nice-to-haves, ways to weigh options against each other. Users want low friction and easy onboarding. Finance wants low cost. IT wants low maintenance burden.
What I've seen happen: the secondary desires override the primary goal. No matter how well a solution satisfies the nice-to-haves, if it doesn't move the needle on the actual goal, it shouldn't be in the running.
With AI initiatives this takes forms like:
Choosing a "free" open-source model to avoid per-token costs, even though it hallucinates too often to actually automate the target workflow
Going with on-premise deployment to satisfy security requirements, when the on-premise version lacks the capabilities that made the cloud version worth considering
IT insisting on the enterprise vendor with a support contract, even though the team's pilot showed it handles their edge cases half as well as the scrappier alternative
These concerns—cost, security, maintainability—aren't invalid. They're just not primary.
If It's Good, Why Do You Need to Manage the Change?
Some of the bigger consulting companies heavily discount the implementation phase of a project only to rake it in on "change management" contracts. Which raises a question: If something works, is awesome, and noticeably makes your employees' lives easier, why does that change need to be managed?
Granted, some nuances in a company-wide rollout matter: Who maintains it? What are the right usage metrics? How do we measure ROI? But if the process becomes heavy, it hints at bigger problems.
Does the solution actually solve the problem people have?
Or does it solve the problem executives think people have?
Is the UX so bad it creates friction — which hinders adoption, which requires heavy-handed mandates around usage?
People saw right away that email was faster than a paper memo via the office mail cart. No six-figure consulting contract required. ChatGPT went to hundreds of millions of active users practically overnight. People like using it so much that companies now worry about "Shadow AI" — employees using these tools in unauthorized ways. They like it enough to take that risk.
This is the experience of a startup before and after product-market fit: Before, you push push push. After, you feel pull.
Change management should start from a place where people genuinely want the change and need help making it happen faster, better, or smoother — not from a place where they need to be convinced.
If you have to convince people to use it, ask whether you built the right thing.
Using AI For Feedback without Fooling Yourself
AI expert Andrej Karpathy recently wrote this on X:
Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask: "What do you think about xyz"? There is no "you". Next time try: "What would be a good group of people to explore xyz? What would they say?"
This becomes especially important when you want AI to explore something where opinion and nuance matter—and where it'd be easy to fool yourself:
Is this a good business idea?
Should we go for PostgreSQL or MongoDB?
Tailwind or Bootstrap? Rust or C++? Dogs or cats?
If you put the question directly to ChatGPT or Claude, they'll answer with whatever personality the training data happened to produce. Instead, try:
"What would be a good group of people to critique this business idea? What holes would they poke in my thesis?"
"What group of practitioners should chime in on our choice of database?"
"Who would have a nuanced take on which CSS framework, systems programming language, or pet to choose?"
This isn't far from classic prompting advice about adopting personas—with the twist of asking the AI to generate the list of personas first.
One caveat: even these answers are simulations. Taking the business idea example—yes, it's better than assuming you know what people would think. But if the AI's simulated experts love your idea, that doesn't mean real experts would. The AI is guessing what a VC or a seasoned operator might say, based on patterns in text. It has no actual model of your market, your timing, or the dozen ways your thesis could be wrong that haven't made it into training data.
Simulated criticism is a useful stress test. It's not validation.
To Code or Not to Code
My LinkedIn feed used to be full of screenshots showing complex workflows in Make, n8n, Pipedream—all those "no code" builders where you click and drag automations like "When an email arrives with this label, use GPT-5 to summarize it and send it to this WhatsApp group."
I'm torn. For straightforward workflows, they're a beautiful entry into process automation, especially for people without programming skills. But when I see a screenshot with several dozen nodes and branches? Maintenance nightmare. Good luck changing the business logic. Good luck testing edge cases. Good luck debugging when something breaks. What starts simple becomes a ball of chaos.
They work well for simple integrations that let your software tools talk to each other in novel ways. Infused with AI to extract or reformat information, they can be useful. But as soon as you're handling complex business logic, write code.
There's another problem: because these builders are graphical, not text-based, you can't use coding agents to help you. Yes, they all come with their own "Agentic Workflow Builder AI" where you describe what you want and it builds it for you. But you don't get Claude Code or Codex. And whatever you build is locked to that platform. Want to switch from n8n to Pipedream? Good luck. Your Python code runs just as happily on AWS as on Azure or Google Cloud.
If these tools offer a quick path to an initial product for a client whose business case doesn't warrant custom development, I'll use them. Otherwise, plain code wins.
What Stays The Same
Has the pace of AI news got your head spinning? Does it feel impossible to make decisions because of all that uncertainty?
Here's an alternative frame, courtesy of Jeff Bezos:
I very frequently get the question: 'What's going to change in the next 10 years?' ... I almost never get the question: 'What's not going to change in the next 10 years?' And I submit to you that that second question is actually the more important of the two — because you can build a business strategy around the things that are stable in time.
For Amazon, that's low prices and fast delivery. No matter what else changes, customers will always want those things. Any investment there is money well spent.
Every business has these timeless truths. They'll be different from Bezos's. If you make ultra high-end luxury watches, your customers care about exclusivity, not speed. But they'll be durable.
Once you've identified yours, it does two things for you. First, it offers a lens to interpret new technology: Does this help my enduring strategy? If not, safe to ignore. Second, it lets you view new technology as opportunity instead of threat.
Consider how the music industry thought about MP3s: "We're in the business of selling CDs. Downloads threaten CD sales. Fight them!"
If they'd focused on what stays the same: "We're in the business of selling access to music. Downloads are even more convenient than CDs. Get in on that!"
We ended up there anyway, but with a decade wasted on pointless fights against Napster.
If you're feeling whiplash from the AI news cycle, this might be worth sitting with for a bit. What are the timeless truths in your business? And does AI threaten them, or help you deliver on them even better?
Hit reply and let me know. I'm genuinely curious what you come up with.
Engineering with Tolerance
Engineers dealing with the physical world have always worked with tolerances. The diameter of a screw, the length of a steel beam, the conductivity of copper wire—none are ever exact. Instead, manufacturers quote values as "5mm ± 1%." It's then the engineer's job to design systems that function despite inexact inputs.
In traditional software, we don't worry about tolerances. Sure, there are floating point issues (0.1 + 0.2 = 0.30000000000000004), but generally you issue commands and the computer executes them to the letter. Even with traditional machine learning, a trained model is deterministic. The outcome will be in exactly the format you desire. With generative models, that all goes out the window.
An example:
Traditional spam filter: Train a classifier to read an email and output 0 or 1. Zero means "not spam," one means "spam."
LLM spam filter: Send the email to the model with a prompt: "Have a look at the following email and tell me whether it's spam. Answer with a single word—'yes' if spam, 'no' if not."
Here's the tradeoff between determinism and expressive power:
The traditional filter always produces correctly formatted output. 0 or 1, nothing else. Whether those labels are accurate is a separate question, but plugging them into the larger email service is trivial.
The LLM might be far more sensible in its classification, with stronger understanding of semantics. But there's a non-trivial chance the output is neither "yes" nor "no" but something like "Yes, that definitely looks like spam," or worse: "No. Would you like me to summarize the email?"
LLMs have gotten better at following instructions, but there's currently no way to hardcode formal output requirements. Your instructions are guidelines, not guarantees. So what do we do?
Post-process: If an answer doesn't fit your schema, add a cheap post-processing step. In the example above, look for "yes" or "no" and ignore the rest.
Use tool calls for the final answer: Instead of having the spam-filter LLM spit out the answer directly, tell it to call one of two tools, markSpam or markNotSpam. It's a bit of a hack, but current LLMs are strongly optimized for tool calling—more than following vague free-form instructions.
Accept wide, output narrow: Design systems that accept a range of input formats but follow a narrowly defined output schema.
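Here is a minimal sketch of the post-processing tactic. It assumes a hypothetical call_llm() helper that returns free-form text; the tool-call variant works the same way in spirit, with markSpam and markNotSpam exposed as the only allowed actions:

```python
import re

def call_llm(prompt: str) -> str:
    # Placeholder for whatever LLM client you use; returns free-form text.
    return "Yes, that definitely looks like spam."

def is_spam(email_text: str) -> bool:
    prompt = (
        "Have a look at the following email and tell me whether it's spam. "
        "Answer with a single word: 'yes' if spam, 'no' if not.\n\n" + email_text
    )
    raw = call_llm(prompt).strip().lower()
    # Post-process: tolerate chatty answers, accept anything that starts with yes/no.
    if re.match(r"^\W*yes\b", raw):
        return True
    if re.match(r"^\W*no\b", raw):
        return False
    # Neither token found: fall back to a safe default, retry, or escalate to a human.
    return False

print(is_spam("Congratulations, you have won a prize! Click here..."))
```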
Physical engineers learned this a century ago: design for the tolerance, not the spec.
What the CTO Should Have Said
Saw this joke the other day:
VP of Sales: "Every dollar I spend returns three."
VP of Marketing: "Same."
CTO: "Well, without us, there'd be nothing to sell!"
Now it's obvious that a software company that doesn't have software doesn't have anything to sell. But obvious things aren't helpful, so let's dig in deeper. When put on the spot to explain what their return on investment is, the CTO might feel justified pointing out the obvious. But that doesn’t help us answer important questions for the business:
If we spend good money on building this feature for our product, will it lead to more business?
If we hire another couple engineers, will their contributions create enough revenue to cover their salaries (and more)?
How much, if anything, should we invest in tools that make our developers more effective? How much is a 10% speedup worth?
These are important questions, but hard to answer. One reason is that the efforts of the software engineering department take a long time to bear fruit. Say you add a new feature to your software product. Now what? You’ll have to measure:
How many more people signed up because of that feature?
How many people didn’t cancel because of that feature?
How does the presence of that feature change the ROI multiple for sales and marketing, as in, “Now that we have Feature X, every dollar I spend returns four instead of three.”
The last part is worth emphasizing. It adds nuance and quantification to the CTO’s objection: How much easier do the product features make it for sales and marketing to do their jobs? And not just raw number of features, but also their quality. That’s why some products are flying off the shelves and others need gargantuan advertising budgets and an aggressive salesforce.
And that in turn brings us to the lesson that sales, marketing, and engineering make no sense in isolation: engineers cooking up features based on a hunch, leaving marketing and sales to figure out how to turn them into revenue; or, conversely, marketing and sales making unsubstantiated promises to generate buzz and close deals, leaving engineering to figure out how to make them a reality. That’s just local optimization rearing its ugly head again.
So in the end, what should the CTO have answered?
“I make those three dollars turn into four.”
Build vs Buy Revisited
A story has been going viral on Reddit and LinkedIn about the owner of a small software company who lost their biggest client because the client just built a (worse) version of the software product themselves. Without knowing the details, it’s hard to say whether that client was smart (saving the recurring subscription costs) or foolish (paying much more down the road for maintenance and a worse version of the product). But a few things are worth noting:
AI-assisted coding in the right hands can certainly shift the raw calculation on build versus buy. A bit of upfront cost for the time of your in-house developers gets you perpetual savings on subscription (just don’t forget to account for hosting costs)
If the tool sees use, it will see bugs and feature requests. Now your engineers are doing double-duty.
Your company should focus all its effort on its zone of genius: The thing it is uniquely qualified and positioned to do. AI might move certain tasks from your zone of competence to your zone of excellence, meaning it’s now very easy. But that still doesn’t mean it’s what your team should be spending time and effort on.
The SaaS company whose product you want to replicate via AI has access to AI, too, and since for them, the product is their zone of genius, you can bet that they’ll continue improving it at a pace that your small internal effort cannot match.
Remember opportunity costs. Is rebuilding it really the best thing your team could be doing with their time? If the answer is yes, you might ask yourself why that is, and why you believe doubling down on your core strengths would yield diminishing returns!
In the end, you need to know exactly why you’re building: Build when it's core to your differentiation. Buy when it's not. AI doesn't change that. It just tempts you to forget it.
The Slaves of Calvisius Sabinus
The ancient Roman philosopher Seneca tells the story of a wealthy Roman named Calvisius Sabinus who wanted to appear educated but kept mixing up his references. Odysseus instead of Achilles, and stuff like that. His solution? Buy special-purpose slaves. One trained in Homer, one in Hesiod, one for each poet. At dinner parties, they'd feed him lines to repeat.
It didn't work. One guest quipped that Sabinus should take up wrestling. When Sabinus protested he was too frail, the guest replied: "Maybe, but consider how many healthy slaves you have."
I found this in Ryan Holiday's recent book Wisdom Takes Work (original source). Surprisingly, Holiday makes no mention of AI. But isn't this exactly how people come across when they outsource all their thinking to it? If you post something sharp but it's pure AI, what exactly did you provide?
This is different from using AI as support rather than substitute. Nobody would have mocked Sabinus for using learned slaves as tutors. The distinction matters: Are you building capability, or just renting the appearance of it?
The 24x Scope Creep
Australia's Bureau of Meteorology just launched a new website. Budget: $4 million. Final cost: $96.5 million.
The technology worked fine. The problem was simpler: nobody properly understood how people actually use weather data. The previous site let users see the path a storm had taken and when it would arrive. Emergency services relied on this. That feature disappeared. The new radar color scale made storms look less severe than they were. Also not ideal.
And how did costs balloon 24x? The usual way. The $4 million figure was just the front-end redesign, with the real expenses buried in "backend infrastructure."
This is how many technology projects fail. Bad requirements and disregard for the end users don't add cost linearly. They compound. An extra month defining requirements properly costs far less than a year of scope extensions fixing a system that solves the wrong problem.
Don’t not hire junior devs, either
Apologies for the double-negative ;) But there’s a missing piece to yesterday’s article on hiring devs. The lesson of that piece was that you can’t get around requiring some level of senior technical capability when you’re starting anything technical, whether that’s a new business or an internal improvement initiative. The lesson was not that you shouldn’t have any junior developers on your team, which is the other extreme of bad advice: “Now that AI is basically like a very eager junior, you can just fire all your junior devs, and hire an additional AI-augmented senior dev.”
Such advice is equally wrong. A healthy team has a healthy mix of all sorts of backgrounds, in both technical and lived experience. I could try to write a long and eloquent piece about this, but others have already articulated it much better than I could hope to.
One is Charity Majors, Cofounder and CTO of the observability platform honeycomb.io and a great writer with a fun style. Check out her piece from 2024 about it: https://charity.wtf/tag/junior-engineers/
Alex Jukes, Fractional CTO and fellow daily email writer, has a whole series of posts on it: https://engineeringharmony.substack.com/p/where-will-the-engineers-of-tomorrow
It seems to me that both ends of the spectrum come from a miscalculation:
“I’ll save the money from a senior engineer and just hire a bunch of juniors; who needs seniors with their complicated architecture anyway?”
“I’ll save the money from a bunch of juniors and just let AI + a senior loose; who needs juniors when you have Claude?”
There is a way to save money, and that’s by not wasting it thrashing around with an approach that leads nowhere.
Hire Senior Devs. Really.
There’s a piece of startup advice floating around LinkedIn along the lines of, “Don’t hire senior devs; it’s an expensive mistake because they’ll over-engineer everything instead of shipping fast.” The thought is that these senior devs get stuck on arcane details, make unreasonable demands on the purity of the codebase, build complexity that’s overkill for your company and endlessly debate architecture choices instead of building products for your customers.
Add that to the pile of confusing or outright wrong advice that non-technical founders are subject to.
While it’s true that in the early stages of your venture, over-engineering is lethal, I have to wonder where people find all these over-engineering seniors. If you get a senior developer who is senior by skill and not just by “years in front of a computer”, they will have gained the business sense to ask the right questions:
What are we building?
Why are we building it?
For whom are we building it?
What are hard constraints, what are soft constraints?
If a developer asks these questions of an early-stage startup and comes up with “well obviously we need a fleet of 42 microservices that our 4-person team will have to build, maintain, deploy, and orchestrate”, they’re not a senior, they’re a junior who picked up the right buzzwords to impress in an interview.
Maybe that’s what’s really prompting these posts: People who thought they hired a senior but got someone with no sense of how business needs drive engineering decisions. However, deciding to only hire junior developers is the wrong response to that. The junior may or may not overcomplicate your architecture. But if they keep it exceedingly simple, it’s not because they made an informed trade-off; it’s because they don’t know any other way.
You can't ship fast by avoiding experience; you ship fast by working with people who know which corners to cut.
They’re not hallucinations…
There’s something about the term “hallucination” when applied to a large language model’s wrong but confident answer that disagrees with me. Let’s unpack.
If a person hallucinates, they’re in an altered mental state. Maybe from drugs. Maybe from hunger, thirst, or sleep deprivation. Maybe from a traumatic brain injury. It’s a disruption to the normal workings of their mind that causes them to think or hear or see something that’s not there.
If an LLM hallucinates, it’s not at all due to damage or tampering with its internal structure. When ChatGPT confidently mentions books that don’t exist, for example, it’s not because someone took a wrench to OpenAI’s server banks or let a virus loose on their code.
Here’s a better analogy. Imagine you’re in English class, called on by the teacher to state your opinion about a Shakespearean sonnet you were supposed to read. You didn’t do the reading, so you just say: “Ah yes, I really liked it; I liked how it felt meaningful without ever quite saying why, like Shakespeare was hinting at something emotional that I was definitely almost understanding the whole time.” That’s not a hallucination, it’s a plausible-sounding answer to the teacher’s question. A non-answer, to be exact, because it’s not grounded in the reading you were supposed to do.
It might sound nitpicky to obsess over terminology, but the mental models and analogies we use inform how we think deeper about things. The “hallucination” view implies a temporary deficiency that we can overcome or avoid, whereas the “non-answer” view implies that we get such non-answers every time the model is out of its depth, like the student who didn’t do the assigned reading.
With that mental model, the way to avoid, or at least catch, non-answers is to pose questions in such a way that non-answers are not a plausible way to continue our exchange. Part of that is prompt and context engineering:
Don’t assume that a model knows facts. That’s how you end up hearing about books that don’t actually exist.
Include relevant content directly in the prompt OR
Provide access to a trusted knowledge base via tools such as the Model Context Protocol (MCP) or retrieval-augmented generation (RAG)
Offer a graceful backdown. LLM-based chatbots are trained to be helpful, so “I don’t know” does not come naturally to them.
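As a minimal sketch (hypothetical helper name, no particular SDK), this is what “include the content and offer a graceful backdown” can look like in a prompt:

```python
def build_grounded_prompt(question: str, excerpts: list[str]) -> str:
    # Supply the source material directly and make "I don't know" an acceptable answer.
    context = "\n\n".join(f"Excerpt {i + 1}:\n{text}" for i, text in enumerate(excerpts))
    return (
        "Answer the question using only the excerpts below.\n"
        "If the excerpts do not contain the answer, reply exactly: I don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "Which books does the author cite on this topic?",
    ["...text retrieved from your trusted knowledge base...",
     "...another retrieved passage..."],
)
# Send `prompt` to whichever model you use; the grounding and the escape hatch
# travel with it.
print(prompt)
```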
We don’t have to get ChatGPT off LSD or shrooms to get correct answers; we have to know what questions even make sense to ask, and what context to provide.
Why Automation ROI Looks Worse Than It Actually Is
I’m in a business-y mood this week, so here’s another piece of the puzzle that sometimes gets overlooked. This time, it’s a mistake that pushes us into not going for an AI project even if it would make total sense.
The mistake? Looking at the ROI of time saved purely through the lens of salary fraction. Let’s look at an example with simple numbers so we don’t get distracted by math.
Let’s say you’re a company with a viable business model and you have good economic sense
You pay someone a $100k annual salary
If your company makes any sense, that person must create annual value in excess of their annual salary. Multipliers vary by role and industry. Let’s use 2x for easy math: A $100k annual salary gives you $200k of economic benefit
Now let’s assume that part of their current job is an annoying menial administrative task that, for some reason, only they can do, even though it isn’t part of their true value-creating activity. Let’s assume that this takes up a quarter of their working hours:
25% of their work goes towards something that creates no direct value
Only 75% of their work goes towards the value creation.
That means they only create $150k of economic benefit to the company (2x value multiplier with a 25% penalty multiplier)
Next, we imagine that we could wave a magic wand (AI-powered, no doubt) to make the annoying task go away. How much should that be worth to us?
The simplistic calculation says: 25% of their time costs us 25% of $100k, so that’s $25k.
The better calculation says: 25% of their maximum value creation potential is 25% of $200k, so that’s $50k.
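Here is the same arithmetic as a tiny sketch, with the assumptions from above labeled:

```python
salary = 100_000          # annual salary
value_multiplier = 2.0    # assumption: each salary dollar yields $2 of economic value
wasted_fraction = 0.25    # share of working hours lost to the menial task

cost_savings_view = wasted_fraction * salary                     # $25,000
value_unlock_view = wasted_fraction * salary * value_multiplier  # $50,000

print(f"Cost-savings view: ${cost_savings_view:,.0f} per year")
print(f"Value-unlock view: ${value_unlock_view:,.0f} per year")
```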
So with these simple numbers, we see that the true ROI of business process automation can be much higher than pure salary cost.
Caveats
These gains can of course only be realized if the worker actually has something better, higher-value, to do with the freed-up time. For today’s knowledge workers that’s almost certainly true, but it needs to be taken into account.
Can the rest of the system absorb their increased productive output? (See yesterday’s post on constraints.)
The difference between a “cost savings” versus “value unlock” ROI calculation can be big. Miss this distinction and you’ll systematically underinvest in automation that would actually move the needle.
