Beware Pre-AI Value Extrapolation
The other day, a post of mine featured a Simpsons-style cartoon of me hiking, courtesy of ChatGPT's new image generation feature.
Meanwhile, the first search result for "What would it cost for a graphic designer to turn my photo into a cartoon?" linked to someone's Etsy shop where they offer exactly that service for roughly $30.
So, did I just a) save myself $30 and b) cheat someone out of their $30?
No, and no.
If the only way for me to get a Simpsons-style photo were to pay someone $30, it wouldn't have been worth it. I would only bother making this cartoon if it were free.
However, others care enough about turning their photos into cartoons to pay $30, or that seller on Etsy would have no business.
Here's the critical part: Armed with ChatGPT, I could now set up a shop on Etsy and sell cartoonification for $30 a pop at almost no cost. Pure profit if you subtract the little time it would cost me to load their picture into ChatGPT and enter a simple prompt. So, a million-dollar business right here? I could even undercut the seller a bit, charging only $20 per cartoon.
I hope this example makes it painfully obvious why that won't work. As of last week, everyone who wants their photo cartoonified can now do so at a ridiculously low price point. That seller on Etsy will have to figure out something new and quick. But everyone who thinks setting up some automation in Zapier that goes from customer order to ChatGPT and back would net them $30 (or anywhere close to that) is deluding themselves.
If you have a goose that lays golden eggs, you're rich. If everyone has a goose that lays golden eggs, nobody cares about gold any longer.
In slightly more subtle ways, this will play out for many products and services. So if you build a quick-and-dirty AI automation that does X, and X used to cost $Y, chances are you can't sell X for $Y.
Evolutionary Design for AI Systems
In an earlier post, we saw how complex systems come from simple systems. This practice of evolutionary design is a cornerstone of agile technical practices. Start so simple that it's almost embarrassing and add complexity slowly, one step at a time.
In typical software, that means ditching features and particular execution paths early on, but in AI, that doesn't quite work. If you're building a plant identification app, there's not much sense in deciding to start with support for only two types of plants. It doesn't simplify the tasks of model training and deployment. Plus, chances are you'd have to significantly change the model architecture and training parameters at each step, throwing away the previous work.
So, what do we do instead? We apply evolutionary design to our machine learning infrastructure and don't initially worry about releasable "MVPs." This approach is recommended in Google's Rules of Machine Learning and Andrej Karpathy's great Recipe for Training Neural Networks.
The first step, the evolutionary seed, is setting up the end-to-end training and evaluation skeleton, plus picking a "dumb" baseline
We run a number of sanity checks to make sure things are set up correctly
We gradually increase the model's complexity until we hit diminishing returns
The mental picture I have of this process is that, in traditional evolutionary design, you're assembling small Lego pieces into a complex whole, whereas for designing an AI system, you're slowly bringing a blurry image into sharp focus.
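To make that seed concrete, here's a minimal sketch of such a skeleton, using scikit-learn and a toy dataset as stand-ins for whatever you're actually building:

```python
from sklearn.datasets import load_iris          # toy stand-in for your real dataset
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# The evolutionary seed: one fixed train/eval split and one metric, wired end to end.
X, y = load_iris(return_X_y=True)
X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.2, random_state=42)

# A "dumb" baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_eval, baseline.predict(X_eval)))

# Sanity check: the first "real" model should comfortably beat the baseline
# before we bother adding any complexity.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("model accuracy:", accuracy_score(y_eval, model.predict(X_eval)))
```

Everything after that is swapping in better data, features, and models while the skeleton and the evaluation stay fixed.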
Principles, Practices, Problems
In the last few posts, we explored the issues standard agile practices have with AI projects.
This is an example of a more general phenomenon. It goes like this:
We start with entirely reasonable principles.
We apply these principles to a concrete situation, yielding entirely reasonable practices.
Because practices are more concrete than principles, they spread more readily.
Reasonable-Practice™ Certified Coaches and Consultants spread the practices far and wide.
Applied to different situations, the practices are entirely unreasonable, but the certified coaches will tell you that you're just not doing it right.
In the justified backlash against unreasonable practices, the reasonable principles get thrown out.
You can see this with a lot of the advice you read online, where a version of Newton's third law applies: each piece of advice has an equally viable opposite piece of advice. A few examples beyond agile:
Funding
The principle: Your company needs money to fund itself
The practice (for a consumer-facing company during times of low interest rates): Raise as much venture capital as possible as early as possible and grow as big as possible as fast as possible.
Where it makes no sense: A company building a straightforward business product with paying customers that can fund growth entirely through profits.
Productivity
The principle: We want to have a sense of how productive our workforce is
The practice (on the factory floor): Count widgets produced per worker per unit time
Where it makes no sense: Measuring developer productivity by lines of code written per unit time
Hiring
The principle: We want someone who will thrive in their role
The practice (for a big tech company filling a particular niche role doing one particular thing): Insist on 5+ years of experience in exactly the tech stack the company uses
Where it makes no sense: Hiring for a role in a startup where, in all likelihood, the tech stack will have changed before you've even finished the hiring process
When confronted with practices that obviously (to you) make no sense, there's a good chance they made total sense in a different context. Why not back out a bit until you uncover the underlying principles and then re-apply those?
Market Risk vs Product Risk
To follow up on why standard Agile practices don't quite work for AI, one observation is that Agile is good at managing market risk, but AI also has a lot of product risk.
Market Risk: Will anyone buy the damn thing?
Product Risk: Can we even build the damn thing?
Pure Market Risk
Much of the software built over the last few decades had pure market risk. The pressing question was never whether it was technically feasible—some interesting engineering challenges at scale notwithstanding—to build the thing but whether anyone would care enough to use it. This is where Agile methodologies and the Lean Startup movement shine, with their Minimum Viable Products and incremental iterations.
In short: If you only have market risk, iterate away as fast as you can.
Pure Product Risk
Let's imagine the other extreme. If you're a biotech company working on a cure for just about everything, you don't need focus groups, usability studies, or small-scale pilots. You have a tough road ahead of you, but you know there'll be an eager market once you get there. Defining intermediate goals and milestones still makes sense, and you can have successful releases of these intermediate products. Still, the micro-iterations of the Lean and Agile camps are a distraction.
In short: If you only have product risk, don't bother with quick customer-facing iterations.
Why not both?
And that brings us to AI products, which combine both risks.
Can you even build the thing? Are current models sufficiently capable, or can you train your own and achieve the required accuracy?
Once done, will people care about the AI-enabled product?
In such a situation, we're prevented from iterating rapidly. However, we can still use the general ideas from Agile and Lean and make deliberate bets and experiments to validate or invalidate our hypothesis. We just have to accept that the risk with each bet is higher. The increased product risk forces us to move slower, which, in turn, limits how quickly we can eliminate market risk.
Iteration with a Long-Term View
So, rather than iterating and pivoting mindlessly, the way forward with AI products is to have a compelling long-term view (a vision!) of what you want to establish. If it is sufficiently bold, this vision can help mitigate some market risks. It should be possible to have an educated opinion about whether people want to buy the result of this bigger vision. In the meantime, the vision helps guide the—now somewhat slower—product iterations as we move the product from one intermediate milestone to the next.
In short: If you have both product and market risk, slow down the cycle of iterations so you can make meaningful progress before each customer-facing release.
Style Transfer: Solved
Interrupting our regular coverage for what's been lighting up the internet these last few days. Both Google and OpenAI have released multi-modal models that, for the first time, integrate image generation and image editing right into the language model itself. This is a big deal because the model that's generating the images now has a solid understanding of the user's intent. One of the most obvious use cases here is style transfer: "Turn this photo into a Simpsons cartoon" or "Make this a Van Gogh painting."
A brief history of style transfer
Repurposed Image Recognition Models
The first widely publicized algorithm for style transfer was published almost ten years ago. Gatys et al. recognized that, in an image recognition model, the first few layers mostly capture style, and the later layers mostly capture content. By feeding a content image (the photo of me) and a style image (Van Gogh's Starry Night) through the model, the algorithm then optimizes for an output image whose activations in the style layers match those of the style image and whose activations in the content layers match those of the content image.
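For the curious, the whole idea fits in a short script. Here's a rough sketch of that optimization loop in PyTorch, using a pretrained VGG-19 from torchvision; the layer indices, loss weights, and file names are illustrative choices rather than the paper's exact recipe:

```python
import torch
import torch.nn.functional as F
from torchvision.io import read_image
from torchvision.models import vgg19, VGG19_Weights
from torchvision.transforms.functional import convert_image_dtype, resize

# Frozen, pretrained image-recognition model, used only as a feature extractor.
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Earlier layers capture style, a later layer captures content.
STYLE_LAYERS = [0, 5, 10, 19, 28]
CONTENT_LAYERS = [21]
ALL_LAYERS = set(STYLE_LAYERS + CONTENT_LAYERS)

MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def load(path, size=256):
    img = convert_image_dtype(read_image(path), torch.float32)
    return ((resize(img, [size, size]) - MEAN) / STD).unsqueeze(0)

def extract(x):
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in ALL_LAYERS:
            feats[i] = x
        if i >= max(ALL_LAYERS):
            break
    return feats

def gram(f):
    # Feature-map correlations, the classic proxy for "style".
    _, c, h, w = f.shape
    f = f.reshape(c, h * w)
    return f @ f.t() / (c * h * w)

content = load("photo.jpg")           # hypothetical file names
style = load("starry_night.jpg")
target_content = {i: f.detach() for i, f in extract(content).items() if i in CONTENT_LAYERS}
target_style = {i: gram(f).detach() for i, f in extract(style).items() if i in STYLE_LAYERS}

# Optimize the output image itself, starting from the content photo.
output = content.clone().requires_grad_(True)
opt = torch.optim.Adam([output], lr=0.02)
for step in range(300):
    opt.zero_grad()
    feats = extract(output)
    content_loss = sum(F.mse_loss(feats[i], target_content[i]) for i in CONTENT_LAYERS)
    style_loss = sum(F.mse_loss(gram(feats[i]), target_style[i]) for i in STYLE_LAYERS)
    (content_loss + 1e5 * style_loss).backward()
    opt.step()
```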
Clever as it is, this algorithm and subsequent works were mostly good at matching brush strokes of various art styles but not so much the broader artistic implications, like Picasso's way of messing with perspective or Dali's "everything melts" style.
Diffusion Models
Without getting too technical, these models (DALL·E, Stable Diffusion, Midjourney, etc.) use a reverse diffusion process to start from pure noise and slowly generate a target image, guided by a text prompt. Early models famously had huge problems adhering to prompts (and let's not forget the horrific way they rendered hands and fingers). They were great at applying styles to a text prompt: "A bored cat in the style of Girl with a Pearl Earring," and so on. Instead of a text prompt, they could also be prompted with a source image and glean the style from there.
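To get a sense of how accessible these models have become, here's a minimal sketch using the Hugging Face diffusers library; the checkpoint name is just an example, and a GPU is assumed:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image diffusion model (example checkpoint).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Reverse diffusion: start from noise, denoise step by step, guided by the prompt.
image = pipe("A bored cat in the style of Girl with a Pearl Earring").images[0]
image.save("bored_cat.png")
```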
They could also be fine-tuned on your own images, and after that time-consuming and finicky process, you could ask for yourself as, say, a Lego minifigure.
However, they were still bad at:
Text generation
Prompt adherence
Image editing
In simple terms, these models do not have a genuine concept of content, just a vague sense that the text and image embeddings are close to each other.
Multi-modal Models
By baking image generation right into their Large Multimodal Model (LMM?), both Gemini and GPT-4o have access to all their text-based world knowledge and what they have learned about how objects relate to each other. These models know that the Simpsons have yellow skin and an overbite. They can move parts of the image around while keeping the whole consistent in a way the previous generative models couldn't, because those were too focused on individual pixels.
Cheers, and enjoy the weekend!
Agile doesn’t work for AI
Oh, them's fighting words. Or maybe you're in the camp that would say, "Well, duh, Agile doesn't work at all."
But not so fast. The higher principles behind Agile very much apply. Avoid waste. Take small, safe steps. Have tight feedback loops to make sure you're building the right thing.
Where things break down is when rigidly codified "Agile" practices—which might make total sense when applied to standard software development—are applied without modification to AI projects.
Some examples:
AI development is much closer to research than to traditional development, so everything's a spike in Scrum terms. You might as well not bother, then.
A large feature does not intuitively break down into smaller pieces. Case in point: to build an image recognition model, you don't start with a model that can recognize one category and then slowly add additional categories. In fact, it's often the opposite: you start with a model that can recognize hundreds of categories, then you throw all of those out and fine-tune for the ones you particularly care about (see the sketch after this list).
There are non-intuitive discontinuities in mapping a user story to the required effort. Minor changes to the requirements turn "sure thing, give me an afternoon" into "uhm, give me millions in VC and five years" (hat tip to xkcd, whose predicted timeline for image recognition turned out just about right).
Test-driven development (TDD), a fantastic practice for software development, does not productively apply to ML. Sure, you can TDD that your outward plumbing around the machine learning system is correct, but you can't TDD your way to, say, better performance on a relevant model eval.
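Here's a rough sketch of that "throw out the head and fine-tune" step from the second example, using a torchvision ResNet as a stand-in; the number of target categories is hypothetical:

```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Start from a model pretrained on ~1000 ImageNet categories...
model = resnet50(weights=ResNet50_Weights.DEFAULT)

# ...then throw away its classification head and attach one for the categories you care about.
NUM_MY_CATEGORIES = 5  # hypothetical
model.fc = nn.Linear(model.fc.in_features, NUM_MY_CATEGORIES)

# A common first step: freeze the backbone and fine-tune only the new head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")
```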
So what are we going to do about it? I'd say the community at large is still figuring that out. I'm hoping to add my own thoughts to the discussion over time. It would be a shame if the solid principles were to get thrown out due to frustration with the concrete practices.
If it hurts, do it more often*
*At least for things that you should be doing anyway. Don't go stubbing your toe every hour ;)
This is quite obvious in some areas. If you go for a run once a month, you'll be sore for days afterward, but if you go for a run every other day, you'll do just fine. Or, as I try to tell my kids, tidying up once a day is no big deal versus letting the mess pile up for weeks.
In software engineering, releasing a new version to customers once every six months is a big, fraught, painful process where everything has to go right. With continuous deployment, releasing six times per day is a non-event.
The same is true for many things at many scales:
Integrating your code changes with those of your colleagues. It's a big pain with lots of conflicts to resolve if done every few days, and a trivial exercise if done hourly.
Annual planning. So much uncertainty, so much handwringing about which of the many possible futures will come to pass. Much easier to keep the detailed planning for the shorter timescales, in the spirit of Lean and Agile.
Going meta: Writing a long monthly or even just weekly newsletter is a dreadful thought. It better be of the highest quality, jam-packed with top-notch insight, with a carefully chosen topic. Writing every day takes all that hassle and pressure off.
In a way, this is a corollary of the idea of Exponential Shrinking. If you can get away with something much smaller, do that. If the overall quantity can't be reduced, slice it up and deliver more frequently.
What's something painful or annoying that you ought to do? What would happen if you upped the frequency?
Wilderness First Aid, or The Pull to Complexity
Even though we understand and accept that simplicity is better, we repeatedly end up with complex solutions. Why?
It's tempting to use what we know: I once took a comprehensive Wilderness First Aid Course, and for the next couple of outings, a little voice in my head said, "If only someone would sprain their ankle right now. I know exactly how to tape it based on the direction of the sprain. I'd be a hero." These are horrible thoughts, but we all crave to be competent and demonstrate that competence.
So, we read about strategies, processes, and design patterns and can't wait to use them. Resisting this pull goes against our nature.
And if it's not our desire to appear competent that pulls us towards complexity, it's our fear of appearing incompetent in front of peers, bosses, or clients. If we propose a simple solution, won't they think we are simple?
The antidote is a mindset shift from the baroque aesthetic of "more is more" to the minimalist aesthetic of "how little can we get away with?" Take pride in expressing things simply, in finding clarity, in discovering the connection that cuts through layers of complexity.
And alleviate your fear of appearing simple. The right people will appreciate it when you, the expert, give them a simple solution that works.
Vibe Cooking
In the past, I've successfully used GenAI to develop recipes based on available ingredients and what I was in the mood for. The instructions generally made sense, and the result was decent.
So, naturally, we put on our hype goggles and extrapolate to... Vibe Cooking.
Upload a picture of your fridge and pantry to the AI
Ask it what you should cook
Taste in between steps and ask the AI for adjustments
Rinse and repeat
Post on social media that chefs are going out of business and how you'll open a Michelin-star restaurant despite having no professional cooking experience 👩‍🍳
(Optional: Learn the hard way that running a restaurant involves more than cooking random stuff on a whim 🤷‍♂️)
And don't stop there. Other professions are bound to get vibed, too. Here are five more.
Vibe architecture. Who needs architects when the AI can spit out blueprints and work breakdown structures, then write emails to coordinate the contractors?
Vibe medicine. (WebMD on steroids)
Vibe law. "Just a second, your honour, I'm feeding the opposing counsel's last remark into my AI..."
Vibe accounting. AI does your taxes. Getting audited? Feed the angry letter right back into the AI.
Vibe engineering: AI generates structural designs. If something collapses, simply input "bridge fell down" and ask for updated blueprints.
Happy Monday and happy vibing.
Comprehensive Tests Between Ecstasy and Agony
Only a short email as I'm deep in the weeds of debugging...
Throughout a big client project, a comprehensive suite of tests (both small-scale unit tests and more extensive integration/functional tests) has saved my bacon countless times and ultimately accelerated development. Good tests let you pinpoint exactly where something went wrong, like where a new feature messed up existing functionality. This is the ecstasy. You can make big strides and sweeping improvements to your code, confident that you have a solid safety net.
But today, I spent agonizing hours fighting with tests that work locally but don't work when running on the GitHub server, where the code gets checked before being integrated into the mainline. The issue is related to arcane details about what you can and cannot do with GitHub Actions, Docker, and network calls, and it's still not solved 🤷‍♂️.
The lesson: There's a tradeoff in everything, nothing is purely good, and nothing ever works the way you'd hope it would. Our job as software engineers is to find a satisfying path through these tradeoffs that lets us make steady progress.
In my current testing conundrum, I've decided that pragmatism beats purity. A simple "hack" lets me circumvent the issues, but it's not the purist's way.
Anyway, thanks for listening to my rant. Enjoy your weekend and Monday we're hopefully back to our regularly scheduled content :)
Training Wheels vs Balance Bikes
Here's another way to think about the pitfalls of "Vibe Coding" and relying too much on AI: Some assistive technologies are like training wheels, and others are like balance bikes.
As a millennial, I learned to bike the traditional way: tricycle -> bike with training wheels -> bike. Progress is relatively slow because training wheels do nothing to develop balance, so once the training wheels are off, you're back to zero.
My kids learned biking the new way: balance bike -> bike. A balance bike is a bike without pedals. You push it around like a walker and then learn to roll with it. This is a much faster way to learn biking because the crucial part is learning how to keep your balance, not how to push the pedals.
And what does this all have to do with AI in general, and AI-assisted coding in particular? Aimlessly throwing requests at the AI is like using training wheels. At the moment, it feels comfortable, but once the training wheels come off, you're left floundering.
In contrast, using AI to refine your thinking, validate your approach, generate ideas, etc., is similar to the balance bike approach: You acquire critical skills, so you don't bump into a skill ceiling. Let AI be a stepping stone, not a roadblock!
The Capability-Impact Gap
Writer and Computer Scientist Cal Newport pointed out something interesting in a recent episode of his podcast, Deep Questions:
On the one hand, AI's capabilities are evolving rapidly and frequently prove the naysayers wrong:
Oh, it can't do X
Two months later, it can indeed do X
On the other hand, AI's economic impact has been relatively muted so far.
Early on, after ChatGPT was released, massive disruptions were predicted in every industry related to knowledge work. With very few exceptions (Chegg, a company that basically lets students cheat on their homework, saw a 99% decline in its stock price), that just hasn't happened.
Why is that? Cal explains that the current dominant paradigm of AI usage, posting questions into a chat box, does not lend itself to such massive disruption, and I agree.
In essence, the capabilities of AI are probably good enough, and what needs to happen now is a painful and slow period of finding that elusive product-market fit for a true killer app.
One compelling near-term use case is to use AI to augment a user's capabilities. Cal uses the example of Microsoft Excel. Most casual users are not aware of its more powerful features. Lookups, pivot tables, scripting. By conversing with a built-in AI, users can unlock these features more readily than by reading tomes of documentation (especially when they wouldn't even know what to look for or how to translate a feature's plain specs into how it makes their work more manageable).
More generally speaking, thinking beyond the chat box paradigm and focusing on empowerment will be the way to go!
AI vs VAs
Here's a quick sounding board for your fantastic AI idea if it involves outsourcing human labour to AI: Why hasn't that labour been outsourced to cheap overseas virtual assistants (VAs) yet?
A big deal in productivity circles and among entrepreneurs about a decade ago (think Tim Ferriss) was that you could hire assistants cheaply from low-wage countries. They can handle admin tasks, content creation, turning your blog post into a tweet: all sorts of things GenAI could now do for you, and arguably the VAs do them even cheaper and without hallucinating. Yet while VAs are a growing market, they haven't been adopted universally. There are some failure modes around availability, miscommunication, and trust. Still, comparing and contrasting these two approaches to "automating" onerous tasks is instructive.
Looking at your specific use case, there might be good reasons why an AI model would be a better fit, but it's not a given. A sweet spot for the AI would be:
A large volume of incoming requests that wouldn't be cost-effective for human assistants to handle, even from low-wage countries
The need to be available 24/7
Dealing with highly sensitive data
Sufficiently narrow tasks so that hallucinations aren't an issue
Straightforward application of specialist knowledge, such as code generation
If those conditions are met, you might be on to something. If not, you might want to do a bit more user/market research.
Vibe Coding: Programming by Coincidence
You may or may not have wondered why there were suddenly no emails for the last two weeks. It turns out that writing a daily email isn't quite feasible when you're down with pneumonia. Now that I'm a week into taking antibiotics, things are looking better.
So, anyway. Right now, it seems like everyone is talking about Vibe Coding:
Just ask the AI coding agent for what you want.
If the program spits out errors, feed those back to the agent.
Accept all changes suggested by the AI.
Rinse and repeat until it sort of works.
Programming by Coincidence
I couldn't help but remember a chapter from one of my favourite books, The Pragmatic Programmer, about Programming by Coincidence.
Quoting the intro paragraphs:
Suppose Fred is given a programming assignment. Fred types in some code, tries it, and it seems to work. Fred types in some more code, tries it, and it still seems to work. After several weeks of coding this way, the program suddenly stops working, and after hours of trying to fix it, he still doesn’t know why. Fred may well spend a significant amount of time chasing this piece of code around without ever being able to fix it. No matter what he does, it just doesn’t ever seem to work right.
Fred doesn’t know why the code is failing because he didn’t know why it worked in the first place. It seemed to work, given the limited “testing” that Fred did, but that was just a coincidence. Buoyed by false confidence, Fred charged ahead into oblivion. Now, most intelligent people may know someone like Fred, but we know better. We don’t rely on coincidences—do we?
What's true for manual coding is doubly true for AI-assisted coding. If you never understood why something worked, you're stuck the moment it goes awry.
It's okay to vibe code some one-off personal-use tool, but certainly not a moving target like a client-facing web app. Who cares if the AI gets you, say, 50% or 60% or even 80% of the way there if the resulting code is of such low quality that finishing the remainder is near impossible?
Wide and Narrow Thinking
Why are image-generation models fantastic at generating photorealistic content but hilariously bad at text?
And why can ChatGPT figure out challenging coding tasks but not reliably tell you whether 9.11 or 9.9 is larger?
It comes down to narrow versus wide thinking, in the loosest sense. If you ask DALL·E or Midjourney for a Renaissance painting of a cat dressed like Louis XIV, there are many paths the AI can take to get there. But if you ask it to add a text label, the space of acceptable outputs is vastly smaller.
The same applies to mathematical and logical reasoning. The path of acceptable steps to take is much smaller, and we're expecting quite a lot from an AI if it has to reconcile this very focused, narrow, discrete thinking with its more random, free-flowing nature.
Tools to the rescue
Specifically for language models, the most promising approach to fix this is using tools (like we've seen in the AI Agent case; just realize that there's nothing mystical about them). The "wide thinking" LLM will recognize when it's dealing with a math problem and can then defer to a calculator or write Python code to solve it. ChatGPT already does that, of course.
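As a rough sketch of what "deferring to a tool" can look like with the OpenAI function-calling API (the tool name and schema here are made up for illustration, and the sketch assumes the model actually chooses to call the tool):

```python
import json
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()

# A "narrow" tool the "wide-thinking" model can defer to for exact arithmetic.
tools = [{
    "type": "function",
    "function": {
        "name": "compare_numbers",  # hypothetical tool, defined by us
        "description": "Return which of two decimal numbers is larger.",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    tools=tools,
)

# If the model defers, it emits a structured tool call instead of guessing.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print("larger:", max(args["a"], args["b"]))  # the narrow, exact part happens in plain code
```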
Over time, I would imagine more tools getting integrated with an LLM so that it can focus on what it's good at (wide thinking) and defer to tools for the things it's not good at, or where more precision and repeatability are desired (narrow thinking).
I could imagine a few such cases, like pairing an AI code assistant with static analysis, automated refactorings, and other goodies. Every industry and job will have its own set of narrow tools to enhance AI assistants' usefulness and reliability.
The monkey on a pedestal
I love this story about where to focus your efforts as related by Astro Teller: "Tackle the monkey first." Here’s the short version:
Imagine your boss gives you three months to figure out how to
Get a monkey to recite Shakespeare...
...while standing on a pedestal.
Then, after a month or two, the boss checks in on progress, and you say: "Things are going great! Look at this amazing pedestal we made out of genuine Italian marble! Look at the intricate carvings!"
That sounds hilarious, but there is a tendency for all of us to procrastinate on hairy, audacious, vague things because, well, they're hard. It's no fun making no progress. Working on the low-hanging fruit gives us a sense of forward motion. But all that effort will be wasted if the uncertain thing turns out to be impossible or more complicated than anticipated.
That's why, on a recent project, we consciously pushed all the easy things to the very end, even though we could have completed them quickly. We didn't want them to take bandwidth away from the uncertain parts. After all, who cares about the font sizes on your web app if it just doesn't work?
It's time to stop polishing the pedestal and start training the monkey.
On AI Snobs
The ongoing hype around generative AI has led to an influx of tech influencers and enthusiasts. This, in turn, has led to an influx of snobs and cynics who will shake their fists at those who dare claim AI expertise without advanced degrees and experience with statistical methods and "standard" machine learning.
These AI snobs will give you a long study plan of all the things you have to master before putting anything AI-related into your LinkedIn headline:
Linear algebra
Vector calculus
Statistics
Classical ML methods (support vector machines, logistic regression, k-means clustering)
Stochastic Gradient Descent and other optimization methods
and on and on.
I call nonsense. First, if it's all about foundations, why stop at the math parts? I'd like to demand that anyone using a computer first learn about the quantum properties of semiconductors. 😬
Second, the best way to achieve valuable outcomes is to take a top-down approach. Use whatever tools are available, and only if you encounter sharp edges will you spend the time and effort to go deeper. (Note: The courses on fast.ai are a masterclass in this principle.)
Concrete examples
No-code platforms for rigging together LLM workflows and agents don't require any deep ML expertise. If those fit your bill, you're good to go, AI snobs be damned.
On the other hand, if you are contemplating an AI project with an uncharted course and an unknown approach, it's helpful if you can rely on someone who has extensive experience with different techniques.
Even if it's not hallucinating, it's hallucinating
You may have noticed that, occasionally, your trusty AI assistant makes stuff up. I've had GitHub Copilot invent code libraries that don't exist, for example. And then there was that case of the lawyer who leaned on ChatGPT to find some precedents and found out the hard (i.e. embarrassing) way that those were made up, too.
Those are called hallucinations, and a lot of effort goes into reducing how often LLMs produce them.
However, at their very core, all LLMs hallucinate everything. Let me explain (and for more detail, I highly recommend Andrej Karpathy's Deep Dive into LLMs. It's a three-hour video, so watch it over a couple lunch breaks...).
Before you can create a useful AI assistant like Claude or ChatGPT, you need a base model. That base model is a neural network trained to guess the next word (a token, actually, but let's keep it high-level) in the training dataset, which is basically the entire Internet.
For a given sequence of words, the base model returns a list of probabilities for possible follow-up words. These probabilities match the statistics of the training text. In short, we've got ourselves an internet text simulator. This base model hallucinates everything, all the time. All it does is answer the question, "How would a text that starts like this likely continue?"
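You can poke at this directly with a small open base model. Here's a minimal sketch using Hugging Face transformers and GPT-2, which is exactly such a next-word guesser with no post-training:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is a small, open base model: no post-training, just next-token statistics.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The best way to learn machine learning is"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Probabilities for the next token: "how would a text that starts like this likely continue?"
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tok.decode(int(idx))!r}: {p.item():.3f}")
```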
All the work that's been done on top of this base model is about clever tricks that turn an internet text simulator into something useful:
Post-training with hand-curated examples of what a good answer looks like so that instead of an internet text simulator, we get a helpful assistant simulator
Reinforcement learning where human (or AI) critics provide feedback on the answers
Adding examples to the training set where the AI assistant is allowed to (supposed to, really) say "I don't know"
Enhancing the model through tool use (e.g. internet searches)
It's all about constructing sufficiently tight reins to ensure that the most probable continuation of the underlying text is actually something useful. Even if it's all just made up.
This is useful to keep in mind when evaluating potential LLM use cases.
Keeping up via known unknowns
Happy Monday!
There's never a shortage of new things we have to learn about. New model, new framework, new benchmark, new tool. It's a delicate balance. If we never bother to keep up, we'll get left behind. If all we do is try to keep up, we'll never get anything done.
Here's something I've stumbled on that works for me. It's based on Donald Rumsfeld's much-ridiculed classification of things we know (or don't know).
What we (don't) know
Known knowns: That's just the stuff you know.
Known unknowns: Stuff you don't know. But at least you know that you don't know.
Unknown unknowns: Stuff you don't know. And you don't even know that you don't know.
With unknown unknowns, you don't even know what the question is and that you could be asking it.
Making the unknown known
Loosely keeping up with recent developments means turning the unknown unknowns into known unknowns. The first step here is just to pay attention and keep your eyes open. You're most likely doing that already! At this level, the question is relatively shallow: "What even is XYZ?" What might prevent us from digging further, though, is the sense that we'd have to spend a lot of time to gain a deep understanding and that this isn't sustainable given the number of new things to explore.
But what if all we do is push a little bit deeper so that our questions are a bit more nuanced? This can easily be achieved by a bit of browsing and skimming:
For a new tool, idly browse the documentation.
For a framework or package, flip quickly through the tutorial.
For a research paper, skim the abstract and call it a day.
It's enough to come out of this experience with many new things you don't know. But at least you'll know that! It gives your mind something to latch onto and, in the future, notice when it becomes relevant to what you're doing. Then, when you have validation that diving deeper will be helpful, you can spend the time and feel good about it.
When AI really wanted to sell me more power drills
Here's another short example of where an "obvious" application of AI doesn't lead to good business outcomes:
I once bought a nice Bosch power drill from Amazon. For a long time after, Amazon would relentlessly push more power drills:
"Here are some power drills you might want to buy. Check out our deals on power drills. Hey, have you seen these latest power drills?"
But given that I had just bought my new drill, yet another drill would be among the least likely things I'd buy!
A typical recommendation algorithm, in straightforward terms, works like this (sketched in code after the list):
Look at all the stuff user A has bought
Find other users whose purchase history is similar to user A's.
Identify things those users have bought that user A hasn't yet bought and recommend those.
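Here's a minimal sketch of that idea, user-based collaborative filtering on a made-up purchase matrix:

```python
import numpy as np

# Toy user-item purchase matrix: rows = users, columns = products (1 = bought).
# The data is made up for illustration.
purchases = np.array([
    [1, 1, 0, 0, 1],   # user A
    [1, 1, 1, 0, 1],   # user B (similar to A)
    [0, 0, 1, 1, 0],   # user C (not similar to A)
])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

target = 0  # user A
similarities = np.array([cosine(purchases[target], purchases[u]) for u in range(len(purchases))])
similarities[target] = 0  # don't compare A with themselves

# Score each item by the purchases of similar users, then drop what A already owns.
scores = similarities @ purchases
scores[purchases[target] == 1] = 0
print("Recommend item:", int(np.argmax(scores)))
```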
Of course, Amazon’s recommendation system is more complex than just what I've described (so-called collaborative filtering). Still, this misfire shows that even sophisticated AI can get things wrong. (They do apply more sophisticated content-based recommendations these days.)
This type of recommendation works great for books, CDs, movies: Categories with a wide range of items that can be sorted by genre or other matters of taste. In that regard, it mimics human recommendations: If I love historical fiction and you love historical fiction, we can share book recommendations, with additional purchases being likely.
However, collaborative filtering fails for categories like power tools or consumer electronics, where purchases are one-time and driven by need. If I buy a Bosch drill, I don't want another drill. I want things to help me get the most out of the one I just bought.
What Amazon should have been recommending:
Books on DIY projects
Bandaids 😬
A set of drill bits (square ones, like we use here in Canada)
Instead of spamming me with drills, Amazon could have predicted what I needed next, turning a one-time sale into a series of useful purchases.
And to tie it all back to making AI work for you: Think all the way through to the intended outcome and don’t lazily stop short.