The Stakes Are High
Taking our last few posts (on AI, Iterations, Product/Market Risk, Evals) together, there's an important observation:
With AI, the stakes of getting things right from the beginning are higher:
Evaluating an AI system is harder than testing a traditional software system (see the sketch after this list).
With AI, it's less certain than with traditional software that what you want to build is even feasible.
Especially when you're training your own models, feedback loops are longer: you spend more time and resources before learning whether you're on the right track.
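To make the first point concrete, here's a minimal sketch in Python. The `summarize` function and the keyword-coverage scoring are hypothetical stand-ins for a real model call and a real grading rubric; the point is the shape of the check, not the specifics.

```python
# A minimal sketch, assuming a hypothetical `summarize` function stands in
# for the AI system and keyword coverage stands in for a real rubric.

def add(a: int, b: int) -> int:
    return a + b

def test_add() -> None:
    # Traditional software: one input, one exact expected output.
    assert add(2, 2) == 4

def summarize(text: str) -> str:
    # Placeholder for the system under evaluation (e.g. an LLM call).
    return text[:100]

def eval_summarize(samples: list[dict], threshold: float = 0.8) -> bool:
    # AI system: there is no single correct output, so we grade many
    # samples and compare an aggregate score to a threshold instead of
    # asserting exact equality on one case.
    scores = []
    for sample in samples:
        output = summarize(sample["input"]).lower()
        keywords = sample["required_keywords"]
        scores.append(sum(kw.lower() in output for kw in keywords) / len(keywords))
    return sum(scores) / len(scores) >= threshold
```

Even this toy version hints at the extra weight: an eval needs a dataset, a scoring heuristic or judge, and a judgment call about what aggregate score counts as good enough, where a unit test needs only an assertion.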
It's well known from traditional software that when testing is slow and painful, it doesn't get done as much as it should. It's also known that when feedback loops are long, the risk of building something that doesn't meet the user's needs is higher.
So, going back to our principles, successful AI Engineering requires strong discipline and the will to uphold them. The temptation will be strong to forge ahead without evals, without trying hard to shorten feedback loops, and without keeping the design simple. But resist we must, so that we don't design ourselves into dead ends and so we can deliver real value for the long term.