The cognitive bias known as the streetlight effect describes our human tendency to look for clues where it is easiest to search, regardless of whether that's where the answers are.
For decades in the software industry, we have focused on testing our applications under the reassuring streetlight of GitOps. It made sense in theory: Wait for changes to the codebase made by engineers, then trigger a re-test of your code. If your tests pass, you are good to go.
Yet, we have known for years that the vast (and growing) majority of changes to an application come from outside the codebase: think third-party libraries, open-source tools and microservices galore. Your application is no longer composed only of code that you or your team wrote and control.
GitOps was Already Broken: AI is the Final Nail in the Coffin
When your application breaks, why are you searching for the issue in your lines of code, knowing full well that most of your changes are coming from beyond the repository? This is our modern-day streetlight effect, and it is a recurring reality for all software teams.
However, AI is here to solve everything, right? Maybe, but the path to AI nirvana is littered with complexity. If we thought GitOps was the answer to all of life’s breaking changes, what do we do when models, training datasets and evaluations don’t fit into the repo? Whereas a few years ago, managing reliability and availability purely through version control systems (VCS) was an adorable but conceivable dream, now it is pure fantasy.
GitOps was already broken. Adding AI to the mix is the final nail in the coffin. The time has come, and we can’t pretend anymore: We must expand how we think about change in the world of software.
A Brave New World
Imagine you have built a cool, new feature using a large language model (LLM). You pick a model, call an API on Hugging Face or OpenAI, and then build a nice user interface. Voilà! You just created a chatbot that answers questions and enables a delightful customer experience. However, you will inevitably hit a wall as you quickly realize your new app is super expensive, untestable and a recipe for on-call wakeups.
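For concreteness, that first experiment often looks like little more than the sketch below. This is only an illustration: it assumes the OpenAI Python client, and the model name, system prompt and example question are placeholders, not recommendations.

```python
# A minimal sketch of the "fun experiment" stage: one model, one API call, no tests.
# Assumes the official OpenAI Python client; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer_question(question: str) -> str:
    """Send a customer question to the model and return its free-form answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: whichever model you picked
        messages=[
            {"role": "system", "content": "You are a helpful support assistant."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


print(answer_question("How do I reset my password?"))
```

A few dozen lines like this are enough to demo the feature, and they are also enough to hit that wall.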
Why? Because the way we build, test and release AI-enabled software is different. It is one thing to experiment with how to bring AI into your product; it is another thing to implement it in production, at scale, and with confidence. What was a fun experiment is now just another thing that can bring down your product. How will you know when the model changes underneath you? How can you ensure your customers are still getting the right answers?
The stakes are high. Engineering organizations at this very moment are facing tremendous pressure from the market, leadership and their boards to ‘smartify’ all experiences in the product using GenAI. For the first time in a long time, the business side of the house is an equal partner in driving technology adoption.
This wasn’t the case with other recent technological waves like CI/CD, Kubernetes or Docker. These were engineering practices and tools that made it easier to drive business outcomes. Now, business folks are catching the magic and putting pressure on their teams to jump in.
And once they do? Engineering teams adding AI to their production-scale applications are about to hit a wall. They will take one step and realize they cannot make this work fit into a normal CI/CD flow. Testing software in an AI world has taken on new dimensions.
How Confident is Confident Enough?
An AI-enabled piece of software has a model, testing datasets and evaluators. All of this can change. For example, a developer could say, “I’d like to refine how I prompt the model to improve my feature.” So, they begin swapping out one prompt for a new and improved one. That single change means a new set of tests must be run. Is the output still ethical? Precise? Accurate? What qualifies as a good answer from the LLM has to be re-established to ensure it is still valid. And because the outputs are never deterministic, how do you determine the confidence level?
How you test AI-powered software differs from how you test deterministic software. The way we test will no longer be, “If I call this function, I know I’ll retrieve an output of zero.” It now becomes, “Well, if I prompt with this type of context, I should receive something like this other value.” We are now dealing in confidence scores, not binary answers: Does it fall within a threshold of acceptability? If so, carry on.
Testing in an AI-flavored universe must follow a data science approach: take a prompt, call the model, evaluate the answer, use a second LLM to evaluate that evaluation, and so on. It gets complicated quickly.
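To make that flow concrete, here is a hedged sketch of a single threshold-based test: call the model under test, have a second model score the answer against a rubric, and pass only if the score clears an acceptability threshold. It assumes the OpenAI Python client; the model names, the rubric and the 0.8 threshold are illustrative choices, not recommendations.

```python
# A sketch of non-deterministic testing: score the answer, compare to a threshold.
# Model names, the rubric and the 0.8 threshold are illustrative assumptions.
import json

from openai import OpenAI

client = OpenAI()


def ask_model(prompt: str) -> str:
    """The model under test: the prompt we may later refine."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder for the model behind the feature
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def judge(question: str, answer: str) -> float:
    """A second LLM evaluates the first one's answer and returns a 0-1 score."""
    rubric = (
        "Score the answer to the question from 0.0 to 1.0 for accuracy, "
        'precision and tone. Reply with JSON: {"score": <float>}.'
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[{"role": "user", "content": f"{rubric}\n\nQ: {question}\nA: {answer}"}],
        response_format={"type": "json_object"},
    )
    return float(json.loads(resp.choices[0].message.content)["score"])


def test_refund_policy_answer():
    question = "What is your refund policy?"
    score = judge(question, ask_model(question))
    # Not "output == 0" but "does it fall within a threshold of acceptability?"
    assert score >= 0.8, f"confidence score {score:.2f} below threshold"
```

In practice you would run a test like this many times and look at the distribution of scores, because a single sample from a non-deterministic system proves very little.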
Rethinking Pipeline Triggers to Automate the Complexity of AI
The explosion of AI has made it more important than ever to be able to build, test, evaluate, deploy, train and monitor applications. Frankly, if you are solving the software delivery problem by building a new VCS in 2023, you are probably doing it wrong. AI demands a fundamental reevaluation of our software development pipelines. Traditional methods that once served as the backbone of software delivery are becoming increasingly inadequate in the face of AI.
While VCS has been instrumental in managing changes and enabling collaboration in software development, the AI era necessitates more. As we have discussed, AI-driven development introduces new variables like model versioning, dataset changes and algorithmic updates, which traditional VCS does not effectively manage. These elements require continuous and dynamic integration and delivery processes tailored to accommodate the non-linear and often unpredictable nature of AI models.
Moreover, deploying and operating AI models in production environments introduces new challenges. The question is no longer only whether the code builds and the tests pass; it is about the behavior and performance of the models themselves. This shift requires a new kind of pipeline trigger that is sensitive not only to code changes but also to changes in data, model behavior and external dependencies. These triggers must be intelligent and adaptive, capable of initiating a series of automated tests and evaluations to ensure the model's performance remains aligned with its intended function.
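One way to picture such a trigger, purely as a sketch: fingerprint everything that can change the application's behavior, not just the commit SHA, and start an evaluation run whenever any fingerprint moves. The tracked files and the run_evaluation_pipeline() hook below are hypothetical stand-ins for whatever your platform actually exposes.

```python
# A sketch of an AI-aware pipeline trigger: react to changes in code, data,
# model version or dependencies, not just to a git push. The tracked files
# and run_evaluation_pipeline() are hypothetical stand-ins.
import hashlib
import json
from pathlib import Path


def run_evaluation_pipeline() -> None:
    """Placeholder for kicking off the real build/test/evaluate pipeline."""
    print("Change detected: starting evaluation run")


def current_state() -> dict:
    """Collect everything that can change the application's behavior."""
    return {
        "code": Path("git_head.txt").read_text().strip(),          # commit SHA
        "model": Path("model_version.txt").read_text().strip(),    # provider model ID
        "eval_dataset": Path("eval_dataset_version.txt").read_text().strip(),
        "dependencies": Path("requirements.lock").read_text(),
    }


def fingerprint(state: dict) -> str:
    """Hash the combined state so any single change is detectable."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def maybe_trigger(last_seen: str) -> str:
    """Compare fingerprints and trigger the pipeline if anything moved."""
    now = fingerprint(current_state())
    if now != last_seen:
        run_evaluation_pipeline()
    return now
```

The mechanics matter less than the principle: the trigger watches the model, the data and the dependencies with the same seriousness it watches the code.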
Furthermore, training and monitoring AI models require a more nuanced approach. The pipeline must include mechanisms for continuous training, where models are regularly updated with new data, and monitoring, where models are evaluated for accuracy, fairness and drift. This calls for a holistic view of the software delivery process, where AI models are treated as integral, living components of the software, requiring ongoing care and attention.
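For the monitoring half, a minimal sketch of drift detection might compare recent evaluation scores against a baseline window and flag the model for retraining when quality slips; the tolerance value and the reaction below are illustrative assumptions, not a prescribed policy.

```python
# A sketch of drift monitoring: compare recent evaluation scores to a baseline
# window and flag the model when quality slips. The tolerance is an assumption.
from statistics import mean

DRIFT_TOLERANCE = 0.05  # assumed acceptable drop in mean score before reacting


def drift_detected(baseline: list[float], recent: list[float]) -> bool:
    """Return True if the recent mean score drops more than the tolerance."""
    return mean(baseline) - mean(recent) > DRIFT_TOLERANCE


def monitor(baseline: list[float], recent: list[float]) -> None:
    if drift_detected(baseline, recent):
        # In a real pipeline this would raise an alert and schedule continuous
        # training on fresh data; here it simply reports the finding.
        print("Drift detected: schedule retraining and re-run the evaluations")
    else:
        print("Model quality within tolerance")


# Example: evaluation scores from two time windows produced by the test step.
monitor(baseline=[0.91, 0.88, 0.90, 0.93], recent=[0.81, 0.79, 0.84, 0.80])
```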
Embracing Change in the Era of AI
As we wave goodbye to the era of GitOps and step into the dynamic landscape of AI-powered software development, it is evident that the rules of the game have changed. Our industry is at a pivotal moment: the shift from the familiarity of GitOps to the ever-changing realm of AI.
The challenges are as exciting as they are daunting. Integrating LLMs and other AI tools into our software doesn’t just add complexity; it redefines our approach to building, testing and deploying applications. But with great challenges come great opportunities. The push to incorporate AI into products isn’t just a trend; it is a testament to the transformative power of AI in creating smarter, more intuitive user experiences.
As engineers and developers, we are at the forefront of this revolution, navigating uncharted territories with the potential to redefine how software interacts with the world. Adapting to AI in production-scale applications means rethinking our CI/CD flows, embracing new testing methodologies, and constantly evaluating whether AI outputs remain ethical, precise and accurate. As we stand at the cusp of this transformative era, let us embrace its complexities and possibilities, forge ahead with innovation and resilience, and shape a future where technology not only meets the demands of the present but also inspires the possibilities of tomorrow.