When building software, especially SaaS, sticking to manual processes in DevOps is like bringing a knife to a gunfight. Manual processes not only slow down operations but also open the door to mistakes that can escalate into significant problems. This tricky spot, where there’s not enough automation and too much room for human error, isn’t a good place to be if your organization wants to grow smoothly and reliably.
Many organizations have adopted “shift left” as the primary driver for automation through CI/CD pipelines. This is good, but without automation at every step of the software deployment process, gaps exist and they can be glaring.
In this article, we’re going to introduce two terms you may not have encountered before: ‘Shift Right’ and the ever-so-dangerous ‘Dead Zone’ of automation. We’ll explore how these two types of DevOps shifts can work together to alleviate DevOps toil and improve the quality of the services you deliver.
Understanding ‘Shift Left’: The First Piece of the Automation Puzzle
Let’s revisit the concept of “shift left.” In DevOps, shifting left means moving tasks like testing, security checks, and quality assurance to earlier in the development process—basically, closer to the start line where the code is written. Instead of discovering bugs or security issues right before launch (or worse, after it’s live), developers catch and address them upfront. For DevOps practitioners, this approach has been a real game-changer. It speeds up development cycles, reduces the cost and hassle of late-stage fixes, and leads to more robust, reliable SaaS. By bringing automation tools into the early stages, teams can nip problems in the bud, making the whole process smoother and more efficient.
You are likely already shifting left. For example, if you rely on build-time container security scanning, you are in effect shifting vulnerability management closer to your source code repository, making the process “GitOps” friendly.
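As a hedged illustration of what that looks like in practice, here is a minimal CI step (GitHub Actions syntax, using the Trivy scanner action) that fails the build when an image contains serious vulnerabilities. The image name is illustrative:

```yaml
# Illustrative CI step: scan a container image at build time and fail
# the pipeline on HIGH or CRITICAL findings. "my-app-image:latest" is
# a placeholder for your own image reference.
- name: Scan container image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: my-app-image:latest
    severity: HIGH,CRITICAL
    exit-code: "1"   # non-zero exit fails the job when findings match
```

Because the scan runs before anything is pushed to production, vulnerable images never leave the pipeline.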
By integrating automated unit tests into your development workflow and running them every time new code is committed, you catch bugs earlier in the process. That, in turn, reduces the chances of issues cropping up later in production.
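A minimal sketch of this pattern, again in GitHub Actions syntax and assuming a Python project whose tests run with pytest (the project layout and tool choice are assumptions, not prescriptions):

```yaml
# Illustrative workflow: run the unit test suite on every push and
# pull request. Assumes a Python project with tests under pytest.
name: unit-tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt pytest
      - run: pytest   # any failing test fails the commit's checks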
Similarly, implementing static code analysis tools that scan for code quality and security issues as developers write code is a form of shifting left. This practice provides immediate feedback, allowing developers to address potential problems on the spot rather than during later stages of the development cycle.
All these automation methods share a common characteristic: when issues are found, it’s up to the developer to fix them. This often leads to the use of tracking tools like Jira to log issues for later resolution. In essence, when things go wrong, human intervention is still required—a point we’ll delve into further later on.
The Power of ‘Shift Right’ Automation
If ‘Shift Left’ is all about integrating processes closer to the source code, ‘Shift Right’ offers a complementary approach by tackling challenges that arise after deployment. Some decisions simply can’t be made early in the development process. For example, which cloud instances should you use? How many replicas of a service are necessary? What CPU and memory allocations are appropriate for specific workloads? These are classic ‘Shift Right’ concerns that have traditionally been managed through observability and system-generated recommendations.
Consider this common scenario: when deploying a workload to Kubernetes, DevOps engineers often guess the memory and CPU requests, specifying these in YAML configuration files before anything is deployed. But without extensive testing, how can an engineer know the optimal settings? Most teams don’t have the resources to thoroughly test every workload, so they make educated guesses. Later, once the workload has been running in production and actual usage data is available, engineers revisit the configurations. They adjust settings to eliminate waste or boost performance, depending on what’s needed. It’s exhausting work and, let’s be honest, not much fun.
‘Shift Right’ embraces the idea of not being tied to specific configuration settings at build time. Sure, we start with initial resource requests, but automation should adjust them swiftly. Runtime systems are better positioned to observe and make real-time adjustments. By shifting right in this context, we achieve better application stability and lower costs because we eliminate waste as it happens. It’s more effective to set scaling policies or guardrails for what automation can do, rather than sticking to rigid configuration settings.
Conflicting Approaches?
So, what happens when ‘shift left’ and ‘shift right’ strategies bump into each other? Let’s consider the example of workload rightsizing. How do we ensure that configuration changes made on the ‘right’ (post-deployment automation) don’t conflict with the source code on the ‘left’ (our pre-deployment configurations)?
Consider the deployment file below, where resource requests and limits are hard-coded:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: my-app-image:latest
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
```
In this deployment file, the CPU and memory requests and limits are explicitly defined. If an automated system adjusts these settings at runtime to optimize performance or reduce costs, it creates a mismatch between what’s in your source code and what’s actually running in production. Continuous deployment tools like ArgoCD might flag these discrepancies as “drift” and could even revert the changes to match the original settings defined in code.
But here’s the catch: in Kubernetes, a Deployment creates Pods, but pods themselves don’t exist as entities in your source code. They are generated based on the Deployment specifications. This is actually beneficial for workload rightsizing because we can adjust pods at runtime without violating the consistency of our Infrastructure as Code (IaC).
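One common way to keep a GitOps controller from fighting a runtime autoscaler is to tell it which fields the runtime owns. In Argo CD, for example, this can be sketched with `ignoreDifferences` on the Application resource (the application name and the container path here are illustrative):

```yaml
# Illustrative Argo CD Application fragment: treat the first container's
# resource requests/limits as runtime-managed, so changes there are not
# reported as drift or reverted during sync.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app            # illustrative name
spec:
  # source, destination, and sync settings omitted for brevity
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/template/spec/containers/0/resources
```

This keeps the rest of the Deployment under strict GitOps control while carving out exactly the fields that ‘Shift Right’ automation is expected to change.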
So, if we don’t want to be tied down by the specific values in the configuration file above, what should we be setting instead? In CAST AI’s approach to workload rightsizing, we allow DevOps engineers to define scaling policies using annotations:
```yaml
annotations:
  workloads.cast.ai/vertical-autoscaling: "on"
  workloads.cast.ai/scaling-policy: "my-scaling-policy"
  workloads.cast.ai/apply-type: "immediate"
```
These settings are declarative rather than imperative—a style that’s very much in line with Kubernetes philosophy. Instead of specifying exact resource values, we’re defining the desired behavior at runtime. We tell the system what we want it to achieve and let automation handle the specifics. This way, we focus on setting policies or guardrails that guide the automation, rather than hard-coding low-level attribute values.
This is a great example of a ‘Shift Right’ strategy that complements ‘Shift Left’. We allow the runtime to set specific attributes of workloads, while the source code contains the guardrails and policies that ensure the runtime follows a reasoned strategy capturing the DevOps engineer’s desired state.
Considering ‘Shift Right’ for Container Security
We typically use image scanners integrated into the CI/CD pipeline to scan containers at build time, catching vulnerabilities before they ever reach production. This is how teams ‘shift left’ for security. However, if we stop there, we’re leaving our production environment exposed to container drift.
Let’s go over a significant yet common example: imagine your data science team wants to deploy Jupyter notebooks within Kubernetes, attached to GPU nodes. Sound familiar? This setup is popular because most local environments lack the GPU capacity to run meaningful machine-learning experiments.
You’ve scanned the Jupyter container image, and it comes up perfectly clean in your scanning tool: zero vulnerabilities.
So how does container drift impact us? Python environments often lack certain packages, so we have the convenient !pip install command to add packages as needed. This is where things can go wrong. Well-intentioned data scientists can install anything, including potentially harmful software like crypto miners, even in a container image that initially had zero vulnerabilities. Threats like this can’t be stopped at build time; shift left isn’t an option here. To prevent these kinds of accidental or malicious activities, we need to shift right and leverage runtime detection.
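Runtime enforcement can take many forms. As one hedged illustration, a standard Kubernetes NetworkPolicy can restrict a notebook namespace’s egress so that even a freshly installed miner cannot reach an external mining pool. The namespace name is an assumption, and a real policy would also need to allow DNS and any legitimate external endpoints:

```yaml
# Illustrative NetworkPolicy: pods in the "data-science" namespace may
# only send traffic to other pods inside the cluster, cutting off
# arbitrary external connections such as mining pools.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-notebook-egress
  namespace: data-science        # illustrative namespace
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}  # permit in-cluster traffic only
```

A guardrail like this doesn’t replace runtime detection, but it shrinks the blast radius while automation decides what to do with the offending workload.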
Fig: A screenshot from a Jupyter Notebook application that connects to XMRig, a common crypto mining server.
But detecting these types of vulnerabilities isn’t enough. What then? Do we need a DevOps engineer to manually kill the malicious pod? How do we know if any damage was inflicted, or data exfiltrated?
A good runtime solution will not only detect malicious behavior like the example above but also automatically take action to kill the offending workload and prevent it from re-entering the system. Even better, it would be ideal if such workloads could be quarantined in a running state, giving SecOps a chance to examine what the malicious workload is doing and how it got into the environment. We call this type of automation policy enforcement. Without it, we’re still relying on individuals to do the right thing.
Avoiding the Dead Zone
Having explored both ‘shift left’ and ‘shift right’ strategies, it’s clear that the dead zone, that middle ground dominated by manual processes, is a place we want to avoid. This zone is fraught with inefficiencies, human errors, and delays that can hinder growth and innovation. By embracing automation on both ends of the development lifecycle, we eliminate these pitfalls. Automation keeps us out of the dead zone by ensuring that tasks are handled quickly and consistently, freeing up our teams to focus on what truly matters: delivering value and driving the organization forward.
The Future of Automation: Left and Right Working Together
The future of automation in DevOps lies in creating bridges between the “shift left” and “shift right” strategies. A prime example of this is Dependabot, a tool provided by GitHub that automates dependency updates. Dependabot scans your project’s dependencies for outdated versions or vulnerabilities and automatically generates pull requests to update them.
Why is this important? Because it tackles a tedious yet critical task — keeping dependencies up-to-date to ensure security and compatibility. Dependabot automates the detection and initial update process, but it doesn’t remove humans from the equation. Instead, it involves them at the right moment, allowing developers to review and merge the changes. Machines handle the heavy lifting of identifying and proposing updates, while humans provide oversight and make informed decisions.
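For teams that haven’t set this up, enabling Dependabot is a small configuration file checked into the repository. A minimal sketch, assuming an npm project at the repository root (ecosystem and schedule are choices you would tune):

```yaml
# Illustrative .github/dependabot.yml: check npm dependencies weekly
# and open pull requests for outdated or vulnerable packages.
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 5
```

The pull requests it opens flow through the same review and CI gates as any human-authored change, which is exactly the left-right bridge described above.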
This approach exemplifies how automation can 10x outcomes without sacrificing control. It acknowledges that while some issues originate on the left or right sides of our pipeline, their solutions need to be fully integrated back into the source code to be effective. Automation initiates the fix, but human expertise ensures it’s implemented correctly.
At CAST AI, we’re excited about expanding this concept. For instance, suppose our system identifies an optimal database index that could significantly improve performance. Creating the index at runtime offers immediate benefits, but it’s not the complete solution. To make this improvement lasting and maintainable, we need to incorporate the change back into the application source code and move it through the entire pipeline.
This is where we see significant innovation potential that will help our customers avoid the dead zone. By automating the identification of such optimizations and facilitating their integration back into the development process, we will create a feedback loop that continuously enhances our customers’ software. We will ensure that improvements aren’t just temporary patches but become part of the application’s foundation.
Final Thoughts
Automation is no longer a luxury; it’s a necessity for scalable success. By leveraging both ‘shift left’ and ‘shift right’ strategies and ensuring they work together, we can eliminate toil and reduce the potential for human error. The key is to empower our teams with tools that automate the mundane and highlight the impactful, allowing us to focus on what truly matters: building great software.
To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon North America, in Salt Lake City, Utah, on November 12-15, 2024.