A study of 800 software developers on large engineering teams that have adopted GitHub Copilot, a generative artificial intelligence (AI) tool, finds that the tool is delivering only limited productivity gains.
Conducted by Uplevel Data Labs, a provider of a software engineering intelligence platform, the study, published today, compared a test group against a control group and also notes that the number of vulnerabilities created by developers using GitHub Copilot increased.
Specifically, developers using GitHub Copilot saw no improvement in pull request (PR) cycle time or in overall throughput as measured by the number of PRs merged. They did, however, experience a 41% increase in bugs within pull requests.
Matt Hoffman, a product manager and data analyst at Uplevel, said those results suggest that overall code quality is being adversely affected by an AI tool trained on code of varying quality that OpenAI collected from across the Web. GitHub Copilot is based on a large language model (LLM) originally developed by OpenAI that GitHub licenses through its parent company, Microsoft.
At the same time, the study suggests that burnout, as measured by the amount of time developers spend working outside of standard hours, is decreasing. That reduction isn’t attributable to generative AI, however, because both the control and test groups saw a decline in time spent working outside of standard hours: developers who did not have access to GitHub Copilot saw a 28% reduction, compared to a 17% decline for those who did.
It’s still early days when it comes to using generative AI tools to write code. The first generation of these tools will steadily be replaced by a new wave trained on code that has been vetted for quality. Those tools will also take advantage of more advanced reasoning engines and AI agents to automate workflows.
In the meantime, however, it’s clear that human developers need to closely scrutinize the code generated by the first generation of these tools. Beyond containing more vulnerabilities, generated code may not run as efficiently as code written by a professional developer, resulting in increased costs.
Of course, generative AI tools also make coding more accessible to a wider range of developers, including those who previously may have stopped coding simply because the effort required was too tedious. Overall, the amount of code being created by developers has increased. It’s just that not all of that code has affected PR cycle times, for better or worse, according to the Uplevel study. It remains to be seen, however, whether the increased number of vulnerabilities in that code leads to a rise in application security incidents.
Ultimately, each organization and the developers who work for it will need to determine how much to rely on generative AI tools. After all, it’s not the machines that generated the code used in an application that will be held accountable for its overall quality, but rather the DevOps teams that allowed that code to be incorporated into the software build in the first place.