The True State of AI-Assisted Development in 2026
A research-driven examination of what AI coding tools actually deliver — productivity gains, quality trade-offs, trust erosion, and what the data says engineers should do differently.
The promise was clear: AI would make software engineers dramatically more productive, remove the drudgery of boilerplate, and let developers focus on the work that actually requires human judgment. Three years into mass adoption, enough data exists to evaluate that promise honestly — not with vendor press releases, but with controlled studies, longitudinal analyses, and the self-reported experiences of nearly 50,000 developers.
The picture that emerges is more complicated than either the optimists or the skeptics predicted. Productivity gains are real and measurable. So is the accumulation of technical and security debt that trails behind them. And perhaps most revealingly, developer trust in these tools is falling even as adoption climbs — a pattern that has no precedent in the history of developer tooling.
This article synthesizes the most significant research published between 2021 and early 2026 to answer one question with precision: what is AI-assisted development actually doing to the craft of software engineering?
I. Adoption: A Tool That Became Infrastructure
The speed of AI coding tool adoption has been extraordinary. GitHub's own infrastructure tells the story: GitHub Copilot surpassed 20 million cumulative users by July 2025, up from 15 million just three months prior — adding 5 million users in a single quarter (Quantumrun Foresight, 2026). Enterprise adoption grew 75% quarter-over-quarter in Q2 2025, with over 50,000 organizations now running the tool at scale.
Stack Overflow's 2025 Developer Survey, drawing on 49,009 responses from 177 countries, found that 84% of developers now use or plan to use AI tools in their development process — up from 76% in 2024 and roughly 70% in 2023 (Stack Overflow, 2025). Among professional developers, 51% report using AI tools every single day. This is not a pilot program. This is infrastructure.
The market reflects the same trajectory. The AI coding assistant market reached $7.37 billion in 2025, up from $4.91 billion the year prior, with projections pointing toward $30.1 billion by 2032 at a 27.1% compound annual growth rate. Gartner forecasts that 90% of enterprise software engineers will use AI coding assistants by 2028, up from under 14% in early 2024 (Quantumrun Foresight, 2026).
These are not adoption curves. They are vertical lines.
II. Productivity: What the Controlled Studies Actually Found
The headline productivity claim associated with GitHub Copilot — "55% faster task completion" — requires careful contextualisation. The figure originates from GitHub's own controlled experiment involving 95 professional developers. Participants using Copilot completed a specific HTTP server implementation task in an average of 1 hour 11 minutes, compared to 2 hours 41 minutes without assistance. Task success rates improved modestly from 70% to 78% (GitHub Research, 2024; Vladimir Siedykh, 2025).
These are real numbers from a real study. But the task involved — building a standalone HTTP server — is precisely the kind of bounded, well-specified, isolated implementation where AI tools perform best. The question of whether those gains generalise to the messy, context-dependent reality of production software development is where the research becomes more complicated.
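The arithmetic behind the headline is worth making explicit. Using only the completion times reported above:

```python
# Completion times reported in GitHub's controlled experiment
with_copilot = 1 * 60 + 11      # 1h 11m -> 71 minutes
without_copilot = 2 * 60 + 41   # 2h 41m -> 161 minutes

# "55% faster" here means 55% less time spent on the same task
reduction = (without_copilot - with_copilot) / without_copilot
print(f"{reduction:.1%}")  # 55.9%
```

The figure holds up for this one bounded task; whether it generalises is the open question.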
Accenture's randomised controlled trial with their own development teams provides enterprise-scale corroboration. Copilot users showed:
- 8.69% increase in pull requests per developer
- 11% increase in pull request merge rates
- 84% increase in successful builds
- 26.08% increase in completed tasks overall (Cui et al., 2024; cited in Moe et al., 2025)
The build quality improvement is particularly notable — it suggests that AI assistance is not just accelerating output, but reducing the frequency of broken commits reaching CI pipelines.
A separate enterprise analysis at ZoomInfo tracked 26 days of live Copilot usage in late 2024 and found an average of 6,500 suggestions and 15,000 lines of code suggested per day, with suggestion acceptance rates in the 25–35% range — consistent with GitHub's reported average acceptance rate of approximately 30% (ZoomInfo, arXiv 2025).
Shopify's deployment stands as perhaps the most cited enterprise success case: the company achieved over 90% adoption with developers accepting more than 24,000 lines of AI-generated code daily, attributing the result to deliberate internal evangelism and structured onboarding rather than top-down mandates (LinearB, 2025).
The Satisfaction Signal
Beyond velocity, the research consistently surfaces a less expected finding: AI tools significantly improve developer satisfaction and reduce cognitive fatigue on routine work.
GitHub's large-scale survey found that 60–75% of Copilot users reported feeling more fulfilled in their roles, less frustrated when coding, and better able to focus on meaningful work (GitHub Blog, 2024). Enterprise survey data echoes the pattern:
- 87% of developers reported that AI tools helped preserve mental effort during repetitive tasks
- 73% felt that the tools helped them maintain flow states during complex problem-solving
- 70% experienced reduced mental effort on routine tasks
- 54% spent less time searching for information or examples (Accenture, via SecondTalent, 2025)
These numbers should not be dismissed as soft metrics. Cognitive load and flow state quality are among the strongest predictors of sustained developer performance. A developer who finishes a day with reserve cognitive capacity makes better architectural decisions in the morning than one who spent the afternoon typing boilerplate.
III. The Quality Problem: What the Optimists Missed
Every productivity study cited above focuses on the near term. The quality research focuses on what happens next.
Code Churn and Duplication
GitClear's analysis of 211 million lines of code from 2020 to 2024, covering both private repositories and 25 major open-source projects, identified multiple signatures of declining code quality that correlate precisely with the timeline of AI tool adoption.
The most alarming: during 2024, GitClear tracked an 8-fold increase in the frequency of code blocks with five or more lines that duplicate adjacent code — a prevalence of copy-pasted code ten times higher than 2022 (GitClear, via LeadDev, 2025). Code duplication increased from 8.3% to 12.3% of all changed lines between 2021 and 2024.
Simultaneously, refactoring activity collapsed. Intentional code reorganisation — the metric GitClear uses to track the movement of code into reusable modules — fell from 25% to under 10% of changed lines during the same period (Pixelmojo, 2026). Developers using AI tools are writing more code and restructuring less of it.
A separate analysis by GitClear found that AI-generated code has a 41% higher churn rate than human-written code, meaning it is revised or discarded significantly more frequently after initial commit (LinearB, 2024). One Fortune 500 financial services company, cited in the same report, spent three months refactoring authentication modules that were functionally correct but violated their security architecture principles — a class of error that no automated test can catch.
By 2025, the average developer was checking in 75% more code than in 2022 (GitClear, via Dark Reading, 2025). The size of codebases is growing dramatically. The quality of the code within them is not keeping pace.
The Security Trajectory
The security data is more alarming than the quality data.
Early academic studies established the baseline. Pearce et al. (2021) found that approximately 40% of programs generated by GitHub Copilot were vulnerable to MITRE's 2021 Common Weakness Enumeration Top 25 Most Dangerous Weaknesses. Siddiq and Santos (2022) found that 68–73% of code samples from two different AI tools contained manually-detectable vulnerabilities (CSET Georgetown, 2024).
These findings have not improved materially with time. Veracode's Chris Wysopal, who had predicted the vulnerability rate would decline, put it bluntly: "It's been completely flat" (Dark Reading, 2025). Academic research in 2025 confirmed that 15–25% of AI-generated code contains security vulnerabilities, with missing input sanitisation and credential exposure ranking as the most common failure modes (Pixelmojo, 2026).
The enterprise-scale consequences became visible in 2025. Apiiro's analysis of Fortune 50 enterprises documented a 10-fold increase in security findings per month between December 2024 and June 2025 — rising from approximately 1,000 to over 10,000 monthly vulnerabilities (Apiiro, 2025). Privilege escalation paths increased by 322%. Architectural design flaws increased by 153%.
SonarSource's landmark study from August 2025, which analysed thousands of programming tasks completed by leading language models including those from OpenAI, Anthropic, and Meta, found that every model tested generated a high proportion of vulnerabilities classified at the "BLOCKER" level — the most severe classification in their framework (Medium/SonarSource, 2025).
The critical insight from this research is structural: AI tools excel at surface-level correctness. Trivial syntax errors in AI-written code dropped by 76%, and logic bugs fell by more than 60% (Apiiro, 2025). But the security gains at the shallow level are more than offset by a surge in deep architectural flaws — the kind that automated linters miss and that reviewers struggle to identify under time pressure.
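The gap between surface correctness and structural safety is easy to illustrate. A hypothetical example of the "missing input sanitisation" failure mode cited above: both functions below are syntactically clean and pass a happy-path test, but only one survives hostile input.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Compiles, lints clean, and returns the right rows for ordinary names --
    # but the f-string makes it injectable: name = "x' OR '1'='1" dumps the table.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised query: the driver treats the input as data, never as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

A deep SAST tool flags the first function; a syntax checker does not. That asymmetry is exactly what the Apiiro numbers describe.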
The DORA Paradox
Google's 2025 DORA Report introduced what may be the most important finding in this space: AI doesn't fix a team — it amplifies what's already there (SonarSource, citing DORA 2025).
The same report found that a 90% increase in AI adoption across its study population was associated with:
- A 9% climb in bug rates
- A 91% increase in code review time
- A 7.2% decrease in delivery stability (Google DORA, 2025; LeadDev, 2025)
The headline productivity number is 55% faster individual task completion. The organisational-level finding is that software delivery stability is declining at a statistically significant rate. These two facts are simultaneously true. This is the Developer Productivity Paradox: individual velocity increases while system-level reliability falls.
Forrester's 2025 Predictions Guide projected that by 2026, 75% of technology decision-makers will face moderate to severe technical debt — a direct consequence of accelerated AI-assisted code generation without commensurate investment in review and governance (Forrester, via SonarSource, 2025).
IV. The Trust Collapse
The most counterintuitive finding in the entire body of research is this: the more developers use AI tools, the less they trust them.
Stack Overflow's 2025 Developer Survey — with its 49,009 respondents across 177 countries — documented this precisely. Developer trust in AI accuracy fell from 40% in 2024 to 29% in 2025, an 11-percentage-point decline in a single year (Stack Overflow, 2025). Positive sentiment toward AI tools dropped from over 70% in both 2023 and 2024 to 60% in 2025 (Stack Overflow Developer Survey, 2025).
More developers actively distrust AI accuracy (46%) than trust it (33%). Only 3% report "highly trusting" the output. Experienced developers are the most skeptical: they show the lowest "highly trust" rate (2.6%) and the highest "highly distrust" rate (20%) (Stack Overflow AI section, 2025).
The number-one frustration, cited by 45% of respondents, is dealing with "AI solutions that are almost right, but not quite." This is not code that is obviously wrong — it's code that compiles, passes shallow tests, and fails in edge cases under production conditions. 66% of developers report spending more time fixing "almost-right" AI-generated code than anticipated (Stack Overflow, 2025).
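What "almost right" looks like in practice is mundane. A hypothetical but representative case: the function below is correct for every input a reviewer is likely to try, and wrong for the one production eventually supplies.

```python
def percent_change(old: float, new: float) -> float:
    # Plausible AI suggestion: correct for every ordinary input...
    return (new - old) / old * 100

assert percent_change(50, 75) == 50.0   # happy path: fine
# percent_change(0, 10)  -> ZeroDivisionError: the first period with no baseline

def percent_change_checked(old: float, new: float) -> float:
    # The edge case needs an explicit product decision; here, growth from a
    # zero baseline is reported as 100% (an assumption, not a universal rule).
    if old == 0:
        return 0.0 if new == 0 else 100.0
    return (new - old) / old * 100
```

The fix is trivial once seen; the cost counted in the survey data is the time spent discovering it was needed.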
When asked why they would still consult another human if AI could theoretically handle all coding tasks, developers gave three reasons in particular:
- 75.3% said they don't trust AI answers
- 61.7% cited ethical or security concerns
- 61.3% said they want to fully understand their own code (Stack Overflow Press Release, 2025)
Stack Overflow's Research Manager Erin Yepis offered a useful frame for the satisfaction data: "There is a correlation between developers that use AI tools daily or weekly and higher favorability scores." The trust collapse is concentrated among infrequent users, who encounter the failures without accumulating the expertise to route around them (Stack Overflow Dev Interrupted podcast, 2025).
In other words: the developers getting consistent value from AI tools are the ones who have learned, through repeated use, precisely where the tools fail.
V. The Emerging Stratification
The research, read in totality, points to a stratification that has not yet been widely discussed.
Elite practitioners — developers with deep domain expertise who use AI tools daily, maintain rigorous review processes, and have built institutional knowledge about failure modes — are extracting compounding productivity gains. They use AI to eliminate cognitive overhead on routine implementation, freeing attention for architectural judgment. Their output quality is higher, their delivery pace is faster, and their job satisfaction is elevated.
Average adopters — developers who use AI tools reactively, without structured review processes, without governance frameworks, and without sustained investment in understanding the failure modes — are experiencing the worst of both worlds: the technical debt accumulation documented by GitClear and DORA, without the productivity gains that would justify it.
The data from Accenture's deployment is instructive here: developers in the top usage quartile (75–100% AI usage) showed dramatically better outcomes than those in lower quartiles. High usage frequency and high satisfaction reinforce each other in a compounding feedback loop (SecondTalent, 2025).
The implication is uncomfortable: AI tools are widening the gap between the best and the average, not narrowing it.
VI. What the Evidence Recommends
Synthesising across the research — controlled experiments, enterprise deployments, longitudinal codebase analyses, and large-scale developer surveys — several practices consistently separate teams that extract sustainable value from those that accumulate debt.
Review every diff, every time. The moment human review becomes optional is the moment security and architectural debt begins to accumulate. The SonarSource and Apiiro data make this explicit: surface-level correctness is improving, deep structural vulnerabilities are worsening. Only human architectural judgment can close the gap.
Maintain a persistent context document. Whether it is .cursorrules, CLAUDE.md, or AGENTS.md, the single highest-leverage practice in documented enterprise deployments is maintaining a file that encodes project conventions, constraints, and past failure modes. This is the mechanism by which institutional knowledge gets injected into every request.
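What such a file contains varies by team. A hypothetical sketch (every path and rule below is invented for illustration):

```markdown
# CLAUDE.md (hypothetical example)

## Conventions
- TypeScript strict mode; no `any` in exported signatures.
- All database access goes through the repository layer; never inline SQL in handlers.

## Constraints
- Authorisation checks live in middleware, not in individual route handlers.
- Every new endpoint needs a rate-limit entry before merge.

## Known failure modes
- Generated date logic has repeatedly ignored time zones; use the UTC helpers.
- Suggested tests have asserted on implementation details; test behaviour instead.
```

The "known failure modes" section is the part that compounds: each reviewed mistake becomes a standing instruction rather than a repeated correction.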
Measure quality alongside velocity. The DORA findings are clear: teams that measure only individual task completion speed miss the system-level degradation that follows. Code review duration, bug rates, security finding frequency, and delivery stability must be tracked alongside PR throughput.
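A sketch of what tracking both sides of the ledger looks like as data, with a hypothetical `PullRequest` record standing in for whatever the team's tooling exports:

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class PullRequest:
    review_hours: float      # time from PR opened to approval
    caused_incident: bool    # linked to a production bug after merge

def delivery_metrics(prs: list[PullRequest]) -> dict[str, float]:
    """Report throughput together with its quality counterparts, never alone."""
    if not prs:
        return {"throughput": 0, "median_review_hours": 0.0, "change_failure_rate": 0.0}
    return {
        "throughput": len(prs),
        "median_review_hours": median(p.review_hours for p in prs),
        "change_failure_rate": sum(p.caused_incident for p in prs) / len(prs),
    }
```

None of these fields are exotic; the point is structural. A dashboard that shows throughput without review time and failure rate is the one the DORA data warns about.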
Invest in security scanning as a prerequisite, not a post-hoc step. Given the documented vulnerability rates in AI-generated code, automated static analysis — specifically tools capable of deep SAST beyond surface syntax checking — is no longer optional infrastructure. SonarQube's AI Code Assurance feature, which identifies AI-generated code and applies more stringent quality gates to it, represents the direction the industry is moving (SonarSource, 2025).
Preserve the junior pipeline. The research documents that 54% of engineering leaders plan to hire fewer junior developers due to AI efficiencies (Pixelmojo, 2026). The same research projects that by 2026–2027, accumulated technical debt will require remediation by engineers with 2–4 years of experience debugging and understanding why code works or fails — precisely the cohort that is currently not being hired or trained.
VII. Conclusion: The Productivity Paradox Is Real
The data accumulated between 2022 and early 2026 supports a clear thesis: AI coding tools deliver genuine, measurable productivity gains at the individual level. They also deliver genuine, measurable degradation at the system level when deployed without governance, review discipline, and security infrastructure.
These are not contradictory findings. They describe the same phenomenon from different vantage points. The developer who completes a task 55% faster while producing code with a 41% higher churn rate has accelerated the creation of work they will have to redo. The organisation that ships features faster while accumulating 10× the security vulnerabilities has traded velocity for fragility.
The counterintuitive trust collapse is perhaps the most honest signal in all of this research. Developers, by nature methodical and skeptical, are arriving through experience at a calibrated assessment: AI tools are faster, AI tools are useful, and AI tools require more verification than they appeared to need when they were new.
That calibration — not uncritical adoption, not reflexive rejection, but informed, disciplined integration — is what the evidence recommends.
References
- Accenture (2024). Randomised Controlled Trial: GitHub Copilot Enterprise Deployment. Cited in Cui et al. (2024) and SecondTalent (2025).
- Apiiro (2025, September). 4× Velocity, 10× Vulnerabilities: AI Coding Assistants Are Shipping More Risks. apiiro.com
- CSET Georgetown (2024, November). Cybersecurity Risks of AI-Generated Code. cset.georgetown.edu
- Forrester (2025). 2025 Predictions: Technology Decision-Makers and Technical Debt. Cited in SonarSource (2025).
- GitHub Blog (2024, May). Research: Quantifying GitHub Copilot's Impact on Developer Productivity and Happiness. github.blog
- GitClear (2025). AI Copilot Code Quality Research: Analysis of 211 Million Lines of Code, 2020–2024.
- Google DORA (2025). State of DevOps Report 2025. Cited in SonarSource (2025) and LeadDev (2025).
- LeadDev (2025, August). How AI-Generated Code Compounds Technical Debt. leaddev.com
- LinearB (2024). Is GitHub Copilot Worth It? ROI & Productivity Data. linearb.io
- Moe et al. (2025). Developer Productivity With and Without GitHub Copilot: A Longitudinal Study. arXiv:2509.20353
- Ox Security (2025, October). Army of Juniors: The AI Code Security Crisis. Cited in InfoQ (2025).
- Pearce et al. (2021). Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions.
- Pixelmojo (2026, January). The AI Coding Technical Debt Crisis: What 2026–2027 Holds. pixelmojo.io
- Quantumrun Foresight (2026, January). GitHub Copilot Statistics 2026. quantumrun.com
- Siedykh, V. (2025, August). AI Development Team Productivity: GitHub Research & Developer Community Studies 2025. vladimirsiedykh.com
- Siddiq, M.L. & Santos, J.C.S. (2022). SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques.
- SonarSource (2025, November). The Inevitable Rise of Poor Code Quality in AI-Accelerated Codebases. sonarsource.com
- Stack Overflow (2025). 2025 Developer Survey — AI Section. survey.stackoverflow.co/2025/ai
- Stack Overflow (2025, July). Press Release: Stack Overflow's 2025 Developer Survey Reveals Trust in AI at an All Time Low. stackoverflow.co
- Stack Overflow Blog (2026, February). Mind the Gap: Closing the AI Trust Gap for Developers. stackoverflow.blog
- Veracode (2025, September). AI-Generated Code Security Risks: What Developers Must Know. veracode.com
- ZoomInfo (2025, January). Experience with GitHub Copilot for Developer Productivity at ZoomInfo. arXiv:2501.13282