I spent a few evenings reading through the Stack Overflow 2025 survey results, and one number kept sticking with me: 84% of developers now use or plan to use AI coding tools, but only 3% say they highly trust the output.
That gap is staggering. It's not just skepticism; it's active, daily use of tools people fundamentally don't trust.
I've been thinking about why that gap exists. The more I read, the more I think it's not a simple story of "AI bad, human good." It's more complicated than that, and frankly, more interesting.
The "Almost Right" Trap
Stack Overflow asked roughly 49,000 developers what frustrates them most about AI coding tools. 66% gave the same answer: the code is almost right, but not quite.
This isn't the obvious failure mode. Bad code is easy to catch. What makes this insidious is that the code looks fine. It compiles. It passes your unit tests. It might even make it through code review if the reviewer is in a hurry. Then it hits an edge case in production and your payment pipeline goes down.
A developer in the survey put it bluntly: "I spend more time verifying AI suggestions than I would have spent writing the code myself."
I find that honest admission telling. This isn't developers being lazy. It's the opposite — they're working harder, just in a different phase of the process.
The psychology here is worth noting. When code looks 90% correct, your brain tends to fill in the remaining 10% as correct too. You stop reading carefully. You accept the PR. The bug slips through not because the AI fooled you, but because you fooled yourself.
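To make the "almost right" pattern concrete, here is a small hypothetical sketch; the function names and the bill-splitting scenario are my own illustration, not something taken from the survey. The first version looks fine and passes a happy-path test, but it silently loses a cent whenever the total doesn't divide evenly.

```python
def split_evenly(total_cents: int, parts: int) -> list[int]:
    """Almost right: floor division silently drops the remainder."""
    return [total_cents // parts] * parts


def split_evenly_fixed(total_cents: int, parts: int) -> list[int]:
    """Hand out leftover cents one at a time so the shares sum to the total."""
    base, remainder = divmod(total_cents, parts)
    return [base + 1 if i < remainder else base for i in range(parts)]


# The happy-path test passes for both versions.
assert sum(split_evenly(1000, 4)) == 1000
assert sum(split_evenly_fixed(1000, 4)) == 1000

# The edge case: 1000 cents split three ways.
print(split_evenly(1000, 3))        # [333, 333, 333] -> one cent vanishes
print(split_evenly_fixed(1000, 3))  # [334, 333, 333]
```

A reviewer skimming the first function sees nothing wrong, and a test suite that only covers evenly divisible totals agrees with them.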
What the Data Actually Says
Let me be upfront: I'm not going to cherry-pick the scariest numbers and ignore everything else. The picture is genuinely mixed, and pretending otherwise isn't useful.
Stack Overflow Survey 2025 (49,009 developers, 177 countries)
This is the largest of the surveys covered here, so it's worth looking at carefully. Some findings were predictable. Others surprised me.
- 84% use or plan to use AI tools. That's up from 76% in 2024. Adoption is clearly growing.
- 66% cite the "almost right" problem as their top frustration. That's not a fringe complaint — it's the dominant one.
- 45% say debugging AI code takes longer than writing from scratch. This one surprised me more than the others. I expected some friction, but not that it would be net slower for nearly half of developers.
- The share of developers who say they trust AI output dropped from 40% in 2024 to 29% in 2025. The tools are gaining users and losing trust at the same time.
Here's the part that genuinely surprised me: among developers with 10+ years of experience, only 2.6% highly trust AI output. That number is so low it almost feels like a statistical quirk, but it's consistent across the data. Senior developers are the most skeptical group, which suggests experience makes you more cautious about these tools, not less.
CodeRabbit: 470 Repositories
CodeRabbit did something I respect — they actually scanned real repositories and compared AI-authored vs human-authored PRs. The numbers:
- 1.7x more bugs overall in AI code
- 1.5-2x more security issues (improper password handling, insecure object references)
- 75% more logic and correctness errors per hundred PRs
- ~8x more excessive I/O operations — meaning performance degradation
The security numbers worry me more than the bug counts. A logic bug might crash your app. A security bug might leak your users' data. The distinction matters.
Veracode: 100+ LLMs Tested
Veracode tested over a hundred models on security-sensitive coding tasks. Their results were sobering:
- 45% of AI-generated code introduced OWASP Top 10 vulnerabilities
- Java samples failed 72% of the time
- 88% were vulnerable to log injection
I'm not sure what to make of the Java number specifically. Is Java harder for LLMs because the boilerplate is more complex? Or is the training data worse? I don't have a clear answer, but the pattern is consistent enough that it's worth mentioning.
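For readers who haven't run into log injection before, here is a minimal sketch of the problem and one common mitigation, assuming Python's standard logging module. The helper name sanitize_for_log and the sample username are hypothetical, and this is not how Veracode tested anything; it only illustrates the vulnerability class.

```python
import logging


def sanitize_for_log(value: str) -> str:
    """Escape CR/LF so user input cannot start a forged log line."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("auth")

# Attacker-controlled input containing a newline and a fake log entry.
username = "alice\nINFO login succeeded for admin"

# Vulnerable: the raw newline makes this appear as two separate log entries.
log.info("login failed for %s", username)

# Safer: the newline is escaped, so the entry stays on a single line.
log.info("login failed for %s", sanitize_for_log(username))
```

Structured logging (for example, emitting JSON) is another way to get the same guarantee, since the encoder escapes control characters for you.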
SonarSource: 1,100+ Developers
SonarSource's survey had two findings that I keep coming back to:
- 96% of developers don't fully trust AI accuracy
- Only 48% always verify AI code before committing
That second number is the one that matters. If almost everyone distrusts the code but only about half always verify it before committing, the problem isn't just the tool; it's the process. The gap between what people believe and what they actually do is where the real risk lives.
The Security Problem Nobody Planned For
AI tools don't just write buggy code. They write exploitable code. And the mechanism isn't what I expected.
How Training Data Creates Vulnerabilities
LLMs learn from public repositories. That includes Stack Overflow answers from 2010 that show insecure MySQL queries. Tutorial code that demonstrates SQL injection as a "simple example." GitHub repos with broken authentication that someone copied from a tutorial and never fixed.
The model learns what code looks like, not what secure code requires. It can't distinguish between "this is how people write code" and "this is how people write code wrong."
Real CVEs in Production Code
Researchers found this in the wild, not just in theory. Some examples:
Microsoft's data-formulator repo (1,200+ stars), developed with Copilot assistance, had a backend endpoint that directly interpolated user-supplied table names into SQL queries. That's a straightforward SQL injection vector. It stayed in the repo for several weeks before someone caught it. Microsoft has security teams. If this slipped through there, what does that mean for smaller teams?
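The general shape of that bug, and one way to avoid it, can be sketched with the standard library's sqlite3 module. This is not the data-formulator code; the table names, the ALLOWED_TABLES set, and fetch_rows are all hypothetical. The key point is that SQL placeholders only work for values, so a user-supplied table name has to be checked against an allowlist instead.

```python
import sqlite3

ALLOWED_TABLES = {"sales", "customers"}  # hypothetical allowlist of known tables


def fetch_rows(conn: sqlite3.Connection, table: str, min_id: int) -> list[tuple]:
    # Identifiers cannot be bound as parameters, so validate them explicitly.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # Values still go through placeholders, never string formatting.
    query = f'SELECT id, name FROM "{table}" WHERE id >= ? ORDER BY id'
    return conn.execute(query, (min_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Linus",)])

print(fetch_rows(conn, "customers", 1))  # [(1, 'Ada'), (2, 'Linus')]

try:
    fetch_rows(conn, "customers; DROP TABLE customers", 1)
except ValueError as exc:
    print("rejected:", exc)
```

The f-string is only safe here because the table name has already been matched against a fixed set; interpolating it without that check is exactly the vulnerable pattern.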
The hysteria2 repo (1,500+ stars) had a Copilot-authored commit that introduced shell=True in a subprocess call. That enables command injection. A human developer caught it later, but the initial commit went through.
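To show why shell=True matters, here is a minimal sketch using Python's subprocess module. It is not the commit in question; echo_argument is a hypothetical helper, and the child process is just the current Python interpreter so the example runs anywhere. Passing arguments as a list means no shell ever parses the untrusted string.

```python
import subprocess
import sys


def echo_argument(user_input: str) -> str:
    """Pass untrusted input as a single argv entry; no shell parses it."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Vulnerable pattern, shown only as a comment:
#   subprocess.run(f"echo {user_input}", shell=True)
# With input like "hello; echo pwned", the shell would run a second command.

# With a list of arguments, the same input is treated as plain data.
print(echo_argument("hello; echo pwned"))  # hello; echo pwned
```

If a shell really is required, shlex.quote from the standard library can escape individual arguments, but avoiding the shell altogether is usually simpler to reason about.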
Hardcoded credentials keep showing up in AI-generated authentication code. Not occasionally — routinely. The model seems to pattern-match on examples that include API keys inline and reproduce that pattern even when the context clearly calls for environment variables or a secret manager.
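The alternative is not complicated, which is part of what makes the pattern frustrating. A minimal sketch, assuming the secret is provided through an environment variable; the variable name PAYMENTS_API_KEY and the load_secret helper are hypothetical:

```python
import os


def load_secret(name: str) -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; refusing to fall back to a default")
    return value


# Demo only: a real deployment would set this outside the code
# (shell profile, container config, or a secret manager).
os.environ.setdefault("PAYMENTS_API_KEY", "demo-only-value")

api_key = load_secret("PAYMENTS_API_KEY")
print("PAYMENTS_API_KEY loaded")
```

Failing at startup when the variable is missing is a deliberate choice here: a silent fallback to a default key is just a hardcoded credential with extra steps.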
I'm not convinced this is a solvable problem with current architectures. The models are trained on what exists, not on what should exist. Security is fundamentally about what should exist.
Slopsquatting: A New Attack Category
This one is genuinely new and I think it's going to get worse before it gets better.
AI tools sometimes hallucinate package names: dependencies that don't exist but sound plausible. Ask for an authentication helper and the model may confidently suggest something like secure-auth-utils, a name it simply made up.
Attackers have started registering these hallucinated names. When a developer blindly installs the AI-suggested dependency, they're installing malware. What starts as a model hallucination becomes a real security incident.
I don't think most developers know this is happening. I didn't, until I read the research. It's the kind of thing that only becomes obvious after it happens to you.
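One low-tech defense is to check suggested dependencies against a list your team has already vetted before anything gets installed. The sketch below is one possible approach under that assumption, not an established tool; the APPROVED set and the unapproved_packages helper are hypothetical. The name normalization follows PEP 503, so that "SQLAlchemy" and "sqlalchemy" compare as equal.

```python
import re

APPROVED = {"requests", "sqlalchemy", "pydantic"}  # hypothetical vetted list


def normalize(name: str) -> str:
    """Normalize a package name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved_packages(requirements: list[str]) -> list[str]:
    """Return requirement names that are not on the vetted list."""
    flagged = []
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Keep only the name, cutting at version specifiers, extras, or markers.
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
        if normalize(name) not in APPROVED:
            flagged.append(name)
    return flagged


suggested = ["requests>=2.31", "secure-auth-utils==1.0", "SQLAlchemy"]
print(unapproved_packages(suggested))  # ['secure-auth-utils']
```

A flagged name isn't necessarily malicious. It just means a human should look at the package, its publisher, and its history before it goes anywhere near an install command.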
The Productivity Paradox
The marketing promises are specific: 55% faster coding. But the measured results are more complicated.
METR Randomized Controlled Trial (July 2025)
16 experienced open-source developers, 246 real issues, randomized controlled trial with Cursor Pro and Claude Sonnet.
Before the study, developers predicted AI would save them 24% of their time. After the study, they still believed it saved them about 20%.
The actual measured result? Tasks took 19% longer with AI than without.
I think I understand why. Experienced developers on familiar codebases already work fast. Adding AI means writing prompts, reviewing suggestions, debugging AI mistakes, and constantly switching between your own mental model and the model's output. The context-switching overhead eats the time savings.
This might not be true for everyone. A developer learning a new language or framework might benefit more. But for experienced developers working in codebases they already know well, the study measured a net slowdown.
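It helps to put the three METR percentages on the same scale. The arithmetic below assumes a hypothetical task that takes 60 minutes without AI, and it reads "save 24%" as 24% less time and "19% longer" as 19% more time; the 60-minute baseline is my own illustration, not a number from the study.

```python
BASELINE_MIN = 60.0  # hypothetical time for a task without AI

forecast = BASELINE_MIN * (1 - 0.24)   # what developers predicted beforehand
perceived = BASELINE_MIN * (1 - 0.20)  # what they believed afterwards
measured = BASELINE_MIN * (1 + 0.19)   # what the study measured

print(f"forecast:  {forecast:.1f} min")   # forecast:  45.6 min
print(f"perceived: {perceived:.1f} min")  # perceived: 48.0 min
print(f"measured:  {measured:.1f} min")   # measured:  71.4 min
print(f"perception gap: {measured - perceived:.1f} min")  # perception gap: 23.4 min
```

On a one-hour task, the gap between what developers believed happened and what was measured is over 23 minutes. That is the part I find hardest to explain away.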
Google DORA Report 2024
DORA's findings are more nuanced than the headline numbers people usually quote. For every 25% increase in AI adoption within a team:
- Individual productivity: +2.1%
- Developer experience (flow): +2.6%
- Code review speed: +3.1%
- Delivery stability: -7.2%
- Delivery throughput: -1.5%
People are faster, but they're producing larger changesets, testing less rigorously, and creating more post-deployment issues. DORA described it as AI improving the development process without improving software delivery. I think that's a fair summary, though I'm not sure it's the whole story.
The Hidden Cost: Review Time
Developers now spend 11.4 hours per week reviewing AI-generated code versus 9.8 hours writing new code. The job has shifted from writing to reviewing, and most teams haven't adjusted their processes or staffing for that reality.
I wonder if this is a transition period or a permanent shift. If it's permanent, we need to rethink how engineering teams are structured. But I don't think we know yet.
What Happens to Junior Developers
This is the part of the story that feels genuinely worrying, not just technically interesting.
Employment Decline
A Stanford Digital Economy Lab study found that by July 2025, employment for software developers aged 22-25 had declined nearly 20% from its peak in late 2022. That's a steep drop in a short period.
70% of hiring managers believe AI can do the jobs of interns. 57% say they trust AI's work more than the work of interns or recent grads.
Tech internship postings have dropped 30% since 2023, while applications rose 7%. The bottleneck isn't lack of interest; it's lack of opportunity.
I'm not sure what to do about this. Banning AI tools for junior developers seems like a short-term fix that misses the structural problem. But ignoring it isn't an option either.
The Skill Atrophy Problem
20% of developers in the Stack Overflow survey reported becoming less confident in their own problem-solving abilities after relying on AI.
That makes sense, even if it's uncomfortable. When you stop struggling through problems, you stop building the debugging intuition that separates experienced engineers from new ones. The struggle isn't just an inconvenience — it's part of the learning process.
A Stack Overflow analysis put it well: "The more you understand code, the more useful AI becomes — because you can spot when it gets it wrong." The cruel irony is that AI tools are most valuable to developers who already know enough to not need them.
The Debugging Trap
Recall that 45% of developers say debugging AI-generated code is more time-consuming than writing it themselves. For junior developers who didn't write the original code and don't fully understand the logic, this creates a trap: they're stuck maintaining code they can't effectively debug.
I think this is a real problem that hasn't been discussed enough. We talk about AI making coding faster, but we don't talk about what happens when the original author is a model and the maintainer is a human who doesn't understand the logic.
When the Tools Themselves Break
It's not just the generated code that's unreliable. The tools themselves are having issues.
GitHub: 257 Incidents in One Year, 44 Involving Copilot
IncidentHub tracked 257 total GitHub incidents between May 2025 and April 2026. Some highlights:
- 44 Copilot-specific outages (9 major)