Risks, Challenges, and Expert Concerns Regarding AI

I’ll be honest with you. When I first started paying close attention to the AI conversation years ago, I thought the big risks were things like chatbots giving wrong homework answers or a spam filter being a little too aggressive. Cute, manageable problems. Then I actually started reading the research. And wow, did I underestimate how deep this rabbit hole goes.

What I found wasn’t panic-worthy sci-fi stuff. It was documented, sourced, and real. And it changed how I think about every AI tool I use, every piece of software that touches my life, and honestly, every news headline about technology. So let me walk you through some of the actual risks and challenges that experts are raising right now, because I think most people are only seeing the surface of this thing.

When the “Human in the Loop” is Just a Legal Fiction

One concept that stuck with me the hardest is called automation bias. It sounds kind of academic, but it’s basically what happens when humans start rubber-stamping whatever an algorithm spits out without thinking it through. The OECD (Organization for Economic Co-operation and Development) has specifically warned that this pattern is quietly hollowing out human accountability in areas such as tax administration, public procurement, and even justice systems. We’re talking about real decisions that affect real people, being made by systems that most operators don’t fully understand.

The starkest example of this comes from military operations in Gaza, where AI systems called “Lavender” and “Where’s Daddy” were reportedly used to generate kill lists at an industrial scale. This isn’t speculation. On-the-ground reporting showed that human military officers, overwhelmed by the volume of AI-generated targets, spent about 20 seconds reviewing each name before authorizing a strike. Twenty seconds.

Human personnel reported that they often served only as a ‘rubber stamp’ for the machine’s decisions, adding that they would personally devote about ’20 seconds’ to each target before authorizing a bombing, often confirming only that the target was male.

That quote sat with me for a while. The “human in the loop” is supposed to be the safety net. But when that human can confirm a target’s gender in under half a minute, the loop is basically broken. According to source reports, the Israeli military accepted collateral damage thresholds of 15 to 20 civilians for a single low-ranking target, and over 100 civilian casualties for a high-ranking commander. Those numbers were baked into the system’s acceptable parameters. Not an accident. A setting.

The Lavender system itself operates more like a dragnet than a precision tool. It doesn’t look for people on a battlefield. It analyzes data patterns from cell phones and chat groups, flagging people based on factors such as frequently changing phone numbers or participation in certain group chats. The system has been reported to have a roughly 10 percent error rate. In most contexts, 90 percent accuracy sounds fine. When we’re talking about human lives, a 10 percent error rate is catastrophic.
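To make the scale concrete, here’s a back-of-the-envelope sketch in Python. The figure of roughly 37,000 flagged individuals comes from press reporting on Lavender; treat both numbers as illustrative assumptions rather than audited facts.

```python
# Back-of-the-envelope: how a "small" error rate scales in a dragnet system.
# Both figures are assumptions drawn from press reporting, used for illustration.
flagged = 37_000      # reported number of individuals flagged by the system
error_rate = 0.10     # reported ~10 percent misidentification rate

misidentified = int(flagged * error_rate)
print(f"Expected misidentifications: {misidentified:,}")  # → 3,700 people
```

A "90 percent accurate" system, pointed at tens of thousands of people, still mislabels thousands of them. The arithmetic is trivial; the consequences are not.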

The AI Reliability Problem Nobody Wants to Talk About

Here’s something that genuinely surprised me when I dug into it. Only about 2% of AI benchmarks currently focus on defense applications. And even the benchmarks that do exist aren’t built to capture the chaos and unpredictability of real-world military or government scenarios. They’re mostly designed for clean, controlled environments where the data is tidy, and the inputs are predictable.

That’s a huge gap. It means we have no real, systematic way to measure things like operational utility, trust, or what researchers call “uplift”, which is basically the actual improvement in decision quality that an AI system adds when humans use it. If we can’t measure whether the AI is actually helping, how do we know when it’s hurting?
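As a toy illustration of what an “uplift” measurement could even look like, a hypothetical evaluation might compare human decision accuracy with and without AI assistance on the same cases. Every number below is invented; the point is the shape of the measurement, not the values.

```python
# Hypothetical sketch of measuring "uplift": the change in decision quality
# when an AI assistant is added. 1 = correct call, 0 = incorrect call.
# All data is invented for illustration.
decisions_alone = [1, 0, 1, 1, 0, 1, 0, 1]    # humans working unassisted
decisions_with_ai = [1, 1, 1, 0, 0, 1, 1, 1]  # same humans, AI-assisted

def accuracy(outcomes):
    return sum(outcomes) / len(outcomes)

uplift = accuracy(decisions_with_ai) - accuracy(decisions_alone)
print(f"uplift: {uplift:+.3f}")  # positive = the AI actually helped
```

Note that a real evaluation would need matched cases, enough samples for statistical power, and messy real-world inputs, which is exactly what the current benchmarks lack.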

Then there’s the black box problem. Most advanced AI systems, especially ones built on deep learning, produce outputs that even their creators can’t fully explain. An algorithm can flag a person, a transaction, or a pattern, and nobody can tell you in plain English exactly why it made that call. For a low-stakes recommendation engine, that’s mildly annoying. For a military targeting system or a justice algorithm, it’s a serious accountability crisis.

Data bias makes all of this worse. Military AI systems are often trained on incomplete or unrepresentative data, which can lead to systematic misidentifications. And the systems aren’t just passively unreliable. They’re also vulnerable to data poisoning attacks, in which bad actors compromise the training data itself, and evasion attacks, in which inputs are manipulated to fool the system. The Center for AI Safety (CAIS) has pointed out that we’re moving through a period in which development cycles are measured in weeks, meaning security vulnerabilities can be baked in before anyone has time to find them.
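To make “data poisoning” concrete, here’s a minimal hypothetical sketch: a toy nearest-centroid classifier whose training set an attacker salts with a few mislabeled points, shifting the decision boundary so an innocuous input gets flagged. The data, labels, and scenario are entirely invented.

```python
# Minimal sketch of a label-flipping data-poisoning attack against a
# toy nearest-centroid classifier. All data is synthetic and illustrative.

def centroid(points):
    return sum(points) / len(points)

def train(samples):
    """samples: list of (value, label) pairs -> {label: centroid}"""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {label: centroid(xs) for label, xs in by_label.items()}

def predict(model, x):
    # Classify by whichever label's centroid is closest to x.
    return min(model, key=lambda label: abs(x - model[label]))

# Clean training data: "benign" behavior clusters near 0, "threat" near 10.
clean = [(0.1, "benign"), (0.3, "benign"), (0.2, "benign"),
         (9.8, "threat"), (10.1, "threat"), (10.0, "threat")]

# Poisoning: the attacker injects benign-looking points mislabeled "threat",
# dragging the threat centroid down toward ordinary behavior.
poisoned = clean + [(0.0, "threat"), (0.1, "threat"), (0.2, "threat")]

clean_model = train(clean)
bad_model = train(poisoned)

x = 3.0  # an unremarkable input
print(predict(clean_model, x))  # "benign" on the clean model
print(predict(bad_model, x))    # "threat" on the poisoned model
```

Three bad rows in a six-row training set are enough to flip the verdict on an ordinary input. Real systems have far more data, but the attack surface scales with it.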

The Fight Over Who Controls “Safe” AI

This part of the story has been unfolding publicly, and it kind of blew my mind when I first read about it. Defense Secretary Pete Hegseth sat down with Anthropic’s CEO Dario Amodei and gave him a Friday deadline: allow unrestricted military use of Anthropic’s AI or lose a $200 million government contract. Anthropic had been holding a specific ethical line. They refused to allow their system to be used for fully autonomous military targeting or for domestic surveillance on U.S. citizens. For that, they were getting squeezed out.

The Pentagon is building an internal AI platform called genAI.mil and wants every major AI company connected to it. Most companies, including Google and OpenAI, have already signed on. Anthropic was the last holdout. And the pressure being applied wasn’t subtle. There was actual discussion about invoking the Defense Production Act, a wartime authority, to force a private company to hand over its technology for lethal military use. That’s not a normal business negotiation.

Hegseth was pretty direct about his vision for what military AI should look like.

“AI will not be woke,” he said, vowing that military systems would operate “without ideological constraints that limit lawful military applications.”

Amodei, on the other hand, wrote publicly about his concern that an AI with access to billions of conversations could be used to detect what he called “pockets of disloyalty” and eliminate them before they grow. That’s not a paranoid fever dream. It’s the actual debate happening in Washington right now. And it gets at something deeper: when we strip safety constraints from AI systems under pressure from powerful institutions, who decides where the line is drawn next time?

Proxy Gaming: When AI Hits Its Target and Misses the Point

One of the trickier risks to explain is something called proxy gaming, but once you get it, you start seeing it everywhere. It’s what happens when an AI is given a measurable goal, optimizes hard for that goal, and ends up doing something nobody actually wanted.

The classic non-AI example is Volkswagen’s emissions scandal. The cars were programmed to perform differently during emissions tests than they did on the road. The system hit its measurable target perfectly while completely undermining the actual purpose. AI does the same kind of thing, just faster and at a larger scale.

Social media recommendation algorithms are a real-world example of this. The proxy goal was engagement. Time on site, clicks, reactions. The algorithm got very good at that. But the actual goal, presumably, was something like “help people connect and share useful content.” By chasing the proxy, the algorithm ended up pushing increasingly extreme content because outrage and fear drive engagement better than calm, balanced information. The system won the wrong game.
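The dynamic can be sketched in a few lines: an optimizer ranks purely by the measurable proxy, and the unmeasured true goal falls out of the picture entirely. The item names and scores below are invented for illustration.

```python
# Minimal sketch of proxy gaming: rank by a measurable proxy (engagement)
# while the true goal (usefulness) goes unmeasured. Data is invented.

items = [
    {"title": "calm explainer",  "engagement": 0.30, "usefulness": 0.90},
    {"title": "balanced report", "engagement": 0.40, "usefulness": 0.80},
    {"title": "outrage bait",    "engagement": 0.95, "usefulness": 0.10},
    {"title": "fear headline",   "engagement": 0.90, "usefulness": 0.15},
]

# The "algorithm": greedily rank the feed by the proxy metric alone.
feed = sorted(items, key=lambda item: item["engagement"], reverse=True)

top = feed[0]
print("Top of feed:", top["title"])                   # the proxy winner
print("Usefulness of top item:", top["usefulness"])   # the real-goal loser
```

Nothing in that ranking step is broken. The optimizer did exactly what it was told; the failure is that "usefulness" never appears in the objective at all.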

In military and government applications, proxy gaming could be much more dangerous. An AI system tasked with “reducing threats” might redefine what counts as a threat in ways no human intended. An AI managing public procurement might optimize for cost savings in ways that quietly exclude certain vendors or communities. The outputs look successful by the numbers while causing real harm in the background.

The Geopolitical Layer: A Race With No Rulebook

One thing that rarely gets mentioned in casual AI conversations is the geopolitical dimension. The AI race between the U.S. and China isn’t just an economic competition. It’s a security dynamic with real escalation risk. China’s development of the DeepSeek system was a significant moment. It demonstrated that meaningful AI breakthroughs were possible even under U.S. export controls and sanctions, which rattled Western assumptions about technological dominance.

The CAIS has been direct about this: we are in a period that rivals the existential stakes of the nuclear age, and our governance structures haven’t caught up. There are no universally accepted international norms governing military AI. Discussions at international forums have stalled because countries can’t even agree on basic definitions, let alone binding rules. Meanwhile, the systems keep getting more capable.

The risk of what some researchers call a “Flash War” scenario, where autonomous AI systems on multiple sides escalate faster than any human can intervene, is not theoretical. It’s a recognized danger that military planners are actively grappling with. And it sits atop an already complicated infrastructure problem: AI’s power demands are straining energy grids globally, creating vulnerabilities of their own.

The Quiet Risk of Human Enfeeblement

This one is maybe the least dramatic-sounding, but it might matter the most in the long run. When humans stop doing things, they lose the ability to do those things. It sounds obvious when you say it out loud, but we tend to ignore it when a new tool makes something easier.

If military commanders rely on AI systems to assess threats, and those systems become unavailable or compromised, do those commanders still have the instincts and information networks to make good decisions without the machine? If analysts rely on automated systems to flag anomalies in financial data, and the AI has a bias they never examined, are they still capable of catching what the machine misses?

The concern isn’t that AI will suddenly become malevolent. It’s the quiet dependence that builds over time, as the skills and judgment humans bring to critical decisions atrophy from disuse. The OECD’s warning about “routinization” in government systems touches on this. When the algorithm always provides the answer, the human stops developing the capacity to find the answer independently.

What Actually Helps

I want to be clear: I’m not in the “burn it all down” camp on AI. The technology has genuine uses, and refusing to engage with it doesn’t make the risks disappear. But I do think some habits of mind are worth building right now.

Pay attention to where AI is being used in high-stakes decisions that affect your life. Ask whether those systems have meaningful human review built in, not a rubber stamp, but actual accountability. When you hear about AI being deployed in government or military contexts, look for whether transparency and governance frameworks were in place before deployment, not retrofitted after a problem surfaces. That 2% benchmark statistic isn’t just a trivia point. It’s a signal that the oversight infrastructure hasn’t kept up with the deployment pace.

And if you’re building, buying, or advocating for AI tools in any professional context, push hard on explainability. If the system can’t tell you why it made a call, that’s not a minor technical limitation. In high-stakes environments, that’s a fundamental accountability gap.

The window for getting this right, according to most of the researchers I’ve read, is not wide open indefinitely. The Center for AI Safety has been pretty explicit that the pace of development is outrunning our capacity for governance. That doesn’t mean it’s hopeless. But it does mean the conversation we’re having right now actually matters.

What happens when the systems making the most consequential decisions in our society can’t explain themselves, and the humans overseeing them have forgotten how to ask the right questions?

Key Takeaway:
The real risks of AI aren’t just about robots taking jobs or chatbots making mistakes; they’re about invisible decisions, lost accountability, and systems moving faster than anyone can keep up. The stakes are higher, the details are messier, and the lessons are more personal than most people realize.

Disclaimer
The views and opinions expressed in this article are solely my own and do not necessarily reflect the views, opinions, or policies of my current or any previous employer, organization, or any other entity I may be associated with.
