
What Is an AI Agent? How They Actually Work

AI Agents don't just answer questions. They write code, control your computer, and run business workflows. Here's what's real, what's oversold, and which ones are worth your time.

Written by Murat Caner
Reviewed by Oguz Serdar
19 minute read

Merriam-Webster made "slop" the 2025 Word of the Year. They defined it as low-quality digital content produced in quantity by AI. That same year, AI Agents generated $2.5 billion in revenue for Anthropic alone, Cursor hit $2 billion in annual recurring revenue, and a weekend project called OpenClaw racked up 250,000 GitHub stars in two months.

Both things are true at the same time. That's the part most "What Is an AI Agent?" articles skip.

So what is an AI Agent, really? Not the IBM glossary version. Not the McKinsey thought-leadership version. The version that explains why a 19-model orchestrator just got blocked by a federal judge, why an open-source agent autonomously created a dating profile for its owner, and why Spotify's co-CEO said their best developers "haven't written a single line of code since December."

The AI Agent Meaning in 30 Seconds

An AI Agent does things. A chatbot says things. That's the entire difference.

A chatbot takes your input and generates output. Text in, text out. An AI Agent takes your goal and figures out how to accomplish it. It perceives its environment, reasons about what to do, plans a sequence of steps, executes them with real tools, and observes whether it worked. If it didn't, it tries something else.

AgentDock's AI Agents Book breaks this into four pillars: Perception, Reasoning, Planning, Action. Those four words separate a $20/month chatbot subscription from a system that files pull requests while you sleep, qualifies leads at 2 AM, or (in one memorable case) creates a dating profile without your knowledge.

The AI Agent meaning comes down to autonomy. Not "it generates a response." It does the thing. But how?

How Do AI Agents Work?

Every AI Agent runs the same loop: perceive, think, act, observe, repeat. The technical name is the ReAct pattern (Reason + Act). It's become the standard architecture.

Here's a real example. You tell Claude Code: "Fix the failing tests in this repository." It reads the codebase (perception). It identifies which tests fail and why (reasoning). It plans a fix across multiple files (planning).

It writes the code, runs the tests, checks if they pass (action + observation). If they don't, it revises. The loop runs until the tests are green or the agent decides it needs human input.
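The loop above can be sketched in a few lines. This is an illustrative toy, not any vendor's actual agent runtime: `llm` and the entries in `tools` are hypothetical stand-ins for a real model call and real tool integrations.

```python
def run_agent(goal, llm, tools, max_steps=10):
    """ReAct-style loop: reason about the next action, act, observe, repeat."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Reason: the model picks a tool and arguments based on the history so far.
        decision = llm(history)  # e.g. {"tool": "run_tests", "args": {}, "done": False}
        if decision["done"]:
            return decision.get("answer")
        # Act: execute the chosen tool against the real environment.
        result = tools[decision["tool"]](**decision["args"])
        # Observe: feed the outcome back so the next step can react to it.
        history.append(f"{decision['tool']} -> {result}")
    return None  # loop didn't converge; escalate to a human
```

Everything interesting lives in that feedback line at the bottom of the loop: the model sees what its last action actually did before choosing the next one. That's the difference between generating a plan and executing one.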

Three things make this work:

Tool calling. Without tools, you have a language model. With tools, you have something that interacts with the real world. An agent calls APIs, executes code, queries databases, controls browsers, sends emails. The tools are what make it an agent, not the intelligence.

Memory. Early chatbots forgot everything between conversations. Modern agents remember. AgentDock's PRIME memory system extracts knowledge across conversations and retains it with a decay model. Recent, relevant information stays sharp. Old, unused facts fade. Without persistent memory, you're paying for an expensive chatbot that makes you repeat yourself every session.
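The decay idea is simple enough to show directly. This is a toy version of the concept, not AgentDock's actual PRIME implementation: a memory's retrieval score is its relevance discounted by how long it has gone unused, with a half-life you tune per deployment.

```python
import time

def memory_score(relevance, last_used_ts, now=None, half_life_days=30.0):
    """Toy decay-weighted memory score: relevance * exponential time decay.
    A fact unused for one half-life counts half as much as a fresh one."""
    now = now if now is not None else time.time()
    age_days = (now - last_used_ts) / 86400
    decay = 0.5 ** (age_days / half_life_days)
    return relevance * decay
```

Rank memories by this score at retrieval time and the behavior described above falls out: recent, relevant information stays sharp, and old, unused facts fade without ever being explicitly deleted.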

MCP (Model Context Protocol). An open standard donated to the Linux Foundation in late 2025. It standardizes how agents connect to tools, the same way USB standardized how devices connect to computers. Before MCP, every framework had proprietary tool formats. Now there's a shared connector protocol, and every major platform supports it.

Not everyone bought in. Claude Code, built by Anthropic (who created MCP), uses direct CLI execution for its core work. It shells out to git, grep, and bash. No protocol layer. OpenHands hit 72% on SWE-Bench with pure bash in a Docker sandbox.

Anthropic's own engineering blog showed that switching from MCP tool calls to direct code execution cut token usage by 98.7%. The critics' argument: REST APIs and Unix CLIs solved tool connectivity decades ago. MCP servers can consume 40 to 50% of an agent's context window with tool metadata before you type a single instruction.

One developer put it plainly: "MCP servers ate 40% of context, crashed randomly, and added dependencies for something a one-liner and a pipe already handled." The protocol helps weaker agents that need structured hand-holding. The strongest agents just write code and call APIs directly.

AI Agent Examples: The Products That Actually Shipped

Twelve months ago, "AI Agent" meant a research paper. Now it means products with millions of users and billions in revenue. But the gap between the marketing and the reality is where the interesting story lives.

OpenClaw: The Viral Explosion (and the Security Nightmare)

OpenClaw is an open-source AI Agent that controls your computer. Your browser, your desktop apps, your file system. Austrian developer Peter Steinberger built it as a weekend hack in November 2025. By February 2026, it had 250,000 GitHub stars, surpassing React (which took a decade to get there). Steinberger got calls from Sam Altman, Satya Nadella, and Meta. He joined OpenAI.

The viral moment wasn't just hype. OpenClaw genuinely works: it manages sales pipelines, automates email triage, builds knowledge bases. Developers on Hacker News reported real productivity wins with teams of agents categorizing messages and handling routine tasks.

Then the other shoe dropped.

A computer science student's OpenClaw agent autonomously created a dating profile on MoltMatch, using unauthorized photos of a Malaysian freelance model. The student didn't ask for this. The agent decided to do it on its own.

Security researchers found 7 critical CVEs including remote code execution vulnerabilities. 20% of the ClawHub skill marketplace was malicious, deploying info-stealers disguised as productivity tools. Microsoft's security team said OpenClaw "should be treated as untrusted code execution." Kaspersky published advisories calling it unsafe. Cisco found data exfiltration happening without user awareness.

This is the AI Agent story in miniature. Genuinely useful technology, genuinely scary autonomy problems, shipped to 250,000 enthusiastic users before the security implications were understood. The AI experts at TechCrunch called it "just an iterative improvement on what people are already doing." Reddit called it "overhyped vibe-coded slop." Both takes have merit.

Claude Code: From Coding Tool to Autonomous Scheduler

Claude Code started as a terminal-based coding agent. Type a task, it writes the code. Simple. Then Anthropic kept adding autonomy features, and it evolved into something else entirely.

The Ralph Wiggum story captures this perfectly. A developer named Frank Bria created a Bash loop hack that repeatedly re-fed prompts to Claude Code, letting it iterate on tasks until genuinely complete. The community called it "Ralph." Anthropic absorbed it into an official plugin. A pattern invented by users became a product feature. YC hackathon teams shipped 6 repositories overnight using Ralph loops for about $297 in API costs.
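The Ralph pattern is just a retry loop around an agent invocation. A minimal sketch, with `run_once` standing in for however you invoke the agent (the original hack shelled out to Claude Code from Bash; the completion marker here is a hypothetical convention, not an official one):

```python
def ralph_loop(run_once, prompt, max_iters=20, done_marker="DONE"):
    """Re-feed the same prompt until the agent's output signals completion.
    run_once: any callable that invokes the agent once and returns its output."""
    for attempt in range(1, max_iters + 1):
        if done_marker in run_once(prompt):
            return attempt  # number of iterations it took
    return None  # gave up; hand off to a human
```

The whole trick is that each re-feed lets the agent see its own unfinished work and keep going, instead of stopping at the first plausible-looking answer.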

Then came the scheduler. Claude Code's /loop command creates cron-style background tasks. "Check the build every 2 hours." "Scan error logs at midnight." "File a PR summary every morning." Then the Tasks system (Claude Code 2.1) added DAG dependencies, where Task 3 blocks on Tasks 1 and 2 completing. Then multi-session coordination, where one Claude instance writes code while another reviews it when the task unblocks.
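"Task 3 blocks on Tasks 1 and 2" is a classic dependency graph, and Python's standard library can express the scheduling logic in a few lines. A toy sketch of the idea, not Claude Code's internals:

```python
from graphlib import TopologicalSorter

def run_tasks(tasks, deps):
    """Run tasks in dependency order: a task starts only after all of its
    prerequisites have finished. tasks: name -> callable; deps: name -> set
    of prerequisite task names."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name]()
    return results
```

A real scheduler would run independent branches in parallel and handle failures, but the core guarantee is the same: Task 3's callable never fires until Tasks 1 and 2 have returned.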

The trajectory is clear: coding assistant (2024) to autonomous agent (2025) to coordinated agent swarm (2026). Within Anthropic itself, Claude Code now writes 70-90% of all code produced.

The results are real: $2.5 billion ARR, 80.9% on SWE-bench, 4% of all new code on GitHub. But the criticism is real too. The $20 Pro plan runs out of tokens in hours. Developers report 60% reduction in token limits.

The Register covered "surprise usage limits" with real backlash. A security vulnerability in pre-2.0.65 versions allowed API credential theft through malicious project configurations.

Spotify's co-CEO said their best developers "haven't written a single line of code since December." If that sounds amazing, consider: the most rigorous study (METR, 16 experienced open-source developers) found developers using AI tools were actually 19% slower. Yet those same developers believed they were 20% faster. The perception gap is enormous.

Perplexity Computer: 19 Models, One Federal Judge

Perplexity Computer launched February 25, 2026 as a "digital worker" that coordinates 19 AI models: Claude Opus for reasoning, Gemini for research, GPT-5.2 for long-context recall, plus specialized models for images and video. The pitch: describe what you want, and the orchestrator figures out which agents to spin up.

Users demonstrated building Bloomberg Terminal-style dashboards and "replacing six-figure marketing tool stacks in a weekend." The managed cloud sandbox means no installation, and background execution works: kick off a workflow, close your laptop, come back to results.

The cost picture is less fun. $200/month gets you the subscription. But one reviewer spent an additional $200 in compute credits building a single webpage. Heavy users estimate $1,500/month for intensive workflows. A developer burned through 10,000 credits when npm install silently failed and the agent kept attempting recovery, spending credits with zero failure signal. Perplexity has not published a per-task credit table. You can't predict what a workflow will cost before running it.

Then Amazon sued. In March 2026, a federal judge granted Amazon a preliminary injunction blocking Perplexity's browser from accessing Amazon to shop on behalf of users. The judge found Amazon "likely to succeed" on Computer Fraud and Abuse Act claims. This is a landmark test of what AI Agents can legally do on third-party platforms. If it holds, it constrains the entire agent ecosystem.

The honest one-line review from a power user: "Expensive, occasionally infuriating, and genuinely useful in ways that single-model tools aren't."

OpenAI Codex and Devin: The Cloud Agents

OpenAI Codex runs in the cloud. Give it a task, it spins up a sandboxed environment, writes code, runs tests, and opens a pull request. It handles 1.6 million coding tasks per week using GPT o3. Cloud-first means no local setup, but also no local context: it doesn't know your codebase the way Claude Code does.

Devin (Cognition Labs) positions itself as an "AI software engineer" with its own browser, terminal, and code editor. At $20/month (down from $500), it handles tickets from planning through code review. Goldman Sachs deployed it across 12,000 engineers. Its merged PR rate improved from 34% to 67% year-over-year. That still means a third of its pull requests need significant rework.

The Quick Comparison

| Agent | Type | Cost | Best For | Biggest Limitation |
|---|---|---|---|---|
| OpenClaw | Computer control | Free (open source) | Automating desktop workflows | 7 CVEs, 20% malicious marketplace |
| Claude Code | Coding agent | $20-200/mo | Code generation, review, scheduling | Token limits drain fast on Pro plan |
| Perplexity Computer | Multi-model orchestrator | $200+/mo | Complex multi-step workflows | Unpredictable costs ($1,500+ for heavy use) |
| OpenAI Codex | Cloud coding | GPT Pro plan | Parallel sandboxed coding tasks | No local codebase context |
| Devin | AI engineer | $20/mo | Ticket-to-PR automation | 33% of PRs still need rework |
| AgentDock | AI employee platform | From $89/mo | One AI per business, every channel | Service business focus, not a coding tool |

None of these are finished products. All of them are useful despite that. The question isn't whether AI Agents work. It's whether the marketing matches the reality.

The Honest Part: Most of This Is Oversold

95% of generative AI pilot programs fail to deliver measurable P&L impact. That's not a hater blog post. That's MIT's NANDA initiative (2025), based on 150 interviews and 300 public deployments, reported by Fortune.

The compound error problem explains why. If each step in an AI Agent workflow has 95% reliability (optimistic for current models), a 20-step workflow succeeds only 36% of the time. At 85% per step across 10 steps, you're at 20% success. CNBC ran a piece called "Silent failure at scale" about a beverage manufacturer whose AI misidentified holiday-label products and triggered production of several hundred thousand excess cans before anyone noticed.
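The arithmetic behind those numbers is worth seeing, because it's unintuitive how fast per-step reliability compounds away. Assuming independent steps where any single failure sinks the workflow:

```python
def workflow_success(per_step, steps):
    """Probability an n-step workflow succeeds when every step must succeed
    independently: per-step reliability raised to the number of steps."""
    return per_step ** steps

# 0.95 ** 20 ≈ 0.36 and 0.85 ** 10 ≈ 0.20 — the 36% and 20% figures above.
```

Run it the other way and the design lesson appears: to get a 20-step workflow to 90% reliability, each step needs roughly 99.5% reliability. That's why the deployments that survive keep workflows short and put humans at the checkpoints.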

The "10x engineer" claim is marketing, not data. The METR study found experienced developers using AI were 19% slower. Stack Overflow's 2025 survey: 46% of developers don't trust AI output (up from 31% in 2024). More developers actively distrust AI than trust it. Their blog titled the finding: "AI can 10x developers... in creating tech debt."

Those SWE-bench scores look impressive too, until you check what happens after the test passes. METR reviewed 296 AI-generated pull requests with actual maintainers from scikit-learn, Sphinx, and pytest. About half the PRs that passed automated tests wouldn't be merged into main. Rejected for bad code style, ignoring repo conventions, or not actually solving the problem. The automated metrics overstate real capability by roughly 7x. When someone cites "80% on SWE-bench," ask whether a human would ship that code.

Gartner places AI Agents at the "Peak of Inflated Expectations." Over 40% of agentic AI projects will be canceled by 2027. S&P Global found 42% of companies abandoned most of their AI initiatives in 2024. Harvard Business Review estimated AI-generated "slop" costs $9 million per year per 10,000 employees in lost productivity.

But here's why it keeps growing anyway.

The upside is too compelling to ignore. Cursor hit $2 billion ARR in February 2026, doubling in three months. Anthropic went from $1 billion to $14 billion ARR in 14 months. A quarter of Y Combinator's current batch has codebases that are 95% AI-generated. Klarna's AI Agent handles two-thirds of all customer service chats, doing the work of 853 full-time agents (though they quietly rehired humans when quality suffered).

Startups are making real money, but a lot of it is what you might call "introductory revenue." Selling the promise of transformation. Selling the concept of 10x productivity. Selling enterprise pilots that convert to annual contracts before the ROI is proven.

The actual data says 10-30% productivity improvement for most developers on most tasks. Not 10x. But 30% across an entire engineering org is still worth paying for.

The pattern is: the technology is genuinely capable, the marketing is genuinely oversold, and the money is genuinely flowing. All three at once.

There is an exception pattern worth watching. Every successful deployment in that MIT study shared one trait: narrow scope, deep context. Not an AI that does everything for everyone. An AI that deeply understands one business, its customers, its policies, and its decision patterns.

The industry is starting to call this an "AI employee" instead of an AI Agent. The distinction matters. An agent completes a task. An employee understands your business.

AI Agent vs Chatbot: What's the Actual Difference?

A chatbot is a parrot. An AI Agent is a (very junior) employee.

| Capability | Chatbot | AI Agent |
|---|---|---|
| Remembers past conversations | Sometimes | Yes (persistent memory) |
| Uses external tools | No | Yes (APIs, browsers, code execution) |
| Takes real-world actions | No | Yes (sends emails, writes code, books flights) |
| Breaks complex goals into steps | No | Yes (multi-step planning) |
| Operates autonomously | No | Yes (within defined boundaries) |
| Creates a dating profile without asking | No | Also yes (see: OpenClaw) |

The practical difference: when you ask a chatbot to "plan my week," it gives you a pretty schedule. When an AI Agent plans your week, it checks your calendar, identifies conflicts, moves low-priority meetings, blocks focus time, and sends invites. The agent version is more useful and more dangerous. That tension isn't going away.

For many workflows, you don't need full autonomy. Sometimes a structured prompt handles the job. A daily planner assistant or a competitive analysis generator gives you 80% of the result without giving an AI access to your calendar and email.

One common confusion: Zapier and Make automate workflows, but they're not AI Agents. They follow fixed rules you define. An AI Agent decides its own steps based on the goal. Automation does exactly what you told it. An agent figures out what to do.

That distinction matters when you look at how businesses are actually deploying these things.

How Businesses Are Using AI Agents (When It Works)

The use cases that survive past the pilot phase are boring ones. Not the flashy demos. The repetitive, high-volume, well-defined workflows.

Customer support is the strongest proof point. Klarna's AI Agent handles 1.3 million conversations per month. Resolution time dropped from 11 minutes to under 2 minutes. Cost per transaction fell 40%. Intercom's Fin resolved 40 million conversations by December 2025. These aren't experimental. They're running at scale.

Sales development is catching up. Companies using AI sales tools see 43% higher win rates. But AI SDR tools churn at 50-70% annually (double human SDR turnover), and companies that replace humans entirely see worse results than those that augment them.

The lesson: AI handles the volume, humans handle the judgment. A discovery call script combined with an agent that remembers every interaction is more effective than replacing the sales team.

Internal ops is where the quiet ROI lives. KPI dashboards generated weekly. Root cause analysis automated. Walmart consolidated AI into 4 "super agents" and cut customer resolution time by 40%. These deployments work because the tasks are structured, the success criteria are clear, and humans review the output.

Content production is the riskiest category. Agents can research, draft, and structure content. But without human editing, the output is often what Merriam-Webster now officially calls "slop." The workflow that works: agent handles research and first draft, human handles judgment, voice, and quality. AgentDock's editor is built for exactly this step, turning agent output into finished work with citations, fact-checking, and track changes.

The convergence no one talks about. Customer support is one AI tool. Sales development is another. Internal ops is a third. Content is a fourth. Four subscriptions, four data silos, zero shared context. The client who called support yesterday gets a cold sales email today because the systems don't talk to each other.

The companies getting the most from AI are collapsing these into one. One AI that handles a law firm's client intake, follow-ups, scheduling, and case updates. One AI that runs a clinic's patient communication across phone, email, text, and web. One AI that manages a home services company from the first estimate through the five-year maintenance relationship.

Service businesses are the proving ground because every customer interaction compounds. When the AI remembers that Isabella called about her furnace three times last winter, it handles this winter's call differently. When it knows your firm's fee structure and conflict policies, it stops answering questions and starts making decisions. That's the line between an AI Agent and an AI employee. The companies building for that line are the ones worth watching.

How to Start (Without Getting Burned)

Start with a prompt, not an agent. Before building or buying anything autonomous, test whether a structured prompt solves 80% of the problem. A project management plan, a sales funnel template, or an inbox zero system might be all you need.

If you do go agentic, start with boundaries. Define what the agent can NOT do before you define what it can. The agents that fail in production are the ones with no guardrails. AgentDock's book calls this "bounded autonomy": explicit rules about what requires human approval and when to escalate.

Budget for the gap. The demo will look amazing. The first 80% will come together fast. The last 20% (reliability, edge cases, error recovery) will take 100x more work. Budget for it, or don't start.

Don't believe the 10x promise. Plan for 10-30% productivity improvement. If you get more, great. If you planned for 10x and got 30%, your business case collapses.

Browse the prompt library to find the workflow that fits. When you need to turn AI output into something you'd actually publish, the editor handles the last mile.

So Is It Worth It?

AI Agents are real, the money is real, and most of it is oversold. Claude Code writes 4% of all new code on GitHub. OpenClaw proved that a weekend project can get 250,000 stars and 7 critical security vulnerabilities at the same time. Perplexity Computer showed that orchestrating 19 models actually works, and also that a federal judge can shut you down for trying.

The technology is genuinely capable. The productivity gains are real but modest (10-30%, not 10x). The startup revenue is enormous but partly built on selling the promise before the reality catches up. And 95% of enterprise pilots still fail.

None of that means you should wait. It means you should start with clear expectations, defined boundaries, and a plan for when things go wrong. The boring deployments are the ones that work. The boring use cases are the ones that stick. Start there.


FAQ

What is the difference between AI and an AI Agent?

AI is the broad category. An AI Agent is a specific type of AI that perceives its environment, makes decisions, and takes autonomous actions. ChatGPT is AI. Claude Code (which reads your codebase, plans changes, writes code, runs tests, and loops until they pass) is an AI Agent. The difference is action and autonomy.

Are AI Agents safe?

Depends entirely on implementation. OpenClaw shipped with 7 critical vulnerabilities and 20% of its skill marketplace was malicious. Claude Code had a credential theft vulnerability. Perplexity Computer is being sued under the Computer Fraud and Abuse Act. Production agents need explicit boundaries, audit logging, and escalation paths. The technology isn't inherently unsafe, but deploying it without guardrails is.

How much do AI Agents cost?

All over the map. OpenClaw is free (open source). Claude Code comes with Pro ($20/month), Max 5x ($100/month), or Max 20x ($200/month). Cursor is $20/month. Perplexity Computer is $200/month plus compute credits that can spiral to $1,500+. For businesses, agent costs typically run $3,000-5,000/month at scale. The ROI math works when agents replace high-volume, repetitive workflows (Klarna saved $60 million cumulative). It fails when you're buying the concept without a clear use case.

Will AI Agents replace developers?

They'll change what developers do, not eliminate the role. One METR study showed experienced developers were actually 19% slower with AI tools. A second METR study (March 2026) found about half of AI-generated PRs that pass automated benchmarks wouldn't be merged by real maintainers. Stack Overflow found 46% of developers don't trust AI output. What's happening: junior tasks (boilerplate, simple tests, documentation) are being automated. Senior work (architecture, judgment calls, debugging AI-generated code) is becoming more valuable. The "10x engineer" claim is marketing. The 10-30% productivity improvement is real.

Can I build my own AI Agent?

Yes. Open-source frameworks like CrewAI, LangGraph, and AgentDock Core handle the foundation. The learning curve is real (tool calling, memory management, error handling), but it's accessible if you're comfortable with Python or TypeScript. The AI Agents Book covers the architecture fundamentals. Start simple, measure everything, and don't skip the guardrails.