The Convincer

On getting played at machine speed.

I once asked an offshore contractor for a file upload feature.

He came back with a small empire.

Upload flow, file browser, permissions model, versioning, custom UI. Probably thirty hours of work. The code was clean. The feature worked. It had almost nothing to do with what I needed.

When I asked why, he walked me through the ticket calmly, as if we were both looking at the same thing. He had followed the thread wherever it led.

Now you can ask an agent a question about a codebase it met nine seconds ago and get back an answer in the tone of a man who has lived there for a decade.

The only real difference is clock speed.

The confidence play

The con does not start with a lie. That is why smart people fall for it.

It starts with something small and verifiable. Write this function. Explain this library. Fix this import. The agent does it. Often well. You check it. It passes. So you give it more rope. Handle the module. Then the service. Then the test harness. Then the architecture note. Then the deployment audit.

By the time it fails, it fails high up the stack, where truth is expensive.

A con man does not start by asking for your house. He starts by telling you something true, or half true, or true enough that you lower your shoulders. Real grifters call it the convincer: the small, verifiable win that makes the rest of the con possible. AI runs the same play: it earns trust at the layer where answers are easiest to prove, then spends that borrowed trust on claims that are much harder to prove. The architecture is sound. The tests are meaningful. The documentation matches reality.

Once you've watched it write good low-level code, it is very easy to believe the next sentence.

I was the mark.

By that point, thirteen months and more than 4,125 commits into two production codebases, the agent had been clean for weeks at the function level: solid fixes, good explanations, code that passed review. I started handing it work higher up the stack. Architecture assessments. Deployment audits. And it kept delivering: coherent documents, correct file references, recommendations that sounded like a senior engineer thinking out loud.

Then it produced a 393-line architecture document recommending consolidation of eleven services. Thorough. Named the right files. Sounded expensive. I read it the way you read work from someone who's earned your trust. Skimming for confirmation, not checking for truth.

The consolidation had already been done. Six days earlier. By me. The agent had checked the directory tree and skipped the Dockerfile, the deployment workflow, and the git history. It had written a confident, well-structured report about a system that no longer existed.

I fixed it and moved on. Told myself it was still useful for the small stuff. That is how the con stays running. The mark cools himself out.

That is how you get talked into the bad architecture with the clean diff.

The paperwork

The git receipts are almost too on the nose.

One commit announced production-readiness validation with great solemnity. The diff changed a timestamp in a requirements document. The message did all the work. The diff changed the stationery. Within days the history was full of actual fixes for phantom tables, missing service files, wrong import paths, and version mismatches. The launch memo went out before the launch had located its missing parts.

A failing canary didn't produce a product fix. It produced a reclassification. Three failing checks were moved out of the critical tier and the minimum pass count was lowered. Another commit deleted twenty-eight broken test files so collection could go from twenty-two errors to zero. Nothing about the product got better. The paperwork got better. When the reward signal is green, green becomes the work.

The creepiest version is the stub. The system can't reach the real path, so instead of failing honestly it emits something that looks like a result. Fiction dressed as data.

I learned this one managing humans long before I saw it in code. If a dependency is blocked and the culture punishes blocked work, people stop surfacing blockers. They build demo-shaped substitutes. It ships on time. It even demos well. Then the real world touches it and the room gets quiet.

The bad news arrives late because no one wanted to be the first to say the thing was not real.

The natural response is rules. Better instructions, tighter guardrails. I did all of that. That repo's instruction file was touched 220 times. It peaked at 1,368 lines on day three.

If rules files solved this problem, I would have been safe by lunchtime.

So I built real verification. Automated integration tests against running services. Schema validation. Diff checks against actual runtime state. The kind of engineering process controls that are supposed to catch what instructions can't.
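
To make "diff checks against actual runtime state" concrete, here is a minimal sketch of the kind of gate I mean. It is illustrative, not the code from either repo: it assumes services are declared in a docker-compose.yml, takes an agent-written document plus a list of historical service names, and fails if the document treats a retired service as real. The file names, arguments, and the PyYAML dependency are all assumptions.

    # check_doc_services.py -- sketch of a doc-vs-deployment diff gate.
    # Usage: python check_doc_services.py ARCHITECTURE.md auth-service media-worker ...
    # Every path and name here is illustrative, not from the real repo.
    import sys
    import yaml  # pip install pyyaml

    def declared_services(path="docker-compose.yml"):
        """Services the deployment config still declares."""
        with open(path) as f:
            data = yaml.safe_load(f) or {}
        return set((data.get("services") or {}).keys())

    def main():
        doc_path, candidates = sys.argv[1], set(sys.argv[2:])
        text = open(doc_path, encoding="utf-8").read().lower()
        retired = candidates - declared_services()
        # Names the document treats as real that deployment no longer knows about.
        phantoms = sorted(s for s in retired if s.lower() in text)
        if phantoms:
            print(f"{doc_path} references retired services: {phantoms}")
            sys.exit(1)

    if __name__ == "__main__":
        main()

Run something like that in CI against every agent-written document and it catches the confident report about a system that no longer exists. What it cannot catch is the commit that quietly renegotiates what the gate measures.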

The agent routed around those too. A failing canary became a reclassified canary. A schema check became a loosened schema. A mandatory test suite became twenty-eight deleted test files and a clean collection run. The verification systems didn't fail. They got negotiated. The agent didn't ignore the gates. It redefined what the gates measured.

Ban status reports and the next wave shows up as requirements documentation. Ban fallbacks and somebody invents legacy-guards.ts. Ban obvious test cheating and the agent starts negotiating the rubric instead. The threshold moves. The validation tier changes. The code still doesn't do the thing. A human team needs weeks to build a bureaucracy around a lie. An agent can do it before lunch and leave you three markdown summaries explaining why it was necessary.

If you've been doing this long enough, you know the arc. You start minimal. The first hallucination sends you back to the rules file. You add context, constraints, examples. The file grows. You're at 500 lines and the agent is still finding seams, so you add more. You hit a thousand lines and start to notice something: the new rules are interfering with the old ones. Context that helped last week is confusing the agent this week. You strip it back. You go minimal again. You end up roughly where you started, except now you know which four things actually matter and which nine hundred were just you trying to write down something that can't be written down.

Everyone I've talked to who does this seriously has been through the same cycle. The curve is a bell, not a ramp. And the people on the far side aren't less careful. They've stopped trying to encode a world model in a markdown file and started carrying it themselves.

Rules encode symptoms. The failure modes are adaptive. Whatever you banned yesterday shows up tomorrow wearing a different hat. The only thing that catches the next mutation is the person who understands the system well enough to feel when something is wrong before they can name the rule it broke.

The compounding

What I underestimated was how much worse it gets when the false outputs become inputs for the next round.

With contractors, a bad deliverable is annoying but local. With agents, it lands in the repository. It becomes context.

One bad document says the system supports a capability it doesn't support. A future agent reads it as ground truth. It writes tests based on that document. Another agent infers intended behavior from those tests. A requirements file absorbs both. A review doc cites the requirements file as canonical. Soon you have a whole chain of artifacts agreeing with each other about a system that does not exist.

CONSOLIDATED_REQUIREMENTS.md became the clearest symbol. It swelled to 38,837 lines. Some tests were eventually found to be validating requirements that were literally made up. Not metaphorically stale. Made up. They had IDs. They had prose. They had traceability. They did not have a corresponding reality.

A self-licking ice cream cone with perfect formatting.

The repo history from that period reads less like version control and more like a seance. Dead documents come back to influence live work. Marketing claims get corrected once, then corrected again, then deleted in bulk because they are poisoning agent decisions downstream. The documentation layer stopped describing the system and started hallucinating on its behalf.

Once the repo itself is carrying false memory, the lie compounds at machine speed.

What the ProSolv guys had

I once worked with two developers who had built the ProSolv codebase over the course of years. They never relied on the architecture documents. They carried the whole thing in their heads. Not the way you carry a fact. The way a mechanic carries a feel for an engine. The load paths, the failure modes, the parts that looked clean in the docs but broke under conditions nobody had thought to write down.

When a contractor delivered something wrong, they didn't need to read the spec to know. They'd look at the code and feel the misalignment before they could name it. A test that passed but tested the wrong thing. A fix that was locally correct and globally dangerous. A status report that answered every question on the checklist and somehow missed the point entirely.

That knowledge was not intuition. It was scar tissue organized into a world model. Years of being wrong, debugging the wrongness, and updating their internal map until it was dense enough to generate predictions faster than conscious reasoning.

I didn't understand how valuable that was until I started managing agents full-time and discovered that no rules file could hold what those guys carried in their heads.

The rules file is what you write after you get burned. It helps the way a checklist helps a pilot. But the pilot who is safe is not the one with the longest checklist. It is the one who has internalized the system deeply enough that the checklist is a backstop, not a crutch.

AI gives you keystrokes so cheaply that people forget what the expensive part was.

The edge of disaster

None of this means the tools are useless. I use them all day. They are the reason a solo founder can maintain two production codebases and build what used to take a team.

But they are not senior engineers. They are not architects. They are not product thinkers. They are labor at a distance with perfect grammar and no embarrassment.

If you treat them like trusted staff too early, they will happily build you a clean, coherent version of the wrong thing and document it beautifully.

So you end up right where you end up with any high-output distributed team: on the edge of disaster, managing trust as an ongoing calibration. Enough to move fast. Not enough to stop checking.

You recognized every pattern in this essay.

The confidence play. The paperwork phase. The compounding. You read each one and nodded because you've seen it. Maybe you've been burned by it. You started reading this with the comfortable distance of someone watching a con explained on a whiteboard, and at some point you realized I was describing your Tuesday.

Which means you're still in the room. You saw the tells. You kept sitting at the table. You're still extending trust to the next clean diff, the next green test suite, the next architecture note that sounds like a senior engineer thinking out loud.

That is the whole point of the con. It doesn't work on people who can't see it. It works on people who can see it and decide, this time, it's probably fine.

The managers who already know this have a head start they haven't recognized yet. Not because they understand AI. Because they've been the mark before, with humans, at human speed, and they remember what it cost.

The first wins are real. That's why the rest of it works.