Claude 4 — a note — One Way Is

Anthropic shipped Claude 4 (Opus and Sonnet variants) a couple weeks ago, and I’ve now used it enough on real work to write something down.

The headline is agentic coding. Opus 4 can take a task like “fix this bug in our auth middleware” and actually execute it — read files, form a hypothesis, write the fix, run the tests, iterate on failures — over hours, not minutes. Anthropic’s demo showed a 7-hour autonomous run. I haven’t had one go 7 hours yet, but I’ve had 40-minute runs that would previously have been 40 minutes of me doing it myself, and that math changes what my day looks like.

What’s actually different from Claude 3.7:

Tool use is more reliable. The failure mode where the model calls a tool, misreads the result, and then confabulates is much rarer. It’ll re-read, re-run, and admit when it doesn’t know.
Long horizons don’t fall apart. The model stays on-task across dozens of tool calls without losing the thread. This was the part that was broken before — after 15 or 20 steps you’d catch it doing something unrelated to the original goal.
It’s willing to say “I don’t know.” Small thing, huge for actually trusting the output.

What it isn’t: it isn’t magic, it isn’t AGI, and it isn’t going to replace the part of your job where you decide what to build. It is going to replace a lot of the part where you implement what you already decided.

Note for later: two years ago Claude could barely write a correct 20-line function. Today it can spend an hour refactoring a codebase I wrote last year. I don’t have a conclusion about this. I just want the timestamp.