Claude 4 — a note
Anthropic shipped Opus 4 and Sonnet 4. A short reaction.
Anthropic shipped Claude 4 (Opus and Sonnet variants) a couple weeks ago, and I’ve now used it enough on real work to write something down.
The headline is agentic coding. Opus 4 can take a task like “fix this bug in our auth middleware” and actually execute it — read files, form a hypothesis, write the fix, run the tests, iterate on failures — over hours, not minutes. Anthropic’s demo showed a 7-hour autonomous run. I haven’t had one go 7 hours yet, but I’ve had 40-minute runs that would previously have been 40 minutes of me doing it myself, and that math changes what my day looks like.
What’s actually different from Claude 3.7:
- Tool use is more reliable. The failure mode where the model calls a tool, misreads the result, and then confabulates is much rarer. It’ll re-read, re-run, and admit when it doesn’t know.
- Long horizons don’t fall apart. The model stays on-task across dozens of tool calls without losing the thread. This was the part that was broken before — after 15 or 20 steps you’d catch it doing something unrelated to the original goal.
- It’s willing to say “I don’t know.” Small thing, huge for actually trusting the output.
What it isn’t: it isn’t magic, it isn’t AGI, and it isn’t going to replace the part of your job where you decide what to build. It is going to replace a lot of the part where you implement what you already decided.
Note for later: two years ago Claude could barely write a correct 20-line function. Today it can spend an hour refactoring a codebase I wrote last year. I don’t have a conclusion about this. I just want the timestamp.