My agentic coding workflow
Where my day-to-day actually landed after a year of AI coding tools shifting under my feet.
For most of the past year, my coding tools shifted every couple of months and I kept having to relearn my workflow. It has finally settled into something stable enough to describe.
The short version: I stopped thinking about AI tools as “autocomplete plus a chat window” and started thinking about them as “a junior collaborator I delegate to, plus a pair of glasses I wear while I edit.”
The three layers
Layer 1: inline completion. Cursor-style tab completion, running continuously while I type. I barely think about it. It’s most useful for the boring half of any task — renaming things, propagating a pattern across a file, filling in the second and third case of a switch statement after I wrote the first. I evaluate it silently on the way past and 80% of the time I accept. When I stop noticing it, that’s when I know it’s working.
Layer 2: chat for thinking. When I’m stuck, unfamiliar with an API, or about to make an architectural call, I open a chat in the IDE sidebar and talk through it. The model doesn’t write the code here. It helps me figure out what the code should do. I’ve found this is the highest-leverage use of an LLM for experienced developers — not “write this for me,” but “let me stress-test my plan against something that’s read more code than I have.”
Layer 3: agent mode for delegation. For tasks that are clearly defined and mechanically tedious, I hand them to Claude Code or Cursor’s Composer and let it run. “Add a new field X through the data model, the API, and the UI, and update the tests.” “Refactor this module to use the new logging interface.” “Write a script that migrates these 30 YAML files to the new schema.” These tasks used to take me an hour and now take me ten minutes of reviewing a diff.
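To make the first of those concrete: a delegation prompt in this mode reads like a small spec rather than a wish. A sketch of what that looks like (the field name, constraints, and steps here are invented for illustration, not from a real project):

```text
Add a `priority` field (integer, default 0) end to end:
1. Data model: add the column plus a migration.
2. API: expose it on the create, update, and read endpoints; reject values outside 0-3.
3. UI: show it on the detail view and make it editable in the edit form.
4. Tests: extend the existing model and API tests to cover the new field, then run the full suite.
Do not change anything unrelated to this field.
```

The numbered steps matter: they give the agent a checklist to work through and give me a checklist to review the diff against.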
What I learned the hard way
- Agents are bad at taste, good at mechanics. If the task is “rename this concept across a codebase,” delegate. If the task is “design the right abstraction for this new feature,” don’t — you’ll get a generic answer that matches no specific project’s conventions.
- Good prompts are good specs. The time I used to spend typing code is now spent writing the specification I’d have given a junior engineer. When I do this well, the agent succeeds on the first try. When I do it badly, I iterate three times and it would have been faster to write it myself.
- Review every diff. Every single one. The failure mode of agentic coding isn’t bugs; it’s correctness drift — the code does what you asked, but also quietly changes something you didn’t ask about. I’ve been burned by this more than once. Reviewing the diff is non-negotiable, even when the agent says “all tests pass.”
- Context is the scarce resource. The model is only as good as the files and information it has access to. Most of the effort I put into setting up a project for agent use is making sure the right files get surfaced — a CLAUDE.md with the project conventions, a clear directory structure, good existing tests that show the model what “correct” looks like.
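For what that conventions file can contain, here is a minimal sketch. Every path, command, and rule below is a made-up example standing in for whatever your project actually uses:

```markdown
# CLAUDE.md (illustrative example)

## Conventions
- TypeScript, strict mode; no `any`.
- All new code logs through the interface in `src/lib/log.ts`, never `console.log`.

## Layout
- `src/models/`: data model and migrations
- `src/api/`: HTTP handlers
- `src/ui/`: components

## Testing
- Run `npm test` before considering a task done.
- Mirror the structure of the tests in `tests/`; they define what “correct” looks like here.
```

Short and specific beats long and aspirational: the agent reads this on every task, so each line should change its behavior.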
What I still do by hand
- Anything involving security, authentication, or money.
- The first implementation of a new subsystem, where the shape of the code is the decision I’m making.
- Debugging a weird production issue. I let the model suggest hypotheses, but I run the diagnostic commands myself.
- Anything where I want to understand the code deeply afterwards, not just get it working.
The meta-point: the question isn’t “AI or human?” anymore, it’s “which layer of AI for which piece of the task?” and the answer changes based on how well-defined the task is, how much taste it requires, and how much I need to understand the result. I spend most of my day now moving between the three layers without thinking about it. That’s the workflow.