How I deploy AI agents to customers without hosting a server

May 20, 2026·7 min read

The normal way to ship an AI agent to a customer looks like this. You rent a small server. You put your code on it. You hold an Anthropic API key, billed to you, and you mark up the usage. You add logging, error handling, a queue, a way to redeploy. The customer pays you every month, and a real share of that money is rent and babysitting.

The moment you do this, you are infrastructure. You are a server that can go down at 2 a.m. You are a key that can leak. You are a vendor the customer cannot fire without losing the thing they paid for. You have built yourself a job, and the job is operations.

I run Craftt, an AI-native Attio practice. I sell AI agents that run inside other companies' CRMs. Six are shipped so far. Not one of them runs on a server I own. This post is how that works, and why I think it is the only sane way to deploy AI agents without a server in the middle.

What I actually hand over

The agent is a markdown file.

Not a metaphor. The deliverable is a folder of Markdown: a SKILL.md that describes what the agent does and the steps it follows, a few reference files, a README. It lives in a public GitHub repo. There is no application to compile, no container, no runtime I wrote.

That folder is a Claude Code skill. A skill is a set of instructions plus a short list of tools, written so an agent can pick it up and run it. Claude Code reads the skill, follows it, and uses the tools it names. The program is prose. The interpreter is Claude Code, which the customer already has.

This sounds too light to be a product. It is not. The hard part of an agent was never the hosting. It was knowing exactly what to read, what to ignore, what to write back, and in what order, so the output is right every time. That knowledge is what the markdown encodes. The markdown is the months of work. The server was always just rent.

Where the agent runs

The agent runs on the customer's own Claude Code subscription.

They install the skill once. They connect the tools it needs through MCP, which for an Attio agent means the Attio MCP server, and often Slack and Notion. Then they schedule it with one command: /schedule, with a cron expression. Monday at 8 a.m. Friday at 9. Every night.

The schedule does not depend on anyone's laptop being open. In April 2026 Anthropic shipped Claude Code Routines: scheduled Claude Code sessions that run on Anthropic's own cloud. The agent fires on time whether the customer is asleep, on a plane, or has their machine shut. There is a server. It is Anthropic's, the customer pays for it inside a subscription they were going to buy anyway, and I never touch it.

So the full runtime is the customer's subscription, Anthropic's cloud, the MCP connections, and a cron line. My contribution is the markdown. I am not in the runtime at all.

The event-triggered version

Scheduled agents cover most of what a CRM needs. Some jobs have to run the second something happens, not on the next cron tick.

Post-call follow-up is one. When a call recording finishes, the agent should read the transcript and create the follow-up tasks right then, while the rep still remembers the call. A weekly cron is too slow.

The old answer to "run on an event" was to host the Agent SDK on a server and point a webhook at it. Back to rent. The new answer: the customer's automation tool fires an API call to Claude Code Cloud, which spins up a session, runs the skill, and exits. The trigger is a webhook the customer already controls. The runtime is still Claude's cloud. Still no server of mine, still no key of mine. Event-driven agents stopped needing a host too.

Why customer ownership is the product

Here is the part that sounds like a giveaway and is actually the pitch.

The customer owns all of it. The skill is in a repo they can fork. The runtime is their subscription. The data never leaves their workspace, because the agent reads and writes their CRM directly and I am not a hop in the path. The schedule is theirs to change.

If I disappear tomorrow, the agent keeps running.

Most consultants would hide that. It sounds like it removes the lock-in that makes a client keep paying. It does remove that. What it adds is the reason a careful buyer says yes in the first place. Nobody wants to wire their entire pipeline through a small agency's rented server. "You own the runtime, I just maintain the skill" turns the agent from a dependency into an asset. I close deals on that sentence. The lock-in I gave up was never real leverage. It was risk I was asking the customer to carry.

What it changes about the business

When you are not hosting, the shape of the business changes.

I charge for the build. The build is the expensive part, because the build is the knowledge. The monthly retainer, if the customer wants one, buys real work: tuning the skill as their process shifts, adding signals, watching output quality. It is not rent on a server dressed up as a service. A customer can feel the difference, and they renew the one that is real.

I also do not have an operations job. No 2 a.m. pages, no patching, no certificate renewals, no per-customer server drifting out of date. Six agents are shipped. The number of servers I run is zero. That is the only reason one person can build and maintain six agents at all.

What you give up

This model is not free. Three things you give up, and they are real.

You give up central observability. The agent runs in the customer's account, so I cannot watch a fleet from one dashboard. The skill has to log to a place the customer can see, and a retainer has to include reading those logs. You design for it or you fly blind.

You give up instant fixes across every customer. A repo update has to reach each workspace. The agents fetch the latest skill file from GitHub on each run, which gets most of the way there, but a customer who pinned a version stays pinned until they pull.

You give up the model where the agent does something the customer cannot inspect. Everything is markdown in a repo they can read. For me that is the point. If you were counting on the customer not understanding what they bought, this model removes the place to hide.

For an agent that has to hold secrets the customer cannot see, or run heavy custom code, host it. For an agent that reads a CRM, decides something, and writes back, hosting is a cost with no matching benefit.

The rule

If you sell AI agents, the default architecture in most tutorials is a server, a key, and a token bill. That architecture made sense when there was no shared runtime to deploy into. There is one now.

The customer already pays for Claude Code. It already runs scheduled sessions on a cloud. It already speaks MCP to the tools the agent needs. The runtime exists, the customer owns it, and the only thing missing was the instructions. Ship the instructions. Skip the server.

See it on your own workspace

If you run Attio and want to see one of these agents working before you commit to anything, I do a free teardown. You add me to your workspace as an Attio expert, which costs no extra seat and no billing. I send back the three highest-leverage agents to ship first, and what each one would read and write. No call, no pitch.

Get the free teardown

Need help with your CRM?

Messy data, manual processes, or a CRM that doesn't fit? Let's talk.

Book a call