How a crypto exchange put AI agents into production

One crypto exchange team spent six months folding AI agents into its development workflow. The result was faster feature delivery and less rework, but only after the team tightened specs, review, and scope control.
On a crypto exchange, a code change that works under normal conditions can still break under adversarial ones. That shaped the team’s approach from the start. They kept the same review bar and used agents to shift more engineering time toward work that required judgment.
Branch setup, boilerplate, endpoint wiring, and baseline tests were taking time away from architecture work and threat modeling. Over six months, the team built a workflow in which AI agents drafted the first PR and engineers stayed responsible for review, security, and architectural decisions.
Over that period, features shipped per engineer per week rose from 1.2 to 3.6, while the rework rate on agent-drafted PRs fell from 32% to 9%. The change failure rate did not materially change, moving from 0.8% to 0.9%.
The workflow
The production process follows a fixed sequence: product idea → triage → engineer → agent drafts PR → engineer review → team review → agent fix loop → merge.
Most of the improvement came from getting triage right. Before an agent starts, the engineer prepares a delegation bundle: acceptance criteria, constraints, links to prior work, and a risk tag. The team found that vague instructions broke the process quickly, so specificity became mandatory.
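A delegation bundle of this kind can be sketched as a small data structure that refuses to pass triage while any part is missing. This is a minimal illustration, not the team's actual tooling: the class name, field names, risk tag values, and the sample criteria are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical risk tags; the article does not list the team's actual values.
RISK_TAGS = {"low", "medium", "high"}


@dataclass
class DelegationBundle:
    """What an engineer hands to the agent before it starts (illustrative)."""
    acceptance_criteria: list[str]
    constraints: list[str]
    prior_work_links: list[str]
    risk_tag: str

    def problems(self) -> list[str]:
        """Return reasons the bundle is too vague to delegate; empty means ready."""
        issues = []
        if not self.acceptance_criteria:
            issues.append("no acceptance criteria")
        if not self.constraints:
            issues.append("no constraints")
        if not self.prior_work_links:
            issues.append("no links to prior work")
        if self.risk_tag not in RISK_TAGS:
            issues.append(f"unknown risk tag: {self.risk_tag!r}")
        return issues


# Illustrative bundle with made-up content; the prior-work links are missing.
bundle = DelegationBundle(
    acceptance_criteria=["Backup codes can be regenerated from account settings"],
    constraints=["Ship behind a feature flag"],
    prior_work_links=[],
    risk_tag="medium",
)
print(bundle.problems())  # ['no links to prior work']
```

Returning a list of problems rather than a single pass/fail flag lets the engineer see everything that needs tightening before the agent starts.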
The agent’s job is narrow: produce a first draft PR that follows existing patterns and uses what is already in the repository. If required interfaces or conventions are missing, the agent is expected to stop and ask rather than improvise. Whatever it produces goes through the same review and CI/CD pipeline applied to human-written code.
One internal example
For one 2FA backup codes feature, the process looked like this:
- 10:15 – Engineer delegates with acceptance criteria and links to similar work.
- 10:35 – Agent opens a PR: UI and API changes, feature flag wiring, baseline tests.
- 11:00 – Engineer reviews for intent, security properties, and edge cases.
- 11:30 – Two-engineer team review covers service boundaries and failure modes.
- 12:00 – Merge.
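From delegation at 10:15 to merge at 12:00, the feature took 1 hour 45 minutes of elapsed time, of which about 45 minutes was active human time. Most of that human time went to review, security checks, and edge cases.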
Where the process failed early
In the first month, 32% of agent-drafted PRs needed meaningful rework. When the team grouped those failures by root cause, the same patterns kept appearing.
Hallucinated integrations accounted for roughly 18% of failures. In these cases, the agent assumed SDK methods existed or invented API contracts. The team responded by requiring citations to actual internal interfaces. If the agent could not point to a real source, it had to stop and ask.
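One way to picture the citation rule is a gate that compares what the agent cites against an index of interfaces that really exist in the repository. The sketch below assumes such an index is available as a set of names. The function names and the `wallet_sdk` identifiers are hypothetical, and the article does not describe how the team actually enforces the rule.

```python
def unresolved_citations(cited: list[str], known_interfaces: set[str]) -> list[str]:
    """Return cited interface names that are not in the known index."""
    return [name for name in cited if name not in known_interfaces]


def may_proceed(cited: list[str], known_interfaces: set[str]) -> bool:
    """Proceed only if something was cited and every citation resolves."""
    return bool(cited) and not unresolved_citations(cited, known_interfaces)


# Hypothetical interface index and agent citations, for illustration only.
known = {"wallet_sdk.get_balance", "wallet_sdk.list_addresses"}
cited = ["wallet_sdk.get_balance", "wallet_sdk.freeze_account"]

print(unresolved_citations(cited, known))  # ['wallet_sdk.freeze_account']
print(may_proceed(cited, known))           # False
```

An empty citation list also fails the gate, which matches the stop-and-ask behavior: an agent that cannot point to any real source does not get to write code.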
About 25% of failures came from vague specs that produced the wrong UX. A prompt like “make this mobile friendly” could return something functional but still incomplete. The team responded with explicit acceptance criteria and concrete examples of what counted as acceptable.
Scope creep was another recurring problem, showing up in 22% of failures. Open-ended requests sometimes triggered large refactors with unclear value. To limit that, the team set hard caps on file scope and change size, and added a plan-first step that required engineer approval before any code was written.
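The scope caps and plan-first step could look roughly like the check below, run against the agent's proposed plan before an engineer approves it. The numeric limits, the `PlannedChange` type, and the file paths are assumptions for illustration. The article says caps exist but does not give their values.

```python
from dataclasses import dataclass

# Hypothetical limits; the article does not state the team's actual caps.
MAX_FILES = 10
MAX_CHANGED_LINES = 400


@dataclass
class PlannedChange:
    path: str
    lines_changed: int


def scope_violations(
    plan: list[PlannedChange],
    allowed_prefixes: tuple[str, ...],
    max_files: int = MAX_FILES,
    max_lines: int = MAX_CHANGED_LINES,
) -> list[str]:
    """Return reasons a plan exceeds its approved scope; empty means within caps."""
    violations = []
    if len(plan) > max_files:
        violations.append(f"{len(plan)} files exceeds cap of {max_files}")
    total_lines = sum(change.lines_changed for change in plan)
    if total_lines > max_lines:
        violations.append(f"{total_lines} changed lines exceeds cap of {max_lines}")
    for change in plan:
        # str.startswith accepts a tuple and matches any of the prefixes.
        if not change.path.startswith(allowed_prefixes):
            violations.append(f"{change.path} is outside the approved scope")
    return violations


# Illustrative plan: the second file falls outside the approved directory.
plan = [
    PlannedChange("services/auth/backup_codes.py", 120),
    PlannedChange("services/billing/invoice.py", 15),
]
print(scope_violations(plan, allowed_prefixes=("services/auth/",)))
# ['services/billing/invoice.py is outside the approved scope']
```

Checking the plan rather than the finished diff means an oversized refactor is caught before any code is written.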
In 12% of cases, the code worked but followed the wrong internal patterns. That created maintainability problems and introduced security risk further down the line.
By month six, the rework rate was down to 9%, largely because delegation standards, scope limits, and review had become more consistent.
The safeguards
Authentication, permissions, withdrawals, and key management default to manual work unless a senior engineer explicitly decides otherwise. Every agent-drafted change goes through individual review by the delegating engineer, then team review by at least two engineers. CI/CD runs the same checks on agent output as it runs on human-written code: tests, static analysis, dependency hygiene, security scanning.
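The manual-by-default rule for sensitive areas can be expressed as a small routing function. This is a sketch under assumptions: the area labels mirror the four areas named above, while the function name, the `senior_override` parameter, and the idea that each task carries a single area label are illustrative choices, not the team's documented mechanism.

```python
# Areas the article says default to manual work.
SENSITIVE_AREAS = {"authentication", "permissions", "withdrawals", "key_management"}


def delegation_mode(area: str, senior_override: bool = False) -> str:
    """Return 'manual' for sensitive areas unless a senior engineer overrides."""
    if area in SENSITIVE_AREAS and not senior_override:
        return "manual"
    return "agent"


print(delegation_mode("withdrawals"))                        # manual
print(delegation_mode("withdrawals", senior_override=True))  # agent
print(delegation_mode("marketing_page"))                     # agent
```

The override is an explicit argument that defaults to `False`, so delegating sensitive work is always a deliberate, visible decision.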
The team also built an agent fix loop into the review process. Reviewers leave standard PR comments, then invoke the agent to address specific items with an explicit instruction not to touch anything else. That keeps reviewers focused on correctness and risk, while the agent handles the smaller mechanical changes.
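A fix-loop invocation of this kind amounts to turning review comments into a tightly bounded instruction. The sketch below shows one possible shape. The `ReviewComment` type, the wording of the instruction, and the sample comments are hypothetical, and no particular agent API is implied.

```python
from dataclasses import dataclass


@dataclass
class ReviewComment:
    path: str
    line: int
    body: str


def build_fix_instruction(comments: list[ReviewComment]) -> str:
    """Build an instruction that limits the agent to the listed review items."""
    if not comments:
        raise ValueError("no review comments to address")
    items = [
        f"{number}. {comment.path}:{comment.line}: {comment.body}"
        for number, comment in enumerate(comments, start=1)
    ]
    header = [
        "Address only the review items listed below.",
        "Do not modify any other code, files, or formatting.",
    ]
    return "\n".join(header + items)


# Illustrative comments with made-up paths and content.
comments = [
    ReviewComment("api/backup_codes.py", 42, "Return 404 instead of 500 when the user is missing"),
    ReviewComment("tests/test_backup_codes.py", 10, "Add a test for an expired code"),
]
print(build_fix_instruction(comments))
```

Numbering the items gives reviewers a simple way to confirm afterward that each one was addressed and nothing else changed.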
Before and after
| Metric | Before | After |
| Features per engineer per week | 1.2 | 3.6 |
| Rework rate on agent PRs | 32% | 9% |
| Change failure rate | 0.8% | 0.9% |
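The relative changes implied by the table can be checked with a few lines of arithmetic. The helper name is an assumption, and the figures are taken directly from the table.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after, e.g. 2.0 means +200%."""
    return (after - before) / before


features = relative_change(1.2, 3.6)  # features per engineer per week
rework = relative_change(32, 9)       # rework rate, in percent
failures = relative_change(0.8, 0.9)  # change failure rate, in percent

print(f"Features per engineer per week: {features:+.0%}")  # +200%
print(f"Rework rate on agent PRs: {rework:+.0%}")          # -72%
print(f"Change failure rate: {round(0.9 - 0.8, 1)} percentage points higher")
```

Throughput tripled and the rework rate fell by about 72% in relative terms, while the change failure rate moved by 0.1 percentage points.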
What made the workflow stable
The workflow held up because the team kept delegation standards, scope limits, traceability, and review consistent as usage expanded. Agents handled more of the repetitive implementation work, but engineers still owned architecture decisions, threat modeling, and failure analysis.