Why we built our capital markets agent stack on Claude
Long context that survives a prospectus, tool use that holds in a regulated environment, and a safety posture buy-side compliance will sign off on. Here’s the working.
The short version
We evaluated models against the constraints of a production agent stack spanning trading, risk, and operations workflows, and chose Claude for three reasons: long context that survives a real prospectus, tool use that holds up in a regulated environment, and a safety posture that buy-side compliance functions will actually sign off on.
The problem we were solving
Capital markets workflows are not chatbot workflows. A single agent run might need to:
- Ingest a 400-page bond prospectus
- Cross-reference it against an internal credit policy
- Pull live pricing from S&P Global
- Reconcile against a position in the firm’s risk system
- Produce a report the desk can defend to compliance
Most LLM demos break before step two. We needed a model that doesn’t.
Our evaluation framework
We built an evaluation framework around the dimensions that actually matter in a front office. Not benchmark scores — production constraints.
Why Claude won
Long context that doesn’t degrade
The published context numbers are marketing. What matters is whether the model can answer a question that requires reasoning over content on page 312 of a prospectus. In our tests against EU Green Bond framework documents, Claude maintained recall and reasoning quality through documents that caused noticeable degradation in competing models. For a front-to-back agent, “the answer is somewhere in the document but the model forgot” is a non-starter.
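One way to run the "page 312" test on your own documents is to hand-label questions whose answers sit on a known page, then bucket accuracy by how deep into the document each answer lives. The sketch below makes several assumptions. `ask_model(document, question)` is a hypothetical wrapper around whichever model is under test. `RecallCase`, `recall_by_depth` and `fake_model` are illustrative names. The questions and page numbers are placeholders, and substring grading is a deliberate simplification; real grading of prospectus answers usually needs a rubric or a human reviewer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecallCase:
    """A hand-labelled question whose answer sits on a known page of the document."""
    question: str
    expected: str  # answer text an analyst would accept
    page: int      # 1-based page holding the answer


def recall_by_depth(
    document: str,
    total_pages: int,
    cases: list[RecallCase],
    ask_model: Callable[[str, str], str],  # hypothetical wrapper: (document, question) -> answer
    buckets: int = 4,
) -> dict[int, float]:
    """Accuracy per depth bucket: 0 covers the first 1/buckets of pages, buckets-1 the last."""
    hits: dict[int, int] = defaultdict(int)
    seen: dict[int, int] = defaultdict(int)
    for case in cases:
        # Map the answer's page to a depth bucket, clamping into the final bucket.
        bucket = min((case.page - 1) * buckets // total_pages, buckets - 1)
        seen[bucket] += 1
        answer = ask_model(document, case.question)
        # Deliberately crude grading: case-insensitive substring match.
        if case.expected.lower() in answer.lower():
            hits[bucket] += 1
    return {b: hits[b] / seen[b] for b in sorted(seen)}


def fake_model(document: str, question: str) -> str:
    # Stand-in that only "remembers" the front of the document.
    if question == "What is the coupon?":
        return "The coupon is 4.25%."
    return "I could not find that."


cases = [
    RecallCase("What is the coupon?", "4.25%", page=3),
    RecallCase("What is the use of proceeds?", "renewable energy", page=312),
]
print(recall_by_depth("...prospectus text...", 400, cases, fake_model))  # {0: 1.0, 3: 0.0}
```

A flat overall score would hide the pattern that matters here: a model can look fine on average while the last-quarter bucket quietly collapses.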
Tool use that holds the line
Our agents call internal tools: a pricing service, a risk API, a position store, a compliance check. In stress tests with 8+ tools available and ambiguous instructions, Claude’s tool selection was the most consistent of the models we tested. It also produced fewer hallucinated tool calls, which is critical when a wrong call costs real money or generates a real audit finding.
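Both properties, consistent selection and few hallucinated calls, can be scored offline once the model's tool calls are logged. This is a minimal sketch under two assumptions. First, tool definitions are written in the shape the Anthropic Messages API uses (`name`, `description`, `input_schema`). Second, logged calls look like its `tool_use` content blocks. The four tool names are hypothetical stand-ins for the internal tools above. The check covers only undeclared tool names and missing required inputs, not full JSON Schema validation.

```python
from collections import Counter


def _tool(name: str, description: str, field: str) -> dict:
    """Build a tool definition in the Messages API shape with one required string input."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {field: {"type": "string"}},
            "required": [field],
        },
    }


# Hypothetical stand-ins for the pricing service, risk API, position store, and compliance check.
TOOLS = [
    _tool("get_price", "Live price for an instrument.", "isin"),
    _tool("get_risk", "Risk metrics for a trading book.", "book"),
    _tool("get_position", "Current position for an instrument.", "isin"),
    _tool("compliance_check", "Pre-trade compliance check.", "order_id"),
]


def is_hallucinated(call: dict, tools: list[dict] = TOOLS) -> bool:
    """True if a logged tool_use block names an undeclared tool or omits a required input."""
    spec = next((t for t in tools if t["name"] == call.get("name")), None)
    if spec is None:
        return True
    required = spec["input_schema"].get("required", [])
    return any(key not in call.get("input", {}) for key in required)


def score_runs(calls: list[dict], tools: list[dict] = TOOLS) -> tuple[float, float]:
    """Score the first tool call from each of N runs of the same ambiguous prompt.

    Returns (consistency, hallucination_rate): the share of runs that chose the
    most common tool, and the share of runs whose call fails is_hallucinated.
    """
    if not calls:
        raise ValueError("need at least one logged call")
    counts = Counter(call.get("name") for call in calls)
    consistency = counts.most_common(1)[0][1] / len(calls)
    hallucination_rate = sum(is_hallucinated(c, tools) for c in calls) / len(calls)
    return consistency, hallucination_rate


ok = {"type": "tool_use", "id": "t1", "name": "get_price", "input": {"isin": "XS0000000000"}}
bad = {"type": "tool_use", "id": "t2", "name": "get_quote", "input": {"isin": "XS0000000000"}}
print(score_runs([ok, ok, ok, bad]))  # (0.75, 0.25)
```

Consistency alone can reward a model that is reliably wrong, so it is worth reading alongside a check that the most common tool is the one the prompt intended.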
Safety calibration that compliance approves
This one surprised us. We expected to fight the safety layer. Instead, Claude’s calibration (say what you can, decline what you can’t, explain why) aligned almost exactly with how a buy-side second-line risk function wants an analyst to behave. The same posture that makes Claude “cautious” in consumer settings makes it deployable in regulated ones.
The Anthropic stance on safety is a procurement advantage
When a second-line risk function reviews an AI vendor, they aren’t reading benchmarks. They’re reading model cards, responsible scaling policies, and incident disclosures. Anthropic’s published positions — Responsible Scaling Policy, Constitutional AI, transparent post-deployment monitoring — close procurement gates that would otherwise require months of legal and risk review.
What we built
Atheryon’s reference system is a front-to-back capital markets agent stack.
- Agent 01: Trading Systems · Market Data (S&P)
- Agent 02: Risk Management · Reference Data
- Agent 03: Portfolio Analytics · Enterprise Data
- Agent 04: Operations & Reporting · Unstructured (Research / News)
Reference architecture. Component status (shipped, building, roadmap): see /roadmap.
Each agent is single-purpose. They share a Claude-driven orchestration layer that routes work, manages tool selection, and enforces compliance constraints declaratively rather than in code.
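"Declaratively rather than in code" means the routing and the rules live in data that compliance can read. The orchestration layer only evaluates them. The sketch below assumes things not in the original. The task types, tool names, and rule fields (`agent`, `tool`, `requires_prior`) are hypothetical, not the reference system's actual schema. In practice the rules would more likely sit in a reviewed YAML or JSON file than inline in Python.

```python
# Declarative routing: task type -> single-purpose agent.
# Agent names mirror the reference architecture; task types are hypothetical.
ROUTES: dict[str, str] = {
    "price_check": "trading_systems",
    "var_report": "risk_management",
    "attribution": "portfolio_analytics",
    "regulatory_report": "operations_reporting",
}

# Declarative constraints as an allow-list: anything not listed is denied.
# "requires_prior" names a tool that must already have run in this task.
POLICY: list[dict] = [
    {"agent": "*", "tool": "compliance_check"},
    {"agent": "trading_systems", "tool": "get_price"},
    {"agent": "risk_management", "tool": "get_risk"},
    {"agent": "risk_management", "tool": "get_position"},
    {"agent": "portfolio_analytics", "tool": "get_position"},
    {"agent": "operations_reporting", "tool": "submit_report", "requires_prior": "compliance_check"},
]


def route(task_type: str) -> str:
    """Return the agent registered for a task type, refusing unknown work."""
    try:
        return ROUTES[task_type]
    except KeyError:
        raise ValueError(f"no agent registered for task type {task_type!r}") from None


def authorize(agent: str, tool: str, history: list[str], policy: list[dict] = POLICY) -> bool:
    """Default-deny check the orchestrator runs before dispatching every tool call."""
    for rule in policy:
        if rule["agent"] in ("*", agent) and rule["tool"] == tool:
            prerequisite = rule.get("requires_prior")
            if prerequisite is None or prerequisite in history:
                return True
    return False


agent = route("regulatory_report")
print(authorize(agent, "submit_report", []))                    # False
print(authorize(agent, "submit_report", ["compliance_check"]))  # True
```

The point of the design is that changing what an agent may do becomes an edit to `POLICY`. A second-line reviewer can read that diff, rather than having to trace control flow.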
What we’d tell another shop evaluating today
- 01. Don’t pick on benchmarks. Pick on procurement. The model your compliance team will approve is worth more than the model that’s 2 points higher on MMLU.
- 02. Test long context with your actual documents. Synthetic needle-in-haystack tests will mislead you.
- 03. Stress tool use, not raw generation. Agents live or die on tool selection under ambiguity.
- 04. Measure cost per task, not per token. A cheaper model that needs three retries is not cheaper.
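The retry point is simple arithmetic, but it flips enough decisions to be worth writing down. Suppose each attempt succeeds independently with probability p and every attempt is billed. The expected number of attempts is then 1/p, so the expected cost per completed task is the per-attempt cost divided by p. This is a minimal sketch; `cost_per_task` is an illustrative name. The prices, token counts, and success rates are placeholders, not measured figures. A single blended per-token price also ignores the difference between input and output pricing.

```python
def cost_per_task(tokens_per_attempt: int, usd_per_million_tokens: float, success_rate: float) -> float:
    """Expected spend to get one accepted result.

    Assumes attempts succeed independently with probability success_rate and
    every attempt is billed, so the attempt count is geometric with mean
    1 / success_rate and the expected cost is cost_per_attempt / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt * usd_per_million_tokens / 1_000_000
    return cost_per_attempt / success_rate


# Illustrative placeholders only: the same 200k-token job at one third of the token
# price, but with a 25% success rate (four attempts on average, i.e. three retries).
cheap = cost_per_task(200_000, usd_per_million_tokens=1.0, success_rate=0.25)
pricier = cost_per_task(200_000, usd_per_million_tokens=3.0, success_rate=0.9)
print(f"cheaper per token: ${cheap:.2f} per task")    # cheaper per token: $0.80 per task
print(f"pricier per token: ${pricier:.2f} per task")  # pricier per token: $0.67 per task
```

With these placeholder numbers, the model that is cheaper per token is the more expensive one per task. That is the case the fourth point warns about.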
Where we go next
We’re building the case studies. If you’re at an Australian bank, asset manager, or capital markets infrastructure provider and want to see the reference system, book a system assessment.