10/10 Levels
Stratus-powered agents completed every level in our benchmark. Baseline: 4/10.
2.3× Score
8717 vs 3750 total points. The gap widens on tasks with cascading state transitions.
68% Fewer Tokens
World model compression keeps context tight — your LLM sees a focused plan, not raw page noise.
What We’re Building
A hotel booking agent that navigates the full flow:- Search for “NYC hotels December 15–18”
- Filter by rating and price
- Select a hotel and check availability
- Fill in guest details and proceed to checkout
Setup
The Agent
State Description Quality
The quality of yoursystem message is the single biggest lever for agent performance.
- Low Quality
- High Quality
Handling Low Confidence
When confidence drops below 0.75, the agent hit a state it can’t predict well. Don’t retry blindly — inspect the state and add more detail.Choosing a Model
| Model | Best For | Latency |
|---|---|---|
stratus-x1ac-small-gpt-4o-mini | Prototyping, simple linear flows | Fastest |
stratus-x1ac-small-gpt-4o | Most navigation tasks | Fast |
stratus-x1ac-base-gpt-4o | Complex multi-step, forms with dependencies | Moderate |
stratus-x1ac-base-claude-sonnet-4-5 | Long-context flows, detailed reasoning | Moderate |
Next Steps
Cascade Prediction
When actions trigger downstream effects — handle chains before they fire.
Temporal Sequencing
Order-sensitive workflows with concurrency constraints.
API Reference
Full chat completions docs and all parameters.

