Stratus sits between your LLM and the environment. Before a single action is executed, it plans ahead — modeling the world, evaluating candidate action sequences, and returning a verified plan with confidence scores.
This is not prompt engineering. Stratus operates in learned representation space — encoding state, simulating futures, and selecting actions before a single token is sent to your LLM.

How It Works

1. State arrives

Raw environment observations — DOM state, tool outputs, structured data — are passed to Stratus alongside a goal description. Nothing is assumed. Everything is encoded fresh.
2. Encoder compresses

The State Encoder converts both the current state and the goal into compact learned representations. These embeddings capture what actually matters for task completion — not surface text, but semantic meaning your agent can reason over.
3. World Model simulates

Before any action is taken, the World Model asks: “what happens if I do X?” It simulates the predicted next state for each candidate action — in representation space, in milliseconds, without touching the real environment.
4. Planner sequences

The Planning Layer runs a forward search through those simulations — selecting and sequencing actions until the predicted outcome converges on the goal. It returns a ranked action sequence, the predicted state at each step, and a confidence score.
5. LLM executes with context

The verified plan is injected into an enriched execution prompt and forwarded to your configured LLM — GPT-4o, Claude, DeepSeek, Llama, Gemini, Grok, Mistral, Qwen, or any of 2,050+ available model combinations. Your LLM executes with full context, not guesswork.
6. Response returned

The response comes back in the same OpenAI chat format the client sent. No SDK changes. No new integration surface. Just better decisions.
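Viewed end to end, the six steps form a pure function from (observation, goal) to an enriched execution prompt. A minimal sketch with stand-in stubs — the function names, embeddings, and example plan here are illustrative, not the Stratus internals:

```python
# Illustrative pipeline sketch. Each stage is a stub standing in for the
# real learned component (State Encoder, World Model, Planning Layer).

def encode(observation: str, goal: str) -> tuple:
    # State Encoder: compress raw observation and goal into compact
    # representations (stubbed here as bounded hashes).
    return (hash(observation) % 1000, hash(goal) % 1000)

def plan(state_emb: int, goal_emb: int) -> dict:
    # Planning Layer: return a ranked action sequence plus a confidence
    # score (hard-coded here purely for illustration).
    return {"actions": ["click_search", "type_query", "submit"],
            "confidence": 0.92}

def enrich_prompt(observation: str, goal: str) -> dict:
    state_emb, goal_emb = encode(observation, goal)
    verified_plan = plan(state_emb, goal_emb)
    # The verified plan is injected into the execution prompt before any
    # token reaches the downstream LLM.
    return {"goal": goal,
            "plan": verified_plan["actions"],
            "confidence": verified_plan["confidence"],
            "observation": observation}

prompt = enrich_prompt("<html>...</html>", "find the cheapest flight")
```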

Core Components

State Encoder

Converts raw observations into learned representations that capture task-relevant semantics. The foundation everything else is built on — if this is wrong, nothing downstream can fix it.

World Model

Predicts what the environment will look like after a given action — entirely in representation space. This is what makes planning ahead possible without executing anything.

Planning Layer

Combines the world model with a goal-conditioned policy to select and sequence actions. Jointly optimized so prediction and action selection reinforce each other.
Most agent frameworks execute token-by-token — the LLM decides each action in isolation with no model of what comes next. Stratus inverts this. It plans a full sequence, verifies the predicted outcome, and only then hands the verified plan to your LLM for execution.
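The contrast can be made concrete with a toy forward search: a stand-in world model predicts the next state for each candidate action, and a greedy planner keeps whichever action moves the predicted state closest to the goal. Everything here — the 2-D state space, the fixed action effects, the greedy selection — is an illustrative assumption, not Stratus's learned model:

```python
import math

# Toy "world model": each action shifts the state vector by a fixed delta.
ACTION_EFFECTS = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

def predict(state, action):
    """Simulate the next state for an action, without executing anything."""
    dx, dy = ACTION_EFFECTS[action]
    return (state[0] + dx, state[1] + dy)

def forward_search(state, goal, max_steps=10):
    """Greedy plan: at each step, simulate every action and keep the one
    whose predicted state lands closest to the goal."""
    plan = []
    for _ in range(max_steps):
        if state == goal:
            break
        action = min(ACTION_EFFECTS,
                     key=lambda a: math.dist(predict(state, a), goal))
        state = predict(state, action)
        plan.append((action, state))  # action plus its predicted state
    return plan

steps = forward_search(state=(0, 0), goal=(2, 1))
# Reaches (2, 1) in three simulated steps: right, right, up.
```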

Inference Modes

Four modes. One for every decision pattern.

Predict State

Simulate what the environment looks like after an action — without executing it. Ask the world model “if I click here, what does the page look like?” and get an answer before committing to anything. Use this to verify safety, expected outcome, or downstream state before an action is applied to the real environment. Ideal for high-cost or irreversible operations.

Predict Action

Direct decision making. Given the current state and goal, the planning layer returns the single best next action — low overhead, fast response. No full plan, just the right move, right now. Best for known patterns where a full plan would be overkill.

Plan (Multi-Step)

Generate a complete verified action sequence from current state to goal. The planner runs a forward search through the world model, iteratively selecting actions and simulating outcomes until the predicted state converges on the goal or the step budget is exhausted. Returns the ranked sequence, predicted intermediate states, and a confidence score. Resolves in under 15ms on the small model.

Predict & Verify

The planner proposes; the world model confirms. The planner selects a candidate action, the world model simulates the outcome, and the similarity to the goal is scored. If confidence clears your threshold, the action executes. If not, Stratus falls back automatically to a short multi-step plan. Use this for high-stakes tasks where you need verification before committing.
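Predict & Verify's threshold-and-fallback behavior can be sketched in a few lines. The threshold value, the return shape, and the fallback plan contents are all illustrative assumptions, not the Stratus API:

```python
def verify_and_execute(candidate_score, threshold=0.8, planner=None):
    """Execute a single candidate action only if its simulated outcome
    clears the confidence threshold; otherwise fall back automatically
    to a short multi-step plan."""
    if candidate_score >= threshold:
        return {"mode": "execute", "actions": ["candidate_action"]}
    # Below threshold: ask the planner for a short multi-step plan instead.
    fallback_plan = planner() if planner else ["step_1", "step_2"]
    return {"mode": "fallback_plan", "actions": fallback_plan}

high = verify_and_execute(0.93)  # clears the threshold: executes directly
low = verify_and_execute(0.41)   # below threshold: falls back to a plan
```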

Model Sizes

Start with base — it’s the production-tested default. Scale up only when your task genuinely requires higher accuracy or handles significantly more complex state.

small

Fast prototyping and low-latency tasks. Under 15ms plan resolution. Pairs well with GPT-4o Mini for high-frequency agent loops.

base

Recommended. The production default — balanced accuracy, latency, and cost. Well-tested across real-world agent workloads.

large

Higher accuracy for complex multi-step tasks. Reach for this when base isn’t enough — not as a default.

xl / huge

Long-horizon planning and demanding environments. xl for production scale. huge for research and evaluation.

API Integration

Stratus wraps planning in an OpenAI-compatible format. Drop it in as a model name — no SDK changes required.
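Because the interface is the standard OpenAI chat format, the request body is the one your tooling already emits — only the model name changes. A hypothetical payload, where the exact model id is an assumption built from the documented stratus-x1ac-{size}-{llm} pattern:

```python
import json

# Standard OpenAI chat-completions payload; only the model name is
# Stratus-specific. "stratus-x1ac-base-claude" follows the documented
# stratus-x1ac-{size}-{llm} pattern, but the exact llm slug is an assumption.
payload = {
    "model": "stratus-x1ac-base-claude",
    "messages": [
        {"role": "system", "content": "You are a web navigation agent."},
        {"role": "user", "content": "Goal: add the cheapest item to the cart."},
    ],
}

body = json.dumps(payload)  # send with any OpenAI-compatible client
```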
1. Request arrives in standard format

An incoming request arrives in the standard OpenAI chat format. No new protocol. No new SDK.
2. Model name is parsed

Stratus parses the model name — pattern: stratus-x1ac-{size}-{llm} — to extract the planning model size and the target downstream LLM.
3. Planning pipeline runs

State is encoded, the world model simulates, the planner sequences, and a verified plan is produced — all before a single token is forwarded.
4. Execution prompt is enriched

The verified plan is injected into a structured execution prompt, giving your LLM full planning context rather than asking it to reason from scratch.
5. Response returned as-is

The LLM response comes back in the same OpenAI format the client expects. Fully transparent to existing tooling.
Pairing the small planning model with GPT-4o Mini produces an extremely low-latency agent loop suitable for high-frequency tasks. Pairing base with Claude produces a well-balanced production configuration. Mix and match.
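Step 2's parse can be sketched directly. Since the downstream LLM slug may itself contain hyphens (e.g. gpt-4o-mini), the size — drawn from the documented set — is split off first; the helper name is hypothetical:

```python
SIZES = {"small", "base", "large", "xl", "huge"}  # documented model sizes
PREFIX = "stratus-x1ac-"

def parse_model_name(name: str) -> tuple:
    """Split 'stratus-x1ac-{size}-{llm}' into (size, llm)."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a Stratus model name: {name}")
    rest = name[len(PREFIX):]
    # The size never contains a hyphen, but the llm slug may, so split
    # at the first hyphen after the prefix.
    size, _, llm = rest.partition("-")
    if size not in SIZES or not llm:
        raise ValueError(f"unrecognized size or missing llm in: {name}")
    return size, llm

size, llm = parse_model_name("stratus-x1ac-small-gpt-4o-mini")
# size == "small", llm == "gpt-4o-mini"
```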

Why This Architecture Works

Verification before execution

Every action is checked against a predicted outcome before it runs. Not heuristics — a learned model of what actually happens next.

Confidence you can act on

Every plan comes with a confidence score. Route low-confidence plans to fallback logic, escalation, or human review. Build reliability into the loop.
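Acting on the score can be as simple as a two-threshold router. The threshold values and route names here are illustrative, not part of the API:

```python
def route_plan(confidence: float, execute_at: float = 0.85,
               review_at: float = 0.5) -> str:
    """Route a plan by its confidence score: execute, fall back, or escalate."""
    if confidence >= execute_at:
        return "execute"
    if confidence >= review_at:
        return "fallback"      # e.g. replan with a larger model size
    return "human_review"      # escalate very low-confidence plans

routes = [route_plan(c) for c in (0.92, 0.70, 0.30)]
# routes == ["execute", "fallback", "human_review"]
```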

Representation space reasoning

Planning happens in compressed semantic space — not over raw tokens. Faster, more coherent, and less sensitive to surface-level noise in observations.

Fewer retries, more coherence

Multi-step tasks stay on track because each action is planned against the predicted future — not just the present state. Coherence compounds over a task horizon.
In internal benchmarks, Stratus-augmented agents scored 10/10 on tasks that scored 4/10 with the same LLM operating unassisted. The LLM didn’t change. The planning did.

Next Steps

Use Cases

See Stratus in action across web navigation, multi-hop reasoning, and task automation.

Quickstart

Build your first Stratus-powered agent in under 10 minutes.

API Reference

Complete endpoint docs, parameters, and response formats.