Embeddings
Get semantic embeddings for state descriptions using Stratus encoders.

Endpoint

POST https://api.stratus.run/v1/embeddings

Use Cases

  • State similarity: Compare states for agent memory/history
  • Goal matching: Find similar historical goals
  • Clustering: Group similar states/actions
  • Search: Semantic search over agent experiences

Request Format

interface EmbeddingRequest {
  model: string;              // Required: Stratus encoder model (without LLM suffix)
  input: string | string[];   // Required: Text to embed
  encoding_format?: string;   // Optional: 'float' (default) or 'base64'
}

Example Request

TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.stratus.run/v1',
  apiKey: process.env.STRATUS_API_KEY
});

const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Google search results page showing list of NYC hotels'
});

console.log(response.data[0].embedding);
// [0.023, -0.841, 0.127, ...] (768 dimensions)

Python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.stratus.run/v1",
    api_key=os.environ["STRATUS_API_KEY"]
)

response = client.embeddings.create(
    model="stratus-x1ac-base",
    input="Google search results page showing list of NYC hotels"
)

print(response.data[0].embedding)
# [0.023, -0.841, 0.127, ...] (768 dimensions)

cURL

curl https://api.stratus.run/v1/embeddings \
  -H "Authorization: Bearer stratus_sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stratus-x1ac-base",
    "input": "Google search results page showing list of NYC hotels"
  }'

Response Format

interface EmbeddingResponse {
  object: 'list';
  data: Embedding[];
  model: string;
  usage: Usage;
}

interface Embedding {
  object: 'embedding';
  index: number;
  embedding: number[];  // 768-dim vector (base model)
}

interface Usage {
  prompt_tokens: number;
  total_tokens: number;
}

Example Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.023, -0.841, 0.127, ...]
    }
  ],
  "model": "stratus-x1ac-base",
  "usage": {
    "prompt_tokens": 18,
    "total_tokens": 18
  }
}
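When batching, each item in `data` carries an `index` field tying it back to the corresponding input. A small defensive helper (an illustrative sketch, not part of any SDK) keeps results aligned with the input array even if items arrive out of order:

```typescript
interface EmbeddingItem {
  object: 'embedding';
  index: number;
  embedding: number[];
}

// Return embeddings ordered by their `index` field, so result i
// always corresponds to input text i.
function sortByIndex(data: EmbeddingItem[]): number[][] {
  return [...data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```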

Model Names

For embeddings, use the Stratus model name without the LLM suffix:

| Model | Dimensions | Parameters |
|---|---|---|
| stratus-x1ac-small | 512 | 125M |
| stratus-x1ac-base | 768 | 350M |
| stratus-x1ac-large | 1024 | 300M |
| stratus-x1ac-xl | 1280 | 750M |
| stratus-x1ac-huge | 1536 | 1.5B |

Recommendation: Use stratus-x1ac-small for most applications.
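Because dimensionality varies by model, a common bug is storing vectors from one encoder and querying with another. A small guard (an illustrative sketch using the dimensions from the table above, not an SDK feature) catches this early:

```typescript
// Dimensions per encoder, taken from the model table above.
const MODEL_DIMENSIONS: Record<string, number> = {
  'stratus-x1ac-small': 512,
  'stratus-x1ac-base': 768,
  'stratus-x1ac-large': 1024,
  'stratus-x1ac-xl': 1280,
  'stratus-x1ac-huge': 1536,
};

// Throw if a vector's length doesn't match the model it claims to
// come from -- e.g. after switching models without re-embedding.
function assertDimensions(model: string, embedding: number[]): void {
  const expected = MODEL_DIMENSIONS[model];
  if (expected === undefined) throw new Error(`Unknown model: ${model}`);
  if (embedding.length !== expected) {
    throw new Error(
      `Expected ${expected} dims for ${model}, got ${embedding.length}`
    );
  }
}
```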

Batch Embeddings

Embed multiple texts in one request:
const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: [
    'Google homepage with search box',
    'Google search results for hotels NYC',
    'Hotel booking form with date pickers',
    'Confirmation page showing booking details'
  ]
});

// Access each embedding
response.data.forEach((item, i) => {
  console.log(`Embedding ${i}:`, item.embedding);
});
Limits:
  • Max 100 texts per request
  • Max 2048 tokens per text
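Workloads larger than the 100-text cap need client-side chunking before calling the API. A minimal sketch (the helper name is illustrative):

```typescript
// Split an input list into request-sized chunks, respecting the
// 100-texts-per-request limit noted above.
function chunkTexts(texts: string[], maxPerRequest = 100): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < texts.length; i += maxPerRequest) {
    chunks.push(texts.slice(i, i + maxPerRequest));
  }
  return chunks;
}
```

Each chunk can then be passed as the `input` of a separate `embeddings.create` call, sequentially or with limited concurrency.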

Encoding Formats

The encoding_format parameter controls how embedding vectors are returned in the response.

Float Format (Default)

Returns embeddings as an array of floating-point numbers. This is the default and most common format.
const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Example text',
  encoding_format: 'float' // optional; 'float' is the default
});

console.log(response.data[0].embedding);
// [0.023, -0.841, 0.127, 0.456, -0.234, ...]
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.023, -0.841, 0.127, ...]
    }
  ],
  "model": "stratus-x1ac-base",
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
Best for:
  • Direct computation (cosine similarity, dot product)
  • Mathematical operations
  • Storage in databases with vector types (PostgreSQL pgvector, Pinecone, etc.)
  • Most common use cases

Base64 Format

Returns embeddings as base64-encoded binary data. More efficient for network transfer.
const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Example text',
  encoding_format: 'base64'
});

console.log(response.data[0].embedding);
// "YXNkZmFzZGZhc2Rm..." (base64 string)

// Decode to use
const buffer = Buffer.from(response.data[0].embedding, 'base64');
// Respect byteOffset/length: the Buffer may be a view into a larger ArrayBuffer
const floatArray = new Float32Array(
  buffer.buffer,
  buffer.byteOffset,
  buffer.byteLength / Float32Array.BYTES_PER_ELEMENT
);
console.log(Array.from(floatArray));
// [0.023, -0.841, 0.127, ...]
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": "YXNkZmFzZGZhc2Rm..."
    }
  ],
  "model": "stratus-x1ac-base",
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
Best for:
  • Network efficiency (smaller payload size)
  • Binary storage systems
  • High-throughput batch processing
  • Bandwidth-constrained environments

Decoding Base64 Embeddings

TypeScript/Node.js

function decodeBase64Embedding(base64String: string): number[] {
  const buffer = Buffer.from(base64String, 'base64');
  const floatArray = new Float32Array(
    buffer.buffer,
    buffer.byteOffset,
    buffer.byteLength / Float32Array.BYTES_PER_ELEMENT
  );
  return Array.from(floatArray);
}

const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Example text',
  encoding_format: 'base64'
});

const embedding = decodeBase64Embedding(response.data[0].embedding);
console.log(embedding); // [0.023, -0.841, ...]
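You can sanity-check the decoder without an API call by round-tripping a locally encoded vector. The wire format assumed here (little-endian float32) is an assumption consistent with the Python decoder below; `encodeBase64Embedding` is a local test helper, not an API function:

```typescript
// Local encoder for testing only: packs floats as little-endian float32
// (typed arrays use platform byte order, little-endian on common hardware).
function encodeBase64Embedding(values: number[]): string {
  return Buffer.from(new Float32Array(values).buffer).toString('base64');
}

function decodeBase64Embedding(base64String: string): number[] {
  const buffer = Buffer.from(base64String, 'base64');
  return Array.from(new Float32Array(
    buffer.buffer,
    buffer.byteOffset,
    buffer.byteLength / Float32Array.BYTES_PER_ELEMENT
  ));
}

// Round-trip values that are exactly representable in float32:
const roundTripped = decodeBase64Embedding(encodeBase64Embedding([0.5, -1.25, 2]));
console.log(roundTripped); // [0.5, -1.25, 2]
```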

Python

import base64
import struct

def decode_base64_embedding(base64_string: str) -> list[float]:
    # Decode base64 to bytes
    bytes_data = base64.b64decode(base64_string)
    # Unpack as floats (4 bytes per float, little-endian)
    num_floats = len(bytes_data) // 4
    floats = struct.unpack(f'<{num_floats}f', bytes_data)
    return list(floats)

response = client.embeddings.create(
    model="stratus-x1ac-base",
    input="Example text",
    encoding_format="base64"
)

embedding = decode_base64_embedding(response.data[0].embedding)
print(embedding)  # [0.023, -0.841, ...]

Format Comparison

| Aspect | Float | Base64 |
|---|---|---|
| Payload Size | Larger (~4KB for 768-dim) | Smaller (~3KB for 768-dim) |
| Ready to Use | Yes (direct array) | No (needs decoding) |
| Human Readable | Yes | No |
| Network Efficiency | Lower | Higher (~25% smaller) |
| Computation | Immediate | Decode first |

When to Use Each Format

Use Float When:

  • Building prototypes or demos
  • Need immediate access to values
  • Debugging or inspecting embeddings
  • Working with small batches (<10 embeddings)
  • Simplicity is more important than efficiency

Use Base64 When:

  • Processing large batches (100+ embeddings)
  • Network bandwidth is limited
  • Transferring embeddings between systems
  • Storing embeddings in binary format
  • Optimizing for production throughput

Example: Efficient Batch Processing

// Process 100 embeddings efficiently with base64
const texts = [...]; // 100 texts

const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: texts,
  encoding_format: 'base64' // ~25% smaller payload
});

// Decode and process in parallel
const embeddings = await Promise.all(
  response.data.map(async (item) => {
    const decoded = decodeBase64Embedding(item.embedding);
    await storeInDatabase(texts[item.index], decoded);
    return decoded;
  })
);

console.log(`Processed ${embeddings.length} embeddings`);

Use Case Examples

State Similarity

Compare agent states to find similar situations:
// Embed current state
const current = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Amazon product page, Add to Cart button visible'
});

// Embed historical states
const historical = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: [
    'Amazon product page, price $49.99',
    'Google search results',
    'Amazon product page, different product'
  ]
});

// Calculate cosine similarity
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

historical.data.forEach((hist, i) => {
  const sim = cosineSimilarity(
    current.data[0].embedding,
    hist.embedding
  );
  console.log(`Similarity to state ${i}:`, sim);
});
// Output: [0.95, 0.23, 0.87] - State 0 and 2 are similar

Goal Matching

Find past goals similar to current goal:
const currentGoal = 'Find hotels in NYC for December';
const pastGoals = [
  'Book flight to NYC',
  'Find hotels in San Francisco',
  'Reserve hotel room in NYC'
];

const embeddings = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: [currentGoal, ...pastGoals]
});

// Compare current goal to past goals
const currentEmb = embeddings.data[0].embedding;
pastGoals.forEach((goal, i) => {
  const pastEmb = embeddings.data[i + 1].embedding;
  const sim = cosineSimilarity(currentEmb, pastEmb);
  console.log(`${goal}: ${sim.toFixed(2)}`);
});
// Output:
// Book flight to NYC: 0.78
// Find hotels in San Francisco: 0.85
// Reserve hotel room in NYC: 0.92 ← Most similar

Semantic Search

Build a semantic search over agent memory:
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(...);
const stratusClient = new OpenAI({
  baseURL: 'https://api.stratus.run/v1',
  apiKey: process.env.STRATUS_API_KEY
});

// Embed and store agent experience
async function storeExperience(state, action, outcome) {
  const embedding = await stratusClient.embeddings.create({
    model: 'stratus-x1ac-base',
    input: state
  });

  await supabase.from('experiences').insert({
    state,
    action,
    outcome,
    embedding: embedding.data[0].embedding
  });
}

// Search for similar experiences
async function findSimilarExperiences(currentState, limit = 5) {
  const embedding = await stratusClient.embeddings.create({
    model: 'stratus-x1ac-base',
    input: currentState
  });

  const { data } = await supabase.rpc('match_experiences', {
    query_embedding: embedding.data[0].embedding,
    match_threshold: 0.7,
    match_count: limit
  });

  return data;
}

// Usage
await storeExperience(
  'Amazon product page, $49.99',
  'click Add to Cart',
  'Added to cart successfully'
);

const similar = await findSimilarExperiences(
  'Amazon product page, $39.99'
);
console.log('Similar past experiences:', similar);

Comparison with Other Embeddings

| Provider | Dimensions | Specialization | Best For |
|---|---|---|---|
| Stratus | 768 | Agent states/actions | State similarity, planning |
| OpenAI text-embedding-3 | 1536 | General text | Semantic search, RAG |
| Cohere embed-v3 | 1024 | Multilingual | International content |
When to use Stratus embeddings:
  • Comparing agent states
  • Finding similar goals
  • Agent memory/experience retrieval
  • State-action pattern matching
When to use general embeddings:
  • Document search
  • FAQ matching
  • Content recommendation

Performance

Latency

| Model | Latency (p50) | Latency (p99) |
|---|---|---|
| small | 5ms | 15ms |
| base | 10ms | 25ms |
| large | 20ms | 50ms |
| xl | 35ms | 80ms |
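If calls sit on an agent's critical path, the p99 column above can drive model choice. A small sketch (latency figures copied from the table; the helper is illustrative, not part of the SDK):

```typescript
// p99 latencies in ms, from the latency table above.
const P99_MS: Record<string, number> = { small: 15, base: 25, large: 50, xl: 80 };

// Pick the largest encoder whose p99 latency fits a per-call budget,
// or null if even the small model exceeds it.
function pickModelForBudget(budgetMs: number): string | null {
  const largestFirst = ['xl', 'large', 'base', 'small'];
  return largestFirst.find((m) => P99_MS[m] <= budgetMs) ?? null;
}
```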

Throughput

  • Single text: ~1000 req/s (base model)
  • Batch (10 texts): ~200 req/s

Pricing

Embeddings are priced per 1M tokens:
| Model | Price per 1M tokens |
|---|---|
| small | $0.05 |
| base | $0.10 |
| large | $0.20 |
| xl | $0.40 |
Example:
  • Text: “Google homepage with search box” (6 tokens)
  • Cost: 6 / 1,000,000 × $0.10 = $0.0000006
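The same arithmetic as a helper (prices from the table above; the function is an illustrative sketch, not an SDK method):

```typescript
// Prices per 1M tokens in USD, from the pricing table above.
const PRICE_PER_1M_TOKENS: Record<string, number> = {
  small: 0.05,
  base: 0.10,
  large: 0.20,
  xl: 0.40,
};

// Cost of embedding `tokens` tokens with a given model size.
function embeddingCostUSD(model: string, tokens: number): number {
  const price = PRICE_PER_1M_TOKENS[model];
  if (price === undefined) throw new Error(`Unknown model size: ${model}`);
  return (tokens / 1_000_000) * price;
}

console.log(embeddingCostUSD('base', 6).toFixed(7)); // "0.0000006"
```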

Error Handling

try {
  const response = await client.embeddings.create({
    model: 'stratus-x1ac-base',
    input: text
  });
} catch (error) {
  if (error.status === 400) {
    console.error('Invalid input:', error.message);
  } else if (error.status === 401) {
    console.error('Invalid API key');
  } else if (error.status === 413) {
    console.error('Input too large (max 2048 tokens per text)');
  }
}

Next Steps