Embeddings
Get semantic embeddings for state descriptions using Stratus encoders.

Endpoint

POST https://api.stratus.run/v1/embeddings

Use Cases

  • State similarity: Compare states for agent memory/history
  • Goal matching: Find similar historical goals
  • Clustering: Group similar states/actions
  • Search: Semantic search over agent experiences

Request Format

interface EmbeddingRequest {
  model: string;              // Required: Stratus encoder model (without LLM suffix)
  input: string | string[];   // Required: Text to embed
  encoding_format?: string;   // Optional: 'float' (default) or 'base64'
}

Example Request

TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.stratus.run/v1',
  apiKey: process.env.STRATUS_API_KEY
});

const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Google search results page showing list of NYC hotels'
});

console.log(response.data[0].embedding);
// [0.023, -0.841, 0.127, ...] (768 dimensions)

Python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.stratus.run/v1",
    api_key=os.environ["STRATUS_API_KEY"]
)

response = client.embeddings.create(
    model="stratus-x1ac-base",
    input="Google search results page showing list of NYC hotels"
)

print(response.data[0].embedding)
# [0.023, -0.841, 0.127, ...] (768 dimensions)

cURL

curl https://api.stratus.run/v1/embeddings \
  -H "Authorization: Bearer stratus_sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stratus-x1ac-base",
    "input": "Google search results page showing list of NYC hotels"
  }'

Response Format

interface EmbeddingResponse {
  object: 'list';
  data: Embedding[];
  model: string;
  usage: Usage;
}

interface Embedding {
  object: 'embedding';
  index: number;
  embedding: number[];  // 768-dim vector (base model)
}

interface Usage {
  prompt_tokens: number;
  total_tokens: number;
}

Example Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.023, -0.841, 0.127, ...]
    }
  ],
  "model": "stratus-x1ac-base",
  "usage": {
    "prompt_tokens": 18,
    "total_tokens": 18
  }
}
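When batching, each item in `data` carries an `index` field tying it back to the corresponding input. A small defensive helper (an illustrative sketch, not part of any SDK) keeps results aligned with the input array even if items arrive out of order:

```typescript
interface EmbeddingItem {
  object: 'embedding';
  index: number;
  embedding: number[];
}

// Return embeddings ordered by their `index` field, so result i
// always corresponds to input text i.
function sortByIndex(data: EmbeddingItem[]): number[][] {
  return [...data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```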

Model Names

For embeddings, use the Stratus model name without the LLM suffix:

| Model | Dimensions | Parameters |
|---|---|---|
| stratus-x1ac-small | 512 | 125M |
| stratus-x1ac-base | 768 | 350M |
| stratus-x1ac-large | 1024 | 300M |
| stratus-x1ac-xl | 1280 | 750M |
| stratus-x1ac-huge | 1536 | 1.5B |

Recommendation: Use stratus-x1ac-small for most applications.
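Because dimensionality varies by model, a common bug is storing vectors from one encoder and querying with another. A small guard (an illustrative sketch using the dimensions from the table above, not an SDK feature) catches this early:

```typescript
// Dimensions per encoder, taken from the model table above.
const MODEL_DIMENSIONS: Record<string, number> = {
  'stratus-x1ac-small': 512,
  'stratus-x1ac-base': 768,
  'stratus-x1ac-large': 1024,
  'stratus-x1ac-xl': 1280,
  'stratus-x1ac-huge': 1536,
};

// Throw if a vector's length doesn't match the model it claims to
// come from -- e.g. after switching models without re-embedding.
function assertDimensions(model: string, embedding: number[]): void {
  const expected = MODEL_DIMENSIONS[model];
  if (expected === undefined) throw new Error(`Unknown model: ${model}`);
  if (embedding.length !== expected) {
    throw new Error(
      `Expected ${expected} dims for ${model}, got ${embedding.length}`
    );
  }
}
```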

Batch Embeddings

Embed multiple texts in one request:
const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: [
    'Google homepage with search box',
    'Google search results for hotels NYC',
    'Hotel booking form with date pickers',
    'Confirmation page showing booking details'
  ]
});

// Access each embedding
response.data.forEach((item, i) => {
  console.log(`Embedding ${i}:`, item.embedding);
});
Limits:
  • Max 100 texts per request
  • Max 2048 tokens per text
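Workloads larger than the 100-text cap need client-side chunking before calling the API. A minimal sketch (the helper name is illustrative):

```typescript
// Split an input list into request-sized chunks, respecting the
// 100-texts-per-request limit noted above.
function chunkTexts(texts: string[], maxPerRequest = 100): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < texts.length; i += maxPerRequest) {
    chunks.push(texts.slice(i, i + maxPerRequest));
  }
  return chunks;
}
```

Each chunk can then be passed as the `input` of a separate `embeddings.create` call, sequentially or with limited concurrency.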

Encoding Formats

The encoding_format parameter controls how embedding vectors are returned in the response.

Float Format (Default)

Returns embeddings as an array of floating-point numbers. This is the default and most common format.
const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Example text',
  encoding_format: 'float' // optional; 'float' is the default
});

console.log(response.data[0].embedding);
// [0.023, -0.841, 0.127, 0.456, -0.234, ...]
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.023, -0.841, 0.127, ...]
    }
  ],
  "model": "stratus-x1ac-base",
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
Best for:
  • Direct computation (cosine similarity, dot product)
  • Mathematical operations
  • Storage in databases with vector types (PostgreSQL pgvector, Pinecone, etc.)
  • Most common use cases

Base64 Format

Returns embeddings as base64-encoded binary data. More efficient for network transfer.
const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Example text',
  encoding_format: 'base64'
});

console.log(response.data[0].embedding);
// "YXNkZmFzZGZhc2Rm..." (base64 string)

// Decode to use
const buffer = Buffer.from(response.data[0].embedding, 'base64');
// Respect byteOffset/length: the Buffer may be a view into a larger ArrayBuffer
const floatArray = new Float32Array(
  buffer.buffer,
  buffer.byteOffset,
  buffer.byteLength / Float32Array.BYTES_PER_ELEMENT
);
console.log(Array.from(floatArray));
// [0.023, -0.841, 0.127, ...]
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": "YXNkZmFzZGZhc2Rm..."
    }
  ],
  "model": "stratus-x1ac-base",
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
Best for:
  • Network efficiency (smaller payload size)
  • Binary storage systems
  • High-throughput batch processing
  • Bandwidth-constrained environments

Decoding Base64 Embeddings

TypeScript/Node.js

function decodeBase64Embedding(base64String: string): number[] {
  const buffer = Buffer.from(base64String, 'base64');
  const floatArray = new Float32Array(
    buffer.buffer,
    buffer.byteOffset,
    buffer.byteLength / Float32Array.BYTES_PER_ELEMENT
  );
  return Array.from(floatArray);
}

const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Example text',
  encoding_format: 'base64'
});

const embedding = decodeBase64Embedding(response.data[0].embedding);
console.log(embedding); // [0.023, -0.841, ...]
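You can sanity-check the decoder without an API call by round-tripping a locally encoded vector. The wire format assumed here (little-endian float32) is an assumption consistent with the Python decoder below; `encodeBase64Embedding` is a local test helper, not an API function:

```typescript
// Local encoder for testing only: packs floats as little-endian float32
// (typed arrays use platform byte order, little-endian on common hardware).
function encodeBase64Embedding(values: number[]): string {
  return Buffer.from(new Float32Array(values).buffer).toString('base64');
}

function decodeBase64Embedding(base64String: string): number[] {
  const buffer = Buffer.from(base64String, 'base64');
  return Array.from(new Float32Array(
    buffer.buffer,
    buffer.byteOffset,
    buffer.byteLength / Float32Array.BYTES_PER_ELEMENT
  ));
}

// Round-trip values that are exactly representable in float32:
const roundTripped = decodeBase64Embedding(encodeBase64Embedding([0.5, -1.25, 2]));
console.log(roundTripped); // [0.5, -1.25, 2]
```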

Python

import base64
import struct

def decode_base64_embedding(base64_string: str) -> list[float]:
    # Decode base64 to bytes
    bytes_data = base64.b64decode(base64_string)
    # Unpack as floats (4 bytes per float, little-endian)
    num_floats = len(bytes_data) // 4
    floats = struct.unpack(f'<{num_floats}f', bytes_data)
    return list(floats)

response = client.embeddings.create(
    model="stratus-x1ac-base",
    input="Example text",
    encoding_format="base64"
)

embedding = decode_base64_embedding(response.data[0].embedding)
print(embedding)  # [0.023, -0.841, ...]

Format Comparison

| Aspect | Float | Base64 |
|---|---|---|
| Payload Size | Larger (~4KB for 768-dim) | Smaller (~3KB for 768-dim) |
| Ready to Use | Yes (direct array) | No (needs decoding) |
| Human Readable | Yes | No |
| Network Efficiency | Lower | Higher (~25% smaller) |
| Computation | Immediate | Decode first |

When to Use Each Format

Use Float When:

  • Building prototypes or demos
  • Need immediate access to values
  • Debugging or inspecting embeddings
  • Working with small batches (<10 embeddings)
  • Simplicity is more important than efficiency

Use Base64 When:

  • Processing large batches (100+ embeddings)
  • Network bandwidth is limited
  • Transferring embeddings between systems
  • Storing embeddings in binary format
  • Optimizing for production throughput

Example: Efficient Batch Processing

// Process 100 embeddings efficiently with base64
const texts = [...]; // 100 texts

const response = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: texts,
  encoding_format: 'base64' // ~25% smaller payload
});

// Decode and process in parallel
const embeddings = await Promise.all(
  response.data.map(async (item) => {
    const decoded = decodeBase64Embedding(item.embedding);
    await storeInDatabase(texts[item.index], decoded);
    return decoded;
  })
);

console.log(`Processed ${embeddings.length} embeddings`);

Use Case Examples

State Similarity

Compare agent states to find similar situations:
// Embed current state
const current = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: 'Amazon product page, Add to Cart button visible'
});

// Embed historical states
const historical = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: [
    'Amazon product page, price $49.99',
    'Google search results',
    'Amazon product page, different product'
  ]
});

// Calculate cosine similarity
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

historical.data.forEach((hist, i) => {
  const sim = cosineSimilarity(
    current.data[0].embedding,
    hist.embedding
  );
  console.log(`Similarity to state ${i}:`, sim);
});
// Output: [0.95, 0.23, 0.87] - State 0 and 2 are similar

Goal Matching

Find past goals similar to current goal:
const currentGoal = 'Find hotels in NYC for December';
const pastGoals = [
  'Book flight to NYC',
  'Find hotels in San Francisco',
  'Reserve hotel room in NYC'
];

const embeddings = await client.embeddings.create({
  model: 'stratus-x1ac-base',
  input: [currentGoal, ...pastGoals]
});

// Compare current goal to past goals
const currentEmb = embeddings.data[0].embedding;
pastGoals.forEach((goal, i) => {
  const pastEmb = embeddings.data[i + 1].embedding;
  const sim = cosineSimilarity(currentEmb, pastEmb);
  console.log(`${goal}: ${sim.toFixed(2)}`);
});
// Output:
// Book flight to NYC: 0.78
// Find hotels in San Francisco: 0.85
// Reserve hotel room in NYC: 0.92 ← Most similar

Semantic Search

Build a semantic search over agent memory:
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(...);
const stratusClient = new OpenAI({
  baseURL: 'https://api.stratus.run/v1',
  apiKey: process.env.STRATUS_API_KEY
});

// Embed and store agent experience
async function storeExperience(state, action, outcome) {
  const embedding = await stratusClient.embeddings.create({
    model: 'stratus-x1ac-base',
    input: state
  });

  await supabase.from('experiences').insert({
    state,
    action,
    outcome,
    embedding: embedding.data[0].embedding
  });
}

// Search for similar experiences
async function findSimilarExperiences(currentState, limit = 5) {
  const embedding = await stratusClient.embeddings.create({
    model: 'stratus-x1ac-base',
    input: currentState
  });

  const { data } = await supabase.rpc('match_experiences', {
    query_embedding: embedding.data[0].embedding,
    match_threshold: 0.7,
    match_count: limit
  });

  return data;
}

// Usage
await storeExperience(
  'Amazon product page, $49.99',
  'click Add to Cart',
  'Added to cart successfully'
);

const similar = await findSimilarExperiences(
  'Amazon product page, $39.99'
);
console.log('Similar past experiences:', similar);

Comparison with Other Embeddings

| Provider | Dimensions | Specialization | Best For |
|---|---|---|---|
| Stratus | 768 | Agent states/actions | State similarity, planning |
| OpenAI text-embedding-3 | 1536 | General text | Semantic search, RAG |
| Cohere embed-v3 | 1024 | Multilingual | International content |
When to use Stratus embeddings:
  • Comparing agent states
  • Finding similar goals
  • Agent memory/experience retrieval
  • State-action pattern matching
When to use general embeddings:
  • Document search
  • FAQ matching
  • Content recommendation

Performance

Latency

| Model | Latency (p50) | Latency (p99) |
|---|---|---|
| small | 5ms | 15ms |
| base | 10ms | 25ms |
| large | 20ms | 50ms |
| xl | 35ms | 80ms |
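If calls sit on an agent's critical path, the p99 column above can drive model choice. A small sketch (latency figures copied from the table; the helper is illustrative, not part of the SDK):

```typescript
// p99 latencies in ms, from the latency table above.
const P99_MS: Record<string, number> = { small: 15, base: 25, large: 50, xl: 80 };

// Pick the largest encoder whose p99 latency fits a per-call budget,
// or null if even the small model exceeds it.
function pickModelForBudget(budgetMs: number): string | null {
  const largestFirst = ['xl', 'large', 'base', 'small'];
  return largestFirst.find((m) => P99_MS[m] <= budgetMs) ?? null;
}
```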

Throughput

  • Single text: ~1000 req/s (base model)
  • Batch (10 texts): ~200 req/s

Pricing

Embeddings are priced per 1M tokens:
| Model | Price per 1M tokens |
|---|---|
| small | $0.05 |
| base | $0.10 |
| large | $0.20 |
| xl | $0.40 |
Example:
  • Text: “Google homepage with search box” (6 tokens)
  • Cost: 6 / 1,000,000 × $0.10 = $0.0000006
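The same arithmetic as a helper (prices from the table above; the function is an illustrative sketch, not an SDK method):

```typescript
// Prices per 1M tokens in USD, from the pricing table above.
const PRICE_PER_1M_TOKENS: Record<string, number> = {
  small: 0.05,
  base: 0.10,
  large: 0.20,
  xl: 0.40,
};

// Cost of embedding `tokens` tokens with a given model size.
function embeddingCostUSD(model: string, tokens: number): number {
  const price = PRICE_PER_1M_TOKENS[model];
  if (price === undefined) throw new Error(`Unknown model size: ${model}`);
  return (tokens / 1_000_000) * price;
}

console.log(embeddingCostUSD('base', 6).toFixed(7)); // "0.0000006"
```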

Error Handling

try {
  const response = await client.embeddings.create({
    model: 'stratus-x1ac-base',
    input: text
  });
} catch (error) {
  if (error.status === 400) {
    console.error('Invalid input:', error.message);
  } else if (error.status === 401) {
    console.error('Invalid API key');
  } else if (error.status === 413) {
    console.error('Input too large (max 2048 tokens per text)');
  }
}

Next Steps