Use the Runcrate Models API as the inference backend for your SaaS product. One API key, one bill, 140+ models — no GPU management, no model hosting, no vendor lock-in.

What you’ll build

A production AI backend that handles:
  • Chat completions for customer-facing AI assistants
  • Structured output for data extraction and classification
  • Image generation for content creation features
  • Per-request billing that maps to your own pricing

Why open-source models for SaaS

| | OpenAI / Anthropic | Runcrate (open-source models) |
|---|---|---|
| Pricing | $3–15 per 1M output tokens | $0.20–2.00 per 1M output tokens |
| Vendor lock-in | Locked to one provider | Switch models freely |
| Data privacy | Data sent to third party | Open-source models, your choice |
| Rate limits | Strict per-org limits | 100 req/min default, higher on request |
| Model choice | 3–5 models | 140+ models across 8 categories |
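
The 100 req/min default in the table suggests throttling on your side before requests reach the API. Below is a minimal sketch of a sliding-window limiter, assuming the limit is counted over a rolling 60 seconds (this page does not say how the window is measured). `SlidingWindowLimiter` is a hypothetical helper, not part of any Runcrate SDK.

```typescript
// Hypothetical client-side throttle; not part of any Runcrate SDK.
// Assumes the limit is counted over a rolling window (an assumption).
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have aged out of the window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}

// 100 requests per 60 seconds, matching the default in the table.
const limiter = new SlidingWindowLimiter(100, 60_000);
```

A route handler could call `limiter.tryAcquire()` before forwarding a request and return HTTP 429 when it yields `false`.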

Next.js API routes (Vercel AI SDK)

Chat endpoint for your product

// app/api/chat/route.ts
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages, userId } = await req.json();

  const result = streamText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    system: `You are the AI assistant for Acme Corp. Help users with their questions 
      about our product. Be concise, helpful, and professional.`,
    messages,
  });

  return result.toDataStreamResponse();
}
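
The route above reads `userId` from the request body but never uses it. One way to put it to work for per-request billing is to record token usage per user after each completion. This is a minimal sketch, assuming you can obtain input and output token counts for each request (how your SDK version exposes them is not shown on this page). `UsageLedger` and its in-memory `Map` are hypothetical stand-ins for your own database.

```typescript
// Hypothetical in-memory ledger; a real product would persist this.
interface UsageRecord {
  requests: number;
  inputTokens: number;
  outputTokens: number;
}

class UsageLedger {
  private byUser = new Map<string, UsageRecord>();

  // Add one request's token counts to the user's running totals.
  record(userId: string, inputTokens: number, outputTokens: number): void {
    const current = this.totals(userId);
    this.byUser.set(userId, {
      requests: current.requests + 1,
      inputTokens: current.inputTokens + inputTokens,
      outputTokens: current.outputTokens + outputTokens,
    });
  }

  // Totals for one user; zeros if the user has made no requests yet.
  totals(userId: string): UsageRecord {
    return this.byUser.get(userId) ?? { requests: 0, inputTokens: 0, outputTokens: 0 };
  }
}
```

Keeping input and output tokens separate leaves room to price them differently in your own plans.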

Structured data extraction endpoint

Turn unstructured user input into structured data your app can store:
// app/api/extract/route.ts
import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const ContactSchema = z.object({
  name: z.string(),
  email: z.string().email().optional(),
  company: z.string().optional(),
  intent: z.enum(['purchase', 'support', 'partnership', 'other']),
  summary: z.string(),
});

export async function POST(req: Request) {
  const { text } = await req.json();

  const { output } = await generateText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    output: Output.object({ schema: ContactSchema }),
    prompt: `Extract contact information and intent from this message:\n\n${text}`,
  });

  return Response.json(output);
}

Image generation endpoint

Let users generate images from your app:
// app/api/generate-image/route.ts
import { runcrate } from '@runcrate/ai';
import { generateImage } from 'ai';

export async function POST(req: Request) {
  const { prompt, style } = await req.json();

  const { image } = await generateImage({
    model: runcrate.imageModel('black-forest-labs/FLUX.1-schnell'),
    prompt: `${prompt}, ${style || 'photorealistic'}`,
    size: '1024x1024',
  });

  return Response.json({ image: image.base64 });
}
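
The route above appends the caller-supplied `style` to the prompt verbatim. If you would rather limit users to a fixed set of looks, one option is to map style keys to prompt suffixes on the server. A minimal sketch follows; the preset names and `buildImagePrompt` are hypothetical, and only the `photorealistic` default comes from the route above.

```typescript
// Hypothetical style presets; only 'photorealistic' appears in the route above.
const STYLE_SUFFIXES = new Map<string, string>([
  ['photorealistic', 'photorealistic'],
  ['illustration', 'flat vector illustration'],
  ['watercolor', 'watercolor painting'],
]);

const DEFAULT_SUFFIX = 'photorealistic';

// Build the final prompt, falling back to the default for unknown styles.
function buildImagePrompt(prompt: string, style?: string): string {
  const suffix =
    (style !== undefined ? STYLE_SUFFIXES.get(style) : undefined) ?? DEFAULT_SUFFIX;
  return `${prompt.trim()}, ${suffix}`;
}
```

Unknown or missing styles fall back to the default instead of being passed through to the model.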

Python backend (FastAPI)

from fastapi import FastAPI
from pydantic import BaseModel
from runcrate import Runcrate

app = FastAPI()
client = Runcrate(api_key="rc_live_...")

class ChatRequest(BaseModel):
    messages: list[dict]
    model: str = "deepseek-ai/DeepSeek-V3"

# Plain def: FastAPI runs it in a threadpool, so the blocking SDK call
# does not stall the event loop.
@app.post("/api/chat")
def chat(req: ChatRequest):
    response = client.models.chat_completion(
        model=req.model,
        messages=req.messages,
        max_tokens=1024,
    )
    return {"content": response.choices[0].message.content}

class ImageRequest(BaseModel):
    prompt: str
    aspect_ratio: str = "1:1"

# Plain def for the same reason: the SDK call is not awaited.
@app.post("/api/generate-image")
def generate_image(req: ImageRequest):
    image = client.models.generate_image(
        model="black-forest-labs/FLUX.1-schnell",
        prompt=req.prompt,
        aspect_ratio=req.aspect_ratio,
    )
    return {"url": image.data[0].url}

Content moderation middleware

Add a moderation layer before displaying AI-generated content:
import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const ModerationResult = z.object({
  safe: z.boolean(),
  categories: z.array(z.enum(['spam', 'harassment', 'nsfw', 'violence', 'pii', 'none'])),
  action: z.enum(['allow', 'flag', 'block']),
});

export async function moderateContent(content: string) {
  const { output } = await generateText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    output: Output.object({ schema: ModerationResult }),
    prompt: `Evaluate this user-generated content for safety. Content: "${content}"`,
  });
  return output;
}
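
`moderateContent` returns a verdict but leaves enforcement to the caller. Here is a minimal sketch of one way to act on it before rendering, assuming `block` hides the content, `flag` shows it while queuing it for review, and `allow` passes it through. This policy, the placeholder text, and the `applyModeration` helper are hypothetical.

```typescript
// Mirrors the ModerationResult schema above as a plain type.
type ModerationAction = 'allow' | 'flag' | 'block';

interface ModerationVerdict {
  safe: boolean;
  categories: string[];
  action: ModerationAction;
}

interface DisplayDecision {
  text: string;         // what to show to the end user
  needsReview: boolean; // whether to queue the content for a human
}

// Hypothetical policy: block hides, flag shows but queues for review.
function applyModeration(content: string, verdict: ModerationVerdict): DisplayDecision {
  switch (verdict.action) {
    case 'block':
      return { text: '[content removed]', needsReview: true };
    case 'flag':
      return { text: content, needsReview: true };
    case 'allow':
      return { text: content, needsReview: false };
  }
}
```

Typical use would be `applyModeration(text, await moderateContent(text))` just before the content is rendered.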

Cost estimation

At DeepSeek-V3 rates, a typical SaaS workload costs approximately:
| Use case | Tokens/request | Cost/1K requests |
|---|---|---|
| Short chat responses (200 tokens out) | ~300 total | ~$0.06 |
| Data extraction (100 tokens out) | ~200 total | ~$0.04 |
| Long-form content (1000 tokens out) | ~1200 total | ~$0.24 |
| Image generation | 1 image | ~$0.03/image |
A SaaS serving 100K chat requests/month costs roughly $6/month in inference, not $600.
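
The text rows of the table are consistent with a flat blended rate of about $0.20 per 1M tokens: 300 tokens times 1,000 requests is 300K tokens, or about $0.06. That rate is inferred from the table rather than quoted from a price list, so treat it as an assumption. A small sketch of the arithmetic, with `estimateCostUsd` as a hypothetical helper:

```typescript
// Blended rate inferred from the table above (an assumption, not a quoted price).
const USD_PER_MILLION_TOKENS = 0.2;

// Estimated inference cost in USD for a given number of text requests.
function estimateCostUsd(requests: number, tokensPerRequest: number): number {
  const totalTokens = requests * tokensPerRequest;
  return (totalTokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}

console.log(estimateCostUsd(1_000, 300).toFixed(2));   // 0.06 (short chat, per 1K requests)
console.log(estimateCostUsd(100_000, 300).toFixed(2)); // 6.00 (100K chat requests per month)
```

Image generation is priced per image in the table, so it is left out of this token-based estimate.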