Use the Runcrate Models API as the inference backend for your SaaS product. One API key, one bill, 140+ models — no GPU management, no model hosting, no vendor lock-in.
What you’ll build
A production AI backend that handles:
- Chat completions for customer-facing AI assistants
- Structured output for data extraction and classification
- Image generation for content creation features
- Per-request billing that maps to your own pricing
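The last bullet, per-request billing, can be sketched as a small metering helper. This is a minimal illustration, not part of the Runcrate API: the `Usage` and `Rate` shapes, the function names, and the markup are all hypothetical, and the rates you pass in should come from the pricing for the model you actually use.

```typescript
// Hypothetical per-request metering. Names and shapes are illustrative only.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Rate {
  inputPerMillion: number; // USD per 1M input tokens (assumed value)
  outputPerMillion: number; // USD per 1M output tokens (assumed value)
}

// Returns the upstream cost in micro-dollars (1e-6 USD) so per-request
// amounts can be summed as integers.
// tokens * (USD per 1M tokens) is already the cost expressed in micro-USD.
function costMicroUsd(usage: Usage, rate: Rate): number {
  return Math.round(
    usage.inputTokens * rate.inputPerMillion +
      usage.outputTokens * rate.outputPerMillion,
  );
}

// Applies your own markup to get what you bill the customer for the request.
function customerChargeMicroUsd(usage: Usage, rate: Rate, markup: number): number {
  return Math.round(costMicroUsd(usage, rate) * markup);
}
```

Storing one such record per request alongside a user ID is one way to reconcile your invoice against what you charge each customer.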
Why open-source models for SaaS
| | OpenAI / Anthropic | Runcrate (open-source models) |
|---|---|---|
| Pricing | $3–15 per 1M output tokens | $0.20–2.00 per 1M output tokens |
| Vendor lock-in | Locked to one provider | Switch models freely |
| Data privacy | Data sent to third party | Open-source models, your choice |
| Rate limits | Strict per-org limits | 100 req/min default, higher on request |
| Model choice | 3–5 models | 140+ models across 8 categories |
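If you want to stay under the default rate limit from the table before requests ever leave your server, a client-side sliding-window limiter is one option. The sketch below is an assumption about how you might do that in a single process; the class name is hypothetical, and it says nothing about how Runcrate enforces limits on its side.

```typescript
// Sliding-window limiter: allows at most `limit` calls in any `windowMs` span.
// Single-process sketch; a multi-instance deployment would need shared state.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the call and returns true if it fits in the window, else false.
  // `now` is injectable so the behavior can be tested without real delays.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop calls that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

// 100 requests per 60 seconds, matching the default limit in the table.
const limiter = new SlidingWindowLimiter(100, 60000);
```

A route handler could call `limiter.tryAcquire()` and return HTTP 429 to its own caller when the result is `false`.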
Next.js API routes (Vercel AI SDK)
Chat endpoint for your product
// app/api/chat/route.ts
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages, userId } = await req.json();

  const result = streamText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    system: `You are the AI assistant for Acme Corp. Help users with their questions
about our product. Be concise, helpful, and professional.`,
    messages,
  });

  return result.toDataStreamResponse();
}
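The handler above forwards `messages` from the request body unchanged. For a customer-facing endpoint you may want to filter that input first, so a client cannot inject its own `system` message or send an unbounded history. A minimal sketch, assuming messages have the plain `{ role, content }` shape; `sanitizeMessages` and the cap of 20 are hypothetical, not part of any SDK:

```typescript
type Role = 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Keeps only well-formed user/assistant messages and the most recent
// `maxMessages` of them. Anything else (including client-supplied
// "system" messages) is dropped.
function sanitizeMessages(input: unknown, maxMessages: number = 20): ChatMessage[] {
  if (!Array.isArray(input)) return [];
  const clean: ChatMessage[] = [];
  for (const item of input) {
    if (typeof item !== 'object' || item === null) continue;
    const { role, content } = item as { role?: unknown; content?: unknown };
    if ((role === 'user' || role === 'assistant') && typeof content === 'string') {
      clean.push({ role, content });
    }
  }
  return clean.slice(-maxMessages);
}
```

In the route, this would sit between `req.json()` and the `streamText` call, for example `messages: sanitizeMessages(body.messages)`.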
Turn unstructured user input into structured data your app can store:
// app/api/extract/route.ts
import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const ContactSchema = z.object({
  name: z.string(),
  email: z.string().email().optional(),
  company: z.string().optional(),
  intent: z.enum(['purchase', 'support', 'partnership', 'other']),
  summary: z.string(),
});

export async function POST(req: Request) {
  const { text } = await req.json();

  const { output } = await generateText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    output: Output.object({ schema: ContactSchema }),
    prompt: `Extract contact information and intent from this message:\n\n${text}`,
  });

  return Response.json(output);
}
Image generation endpoint
Let users generate images from your app:
// app/api/generate-image/route.ts
import { runcrate } from '@runcrate/ai';
import { generateImage } from 'ai';

export async function POST(req: Request) {
  const { prompt, style } = await req.json();

  const { image } = await generateImage({
    model: runcrate.imageModel('black-forest-labs/FLUX.1-schnell'),
    prompt: `${prompt}, ${style || 'photorealistic'}`,
    size: '1024x1024',
  });

  return Response.json({ image: image.base64 });
}
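The route above concatenates `prompt` and `style` straight from the request body. One way to tighten that is to restrict `style` to a known list and bound the prompt length before building the final string. The sketch below is illustrative only: the style list, the 500-character cap, and `buildImagePrompt` are assumptions, not values from Runcrate or the model.

```typescript
// Hypothetical allowlist; replace with the styles your product offers.
const ALLOWED_STYLES = ['photorealistic', 'illustration', 'watercolor'] as const;
type Style = (typeof ALLOWED_STYLES)[number];

function isAllowedStyle(value: string | undefined): value is Style {
  return value !== undefined && (ALLOWED_STYLES as readonly string[]).includes(value);
}

// Trims and caps the user prompt, then appends an allowlisted style,
// falling back to 'photorealistic' as the original route does.
function buildImagePrompt(prompt: string, style?: string, maxLength: number = 500): string {
  const trimmed = prompt.trim().slice(0, maxLength);
  if (trimmed.length === 0) {
    throw new Error('prompt must not be empty');
  }
  const safeStyle: Style = isAllowedStyle(style) ? style : 'photorealistic';
  return `${trimmed}, ${safeStyle}`;
}
```

The result can be passed as the `prompt` field of the `generateImage` call in place of the inline template string.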
Python backend (FastAPI)
from fastapi import FastAPI
from pydantic import BaseModel
from runcrate import Runcrate

app = FastAPI()
client = Runcrate(api_key="rc_live_...")


class ChatRequest(BaseModel):
    messages: list[dict]
    model: str = "deepseek-ai/DeepSeek-V3"


@app.post("/api/chat")
async def chat(req: ChatRequest):
    response = client.models.chat_completion(
        model=req.model,
        messages=req.messages,
        max_tokens=1024,
    )
    return {"content": response.choices[0].message.content}


class ImageRequest(BaseModel):
    prompt: str
    aspect_ratio: str = "1:1"


@app.post("/api/generate-image")
async def generate_image(req: ImageRequest):
    image = client.models.generate_image(
        model="black-forest-labs/FLUX.1-schnell",
        prompt=req.prompt,
        aspect_ratio=req.aspect_ratio,
    )
    return {"url": image.data[0].url}
Content moderation middleware
Add a moderation layer before displaying AI-generated content:
import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const ModerationResult = z.object({
  safe: z.boolean(),
  categories: z.array(z.enum(['spam', 'harassment', 'nsfw', 'violence', 'pii', 'none'])),
  action: z.enum(['allow', 'flag', 'block']),
});

export async function moderateContent(content: string) {
  const { output } = await generateText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    output: Output.object({ schema: ModerationResult }),
    prompt: `Evaluate this user-generated content for safety. Content: "${content}"`,
  });

  return output;
}
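Because the three fields in the moderation result come from a model, they can disagree with each other (for example `safe: true` together with a non-`none` category). A small resolver that always takes the stricter reading is one way to turn the result into a single decision. This is a sketch under that assumption; `resolveAction` and its policy are hypothetical and should be adapted to your own moderation rules.

```typescript
type Action = 'allow' | 'flag' | 'block';

interface Moderation {
  safe: boolean;
  categories: string[];
  action: Action;
}

// Collapses a possibly inconsistent moderation result into one action,
// preferring the stricter outcome whenever the fields disagree.
function resolveAction(result: Moderation | null | undefined): Action {
  // No usable result: send to review rather than silently allowing.
  if (!result) return 'flag';
  if (result.action === 'block') return 'block';

  const flaggedCategories = result.categories.filter((c) => c !== 'none');
  if (!result.safe || flaggedCategories.length > 0 || result.action === 'flag') {
    return 'flag';
  }
  return 'allow';
}
```

Calling `resolveAction(await moderateContent(text))` then gives the display layer a single value to branch on.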
Cost estimation
At DeepSeek-V3 rates, a typical SaaS workload:
| Use case | Tokens/request | Cost/1K requests |
|---|---|---|
| Short chat responses (200 tokens out) | ~300 total | ~$0.06 |
| Data extraction (100 tokens out) | ~200 total | ~$0.04 |
| Long-form content (1000 tokens out) | ~1200 total | ~$0.24 |
| Image generation | 1 image | ~$0.03/image |
A SaaS serving 100K chat requests/month costs roughly **$6/month** in inference, not $600.
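The arithmetic behind the table can be reproduced with a short helper. The blended rate of about $0.20 per 1M tokens is inferred from the table rows (~300 tokens per request at ~$0.06 per 1K requests); treat it as an assumption and substitute the current rate for your model. `monthlyCostUsd` is a hypothetical name.

```typescript
// Blended rate implied by the cost table; an assumption, not a quoted price.
const USD_PER_MILLION_TOKENS = 0.2;

// Estimates monthly inference spend in USD, rounded to the nearest cent.
function monthlyCostUsd(
  requestsPerMonth: number,
  tokensPerRequest: number,
  usdPerMillionTokens: number = USD_PER_MILLION_TOKENS,
): number {
  const totalTokens = requestsPerMonth * tokensPerRequest;
  const cost = (totalTokens / 1e6) * usdPerMillionTokens;
  return Math.round(cost * 100) / 100;
}

// 100K short chat requests at ~300 tokens each.
const chatEstimate = monthlyCostUsd(100000, 300); // 6
```

Scaling the other rows works the same way: 1K long-form requests at ~1200 tokens each comes to about $0.24, matching the table.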