Access Google’s Gemini models through the Runcrate API. Same models, OpenAI-compatible format, no Google Cloud project required.
Available Gemini models
| Model | Context | Strengths |
|---|---|---|
| Gemini 2.5 Pro | 1M tokens | Strongest reasoning, long-context analysis |
| Gemini 2.5 Flash | 1M tokens | Fast inference, cost-effective |
Basic usage
from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "Explain how self-attention works in transformers. Include the math."},
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
Long-context analysis (1M tokens)
Gemini’s 1M token context handles entire codebases or books in a single request:
from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

# Read the whole codebase dump; the context manager closes the file promptly
with open("full-codebase.txt", encoding="utf-8") as f:
    codebase = f.read()

response = client.models.chat_completion(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "system", "content": "You are a senior engineer performing a code review."},
        {"role": "user", "content": f"Review this codebase for security and performance issues:\n\n{codebase}"},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
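Before sending a very large file, it can help to check roughly whether it fits in the 1M-token window. The sketch below is only a heuristic: the ratio of about 4 characters per token is an assumed rule of thumb, not Gemini's actual tokenizer, and the names `estimate_tokens` and `fits_context` are hypothetical helpers, not part of the Runcrate SDK. Reserving room for the output budget is a conservative assumption as well.

```python
# Rough pre-flight size check for a long-context request.
# ASSUMPTION: ~4 characters per token is a coarse rule of thumb for English
# text and code; the real tokenizer will produce different counts.
CHARS_PER_TOKEN = 4
CONTEXT_LIMIT = 1_000_000  # 1M-token context listed for both Gemini models


def estimate_tokens(text: str) -> int:
    """Return a coarse token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_context(text: str, reserved_for_output: int = 4096) -> bool:
    """Check whether the prompt plus a reserved output budget fits the limit."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_LIMIT


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n" * 1000
    print(estimate_tokens(sample), fits_context(sample))
```

If the check fails, one option is to split the input by directory or chapter and review each part in its own request.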
Vision — image analysis
import base64

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

# Read the image and base64-encode it so it can be sent as an inline data URL
with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.models.chat_completion(
    model="google/gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this architecture diagram. List all services and connections."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
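The example above hardcodes `image/png` in the data URL. For other image types, a small helper can derive the MIME type from the file extension. This is a minimal sketch using only the Python standard library; `image_to_data_url` and `image_part` are hypothetical helper names, and which image formats the models accept is not stated here.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    # Guess the MIME type from the file extension (e.g. .png -> image/png)
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(path: str) -> dict:
    """Build an image_url content part shaped like the one in the vision example."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The returned dictionary can be placed in the `content` list next to the text part, in place of the hand-built `image_url` entry.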
Runcrate vs. direct Google API
| | Direct Google API | Runcrate |
|---|---|---|
| Auth | Google Cloud project + service account | Single API key |
| Format | Google-specific SDK | OpenAI-compatible |
| Other models | Gemini only | 140+ models, same key |
Pro vs. Flash
| Scenario | Model | Why |
|---|---|---|
| Complex reasoning | Gemini 2.5 Pro | Stronger reasoning |
| Bulk processing | Gemini 2.5 Flash | Faster, cheaper |
| Real-time chat | Gemini 2.5 Flash | Lower latency |
| Vision / image analysis | Either | Both support multimodal |
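The table above can be encoded as a small lookup so that application code chooses a model string in one place. This is an illustrative sketch: the scenario labels and the `pick_model` helper are hypothetical, and defaulting to Flash for anything unlisted (including vision, where either model works) is an assumption made here on cost grounds.

```python
# Model strings as used in the examples on this page
PRO = "google/gemini-2.5-pro"
FLASH = "google/gemini-2.5-flash"

# Hypothetical scenario labels mirroring the rows of the Pro vs. Flash table
MODEL_BY_SCENARIO = {
    "complex_reasoning": PRO,
    "bulk_processing": FLASH,
    "real_time_chat": FLASH,
}


def pick_model(scenario: str, default: str = FLASH) -> str:
    """Return the model string for a scenario, falling back to the default."""
    return MODEL_BY_SCENARIO.get(scenario, default)


if __name__ == "__main__":
    print(pick_model("complex_reasoning"))
    print(pick_model("vision"))
```

The result can be passed as the `model` argument in any of the requests shown earlier.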
Tips
- The 1M-token context window is large enough to take an entire repository or a book-length text in a single request.
- Gemini 2.5 Flash is the cost-effective choice for high-volume tasks.
- The API format is the same across models: to switch from DeepSeek or Llama to Gemini, change only the model string.
Next steps