Use Gemini API via Runcrate

Access Google’s Gemini models through the Runcrate API. Same models, OpenAI-compatible format, no Google Cloud project required.

Available Gemini models

Model	Context	Strengths
Gemini 2.5 Pro	1M tokens	Strongest reasoning, long-context analysis
Gemini 2.5 Flash	1M tokens	Fast inference, cost-effective

Basic usage

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "Explain how self-attention works in transformers. Include the math."},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)

Long-context analysis (1M tokens)

Gemini’s 1M-token context window can hold an entire codebase or a book in a single request:

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

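# Read the whole codebase dump; the context manager closes the file promptly.
with open("full-codebase.txt", encoding="utf-8") as f:
    codebase = f.read()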

response = client.models.chat_completion(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "system", "content": "You are a senior engineer performing a code review."},
        {"role": "user", "content": f"Review this codebase for security and performance issues:\n\n{codebase}"},
    ],
    max_tokens=4096,
)

print(response.choices[0].message.content)
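Before sending a very large file, it can help to check roughly whether it fits in the context window. The sketch below is a heuristic, not a real tokenizer. The 4-characters-per-token ratio is an assumption that varies with language and content, and counting the reply against the same budget is a deliberately conservative choice. The helper names are hypothetical and not part of the Runcrate SDK.

```python
# Rough pre-flight check: will this prompt plausibly fit in the context window?
CONTEXT_LIMIT_TOKENS = 1_000_000  # context size listed for Gemini 2.5 Pro and Flash
CHARS_PER_TOKEN = 4               # assumption: crude average for English text and code


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate from the character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, max_output_tokens: int = 4096) -> bool:
    """Check that the estimated prompt plus reserved reply stays under the limit."""
    return estimate_tokens(text) + max_output_tokens <= CONTEXT_LIMIT_TOKENS
```

If `fits_in_context(codebase)` returns `False`, splitting the input into several requests is likely safer than relying on the estimate.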

Vision — image analysis

from runcrate import Runcrate
import base64

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.models.chat_completion(
    model="google/gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this architecture diagram. List all services and connections."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
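The example above hard-codes `image/png` in the data URL. For other image files, a small helper can derive the MIME type from the file extension. This is a minimal sketch using only the standard library. `image_data_url` is a hypothetical name, and which image formats the endpoint accepts is not stated here, so treat non-PNG inputs as an assumption to verify.

```python
import base64
import mimetypes


def image_data_url(path: str, data: bytes) -> str:
    """Build a base64 data URL, guessing the MIME type from the file extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image type for {path!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value of the `image_url` content part shown in the request above.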

Runcrate vs. direct Google API

	Direct Google API	Runcrate
Auth	Google Cloud project + service account	Single API key
Format	Google-specific SDK	OpenAI-compatible
Other models	Gemini only	140+ models, same key

Pro vs. Flash

Scenario	Model	Why
Complex reasoning	Gemini 2.5 Pro	Stronger reasoning
Bulk processing	Gemini 2.5 Flash	Faster, cheaper
Real-time chat	Gemini 2.5 Flash	Lower latency
Vision / image analysis	Either	Both support multimodal
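The table above can be encoded as a small lookup so that calling code chooses a model by workload. This is only an illustrative sketch. The scenario keys and `pick_model` are hypothetical names, and choosing Flash for vision is an arbitrary default, since either model supports it.

```python
# Scenario keys are made up for this example; model IDs come from the examples above.
MODEL_FOR_SCENARIO = {
    "complex_reasoning": "google/gemini-2.5-pro",
    "bulk_processing": "google/gemini-2.5-flash",
    "real_time_chat": "google/gemini-2.5-flash",
    "vision": "google/gemini-2.5-flash",  # either model works; Flash picked for cost
}


def pick_model(scenario: str, default: str = "google/gemini-2.5-flash") -> str:
    """Return the model ID for a scenario, falling back to the cheaper default."""
    return MODEL_FOR_SCENARIO.get(scenario, default)
```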

Tips

The 1M-token context window is large enough to take an entire repository or a book-length text in one request.
Gemini 2.5 Flash is the cost-effective choice for high-volume tasks.
The API format is the same across models: to switch from DeepSeek or Llama, change only the model string.
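To illustrate the last tip, the sketch below builds the keyword arguments for two requests and confirms that only the `model` value differs. `chat_request` is a hypothetical helper, not part of the Runcrate SDK. IDs for non-Gemini models are not shown here and would come from the model catalog.

```python
def chat_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Build keyword arguments in the shape used by the examples on this page."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


pro = chat_request("google/gemini-2.5-pro", "Summarize this changelog.")
flash = chat_request("google/gemini-2.5-flash", "Summarize this changelog.")

# Collect the keys whose values differ between the two requests.
differing = sorted(key for key in pro if pro[key] != flash[key])
print(differing)
```

Either dictionary could then be unpacked into the call used earlier, for example `client.models.chat_completion(**flash)`.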

Next steps

Chat completions reference
AI Summarization — Gemini Flash for long-document summarization
Model catalog
