# Vertex AI Gemini Live - Realtime API

Use Vertex AI's Gemini Live API (`BidiGenerateContent`) through LiteLLM's unified `/realtime` endpoint, which speaks the OpenAI Realtime protocol.
| Feature | Supported |
|---|---|
| Proxy (`/realtime`) | ✅ |
| Voice in / Voice out | ✅ |
| Text in / Text out | ✅ |
| Server VAD | ✅ |
| Output transcription | ✅ |
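Server VAD is configured through the standard Realtime `session.update` event. A minimal sketch of building that event — the shape follows the OpenAI Realtime protocol, the helper name and threshold value are illustrative, not part of LiteLLM:

```python
import json


def build_session_update(threshold: float = 0.5) -> str:
    """Build a session.update event enabling server-side voice activity detection.

    Event shape follows the OpenAI Realtime protocol; values are illustrative.
    """
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "turn_detection": {
                "type": "server_vad",    # server decides when the user stopped talking
                "threshold": threshold,  # speech-detection sensitivity
            },
        },
    }
    return json.dumps(event)
```

Send the resulting string over the websocket (`await ws.send(build_session_update())`) right after `session.created` arrives.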
## Setup

### 1. Auth

LiteLLM uses your Google Cloud credentials (an OAuth2 Bearer token), not an API key:

```bash
gcloud auth application-default login
```

Or set a service-account key file:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
```
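If you use the service-account route, it can save a debugging cycle to verify the key file is readable service-account JSON before starting the proxy. A minimal sketch — the helper is ours, not part of LiteLLM:

```python
import json
import os


def check_sa_key(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> bool:
    """Return True if env_var points at a parseable service-account key file."""
    path = os.environ.get(env_var)
    if not path or not os.path.isfile(path):
        return False
    try:
        with open(path) as f:
            key = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    # Service-account key files carry these fields; ADC user credentials do not.
    return key.get("type") == "service_account" and "client_email" in key
```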
### 2. Proxy config

```yaml
model_list:
  - model_name: vertex-gemini-live
    litellm_params:
      model: vertex_ai/gemini-2.0-flash-live-001
      vertex_project: your-gcp-project-id
      vertex_location: us-east4  # or any supported region, or "global"

general_settings:
  master_key: sk-your-key
```
### 3. Start the proxy

```bash
litellm --config config.yaml --port 4000
```
## Usage

### Python (websockets)

```python
import asyncio
import json

import websockets

PROXY_URL = "ws://localhost:4000/realtime?model=vertex-gemini-live"
API_KEY = "sk-your-key"


async def main():
    async with websockets.connect(
        PROXY_URL,
        additional_headers={"api-key": API_KEY},
    ) as ws:
        # Wait for session.created
        event = json.loads(await ws.recv())
        print(f"session.created: {event['session']['id']}")

        # Send a text message
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Say hello in one sentence."}],
            },
        }))
        # Text input does not trigger a reply by itself — ask for one
        await ws.send(json.dumps({"type": "response.create"}))

        # Collect the streamed response
        async for raw in ws:
            ev = json.loads(raw)
            t = ev.get("type", "")
            if t == "response.text.delta":
                print(ev.get("delta", ""), end="", flush=True)
            elif t == "response.done":
                print("\n[done]")
                break


asyncio.run(main())
```
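For voice out, audio arrives as a stream of base64-encoded PCM16 chunks in `response.audio.delta` events (Gemini Live emits 24 kHz mono output). A sketch of collecting those chunks into a playable WAV file, assuming that event shape — the helper and sample rate are our assumptions, verify against your model's output format:

```python
import base64
import wave


def write_wav(audio_deltas, path, sample_rate=24000):
    """Decode base64 PCM16 'delta' payloads and write them as a mono WAV file.

    audio_deltas: list of base64 strings, e.g. collected from
    ev["delta"] whenever ev["type"] == "response.audio.delta".
    Returns the number of raw PCM bytes written.
    """
    pcm = b"".join(base64.b64decode(d) for d in audio_deltas)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)             # mono
        wf.setsampwidth(2)             # 16-bit little-endian samples
        wf.setframerate(sample_rate)   # assumed 24 kHz for Gemini Live output
        wf.writeframes(pcm)
    return len(pcm)
```

In the event loop above, append `ev["delta"]` to a list on each `response.audio.delta`, then call `write_wav` when `response.done` arrives.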