Gemini OpenAI format (Chat)
curl --request POST \
  --url https://api.gravitex.ai/v1/chat/completions
For Google Gemini native protocol, see Gemini Native. For general multi-model Chat Completions, see OpenAI Chat Completions.
Endpoint: POST https://api.gravitex.ai/v1/chat/completions

1. Model categories

| Category | Example models | Routing | Notes |
| --- | --- | --- | --- |
| Chat / multimodal | gemini-3.5-flash, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview | :generateContent, :streamGenerateContent | Streaming follows the client stream flag |

2. Endpoint and authentication

POST https://api.gravitex.ai/v1/chat/completions
Authorization: Bearer sk-<your-token>
Content-Type: application/json

3. OpenAI field → Gemini mapping

| OpenAI field | Gemini field | Description |
| --- | --- | --- |
| model | models/<model> in the URL path | Model name passed through to Gemini |
| messages | contents[] + systemInstruction | system/developer roles → systemInstruction; assistant → model; tool/function → functionResponse |
| stream | URL :streamGenerateContent vs :generateContent | |
| temperature | generationConfig.temperature | |
| top_p | generationConfig.topP | |
| max_tokens / max_completion_tokens | generationConfig.maxOutputTokens | |
| seed | generationConfig.seed | |
| stop | generationConfig.stopSequences | Max 5; extra entries are truncated |
| response_format.type = "json_schema"/"json_object" | generationConfig.responseMimeType = "application/json" + responseSchema | Fields Gemini does not recognize, such as additionalProperties, are stripped automatically |
| function entries in tools | tools[].functionDeclarations | See §4 |
| Three special names in tools (googleSearch / codeExecution / urlContext) | tools[].googleSearch / tools[].codeExecution / tools[].urlContext | See §4 |
| tool_choice | toolConfig.functionCallingConfig | "auto" → AUTO, "none" → NONE, "required" → ANY; object form {type:"function",function:{name:"X"}} → ANY + allowedFunctionNames=["X"] |
| frequency_penalty | generationConfig.frequencyPenalty | |
| presence_penalty | generationConfig.presencePenalty | |
| top_k | generationConfig.topK | |
| n | generationConfig.candidateCount | Takes effect when n > 1; controls the number of candidate responses |
| logprobs | generationConfig.responseLogprobs | Whether to return logprobs |
| top_logprobs | generationConfig.logprobs | Number of top logprobs |
| modalities | generationConfig.responseModalities | JSON array (e.g. ["text","audio"]) |
| audio | generationConfig.speechConfig | TTS voice configuration, passed through directly to Gemini speechConfig |
messages.content supports multimodal arrays (OpenAI v2):
  • type:"text" → parts[].text
  • type:"image_url" / type:"input_audio" / type:"file" → downloaded/decoded into parts[].inlineData, subject to a MIME allowlist:
    • Images: image/png, image/jpeg, image/jpg, image/webp, image/heic, image/heif
    • Audio: audio/mpeg, audio/mp3, audio/wav
    • Video: video/mp4, video/mov, video/mpeg, video/mpg, video/avi, video/wmv, video/mpegps, video/flv
    • Documents: application/pdf, text/plain
  • Markdown images of the form ![alt](data:image/...;base64,...) embedded in a content string become inlineData parts (same as image_url).
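
The scalar rows of the mapping table can be read as a small translation step. The following is a minimal sketch of that reading, not the platform's implementation: the function names are hypothetical, the `mode` key inside the function-calling config is assumed from Gemini's native format, and the precedence when both max_tokens and max_completion_tokens are present is an assumption.

```python
from typing import Any

# OpenAI field -> generationConfig key, copied from the mapping table.
_GENERATION_CONFIG_KEYS = {
    "temperature": "temperature",
    "top_p": "topP",
    "max_tokens": "maxOutputTokens",
    # Assumption: if both token limits are sent, the later entry wins here.
    "max_completion_tokens": "maxOutputTokens",
    "seed": "seed",
    "frequency_penalty": "frequencyPenalty",
    "presence_penalty": "presencePenalty",
    "top_k": "topK",
}


def to_generation_config(request: dict[str, Any]) -> dict[str, Any]:
    """Build a Gemini-style generationConfig from OpenAI-style fields."""
    config: dict[str, Any] = {}
    for openai_key, gemini_key in _GENERATION_CONFIG_KEYS.items():
        if request.get(openai_key) is not None:
            config[gemini_key] = request[openai_key]
    stop = request.get("stop")
    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else list(stop)
        config["stopSequences"] = sequences[:5]  # max 5; extras are truncated
    return config


def to_function_calling_config(tool_choice: Any) -> dict[str, Any]:
    """Map OpenAI tool_choice to a functionCallingConfig-shaped dict."""
    modes = {"auto": "AUTO", "none": "NONE", "required": "ANY"}
    if isinstance(tool_choice, str):
        return {"mode": modes[tool_choice]}
    # Object form: {"type": "function", "function": {"name": "X"}}
    name = tool_choice["function"]["name"]
    return {"mode": "ANY", "allowedFunctionNames": [name]}
```

For instance, a request with six stop strings would yield a stopSequences list holding only the first five.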

4. Tools passthrough

"tools": [
  { "type": "function", "function": { "name": "googleSearch" } },     // Enable Google Search
  { "type": "function", "function": { "name": "codeExecution" } },    // Enable code execution
  { "type": "function", "function": { "name": "urlContext" } },       // Enable URL context
  { "type": "function", "function": {                                  // Standard function calling
      "name": "get_weather",
      "description": "Get weather",
      "parameters": { "type": "object", "properties": { "city": { "type": "string" } }, "required": ["city"] }
    }
  }
]
Three special names map to native Gemini tools; other function entries use functionDeclarations.
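
As an illustration of the rule above, here is a hedged sketch of how an OpenAI-style tools array could be split. The helper name `split_tools` is hypothetical, and two details are assumptions rather than documented behavior: native tools are emitted as empty objects, and all ordinary functions are grouped into a single functionDeclarations entry.

```python
from typing import Any

# The three special names that map to native Gemini tools.
NATIVE_TOOL_NAMES = {"googleSearch", "codeExecution", "urlContext"}


def split_tools(openai_tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert an OpenAI-style tools array into a Gemini-style tools array."""
    gemini_tools: list[dict[str, Any]] = []
    declarations: list[dict[str, Any]] = []
    for tool in openai_tools:
        function = tool.get("function", {})
        name = function.get("name")
        if name in NATIVE_TOOL_NAMES:
            # Assumption: a native tool is enabled with an empty config object.
            gemini_tools.append({name: {}})
        else:
            declarations.append(function)
    if declarations:
        # Assumption: ordinary functions share one functionDeclarations entry.
        gemini_tools.append({"functionDeclarations": declarations})
    return gemini_tools
```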

5. extra_body — passthrough Gemini native parameters

All fields under the extra_body.google.* namespace are passed to the Gemini native API.
{
  "model": "gemini-3.5-flash",
  "messages": [{ "role": "user", "content": "Draw a shiba inu" }],
  "extra_body": {
    "google": {
      "generationConfig": { /* ... */ },
      "safetySettings":   [ /* ... */ ],
      "tools":            [ /* ... */ ],
      "systemInstruction": { /* ... */ },
      "thinking_config":  { /* ... */ }
    }
  }
}

5.1 Two passthrough paths

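| Path | Field | Behavior |
| --- | --- | --- |
| ① snake_case allowlist (legacy) | extra_body.google.thinking_config | Parsed explicitly and written to the corresponding field. Only snake_case keys are accepted; a value whose type does not match is silently skipped (no error) and falls through to ②. |
| ② Full passthrough (no schema validation) | Any field under extra_body.google.* other than ① | The whole extra_body.google subtree (minus thinking_config) is deep-merged into the final request JSON sent to Gemini. Field names use Gemini's native camelCase (generationConfig, safetySettings, tools, systemInstruction, toolConfig, cachedContent, responseModalities, responseSchema, responseJsonSchema, etc.). |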

5.2 thinking_config (snake_case allowlist)

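| Field | Type | Gemini field | Description |
| --- | --- | --- | --- |
| thinking_budget | int | thinkingBudget | Thinking token budget. A value > 0 sets include_thoughts to true; 0 or a negative value disables thinking |
| include_thoughts | bool | includeThoughts | Return the thinking trace in reasoning_content |
| thinking_level | string | thinkingLevel | Thinking level (e.g. "HIGH") |

Whenever extra_body.google is passed, the system's automatic thinking adaptation is turned off and all thinking behavior is controlled by the caller.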

5.3 Deep-merge rules

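  • Treat extra_body.google (minus the one snake_case key above) as the patch.
  • Treat the Gemini request already built from the OpenAI fields as the base.
  • Deep merge:
    • Same key, both values are maps → recursive merge;
    • Any other type (scalar, array, null) → the patch value overwrites the base value;
    • Keys only in the base are kept.
  • The merged body is sent upstream, which is equivalent to a native Gemini call.
Implication: you can use extra_body.google.generationConfig.maxOutputTokens to override the value set through the OpenAI field max_tokens, use extra_body.google.safetySettings to completely replace the platform's default safety settings, and use newly added Gemini fields (including ones released in the future) directly without any code change.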
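
The merge rules above can be sketched in a few lines. This is an illustrative reading of the listed rules, not the platform's code; `deep_merge` and `apply_google_patch` are hypothetical names.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: base with patch merged in, following the rules above."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Same key, both maps: merge recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, arrays, and null overwrite whatever the base had.
            merged[key] = copy.deepcopy(value)
    return merged


def apply_google_patch(base: dict[str, Any], google: dict[str, Any]) -> dict[str, Any]:
    """Drop the snake_case thinking_config key, then deep-merge the rest."""
    patch = {k: v for k, v in google.items() if k != "thinking_config"}
    return deep_merge(base, patch)
```

Under this reading, a patch carrying generationConfig.maxOutputTokens replaces only that one key and leaves sibling keys such as temperature intact, while a patch carrying safetySettings replaces the whole array.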

5.4 Passthrough example

{
  "model": "gemini-3.5-flash",
  "messages": [{ "role": "user", "content": "Write an article about AI" }],
  "extra_body": {
    "google": {
      "generationConfig": {
        "temperature": 1,
        "topP": 0.95,
        "maxOutputTokens": 32768
      },
      "safetySettings": [
        { "category": "HARM_CATEGORY_HATE_SPEECH",       "threshold": "OFF" },
        { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF" },
        { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF" },
        { "category": "HARM_CATEGORY_HARASSMENT",        "threshold": "OFF" }
      ]
    }
  }
}

6. Response format

6.1 Non-streaming chat.completion

{
  "id": "sXIFar39H4K0694P2MWmWQ",
  "object": "chat.completion",
  "created": 1747299537,
  "model": "gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello, how can I help?",
        "reasoning_content": "User is greeting..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { /* see §7 */ }
}
  • id: the upstream responseId (matches the log request_id); falls back to chatcmpl-* if missing.
  • reasoning_content: thinking text (only when include_thoughts is true).
  • executable_code / code_execution_result: embedded as markdown code blocks in the text.
  • Non-image media (audio, etc.) is embedded as markdown [media](data:...).
  • finish_reason mapping: STOP → stop, MAX_TOKENS → length, safety/recitation/… → content_filter, functionCall → tool_calls.
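
A small sketch of the finish_reason mapping in the last bullet, for readers who want to mirror it in client-side tests. The function name is hypothetical, and treating every reason other than STOP and MAX_TOKENS as content_filter is an assumption standing in for the elided "…" in the list.

```python
def map_finish_reason(gemini_reason: str, has_function_call: bool = False) -> str:
    """Translate a Gemini finish reason into an OpenAI-style finish_reason."""
    if has_function_call:
        # A functionCall in the response is reported as tool_calls.
        return "tool_calls"
    if gemini_reason == "STOP":
        return "stop"
    if gemini_reason == "MAX_TOKENS":
        return "length"
    # Assumption: safety, recitation, and any remaining reasons map here.
    return "content_filter"
```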

6.2 Streaming chat.completion.chunk

  • In streaming, delta.content is a string (not an array).
  • Images embedded in delta.content as ![image](data:...) markdown.
  • id is stable across streaming chunks.
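
Because delta.content is always a string, a client can rebuild the full answer by concatenation. The sketch below assumes the stream uses OpenAI-style server-sent events (lines prefixed with `data:` and a final `data: [DONE]` marker), which this page does not state explicitly; `collect_stream` is a hypothetical helper, and it also gathers delta.reasoning_content when present.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict[str, str]:
    """Accumulate streamed chunks into one id, content, and reasoning string."""
    content: list[str] = []
    reasoning: list[str] = []
    response_id = ""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        response_id = chunk.get("id", response_id)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            content.append(delta.get("content") or "")
            reasoning.append(delta.get("reasoning_content") or "")
    return {
        "id": response_id,
        "content": "".join(content),
        "reasoning_content": "".join(reasoning),
    }
```

Images arrive inside the accumulated content string as markdown, so no separate array handling is needed in this path.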

7. Usage

Full set of response.usage fields:
{
  "prompt_tokens": 1127,
  "completion_tokens": 2050,     // includes reasoning tokens
  "total_tokens": 2273,

  "prompt_tokens_details": {
    "cached_tokens": 0,
    "text_tokens":   7,
    "audio_tokens":  0,
    "image_tokens":  1120
  },

  "completion_tokens_details": {
    "text_tokens":      26,
    "audio_tokens":     0,
    "image_tokens":     1120,
    "reasoning_tokens": 904      // thinking tokens, shown separately
  }
}

7.1 How thinking tokens are counted

  • reasoning_tokens is shown separately for visibility.
  • completion_tokens includes reasoning_tokens (OpenAI semantics; billing uses completion_tokens).
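
To make the counting rule concrete, the following sketch separates reasoning tokens from the rest of the output using only fields shown in the usage object. `summarize_usage` is a hypothetical helper, not part of the API.

```python
from typing import Any


def summarize_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Split completion_tokens into reasoning and non-reasoning parts."""
    details = usage.get("completion_tokens_details", {})
    reasoning = details.get("reasoning_tokens", 0)
    completion = usage["completion_tokens"]
    return {
        "billed_output_tokens": completion,  # billing uses completion_tokens
        "reasoning_tokens": reasoning,
        "non_reasoning_tokens": completion - reasoning,
    }
```

With the sample usage object in §7 (completion_tokens 2050, reasoning_tokens 904), the non-reasoning part is 1146, which equals the 26 text tokens plus the 1120 image tokens listed in completion_tokens_details.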

7.2 Output token breakdown

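The system automatically buckets token types based on the output:

| Output | Bucket |
| --- | --- |
| Image output | image_tokens |
| Text only | text_tokens |

Bucketing is based on the output, not the model name: even if the model name contains "image", a text-only answer is still billed as text.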

7.3 Modality case handling

Accepts image / IMAGE case variants.

8. Logging and reconciliation

Upstream token usage is logged per request:
  • responseId matches response.id and the log request_id.

9. Examples

9.1 Text chat with thinking

curl -X POST https://api.gravitex.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "Prove Fermat's Last Theorem"}],
    "extra_body": {
      "google": {
        "thinking_config": { "thinking_level": "HIGH", "include_thoughts": true }
      }
    }
  }'

9.2 Multimodal input (text + image URL)

curl -X POST https://api.gravitex.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [
      { "role": "user", "content": [
          { "type": "text", "text": "What is in this image?" },
          { "type": "image_url", "image_url": { "url": "https://example.com/cat.jpg" } }
        ]
      }
    ]
  }'

9.3 Google Search + URL context

curl -X POST https://api.gravitex.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "Big tech news today?"}],
    "tools": [
      { "type": "function", "function": { "name": "googleSearch" } },
      { "type": "function", "function": { "name": "urlContext" } }
    ]
  }'

9.4 Streaming chat

curl -N -X POST https://api.gravitex.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "Write a five-character quatrain"}],
    "stream": true,
    "stream_options": { "include_usage": true }
  }'

10. OpenAI-only parameters (ignored for Gemini)

The following standard OpenAI parameters have no corresponding field in the Gemini API. They are silently dropped when passed and do not cause an error:

| OpenAI field | Description |
| --- | --- |
| logit_bias | Gemini has no logit bias |
| prediction | Predicted output (no Gemini equivalent) |
| user | OpenAI user identifier (no Gemini equivalent) |
| parallel_tool_calls | No parallel tool call control |
| verbosity | GPT-5 specific |
| service_tier | OpenAI service tier |
| safety_identifier | OpenAI safety identifier |
| store | OpenAI store flag |
| prompt_cache_key / prompt_cache_retention | OpenAI cache control |
| web_search_options | Use googleSearch / urlContext in tools instead |
| functions / function_call | Legacy functions API; use tools + tool_choice instead |

11. FAQ — Thinking / Reasoning

Q1: Can I control Gemini thinking length in OpenAI format?

Yes. Three approaches:

Option 1: reasoning_effort (standard OpenAI field, simplest)

Pass the OpenAI reasoning_effort field; it is mapped to the Gemini thinking config:
{
  "model": "gemini-3.5-flash",
  "messages": [{"role": "user", "content": "Prove Fermat's Last Theorem"}],
  "reasoning_effort": "high"
}
Mapping (automatic):

| reasoning_effort | Gemini 3 series |
| --- | --- |
| "low" | thinkingLevel = "LOW" |
| "medium" | thinkingLevel = "MEDIUM" |
| "high" | thinkingLevel = "HIGH" |

Note: reasoning_effort only applies to models with the -thinking suffix (e.g. gemini-3.5-flash-thinking). Plain names like gemini-3.5-flash do not enable auto thinking.
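
The mapping and the suffix condition can be combined into one lookup. This is a minimal sketch with a hypothetical function name; it treats only model names ending exactly in -thinking as eligible, because the page does not say whether other suffix variants also qualify.

```python
from typing import Optional

# reasoning_effort -> thinkingLevel, as listed for the Gemini 3 series.
EFFORT_TO_THINKING_LEVEL = {"low": "LOW", "medium": "MEDIUM", "high": "HIGH"}


def thinking_level_for(model: str, reasoning_effort: Optional[str]) -> Optional[str]:
    """Return the thinkingLevel implied by reasoning_effort, or None if ignored."""
    if not model.endswith("-thinking"):
        # Plain model names do not enable automatic thinking.
        return None
    if reasoning_effort is None:
        return None
    # Values outside the documented mapping yield None in this sketch.
    return EFFORT_TO_THINKING_LEVEL.get(reasoning_effort)
```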

Option 2: model name suffix

| Suffix | Behavior |
| --- | --- |
| -thinking | Enables thinking + includeThoughts: true; budget derived as a percentage of max_tokens |
| -thinking-<number> | Thinking on; budget = number (clamped) |
| -nothinking | Disables thinking (thinkingBudget = 0); Gemini 2.5 only |

Example: gemini-3.5-flash-thinking-16384 → thinking on, budget 16384.
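
For clients that build model names programmatically, the suffix rules can be sketched as a parser. `parse_thinking_suffix` is a hypothetical helper. It does not apply the clamp or the max_tokens percentage, since this page gives neither the bounds nor the percentage, and it leaves an undocumented -nothinking-<number> form untouched.

```python
import re
from typing import Any

_SUFFIX = re.compile(
    r"^(?P<base>.+?)-(?P<kind>nothinking|thinking)(?:-(?P<budget>\d+))?$"
)


def parse_thinking_suffix(model: str) -> tuple[str, dict[str, Any]]:
    """Split a model name into (base model, thinking settings implied by suffix)."""
    match = _SUFFIX.match(model)
    if match is None:
        return model, {}
    base, kind, budget = match.group("base", "kind", "budget")
    if kind == "nothinking":
        if budget is not None:
            return model, {}  # -nothinking-<number> is not a documented form
        return base, {"thinkingBudget": 0}
    if budget is not None:
        # Documented as clamped; the bounds are not given, so no clamp here.
        return base, {"thinkingBudget": int(budget)}
    # Plain -thinking: the budget comes from max_tokens, not computed here.
    return base, {"includeThoughts": True}
```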

Option 3: extra_body.google.thinking_config (full control)

{
  "model": "gemini-3.5-flash",
  "messages": [{"role": "user", "content": "..."}],
  "extra_body": {
    "google": {
      "thinking_config": {
        "thinking_level": "HIGH",
        "include_thoughts": true
      }
    }
  }
}
The Gemini 3 series uses thinking_level instead of thinking_budget:
{
  "extra_body": {
    "google": {
      "thinking_config": {
        "thinking_level": "HIGH",
        "include_thoughts": true
      }
    }
  }
}
Priority: extra_body.google.thinking_config > reasoning_effort > suffix. Once extra_body.google is passed, the system's automatic thinking adaptation is turned off and all thinking behavior is fully controlled by the caller.

Q2: How do I read thinking output?

With include_thoughts: true set, the thinking trace is placed in the reasoning_content field of the response:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final answer...",
      "reasoning_content": "Thinking trace..."
    }
  }]
}
In streaming, thinking arrives via delta.reasoning_content.

Q3: Gemini 2.5 vs 3 thinking differences

| Feature | Gemini 2.5 | Gemini 3 |
| --- | --- | --- |
| Control | thinkingBudget (integer token budget) | thinkingLevel (enum MINIMAL/LOW/MEDIUM/HIGH) |
| Can disable thinking | Yes (thinkingBudget = 0) | Cannot disable (platform limit) |
| reasoning_effort: "none" | Supported (disable thinking) | Not supported |

12. Known limitations

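| Limitation | Description |
| --- | --- |
| Streaming images as markdown | Streaming embeds ![image](data:...); non-streaming may return a multimodal array |
| Non-image media as markdown text | Audio and other non-image media are embedded as markdown |
| extra_body.google.* is fully passed through with no field validation | Field mistakes (e.g. typos, wrong value types) are sent upstream as-is and Gemini returns the error; the caller is responsible for field correctness |
| Default safety from platform | Override via extra_body.google.safetySettings |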

13. Troubleshooting

| Issue | What to check |
| --- | --- |
| Usage looks wrong | Check logs vs upstream usage |
| response.id mismatch with logs | Should not happen; contact support |
| Why images are markdown, not an array | Streaming uses markdown; non-streaming may use a multimodal array |
| extra_body.google.xxx not applied | 1) extra_body must be a JSON object, not a string; 2) xxx must be under the google namespace; 3) thinking_config must use snake_case, while other fields use Gemini's native camelCase |
| Override default safety | Set extra_body.google.safetySettings |
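
The three checks for extra_body.google.xxx can be run client-side before sending a request. The sketch below is a hypothetical pre-flight helper, not something the platform provides; the allowed thinking_config keys are taken from §5.2, and the messages it returns are illustrative.

```python
from typing import Any

# snake_case keys accepted under thinking_config (see §5.2).
THINKING_CONFIG_KEYS = {"thinking_budget", "include_thoughts", "thinking_level"}


def check_extra_body(request: dict[str, Any]) -> list[str]:
    """Return a list of likely mistakes in request['extra_body'], if any."""
    extra_body = request.get("extra_body")
    if extra_body is None:
        return []
    if not isinstance(extra_body, dict):
        return ["extra_body must be a JSON object, not a string"]
    google = extra_body.get("google")
    if not isinstance(google, dict):
        return ["Gemini fields must sit under extra_body.google"]
    problems: list[str] = []
    thinking = google.get("thinking_config")
    if isinstance(thinking, dict):
        for key in sorted(set(thinking) - THINKING_CONFIG_KEYS):
            problems.append(f"thinking_config.{key} is not a snake_case allowlist key")
    return problems
```

Other keys under extra_body.google are not checked here because they are passed through without validation, so only Gemini itself can reject them.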