Native OpenAI Format (ChatCompletions)
curl --request POST \
--url https://api.gravitex.ai/v1/chat/completions \
--header 'Authorization: <authorization>' \
--header 'Content-Type: application/json' \
--data '
{
"model": "<string>",
"messages": [
{}
],
"temperature": 1,
"stream": true,
"max_tokens": 123,
"top_p": 1
}
'
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "glm-5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Artificial intelligence is a branch of computer science that aims to create intelligent machines..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 100,
"total_tokens": 125
}
}
POST /v1/chat/completions
Introduction
A universal text chat API that generates conversational responses from OpenAI-compatible large language models. Through one unified interface, you can call multiple mainstream models, including OpenAI, Claude, DeepSeek, Grok, and Tongyi Qianwen.
Authentication
Authorization: Bearer Token, e.g.
Bearer sk-xxxxxxxxxx
Request Parameters
model: Model identifier. Supported models include:
- OpenAI series: gpt-5.5, gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano, gpt-4o, etc.
- Claude series: claude-opus-4-6, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, etc.
- DeepSeek series: deepseek-v3-1-250821, deepseek-v3, deepseek-r1, etc.
- Grok series: grok-4, grok-4-fast-reasoning, grok-3, etc.
- Gemini series: gemini-3-pro-preview, gemini-3-flash-preview, nano-banana-pro, and the -thinking / -nothinking / -thinking-<budget> / -thinking-low / -thinking-high variants
- Chinese domestic models: glm-5, glm-4.7, doubao-seed-1-8-251228 (Doubao Seed series), qwen3-coder-plus, kimi-k2.5, etc.
messages: Conversation message list. Each element contains role (user/system/assistant) and content.
temperature: Randomness control, 0-2. Higher values produce more random responses.
stream: Whether to enable streaming output. When enabled, the response is returned as SSE chunks.
max_tokens: Maximum number of tokens to generate; controls response length.
top_p: Nucleus sampling parameter, 0-1; controls generation diversity.
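The documented ranges (temperature 0-2, top_p 0-1) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_chat_payload is hypothetical, and the rule that max_tokens must be positive is an assumption.

```python
def build_chat_payload(model, messages, temperature=None, top_p=None,
                       max_tokens=None, stream=False):
    """Assemble a request body, rejecting values outside the documented ranges."""
    if not messages:
        raise ValueError("messages must contain at least one element")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    # Assumption: a non-positive max_tokens is never meaningful.
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    payload = {"model": model, "messages": messages, "stream": stream}
    # Optional fields are included only when the caller sets them.
    optional = (("temperature", temperature), ("top_p", top_p), ("max_tokens", max_tokens))
    for key, value in optional:
        if value is not None:
            payload[key] = value
    return payload


payload = build_chat_payload(
    "glm-5",
    [{"role": "user", "content": "Briefly introduce artificial intelligence"}],
    temperature=0.7,
)
```

Failing early on the client gives a clearer error than a rejected request, but the server remains the authority on which values a given model accepts.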
Basic Examples
- Non-Streaming Request
- Streaming Request (SSE)
- Python Example
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "glm-5",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Briefly introduce artificial intelligence"}
],
"temperature": 0.7
}'
curl -N -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "doubao-seed-1-8-251228",
"stream": true,
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Briefly introduce artificial intelligence"}
]
}'
from openai import OpenAI
client = OpenAI(
api_key="sk-xxxxxxxxxx",
base_url="https://api.gravitex.ai/v1"
)
# Non-streaming
completion = client.chat.completions.create(
model="glm-5",
messages=[
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Briefly introduce artificial intelligence"}
],
temperature=0.7
)
print(completion.choices[0].message.content)
# Streaming
stream = client.chat.completions.create(
model="doubao-seed-1-8-251228",
messages=[
{"role": "user", "content": "Briefly introduce artificial intelligence"}
],
stream=True
)
for chunk in stream:
    # Some chunks (for example a final usage chunk) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "glm-5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Artificial intelligence is a branch of computer science that aims to create intelligent machines..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 100,
"total_tokens": 125
}
}
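When calling the endpoint without an SDK, the fields of interest can be read straight from the JSON body shown above. This is a small sketch; summarize_completion is a hypothetical helper name, and the sample string is a shortened copy of the example response.

```python
import json


def summarize_completion(raw):
    """Return the assistant text, finish reason, and token total from a response body."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


sample = (
    '{"id": "chatcmpl-xxx", "object": "chat.completion", "model": "glm-5", '
    '"choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi"}, '
    '"finish_reason": "stop"}], '
    '"usage": {"prompt_tokens": 25, "completion_tokens": 100, "total_tokens": 125}}'
)
summary = summarize_completion(sample)
```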
Advanced Features
Tool Calling (Functions / Tools)
Supports the OpenAI-compatible tool calling format, applicable to GPT, Claude, DeepSeek, Grok, Tongyi Qianwen, and other models.
- Phase 1: Model Returns Tool Call
- Phase 2: Return Tool Execution Result
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "glm-5",
"messages": [
{"role": "user", "content": "What'\''s the weather in Shanghai?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather information by city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"}
},
"required": ["city"]
}
}
}
],
"tool_choice": "auto"
}'
After the model returns tool_calls, you need to execute the tool and pass the result back to the model:
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "glm-5",
"messages": [
{"role": "user", "content": "What'\''s the weather in Shanghai?"},
{
"role": "assistant",
"tool_calls": [
{
"id": "call_xxx",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Shanghai\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_xxx",
"content": "{\"temp\":\"22°C\",\"condition\":\"Cloudy\",\"aqi\":53}"
}
]
}'
- tool_call_id must match the ID returned in Phase 1
- If tool execution fails, return readable error information to avoid blocking subsequent completions
- Phase 2 also supports streaming output
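The step between the two phases can be sketched in Python as follows. This is an illustration under assumptions rather than a reference implementation: get_weather is a stand-in local tool, and TOOLS and run_tool_calls are hypothetical names. The function takes the assistant message from Phase 1 and produces the role "tool" messages for Phase 2, returning a readable error instead of raising when a tool fails.

```python
import json


def get_weather(city):
    # Hypothetical local tool; a real one would query a weather service.
    return {"city": city, "temp": "22°C", "condition": "Cloudy"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each tool call and return the role="tool" messages for Phase 2."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"])
            output = TOOLS[name](**args)
        except Exception as exc:
            # Readable error text keeps the next completion from being blocked.
            output = {"error": f"{name} failed: {exc}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the ID from Phase 1
            "content": json.dumps(output, ensure_ascii=False),
        })
    return results


phase1_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_xxx",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\":\"Shanghai\"}"},
    }],
}
tool_messages = run_tool_calls(phase1_message)
```

The Phase 2 request is then the original messages, followed by phase1_message, followed by tool_messages.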
Structured Output (JSON Schema)
Supports controlling the output format through the response_format parameter, applicable to GPT, Claude, Grok, and other models.
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "doubao-seed-1-8-251228",
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "Answer",
"schema": {
"type": "object",
"properties": {
"summary": {"type": "string"}
},
"required": ["summary"]
}
}
},
"messages": [
{"role": "user", "content": "Return a JSON containing a summary field"}
]
}'
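Even with a schema in the request, it is prudent to validate what comes back. The sketch below checks the assistant's content against the Answer schema used in the request above; parse_answer is a hypothetical helper, and the checks are written by hand so that no schema library is assumed.

```python
import json


def parse_answer(content):
    """Parse the model's JSON text and check it against the Answer schema."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if "summary" not in data:
        raise ValueError("missing required field: summary")
    if not isinstance(data["summary"], str):
        raise ValueError("summary must be a string")
    return data


answer = parse_answer('{"summary": "AI studies how to build intelligent machines."}')
```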
For strict structured output, it is recommended to lower the temperature value (e.g., 0.1-0.3) and set an appropriate max_tokens to improve consistency.
Thinking Capability
Some models support a thinking capability (Thinking/Reasoning), which can display the reasoning process when generating responses. Different models implement this differently:
- DeepSeek
- Tongyi Qianwen
- Gemini
DeepSeek models enable the thinking capability through the thinking field:
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "deepseek-v3-1-250821",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Give a medium-difficulty geometry problem and solve it step by step"}
],
"thinking": {"type": "enabled"}
}'
- thinking.type defaults to "disabled"; set it explicitly to "enabled" to turn thinking on
- The output form of the thinking capability may vary by model version
- It is recommended to use it with stream: true for a better interactive experience
Tongyi Qianwen supports deep thinking, which requires streaming output:
curl -N -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "qwen3-omni-flash",
"stream": true,
"enable_thinking": true,
"parameters": {
"incremental_output": true
},
"messages": [
{"role": "system", "content": "You are an excellent mathematician"},
{"role": "user", "content": "What is the formula for Tower of Hanoi"}
]
}'
Inline reasoning process into content:
If the client does not display reasoning_content, you can use gravitex_thinking_to_content: true to inline the reasoning into content:
curl -N -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "qwen3-omni-flash",
"stream": true,
"enable_thinking": true,
"gravitex_thinking_to_content": true,
"parameters": {
"incremental_output": true
},
"messages": [
{"role": "user", "content": "What is the formula for Tower of Hanoi"}
]
}'
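A client that does want to display the reasoning separately can split the stream itself. The sketch below accumulates both parts from raw SSE lines. It assumes, without confirmation from this page, that reasoning text arrives in delta.reasoning_content and answer text in delta.content; collect_stream is a hypothetical name.

```python
import json


def collect_stream(lines):
    """Accumulate reasoning and answer text from SSE "data:" lines."""
    reasoning, answer = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Assumption: reasoning text arrives in delta.reasoning_content.
            reasoning.append(delta.get("reasoning_content") or "")
            answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"Count the moves. "}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"2^n - 1"}}]}',
    "data: [DONE]",
]
thoughts, reply = collect_stream(sample_lines)
```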
Tongyi Qianwen's deep thinking functionality must be used with stream: true. If enable_thinking: true is set but stream is false, the system automatically disables deep thinking to avoid upstream errors.
For Gemini, refer to the Gemini thinking mode guide. The main ways to enable thinking are listed below, followed by a thinking budget example and a 3 Pro Preview thinking + search example:
- Model suffix: -thinking (auto budget); -thinking-<number> for a precise budget (e.g., gemini-2.5-flash-thinking-8192); -nothinking to disable; gemini-3-pro-preview-thinking-low/high to specify the level directly
- extra_body config: extra_body.google.thinking_config.thinking_budget + include_thoughts; special values: -1 auto-enable, 0 disable, >0 specific budget; requires stream: true
- reasoning_effort: usable with -thinking when max_tokens is not set (low/medium/high correspond to 20%/50%/80% of the budget)
- Gemini 3 Pro Preview: uses thinking_level (LOW/HIGH, default HIGH); can be combined with search
- Enable search: the recommended way is the OpenAI-compatible tool "tools":[{"type":"function","function":{"name":"googleSearch"}}]; alternatively, pass through extra_body.google.tools:[{"googleSearch":{}}]
- Notes: the thinking adapter must be enabled server-side; the thinking budget counts toward output tokens; use stream: true to view reasoning_content
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "gemini-3-flash-preview",
"messages": [
{"role":"user","content":"Give a medium-difficulty geometry problem and analyze it step by step."}
],
"extra_body": {
"google": {
"thinking_config": { "thinking_budget": 6000, "include_thoughts": true }
}
},
"stream": true
}'
Example (3 Pro Preview thinking + search):
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "gemini-3-pro-preview",
"messages": [
{"role":"user","content":"Google search the weather in Guangzhou today"}
],
"generationConfig": {
"thinkingConfig": { "thinkingLevel": "LOW" }
},
"tools": [
{ "type": "function", "function": { "name": "googleSearch" } }
],
"stream": true
}'
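The model suffix convention from the Gemini list in this section lends itself to a small helper. This is a sketch only: gemini_thinking_model and its mode names ("auto", "off", "budget", "low", "high") are hypothetical, and the page only documents the -thinking-low/high form for gemini-3-pro-preview.

```python
def gemini_thinking_model(base, mode="auto", budget=None):
    """Return a model name carrying the thinking suffix for the chosen mode."""
    if mode == "off":
        return f"{base}-nothinking"
    if mode == "auto":
        return f"{base}-thinking"
    if mode == "budget":
        if budget is None or budget <= 0:
            raise ValueError("budget must be a positive token count")
        return f"{base}-thinking-{budget}"
    if mode in ("low", "high"):
        # Documented on this page only for gemini-3-pro-preview.
        return f"{base}-thinking-{mode}"
    raise ValueError(f"unknown mode: {mode!r}")


model_name = gemini_thinking_model("gemini-2.5-flash", mode="budget", budget=8192)
```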
Tongyi Qianwen Extended Features
Tongyi Qianwen models support extended features such as search and speech recognition. All extended parameters must be placed in the parameters object.
- Search Feature
- Speech Recognition
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "qwen3-omni-flash",
"messages": [
{"role": "user", "content": "Please first search for recent common misconceptions about Fermat'\''s Last Theorem, then answer"}
],
"stream": true,
"enable_thinking": true,
"parameters": {
"enable_search": true,
"search_options": {
"region": "CN",
"recency_days": 30
},
"incremental_output": true
}
}'
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "qwen3-omni-flash",
"messages": [
{"role": "user", "content": "Hello"}
],
"parameters": {
"asr_options": {
"language": "zh"
}
}
}'
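Because callers used to the OpenAI format tend to put temperature or top_p at the top level, a small normalizing step can help when targeting Tongyi Qianwen. The sketch below moves the extended keys named in this section under parameters; nest_qwen_parameters and EXTENDED_KEYS are hypothetical names, and the key list covers only the parameters this page mentions.

```python
# Extended parameters named in this section; the real list may be longer.
EXTENDED_KEYS = ("enable_search", "search_options", "asr_options", "temperature", "top_p")


def nest_qwen_parameters(payload):
    """Return a copy of payload with extended keys moved under "parameters"."""
    result = {k: v for k, v in payload.items() if k not in EXTENDED_KEYS}
    params = dict(result.get("parameters", {}))
    for key in EXTENDED_KEYS:
        if key in payload:
            params[key] = payload[key]
    if params:
        result["parameters"] = params
    return result


request_body = nest_qwen_parameters({
    "model": "qwen3-omni-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.3,
    "enable_search": True,
    "parameters": {"incremental_output": True},
})
```

The input dictionary is left unmodified, so the same payload can still be sent unchanged to models that expect top-level parameters.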
All extended parameters for Tongyi Qianwen (such as enable_search, search_options, asr_options, temperature, top_p, etc.) must be placed in the parameters object, not at the top level of the request body.
Web Search Features
Some models support real-time web search, which gives access to the latest information and includes citation sources in responses.
- Claude Web Search
- Grok Live Search
Claude models do not support enabling web search through the web_search_options parameter, so it can only be implemented through tool calls, and it may be unstable due to network and prompt reasons. For details, see Tool Calling (Functions / Tools) above. Two examples follow, a basic one and one with location information; both show the tool call flow.
Basic Example (showing tool call flow):
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "glm-5",
"messages": [
{"role": "user", "content": "What are the latest news about artificial intelligence?"},
{
"role": "assistant",
"content": "I'\''ll help you search for the latest news about artificial intelligence.",
"tool_calls": [
{
"id": "toolu_xxx",
"type": "function",
"function": {
"name": "WebSearch",
"arguments": "{\"query\": \"artificial intelligence latest news 2025\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "toolu_xxx",
"name": "WebSearch",
"content": "Web search results for query: \"artificial intelligence latest news 2025\"..."
}
],
"web_search_options": {
"search_context_size": "medium"
}
}'
Example with Location Information (showing tool call flow):
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "glm-5",
"messages": [
{"role": "user", "content": "What'\''s the weather in Shanghai today?"},
{
"role": "assistant",
"content": "I'\''ll help you search for today'\''s weather in Shanghai.",
"tool_calls": [
{
"id": "toolu_xxx",
"type": "function",
"function": {
"name": "WebSearch",
"arguments": "{\"query\": \"Shanghai today weather\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "toolu_xxx",
"name": "WebSearch",
"content": "Web search results for query: \"Shanghai today weather\"..."
}
],
"web_search_options": {
"search_context_size": "medium",
"user_location": {
"approximate": {
"timezone": "Asia/Shanghai",
"country": "CN",
"region": "Shanghai",
"city": "Shanghai"
}
}
}
}'
- Search functionality will increase response time and token consumption (including search result content)
- Search results will automatically include citation sources in the response
- Supported models include Claude Sonnet 4, Claude 3 Opus, etc.
- In multi-turn conversations, tool calls and results will be visible in message history, and the model can continue the conversation based on previous search results
Stability Notice:
- Web search functionality depends on upstream proxy services and external search services, and may have the following instabilities:
- Network fluctuations: Network connection issues may cause search requests to timeout or fail
- Service limitations: Search services may have rate limits, timeout limits, or temporary unavailability
- Search result quality: Some queries may not find relevant information, or search results may be of poor quality
- Model judgment: The model will automatically determine whether a search is needed based on the question, and in some cases may not trigger a search
- This is an inherent characteristic of web search functionality. It is recommended to:
- Implement retry mechanisms in critical scenarios
- Handle search failures with graceful degradation (e.g., using the model's knowledge base to answer)
- Avoid relying entirely on web search in scenarios with extremely high real-time requirements
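The retry and graceful degradation recommendations above can be combined in one wrapper. This is a minimal sketch under stated assumptions: call_with_retry is a hypothetical helper, search_call and fallback_call are caller-supplied functions (for example, a request with search enabled and the same request without it), and the attempt count and delays are arbitrary defaults.

```python
import time


def call_with_retry(search_call, fallback_call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try search_call up to `attempts` times, then degrade to fallback_call."""
    for attempt in range(attempts):
        try:
            return search_call()
        except Exception:
            # A real client would catch only timeout and connection errors here.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff between tries
    # Graceful degradation: answer without web search.
    return fallback_call()
```

Passing sleep as a parameter keeps the wrapper easy to exercise in tests without real waiting.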
Grok models support real-time search through the search_parameters parameter. A basic example, a force search example, and a Python example follow the parameter list.
Search parameter configuration
- mode (optional): Search mode, one of "off" (disable search), "auto" (the model decides whether a search is needed; recommended), or "on" (force search usage)
- return_citations (optional): Whether to return citation links in the response; defaults to true
Basic Example:
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "grok-4",
"messages": [
{"role": "user", "content": "What are the latest developments in AI in 2026?"}
],
"search_parameters": {
"mode": "auto"
}
}'
Force Search Example:
curl -X POST "https://api.gravitex.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "grok-4",
"messages": [
{"role": "user", "content": "What are the latest tech news?"}
],
"search_parameters": {
"mode": "on",
"return_citations": true
}
}'
Python Example:
from openai import OpenAI
client = OpenAI(
api_key="sk-xxxxxxxxxx",
base_url="https://api.gravitex.ai/v1"
)
completion = client.chat.completions.create(
model="grok-4",
messages=[
{"role": "user", "content": "What are the latest developments in AI in 2026?"}
],
extra_body={
"search_parameters": {
"mode": "auto"
}
}
)
print(completion.choices[0].message.content)
- It is recommended to use "auto" mode to let the model automatically determine if search is needed
- Search functionality will increase response time but provides access to the latest real-time information
- Supported models include grok-4, the grok-3 series, etc.
- Search results will include citation sources in the response
GPT File Input (Responses API)
GPT-5 and other models support file input, which must be called through the /v1/responses endpoint, not /v1/chat/completions.
- Upload via File URL
- Upload via Base64 Encoding
You can upload PDF files by linking external URLs:
from openai import OpenAI
client = OpenAI(
api_key="sk-xxxxxxxxxx",
base_url="https://api.gravitex.ai/v1",
# The SDK appends /responses itself, so the query parameter moves to default_query
default_query={"api-version": "2025-03-01-preview"}
)
response = client.responses.create(
model="gpt-5.2",
input=[
{
"role": "user",
"content": [
{
"type": "input_text",
"text": "Analyze this letter and summarize its key points"
},
{
"type": "input_file",
"file_url": "https://www.example.com/document.pdf"
}
]
}
]
)
print(response.output_text)
Send as Base64-encoded input:
import base64
from openai import OpenAI
client = OpenAI(
api_key="sk-xxxxxxxxxx",
base_url="https://api.gravitex.ai/v1"
)
with open("document.pdf", "rb") as f:
    data = f.read()
base64_string = base64.b64encode(data).decode("utf-8")
response = client.responses.create(
model="gpt-5.2",
input=[
{
"role": "user",
"content": [
{
"type": "input_file",
"filename": "document.pdf",
"file_data": f"data:application/pdf;base64,{base64_string}"
},
{
"type": "input_text",
"text": "What is the main content of this document?"
}
]
}
]
)
print(response.output_text)
- File size limit: a single file must not exceed 50 MB, and the total size of all files in a single request must not exceed 50 MB
- Supported models: gpt-4o, gpt-4o-mini, gpt-5-chat, and other models that support text and image input
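The size limit can be enforced before any upload is attempted. The following sketch builds input_file parts from in-memory bytes and rejects a request whose files exceed the limit. build_file_parts is a hypothetical helper; treating 50 MB as 50 * 1024 * 1024 bytes and measuring the raw (pre-Base64) size are both assumptions, as is the application/pdf media type.

```python
import base64

MAX_TOTAL_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" means 50 MiB of raw data


def build_file_parts(files):
    """Turn {filename: bytes} into input_file parts, enforcing the size limit."""
    total = sum(len(data) for data in files.values())
    if total > MAX_TOTAL_BYTES:
        raise ValueError(f"files total {total} bytes, over the 50 MB limit")
    parts = []
    for filename, data in files.items():
        encoded = base64.b64encode(data).decode("utf-8")
        parts.append({
            "type": "input_file",
            "filename": filename,
            # Assumes PDF input, as in the Base64 example.
            "file_data": f"data:application/pdf;base64,{encoded}",
        })
    return parts


parts = build_file_parts({"document.pdf": b"%PDF-1.7 sample bytes"})
```

The returned list can be placed in the content array of a user message, alongside an input_text part.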
Grok Reasoning Capability
Grok models (especially grok-4-fast-reasoning) support reasoning capability. The usage object in the response distinguishes between completion_tokens and reasoning_tokens:
{
"usage": {
"prompt_tokens": 100,
"completion_tokens": 500,
"total_tokens": 600,
"completion_tokens_details": {
"reasoning_tokens": 300
}
}
}
Tokens in the visible answer = completion_tokens - reasoning_tokens
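Applied to the usage object above, the subtraction looks like this in Python. The sketch assumes, as the example figures suggest (100 + 500 = 600), that completion_tokens already includes the reasoning tokens; visible_output_tokens is a hypothetical helper name.

```python
def visible_output_tokens(usage):
    """Return completion tokens minus reasoning tokens for a usage object."""
    # completion_tokens_details may be absent for models without reasoning.
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return usage["completion_tokens"] - reasoning


usage = {
    "prompt_tokens": 100,
    "completion_tokens": 500,
    "total_tokens": 600,
    "completion_tokens_details": {"reasoning_tokens": 300},
}
visible = visible_output_tokens(usage)
```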
Response Format
- Non-Streaming Response
- Streaming Response
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "glm-5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Response content..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 100,
"total_tokens": 125
}
}
Streaming responses are returned in SSE (Server-Sent Events) format; each chunk contains partial content:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"doubao-seed-1-8-251228","choices":[{"index":0,"delta":{"content":"Artificial"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"doubao-seed-1-8-251228","choices":[{"index":0,"delta":{"content":" intelligence"},"finish_reason":null}]}
data: [DONE]
The last chunk usually contains usage statistics.
Error Handling
| Exception Type | Trigger Scenario | Return Message |
|---|---|---|
| AuthenticationError | Invalid or unauthorized API key | Error: Invalid or unauthorized API key |
| NotFoundError | Model does not exist or is not supported | Error: Model [model] does not exist or is not supported |
| APIConnectionError | Network interruption or server not responding | Error: Cannot connect to API server |
| APIError | Request format error and other server-side exceptions | API request failed: [error details] |
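The exception names in the table match classes exposed by the openai Python SDK, where the first three are subclasses of APIError, so specific types need to be checked before the general one. The sketch below maps an exception to the table's return message. It matches on class names so that it runs without the SDK installed, which is a simplification; with the SDK, separate except clauses per class are more idiomatic. describe_error and MESSAGES are hypothetical names.

```python
# Ordered from most specific to most general, because APIError is the base class.
MESSAGES = [
    ("AuthenticationError", "Error: Invalid or unauthorized API key"),
    ("NotFoundError", "Error: Model {model} does not exist or is not supported"),
    ("APIConnectionError", "Error: Cannot connect to API server"),
    ("APIError", "API request failed: {detail}"),
]


def describe_error(exc, model=""):
    """Map an exception to the return message listed in the table."""
    # Collect the names of the exception's class and all of its base classes.
    names = {cls.__name__ for cls in type(exc).__mro__}
    for name, template in MESSAGES:
        if name in names:
            return template.format(model=model, detail=exc)
    raise exc  # not an API error; let it propagate
```

A typical call site wraps client.chat.completions.create in try/except Exception and prints describe_error(exc, model=...).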
Supported Model Series
OpenAI Series
- GPT-5.5, GPT-5.4 family (5.4 / Pro / Mini / Nano), GPT-4o, GPT-4o Mini
Claude Series (Anthropic)
- Claude Sonnet 4, Claude 3 Opus, Claude 3 Haiku
DeepSeek Series
- DeepSeek V3, DeepSeek R1
Grok Series (xAI)
- Grok-4, Grok-3, Grok-3-fast, Grok-4-fast-reasoning
Tongyi Qianwen Series (Qwen)
- Qwen3-omni-flash, etc.
Doubao Seed Series
- doubao-seed-1-8-251228, etc.
Other Models
- Gemini series, GLM series (including glm-5), Kimi series, etc.
Notes
- In the messages list, the system role sets model behavior and the user role carries user questions
- Multi-turn conversations require appending history (including assistant role responses)
- The Python examples require the openai library: pip install openai
- Different models may support certain features to different degrees; check the specific model documentation before use
- Using streaming output can improve first-token response time and interactive experience
- Tool calling requires proper timeout and retry mechanisms to avoid blocking model responses
- Tongyi Qianwen extended parameters must be placed in the parameters object
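The note on multi-turn conversations amounts to resending the full history with each request. A minimal sketch, with append_turn as a hypothetical helper name:

```python
def append_turn(history, user_text, assistant_text):
    """Return a new history list with one user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


history = [{"role": "system", "content": "You are a helpful assistant"}]
history = append_turn(history, "Briefly introduce artificial intelligence",
                      "Artificial intelligence is a branch of computer science...")
# The next request sends history plus the new user message.
next_messages = history + [{"role": "user", "content": "Give one example"}]
```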
Related Resources
FAQ
View FAQ for chat interface
Model List
View all supported model information