Native Claude Format
curl --request POST \
--url https://api.gravitex.ai/v1/messages \
--header 'Authorization: <authorization>' \
--header 'Content-Type: application/json' \
--data '
{
"model": "<string>",
"messages": [
{}
],
"max_tokens": 123,
"system": {},
"temperature": 123,
"top_p": 123,
"top_k": 123,
"stream": true,
"stop_sequences": [
{}
],
"tools": [
{}
],
"tool_choice": {},
"thinking": {},
"metadata": {},
"mcp_servers": [
{}
],
"context_management": {},
"cache_control": {}
}
'
Example response:
{
"id": "msg_xxx",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Artificial intelligence is a branch of computer science that focuses on creating intelligent machines capable of performing tasks that typically require human intelligence..."
}
],
"model": "claude-sonnet-4-5-20250929",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 25,
"output_tokens": 100
}
}
POST /v1/messages
Introduction
Claude’s native message API, suitable for native Anthropic clients such as Claude Code. This API follows Anthropic’s specification and provides full Claude model capabilities, including Extended Thinking, tool calling, and other advanced features.
If you’re using an OpenAI-compatible client (such as the OpenAI SDK), we recommend using the /v1/chat/completions endpoint instead.
Authentication
Authorization (header): Bearer token, e.g., Bearer sk-xxxxxxxxxx
Request Parameters
model: Claude model identifier. Supported models include:
- claude-opus-4-5-20251101: Claude Opus 4.5 (latest, strongest reasoning)
- claude-haiku-4-5-20251001: Claude Haiku 4.5 (latest, fastest)
- claude-sonnet-4-5-20250929: Claude Sonnet 4.5 (latest, balanced)
- claude-opus-4-1-20250805: Claude Opus 4.1
- claude-sonnet-4-20250514: Claude Sonnet 4
- Other Claude series models
messages: List of conversation messages, each containing role (user/assistant) and content. content can be a string or an array of content blocks (for example text and images). The array cannot be empty.
max_tokens: Maximum number of tokens to generate. Required; must be greater than 0.
system: System prompt, either a string or an array of content blocks. Used to set the model’s behavior and role.
temperature: Randomness control, 0-1. Higher values make responses more random. We recommend 1.0 when using extended thinking.
top_p: Nucleus sampling parameter, 0-1, controls generation diversity. We recommend 0 when using extended thinking.
top_k: Top-K sampling parameter, only supported by some models.
stream: Whether to enable streaming output, which returns data chunks in SSE format. We recommend enabling it when using extended thinking.
stop_sequences: List of stop sequences. Generation stops when the model produces one of these sequences.
tools: List of tool definitions. Supports function tools and web search tools.
tool_choice: Tool selection strategy, controls how the model uses tools.
thinking: Extended thinking configuration, enables Claude’s deep reasoning capability.
metadata: Request metadata for tracking and debugging.
mcp_servers: MCP (Model Context Protocol) server configuration.
context_management: Context management configuration, controls how conversation context is handled.
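The constraints listed above (a positive max_tokens, a non-empty messages array, sampling values in the 0-1 range) can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: the helper name build_messages_request is hypothetical, and it only enforces the rules stated in this section.

```python
from typing import Any


def build_messages_request(
    model: str,
    messages: list[dict[str, Any]],
    max_tokens: int,
    **optional: Any,
) -> dict[str, Any]:
    """Assemble a /v1/messages request body and check the documented constraints.

    Hypothetical helper for illustration; the endpoint itself performs the
    authoritative validation.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be greater than 0")
    if not messages:
        raise ValueError("messages must not be empty")
    for message in messages:
        if message.get("role") not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    # temperature and top_p are both documented as 0-1 values.
    for name in ("temperature", "top_p"):
        value = optional.get(name)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")

    body: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    # Only include optional parameters that were actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

The returned dictionary can be serialized with json.dumps and sent as the request body, or unpacked into client.messages.create(**body) when using the Anthropic SDK.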
Prompt Caching
Prompt Caching allows you to cache frequently used context content, significantly reducing costs and improving response speed. The cache_control parameter is supported in system and messages.
Cache Control Parameters
cache_control: Cache control configuration. Can be used in system array elements and in content array elements in messages.
- type: Cache type
  - "ephemeral": 5-minute cache (default, most cost-effective)
  - "persistent": 1-hour cache (suitable for long-term stable context)
Caching Mechanism
- Cache Position: The last content block marked with cache_control will be cached
- Cache Threshold: Content needs at least 1024 tokens (Claude Sonnet 4.5) or 2048 tokens (Claude 3 Haiku)
- Cache Duration:
  - ephemeral: Valid for 5 minutes
  - persistent: Valid for 1 hour
- Cost Savings: Cache reads are 90% cheaper than regular inputs
Use Cases
- Long Document Analysis: Cache large documents in system, ask multiple questions
- Codebase Understanding: Cache code context for multi-turn code analysis
- Knowledge Base Q&A: Cache knowledge base content for fast queries
- Multi-turn Conversations: Cache conversation history to maintain context coherence
Basic Examples
- Non-streaming Request
- Streaming Request (SSE)
- Python Example (Anthropic SDK)
Non-streaming Request:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Please briefly introduce artificial intelligence"}
]
}'
Streaming Request (SSE):
curl -N -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"stream": true,
"messages": [
{"role": "user", "content": "Please briefly introduce artificial intelligence"}
]
}'
Python Example (Anthropic SDK):
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-xxxxxxxxxx",
    base_url="https://api.gravitex.ai"
)

# Non-streaming
message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Please briefly introduce artificial intelligence"}
    ]
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Please briefly introduce artificial intelligence"}
    ]
) as stream:
    for text_block in stream.text_stream:
        print(text_block, end="")
Example response:
{
"id": "msg_xxx",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Artificial intelligence is a branch of computer science that focuses on creating intelligent machines capable of performing tasks that typically require human intelligence..."
}
],
"model": "claude-sonnet-4-5-20250929",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 25,
"output_tokens": 100
}
}
Advanced Features
System Prompt
System prompts can be set as a string or an array of content blocks:
- String Format
- Array Format
String Format:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"system": "You are a helpful assistant that excels at answering questions.",
"messages": [
{"role": "user", "content": "What is machine learning?"}
]
}'
Array Format:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"system": [
{"type": "text", "text": "You are a helpful assistant that excels at answering questions."}
],
"messages": [
{"role": "user", "content": "What is machine learning?"}
]
}'
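Because the two formats carry the same information, a client can normalize a string prompt into the array form, which is also the form that accepts cache_control (see Prompt Caching). The sketch below is illustrative only; system_as_blocks is a hypothetical helper name, not an SDK function.

```python
from typing import Any, Optional, Union


def system_as_blocks(
    system: Union[str, list[dict[str, Any]]],
    cache_type: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Return the system prompt in array form.

    If cache_type is given (for example "ephemeral"), the last block is
    marked with cache_control, matching the documented cache position.
    """
    if isinstance(system, str):
        blocks = [{"type": "text", "text": system}]
    else:
        # Shallow copies so the caller's list is left untouched.
        blocks = [dict(block) for block in system]
    if cache_type is not None and blocks:
        blocks[-1]["cache_control"] = {"type": cache_type}
    return blocks
```

For example, system_as_blocks("You are a helpful assistant.") yields a one-element array with a text block, which can be passed as the system field unchanged.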
Extended Thinking
Claude supports extended thinking, allowing the model to perform deep reasoning. When enabled, the model will think internally before generating the final answer.
- Basic Usage
- Python Example
Basic Usage:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 4096,
"thinking": {"type": "enabled", "budget_tokens": 2048},
"temperature": 1.0,
"top_p": 0,
"stream": true,
"messages": [
{"role": "user", "content": "Give a medium difficulty geometry problem and solve it step by step"}
]
}'
Python Example:
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-xxxxxxxxxx",
    base_url="https://api.gravitex.ai"
)

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        # Kept below max_tokens, which also has to cover the final answer
        "budget_tokens": 2048
    },
    temperature=1.0,
    top_p=0,
    messages=[
        {"role": "user", "content": "Give a medium difficulty geometry problem and solve it step by step"}
    ]
) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            if hasattr(event.delta, "thinking"):
                # Thinking process
                print(f"[Thinking] {event.delta.thinking}", end="")
            elif hasattr(event.delta, "text"):
                # Final answer
                print(event.delta.text, end="")
- budget_tokens must be greater than 1024
- When using extended thinking, it’s recommended to set temperature: 1.0 and top_p: 0
- Streaming output (stream: true) must be enabled to see the thinking process
Tool Calling
Supports function tools and web search tools:
- Function Tools
- Claude Official Web Search Tool
- Complete Tool Calling Flow
Function Tools:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"tools": [
{
"name": "get_weather",
"description": "Get weather information for a city",
"input_schema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name"
}
},
"required": ["city"]
}
}
],
"tool_choice": {
"type": "auto"
},
"messages": [
{"role": "user", "content": "What is the weather in Shanghai?"}
]
}'
Claude Official Web Search Tool:
Claude supports the official web search tool web_search_20250305, which can search the web in real time and include citation sources in responses.
Basic Usage:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"tools": [
{
"type": "web_search_20250305",
"name": "web_search"
}
],
"messages": [
{"role": "user", "content": "What are the latest news about artificial intelligence?"}
]
}'
With Search Limit:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"tools": [
{
"type": "web_search_20250305",
"name": "web_search",
"max_uses": 5
}
],
"messages": [
{"role": "user", "content": "Search for today'\''s weather in Beijing"}
]
}'
With Location Information (Improves Search Accuracy):
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"tools": [
{
"type": "web_search_20250305",
"name": "web_search",
"max_uses": 5,
"user_location": {
"type": "approximate",
"timezone": "Asia/Shanghai",
"country": "CN",
"region": "Beijing",
"city": "Beijing"
}
}
],
"messages": [
{"role": "user", "content": "What'\''s the weather in Shanghai today?"}
]
}'
Python Example:
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-xxxxxxxxxx",
    base_url="https://api.gravitex.ai"
)

message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    tools=[
        {
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 5
        }
    ],
    messages=[
        {"role": "user", "content": "What are the latest news about artificial intelligence?"}
    ]
)
# Web search responses can interleave text with search blocks, so print only the text blocks
for block in message.content:
    if block.type == "text":
        print(block.text)
- type must be "web_search_20250305"
- name must be "web_search"
- max_uses (optional): Maximum number of searches in a single conversation; recommended value: 2-10
- user_location (optional): User location information to improve the localization of search results
- Search results automatically include citation sources in the response
- Supported models include Claude Sonnet 4.5, Claude Opus 4.5, Claude Haiku 4.5, etc.
Complete Tool Calling Flow:
Phase 1: Model returns tool call request
{
"id": "msg_xxx",
"content": [
{
"type": "tool_use",
"id": "toolu_xxx",
"name": "get_weather",
"input": {"city": "Shanghai"}
}
],
"stop_reason": "tool_use"
}
Phase 2: Return tool execution result
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"tools": [...],
"messages": [
{"role": "user", "content": "What is the weather in Shanghai?"},
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "toolu_xxx",
"name": "get_weather",
"input": {"city": "Shanghai"}
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_xxx",
"content": "{\"temp\":\"22°C\",\"condition\":\"Cloudy\",\"aqi\":53}"
}
]
}
]
}'
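The two phases above can be sketched in Python without any network calls. The sketch assumes a hypothetical local get_weather implementation that returns canned data, and a hypothetical helper build_followup_messages; it only shows how the Phase 2 messages array is assembled from the Phase 1 content blocks.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical local tool: returns canned data instead of calling a weather service."""
    return {"city": city, "temp": "22°C", "condition": "Cloudy"}


# Maps tool names from the "tools" definition to local callables (illustrative registry).
TOOL_HANDLERS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def build_followup_messages(
    messages: list[dict[str, Any]],
    assistant_content: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Run every tool_use block locally and return the messages for Phase 2."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue
        handler = TOOL_HANDLERS[block["name"]]
        output = handler(**block["input"])
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],  # must match the id from Phase 1
                "content": json.dumps(output, ensure_ascii=False),
            }
        )
    # Echo the assistant turn, then answer it with the tool results as a user turn.
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": results},
    ]
```

In a real client, assistant_content would come from the Phase 1 response when stop_reason is "tool_use", and the returned list would be sent as messages in the next request together with the same tools definition.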
tool_choice Parameter Details
tool_choice controls how the model uses tools:
| Value | Description |
|---|---|
| {"type": "auto"} | Automatically decide whether to use tools (default) |
| {"type": "any"} | Must use at least one tool |
| {"type": "none"} | Don’t use any tools |
| {"type": "tool", "name": "tool_name"} | Must use the specified tool |
Example with disable_parallel_tool_use:
{
"tool_choice": {
"type": "auto",
"disable_parallel_tool_use": false
}
}
Multimodal Input (Images)
Supports including images in messages:
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
}
},
{
"type": "text",
"text": "What is in this image?"
}
]
}
]
}'
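The image block in the request above can be produced from raw bytes with the standard library. This is a minimal sketch; image_block and image_message are hypothetical helper names, and the default media type is simply the one used in the example.

```python
import base64
from typing import Any


def image_block(data: bytes, media_type: str = "image/png") -> dict[str, Any]:
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def image_message(data: bytes, question: str, media_type: str = "image/png") -> dict[str, Any]:
    """Build a user message that pairs an image with a text question."""
    return {
        "role": "user",
        "content": [
            image_block(data, media_type),
            {"type": "text", "text": question},
        ],
    }
```

A typical call reads the file in binary mode, for example image_message(open(path, "rb").read(), "What is in this image?"), where path is a local file chosen by the caller.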
Prompt Caching
Caching frequently used context content can significantly reduce costs and improve response speed.
- System Cache (5 minutes)
- Messages Cache (1 hour)
- Python SDK Example
System Cache (5 minutes):
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"system": [
{
"type": "text",
"text": "You are a professional technical documentation analyst. Here is the complete AWS Lambda technical documentation:\n\nAWS Lambda is a serverless computing service...[large documentation content, at least 1024 tokens]",
"cache_control": {"type": "ephemeral"}
}
],
"messages": [
{"role": "user", "content": "What is Lambda'\''s pricing model?"}
]
}'
Usage in the first response (cache created):
{
"usage": {
"input_tokens": 50,
"cache_creation_input_tokens": 1200,
"cache_read_input_tokens": 0,
"output_tokens": 150
}
}
Usage in a later response (cache hit):
{
"usage": {
"input_tokens": 45,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 1200,
"output_tokens": 100
}
}
Messages Cache (1 hour):
curl -X POST "https://api.gravitex.ai/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"system": "You are a Python programming assistant",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Analyze this code:\n```python\n[large code snippet, at least 1024 tokens]\n```",
"cache_control": {"type": "persistent"}
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "The main functionality of this code is...[detailed analysis]",
"cache_control": {"type": "persistent"}
}
]
},
{
"role": "user",
"content": "How can I optimize the performance of this code?"
}
]
}'
- 1-hour cache duration, suitable for long sessions
- Ideal for code reviews, document analysis, etc.
- Faster subsequent requests after cache hit
Python SDK Example:
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-xxxxxxxxxx",
    base_url="https://api.gravitex.ai"
)

# First request: Create cache
message1 = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a professional document analyst...[long text content]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "First question"}
    ]
)
print(f"Cache created: {message1.usage.cache_creation_input_tokens} tokens")
print(f"Cache read: {message1.usage.cache_read_input_tokens} tokens")

# Second request within 5 minutes: Use cache
message2 = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a professional document analyst...[same long text]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Second question"}
    ]
)
print(f"Cache created: {message2.usage.cache_creation_input_tokens} tokens")
print(f"Cache read: {message2.usage.cache_read_input_tokens} tokens")
Cache Key Points:
- Content must be ≥ 1024 tokens (Claude Sonnet 4.5) to trigger caching
- ephemeral cache is valid for 5 minutes
- persistent cache is valid for 1 hour
- Cache reads cost 90% less than regular inputs
- The last block with cache_control will be cached
- Cache is based on exact content match; any changes invalidate the cache
Best Practices:
- Place unchanging long context (documents, codebases, etc.) in system with caching enabled
- Use persistent cache (1 hour) for long-term stable content
- Use ephemeral cache (5 minutes) for frequently changing content
- Cache conversation history in multi-turn dialogues
- Monitor cache_creation_input_tokens and cache_read_input_tokens to optimize costs
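To make the monitoring advice concrete, the usage block of a response can be turned into a hit ratio and a rough input-cost estimate. This is a sketch under stated assumptions: cache_stats is a hypothetical helper, cache reads are billed at 10% of regular input (from the 90% figure above), and cache writes are assumed to cost the same as regular input because this page does not state a write price.

```python
from typing import Any


def cache_stats(usage: dict[str, Any], read_discount: float = 0.9) -> dict[str, float]:
    """Summarize cache effectiveness from a response's usage block."""
    fresh = usage.get("input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    total = fresh + created + read

    # Assumption: cache writes are billed like regular input; only reads are discounted.
    billed = fresh + created + read * (1 - read_discount)
    return {
        "total_input_tokens": float(total),
        "hit_ratio": read / total if total else 0.0,
        "billed_input_equivalent": billed,
        "savings_ratio": 1 - billed / total if total else 0.0,
    }
```

Applied to the cache-hit usage example in this section (45 regular input tokens and 1200 cache-read tokens), the billed equivalent comes to roughly 165 tokens instead of 1245 under these assumptions.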
Response Format
- Non-streaming Response
- Streaming Response
Non-streaming Response:
{
"id": "msg_xxx",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Response content..."
}
],
"model": "claude-sonnet-4-5-20250929",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 25,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0,
"output_tokens": 100
}
}
- input_tokens: Non-cached input tokens for the current request
- cache_creation_input_tokens: Tokens cached for the first time (only present in first request)
- cache_read_input_tokens: Tokens read from cache (present when cache hits)
- output_tokens: Generated output tokens
Streaming Response:
Streaming responses are returned in SSE (Server-Sent Events) format, containing the following event types:
- message_start: Message start
- content_block_start: Content block start
- content_block_delta: Content delta (contains text or thinking)
- content_block_stop: Content block end
- message_delta: Message delta (contains usage info)
- message_stop: Message end
event: message_start
data: {"type":"message_start","message":{"id":"msg_xxx","type":"message","role":"assistant","content":[],"model":"claude-sonnet-4-5-20250929","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":25,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Response"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" content"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":100}}
event: message_stop
data: {"type":"message_stop"}
When using extended thinking, content_block_delta may contain a thinking field:
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"Let me think about this problem..."}}
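For clients that read the raw stream rather than using an SDK, the events shown above can be parsed with a few lines of Python. This is a simplified sketch, not a complete SSE implementation: parse_sse and collect_deltas are hypothetical names, and the code assumes each data payload fits on a single line, as in the samples.

```python
import json
from typing import Any, Iterator, Optional


def parse_sse(raw: str) -> Iterator[tuple[Optional[str], dict[str, Any]]]:
    """Yield (event_name, payload) pairs from SSE text with one-line data fields."""
    event_name: Optional[str] = None
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield event_name, json.loads(line[len("data:"):].strip())


def collect_deltas(raw: str) -> tuple[str, str]:
    """Concatenate text and thinking deltas from content_block_delta events."""
    text_parts: list[str] = []
    thinking_parts: list[str] = []
    for event_name, payload in parse_sse(raw):
        if event_name != "content_block_delta":
            continue
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            text_parts.append(delta.get("text", ""))
        elif delta.get("type") == "thinking_delta":
            thinking_parts.append(delta.get("thinking", ""))
    return "".join(text_parts), "".join(thinking_parts)
```

Feeding the sample stream above into collect_deltas returns the assembled answer text as the first element and any thinking text as the second.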
Error Handling
The system processes upstream Claude API errors and returns them in a standardized error response format.
| Error Type | HTTP Status Code | Description |
|---|---|---|
| invalid_request | 400 | Request parameter error (e.g., missing required fields) |
| authentication_error | 401 | Invalid or unauthorized API key |
| rate_limit_error | 429 | Request rate limit exceeded |
| upstream_error | 500 | Upstream service error |
| gravitex_api_error | 500 | System internal error |
Error response example:
{
"error": {
"type": "invalid_request",
"message": "field messages is required"
}
}
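A client can use the error type to decide whether a retry is worthwhile. The sketch below is one possible policy, not documented behavior: treating rate_limit_error, upstream_error, and gravitex_api_error as retryable is an assumption, and parse_error and backoff_delays are hypothetical helper names.

```python
import json
from typing import Any

# Assumption: rate-limit and server-side errors may succeed on retry;
# request and authentication errors will not.
RETRYABLE_TYPES = {"rate_limit_error", "upstream_error", "gravitex_api_error"}


def parse_error(status_code: int, body: str) -> dict[str, Any]:
    """Extract the error type and message from a standardized error response."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = {}
    error = parsed.get("error", {}) if isinstance(parsed, dict) else {}
    error_type = error.get("type", "unknown")
    return {
        "status": status_code,
        "type": error_type,
        "message": error.get("message", ""),
        "retryable": error_type in RETRYABLE_TYPES,
    }


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(attempts)]
```

A caller would sleep for each delay in turn while parse_error(...)["retryable"] stays true, and surface the message immediately for non-retryable types such as invalid_request.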
Comparison with /v1/chat/completions
| Feature | /v1/messages | /v1/chat/completions |
|---|---|---|
| Authentication | Authorization: Bearer | Authorization: Bearer |
| Response Format | Anthropic native format | OpenAI compatible format |
| Extended Thinking | Native thinking parameter | Via reasoning_effort or reasoning parameter |
| Tool Calling | Native tools and tool_choice | OpenAI compatible format |
| Suitable Clients | Anthropic SDK, Claude Code | OpenAI SDK, compatible clients |
- If you’re using Claude Code or other Anthropic native clients, we recommend using the /v1/messages endpoint
- If you’re using the OpenAI SDK or need OpenAI format compatibility, we recommend using the /v1/chat/completions endpoint
- Both endpoints offer essentially the same functionality; the main difference is the request/response format
Notes
- max_tokens is a required parameter and must be greater than 0
- messages array cannot be empty
- When using extended thinking, budget_tokens must be greater than 1024
- Tool calling requires multiple rounds of interaction: first round returns tool call request, second round returns tool execution result
- Image input requires base64 encoding
- Using streaming output can improve first token response time and interaction experience
- Tool calling should have proper timeout and retry mechanisms to avoid blocking model responses
- Extended thinking can significantly improve reasoning quality for complex problems
Related Resources
Chat Completions (OpenAI Compatible)
View OpenAI compatible chat endpoint documentation
Model List
View all supported model information