Dify

Dify is an open-source LLM application development platform that offers visual orchestration, knowledge bases, workflows, and API services — letting you ship chatbots, agents, and RAG apps quickly. With GravitexAI, you can drive Dify with a single API key against 100+ leading models (Claude, OpenAI, Gemini, Qwen, DeepSeek, Kimi, and more) while benefiting from unified billing, automatic failover, and enterprise-grade reliability.

1. Quick Integration

1.1 Get an API Key

Visit the GravitexAI Console and create an API key:

1.2 Open Dify Model Providers

Log in to Dify, click your avatar (top-right) → Settings
From the left menu, choose Model Provider
Find the OpenAI-API-compatible plugin in the list and install it

OpenAI-API-compatible supports LLM / Embedding / TTS / STT endpoint types. GravitexAI is fully compatible — one plugin covers every model.

1.3 Add a Model

After installing the plugin, click Add Model and fill in the three core fields below:

Field	Value	Notes
Model Type	LLM / Text Embedding / Speech2text …	Match the endpoint type
Model Name	e.g. `gpt-5.5`, `claude-opus-4-7`, `gemini-3.5-flash`	Must be the canonical model ID. Free-form display names will not work.
Display Name	e.g. `GPT-5.5`, `Claude Opus 4.7`	Anything readable; UI only
API Key	Copy from the GravitexAI Console	Format: `sk-xxxxxxxx`
API endpoint URL	`https://api.gravitex.ai/v1`	Don’t drop the trailing `/v1`
Model Name in API endpoint	Identical to “Model Name”	Dify sends this as the `model` field in the request body

“Model Name” and “Model Name in API endpoint” must be identical. Friendly names with spaces (e.g. Gemini 3.5 Flash) will return 404 / model not found.

1.4 Context Length

Dify defaults to max_context = 4096, which is far below most modern models. Override it based on the model you choose:

Model	Context
`claude-opus-4-7` / `gpt-5.5` / `gemini-3.5-flash`	1,000,000
`claude-sonnet-4-5` / `claude-haiku-4-5`	200,000
`kimi-k2.5`	256,000
`deepseek-v3-2-251201`	128,000

Full context specs live on the model catalog.

2. Core Features

2.1 Chat Assistant

The simplest app type — great for support bots, Q&A, and role-play:

Create app → choose Chat Assistant template

Configure the system prompt:

You are the GravitexAI integration assistant. Your job:
- Answer questions about wiring up our API
- Recommend models that fit the user's scenario
- Refer the user to the docs or BD when unsure
Keep tone friendly, professional, and concise.

Pick gpt-5.5 or claude-opus-4-7 as the model
Suggested parameters: temperature = 0.7, max_tokens = 2000

2.2 Workflow Application

Compose multi-step DAGs with branches, parallel paths, and loops: Node-by-node model suggestions:

Intent classification: gemini-3.5-flash (high throughput, low latency)
KB retrieval / embedding: text-embedding-3-large or gemini-embedding-001
Long-doc reasoning: claude-opus-4-7 (1M context, strong reasoning)
Code generation: gpt-5.5 or qwen3-coder-plus

2.3 Knowledge Base Q&A (RAG)

Create a knowledge base → upload documents (PDF / Word / Markdown / TXT)
Embedding model: text-embedding-3-large recommended
Chunking: auto-split by paragraph, ~500 tokens per chunk
Reference the KB in the app
Retrieval parameters:
- Top-K: 3–5
- Similarity threshold: 0.7
- Reranker: enabled (significantly improves relevance)

3. Application Templates

Customer Support
Document Analysis
Coding Assistant

type: chat assistant
model: gpt-5.5
system_prompt: |
  You are a professional AI customer support agent. Responsibilities:
  - Answer user questions
  - Provide product information
  - Handle post-sales support
  Keep tone friendly and professional.
temperature: 0.7
max_tokens: 2000

type: workflow
input: uploaded document
pipeline:
  1. Document parsing (Dify built-in parser)
  2. Long-form summary (claude-opus-4-7, 1M context)
  3. Structured highlight extraction
  4. Generate report
output: Markdown report

type: chat assistant
model: claude-opus-4-7
system_prompt: |
  You are a professional coding assistant, specialized in:
  - Writing and refactoring code
  - Debugging
  - Architecture design
  - Best-practice recommendations
  Produce clear, runnable code with a brief explanation.
temperature: 0.3

4. Advanced Features

4.1 Call Dify Apps via API

Each Dify app is exposed as an HTTP service that can be called from outside:

import requests

url = "https://your-dify-instance/v1/chat-messages"
headers = {
    "Authorization": "Bearer YOUR_DIFY_APP_API_KEY",
    "Content-Type": "application/json",
}

payload = {
    "inputs": {},
    "query": "Introduce GravitexAI in one sentence.",
    "response_mode": "streaming",
    "user": "user_123",
}

resp = requests.post(url, headers=headers, json=payload, stream=True)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))

4.2 Multimodal Input (Vision)

Vision-capable models (gpt-5.5, claude-opus-4-7, gemini-3.5-flash) accept image input:

{
    "inputs": {
        "image": "data:image/jpeg;base64,...",
        "instruction": "Identify the key information in this image."
    },
    "query": "Describe the image in detail and provide actionable insights."
}

4.3 Batch Processing

For large-scale workloads (CSV imports, bulk summarization, etc.):

Pick a fast/cheap model (gemini-3.5-flash, gpt 5.4 mini)
Cap concurrency in your Dify workflow to avoid hitting rate limits all at once
Enable result caching to avoid re-billing identical inputs

5. Model Selection

Full scenario-based model picks

Browse GravitexAI’s recommendations across writing, coding, fast response, long-context, image generation, and more.

Cost optimization: dev vs prod

development:
  model: gemini-3.5-flash       # low cost, fast iteration
  max_tokens: 1000
  temperature: 0.7

production:
  model: claude-opus-4-7         # flagship intelligence, reliable
  max_tokens: 2000
  temperature: 0.3
  fallback: gpt-5.5              # GravitexAI auto-failover

GravitexAI provides platform-level automatic failover. If a provider is unavailable, traffic is routed to an equivalent model automatically — no manual fallback chain in Dify required.

6. Best Practices

6.1 Structured prompts

# Role
You are a professional [role here].

# Task
Help the user complete [the specific task].

# Output Format
1. Summary (≤ 100 words)
2. Detailed analysis (bullet points)
3. Recommended actions

# Constraints
- Be factual; flag uncertainty explicitly
- Keep total length under 500 words
- Respond in [language]

6.2 Workflow design

6.3 Observability

Track regularly:

✅ User feedback (thumbs up/down)
⏱️ P95 latency
💰 Cost per call and daily/monthly usage curve
❌ Error rate and root cause distribution

The GravitexAI console shows per-key and per-model usage and cost in real time for easy reconciliation.

6.4 Versioning

Regularly export Dify app configs (JSON / YAML) as backups
Roll out new versions via canary; keep at least N-1 for rollback
Pin model versions in production to avoid surprise behavior changes

7. Troubleshooting

401 / invalid API key

Re-copy the API key from the GravitexAI Console
Verify the account has balance
Confirm baseURL is https://api.gravitex.ai/v1 (note the trailing /v1)

404 / model not found

Use the canonical model ID (e.g. gpt-5.5, not GPT-5.5)
“Model Name” and “Model Name in API endpoint” must be identical

Slow streaming / hangs

Prefer Flash / Mini tier models
Reduce max_tokens
Turn on Dify response caching

Performance tuning template

caching:
  enabled: true
  ttl: 3600s
  key: identical input

concurrency:
  max_in_flight: 10
  queue_size: 100
  timeout: 30s

resources:
  memory: 2GB
  cpu: 80%

8. Deployment Recommendations

Production Docker Compose (self-hosted Dify)

version: '3.8'
services:
  dify-api:
    image: langgenius/dify-api:latest
    environment:
      - SECRET_KEY=your-secret-key
      - DB_HOST=postgres
      - REDIS_HOST=redis
      - OPENAI_API_KEY=sk-your-gravitex-key
      - OPENAI_API_BASE=https://api.gravitex.ai/v1
    depends_on:
      - postgres
      - redis

  dify-web:
    image: langgenius/dify-web:latest
    ports:
      - "3000:3000"
    depends_on:
      - dify-api

  postgres:
    image: postgres:14
    environment:
      - POSTGRES_DB=dify
      - POSTGRES_USER=dify
      - POSTGRES_PASSWORD=password

  redis:
    image: redis:alpine

Security

Store the API key in env vars or a secrets manager — never hardcode it inside Dify configs
Terminate HTTPS in front of Dify (Nginx / Caddy / Traefik)
Enable SSO / 2FA for the Dify admin console
Keep base images and dependencies up to date

Health checks

import requests, time

def monitor_dify():
    try:
        r = requests.get("http://dify-api:5001/health", timeout=5)
        if r.status_code == 200:
            print("Dify healthy")
        else:
            print(f"Unhealthy, status: {r.status_code}")
    except Exception as e:
        print(f"Probe failed: {e}")

while True:
    monitor_dify()
    time.sleep(60)

9. Reconciliation & Insights

Once integrated, head back to the GravitexAI Console to see request volume, token consumption, cost breakdown, and per-model success rates:

Always use the canonical model ID (lowercase, exactly matching the official ID)
Standard API base: https://api.gravitex.ai/v1
Iterate on the full workflow with a fast/cheap model like gemini-3.5-flash, then swap to claude-opus-4-7 / gpt-5.5 for production
For high-volume workloads, reach out to bd@gravitex.ai for dedicated quotas

Chat AI

Coding

AI Agent

Tech Engineering

Translation

1. Quick Integration

1.1 Get an API Key

1.2 Open Dify Model Providers

1.3 Add a Model

1.4 Context Length

2. Core Features

2.1 Chat Assistant

2.2 Workflow Application

2.3 Knowledge Base Q&A (RAG)

3. Application Templates

4. Advanced Features

4.1 Call Dify Apps via API

4.2 Multimodal Input (Vision)

4.3 Batch Processing

5. Model Selection

Full scenario-based model picks

Cost optimization: dev vs prod

6. Best Practices

6.1 Structured prompts

6.2 Workflow design

6.3 Observability

6.4 Versioning

7. Troubleshooting

Performance tuning template

8. Deployment Recommendations

Production Docker Compose (self-hosted Dify)

Security

Health checks

9. Reconciliation & Insights

​1. Quick Integration

​1.1 Get an API Key

​1.2 Open Dify Model Providers

​1.3 Add a Model

​1.4 Context Length

​2. Core Features

​2.1 Chat Assistant

​2.2 Workflow Application

​2.3 Knowledge Base Q&A (RAG)

​3. Application Templates

​4. Advanced Features

​4.1 Call Dify Apps via API

​4.2 Multimodal Input (Vision)

​4.3 Batch Processing

​5. Model Selection

Full scenario-based model picks

​Cost optimization: dev vs prod

​6. Best Practices

​6.1 Structured prompts

​6.2 Workflow design

​6.3 Observability

​6.4 Versioning

​7. Troubleshooting

​Performance tuning template

​8. Deployment Recommendations

​Production Docker Compose (self-hosted Dify)

​Security

​Health checks

​9. Reconciliation & Insights

1. Quick Integration

1.1 Get an API Key

1.2 Open Dify Model Providers

1.3 Add a Model

1.4 Context Length

2. Core Features

2.1 Chat Assistant

2.2 Workflow Application

2.3 Knowledge Base Q&A (RAG)

3. Application Templates

4. Advanced Features

4.1 Call Dify Apps via API

4.2 Multimodal Input (Vision)

4.3 Batch Processing

5. Model Selection

Cost optimization: dev vs prod

6. Best Practices

6.1 Structured prompts

6.2 Workflow design

6.3 Observability

6.4 Versioning

7. Troubleshooting

Performance tuning template

8. Deployment Recommendations

Production Docker Compose (self-hosted Dify)

Security

Health checks

9. Reconciliation & Insights