MODEL API · UNIFIED

One platform
for leading models.

TokenFleet aggregates production LLMs from DeepSeek, Moonshot, MiniMax, Zhipu, and more: 12 models behind one API key and one invoice, with direct mainland connectivity.


12 production models
Unified API gateway
Direct mainland routing


# OpenAI-compatible request body; only the base URL changes
curl https://tokenfleet.cn/v1/chat/completions \
  -H "Authorization: Bearer $TOKENFLEET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Explain RAG in one sentence."}]
  }'


import os

from openai import OpenAI

# Reuse the official OpenAI SDK; only base_url and api_key change.
client = OpenAI(
    base_url="https://tokenfleet.cn/v1",
    api_key=os.environ["TOKENFLEET_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(resp.choices[0].message.content)


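import OpenAI from "openai";

// Reuse the official OpenAI SDK; only baseURL and apiKey change.
// Top-level await requires an ES module (e.g. "type": "module" or a .mjs file).
const client = new OpenAI({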
  baseURL: "https://tokenfleet.cn/v1",
  apiKey: process.env.TOKENFLEET_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "deepseek-v4-pro",
  messages: [{ role: "user", content: "Explain RAG in one sentence." }],
});
console.log(resp.choices[0].message.content);

MODELS GALLERY

12 production models,
one billing layer.

All models share one OpenAI-compatible endpoint and one business account.


DeepSeek LLM

deepseek-v4-pro

Next-gen DeepSeek flagship reasoning model

$1.71 / $3.43 per 1M tokens

DeepSeek LLM

DeepSeek-V3.2

Efficient reasoning and tool use

$0.286 / $0.429 per 1M tokens

Moonshot LLM

kimi-k2.5

Native visual agent engine

$0.571 / $3 per 1M tokens

Moonshot LLM

kimi-k2.6

Long-form Chinese documents and office tasks

$0.929 / $3.86 per 1M tokens

MiniMax LLM

MiniMax-M2.7

Efficient coding with self-iteration

$0.3 / $1.2 per 1M tokens

Zhipu LLM

glm-5.1

Next-gen Zhipu language model

$0.857 / $3.43 per 1M tokens
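As a rough worked example, the sketch below estimates the cost of a single call from the prices on the cards above. It assumes the first figure on each card is the input (prompt) price and the second the output (completion) price, in USD per 1M tokens; that reading is an assumption, so confirm it against your invoice. `PRICES` and `estimate_cost` are hypothetical names, not part of any TokenFleet SDK.

```python
# Hypothetical price table copied from the model cards.
# ASSUMPTION: each tuple is (input, output) in USD per 1M tokens.
PRICES = {
    "deepseek-v4-pro": (1.71, 3.43),
    "DeepSeek-V3.2": (0.286, 0.429),
    "kimi-k2.5": (0.571, 3.0),
    "kimi-k2.6": (0.929, 3.86),
    "MiniMax-M2.7": (0.3, 1.2),
    "glm-5.1": (0.857, 3.43),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one call (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# 2,000 prompt tokens + 500 completion tokens on deepseek-v4-pro:
# (2000 * 1.71 + 500 * 3.43) / 1e6 = (3420 + 1715) / 1e6 = 0.005135 USD
print(f"${estimate_cost('deepseek-v4-pro', 2_000, 500):.6f}")
```

Under the same assumption, output tokens dominate the bill for models such as kimi-k2.5, where the output price is about five times the input price.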

PRODUCT 01

Unified API gateway

Production models from multiple vendors sit behind one API gateway, so a single API key connects your app to all of them.

Unified endpoint and API key
One account, reconciliation, and invoice
Direct mainland routing with lower latency


12 production models, one endpoint

LLM, image, video, and audio calls run through one API gateway for execution, metering, and reconciliation.

deepseek-v4-pro DeepSeek
deepseek-v4-flash DeepSeek
DeepSeek-V3.2 DeepSeek
DeepSeek-V3.2-A DeepSeek
deepseek-v3.1 DeepSeek
deepseek-v3.2-exp DeepSeek
kimi-k2.6 Moonshot
kimi-k2.5 Moonshot
MiniMax-M2.7 MiniMax
MiniMax-M2.5 MiniMax
glm-5.1 Zhipu
deepseek-v3 DeepSeek
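If the gateway also exposes the OpenAI-compatible `GET /v1/models` route (an assumption; this page does not say so), the catalog above can be checked from code before sending a chat request. The sketch accepts any OpenAI-style client, such as `openai.OpenAI` pointed at `https://tokenfleet.cn/v1`; `available_model_ids` and `require_model` are hypothetical helpers.

```python
def available_model_ids(client) -> list[str]:
    """Return the sorted model IDs reported by an OpenAI-compatible /v1/models route.

    `client` is expected to be an `openai.OpenAI` instance (or a stub with the
    same `models.list()` shape). ASSUMPTION: TokenFleet implements /v1/models.
    """
    return sorted(model.id for model in client.models.list())


def require_model(client, model_id: str) -> None:
    """Fail fast if `model_id` is not served by the gateway (hypothetical helper)."""
    if model_id not in available_model_ids(client):
        raise ValueError(f"model {model_id!r} is not available on this endpoint")


# Usage (needs network access and TOKENFLEET_API_KEY):
#   import os
#   from openai import OpenAI
#   client = OpenAI(base_url="https://tokenfleet.cn/v1",
#                   api_key=os.environ["TOKENFLEET_API_KEY"])
#   require_model(client, "deepseek-v4-pro")
```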


Private deployment / VPC direct connect

At scale, teams can request a VPC endpoint so traffic stays inside their private network boundary.

peering: cn-shanghai-2 · cn-beijing-1
encryption: TLS 1.3 + mTLS
egress: private egress only

Unified endpoint integration

One API key can call multiple model providers while keeping integration paths and usage records consistent.

− base_url="https://api.openai.com/v1"
+ base_url="https://tokenfleet.cn/v1"
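Because every model sits behind the same OpenAI-compatible endpoint, switching vendors should only mean changing the `model` string. The minimal sketch below sends one prompt to several catalog models through a single client; `ask_each` is a hypothetical helper, and it assumes responses follow the chat-completions shape used in the examples at the top of this page.

```python
def ask_each(client, models: list[str], prompt: str) -> dict[str, str]:
    """Send the same prompt to several models through one client and one API key.

    Only the `model` field changes between calls; base_url and auth stay fixed.
    """
    answers = {}
    for model in models:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answers[model] = resp.choices[0].message.content
    return answers


# Usage (needs network access and TOKENFLEET_API_KEY):
#   import os
#   from openai import OpenAI
#   client = OpenAI(base_url="https://tokenfleet.cn/v1",
#                   api_key=os.environ["TOKENFLEET_API_KEY"])
#   ask_each(client, ["deepseek-v4-pro", "kimi-k2.6", "glm-5.1"],
#            "Explain RAG in one sentence.")
```

Passing the client in, rather than constructing it inside the helper, keeps one connection pool and one key for all vendors and makes the function easy to test with a stub.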

Monthly token usage, error rates, and model mix are visible in one console.
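The console remains the source of truth, but a small client-side tally can help cross-check monthly reconciliation. The sketch below accumulates the OpenAI-style `usage.prompt_tokens` and `usage.completion_tokens` values per model and counts failed calls; `UsageLedger` is a hypothetical class, not a TokenFleet API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageLedger:
    """Hypothetical client-side tally of tokens, errors, and model mix."""
    prompt_tokens: dict = field(default_factory=lambda: defaultdict(int))
    completion_tokens: dict = field(default_factory=lambda: defaultdict(int))
    calls: dict = field(default_factory=lambda: defaultdict(int))
    errors: dict = field(default_factory=lambda: defaultdict(int))

    def record_success(self, model: str, prompt: int, completion: int) -> None:
        # Typically fed from resp.usage.prompt_tokens / resp.usage.completion_tokens.
        self.calls[model] += 1
        self.prompt_tokens[model] += prompt
        self.completion_tokens[model] += completion

    def record_error(self, model: str) -> None:
        self.calls[model] += 1
        self.errors[model] += 1

    def error_rate(self) -> float:
        """Failed calls divided by all calls, across every model."""
        total = sum(self.calls.values())
        return sum(self.errors.values()) / total if total else 0.0

    def model_mix(self) -> dict:
        """Share of all calls that went to each model."""
        total = sum(self.calls.values())
        return {m: n / total for m, n in self.calls.items()} if total else {}
```

A client-side count will not match the console exactly when requests time out after the gateway has already metered them, so treat it as a sanity check rather than a billing record.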

Direct mainland routing, millisecond latency

Median time to first token (TTFT, P50) in five cities, measured on production samples.

City	P50 TTFT
Beijing	142 ms
Shanghai	128 ms
Guangzhou	156 ms
Shenzhen	149 ms
Hangzhou	134 ms
Overseas direct	≥ 800 ms

Sample window: 2026-04.
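To compare these figures with your own network path, you can time the first streamed chunk that carries content and take the median over several runs. This sketch assumes the gateway supports OpenAI-style streaming (`stream=True`); `time_to_first_token` and `p50_ttft_ms` are hypothetical helpers. Your numbers will include local network and TLS setup, so they are not directly comparable to the published samples.

```python
import statistics
import time


def time_to_first_token(client, model: str, prompt: str) -> float:
    """Seconds from sending the request to the first chunk with visible content.

    ASSUMPTION: the endpoint supports OpenAI-style streaming (stream=True).
    Chunks with empty or missing `delta.content` (e.g. role-only chunks) are skipped.
    """
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    try:
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                return time.perf_counter() - start
    finally:
        # Release the HTTP connection early; the openai SDK's Stream has close().
        close = getattr(stream, "close", None)
        if callable(close):
            close()
    raise RuntimeError("stream ended without any content")


def p50_ttft_ms(client, model: str, prompt: str, runs: int = 10) -> float:
    """Median time to first token, in milliseconds, over `runs` sequential requests."""
    samples = [time_to_first_token(client, model, prompt) for _ in range(runs)]
    return statistics.median(samples) * 1000
```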

FOR PRODUCTION SCALE

Built for large-scale production usage.

When token usage outgrows self-serve limits, we work directly with your engineering team on single-entry integration, capacity planning, dedicated routing, and custom terms.

SLA
Enterprise SLA

Commitments are tailored by usage tier and team size, with monthly reconciliation.
VPC
Private deployment / VPC direct connect

Requests can stay inside your private network boundary. Final shape depends on scale.
SUPPORT
Dedicated technical contact

24/7 Chinese-language engineering support channel with front-line incident response.

Usually replies within 24 hours · zhangyue@nyuncloud.com
