MODEL API · UNIFIED

One platform
for leading models.

TokenFleet aggregates production LLMs from DeepSeek, Moonshot, MiniMax, Zhipu, and more: 12 models behind one API key, one invoice, and direct mainland connectivity.

# Same request body; only the base URL changes
curl https://tokenfleet.cn/default/v1/chat/completions \
  -H "Authorization: Bearer $TOKENFLEET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Explain RAG in one sentence."}]
  }'
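
For teams calling the gateway from application code, the curl command above can be sketched in Python using only the standard library. The URL, header names, model ID, and `TOKENFLEET_API_KEY` variable come from the curl example; the helper names `build_chat_request` and `send` are hypothetical, and the response is assumed to be JSON.

```python
import json
import os
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://tokenfleet.cn/default/v1"


def build_chat_request(prompt: str, model: str = "deepseek-v4-pro",
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the same POST request the curl command issues."""
    api_key = os.environ.get("TOKENFLEET_API_KEY", "")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("Explain RAG in one sentence."))` would issue the request; building and sending are kept separate so the payload can be inspected without network access.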
        

POWERED BY THESE MODELS

  • DeepSeek
  • Kimi · Moonshot
  • MiniMax
  • Zhipu GLM

WHAT TOKENFLEET SHIPS

One endpoint. Every model.

PRODUCT 01

Unified API gateway

Production models are aggregated behind one API gateway. One API key connects your app to multiple vendors today.

  • Unified endpoint and API key
  • One account, reconciliation, and invoice
  • Direct mainland routing with lower latency than overseas direct connections
View supported models

WHY TOKENFLEET

Why TokenFleet.

Four verifiable capabilities, not four adjectives.

A

12 production models, one endpoint

LLM, image, video, and audio calls run through one API gateway for execution, metering, and reconciliation.

  • deepseek-v4-pro · DeepSeek
  • deepseek-v4-flash · DeepSeek
  • DeepSeek-V3.2 · DeepSeek
  • DeepSeek-V3.2-A · DeepSeek
  • deepseek-v3.1 · DeepSeek
  • deepseek-v3.2-exp · DeepSeek
  • deepseek-v3 · DeepSeek
  • kimi-k2.6 · Moonshot
  • kimi-k2.5 · Moonshot
  • MiniMax-M2.7 · MiniMax
  • MiniMax-M2.5 · MiniMax
  • glm-5.1 · Zhipu
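
A client can keep the catalog above as a small lookup table and reject unknown model IDs before a request leaves the application. This is a minimal sketch: the IDs and vendors are copied from the list, while the helper names `vendor_counts` and `require_known` are hypothetical, and whether the gateway treats IDs as case-sensitive is an assumption.

```python
from collections import Counter

# Model IDs and vendors exactly as listed above.
MODELS = {
    "deepseek-v4-pro": "DeepSeek",
    "deepseek-v4-flash": "DeepSeek",
    "DeepSeek-V3.2": "DeepSeek",
    "DeepSeek-V3.2-A": "DeepSeek",
    "deepseek-v3.1": "DeepSeek",
    "deepseek-v3.2-exp": "DeepSeek",
    "deepseek-v3": "DeepSeek",
    "kimi-k2.6": "Moonshot",
    "kimi-k2.5": "Moonshot",
    "MiniMax-M2.7": "MiniMax",
    "MiniMax-M2.5": "MiniMax",
    "glm-5.1": "Zhipu",
}


def vendor_counts() -> dict:
    """Count how many listed models each vendor contributes."""
    return dict(Counter(MODELS.values()))


def require_known(model: str) -> str:
    """Return the model ID unchanged, or raise if it is not in the catalog.

    Matching is exact (case-sensitive); that is an assumption of this sketch.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model ID: {model!r}")
    return model
```

Validating locally keeps typos in model names from surfacing as remote API errors.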

B

Private deployment / VPC direct connect

At scale, teams can request a VPC endpoint so traffic stays inside their private network boundary.

peering: cn-shanghai-2 · cn-beijing-1
encryption: TLS 1.3 + mTLS
egress: private egress only
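
On the client side, "TLS 1.3 + mTLS" roughly means refusing older protocol versions and presenting a client certificate. The sketch below uses Python's standard `ssl` module; the function name and the certificate paths are hypothetical, and how TokenFleet actually provisions client certificates for VPC endpoints is not stated here, so treat this as an illustration only.

```python
import ssl
from typing import Optional


def make_mtls_context(client_cert: Optional[str] = None,
                      client_key: Optional[str] = None) -> ssl.SSLContext:
    """Create a client TLS context that requires TLS 1.3.

    client_cert / client_key are hypothetical paths to PEM files; when
    provided, the certificate is presented to the server (the mTLS half).
    """
    # The default context verifies the server certificate and hostname.
    context = ssl.create_default_context()
    # Refuse anything older than TLS 1.3.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    if client_cert:
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context
```

The resulting context can be passed to `urllib.request.urlopen(..., context=context)` or to any HTTP client that accepts an `ssl.SSLContext`.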

C

Unified endpoint integration

One API key can call multiple model providers while keeping integration paths and usage records consistent.

- base_url="https://api.openai.com/v1"
+ base_url="https://tokenfleet.cn/default/v1"

Monthly token usage, error rates, and model mix are visible in one console.
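
Teams that want to cross-check the console figures can compute the same three numbers from their own call logs. The sketch below is client-side only: the `CallRecord` shape and the `summarize` helper are hypothetical, and the assumption that each call yields a total token count is not confirmed by this page.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CallRecord:
    """One logged API call (hypothetical shape for this sketch)."""
    model: str
    total_tokens: int
    ok: bool


def summarize(records: Iterable[CallRecord]) -> dict:
    """Return total tokens, error rate, and each model's share of tokens."""
    records = list(records)
    tokens_by_model = Counter()
    for record in records:
        tokens_by_model[record.model] += record.total_tokens
    total = sum(tokens_by_model.values())
    failed = sum(1 for record in records if not record.ok)
    return {
        "total_tokens": total,
        "error_rate": failed / len(records) if records else 0.0,
        "model_mix": ({model: count / total
                       for model, count in tokens_by_model.items()}
                      if total else {}),
    }
```

Because every call goes through one endpoint, a single log format like this covers all vendors.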

D

Direct mainland routing, millisecond latency

Median (P50) time to first token in five cities, measured from production samples.

City               P50
Beijing            142 ms
Shanghai           128 ms
Guangzhou          156 ms
Shenzhen           149 ms
Hangzhou           134 ms
Overseas direct    ≥ 800 ms

Sample window 2026-04 · View status
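
The per-city figures can be put in perspective with a little arithmetic. The sketch below uses only the published numbers; the variable names are illustrative, and the 800 ms overseas value is a lower bound, so the resulting ratio is a minimum.

```python
from statistics import mean

# P50 time to first token in milliseconds, from the published figures.
P50_TTFT_MS = {
    "Beijing": 142,
    "Shanghai": 128,
    "Guangzhou": 156,
    "Shenzhen": 149,
    "Hangzhou": 134,
}
OVERSEAS_MIN_MS = 800  # published lower bound for overseas direct access

fastest_city = min(P50_TTFT_MS, key=P50_TTFT_MS.get)
slowest_city = max(P50_TTFT_MS, key=P50_TTFT_MS.get)
average_p50 = mean(P50_TTFT_MS.values())

# Even the slowest mainland city is compared against the overseas bound.
min_speedup = OVERSEAS_MIN_MS / P50_TTFT_MS[slowest_city]

print(f"fastest: {fastest_city}, slowest: {slowest_city}")
print(f"mean of city P50s: {average_p50:.1f} ms")
print(f"overseas direct is at least {min_speedup:.2f}x slower")
```

On these numbers the mean of the city medians is 141.8 ms, and overseas direct access is at least about 5.1 times slower than the slowest listed city.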

FOR PRODUCTION SCALE

Built for large-scale production usage.

When token usage outgrows self-serve limits, we work directly with your engineering team on single-endpoint integration, capacity planning, dedicated routing, and custom terms.

  1. SLA

    Enterprise SLA

    Commitments are tailored by usage tier and team size, with monthly reconciliation.

  2. VPC

    Private deployment / VPC direct connect

    Requests can stay inside your private network boundary. The final architecture depends on scale.

  3. SUPPORT

    Dedicated technical contact

    24/7 Chinese-language engineering support channel with front-line incident response.

Contact TokenFleet sales

TokenFleet WeChat group QR code

Scan to join the WeChat group for enterprise usage and integration support.

Usually replies within 24 hours · zhangyue@nyuncloud.com