
Deep Infra
- Features
- Ease of use
- Ease of management
- Quality of support
- Affordability
- Market presence
- Information technology and software
- Healthcare and life sciences
- Transportation and logistics
What is Deep Infra
Managed model inference APIs
Broad model catalog access
Developer-oriented integration
Less end-to-end AI tooling
Vendor dependency for serving
Governance and compliance fit varies
Plan & Pricing
Pricing model: Pay-as-you-go (per-token, per-execution-time, and per-GPU-hour billing)
Billing & minimums: A card on file or a prepaid balance is required to use the services. Invoicing thresholds (usage tiers, raised automatically as spend increases): Tier 1 = $20; Tier 2 = $100; Tier 3 = $500; Tier 4 = $2,000; Tier 5 = $10,000. Invoices are generated monthly and whenever a tier threshold is reached.
Free tier / trial: No permanent free plan is stated on the pricing page or in the docs. (See Free plan / trial fields below.)
Example costs (official site examples / representative SKUs):
Token-priced LLM examples (prices shown are per 1M input tokens / per 1M output tokens):
- DeepSeek-V3.2 — $0.26 (in) / $0.38 (out)
- DeepSeek-R1-0528 — $0.50 (in) / $2.15 (out)
- MiniMax-M2.5 — $0.27 (in) / $0.95 (out)
- zai-org GLM-5 — $0.80 (in) / $2.56 (out)
- Llama-4-Scout-17B-16E — $0.08 (in) / $0.30 (out)
- gemini-2.5-pro — $1.25 (in) / $10.00 (out)
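As a rough illustration of how the per-token prices above translate into a bill, the sketch below multiplies input and output token counts by their separate per-1M rates. The function name `token_cost` and the price table layout are hypothetical and chosen only for this example; the prices are copied from the list above and may change on the vendor's site.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the examples above.
PRICES_PER_MILLION = {
    "DeepSeek-V3.2": (Decimal("0.26"), Decimal("0.38")),
    "DeepSeek-R1-0528": (Decimal("0.50"), Decimal("2.15")),
    "Llama-4-Scout-17B-16E": (Decimal("0.08"), Decimal("0.30")),
}

MILLION = Decimal(1_000_000)


def token_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Input and output tokens are billed at separate per-1M rates.
    """
    price_in, price_out = PRICES_PER_MILLION[model]
    return (price_in * input_tokens + price_out * output_tokens) / MILLION


if __name__ == "__main__":
    # 2,000 prompt tokens and 500 completion tokens on DeepSeek-R1-0528.
    print(token_cost("DeepSeek-R1-0528", 2_000, 500))
```

Because output tokens are often priced several times higher than input tokens, long completions tend to dominate the cost of a request even when the prompt is larger.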
Execution-time and per-unit examples (models billed by inference execution time, per image, or by formula):
- Bria text-to-video models (e.g., video_eraser, video_foreground_mask) — $0.14 per second.
- Some image models — $0.00–$0.01 per image (examples shown on site: fibo_edit listed at $0.00 / image; p-image at $0.005 / image).
- FLUX image models are priced by formula (example: FLUX-1-dev $0.009 x (w/1024) x (h/1024) x (iters/25); FLUX-1-schnell $0.0005 x ... ).
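The FLUX-1-dev formula quoted above can be evaluated directly. The sketch below assumes that `w` and `h` are the image width and height in pixels and that `iters` is the number of inference iterations; the source does not define these symbols, so treat that reading as an assumption. The function name `flux_1_dev_cost` is hypothetical. FLUX-1-schnell is left out because its formula is truncated in the source.

```python
from decimal import Decimal

# Base price in USD from the FLUX-1-dev formula quoted above.
FLUX_1_DEV_BASE = Decimal("0.009")


def flux_1_dev_cost(width: int, height: int, iters: int) -> Decimal:
    """Evaluate $0.009 x (w/1024) x (h/1024) x (iters/25).

    Assumes width/height are pixels and iters is the iteration count.
    """
    return (
        FLUX_1_DEV_BASE
        * Decimal(width) / 1024
        * Decimal(height) / 1024
        * Decimal(iters) / 25
    )


if __name__ == "__main__":
    # A 1024x1024 image at 25 iterations costs exactly the base price.
    print(flux_1_dev_cost(1024, 1024, 25))
    # Halving both dimensions quarters the price.
    print(flux_1_dev_cost(512, 512, 25))
```

Under this reading, cost scales linearly with pixel count and with iteration count, so a 1024x1024 image at 25 iterations is the reference point at $0.009.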
Embeddings (per 1M input tokens):
- bge-base-en-v1.5 — $0.005 / 1M
- e5-large-v2 — $0.01 / 1M
- other embedding models listed at $0.005–$0.01 per 1M
Dedicated GPU (for custom LLMs; uptime billed at per-minute granularity; invoiced weekly):
- A100 80GB — $0.89 per GPU-hour
- H100 80GB — $1.69 per GPU-hour
- H200 141GB — $1.99 per GPU-hour
- B200 180GB / DGX B200 — $2.49 per instance-hour (DGX/B200 cluster pricing noted)
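To show how per-minute granularity interacts with the hourly rates above, here is a minimal sketch. It assumes that partial minutes are rounded up and that each billed minute costs one sixtieth of the hourly rate; the source only states that uptime is billed at minute granularity, so both points are assumptions. The helper `dedicated_cost` is hypothetical, and the B200 entry is omitted because it is quoted per instance-hour rather than per GPU-hour.

```python
from decimal import Decimal

# USD per GPU-hour, copied from the list above.
GPU_HOURLY_USD = {
    "A100 80GB": Decimal("0.89"),
    "H100 80GB": Decimal("1.69"),
    "H200 141GB": Decimal("1.99"),
}


def dedicated_cost(gpu: str, uptime_seconds: int, gpu_count: int = 1) -> Decimal:
    """Estimate the USD cost of a dedicated deployment (hypothetical helper).

    Assumption: uptime is rounded up to whole minutes, and each minute
    is charged at 1/60 of the hourly rate.
    """
    billed_minutes = (uptime_seconds + 59) // 60  # integer ceiling division
    return GPU_HOURLY_USD[gpu] * gpu_count * billed_minutes / 60


if __name__ == "__main__":
    # One H100 running for 90 minutes.
    print(dedicated_cost("H100 80GB", 90 * 60))
```

Since these instances are billed for uptime rather than for tokens, an idle deployment still accrues cost until it is shut down.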
Discounts / enterprise / custom: No standard public discounts are listed. Dedicated instances, DGX clusters, and enterprise or volume pricing are handled through sales (the site references contacting sales and dedicated@deepinfra.com).
Notes / billing behavior:
- Models are priced either per-token (input + output billed separately) or per-execution-time depending on model type.
- Accounts are limited to 200 concurrent requests by default; increases can be requested through sales.
- Invoices are generated at the start of each month and also mid-month whenever a tier threshold is reached.
- The official site states: "You have to add a card or pre-pay or you won't be able to use our services."