DeepInfra
@DeepInfra

571 posts

Fast ML inference. Run top AI models using a simple API.

Palo Alto · Joined February 2023
62 Following · 4.6K Followers
Pinned Tweet
DeepInfra @DeepInfra
Deep Infra and NVIDIA are working together on NVIDIA NemoClaw, an open-source stack that simplifies running OpenClaw always-on assistants more safely with a single command. As part of the @nvidia Agent Toolkit, it installs the NVIDIA OpenShell runtime, a secure environment for running autonomous agents and open-source models like NVIDIA Nemotron. nvidia.com/nemoclaw
6 replies · 5 reposts · 30 likes · 3.3K views
Genuine Grape @GenuineGrape
@DeepInfra I think it has to do with GPT OSS letting you select reasoning effort... Other providers just REMOVE IT silently... this is so frustrating
1 reply · 0 reposts · 0 likes · 15 views
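The complaint above can be sidestepped by pinning the setting client-side. A minimal sketch that builds the request body for an OpenAI-compatible chat-completions endpoint, assuming the common gpt-oss `reasoning_effort` parameter (whether a given provider honors it varies; the model id is illustrative):

```python
import json

# Build the body for an OpenAI-compatible /chat/completions request that
# pins gpt-oss reasoning effort explicitly, rather than relying on a
# provider default (which some providers drop silently).
def build_chat_request(model: str, prompt: str, effort: str = "high") -> str:
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning effort: {effort}")
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # keep it explicit in every request
    }
    return json.dumps(body)

payload = json.loads(build_chat_request("openai/gpt-oss-120b", "Hello", "high"))
```

Sending the explicit value on every call makes a silent server-side downgrade at least detectable by diffing request and response settings.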
Genuine Grape @GenuineGrape
I was talking shit about @DeepInfra but god damn it... 6 months in, this is the most reliable provider... Yeah, their servers can be slow, but gpt-oss 120b has been working consistently... Nothing breaks! Unlike other providers... I use all of them, but yeah, DeepInfra is stable
2 replies · 0 reposts · 0 likes · 33 views
DeepInfra @DeepInfra
Day 3 of #GTC in San Jose and the energy is still going strong. So many great conversations with customers, partners, and friends; this is why we show up. Come find us at Booth #4022. PS: Everyone needs a Jensen in their T-shirt pocket. Amazing swag, @nvidia!
[3 images]
0 replies · 1 repost · 7 likes · 237 views
DeepInfra @DeepInfra
Deep Infra serves the largest selection of open models - Kimi K2.5, GLM-5, Minimax M2.5, NVIDIA Nemotron, and more - running on NVIDIA Blackwell. Competitive pricing. Cached pricing to cut costs further. More inference for your budget. deepinfra.com/models
0 replies · 1 repost · 6 likes · 467 views
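A call to any of these models can be sketched against DeepInfra's OpenAI-compatible endpoint. The base URL follows DeepInfra's documented scheme; the model id below is an illustrative placeholder (pick a real id from deepinfra.com/models), and the request is built but not sent:

```python
import json
import urllib.request

# DeepInfra exposes an OpenAI-compatible API under this base URL.
BASE_URL = "https://api.deepinfra.com/v1/openai"

def chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "moonshotai/Kimi-K2.5" is a placeholder model id, not a confirmed one.
req = chat_request("YOUR_API_KEY", "moonshotai/Kimi-K2.5", "Hi")
# send with: urllib.request.urlopen(req)
```

Because the endpoint speaks the OpenAI wire format, any OpenAI-compatible SDK pointed at `BASE_URL` should work the same way.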
DeepInfra @DeepInfra
These agents don't use one model - they orchestrate teams of models for reasoning, coding, and tool calling. Token consumption scales exponentially. Sustained 2x growth every 2 weeks. When your agents are burning through millions of tokens, pricing matters. A lot!
0 replies · 0 reposts · 2 likes · 64 views
DeepInfra @DeepInfra
NVIDIA OpenShell works with any coding agent - OpenClaw, Claude Code, Cursor, Codex, OpenCode - with zero code changes. Inference stays private by default. The runtime governs what the agent can see, do, and where inference goes.
0 replies · 0 reposts · 3 likes · 88 views
DeepInfra @DeepInfra
NVIDIA OpenShell sits between your agent and your infrastructure. Agents run in isolated sandboxes with zero permissions by default. Every action is policy-enforced at the infrastructure layer, not inside the agent process.
0 replies · 0 reposts · 4 likes · 255 views
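The deny-by-default model described above can be pictured roughly as follows. This is a purely illustrative sketch, NOT actual NVIDIA OpenShell configuration or syntax; every name here is hypothetical:

```python
# Hypothetical deny-by-default agent policy: nothing is permitted unless a
# rule explicitly grants it, and the check runs outside the agent process.
DEFAULT_POLICY = {
    "filesystem": {"read": ["./workspace"], "write": []},
    "network": {"allow": ["api.deepinfra.com"]},
    "exec": {"allow": []},
}

def is_allowed(policy: dict, category: str, action: str, target: str) -> bool:
    """Zero permissions by default: only explicit prefixes/hosts pass."""
    allowed = policy.get(category, {}).get(action, [])
    return any(target == t or target.startswith(t) for t in allowed)

# Reads inside the workspace pass; writes are denied (empty allow-list).
assert is_allowed(DEFAULT_POLICY, "filesystem", "read", "./workspace/main.py")
assert not is_allowed(DEFAULT_POLICY, "filesystem", "write", "./workspace/x")
```

The point of enforcing at this layer is that a compromised or misbehaving agent cannot rewrite its own permissions, because the policy lives outside the process it governs.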
DeepInfra @DeepInfra
@NVIDIA GTC starts tomorrow. If you're in San Jose this week - come find us at Booth #4022. Happy to talk models, inference, Blackwell optimizations, or anything AI. See you there!
1 reply · 0 reposts · 7 likes · 306 views
DeepInfra @DeepInfra
Kimi K2.5 Turbo just dropped on Deep Infra 🚀
#1 by speed: 341 tokens/sec
#1 by price: $0.90/1M tokens
Credit to @ArtificialAnlys for the benchmarks.
[image]
12 replies · 20 reposts · 310 likes · 25.3K views
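A quick back-of-envelope check on those numbers, assuming the quoted speed and price both apply to the tokens being generated:

```python
# Time and cost to generate output tokens at the quoted rates:
# 341 tokens/sec and $0.90 per 1M tokens.
SPEED_TPS = 341
PRICE_PER_M_USD = 0.90

def generation_time_s(tokens: int) -> float:
    """Seconds of wall-clock generation at the quoted throughput."""
    return tokens / SPEED_TPS

def generation_cost_usd(tokens: int) -> float:
    """Dollar cost at the quoted per-million-token price."""
    return tokens * PRICE_PER_M_USD / 1_000_000

# 1M tokens: ~2933 s (~49 minutes) of generation, for $0.90.
```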
DeepInfra @DeepInfra
We are excited to launch @NVIDIA Nemotron 3 Super on DeepInfra! Built for complex multi-agent applications, this open hybrid MoE model with 120B total / 12B active params delivers up to 5x faster inference and supports a 1M-token context window — all optimized for efficient single-GPU deployment. Available now via DeepInfra's OpenAI-compatible API at $0.10 input / $0.50 output / $0.04 cached per 1M tokens.
[image]
4 replies · 7 reposts · 23 likes · 3.5K views
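With a cached rate, the effective input price drops as prompt-cache hit rate rises. A small sketch using the prices quoted above ($0.10 input, $0.04 cached, per 1M tokens):

```python
# Effective price per 1M input tokens as a function of cache hit rate,
# at the quoted Nemotron 3 Super rates.
INPUT_PER_M_USD = 0.10
CACHED_PER_M_USD = 0.04

def effective_input_price(cache_hit_rate: float) -> float:
    """Blend cached and uncached rates by the fraction of tokens cached."""
    assert 0.0 <= cache_hit_rate <= 1.0
    return (cache_hit_rate * CACHED_PER_M_USD
            + (1 - cache_hit_rate) * INPUT_PER_M_USD)

# At a 50% hit rate the effective input price is $0.07 per 1M tokens.
```

Agent workloads that resend long system prompts and tool schemas tend to have high hit rates, which is where the cached tier matters most.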
DeepInfra reposted
Touch Grass Capital @Touch_GrassCap
"How much am I wasting on inference?" $500/mo on Claude Sonnet - $15/mo on DeepInfra. Same workload. I built the tool that answers this in seconds. Free, open source, runs inside Cursor. volthq.dev
1 reply · 2 reposts · 6 likes · 552 views
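The comparison above boils down to simple per-token arithmetic. A sketch with placeholder prices (not the tweet's actual Claude Sonnet or DeepInfra rates, which aren't given):

```python
# Monthly inference cost for a fixed workload at given per-1M-token prices.
# The token counts and prices in the example call are illustrative only.
def monthly_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost: tokens times price, scaled from per-1M rates."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# e.g. 10M input + 2M output tokens at $3/$15 per 1M:
example = monthly_cost(10_000_000, 2_000_000, 3.0, 15.0)
```

Running the same token counts through two providers' rate cards gives the kind of side-by-side spend number the tool quotes.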
DeepInfra reposted
Artificial Analysis @ArtificialAnlys
NVIDIA has released Nemotron 3 Super, a 120B (12B active) open-weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index with a hybrid Mamba-Transformer MoE architecture.

We were given access to this model ahead of launch and evaluated it across intelligence, openness, and inference efficiency.

Key takeaways:
➤ Combines high openness with strong intelligence: Nemotron 3 Super performs strongly for its size and is substantially more intelligent than any other model with comparable openness.
➤ Nemotron 3 Super scored 36 on the Artificial Analysis Intelligence Index, +17 points ahead of the previous Super release and +12 points from Nemotron 3 Nano. Compared to models in a similar size category, this places it ahead of gpt-oss-120b (33), but behind the recently-released Qwen3.5 122B A10B (42).
➤ Focused on efficient intelligence: we found Nemotron 3 Super to have higher intelligence than gpt-oss-120b while enabling ~10% higher throughput per GPU in a simple but realistic load test.
➤ Supported today for fast serverless inference: providers including @DeepInfra and @LightningAI are serving this model at launch with speeds of up to 484 tokens per second.

Model details:
📝 Nemotron 3 Super has 120.6B total and 12.7B active parameters, along with a 1 million token context window and hybrid reasoning support. It is published with open weights and a permissive license, alongside open training data and methodology disclosure.
📐 The model has several design features enabling efficient inference, including hybrid Mamba-Transformer and LatentMoE architectures, multi-token prediction, and NVFP4-quantized weights.
🎯 NVIDIA pre-trained Nemotron 3 Super in (mostly) NVFP4 precision, but moved to BF16 for post-training. Our evaluation scores use the BF16 weights.
🧠 We benchmarked Nemotron 3 Super in its highest-effort reasoning mode ("regular"), the most capable of the model's three inference modes (reasoning-off, low-effort, and regular).
[image]
20 replies · 63 reposts · 483 likes · 92.4K views
DeepInfra reposted
NVIDIA AI Developer @NVIDIAAIDev
Ready to get started? Nemotron 3 Super supports deployment across environments, from workstations to the cloud, and can be accessed through API, OpenRouter, or build.nvidia.com. It is now live and available on major inference platforms, packaged as NVIDIA NIM:
📥 Download the weights from @HuggingFace, launch an optimized instance through NVIDIA NIM, fine-tune with @UnslothAI, or start with the cookbooks from @lmsysorg and @vllm_project to get running in minutes. Super is also available through @baseten, @Cloudflare, @deepinfra, @FireworksAI_HQ, @friendliai, inference.net, @LightningAI, and @modal.
📗 Read the Nemotron 3 Super technical report for the full details: research.nvidia.com/labs/nemotron/…
3 replies · 4 reposts · 30 likes · 6.4K views