DeepInfra
@DeepInfra

571 posts

Fast ML inference. Run top AI models using a simple API.

Palo Alto · Joined February 2023
62 Following · 4.6K Followers
Pinned Tweet
DeepInfra @DeepInfra
Deep Infra and NVIDIA are working together on NVIDIA NemoClaw, an open-source stack that simplifies running OpenClaw always-on assistants more safely with a single command. As part of the @nvidia Agent Toolkit, it installs the NVIDIA OpenShell runtime, a secure environment for running autonomous agents and open-source models like NVIDIA Nemotron. nvidia.com/nemoclaw
6 replies · 5 reposts · 30 likes · 3.3K views
Genuine Grape @GenuineGrape
@DeepInfra I think it has to do with GPT OSS letting you select reasoning effort... Other providers just REMOVE IT silently... this is so frustrating
1 reply · 0 reposts · 0 likes · 15 views
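The complaint above can be sidestepped by pinning the setting client-side. A minimal sketch that builds the request body for an OpenAI-compatible chat-completions endpoint, assuming the common gpt-oss `reasoning_effort` parameter (whether a given provider honors it varies; the model id is illustrative):

```python
import json

# Build the body for an OpenAI-compatible /chat/completions request that
# pins gpt-oss reasoning effort explicitly, rather than relying on a
# provider default (which some providers drop silently).
def build_chat_request(model: str, prompt: str, effort: str = "high") -> str:
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning effort: {effort}")
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # keep it explicit in every request
    }
    return json.dumps(body)

payload = json.loads(build_chat_request("openai/gpt-oss-120b", "Hello", "high"))
```

Sending the explicit value on every call makes a silent server-side downgrade at least detectable by diffing request and response settings.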
Genuine Grape @GenuineGrape
I was talking shit about @DeepInfra but god damn it... 6 months in, this is the most reliable provider... Yeah, their servers can be slow, but gpt-oss 120b has been working consistently... Nothing breaks! Unlike other providers... I use all of them, but yeah, DeepInfra is stable
2 replies · 0 reposts · 0 likes · 33 views
DeepInfra @DeepInfra
Day 3 of #GTC in San Jose and the energy is still going strong. So many great conversations with customers, partners, and friends; this is why we show up. Come find us at Booth #4022. PS: Everyone needs a Jensen in their T-shirt pocket. Amazing swag, @nvidia!
[3 images]
0 replies · 1 repost · 7 likes · 237 views
DeepInfra @DeepInfra
Deep Infra serves the largest selection of open models - Kimi K2.5, GLM-5, Minimax M2.5, NVIDIA Nemotron, and more - running on NVIDIA Blackwell. Competitive pricing. Cached pricing to cut costs further. More inference for your budget. deepinfra.com/models
0 replies · 1 repost · 6 likes · 467 views
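A call to any of these models can be sketched against DeepInfra's OpenAI-compatible endpoint. The base URL follows DeepInfra's documented scheme; the model id below is an illustrative placeholder (pick a real id from deepinfra.com/models), and the request is built but not sent:

```python
import json
import urllib.request

# DeepInfra exposes an OpenAI-compatible API under this base URL.
BASE_URL = "https://api.deepinfra.com/v1/openai"

def chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "moonshotai/Kimi-K2.5" is a placeholder model id, not a confirmed one.
req = chat_request("YOUR_API_KEY", "moonshotai/Kimi-K2.5", "Hi")
# send with: urllib.request.urlopen(req)
```

Because the endpoint speaks the OpenAI wire format, any OpenAI-compatible SDK pointed at `BASE_URL` should work the same way.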
DeepInfra @DeepInfra
These agents don't use one model - they orchestrate teams of models for reasoning, coding, and tool calling. Token consumption scales exponentially. Sustained 2x growth every 2 weeks. When your agents are burning through millions of tokens, pricing matters. A lot!
0 replies · 0 reposts · 2 likes · 64 views
DeepInfra @DeepInfra
NVIDIA OpenShell works with any coding agent - OpenClaw, Claude Code, Cursor, Codex, OpenCode - with zero code changes. Inference stays private by default. The runtime governs what the agent can see, do, and where inference goes.
0 replies · 0 reposts · 3 likes · 88 views
DeepInfra @DeepInfra
NVIDIA OpenShell sits between your agent and your infrastructure. Agents run in isolated sandboxes with zero permissions by default. Every action is policy-enforced at the infrastructure layer, not inside the agent process.
0 replies · 0 reposts · 4 likes · 255 views
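The deny-by-default model described above can be pictured roughly as follows. This is a purely illustrative sketch, NOT actual NVIDIA OpenShell configuration or syntax; every name here is hypothetical:

```python
# Hypothetical deny-by-default agent policy: nothing is permitted unless a
# rule explicitly grants it, and the check runs outside the agent process.
DEFAULT_POLICY = {
    "filesystem": {"read": ["./workspace"], "write": []},
    "network": {"allow": ["api.deepinfra.com"]},
    "exec": {"allow": []},
}

def is_allowed(policy: dict, category: str, action: str, target: str) -> bool:
    """Zero permissions by default: only explicit prefixes/hosts pass."""
    allowed = policy.get(category, {}).get(action, [])
    return any(target == t or target.startswith(t) for t in allowed)

# Reads inside the workspace pass; writes are denied (empty allow-list).
assert is_allowed(DEFAULT_POLICY, "filesystem", "read", "./workspace/main.py")
assert not is_allowed(DEFAULT_POLICY, "filesystem", "write", "./workspace/x")
```

The point of enforcing at this layer is that a compromised or misbehaving agent cannot rewrite its own permissions, because the policy lives outside the process it governs.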
DeepInfra @DeepInfra
@NVIDIA GTC starts tomorrow. If you're in San Jose this week - come find us at Booth #4022. Happy to talk models, inference, Blackwell optimizations, or anything AI. See you there!
1 reply · 0 reposts · 7 likes · 306 views
DeepInfra @DeepInfra
Kimi K2.5 Turbo just dropped on Deep Infra 🚀
#1 by speed: 341 tokens/sec
#1 by price: $0.90/1M tokens
Credit to @ArtificialAnlys for the benchmarks.
[image]
12 replies · 20 reposts · 310 likes · 25.3K views
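A quick back-of-envelope check on those numbers, assuming the quoted speed and price both apply to the tokens being generated:

```python
# Time and cost to generate output tokens at the quoted rates:
# 341 tokens/sec and $0.90 per 1M tokens.
SPEED_TPS = 341
PRICE_PER_M_USD = 0.90

def generation_time_s(tokens: int) -> float:
    """Seconds of wall-clock generation at the quoted throughput."""
    return tokens / SPEED_TPS

def generation_cost_usd(tokens: int) -> float:
    """Dollar cost at the quoted per-million-token price."""
    return tokens * PRICE_PER_M_USD / 1_000_000

# 1M tokens: ~2933 s (~49 minutes) of generation, for $0.90.
```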
DeepInfra @DeepInfra
We are excited to launch @NVIDIA Nemotron 3 Super on DeepInfra! Built for complex multi-agent applications, this open hybrid MoE model with 120B total / 12B active params delivers up to 5x faster inference and supports a 1M-token context window — all optimized for efficient single-GPU deployment. Available now via DeepInfra's OpenAI-compatible API at $0.10 input / $0.50 output / $0.04 cached per 1M tokens.
[image]
4 replies · 7 reposts · 23 likes · 3.5K views
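With a cached rate, the effective input price drops as prompt-cache hit rate rises. A small sketch using the prices quoted above ($0.10 input, $0.04 cached, per 1M tokens):

```python
# Effective price per 1M input tokens as a function of cache hit rate,
# at the quoted Nemotron 3 Super rates.
INPUT_PER_M_USD = 0.10
CACHED_PER_M_USD = 0.04

def effective_input_price(cache_hit_rate: float) -> float:
    """Blend cached and uncached rates by the fraction of tokens cached."""
    assert 0.0 <= cache_hit_rate <= 1.0
    return (cache_hit_rate * CACHED_PER_M_USD
            + (1 - cache_hit_rate) * INPUT_PER_M_USD)

# At a 50% hit rate the effective input price is $0.07 per 1M tokens.
```

Agent workloads that resend long system prompts and tool schemas tend to have high hit rates, which is where the cached tier matters most.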
DeepInfra reposted
Touch Grass Capital @Touch_GrassCap
"How much am I wasting on inference?" $500/mo on Claude Sonnet - $15/mo on DeepInfra. Same workload. I built the tool that answers this in seconds. Free, open source, runs inside Cursor. volthq.dev
1 reply · 2 reposts · 6 likes · 552 views
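The comparison above boils down to simple per-token arithmetic. A sketch with placeholder prices (not the tweet's actual Claude Sonnet or DeepInfra rates, which aren't given):

```python
# Monthly inference cost for a fixed workload at given per-1M-token prices.
# The token counts and prices in the example call are illustrative only.
def monthly_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost: tokens times price, scaled from per-1M rates."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# e.g. 10M input + 2M output tokens at $3/$15 per 1M:
example = monthly_cost(10_000_000, 2_000_000, 3.0, 15.0)
```

Running the same token counts through two providers' rate cards gives the kind of side-by-side spend number the tool quotes.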
DeepInfra reposted
Artificial Analysis @ArtificialAnlys
NVIDIA has released Nemotron 3 Super, a 120B (12B active) open-weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index with a hybrid Mamba-Transformer MoE architecture.

We were given access to this model ahead of launch and evaluated it across intelligence, openness, and inference efficiency.

Key takeaways:
➤ Combines high openness with strong intelligence: Nemotron 3 Super performs strongly for its size and is substantially more intelligent than any other model with comparable openness.
➤ Nemotron 3 Super scored 36 on the Artificial Analysis Intelligence Index, +17 points ahead of the previous Super release and +12 points from Nemotron 3 Nano. Compared to models in a similar size category, this places it ahead of gpt-oss-120b (33), but behind the recently-released Qwen3.5 122B A10B (42).
➤ Focused on efficient intelligence: we found Nemotron 3 Super to have higher intelligence than gpt-oss-120b while enabling ~10% higher throughput per GPU in a simple but realistic load test.
➤ Supported today for fast serverless inference: providers including @DeepInfra and @LightningAI are serving this model at launch with speeds of up to 484 tokens per second.

Model details:
📝 Nemotron 3 Super has 120.6B total and 12.7B active parameters, along with a 1 million token context window and hybrid reasoning support. It is published with open weights and a permissive license, alongside open training data and methodology disclosure.
📐 The model has several design features enabling efficient inference, including hybrid Mamba-Transformer and LatentMoE architectures, multi-token prediction, and NVFP4-quantized weights.
🎯 NVIDIA pre-trained Nemotron 3 Super in (mostly) NVFP4 precision, but moved to BF16 for post-training. Our evaluation scores use the BF16 weights.
🧠 We benchmarked Nemotron 3 Super in its highest-effort reasoning mode ("regular"), the most capable of the model's three inference modes (reasoning-off, low-effort, and regular).
[image]
20 replies · 63 reposts · 483 likes · 92.4K views
DeepInfra reposted
NVIDIA AI Developer @NVIDIAAIDev
Ready to get started? Nemotron 3 Super supports deployment across environments, from workstations to the cloud, and can be accessed through API, OpenRouter, or build.nvidia.com. It is now live and available on major inference platforms, packaged as NVIDIA NIM:
📥 Download the weights from @HuggingFace, launch an optimized instance through NVIDIA NIM, fine-tune with @UnslothAI, or start with the cookbooks from @lmsysorg and @vllm_project to get running in minutes. Super is also available through @baseten, @Cloudflare, @deepinfra, @FireworksAI_HQ, @friendliai, inference.net, @LightningAI, and @modal.
📗 Read the Nemotron 3 Super technical report for the full details: research.nvidia.com/labs/nemotron/…
3 replies · 4 reposts · 30 likes · 6.4K views