Thach Nguyen

178 posts

Thach Nguyen

@Thachnh

Staff @DeepInfra - building AI Inference Cloud | Co-Founder & CEO @Millis.ai - 600ms AI Voice Agent

Bay area, CA Katılım Nisan 2009

431 Takip Edilen305 Takipçiler

Thach Nguyen@Thachnh·2d

@warpdotdev Nice release!!! Worked out quite well from initial testing. Here's the quick way to plug into DeepInfra for cheap.

English

517

Warp@warpdotdev·2d

You can now bring your own key and inference endpoint to the Warp Agent, no paid plan required.

Zach Lloyd@zachlloydtweets

x.com/i/article/2057…

English

523

116.7K

Thach Nguyen@Thachnh·4 May

Excited to share that we at DeepInfra has raised $107M in Series B, backed by 500 Global, Georges Harik and NVIDIA 🎉 ! As Jensen Huang said at GTC 2026: ‘The inference inflection has arrived.’ We’ve seen exponential growth firsthand, and this is just the beginning. Onward!

English

Thach Nguyen retweetledi

DeepInfra@DeepInfra·1 May

DeepInfra is now a first-class provider in @OpenClaw. One key, every model. 🦞

OpenClaw🦞@openclaw

OpenClaw 2026.4.27 🦞 🧠 DeepInfra provider 📎 better file attachments 🛡️ operator-managed proxy routing 🧭 stricter model selection + local model fixes 🔧 gateway, channel, and session reliability Ships more than it brags. github.com/openclaw/openc…

English

1.8K

Thach Nguyen retweetledi

DeepInfra@DeepInfra·29 Nis

DeepInfra × Hugging Face DeepInfra is live on @HuggingFace Inference Providers. Run DeepSeek V4, Kimi-K2.6, GLM-5.1 and 100+ more open models straight from the Hub — same OpenAI-compatible API, same low per-token pricing, no markup. Just add :deepinfra to the model name.

English

24.4K

Thach Nguyen retweetledi

DeepInfra@DeepInfra·28 Nis

DeepInfra is an official launch partner for @nvidia Nemotron™ 3 Nano Omni — live today. One open multimodal 30B-A3B model. One pass over image, video, audio, docs+ text. No multi-model pipelines. OpenAI-compatible API, usage-based pricing. $0.20 in / $0.80 out per 1M tokens

English

4.4K

Thach Nguyen@Thachnh·24 Nis

Btw check it live here deepinfra.com/deepseek-ai/De… deepinfra.com/deepseek-ai/De…

English

Thach Nguyen@Thachnh·24 Nis

@ArtificialAnlys @deepseek_ai Very exciting release! Our team stayed up all night to host this model. Check it out here: deepinfra.com/deepseek-ai/De…

English

803

Artificial Analysis@ArtificialAnlys·24 Nis

DeepSeek V4 Pro is the #1 open weights model on GDPval-AA, our agentic real-world work tasks evaluation @deepseek_ai has released V4 Pro (1.6T total / 49B active) and V4 Flash (284B total / 13B active). V4 is DeepSeek's first new size since V3, with all intermediate models (V3.1, V3.2, R1, R1 0528) sharing the V3 family's 685B total / 37B active parameter MoE design. V4 Pro is also the largest open weights model released to date, surpassing Kimi K2.6 (1T total / 32B active) in both total and active parameter counts. V4 Pro is released mostly in FP4 precision, putting total model size at ~865GB, comparable to Kimi K2.6 (INT4, ~500GB). GLM-5.1 is BF16 (~1.49TB) natively and typically served in FP8 or FP4. Both models are hybrid thinking/non-thinking, and we tested the reasoning variants at Max Effort and High Effort. We are currently running the full suite of evaluations in the Artificial Analysis Intelligence Index and will share updates imminently. Key takeaways from evaluating GDPval-AA: ➤ V4 Pro leads all open weights models on GDPval-AA. V4 Pro (Reasoning, Max) scores 1554, ahead of GLM-5.1 (Reasoning, 1535), MiniMax-M2.7 (1514), and Kimi K2.6 (1484). V4 Flash (Reasoning, Max) scores 1388 well ahead of DeepSeek V3.2 (Reasoning, 1203) despite being a smaller model with fewer active and total parameters. V4 Pro (Reasoning, High, 1558) and V4 Flash (Reasoning, High, 1414) are effectively tied with their Max counterparts within the confidence interval ➤ V4 Pro is a significant upgrade on V3.2 in agentic capabilities. V3.2 (Reasoning) scored 1203 on GDPval-AA; V4 Pro (Reasoning, Max) scores 1554, a ~355 Elo point uplift. V4 Flash (Reasoning, High) scores 1414, a ~210 Elo point uplift over V3.2 (Reasoning) at a smaller and faster model size ➤ Output token usage varies materially across the V4 family on GDPval-AA. V4 Pro (Reasoning, High) used 8M output tokens on GDPval-AA and V4 Pro (Reasoning, Max) used 11M, in line with leading open weights peers Kimi K2.6 (10M) and MiniMax-M2.7 (7M). V4 Flash (Reasoning, Max) used 15M output tokens for a score of 1388, the highest token usage of any open weights peer on this benchmark. Notably, V4 Flash (Reasoning, High) scored higher at 1414 using only 7M output tokens Key model details: ➤ Size: V4 Pro 1.6T total / 49B active, V4 Flash 284B total / 13B active ➤ Architecture: First new DeepSeek architecture since V3 (V3 family was 685B total / 37B active MoE) ➤ Modality: Text input and output only, equivalent to V3.2 ➤ Context window: 1M tokens, an 8x expansion on V3.2's 128K context window ➤ Precision: Available as a mix of FP4 and FP8, or FP8 only ➤ License: MIT ➤ Availability: Available on DeepSeek's first-party API. As of writing, we expect many third-party providers to host the mode ➤ Pricing (DeepSeek first-party API): V4 Pro $1.74 / $3.48 per 1M input/output tokens, V4 Flash $0.14 / $0.28 per 1M input/output tokens. Cache hit input token pricing is $0.145 (V4 Pro) and $0.028 (V4 Flash) per 1M tokens

English

912

55.1K

Thach Nguyen@Thachnh·24 Nis

@OpenRouter @deepseek_ai Still waiting for OpenRouter to list DeepInfra :) Then the provider list won't be empty anymore (We already hosted here deepinfra.com/deepseek-ai/De…)

English

273

OpenRouter@OpenRouter·24 Nis

DeepSeek V4 Pro and V4 Flash are live on OpenRouter! @deepseek_ai's latest models show a huge jump in capabilities compared to V3.2, and meet or surpass current SOTA models across a variety of benchmarks.

English

914

48.6K

Thach Nguyen@Thachnh·24 Nis

@FindLogan @OpenRouter @deepseek_ai Some providers are marked as ignored due to the data tracking policy. Wait for DeepInfra to be listed. It's available deepinfra.com/deepseek-ai/De…

English

FindLogan@FindLogan·24 Nis

@OpenRouter @deepseek_ai OpenRouter is always quick to release new models... But for some reason they've IGNORED them??? Way to go to post an X post before checking your website.

English

1.3K

Thach Nguyen@Thachnh·24 Nis

@dw_hd_ @OpenRouter @deepseek_ai Waiting for OpenRouter to add us. It's available on DeepInfra deepinfra.com/deepseek-ai/De…

English

hdwei@dw_hd_·24 Nis

@OpenRouter @deepseek_ai Encountered this error, may i ask what’s the policy for the new DeepSeek models and how to update the settings?

English

914

Thach Nguyen@Thachnh·24 Nis

@deepseek_ai Finally!!! Congrats on the launch 🚀. Day-0 available on DeepInfra, running on Blackwells here in the US deepinfra.com/deepseek-ai/De…

English

141

DeepSeek@deepseek_ai·24 Nis

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n

English

1.6K

7.7K

45.5K

9.8M

Thach Nguyen retweetledi

DeepInfra@DeepInfra·8 Nis

Gemma 4 is live on DeepInfra. Both models, best price anywhere. 26B: $0.08 in / $0.35 out / $0.01 cached 31B: $0.13 in / $0.38 out / $0.02 cached

English

771

Thach Nguyen retweetledi

DeepInfra@DeepInfra·7 Nis

Day 0. GLM-5.1 from @Zai_org is live on DeepInfra. Open source getting close to GPT-5.4 and Claude Opus 4.6. Powered by @nvidia B300 Blackwell Ultra. Early access pricing, costs will drop as we scale. $1.40 in / $4.40 out / $0.26 cached per 1M tokens ↓

English

1.2K

Thach Nguyen@Thachnh·7 Nis

@Zai_org deepinfra.com/zai-org/GLM-5.1 Day-0 available on DeepInfra here. Thanks to the collab with @Zai_org team

English

1.3K

Z.ai@Zai_org·7 Nis

Introducing GLM-5.1: The Next Level of Open Source - Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo. - Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations. Blog: z.ai/blog/glm-5.1 Weights: huggingface.co/zai-org/GLM-5.1 API: docs.z.ai/guides/llm/glm… Coding Plan: z.ai/subscribe Coming to chat.z.ai in the next few days.

English

546

1.3K

10.9K

4.3M

Thach Nguyen retweetledi

DeepInfra@DeepInfra·26 Mar

The #1 ranked TTS model is now on DeepInfra. @inworld_ai TTS-1.5 tops Artificial Analysis for expressiveness - 30% better than the previous gen. Starting at $5/1M characters. Two variants: Max for quality, Mini for speed and scale.

English

535

Thach Nguyen retweetledi

DeepInfra@DeepInfra·11 Mar

We are excited to launch @NVIDIA Nemotron 3 Super on DeepInfra! Built for complex multi-agent applications, this open hybrid MoE model with 120B/12B active params delivers up to 5x faster inference and supports a 1M-token context window — all optimized for efficient single-GPU deployment. Available now on DeepInfra OpenAI-compatible API at $0.10 input / $0.50 output / $0.04 cached per 1M tokens.

English

4.1K

Thach Nguyen retweetledi

DeepInfra@DeepInfra·12 Şub

DeepInfra x @NVIDIA Blackwell x @latitudeio Latitude runs large-scale MoE models on our Blackwell-powered inference platform with NVFP4 and TensorRT-LLM — powering AI Dungeon (1.5M monthly players) and their upcoming Voyage AI RPG platform. "DeepInfra on NVIDIA Blackwell gives us the performance we need at a cost that actually works at scale." — Nick Walton, CEO, Latitude deepinfra.com/blog/nvidia-bl…

English

Thach Nguyen@Thachnh·11 Şub

@bridgemindai Checkout DeepInfra pricing. We have the best price in the market deepinfra.com/zai-org/GLM-5

English

BridgeMind@bridgemindai·11 Şub

GLM 5 just dropped and the pricing is absurd. $0.80 per million input tokens. $2.56 per million output tokens. For context: Claude Opus 4.6: $5/$25 GPT 5.3 Codex: $1.75/$14 GLM-5: $0.80/$2.56 GLM 5 is 6x cheaper than Opus on input and 10x cheaper on output. 200K context window. Built for chat, coding, and agentic tasks. This is Zhipu's flagship model. The same lab that likely ran "Pony Alpha" on OpenRouter for free as a stealth test. GLM 5 is now live on OpenRouter. Created today, February 11, 2026. China isn't just competing. They're undercutting everyone while shipping frontier-level models. Time to find out if GLM-5 can actually hang with the big labs.

English

1.3K

157.7K

Thach Nguyen@Thachnh·11 Şub

@Zai_org Congrats on the release, team 🎉 DeepInfra is proud to be day-0 inference provider for this amazing model. deepinfra.com/zai-org/GLM-5

English

196

Z.ai@Zai_org·11 Şub

Introducing GLM-5: From Vibe Coding to Agentic Engineering GLM-5 is built for complex systems engineering and long-horizon agentic tasks. Compared to GLM-4.5, it scales from 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens. Try it now: chat.z.ai Weights: huggingface.co/zai-org/GLM-5 Tech Blog: z.ai/blog/glm-5 OpenRouter (Previously Pony Alpha): openrouter.ai/z-ai/glm-5 Rolling out from Coding Plan Max users: z.ai/subscribe

English

314

783

5.3K

1.5M

Thach Nguyen retweetledi

DeepInfra@DeepInfra·11 Şub

Day-0 with @Zai_org: GLM-5 is live on DeepInfra 🔥 Built for long-horizon agents that plan, orchestrate, and self-correct. Serving ~100 TPS at launch and as usual the best price on the market!

English

151

12.5K

Keşfet

@warpdotdev @openclaw @huggingface @nvidia @ArtificialAnlys @deepseek_ai @OpenRouter @FindLogan