Deepak Dubey 🇩🇪
@thatdeepak
473 posts
Infrastructure engineer, Kubernetes on the brain 💡🌥️ Applied maths 👨‍🎓
Hamburg, Deutschland · Joined September 2019
209 Following · 146 Followers

TurboQuant 2-bit KV killer upgrade
vLLM@vllm_project
vLLM v0.20.0 is here! 752 commits from 320 contributors (123 new). 🎉 Highlights: DeepSeek V4, Hunyuan v3 preview support, CUDA 13 / PyTorch 2.11 / Transformers v5 baseline, FA4 as default MLA prefill, TurboQuant 2-bit KV (4× capacity), vLLM IR foundation. Thread 👇

My bet: in the near future, 80%⬆️ of CS research will be done by AI in collaboration with humans. However, today's research ecosystem is still built around the human, not the AI scientist.
For example, the 8-page paper PDF is a lossy compression of months of branching exploration into a linear story, optimized for a human reviewer to skim in 30 minutes. It hides two structural taxes:
📖 Storytelling Tax — failures, rejected hypotheses, and dead ends get stripped. On RE-Bench (24,008 runs, 21 frontier models), failed runs = 90.2% of total compute cost, with a 113× median failed-to-success token ratio. Every lab independently rediscovers the same dead ends.
🔧 Engineering Tax — the gap between reviewer-sufficient prose and agent-sufficient spec. Across 8,921 PaperBench requirements (23 ICML'24 papers), only 45.4% are fully specified in the PDF. The rest is tacit lab knowledge. Tolerable when readers were human. Critical now that agents read, reproduce, and extend.
We propose ARA: the Agent-Native Research Artifact — replace the narrative PDF with an agent-executable package, in 4 layers:
🧠 structured scientific logic
⚙️ executable code w/ full specs
🌳 exploration graph (every failure preserved)
📊 evidence grounding every claim
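The four layers above could be packaged as a machine-readable manifest. A hypothetical sketch of what that might look like; the layer names come from the thread, but every field and file name here is invented for illustration, not a published spec:

```python
# Hypothetical ARA package manifest. The four layer names come from the
# thread; fields, IDs, and file names are invented for illustration.
ara = {
    "scientific_logic": {            # 🧠 structured claims and hypotheses
        "claims": [{"id": "C1", "text": "method A beats baseline B on task T"}],
    },
    "code": {                        # ⚙️ executable code with full specs
        "entrypoint": "run_experiment.py",
        "environment": "environment.lock",
    },
    "exploration_graph": {           # 🌳 every branch preserved, failures included
        "nodes": [
            {"id": "E1", "status": "failed", "parent": None},
            {"id": "E2", "status": "success", "parent": "E1"},
        ],
    },
    "evidence": {                    # 📊 claim id -> raw artifacts grounding it
        "C1": ["results/eval_table.csv"],
    },
}

# One check an agent could run on such a package: no claim may be ungrounded.
ungrounded = [c["id"] for c in ara["scientific_logic"]["claims"]
              if c["id"] not in ara["evidence"]]
assert ungrounded == []
```

The point of the exploration graph is that E1 (a failure) stays in the record with a parent link, instead of being stripped by the storytelling tax.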

@ClementDelangue But for multi-user scenarios, vLLM is still ahead

MiniMax M2.7 - 36 GB
HumanEval+: 81% pass@1, 90% pass@5.
huggingface.co/OsaurusAI/Mini…
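For anyone reproducing numbers like these: pass@1 / pass@5 are usually computed with the unbiased pass@k estimator from the Codex paper, not by literally sampling k completions. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): expected probability
    that at least one of k completions drawn from n samples is correct,
    given that c of the n samples passed the tests."""
    if n - c < k:   # fewer failing samples than draws: a correct one is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 5, 1))   # 0.5: half the samples pass, so pass@1 is 50%
```

With n samples per problem you average this over the benchmark; the `if` branch handles the edge case where every draw must include a passing sample.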

🎉 We just shipped a major redesign of recipes.vllm.ai.
"How do I run model X on hardware Y for task Z?" now has a clickable answer.
What's new:
- URLs mirror HuggingFace: just swap huggingface.co → recipes.vllm.ai in any model URL to jump straight to its recipe (e.g. recipes.vllm.ai/Qwen/Qwen3.6-3…)
- Interactive command builder: pick hardware, variant, strategy (tensor, tensor+expert, or data+expert; single or multi-node; or a prefill/decode disaggregated cluster), toggle features → get the exact `vllm serve` command
- Pluggable hardware: NVIDIA + AMD already integrated. One-click switch between Hopper/Blackwell and MI300X/MI355X, and the right flags and env are applied automatically
- JSON API for agents: every recipe is also published at //.json (e.g. recipes.vllm.ai/Qwen/Qwen3.6-3…), so tools and agents can consume recipes without scraping
- Contribute a new recipe end-to-end with the agent skill shipped in the repo: github.com/vllm-project/r…
🔗 recipes.vllm.ai
Enjoy! ✨
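The URL-mirroring rule is simple enough to express as a one-liner. A sketch; the `.json` suffix for the agent API is an assumption based on the "JSON API for agents" bullet, and the model path is just a placeholder:

```python
def recipe_url(hf_url: str, as_json: bool = False) -> str:
    """Map a Hugging Face model URL to its vLLM recipe page by swapping the
    host, per the pattern in the post. The .json suffix for agent consumption
    is an assumption inferred from the 'JSON API for agents' bullet."""
    url = hf_url.replace("huggingface.co", "recipes.vllm.ai", 1)
    return url + ".json" if as_json else url

# Placeholder model path, not one from the post:
print(recipe_url("https://huggingface.co/Qwen/Qwen3-8B"))
# -> https://recipes.vllm.ai/Qwen/Qwen3-8B
```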


MiniMax M2.7 is open-source!
The most interesting part of this release isn't a benchmark number. It's what MiniMax calls "self-evolution," and it's essentially Karpathy's Autoresearch applied at full scale.
Every AI agent today runs inside a harness: the scaffolding of skills, tools, memory, and workflow rules that surrounds it. Normally a human engineer builds this, and the agent operates within it. The harness stays fixed.
M2.7 treats its harness as something it can rewrite.
The agent runs a task, analyzes where things went wrong, plans changes to its own scaffold, applies them, evaluates against a benchmark, and decides whether to keep or revert. It writes self-criticism into memory so the next round starts smarter, then loops again.
MiniMax ran this for 100+ rounds internally. The model discovered optimizations on its own: it systematically searched for optimal sampling parameters, wrote workflow-specific guidelines (like checking for the same bug pattern in other files after a fix), and added loop detection to avoid getting stuck.
They also tested it on 22 ML competitions from OpenAI's MLE-Bench Lite, each run 24 hours fully autonomously. Medal rates improved with every round; the best run earned 9 gold medals.
The weights never changed. What improved was the system around the model: better skills, better memory, better workflow rules. That distinction matters because the improvement loop can run continuously without any retraining.
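The keep-or-revert loop described above can be sketched in a few lines. Everything here is invented for illustration (the function names, the toy scoring, the rule wording); it only shows the shape of the loop, not MiniMax's implementation:

```python
def self_evolve(harness, propose, evaluate, rounds=10):
    """Keep-or-revert loop: the weights never change, only the harness does."""
    best = evaluate(harness)
    memory = []                                # self-criticism carried forward
    for _ in range(rounds):
        candidate = propose(harness, memory)   # agent plans a scaffold edit
        score = evaluate(candidate)            # benchmark the edited harness
        if score > best:                       # keep only strict improvements
            harness, best = candidate, score
            memory.append(f"kept edit, score -> {score}")
        else:                                  # revert, but remember why
            memory.append(f"reverted edit, score stayed at {best}")
    return harness, best, memory

# Toy demo: the "harness" is a list of workflow rules; proposals cycle through
# rules like the ones M2.7 reportedly discovered (wording invented here).
rules = ["re-check bug pattern in other files", "loop detection", "tuned sampling"]
propose = lambda h, mem: h + [rules[len(h) % len(rules)]]
evaluate = lambda h: len(set(h))               # score = number of distinct rules

final, score, memory = self_evolve([], propose, evaluate, rounds=5)
print(score)   # 3: three distinct rules kept, then further edits reverted
```

The asymmetry is the interesting part: a rejected edit still leaves a trace in `memory`, so later rounds start smarter even when nothing was kept.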
I'm pretty sure every major AI lab is doing some version of this internally. The fact that MiniMax is publishing it openly is what makes this release worth paying attention to.
huggingface : huggingface.co/MiniMaxAI/Mini…
Blog: minimax.io/news/minimax-m…
Note: the model is released under a NON-COMMERCIAL license. That said, there's a lot to learn from this work being available in the open.

@shawnchauhan1 The real value lies in solving niche, high-impact problems, not just building what general LLMs will eventually be able to do with enough training and scale.

India is not waiting for OpenAI to build a Hindi voice model.
Sarvam AI is close to raising $300–350 million at a $1.5 billion valuation, with NVIDIA, Amazon, and Bessemer all participating.
The product: voice-first, multilingual models covering 22 Indian languages.
This is not a ChatGPT wrapper.
It is a direct bet that the next billion AI users will not interact in English, and that whoever builds the native-language infrastructure first will own the relationship.
Every large language market without a domestic frontier model is a gap waiting to be filled.
Sarvam is the first serious attempt to fill India's.


How to found a GmbH in Germany in 48 hours (digitally) 🇩🇪.
A lot of people commented on my other post, so I thought I'd share all the details. Here is a step-by-step guide for how to do it.
1. Get your electronic ID activated. Any German ID issued after 2022 works. You sign up online and get a PIN when you first receive it.
2. Call or email a notary and ask for a digital appointment. I called two in Munich on a Wednesday. One had a slot 18 hours later.
3. Optional: get a free confirmation from your local IHK that your "Gesellschaftszweck" (company purpose) doesn't conflict with existing companies. Takes just a web form and 24 hours. Draft your company purpose with Claude.
4. Use the "Musterprotokoll", the simplest standard articles of association. Every notary has them ready to go.
5. Send the notary your company basics: name, address, share capital, and personal data.
6. They send you an invite to a video appointment.
7. The call takes ~10 minutes. You hold your ID to your phone via NFC in the notary app to verify your identity.
8. They send you the incorporation documents after the call.
9. Open a business bank account (I used Qonto) and wire the capital. A GmbH requires €25K share capital, but only half (€12.5K) has to be paid in at founding. The bank confirms your deposit to the notary.
10. The notary triggers entry into the commercial register. Took me three working days.
bUt YoU haV'nT rEalLY inCorPorATeD yEt! 😡😭
> The GmbH can operate as a "GmbH in Gründung" (GmbH i.G.) from the moment the notarization is complete, so even before entry in the commercial register. Once the entry is made, it becomes a full GmbH.
> The VAT ID is separate and not required to start operating, though you'll need it for invoicing with VAT.
Godspeed.

Composer 2 is out!
Cursor is an example of a new type of company, not a pure app maker and not a model provider.
Our aim is to build the most useful coding agents by combining the best API models and our domain-specific models.
Cursor@cursor_ai
Composer 2 is now available in Cursor.

Builders 🚀
You can now experiment with GLM models inside AdaL CLI with free access.
A coding agent designed for real developer workflows.
• multi-model support
• long-term memory
• designed for shipping products faster
Exactly the kind of tooling we want to support through the Z.ai Startup Program.
Watch the video below.👇

@HelenaWangZuZAI Helena, we already applied. Could you check our application? Thank you 🤩

GLM models support founders via free API tokens, up to $10,000!
Apply here:
startup.z.ai/?src=linkedin&…

We’re seeing more startups move from “demo” to real production AI.
But the hard part isn’t the prototype.
It’s shipping reliably. Scaling economically. Handling real users.
That’s why we built the “Z.ai Startup Program” → Grow your startup beyond limits with Z.ai.
Apply now → lnkd.in/gNJhHmGG. Ship faster, scale cheaper.

Z.ai Startup Program is NOW OPEN.
What you can get:
·Free API credits
·Priority rate limits
·Exclusive Community
·Early API Access
Who we're looking for:
·AI-native startups
·Agent builders
·SaaS founders integrating LLM infra
·Global teams building for real-world scale
If you're building something that matters, don't wait!!
Apply now: startup.z.ai
Questions? Details? Follow & DM @ZaiforStartups


@livingdevops @mischavdburg @brankopetric00 This is the reason CloudNativePG exists. We have been managing PostgreSQL clusters on-premises for over five years.

@mischavdburg @brankopetric00 One should not put their main DB in a container in production. It's OK for demo and dev environments, but never in production.

The obsession with putting everything in a container has to stop.
"We should containerize our PostgreSQL database!" said someone who has never managed stateful data in their life.
Not everything is a stateless 12-factor app. You're adding a layer of abstraction, performance overhead, and operational complexity for literally zero benefit.
Use a managed database service. Stop trying to Dockerize things that want to live on a file system.

@karanjagtiani04 @K8sArchitect I wouldn’t recommend all these tools

@K8sArchitect True, oversimplification can lead to missed nuances. How does K8sgpt handle complex cases?

K8sgpt is a tool that scans Kubernetes clusters, diagnoses, and triages issues in simple English
➤ ku.bz/jfdbw60d4
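A typical session looks something like this (a sketch of the k8sgpt CLI; the OpenAI backend and model choice are just examples, and both steps need a reachable cluster):

```shell
# Point k8sgpt at an LLM backend (OpenAI shown as an example)
k8sgpt auth add --backend openai --model gpt-4o

# Scan the cluster and get plain-English explanations of what it found
k8sgpt analyze --explain

# Narrow the scan, e.g. to Pods in a single namespace
k8sgpt analyze --explain --filter Pod --namespace prod
```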




