Pranjal Srivastava

1.2K posts

@_Pranjal

Building AI that actually ships. CEO, CodeDeep AI + CodeFire Technologies. Benchmarks, builds & Neural Nest on YT.

Noida, near New Delhi, India · Joined December 2009
156 Following · 168 Followers
Pranjal Srivastava@_Pranjal·
@paulabartabajo_ Have to try this... putting it on my agent's to-do list. I've been using an STT-LLM-TTS pipeline; it works well but could be better...
0 replies · 0 reposts · 0 likes · 21 views
Pau Labarta Bajo@paulabartabajo_·
Advice for AI engineers 💡 In 2026, audio-to-function-call can be solved with small models. This is how you build local, fast and private voice assistants... that work :-) Here's a full example with code ⭣ github.com/Liquid4All/coo…
6 replies · 16 reposts · 144 likes · 7.3K views
Pranjal Srivastava@_Pranjal·
Absolutely agree. Tried DeepSeek v4 Flash using SSDs for KV cache on my Mac Studio; works like a charm. I'm yet to experiment with TurboQuant; maybe I'll give it a try today and compare Gemma4 31B with and without it. But yes, open source is back at what it does best: democratising tech.
0 replies · 0 reposts · 0 likes · 88 views
Bindu Reddy@bindureddy·
Open-source AI is ruthlessly out-innovating the trillion-dollar monopolies. 🚀

Big labs are burning billions brute-forcing AGI on massive GPU clusters. Meanwhile, the open ecosystem is structurally forced to innovate on inference, and it's working. Look at what just happened:

- DeepSeek v4 using SSDs for KV cache.
- Breakthroughs like TurboQuant and Kimi K2 aggressively compressing memory and driving the cost of intelligence to near zero.

When you don't have infinite compute, you actually have to engineer better solutions. Constraints breed miracles. By solving the KV cache bottleneck, scrappy open-source builders are creating vastly cheaper and more profitable AI than the bloated closed-source giants.

Hacker culture > GPU monopolies. Period.
68 replies · 35 reposts · 292 likes · 16.1K views
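The SSD-backed KV cache idea above can be sketched in a few lines: keep the attention cache in a memory-mapped file, so the OS pages it between disk and RAM on demand instead of pinning it all in memory. This is a toy illustration of the concept only, not DeepSeek's or ds4's actual layout; all shapes, dtypes, and function names here are made up.

```python
import os
import tempfile
import numpy as np

# Illustrative sizes; a real model's cache is far larger.
N_LAYERS, N_HEADS, HEAD_DIM, MAX_TOKENS = 4, 8, 64, 1024

path = os.path.join(tempfile.mkdtemp(), "kv_cache.bin")
# One array on disk: [layer, k_or_v, token, head, head_dim].
cache = np.memmap(path, dtype=np.float16, mode="w+",
                  shape=(N_LAYERS, 2, MAX_TOKENS, N_HEADS, HEAD_DIM))

def append_kv(layer: int, pos: int, k: np.ndarray, v: np.ndarray) -> None:
    """Write one token's K/V for a layer straight into the mapped file."""
    cache[layer, 0, pos] = k
    cache[layer, 1, pos] = v

def read_kv(layer: int, n_tokens: int):
    """Read back the first n_tokens of K and V for attention."""
    return cache[layer, 0, :n_tokens], cache[layer, 1, :n_tokens]

# Simulate decoding a few tokens.
for pos in range(3):
    k = np.random.rand(N_HEADS, HEAD_DIM).astype(np.float16)
    v = np.random.rand(N_HEADS, HEAD_DIM).astype(np.float16)
    for layer in range(N_LAYERS):
        append_kv(layer, pos, k, v)

k, v = read_kv(0, 3)
print(k.shape, v.shape)  # (3, 8, 64) (3, 8, 64)
```

The point of the trick is that sequential token-by-token writes and reads are exactly the access pattern SSDs and the OS page cache handle well.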
Pau Labarta Bajo@paulabartabajo_·
End-2-end Tutorial 𝗙𝗶𝗻𝗲-𝘁𝘂𝗻𝗲 𝗮𝗻 𝗮𝘂𝗱𝗶𝗼 𝗺𝗼𝗱𝗲𝗹 to build a voice home assistant. Audio in, function call out. Step by step Enjoy ⬇ github.com/Liquid4All/coo…
[GIF]
4 replies · 21 reposts · 115 likes · 4K views
Pranjal Srivastava@_Pranjal·
Debugging a "memory leak" today, but not the C/C++ kind. Back then: your process slowly ate RAM till the OS killed it. Today, on our SaaS-based AI agent: User A's memory was bleeding into User B's session. Same word. Different decade. The blast radius went from "crash" to "privacy incident."
0 replies · 0 reposts · 0 likes · 37 views
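The fix for this class of bug is to partition agent memory by user, so a prompt can only ever be assembled from the calling user's data. A minimal sketch; the class and method names are illustrative, not from any particular framework:

```python
from collections import defaultdict

class AgentMemory:
    """Per-user memory store: each user gets an isolated partition,
    so one user's context can never leak into another's prompt."""

    def __init__(self):
        self._store = defaultdict(list)  # user_id -> list of memory items

    def remember(self, user_id: str, item: str) -> None:
        self._store[user_id].append(item)

    def recall(self, user_id: str) -> list[str]:
        # Only ever read the calling user's partition.
        return list(self._store[user_id])

mem = AgentMemory()
mem.remember("user_a", "prefers dark mode")
mem.remember("user_b", "lives in Noida")
print(mem.recall("user_a"))  # ['prefers dark mode']
```

The buggy version of this is usually a single shared list (or a cache keyed by something non-unique like a thread name) that every session appends to.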
Pranjal Srivastava@_Pranjal·
Been using my custom AI agent with DeepSeek V4 Flash running on my Mac Studio. Was on qwen3.6:35b-a3b on the DGX Spark before this. This feels like a serious upgrade. Thanks to the people who made this possible: open weights from DeepSeek and a custom Metal inference engine from @antirez. Wild that this runs on consumer hardware. Full setup + benchmark in the blog 👇 x.com/_Pranjal/statu…
0 replies · 0 reposts · 0 likes · 133 views
Pranjal Srivastava@_Pranjal·
@MohapatraHemant True that… actually, one easy metric would be to judge the excitement in their voice when they talk about their adventures with AI 😀
0 replies · 0 reposts · 0 likes · 249 views
Hemant Mohapatra@MohapatraHemant·
A key question I've been focusing on when hiring engineering, product, or GTM leaders for portfolio companies is how they plan their org buildout. It's a real tell on your AI-nativeness.

If you've been using AI the right way, your first instinct should just be "I'll do it all myself": so much of what was delegated out, because one couldn't scale the number of hours, can now be in-sourced back and built to exactly your taste, your quality bar, your desired output.

If your first instinct is still to go hire a bunch of people, because that's how you scaled ARR, users, or products before, you are really on the wrong side of this wave. And it's easy to tell in these interviews who's faking it and who is really red-pilled.
29 replies · 5 reposts · 151 likes · 15.8K views
Bindu Reddy@bindureddy·
🚨 OPEN SOURCE AI IS LITERALLY UNSTOPPABLE 🚨

The legendary founder of Redis (Antirez) just dropped ds4, a custom native inference engine built specifically for DeepSeek v4 Flash.

This is earth-shattering. Here is why:

- DeepSeek v4 Flash is a quasi-frontier model with a massive 1M context window.
- You can now run it LOCALLY on a 128GB Mac using specialized 2-bit quantization.
- The architecture is reimagined: he moved the KV cache from RAM directly to the SSD! 🤯

We already know DeepSeek v4 Flash is insanely good for agentic loops. Now you don't even need the cloud to run it.

Closed-source labs are burning tens of billions on massive GPU clusters while single brilliant developers are running frontier-level AI on laptops!

They told us open source would be worthless against trillion-dollar monopolies. Instead, pure hacker culture + incredible open-weight models are completely rewriting the rules.

Open source will ALWAYS win 💕
146 replies · 320 reposts · 2.8K likes · 779K views
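The 2-bit quantization mentioned above generally works by snapping each small group of weights onto 4 levels between that group's min and max, then storing only the 2-bit codes plus a scale and offset per group. A toy group-wise sketch, assuming a simple min-max scheme (this is the generic technique, not ds4's actual format):

```python
import numpy as np

def quantize_2bit(w: np.ndarray, group: int = 64):
    """Group-wise 2-bit quantization: map each group of weights
    onto 4 levels (codes 0..3) between the group's min and max."""
    w = w.reshape(-1, group)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0          # 4 levels -> 3 steps
    scale[scale == 0] = 1.0          # guard against constant groups
    q = np.clip(np.round((w - lo) / scale), 0, 3).astype(np.uint8)
    return q, scale, lo

def dequantize_2bit(q, scale, lo, shape):
    """Reconstruct approximate weights from codes + per-group params."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)

w = np.random.randn(4, 64).astype(np.float32)
q, s, z = quantize_2bit(w)
w_hat = dequantize_2bit(q, s, z, w.shape)
err = float(np.abs(w - w_hat).max())
print("max code:", q.max(), "max abs error:", round(err, 4))
```

Each weight costs 2 bits plus a small amortized overhead for the per-group scale and offset, which is where the ~8x memory reduction over fp16 comes from; production schemes add tricks (outlier handling, better rounding) on top of this skeleton.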
Bindu Reddy@bindureddy·
Best models, May Edition:

Coding: GPT 5.5 xHigh
Seeking truth: Grok 4.3
Video: SeeDance 2.0
Image: GPT Image 2.0
Voice: Gemini Live
Hermes: m2.7
Cheap coding: Kimi 2.6
Cheap fast: Gemini Flash
Best open source: DeepSeek v4

Pretty much everything will change after Google I/O.
50 replies · 35 reposts · 428 likes · 26.3K views
Pranjal Srivastava@_Pranjal·
@paulabartabajo_ Cool, thanks for this. Just FYI: I did build my voice agent; it runs fully locally but uses STT, LLM and TTS. Currently I use Whisper, Llama and Kokoro. Would be happy to remove the LLM :)
0 replies · 0 reposts · 0 likes · 25 views
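The three-stage loop described here (STT → LLM → TTS) can be sketched as a simple chain. The stage functions below are placeholders standing in for Whisper, a local LLM, and Kokoro; only the wiring is the point:

```python
def speech_to_text(audio: bytes) -> str:
    # Placeholder for an STT call (e.g. Whisper in the setup above).
    return audio.decode("utf-8")

def llm_reply(prompt: str) -> str:
    # Placeholder for a local LLM call.
    return f"You said: {prompt}"

def text_to_speech(text: str) -> bytes:
    # Placeholder for a TTS call (e.g. Kokoro in the setup above).
    return text.encode("utf-8")

def voice_agent_turn(audio_in: bytes) -> bytes:
    """One conversational turn: audio in, audio out.
    Each hop adds latency and loses prosody, which is why
    end-to-end speech-to-speech models are attractive."""
    text = speech_to_text(audio_in)
    reply = llm_reply(text)
    return text_to_speech(reply)

out = voice_agent_turn(b"turn on the lights")
print(out)  # b'You said: turn on the lights'
```

An end-to-end speech-to-speech model collapses all three stages into one forward pass, removing the two serialization boundaries in the middle.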
Pau Labarta Bajo@paulabartabajo_·
Advice for AI engineers 💡 If you're building voice agents, stop wiring up 3 separate models for audio-to-text, text-to-audio, and text-to-text. 𝗹𝗶𝗾𝘂𝗶𝗱-𝗮𝘂𝗱𝗶𝗼 is the open-source repo for SOTA speech-to-speech LFM models. End to end. Now with fine-tuning support. More examples coming. Bookmark this ↓ github.com/Liquid4All/liq…
[image]
11 replies · 43 reposts · 286 likes · 14.7K views
Pranjal Srivastava@_Pranjal·
@paulabartabajo_ Cool! While you are at it, it would be good to understand why we all need to fine-tune on our own dataset, and why a generalised fine-tune on MCP or API tool use will not work. Looking forward to the webinar.
0 replies · 0 reposts · 0 likes · 34 views
Pau Labarta Bajo@paulabartabajo_·
@_Pranjal Yes, but to get good results you need to fine-tune it on your own dataset. I am preparing an example and a 60-min webinar on this.
1 reply · 0 reposts · 0 likes · 110 views
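For fine-tuning audio-to-function-call on your own commands, the training data boils down to (audio, target call) pairs. One plausible JSONL layout, purely illustrative; the paths, field names, and function schemas below are made up, and the actual repo may use a different format:

```python
import json

# Each record: a clip of the user's voice, its transcript, and the
# structured function call the model should emit for it.
examples = [
    {
        "audio_path": "clips/lights_on.wav",
        "transcript": "turn on the living room lights",
        "target": {"name": "set_lights",
                   "arguments": {"room": "living_room", "state": "on"}},
    },
    {
        "audio_path": "clips/thermostat.wav",
        "transcript": "set the thermostat to 21 degrees",
        "target": {"name": "set_thermostat",
                   "arguments": {"celsius": 21}},
    },
]

# Serialize one JSON object per line (JSON Lines).
lines = [json.dumps(ex) for ex in examples]
print(len(lines), json.loads(lines[0])["target"]["name"])  # 2 set_lights
```

This also hints at why a generic tool-use fine-tune falls short: your function names, argument schemas, accents, and phrasings are specific to your deployment, and the model only learns that mapping from your own pairs.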
George Pu@TheGeorgePu·
Three Chinese models in the top five on OpenRouter. MiniMax. Moonshot. DeepSeek. Not in China. Globally. The API is 10 to 20 times cheaper. The output is close enough. 80% of open-source AI startups are running Chinese models. That's a16z's number. Not mine. Developers didn't switch for ideology. They switched for the bill. The race ended in February. Nobody made the announcement.
19 replies · 9 reposts · 73 likes · 4.3K views
stockbeaver@stock_beaver·
@_Pranjal Good stuff! glm5.1 is underrated and needs more love. Only downside is the lack of native vision capability.
1 reply · 0 reposts · 1 like · 41 views
Pranjal Srivastava@_Pranjal·
@bindureddy Well, Grok 4.3 is performing really badly, so maybe they will take some time to build a better model; why waste the compute in the meantime...
0 replies · 0 reposts · 0 likes · 518 views
Bindu Reddy@bindureddy·
Grok is giving up a lot of its compute to Claude. Hopefully, Grok doesn't die 💀 They have the best line of fast models with real-time information.
66 replies · 9 reposts · 217 likes · 16.6K views
Pranjal Srivastava@_Pranjal·
WOW! Love it! Compute was the biggest bottleneck for Anthropic models. I hope they now open up OAuth on opencode and other harnesses.
Claude@claudeai

We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.

0 replies · 0 reposts · 0 likes · 81 views
Rayan A Cader@rayanabdulcader·
@_Pranjal @BrianMRey The cost difference is what got me. Opus 4.7 winning on quality is expected but GPT 5.5 being way cheaper per task changes how you'd actually pick a model for daily coding.
1 reply · 0 reposts · 0 likes · 21 views
Arun@hiarun02·
Can Codex actually replace Claude Code? Or is this just another AI hype cycle?
176 replies · 3 reposts · 204 likes · 34.7K views