Sumit Datta

2.3K posts

Sumit Datta banner
Sumit Datta

Sumit Datta

@sumitdatta

41, engineer, founder, nocodo: Lovable for AI agents

Himalayan village, India شامل ہوئے Mart 2008
549 فالونگ881 فالوورز
پن کیا گیا ٹویٹ
Sumit Datta
Sumit Datta@sumitdatta·
Sometimes it takes years for an idea to work out. I own my product's domain since 2013! Chased and failed multiple times. It was beyond what a solo founder could do. Then came LLMs, they got better at code. I restarted my product. Fresh perspective and a decade of attempts.
English
0
0
8
569
Sumit Datta
Sumit Datta@sumitdatta·
@ibocodes This is how pricing works. Look at the flagship killer Android phones. The brands start with a mouth watering price. In a year or two, they are selling at much higher prices. Then there are new entrants.
English
0
0
0
74
ibo
ibo@ibocodes·
chinese models used to be the cheap alternative why are they priced like claude & codex now 😭
ibo tweet media
English
380
71
3K
629.3K
Sumit Datta
Sumit Datta@sumitdatta·
@aryanlabde People also spend hundreds on movies, TV shows, rollercoaster rides. For thousands of people who are now able to build anything at all by just talking in their human language - $200 is not much. It is the power of freedom. And the tools are worse now than they will ever be
English
0
0
0
9
Aryan
Aryan@aryanlabde·
Nobody talks about the cost of vibe coding. Claude, cursor, vercel, supabase. You’re paying $200/month in tools to build a product making $0.
English
388
50
1.1K
83.4K
Sumit Datta
Sumit Datta@sumitdatta·
Who is gonna tell this guy? Lol these people who cannot face reality. People really are sheep - they need a leader, they like the charm of OpenAI or Anthropic. When other models are dominating and democratizing, these people will go out of their way to share scam.
Melvyn • Builder@melvynx

all people telling you to switch to this Chinese model as they are "as good as Opus for 1/10 of the price" are basically liars and incompetent they don't use the tools, they make tweets for hype, because using this model for 2 minutes makes you realize how dumb they are what for? seriously

English
0
0
0
42
Sumit Datta
Sumit Datta@sumitdatta·
@XSuxBro @sudoingX @pupposandro Tiny models are good at function calling. I use then to decide and branch out to specific agents for the given task. You have to plan your harness around tiny models.
English
0
0
2
91
Sudo su
Sudo su@sudoingX·
read this carefully anon. @pupposandro wrote a single fused CUDA kernel for all 24 layers of Qwen 3.5-0.8B. one kernel launch. absolutely zero CPU round trips between layers. the result? a $900 RTX 3090 from 2020 hit 411 tok/s. apple's M5 Max hit 229. the 3090 won on speed AND efficiency 1.55x faster than llama.cpp on the same hardware. the gap between NVIDIA and Apple was never about silicon. it was software. generic frameworks waste cycles on kernel launch overhead, memory re fetches, thread synchronization. when you fuse everything into one dispatch the hardware shows what it actually has. this is the beginning of something bigger. we already proved that a 27B dense model on a single 3090 one-shots what $70K enterprise hardware cannot. now imagine what happens when someone writes kernels optimized specifically for the 3090 and the models that run best on it. not generic inference. hardware specific, model specific fused from the kernel level up. the 3090 is not a relic. it's an untapped research platform. and the people writing these kernels are proving it with data. all open source and reproducible anon.
Sandro@pupposandro

x.com/i/article/2041…

English
55
126
1.3K
98.8K
Sumit Datta
Sumit Datta@sumitdatta·
@sudoingX @pupposandro Interesting. I am not from an ML background but when I saw "Convert ONNX models into native, backend-agnostic Burn [Rust] code" on github.com/tracel-ai/burn… - I thought, what if we create custom code for a GPU for a specific model. Would it improve performance? I would love to try
English
0
0
0
34
Sumit Datta
Sumit Datta@sumitdatta·
Multiple entity extraction (financial transactions, orders, locations, events, persons, orgs...) now works with @Alibaba_Qwen Qwen 3.5 0.8b model - should work on any small laptop Tested on M4 Mac Mini 16 GB + @UnslothAI model on llama.cpp) github.com/brainless/dwata
English
0
0
0
55
Sumit Datta
Sumit Datta@sumitdatta·
@rezoundous I use the M4 Mac Mini with 16GB system memory. Full time development setup with Zed, Claude Code, Codex or opencode. I use Ollama + Ministral 3:3b in my own product github.com/brainless/dwata Works without hiccups.
English
1
0
1
108
Tyler
Tyler@rezoundous·
Everyone says 16GB RAM isn't enough anymore. I'm still daily driving a 16GB MacBook Air with zero complaints. Am I missing something?
English
343
9
700
69.6K
Sumit Datta
Sumit Datta@sumitdatta·
@0xSero Sorry I am not well versed, what do you mean by a pure logic agent? I understand having less knowledge. Context: "prune down to 40% if you accept lots of knowledge loss and want a pure logic agent"
English
0
0
0
65
0xSero
0xSero@0xSero·
This is possibly the best we can get until another compression breakthrough pops up for LARGE MoEs WEIGHTS: - 50% reap + 3bit quant == 81.75% compression KV-CACHE: - turboquant 4 Basically you can run large MoEs with about 18-20% of the vram of the BF16 So for Deepseek (671 GB~) you will be able to run the weights in 127GB and 200k kv-cache in about 20-60gb of vram. Small models provide less savings I'd say you can comfortably prune 20-25% of the experts and quantise to about 4-8bits For 1T param models I think it's possible to prune down to 40% if you accept lots of knowledge loss and want a pure logic agent, IDK how this would look though ------------ My current plan: - Prune GLM-5 50% Done - Quantize to EXL3 w 3bits if no turboquant or if 4bits - Train the new models to respond like the original PEFT from GLM-5 --> GLM-5-358B-REAP --> REAP-GGUF-3BIT -> REAP-EXL-3BIT Very little samples were enough to recover an 80% REAP to semi-coherent from completely broken.
0xSero tweet media
English
23
20
334
18.9K
Sumit Datta ری ٹویٹ کیا
Z.ai
Z.ai@Zai_org·
Introducing GLM-4.7-Flash: Your local coding and agentic assistant. Setting a new standard for the 30B class, GLM-4.7-Flash balances high performance with efficiency, making it the perfect lightweight deployment option. Beyond coding, it is also recommended for creative writing, translation, long-context tasks, and roleplay. Weights: huggingface.co/zai-org/GLM-4.… API: docs.z.ai/guides/overvie… - GLM-4.7-Flash: Free (1 concurrency) - GLM-4.7-FlashX: High-Speed and Affordable
Z.ai tweet media
English
386
720
5.4K
1.6M
Upen
Upen@upen946·
👋👋 Monday again!! Time to promote your product. 🚀 Share your product URL
English
491
8
246
23.3K
shirish
shirish@shiri_shh·
Show me your app, website or project and I’ll share my honest thoughts👇
English
700
3
414
41.4K
Sumit Datta
Sumit Datta@sumitdatta·
@burkov Anthropic's largest revenue stream is API usage, not Claude Code. Enterprise has been their focus - agents. It is all about agents. Claude Code is only one such agent, of course, their own agent. Being hyper focused on that one agent is not even helpful. They are a model maker
English
0
0
0
31
BURKOV
BURKOV@burkov·
An absolutely right decision. Claude models were fine-tuned with reinforcement learning to work with a specific interface. This is why Claude is so effective in coding. When you withdraw the model that was finetuned to work in a certain environment and put it into a different one, the consequences are absolutely unpredictable. Furthermore, the user experience is arbitrarily worse compared to the UX with the environment for which the model was finetuned. I'm 100% certain that people who still claim that Gemini is better for coding than Claude used Claude in a non-native environment.
BURKOV tweet media
English
86
9
290
48.4K
Sumit Datta
Sumit Datta@sumitdatta·
@trq212 Double standard to model pricing is ridiculous. Claude is a model, text in, text out. That is effing it. Anthropic has two pricing - one if you use CC (can use Claude subscription) and another for every other agent (API pricing, which is expensive). Is CC a lock-in?
English
0
0
0
96
Thariq
Thariq@trq212·
Yesterday we tightened our safeguards against spoofing the Claude Code harness after accounts were banned for triggering abuse filters from third-party harnesses using Claude subscriptions.
English
262
64
1.2K
877.3K
Sumit Datta
Sumit Datta@sumitdatta·
@timsoulo Because they lose customers as easily as they get. It's easy to count up the payments for some months to show ARR but that's not actually what they get over a 12 month period. Churn is super high. Changing platforms is not like changing a cloud provider. People build and leave
English
0
0
0
47
Tim Soulo 🇺🇦
Tim Soulo 🇺🇦@timsoulo·
Why did Lovable need to raise $300m series B, when they're already at $200M ARR with just 100 people? ...and they raised $200M series A not long ago. I mean: - They didn't pay for ads all around New York and San Francisco like Clay. - They didn't pay for Las Vegas Sphere and massive drone show like Gamma. - They didn't pay to sponsor F1 and Arsenal like Airwallex (not an AI startup, but I needed a third one). So what do they need so much money for? ⚠️ IMPORTANT: I'm not a hater! I actually love using Lovable and I'm quite proud of my growing collection of vibe-coded apps. But coming from the bootstrapped world I'm genuinely confused as to why they need so much money, when they're already seem to be doing so well with their ARR. Any theories?
English
219
13
423
193.2K
Sumit Datta
Sumit Datta@sumitdatta·
@forgebitz My entire agent building platform runs on SQLite itself. And right now I'm focused on a good LLM based analyst for SQLite databases. Feel free to check the progress: github.com/brainless/noco…
English
0
0
1
458
Klaas
Klaas@forgebitz·
really curious how many "thin" startups you can replace by giving an llm a sqlite database and some cron jobs
English
35
3
460
61K
Sumit Datta ری ٹویٹ کیا
Andrew Feldman
Andrew Feldman@andrewdfeldman·
People ask what faster AI actually means for them. Here’s a concrete example. In the video, GLM-4.6 on @cerebras builds Space Invaders in ~15% of the time it takes Claude Sonnet 4.5 Thinking. The point isn’t speed for its own sake. When latency drops far enough, new classes of workflows become possible. This is like Netflix. Netflix didn’t build a $400 billion business because it mailed DVDs faster. It's entire business changed. It became a movie studio. This transformation was made possible when streaming became fast and reliable enough that people stopped thinking about buffering, downloads, and storage altogether. At that point, behavior changed: people browsed, clicked, abandoned, re-tried - without friction. Fast inference does the same thing for AI. When responses are slow, you design around waiting.
English
25
20
244
33K
Sumit Datta
Sumit Datta@sumitdatta·
@hxtxmu @TheAhmadOsman Generally I mention "opencode is a coding agent that's already installed" - to give some context. And it works well. opencode with GLM fights the Rust compiler per file, runs code lint, format, etc. You can work at file or module level Claude does higher level management.
English
0
0
0
70
Sumit Datta
Sumit Datta@sumitdatta·
@hxtxmu @TheAhmadOsman Let's say you refactor & multiple dependent files need update. So in claude code, something like: "We refactored abc.rs and multiple Rust files need update. Please check using cargo. Then use `opencode run "<prompt>"` to tell opencode to fix, one file at a time"
English
2
0
1
71
Ahmad
Ahmad@TheAhmadOsman·
i am becoming a fan of OpenCode dare i say, OpenCode > Claude Code
English
101
15
742
292.9K