Sumit Datta

2.3K posts

Sumit Datta

@sumitdatta

41, engineer, founder, nocodo: Lovable for AI agents

Himalayan village, India شامل ہوئے Mart 2008

549 فالونگ881 فالوورز

پن کیا گیا ٹویٹ

Sumit Datta@sumitdatta·15 Eki

Sometimes it takes years for an idea to work out. I own my product's domain since 2013! Chased and failed multiple times. It was beyond what a solo founder could do. Then came LLMs, they got better at code. I restarted my product. Fresh perspective and a decade of attempts.

English

569

Sumit Datta@sumitdatta·12 Nis

@ibocodes This is how pricing works. Look at the flagship killer Android phones. The brands start with a mouth watering price. In a year or two, they are selling at much higher prices. Then there are new entrants.

English

ibo@ibocodes·11 Nis

chinese models used to be the cheap alternative why are they priced like claude & codex now 😭

English

380

629.3K

Sumit Datta@sumitdatta·10 Nis

@aryanlabde People also spend hundreds on movies, TV shows, rollercoaster rides. For thousands of people who are now able to build anything at all by just talking in their human language - $200 is not much. It is the power of freedom. And the tools are worse now than they will ever be

English

Aryan@aryanlabde·9 Nis

Nobody talks about the cost of vibe coding. Claude, cursor, vercel, supabase. You’re paying $200/month in tools to build a product making $0.

English

388

1.1K

83.4K

Sumit Datta@sumitdatta·10 Nis

Who is gonna tell this guy? Lol these people who cannot face reality. People really are sheep - they need a leader, they like the charm of OpenAI or Anthropic. When other models are dominating and democratizing, these people will go out of their way to share scam.

Melvyn • Builder@melvynx

all people telling you to switch to this Chinese model as they are "as good as Opus for 1/10 of the price" are basically liars and incompetent they don't use the tools, they make tweets for hype, because using this model for 2 minutes makes you realize how dumb they are what for? seriously

English

Sumit Datta@sumitdatta·10 Nis

@XSuxBro @sudoingX @pupposandro Tiny models are good at function calling. I use then to decide and branch out to specific agents for the given task. You have to plan your harness around tiny models.

English

Dhruv@XSuxBro·9 Nis

@sudoingX @pupposandro What is a 0.8B model good for?

English

1.2K

Sudo su@sudoingX·9 Nis

read this carefully anon. @pupposandro wrote a single fused CUDA kernel for all 24 layers of Qwen 3.5-0.8B. one kernel launch. absolutely zero CPU round trips between layers. the result? a $900 RTX 3090 from 2020 hit 411 tok/s. apple's M5 Max hit 229. the 3090 won on speed AND efficiency 1.55x faster than llama.cpp on the same hardware. the gap between NVIDIA and Apple was never about silicon. it was software. generic frameworks waste cycles on kernel launch overhead, memory re fetches, thread synchronization. when you fuse everything into one dispatch the hardware shows what it actually has. this is the beginning of something bigger. we already proved that a 27B dense model on a single 3090 one-shots what $70K enterprise hardware cannot. now imagine what happens when someone writes kernels optimized specifically for the 3090 and the models that run best on it. not generic inference. hardware specific, model specific fused from the kernel level up. the 3090 is not a relic. it's an untapped research platform. and the people writing these kernels are proving it with data. all open source and reproducible anon.

Sandro@pupposandro

x.com/i/article/2041…

English

126

1.3K

98.8K

Sumit Datta@sumitdatta·10 Nis

@sudoingX @pupposandro Interesting. I am not from an ML background but when I saw "Convert ONNX models into native, backend-agnostic Burn [Rust] code" on github.com/tracel-ai/burn… - I thought, what if we create custom code for a GPU for a specific model. Would it improve performance? I would love to try

English

Sumit Datta@sumitdatta·31 Mar

Multiple entity extraction (financial transactions, orders, locations, events, persons, orgs...) now works with @Alibaba_Qwen Qwen 3.5 0.8b model - should work on any small laptop Tested on M4 Mac Mini 16 GB + @UnslothAI model on llama.cpp) github.com/brainless/dwata

English

Sumit Datta@sumitdatta·30 Mar

@rezoundous I use the M4 Mac Mini with 16GB system memory. Full time development setup with Zed, Claude Code, Codex or opencode. I use Ollama + Ministral 3:3b in my own product github.com/brainless/dwata Works without hiccups.

English

108

Tyler@rezoundous·29 Mar

Everyone says 16GB RAM isn't enough anymore. I'm still daily driving a 16GB MacBook Air with zero complaints. Am I missing something?

English

343

700

69.6K

Sumit Datta@sumitdatta·29 Mar

@0xSero Sorry I am not well versed, what do you mean by a pure logic agent? I understand having less knowledge. Context: "prune down to 40% if you accept lots of knowledge loss and want a pure logic agent"

English

0xSero@0xSero·29 Mar

This is possibly the best we can get until another compression breakthrough pops up for LARGE MoEs WEIGHTS: - 50% reap + 3bit quant == 81.75% compression KV-CACHE: - turboquant 4 Basically you can run large MoEs with about 18-20% of the vram of the BF16 So for Deepseek (671 GB~) you will be able to run the weights in 127GB and 200k kv-cache in about 20-60gb of vram. Small models provide less savings I'd say you can comfortably prune 20-25% of the experts and quantise to about 4-8bits For 1T param models I think it's possible to prune down to 40% if you accept lots of knowledge loss and want a pure logic agent, IDK how this would look though ------------ My current plan: - Prune GLM-5 50% Done - Quantize to EXL3 w 3bits if no turboquant or if 4bits - Train the new models to respond like the original PEFT from GLM-5 --> GLM-5-358B-REAP --> REAP-GGUF-3BIT -> REAP-EXL-3BIT Very little samples were enough to recover an 80% REAP to semi-coherent from completely broken.

English

334

18.9K

Sumit Datta@sumitdatta·27 Mar

I needed GliNER2 for a Rust based desktop app. So I create a quick port of GliNER2 to Rust/Candle: github.com/brainless/glin…

English

Sumit Datta ری ٹویٹ کیا

Z.ai@Zai_org·19 Oca

Introducing GLM-4.7-Flash: Your local coding and agentic assistant. Setting a new standard for the 30B class, GLM-4.7-Flash balances high performance with efficiency, making it the perfect lightweight deployment option. Beyond coding, it is also recommended for creative writing, translation, long-context tasks, and roleplay. Weights: huggingface.co/zai-org/GLM-4.… API: docs.z.ai/guides/overvie… - GLM-4.7-Flash: Free (1 concurrency) - GLM-4.7-FlashX: High-Speed and Affordable

English

386

720

5.4K

1.6M

Sumit Datta@sumitdatta·19 Oca

@upen946 nocodo.com

QME

Upen@upen946·19 Oca

👋👋 Monday again!! Time to promote your product. 🚀 Share your product URL

English

491

246

23.3K

Sumit Datta@sumitdatta·19 Oca

@denicmarko nocodo.com

QME

Sumit Datta@sumitdatta·19 Oca

@shiri_shh nocodo.com

QME

shirish@shiri_shh·19 Oca

Show me your app, website or project and I’ll share my honest thoughts👇

English

700

414

41.4K

Sumit Datta@sumitdatta·11 Oca

The Model Wars is upon us 🍿

Kylie Robison@kyliebytes

Scoop: xAI staff had been using Anthropic’s models internally through Cursor—until Anthropic cut off the startup’s access this week. Here’s what xAI cofounder Tony Wu sent to staff on Wednesday according to a copy of the internal Slack message I viewed: “Hi team, I believe many of you have already discovered that anthropic models are not responding on cursor. According to cursor this is a new policy anthropic is enforcing for all its major competitors. This is a both bad and good news. We will get a hit on productivity, but it rly pushes us to develop our own coding product / models. We're at a time in which Al is now a critical technology for our own productivity. This coming year is rly going to be wildly exciting for all of us. The team is rapidly developing our own models / product. We will have something to share with everyone soon. In the meantime, you may still try all different kinds of models in grok build.” Anthropic declined to comment. A Cursor spokesperson directed me to Anthropic for comment. xAI did not respond to request for comment.

English

Sumit Datta@sumitdatta·10 Oca

@burkov Anthropic's largest revenue stream is API usage, not Claude Code. Enterprise has been their focus - agents. It is all about agents. Claude Code is only one such agent, of course, their own agent. Being hyper focused on that one agent is not even helpful. They are a model maker

English

BURKOV@burkov·10 Oca

An absolutely right decision. Claude models were fine-tuned with reinforcement learning to work with a specific interface. This is why Claude is so effective in coding. When you withdraw the model that was finetuned to work in a certain environment and put it into a different one, the consequences are absolutely unpredictable. Furthermore, the user experience is arbitrarily worse compared to the UX with the environment for which the model was finetuned. I'm 100% certain that people who still claim that Gemini is better for coding than Claude used Claude in a non-native environment.

English

290

48.4K

Sumit Datta@sumitdatta·10 Oca

@trq212 Double standard to model pricing is ridiculous. Claude is a model, text in, text out. That is effing it. Anthropic has two pricing - one if you use CC (can use Claude subscription) and another for every other agent (API pricing, which is expensive). Is CC a lock-in?

English

Thariq@trq212·9 Oca

Yesterday we tightened our safeguards against spoofing the Claude Code harness after accounts were banned for triggering abuse filters from third-party harnesses using Claude subscriptions.

English

262

1.2K

877.3K

Sumit Datta@sumitdatta·29 Ara

@timsoulo Because they lose customers as easily as they get. It's easy to count up the payments for some months to show ARR but that's not actually what they get over a 12 month period. Churn is super high. Changing platforms is not like changing a cloud provider. People build and leave

English

Tim Soulo 🇺🇦@timsoulo·29 Ara

Why did Lovable need to raise $300m series B, when they're already at $200M ARR with just 100 people? ...and they raised $200M series A not long ago. I mean: - They didn't pay for ads all around New York and San Francisco like Clay. - They didn't pay for Las Vegas Sphere and massive drone show like Gamma. - They didn't pay to sponsor F1 and Arsenal like Airwallex (not an AI startup, but I needed a third one). So what do they need so much money for? ⚠️ IMPORTANT: I'm not a hater! I actually love using Lovable and I'm quite proud of my growing collection of vibe-coded apps. But coming from the bootstrapped world I'm genuinely confused as to why they need so much money, when they're already seem to be doing so well with their ARR. Any theories?

English

219

423

193.2K

Sumit Datta@sumitdatta·28 Ara

@forgebitz My entire agent building platform runs on SQLite itself. And right now I'm focused on a good LLM based analyst for SQLite databases. Feel free to check the progress: github.com/brainless/noco…

English

458

Klaas@forgebitz·28 Ara

really curious how many "thin" startups you can replace by giving an llm a sqlite database and some cron jobs

English

460

61K

Sumit Datta ری ٹویٹ کیا

Andrew Feldman@andrewdfeldman·26 Ara

People ask what faster AI actually means for them. Here’s a concrete example. In the video, GLM-4.6 on @cerebras builds Space Invaders in ~15% of the time it takes Claude Sonnet 4.5 Thinking. The point isn’t speed for its own sake. When latency drops far enough, new classes of workflows become possible. This is like Netflix. Netflix didn’t build a $400 billion business because it mailed DVDs faster. It's entire business changed. It became a movie studio. This transformation was made possible when streaming became fast and reliable enough that people stopped thinking about buffering, downloads, and storage altogether. At that point, behavior changed: people browsed, clicked, abandoned, re-tried - without friction. Fast inference does the same thing for AI. When responses are slow, you design around waiting.

English

244

33K

Sumit Datta@sumitdatta·25 Ara

@hxtxmu @TheAhmadOsman Generally I mention "opencode is a coding agent that's already installed" - to give some context. And it works well. opencode with GLM fights the Rust compiler per file, runs code lint, format, etc. You can work at file or module level Claude does higher level management.

English

Sumit Datta@sumitdatta·25 Ara

@hxtxmu @TheAhmadOsman Let's say you refactor & multiple dependent files need update. So in claude code, something like: "We refactored abc.rs and multiple Rust files need update. Please check using cargo. Then use `opencode run "<prompt>"` to tell opencode to fix, one file at a time"

English

Ahmad@TheAhmadOsman·24 Ara

i am becoming a fan of OpenCode dare i say, OpenCode > Claude Code

English

101

742

292.9K

دریافت کریں

@ibocodes @aryanlabde @XSuxBro @sudoingX @pupposandro @Alibaba_Qwen @UnslothAI @rezoundous