Ashish

1.7K posts


@inqusit

Cognition, Retrieval Research at IIT, iOS dev for 10 years, Building Agentic Products, harness engineering

Mumbai · Joined June 2009
700 Following · 106 Followers
Ashish
Ashish@inqusit·
If a 150 GB model is delivered at 1.58 bits, would it rival a trillion-param model? If yes, the world would be a different place: suddenly the energy and compute bottleneck would be solved. But more importantly, a true physics-aware world model requiring hundreds of trillions of params would be possible. Robotics projects would achieve what they want to. The manufacturing-abundance era would arrive. And more importantly, the abundance would arrive before unemployment skyrockets under the current scenario.

The most positive impact would be on jobs: suddenly every small employer cutting jobs would be on a hiring spree, because the playing field would be leveled. Every small startup could aspire to be self-sufficient in intelligence. Democratization of AI would be a reality, not just conference talks. Edge devices would be capable of defending themselves from Mythos-orchestrated model threats. Government funds would be spent on public infrastructure and national security rather than energy and compute. Nature won't be scorched for resources like water. There can't be anything holier than this.

All of this may sound far-fetched, but it starts with mere 14 GB and 21 GB models at this intelligence density.
English
0
0
0
34
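The sizing claims above (60B ≈ 14 GB, 100B ≈ 21 GB, 150 GB for roughly a trillion params) can be sanity-checked with back-of-envelope arithmetic; a sketch, where log2(3) ≈ 1.58 is the information content of one ternary weight, and the gap to the quoted figures is plausibly checkpoint overhead (embeddings, norms, quantization scales kept at higher precision):

```python
import math

BITS_PER_WEIGHT = math.log2(3)  # a ternary weight {-1, 0, +1} carries ~1.585 bits

def ternary_size_gb(params_billions: float) -> float:
    """Ideal on-disk size of a ternary model, ignoring embeddings,
    norms, and per-tensor scales stored at higher precision."""
    total_bits = params_billions * 1e9 * BITS_PER_WEIGHT
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

for p in (60, 100, 1000):
    print(f"{p}B params -> {ternary_size_gb(p):.1f} GB ideal")
```

This gives roughly 11.9 GB for 60B and 19.8 GB for 100B, so the 14 GB and 21 GB figures are the right order of magnitude. Note that 150 GB at 1.58 bits works out to about 757B params (150 × 8 / 1.585), a bit under a trillion, so the trillion-param framing should be read as approximate.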
PrismML
PrismML@PrismML·
On intelligence density, Ternary Bonsai significantly outperforms other models in comparable parameter classes.
PrismML tweet media
English
4
6
98
13.6K
PrismML
PrismML@PrismML·
Today we’re announcing Ternary Bonsai: top intelligence at 1.58 bits. Using ternary weights {-1, 0, +1}, we built a family of models that are 9x smaller than their 16-bit counterparts while outperforming most models in their respective parameter classes on standard benchmarks. We’re open-sourcing the models under the Apache 2.0 license in three sizes: 8B (1.75 GB), 4B (0.86 GB), and 1.7B (0.37 GB).
PrismML tweet media
English
109
297
2.2K
453.3K
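The announcement doesn't give Ternary Bonsai's exact quantization recipe; as a hedged illustration, here is the standard absmean ternarization used by BitNet-b1.58-style models, in plain Python (the example weights are made up):

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternarization (BitNet b1.58 style): divide by the mean
    absolute weight, round to the nearest of {-1, 0, +1}, and keep the
    scale so the layer can be dequantized as q * scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

w = [0.9, -0.05, 0.4, -1.2, 0.02, 0.7]
q, scale = ternarize(w)
print(q)  # -> [1, 0, 1, -1, 0, 1]: every weight is now -1, 0, or +1
```

The ~9x size claim follows from the arithmetic: 16-bit weights cost 16 bits each, ternary weights ideally log2(3) ≈ 1.58 bits, a 10x ideal ratio that lands near 9x once scales and full-precision embeddings are counted.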
Ashish
Ashish@inqusit·
8B at 1.58 bits is impressive, but it can be truly tested on the most demanding tasks like coding only at a 60-billion-plus model (at 1.58 bits that would be ~14 GB, running on a 16 GB RAM machine). At 100 billion params (around 21 GB), the US would finally have an answer to Chinese open source. There would be frontier labs like Anthropic, OpenAI, Gemini, and Grok playing at the 10-trillion-param scale; the rest would all be PrismML or PrismML-inspired models.
English
0
0
0
78
Ashish
Ashish@inqusit·
Dwarkesh: Will TPUs or ASICs challenge Nvidia in the future?
Jensen: No, our chips are so much better no one else can make them.
Dwarkesh: Why do you export chips to China, which can use them to build Mythos-level models that can pose a cyber threat?
Jensen: China can make these chips themselves. They have great researchers. 🤷‍♂️ We need to have constructive dialogue with China.
Fanboys: Jensen is such a GOAT. We need to stop viewing China as an enemy.
Max@minordissent

idk if it's good or bad for his career, but Dwarkesh’s willingness to challenge powerful people on his pod is certainly commendable.

English
0
0
0
45
Ashish
Ashish@inqusit·
Claude Opus 4.7 is a beast. It has been deep-researching a topic for 20-plus minutes. A whopping 2,521 sources. For comparison, Opus 4.6 never went above 900 sources or 10 minutes of deep research. I did the same deep research yesterday with GPT 5.4; it is nowhere close. As always, benchmarks won't tell the complete story. I am sure Opus 4.7 is going to be a significant jump in coding quality with Claude Code. A very competitive release indeed.
Ashish tweet media
English
0
0
0
41
Ashish
Ashish@inqusit·
@trigguuuu Vaibhav Raghuvanshi of influencers
Hindi
0
0
1
96
Gagan Choudhary
Gagan Choudhary@trigguuuu·
Never knew Dhruv Rathee's career would be ended by a kid. Seriously!!
English
52
1.4K
5.8K
50.7K
Ashish reposted
blue
blue@bluewmist·
according to philosophy, the highest form of peace is to have zero desire to be understood, admired, pitied or even known.
English
218
6.5K
40.5K
1.1M
Ashish
Ashish@inqusit·
Yes, noticed, and I agree. But the Codex limits are still way higher than Claude Code's. One can get 10–20x more out of 5.4 High. Some people say 5.4 = Sonnet and 5.4 Pro = Opus (5.4 Pro is not even available on the $20 plan). I think 5.4 High is better at context and thinking through a task than Sonnet on high effort. But 5.4 has a drift problem (even after explicit instructions), which is why I would prefer Opus for planning or coding a critical task.
English
0
0
1
165
AndrewB
AndrewB@andrewbatiuk·
@inqusit @claudeai 'no other model provider does this' - have you not seen the recent uproar about Codex?
English
1
0
0
130
Claude
Claude@claudeai·
We've redesigned Claude Code on desktop. You can now run multiple Claude sessions side by side from one window, with a new sidebar to manage them all.
English
2.1K
3.3K
42.9K
6M
Ashish
Ashish@inqusit·
@NickADobos So true. The Claude Pro subscription is useless with Opus 4.6. A couple of requests in a few minutes and you are done for half a day. No one will commit to a Max plan when you make the experience worse for Pro users.
English
0
0
1
214
Nick Dobos
Nick Dobos@NickADobos·
LLM usage limits being 5 hour windows is a fun way to ensure I swap between codex and Claude every day and have no loyalty to either
English
37
20
684
17.1K
Ashish
Ashish@inqusit·
@claudeai The Claude Pro plan is almost unusable with Opus. Reached the hourly limit in just a few requests (3–4), in under half an hour. No other model provider does this. Unprecedented greed, or just pure neglect of entry-level customers?
English
2
0
6
1.4K
Claude
Claude@claudeai·
The redesign also adds an integrated terminal, file editing, HTML and PDF preview, and a faster diff viewer, all in a drag-and-drop layout you can arrange to your preference. Your CLI plugins work exactly as they do on the command line.
English
86
52
2.1K
418.8K
Ashish
Ashish@inqusit·
If the limit is reached (hourly or daily) and I still type out a prompt because I am in flow at the moment, the Claude mobile app immediately discards the whole prompt, deletes it from memory, and refreshes the page: complete information loss. This is trivial iOS dev work. It could at least queue prompts and save them in local on-device memory.
English
0
0
1
217
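The queue-and-save behaviour being asked for is a few lines in any client; a minimal sketch in Python standing in for the equivalent on-device iOS code (the file name and function names are hypothetical):

```python
import json
import time
from pathlib import Path

QUEUE_FILE = Path("pending_prompts.json")  # hypothetical local store

def save_pending(prompt: str) -> None:
    """On a rate-limit error, append the prompt to a local queue
    instead of discarding it."""
    pending = json.loads(QUEUE_FILE.read_text()) if QUEUE_FILE.exists() else []
    pending.append({"prompt": prompt, "queued_at": time.time()})
    QUEUE_FILE.write_text(json.dumps(pending))

def drain_pending():
    """When the limit resets, replay queued prompts in order."""
    if not QUEUE_FILE.exists():
        return []
    pending = json.loads(QUEUE_FILE.read_text())
    QUEUE_FILE.unlink()
    return [item["prompt"] for item in pending]

save_pending("first prompt typed after the limit")
save_pending("second prompt")
print(drain_pending())  # prompts come back in the order they were typed
```

On iOS the same idea maps onto a small file in the app's Documents directory or Core Data; the point is that nothing the user typed is ever dropped.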
Claude
Claude@claudeai·
Scheduled routines let you give Claude a cadence and walk away. Try telling Claude to pull the top bug from Linear every night at 2am, attempt a fix, and open a draft PR. If you've been using /schedule in the CLI, those are routines now, and there's nothing to migrate.
English
12
10
435
198.6K
Claude
Claude@claudeai·
Now in research preview: routines in Claude Code. Configure a routine once (a prompt, a repo, and your connectors), and it can run on a schedule, from an API call, or in response to an event. Routines run on our web infrastructure, so you don't have to keep your laptop open.
Claude tweet media
English
746
1.5K
18.5K
4.5M
Ashish
Ashish@inqusit·
You guys at INC should take a look at the EPFO portal once. Any person trying to file a claim to get his hard-earned money back is blocked by an e-nomination page, and they don't process e-nominations for years. Meaning any jobless person trying to get his money back from the govt EPFO portal is blocked. Then there is a separate grievance website just for raising a complaint, and that portal simply sends an SMS saying call this landline or visit an office. Imagine a person looking to file a complaint: he has to know about this separate portal and then gets blocked by endless bureaucracy. ECI was not a big deal; this is.
English
0
0
0
111
Congress
Congress@INCIndia·
Leader of the Opposition Shri @RahulGandhi addressed a huge public meeting in Murshidabad today. The people of West Bengal are fed up with TMC's corruption and BJP's false promises, and are now ready for change. The Congress party will fight for the people's rights with full strength. 📍 West Bengal
Congress tweet media (×4)
Hindi
30
501
1.8K
14.6K
Ashish
Ashish@inqusit·
You guys at INC should take a look at the EPFO portal once. Any person trying to file a claim to get his hard-earned money back is blocked by an e-nomination page, and they don't process e-nominations for years. Meaning any jobless person trying to get his money back from the govt EPFO portal is blocked. Then there is a separate grievance website just for raising a complaint, and that portal simply sends an SMS saying call this landline or visit an office. Imagine a person looking to file a complaint: he has to know about this separate portal and then gets blocked by endless bureaucracy. ECI was not a big deal; this is.
English
0
0
0
48
Congress
Congress@INCIndia·
There was a workers' protest in Noida yesterday. Just a few days ago there was also a demonstration in Manesar, Haryana. Congress made labour laws in the interest of workers, but the BJP government demolished those laws and brought in 4 labour codes. A decade ago, workers went on strike through trade unions to press their demands; the country was affected and governments talked to them. But the atmosphere has changed now: today, workers are being exploited under the cover of these 4 labour codes. : @kkc_india Chairman @Dr_Uditraj ji 📍 Delhi
Hindi
8
354
756
9.4K
Danny Quiroz
Danny Quiroz@DannyJQuiroz·
🎯 You couldn’t have made a more prescient post. I don’t know if Google is bought into a “too big to fail” thesis or “Anthropic will never compete on ads” but Anthropic has bear hugged every business user like a pathogen. Google’s only consolation is that they own 14% of Anthropic and serve Anthropic models on Google Cloud.
English
2
0
2
1.1K
Steve Yegge
Steve Yegge@Steve_Yegge·
I was chatting with my buddy at Google, who's been a tech director there for about 20 years, about their AI adoption. Craziest convo I've had all year.

The TL;DR is that Google engineering appears to have the same AI adoption footprint as John Deere, the tractor company. Most of the industry has the same internal adoption curve: 20% agentic power users, 20% outright refusers, 60% still using Cursor or equivalent chat tool. It turns out Google has this curve too.

But why is Google so... average? How is it that a handful of companies are taking off like a spaceship, and the rest, including Google, are mired in inaction?

My buddy's observation was key here: There has been an industry-wide hiring freeze for 18+ months, during which time nobody has been moving jobs. So there are no clued-in people coming in from the outside to tell Google how far behind they are, how utterly mediocre they have become as an eng org.

He says the problem is that they can't use Claude Code because it's the enemy, and Gemini has never been good enough to capture people's workflows like Claude has, so basically agentic coding just never really took off inside Google. They're all just plodding along, completely oblivious to what's happening out there right now. Not only is Google not able to do anything about it, they don't seem to be aware of the problem at all.

I'm having major flashbacks to fifty years ago as a kid at the La Brea Tar Pits, asking, "why can't they just climb out?"

My Google friend and I had this conversation over a month ago. I didn't share it because I wanted to look around a bit, and see if it's really as bad as all that. I've been talking to people from dozens of companies since then. And yeah. It's as bad as all that. Google is about average. Some companies at the bottom have near-zero AI adoption and can't even get budget for AI. They may have moats and high walls, but the horde is coming for them all the same.

And then there are a few companies I've met recently who are *amazingly* leaned in to AI adoption. One category-leader company just cancelled IntelliJ for a thousand engineers. That's an incredibly bold move, one of many they're making towards agentic adoption. In my opinion, that company is setting themselves up for a _huge_ W.

As for the rest, well, it's the Great Siloing. Everyone's flying blind. With nobody moving companies, no company knows where they stand on the AI adoption curve. Nobody knows how they're doing compared to everyone else. Half of them just check a box: "We enabled {Copilot/Cursor} for everyone!" Cue smug celebrations. They think this is like getting SOC2 compliance, just a thing they turn on and now it's "solved." And they don't realize that they've done effectively nothing at all.

All because of a hiring freeze.
English
534
461
5.3K
2.7M
Addy Osmani
Addy Osmani@addyosmani·
Much love for your work, Steve. On behalf of @Google, this post doesn't match the state of agentic coding at our company. Over 40K SWEs use agentic coding weekly here. Googlers have access to our own versions of @antigravity, @geminicli, custom models, skills, CLIs and MCPs for our daily work. Orchestrators, agent loops, virtual SWE teams and many other systems are actively available to folks. We'll be writing more about our stacks this year. For personal projects, many engineers actively try the same models and tools the community does - much love to Claude Code, Open Code, Conductor and others - which helps us also learn about opportunities to improve (folks can even use @AnthropicAI's models on Vertex). Many, many of us are here on @X and build our own tools in addition to make sure we're staying sharp. With so many friends working at other frontier labs and start-ups to give ourselves a baseline, Google is anything but average.
English
38
39
998
169.3K
Ashish
Ashish@inqusit·
@Steve_Yegge @demishassabis You are absolutely right. I have not used Antigravity since trying it for the first week of the 3.1 Pro launch. It doesn't write any code that you can ship. You cannot steer it like Claude Code or Codex.
English
0
0
0
234
Steve Yegge
Steve Yegge@Steve_Yegge·
I'm not trying to misrepresent anyone, and perhaps my Googler friends are misinformed. But I strongly suspect that by my own notions of what constitutes advanced AI adoption--and indeed, what most of the industry would expect from Google right now--you are not doing great. At Anthropic, which is basically the bar at this point, everyone is burning, I'd guess, 10M to 15M tokens a day. If Google can convince me that half their engineers are burning 4M tokens a day, then I'd be happy to post a retraction with an apology.
English
120
7
334
186.8K
Ashish
Ashish@inqusit·
@GG_Observatory Although the summary is nice, there are problems with this generation too.
English
0
0
0
7
GG 🦾
GG 🦾@GG_Observatory·
Chunking and scoping are pragmatic but they don't fix the generation quality at the source — they work around it. The real tell is that chunking still requires you to know what the right boundaries are, which means you're doing design work upfront that the model should be able to do. The generation problem and the context problem are two symptoms of the same cause: we're not constraining the model to produce high-quality output by default, we're hoping context management will compensate.
English
1
0
0
21
Ashish
Ashish@inqusit·
The world is obsessed with agentic AI "memory layers": compaction, pruning, vector stores, CLAUDE.md / AGENTS.md files, codified context, hot/cold memory tiers. But we're treating the symptom, not the disease.

Every coding agent (Claude Code, Codex, Cursor, etc.) starts by vomiting verbose, drifting, low-quality code with zero user guardrails. It pollutes the context window, hallucinates architecture violations, and repeats mistakes. Then we bolt on another agentic memory system to summarize, prune, autosave, and "figure out what's important." It's counterproductive. Memory frameworks are just duct tape trying to contain the firehose of mediocre generations.

Research and real-world builds (including Anthropic's own harness experiments and the "Codified Context" approach for 100K+ LOC projects) show bloated context files decrease performance. Context pollution compounds errors. Subagents explode RAM. Long sessions forget project conventions anyway.

The root fix isn't better memory management. It's control at the source of code generation: custom adaptive harnesses that learn your project's architecture, enforce user preferences, run quality gates (self-critique, tests, linting in-loop), and use progressive disclosure plus structured orchestration (planning, generation, and evaluation agents). Harnesses aren't overhead; they're the product. They turn general models into disciplined ones. Fine-tuned code-gen models trained on high-signal data: concise, idiomatic, architecture-aware code with strict "avoid unnecessary bloat" standards, plus RL for fidelity to specs. No more drift, by design. Memory layers become simple and reliable once generations are high-quality by default.

We're adding instructions on top of a broken hose instead of fixing the nozzle. Builders, stop chasing perfect memory. Prioritize generation quality and adaptive control. That's where the next leap in reliable agentic coding happens, not another pruning trick.

The future isn't agents that remember everything. It's agents that generate only what matters from the start.
Harrison Chase@hwchase17

x.com/i/article/2042…

English
3
1
2
181
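The in-loop quality-gate idea in the post above can be made concrete; a hypothetical sketch with a stub standing in for the model call, where the gate is a bare parse check (a real harness would chain linters, tests, and architecture checks), and all names are illustrative:

```python
def quality_gate(code):
    """In-loop gate: reject a candidate that doesn't even parse.
    Returns None on success, or an error message to feed back."""
    try:
        compile(code, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"

def harness(generate, task, max_attempts=3):
    """Plan -> generate -> evaluate loop: gate failures are fed back
    to the generator instead of polluting the context window."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        feedback = quality_gate(candidate)
        if feedback is None:
            return candidate  # only clean code leaves the harness
    raise RuntimeError(f"no passing candidate for task: {task}")

# Stub generator: emits broken code first, then "fixes" it on feedback.
def stub_model(task, feedback):
    return "def f(:" if feedback is None else "def f(x):\n    return x + 1"

print(harness(stub_model, "write an increment function"))
```

The design point is that failures never reach the outer context: the gate's feedback loops back into generation, so memory only ever has to store output that already passed.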