Niklas Sheth
@niklassheth
1.1K posts

Let's craft.

NY · Joined January 2012
134 Following · 45 Followers
andy (@1a1n1d1y):
okay here is my comment: very very often after 4.6, opus will ask a question or make a suggestion, and then just do it. this extends to a lot of things it does, but electing to destroy instances costs me actual real money and it's getting super tiresome, so i'm shaming publicly
andy (@1a1n1d1y):
presented without comment
[image]
noel (@noel_bhe):
@uwukko Very odd, I've never encountered these issues with zed. And I'm using it for both work and private projects.
wukko (@uwukko):
file indexing, code indenting, and basic file operations are broken beyond comprehension, so i'm back to vs code
[image]
Quoting wukko (@uwukko):
@cheatyyyy yes, and i just switched to it again, because my macbook would burn otherwise
Niklas Sheth (@niklassheth):
@natolambert I wonder how long it will be until frontier models are no longer available through a text completion API. I'd guess 2 years
Nathan Lambert (@natolambert):
This was actually already policy. Regardless, destroying demand was coming with undercapacity, and increasing verticalization/integration is the right move. Perfect move in fact, despite people being understandably mad.
Quoting Boris Cherny (@bcherny):
Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
Niklas Sheth (@niklassheth):
@AndyMasley The false notion that facts help people make informed decisions 😭
[image]
Niklas Sheth (@niklassheth):
@jaxgriot @leothecurious It's impressive for its size; it did well on basic computations like solving a quadratic equation. Knowledge is severely lacking, of course.
[image]
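As a minimal illustration (my own sketch, not from the thread) of the kind of basic computation being tested, solving ax² + bx + c = 0 with the quadratic formula:

```python
import math

def solve_quadratic(a: float, b: float, c: float) -> list[float]:
    """Return the real roots of a*x^2 + b*x + c = 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                      # no real roots
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]

# Example: x^2 - 5x + 6 = 0 has roots 3 and 2.
print(solve_quadratic(1, -5, 6))       # [3.0, 2.0]
```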
em (@jaxgriot):
@leothecurious it was quite dumb when i used it
davinci (@leothecurious):
is this the most "overtrained" LLM out there or what? if this really works and doesn't hallucinate as much as i expect it to at this scale, there are so many practical applications where this thing can be put to productive work at unmatched efficiency.
Quoting Liquid AI (@liquidai):
Trained on 28T tokens with scaled RL, LFM2.5-350M is a step change from LFM2-350M:
> instruction following: 18.20 → 40.69
> data extraction: 11.67 → 32.45
> tool use: 22.95 → 44.11
These are the capabilities that matter in production.
ken (@aquariusacquah):
pour one out for github's last 9 of uptime
[image]
Niklas Sheth (@niklassheth):
The Claude app and Claude Desktop are so janky that I switched back to the CLI. I appreciate how fast they’re shipping but it’s not a good look
[image]
Niklas Sheth (@niklassheth):
@casper_hansen_ What batch size is the 300 tok/s at? If it's 1, that's not very economical for 8xB200: about 1M tok in an hour with 8xB200 @ $5/GPU/hr = $40/MTok
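A rough sketch of the arithmetic behind that estimate (the 300 tok/s, 8xB200, and $5/GPU/hr figures are the assumptions in the reply above, not measured numbers):

```python
# Back-of-the-envelope cost per million tokens at batch size 1,
# using the figures assumed in the reply above.
tokens_per_second = 300        # claimed decode speed
num_gpus = 8                   # 8x B200
gpu_hourly_rate = 5.0          # assumed $/GPU/hour

tokens_per_hour = tokens_per_second * 3600           # ~1.08M tokens/hour
cluster_cost_per_hour = num_gpus * gpu_hourly_rate   # $40/hour

cost_per_mtok = cluster_cost_per_hour / (tokens_per_hour / 1e6)
print(f"~${cost_per_mtok:.0f} per million tokens")   # ~$37/MTok; the reply rounds to ~$40
```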
Casper Hansen (@casper_hansen_):
every inference engine should have a section in their docs with exact commands to achieve the best possible tokens/s on the most popular models. i'm told kimi k2.5 can run at 300 tokens/s on B200s if you run nvfp4 with speculative decoding in open source
Niklas Sheth (@niklassheth):
@JesseTayRiver Surprising that they'd document it instead of labeling them as "for manufacturer use only" or something
Jesse Smith (@JesseTayRiver):
Air conditioner and heat pump manufacturers often reuse the same unit across multiple capacities and put limits on the control board. Want to switch your 2 ton air conditioner to 3 tons? Easy as flipping a switch
[image]
Niklas Sheth (@niklassheth):
@skydotcs I can tell that's 5.3 Instant, use GPT-5.4 at least
sky (@skydotcs):
wow i absolutely hate talking to this machine, is anthropic any better?
[image]
John (@jrysana):
@daniel_mac8 In other words, *per-GPU* this represents a total of ~11.5k tok/sec across e.g. 230 users each getting 50 tok/sec. Which is good, but not atypical or "warp speed" (by which I assume you meant a substantial leap over the status quo).
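A quick sketch of the arithmetic in this exchange (the 1M tok/s aggregate and 96 GPUs come from the quoted tweet below; the 230-users-at-50-tok/s split is the reply's illustrative figure):

```python
# Break the claimed aggregate throughput down per GPU and per user.
aggregate_tok_per_sec = 1_000_000   # claimed cluster-wide throughput
num_gpus = 96                       # B200s in the quoted setup

per_gpu = aggregate_tok_per_sec / num_gpus
print(f"{per_gpu:,.0f} tok/s per GPU")   # ~10,400; the reply rounds to ~11.5k

# The reply's illustrative split of that per-GPU budget:
users_per_gpu = 230
per_user = 50
print(f"{users_per_gpu * per_user:,} tok/s if {users_per_gpu} users each get {per_user} tok/s")  # 11,500
```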
Dan McAteer (@daniel_mac8):
a gcp eng networked 96 b200s and ran qwen 3.5 27B at a warp speed of 1 million tokens per second. did you know you can do that?
[image]
Niklas Sheth (@niklassheth):
ARC-AGI-3 is basically impossible for LLMs right now because there's no inter-frame compression, so it consumes context really fast. Tokenization is so ingrained in LLM development because it's simple and effective, but we're hitting the limits now. New ideas are needed.
gabriel (@gabriel1):
hello friends
Niklas Sheth (@niklassheth):
@Setuna7777_2 That is odd, especially since Muon-pretrained models (like Kimi K2.5) do better with Muon as the SFT optimizer
[image]
Ryan Moulton (@moultano):
I wish this was contextualized with something like "At each 10%ile of reading ability, here's a test question we'd expect half of respondents to get wrong." Apparently average reading ability is 7th-8th grade, but more than half of adults have attended college?