Niklas Sheth
@niklassheth
1.1K posts

Let's craft.

NY · Joined January 2012
134 Following · 45 Followers
andy (@1a1n1d1y):
okay here is my comment: very very often after 4.6, opus will ask a question or make a suggestion, and then just do it. this extends to a lot of things it does, but electing to destroy instances costs me actual real money and it's getting super tiresome, so i'm shaming publicly
andy (@1a1n1d1y):
presented without comment
[image]
noel (@noel_bhe):
@uwukko Very odd, I've never encountered these issues with zed. And I'm using it for both work and private projects.
wukko (@uwukko):
file indexing, code indenting, and basic file operations are broken beyond comprehension, so i'm back to vs code
[image]
Quoting wukko (@uwukko):
@cheatyyyy yes, and i just switched to it again, because my macbook would burn otherwise
Niklas Sheth (@niklassheth):
@natolambert I wonder how long it will be until frontier models are no longer available through a text completion API. I'd guess 2 years
Nathan Lambert (@natolambert):
This was actually already policy. Regardless, destroying demand was coming with undercapacity, and increasing verticalization/integration is the right move. Perfect move in fact, despite people being understandably mad.
Quoting Boris Cherny (@bcherny):
Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
Niklas Sheth (@niklassheth):
@AndyMasley The false notion that facts help people make informed decisions 😭
[image]
Niklas Sheth (@niklassheth):
@jaxgriot @leothecurious It's impressive for its size; it did well on basic computations like solving a quadratic equation. Knowledge is severely lacking, of course.
[image]
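As a minimal illustration (my own sketch, not from the thread) of the kind of basic computation being tested, solving ax² + bx + c = 0 with the quadratic formula:

```python
import math

def solve_quadratic(a: float, b: float, c: float) -> list[float]:
    """Return the real roots of a*x^2 + b*x + c = 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                      # no real roots
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]

# Example: x^2 - 5x + 6 = 0 has roots 3 and 2.
print(solve_quadratic(1, -5, 6))       # [3.0, 2.0]
```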
em (@jaxgriot):
@leothecurious it was quite dumb when i used it
davinci (@leothecurious):
is this the most "overtrained" LLM out there or what? if this really works and doesn't hallucinate as much as i expect it to at this scale, there are so many practical applications where this thing can be put to productive work at unmatched efficiency.
Quoting Liquid AI (@liquidai):
Trained on 28T tokens with scaled RL, LFM2.5-350M is a step change from LFM2-350M:
> instruction following: 18.20 → 40.69
> data extraction: 11.67 → 32.45
> tool use: 22.95 → 44.11
These are the capabilities that matter in production.
ken (@aquariusacquah):
pour one out for github's last 9 of uptime
[image]
Niklas Sheth (@niklassheth):
The Claude app and Claude Desktop are so janky that I switched back to the CLI. I appreciate how fast they’re shipping but it’s not a good look
[image]
Niklas Sheth (@niklassheth):
@casper_hansen_ What batch size is the 300 tok/s at? If it's 1, that's not very economical for 8xB200: about 1M tok in an hour with 8xB200 @ $5/GPU/hr = $40/MTok
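A rough sketch of the arithmetic behind that estimate (the 300 tok/s, 8xB200, and $5/GPU/hr figures are the assumptions in the reply above, not measured numbers):

```python
# Back-of-the-envelope cost per million tokens at batch size 1,
# using the figures assumed in the reply above.
tokens_per_second = 300        # claimed decode speed
num_gpus = 8                   # 8x B200
gpu_hourly_rate = 5.0          # assumed $/GPU/hour

tokens_per_hour = tokens_per_second * 3600           # ~1.08M tokens/hour
cluster_cost_per_hour = num_gpus * gpu_hourly_rate   # $40/hour

cost_per_mtok = cluster_cost_per_hour / (tokens_per_hour / 1e6)
print(f"~${cost_per_mtok:.0f} per million tokens")   # ~$37/MTok; the reply rounds to ~$40
```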
Casper Hansen (@casper_hansen_):
every inference engine should have a section in their docs with exact commands to achieve the best possible tokens/s on the most popular models. i'm told kimi k2.5 can run at 300 tokens/s on B200s if you run nvfp4 with speculative decoding in open source
Niklas Sheth (@niklassheth):
@JesseTayRiver Surprising that they'd document it instead of labeling them as "for manufacturer use only" or something
Jesse Smith (@JesseTayRiver):
Air conditioner and heat pump manufacturers often reuse the same unit across multiple capacities and put limits on the control board. Want to switch your 2 ton air conditioner to 3 tons? Easy as flipping a switch
[image]
Niklas Sheth (@niklassheth):
@skydotcs I can tell that's 5.3 Instant, use GPT-5.4 at least
sky (@skydotcs):
wow i absolutely hate talking to this machine, is anthropic any better?
[image]
John (@jrysana):
@daniel_mac8 In other words, *per-GPU* this represents a total of ~11.5k tok/sec across e.g. 230 users each getting 50 tok/sec. Which is good, but not atypical or "warp speed" (by which I assume you meant a substantial leap over the status quo).
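A quick sketch of the arithmetic in this exchange (the 1M tok/s aggregate and 96 GPUs come from the quoted tweet below; the 230-users-at-50-tok/s split is the reply's illustrative figure):

```python
# Break the claimed aggregate throughput down per GPU and per user.
aggregate_tok_per_sec = 1_000_000   # claimed cluster-wide throughput
num_gpus = 96                       # B200s in the quoted setup

per_gpu = aggregate_tok_per_sec / num_gpus
print(f"{per_gpu:,.0f} tok/s per GPU")   # ~10,400; the reply rounds to ~11.5k

# The reply's illustrative split of that per-GPU budget:
users_per_gpu = 230
per_user = 50
print(f"{users_per_gpu * per_user:,} tok/s if {users_per_gpu} users each get {per_user} tok/s")  # 11,500
```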
Dan McAteer (@daniel_mac8):
a gcp eng networked 96 b200s and ran qwen 3.5 27B at a warp speed of 1 million tokens per second. did you know you can do that?
[image]
Niklas Sheth (@niklassheth):
ARC-AGI-3 is basically impossible for LLMs right now because there's no inter-frame compression, so it consumes context really fast. Tokenization is so ingrained in LLM development because it's simple and effective, but we're hitting the limits now. New ideas are needed.
gabriel (@gabriel1):
hello friends
Niklas Sheth (@niklassheth):
@Setuna7777_2 That is odd, especially since Muon-pretrained models (like Kimi K2.5) do better with Muon as the SFT optimizer
[image]
Ryan Moulton (@moultano):
I wish this was contextualized with something like "At each 10%ile of reading ability, here's a test question we'd expect half of respondents to get wrong." Apparently average reading ability is 7th-8th grade, but more than half of adults have attended college?