fallpeak
@_fallpeak

1K posts
pseudonymous identities have a storied history https://t.co/iW4Q7e5zlV

Joined August 2025
188 Following · 52 Followers
Pinned Tweet
fallpeak @_fallpeak
The only thing worse than seeing all these bot posts getting real engagement is going to be the day I open up my browser and don't see any bot posts at all.
0 replies · 0 reposts · 8 likes · 665 views
fallpeak @_fallpeak
@NousResearch @karpathy
>pull out illustrated diagram explaining what is quality and what is slop
>she laughs and says "it's a good novel sir"
>download the novel (to be continued)
1 reply · 0 reposts · 0 likes · 236 views
Nous Research @NousResearch
Hermes Agent wrote a novel. "The Second Son of the House of Bells" runs 79,456 words across 19 chapters. The agent built its own pipeline to do it, using the same modify-evaluate-keep/discard loop as @karpathy's Autoresearch but applied to fiction: world-building, chapter drafting, adversarial editing, Opus review loops, LaTeX typesetting, cover art, audiobook generation, and landing page setup. Book: nousresearch.com/bells Code: github.com/NousResearch/a…
Nous Research tweet media
emozilla @theemozilla

it's been a longstanding dream of mine to build an ai system that can tell a compelling story. it's what got me started in the space in the beginning, and with Hermes Agent I finally pulled it off. 100% written, typeset, etc. by Hermes Agent. those at our gtc event got hard copies🤗

58 replies · 87 reposts · 1.1K likes · 121.2K views
fallpeak @_fallpeak
@wispem_wantex I just want to know if it's actually possible to "resistance train" eyesight via the obvious process of wearing slightly underpowered lenses. I don't trust optometrists to actually study this without bias, fitness broscience would be more trustworthy IMO
0 replies · 0 reposts · 6 likes · 2K views
fallpeak @_fallpeak
@simonw That's fair and I'm not trying to accuse anyone of malfeasance, I'm just griping because "how heavily did you cut this down" is the number one question I have whenever I see a big model running on tiny hardware.
0 replies · 0 reposts · 2 likes · 49 views
Simon Willison @simonw
Dan says he's got Qwen 3.5 397B-A17B - a 209GB on disk MoE model - running on an M3 Mac at ~5.7 tokens per second using only 5.5 GB of active memory (!) by quantizing and then streaming weights from SSD (at ~17GB/s), since MoE models only use a small subset of their weights for each token
Dan Woods @danveloper

x.com/i/article/2034…

93 replies · 180 reposts · 1.9K likes · 244.7K views
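Simon's numbers invite a quick sanity check. A back-of-envelope sketch, not a description of Dan's actual setup: it assumes the quoted figures (397B total parameters, 17B active, 209GB on disk, ~17GB/s SSD reads) and that quantization is uniform across all weights.

```python
# Figures quoted in the thread (assumptions, not measurements of Dan's setup).
TOTAL_PARAMS = 397e9   # Qwen 3.5 397B-A17B total parameters
ACTIVE_PARAMS = 17e9   # parameters activated per token (the "A17B")
DISK_BYTES = 209e9     # quantized model size on disk
SSD_BW = 17e9          # bytes/s streamed from SSD, per the tweet

# Effective quantization: bytes stored per parameter.
bytes_per_param = DISK_BYTES / TOTAL_PARAMS          # ~0.53, i.e. roughly 4-bit

# Weights an MoE model actually touches for one token.
active_bytes = ACTIVE_PARAMS * bytes_per_param       # ~8.9 GB

# If every token's experts came cold off the SSD, throughput would cap at:
cold_tok_per_s = SSD_BW / active_bytes               # ~1.9 tok/s

print(f"{bytes_per_param:.2f} bytes/param")
print(f"{active_bytes / 1e9:.1f} GB of weights touched per token")
print(f"{cold_tok_per_s:.1f} tok/s ceiling with zero expert reuse")
```

The reported ~5.7 tok/s sits well above this zero-reuse ceiling, which is consistent with hot experts being cached in the ~5.5GB of resident memory rather than re-streamed every token.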
fallpeak @_fallpeak
@xpasky @scaling01 I like it, feels a lot like M2.5 but with a bit less of that autistic tendency to take you extremely literally and get to work without asking for clarification. Definitely not frontier-level smart, and it degrades above ~80k context, but it's fast and takes direction well.
1 reply · 0 reposts · 1 like · 39 views
fallpeak @_fallpeak
@simonw It feels misleading to report "5.5 tok/s" up top and then hide a "(with less than half the usual expert count)" multiple paragraphs away. I guess in some sense it's no more misleading than using a quant at all, but it feels different somehow
1 reply · 0 reposts · 1 like · 387 views
fallpeak @_fallpeak
Wow MiniMax M2.7 is actually quite good and feels a bit less autistic than M2.5 was, that extra user conversation training definitely had an effect
0 replies · 0 reposts · 1 like · 24 views
fallpeak @_fallpeak
@wispem_wantex A rule that works pretty well for me is that if nobody cares enough to advocate for something specific we're defaulting to soup in the pressure cooker
0 replies · 0 reposts · 1 like · 14 views
fallpeak @_fallpeak
@xlr8harder My primary objection is the operator precedence ambiguity. It should be written PFLOP-days/s
0 replies · 0 reposts · 1 like · 66 views
fallpeak @_fallpeak
@PalmyrPar Are you really confused? Because it seems to me like political polarization fully explains the phenomenon.
1 reply · 0 reposts · 0 likes · 373 views
fallpeak @_fallpeak
@goopium If your theory doesn't explain completely unforced "origin story" errors like when they decided that Han Solo's name really needed a backstory justification, it's probably not correct
0 replies · 0 reposts · 2 likes · 484 views
fallpeak @_fallpeak
@goblinodds You can just use Unix user accounts to run the AI, they're literally designed for exactly this sort of isolation. But if you want a real suggestion, Beelink Mini S13
0 replies · 0 reposts · 1 like · 23 views
2HP goblin advisor @goblinodds
husband is thinking of getting a separate computer to run claude code dangerously skipping permissions (bc sandboxing safely is hard) anyone have recs, things to consider?
100 replies · 1 repost · 218 likes · 96.2K views
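The Unix-account approach above can be sketched minimally. Everything here is illustrative: the `claude-agent` username is hypothetical (assumed already created with something like `useradd -m`), and a real setup would also want filesystem permissions, resource limits, and network policy on top of this.

```python
import subprocess


def sandboxed_cmd(user, argv):
    """Wrap argv so it runs as a dedicated low-privilege Unix user via sudo.

    The point of the tweet: Unix user accounts already give process and
    file isolation, so an agent running as 'user' can only touch files
    that user owns or that are world-accessible.
    """
    return ["sudo", "-u", user, "--", *argv]


# Hypothetical usage: launch the agent under a throwaway account instead
# of your own. (Not run here; we just print the command that would run.)
cmd = sandboxed_cmd("claude-agent", ["claude", "--dangerously-skip-permissions"])
print(" ".join(cmd))
```

This is deliberately the simplest possible shape; `subprocess.run(cmd)` would execute it, and stricter variants would add `systemd-run` scoping or a chroot.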
fallpeak @_fallpeak
@Esoteric_Though @JoePostingg Letting user app behavior directly influence the duration of long-running transactions in your DB would certainly be an interesting design choice, in the "interesting times" sense.
0 replies · 0 reposts · 2 likes · 192 views
EsotericThoughts @Esoteric_Though
@JoePostingg Do that to an app with any sort of backend connection and transactions still open and you will definitely risk “misbehavior”
4 replies · 0 reposts · 181 likes · 12.1K views
Joe @JoePostingg
"If you force stop an app it may misbehave" No it won't. That's not true. It has never happened.
46 replies · 602 reposts · 18.6K likes · 216.9K views
fallpeak @_fallpeak
@bartlebytaco American cheese nachos sounds like exactly the sort of cooking I'd expect to see from a guy named Jack Seed TBH
0 replies · 0 reposts · 1 like · 238 views
sebastian castillo @bartlebytaco
jacques pépin videos in his old age are either him showing you how to expertly debone an entire salmon using a small knife he's had for 50 years or him microwaving american cheese on top of a bowl of croutons and saying "it is french nacho"
52 replies · 158 reposts · 2.8K likes · 107.2K views
fallpeak @_fallpeak
@xpasky When someone is smarter than you are, it's very hard to evaluate exactly how much smarter. Thus if someone is dumber than Qwen or MiniMax they might genuinely not be able to tell the difference vs Opus
0 replies · 0 reposts · 1 like · 331 views
fallpeak @_fallpeak
One point strongly in Hunter Alpha's favor: it generates significantly fewer slop names than typical models. The slop-name rate is something like 10%, and it seems to be front-loaded, suggesting it would occur even less often in contexts longer than my tests.
0 replies · 0 reposts · 0 likes · 16 views
fallpeak @_fallpeak
Several more oneshot "build a thing" prompts in, Hunter Alpha still feels pretty GLM-5 / Kimi K2.5 tier to me. Of course all of these exercise creativity and independent judgement more than intelligence per se, but it's not obviously better in those dimensions either.
1 reply · 0 reposts · 0 likes · 89 views
fallpeak @_fallpeak
1T params with 1M context you say? Are the Two More Weeks finally over?
fallpeak tweet media
1 reply · 0 reposts · 0 likes · 24 views