rihim

5.8K posts

@rihim_s

incoming @EverpureData | computer engineering @UCSB class of 2027 | @ucsbNLP | prev: swe intern @Cisco

Joined November 2020

809 Following · 155 Followers

rihim@rihim_s·6h

@HououinTyouma i’m getting the opposite now they say all my ideas are dead ends and suck and i should give up and there’s no point in continuing on it

Mad ML scientist@HououinTyouma·22h

concerning

286

Mad ML scientist@HououinTyouma·22h

as a test, asking GPT5.5 Pro to apply abandoned ML research from the dark ages (2019) to transformers

Mad ML scientist@HououinTyouma

every time when gpt 5.5 can't implement my ML ideas after I learned what they did to fable

2.8K

rihim@rihim_s·6h

@xeophon wait this is so smart all the coding harnesses use explorer subagents so it makes sense to fine tune a model for that purpose; could see copilot using this as a default explorer agent. maybe cursor distills their next composer model and fine tunes it for this and same for oai/ant

900

Florian Brand@xeophon·6h

good stuff from microsoft: 4B model just to explore code bases, cutting token costs by 10-50% (!!!) while the performance of the big model stays the same :)

610

23K

rihim@rihim_s·10h

@_ueaj @AndrewCurran_ that fully makes sense given how limited the interface is to use llms - just writing text strips so much nuance and taste that knowing how to prompt it really matters a lot. seems like fable had that intuition of understanding built in from the other side as well

ueaj@_ueaj·10h

I think using LLMs to their fullest is an empathy skill check (i.e. how quickly can you intuit how to communicate/prompt with them) and I consider myself really good at this. So I think I got a uniquely good taste of what the ceiling on performance was and it was obscene. Unbelievable autonomy and understanding on very abstract ideas, very good judgement on vague things that come up during implementation, etc. Probably 2-3x uplift over opus 4.8 at least

Andrew Curran@AndrewCurran_·11h

x.com/i/article/2066…

166

356

2.6K

rihim@rihim_s·10h

@DWestkrew @thesnufki @haider1 not necessarily, it can outperform mythos AND they don’t have to compare it cause it isn’t an accessible model. kinda shady but i can see an argument being made that it’s the best frontier model available

D@DWestkrew·12h

@thesnufki @haider1 That means mythos better so option 2

Haider.@haider1·16h

openai trying to survive june 23rd:
release gpt-5.6, call it better than Mythos, and get banned immediately
release gpt-5.6 and say Mythos is still better, then watch everyone roast it

288

15.3K

rihim@rihim_s·10h

@ThiccQuidity @Kalshi that this time the government banned it in less than a week after release

ThiccQuidity@ThiccQuidity·23h

@Kalshi they say this with, literally, every single new model. what else is new?

171

4.7K

Kalshi@Kalshi·23h

BREAKING: Anthropic says its newest AI model is too powerful to release to the public

510

278

3.8K

546.7K

rihim@rihim_s·10h

@_ueaj @AndrewCurran_ damn i didn’t use it out of principle cause of the ML research lora nerf but now i wish i had

ueaj@_ueaj·11h

@AndrewCurran_ to think they had that model ~5 months ago (or at least a less post-trained version). For the brief time I used it, and even with the frontier flagging, yeah, it's no normal model.

3.6K

rihim@rihim_s·10h

@hopes_revenge

GIF

hope hopes hoping@hopes_revenge·20h

gender gap monogamy relationships where one person has more gender than the other who also has a monogamy fetish ex mormon biomarkers are so normalized especially when bisexuality is in play or theyfab calico critters cottage core sexual neglect is a gradient dissent

154

rihim@rihim_s·10h

@giffmana

Lucas Beyer (bl16)@giffmana·1d

Butthole logo for AI is over, now we have wave-ish logo as the new standard. I don't make the rules.

203

18.8K

rihim@rihim_s·1d

@teortaxesTex yeah 4.6 was quite special, surprised 4.7 is higher than 4.8; 4.7 seems more rl’d to me

204

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·1d

Incredible. 4.6 does smell the biggest (least RL-degraded) to me; this looks like a very sensitive eval. Yes, V4-Pro vs V4-Flash do have a roughly 0.1 Opus' worth of gap in perceived size and capability.

kalomaze@kalomaze

i am trying to work on the closest thing possible to a true "big model smell" eval
which is to say: something that measures something that clever post training can't trivially gap, and is cheap + topically diverse
i can't test mythos for obvious reasons, but... hmm...

137

14.1K

rihim reposted

Shannon Sands@max_paperclips·1d

There's a much funnier thing ensembles unlock though (if it's consistent). It doesn't matter if it's inefficient, really. if you can get Mythos by throwing 3 or 4 other weaker models in a trenchcoat, you can distill from the ensemble directly. I wonder, how many Qwens + Kimi's + Deepseeks + GLMs do you need to throw in a trenchcoat to get Mythos quality data? Can you stack enough 9b's to reach heaven?

2.3K

rihim@rihim_s·1d

@max_paperclips @LokiJulianus @teortaxesTex the only issue is the model providers themselves limiting it lmao; ironically neither of the actual model developers can do it cause they can’t use their model in that way

Shannon Sands@max_paperclips·1d

@LokiJulianus @teortaxesTex it's so stupid and bitter-pilled it could work

149

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·1d

> And it landed within 1% of Fable 5 while costing roughly half the price.
I have reservations about how this generalizes, but do you understand that this implies like 20 times as many tokens? In terms of compute, it's not close either (Fable was overpriced). Well, good cope

OpenRouter@OpenRouter

Notably, the budget panel was comparable with Claude Fable 5 in performance. A panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro, fused together, beat solo GPT-5.5 and solo Opus 4.8 outright. And it landed within 1% of Fable 5 while costing roughly half the price.

102

10.5K

rihim@rihim_s·1d

@gfodor speak softly and carry a big stick

377

gfodor.id@gfodor·1d

Hilariously it was clear that Anthropic convinced the government their models are far ahead. Meanwhile OpenAI drops far bigger godshogs behind dropdowns and CLI switches and points at their free mini smol models like any competent org should who wants to avoid the ban hammer.

387

17.5K

rihim@rihim_s·1d

@viemccoy i'd assume the main issue is cost, right? given how inefficient some of the current models are with token output it'd probably cost a multitude more than running the good model (ofc it's extraordinary times rn)

318

𝚟𝚒𝚎 ⟢@viemccoy·1d

in college me and the professor I was working with did this with gpt-3.5-turbo and got gpt-4 level performance we never published it but Varibot remains forever vindicated.

OpenRouter@OpenRouter

How does it work? When you send a prompt to Fusion, we fan it out to a panel of models in parallel, each with web search and bash tools enabled. A judge model reads every response and extracts the structure: consensus points, contradictions, partial coverage, unique insights, blind spots. Chatroom: openrouter.ai/fusion

180

16.6K

rihim@rihim_s·1d

@mweinbach slightly unrelated but the new ui looks so damn good here, i thought it was a physical glass thing oval at the beginning of the video

163

Max Weinbach@mweinbach·1d

Siri can help you do your expenses

105

3.4K

261.5K

rihim@rihim_s·2d

@december4th1980 @Sockppp1 @craicrailicious @kevsack that's not a lot

homelander@december4th1980·2d

@Sockppp1 @craicrailicious @kevsack 39 million gallons of water a day just from ChatGPT

138

Tuxedo@kevsack·3d

Kill us all

..@chaos2x

115

13.6K

186.4K

3.6M

rihim@rihim_s·2d

@stochasticchasm @anacreonte_ probably nothing tho

stochasm@stochasticchasm·2d

@anacreonte_ hmm so we're ahead of schedule

104

2.7K

stochasm@stochasticchasm·2d

so when does ai2027 say this is supposed to happen

358

29.9K

rihim@rihim_s·2d

@max_spero_ someone who’s a us born citizen at anthropic needs to task mythos on finding a new prime RIGHT NOW

457

Max Spero@max_spero_·2d

Is Claude Fable 5 the newest illegal number?

450

13.7K

rihim@rihim_s·3d

@teortaxesTex oh do you think scale is going to be an important factor going forward? given their low level systems-ish work i’d argue that while more compute might make relatively better models it wouldn’t necessarily help with the fundamental changes needed for the next step change

421

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·3d

@rihim_s moonshot has comparable vision but I don't think they'll get the compute for the next stage

1.2K

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·3d

Top 3 labs: Anthropic, OpenAI, DeepSeek. Maybe other Chinese labs qualify in taste but not in ambition. Google, xAI aren't worth the GPUs they've bought, so they're renting it out to Anthropic. Close to Amazon tier, actually. Embarrassing skill issue.

348

21.8K

rihim@rihim_s·3d

@threepointone @cursor_ai lowkey same i think fable should be used like 5.5 pro in the sense that it’s not the main agent working on the codebase but the monk in a cave that you turn to when all else fails

sunil pai@threepointone·3d

spent all day on fable for a giant PR. ~10kloc, lots of testing and intervention. 250$. I... don't think it's worth it? happy with 4.8/5.5, and the quality of work is better when it's smaller steps. Still rocking @cursor_ai, that's software that I still love using on the daily.

600

114.8K

rihim@rihim_s·3d

@aannuujX @mufasaYC literally not but ok

English

121

dope-a-meme in SF@aannuujX·3d

@mufasaYC haha its just gemini with a system prompt, memory and different ui layer🥲

English

1.4K

Mustafa Yusuf@mufasaYC·3d

I did something I never thought I would, I reached out to Siri instead of ChatGPT and honestly it was better and gave me an excellent output with less text which was to the point 🤯

367

108.8K
