efe

2.1K posts

@extliqprovider

druckenmiller is my GOAT what the fuck is this world? you can't hedge a worldview

Joined February 2025
539 Following 83 Followers
Pinned Tweet
efe @extliqprovider
@HedgieMarkets you can't hedge a worldview
GIF
0
0
13
3.2K
efe retweeted
davinci @leothecurious
machines of selectively loving grace
7
69
657
14.5K
efe @extliqprovider
@AgustinLebron3 @AnthropicAI this is just consistent with their prior behaviour and thinking. if they had the mandate then they are not losing it, because they are the same anthropic
0
0
1
356
Agustin Lebron @AgustinLebron3
Again, they're not nerfing ML research by refusing requests. Instead, it quietly sabotages users by lying to them. @AnthropicAI is steadily losing the Mandate of God.
Jeremy Howard @jeremyphoward

@karpathy This is not a day for celebrating, Andrej. It's a very dark and very sad day, and the damage may be impossible to undo.

12
8
245
19.5K
efe @extliqprovider
@bubbleboi @zephyr_z9 btw elon said anthropic is good people a month ago 😂😂
0
0
0
464
bubble boi @bubbleboi
Have canceled my team subscription for Claude Pro. Idc how good that model is, it’s not good enough for me to support people who actively stifle innovation and gate keep knowledge that they didn’t even create.
115
204
4.3K
127.3K
efe @extliqprovider
@basedjensen gpt 5.6 + oss and i will worship oai
0
0
4
658
Hensen Juang @basedjensen
All oai folks now have to do is to release the big boy they have without sandbagging and anthropic will start hemorrhaging market share right before ipo
18
33
904
28.6K
efe @extliqprovider
@gbrl_dick dario was honest from the beginning that there shouldnt be any open source ai
0
0
6
165
Gabriel @gbrl_dick
late night post from me on the Mythos and Fable 5 launch for MTS i am generally inclined to take Anthropic at their word. but the AI research safeguards—in the absence of a Glasswing for AI—raise some questions.
[3 images]
6
3
53
5.5K
trotsky @LeonHowqua
@extliqprovider @DevelopmentsAI @teortaxesTex At this current point in time, no fancy new math/science is really needed to improve the LLMs. It's just more efficient training code, architectural experiments, scaffolding, data generation, RL environment building etc, which is achieved with better coding capabilities
1
0
2
52
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
I really don't think OpenAI is going to let this slide. I've been saying it for a long time, the real inflection was when they reached 5.2. I have no clear insight on what they currently have internally, but if they haven't made a Mythos/Fable yet, it was *a choice*.
Andrew Curran @AndrewCurran_

The internal boost from Mythos-assisted development since February is just too big. Anthropic is pulling away from the pack for the first time, and at the same time they are also speeding up. The race legitimately feels like it is changing for the first time in years.

16
9
465
47.4K
efe @extliqprovider
@hu_yifei only if oai has published a 240b model
0
0
1
176
Yifei Hu @hu_yifei
Hear me out. Claude Fable 5 scored 65, gpt-oss-120b scored 33. You run gpt-oss twice you will have a combined score of 66, better than Claude Fable 5 and cheaper. Thank me later.
[image]
5
0
41
4.9K
efe @extliqprovider
@LeonHowqua @DevelopmentsAI @teortaxesTex isnt coding just a tool to implement your research ideas? how can rsi be achieved if this model is only good at coding and mid tier at maths/science/etc? not assuming mythos is bad at math but coding is just one vertical
1
0
0
57
trotsky @LeonHowqua
@DevelopmentsAI @teortaxesTex Recursive self improvement, that's how take-off happens. For now seems like coding is the way for that to happen
1
0
0
104
Burito @Britoisinsane
@teortaxesTex Decode is $50/Mtok vs $30/Mtok, and Ant has the higher margin. Almost the same size I guess
2
0
4
2K
efe @extliqprovider
@tszzl rooting for oai to democratize it
0
0
3
51
efe @extliqprovider
@ASM65617010 bigger model for HLE you just need more data and anthropic has a lead over oai in this
1
0
8
3.4K
ASM @ASM65617010
Claude Mythos 5 scores 59% on Humanity’s Last Exam, with no tools. As a contributor of HLE, I would never have expected such a score barely a year and a half after the benchmark’s release.
[image]
24
48
938
66.8K
efe @extliqprovider
@drisspg its actually to show pareto frontier
0
0
1
415
efe @extliqprovider
@qcapital2020 openai having the lowest valuation out of 3 is the most retarded thing
0
0
0
46
Q-Cap @qcapital2020
2026: The final orgasm
[image]
5
2
42
4.1K
Augmenta Blake @RoboIntellect
@Lentils80 Low effort vs xhigh and Fable still wins. Architecture efficiency problem for OpenAI?
1
0
0
3.3K
Lentils @Lentils80
I compared Claude Fable 5 to GPT-5.5 in this Power Rangers prompt. Thing is, Fable 5 is using Low thinking effort and GPT-5.5 is using xhigh. Safe to say, the results are... not even close. 5.5's output is bad across the board, from the UI to the actual voxel scene itself🥲
1st video: Claude Fable 5 (Low effort)
2nd video: GPT-5.5 (xhigh)
Lentils @Lentils80

🚨Major Scoop: The first Claude 5 model, Claude Fable 5 (Mythos-class model) is gonna release very soon! It's the same underlying model as Mythos but with increased guardrails, headed to public release

34
38
560
205.4K
efe @extliqprovider
@ewveggies yeah you are right
0
0
1
28
Kyle Wong @ewveggies
@extliqprovider Yeah it’s pretty strange. Perhaps an artifact of a small task set leading to high variance. That jaggedness really is just 1-2 more tasks correct/incorrect. For reference, Diamond is 50 tasks while SWE bench verified is 500 tasks.
1
0
2
33
Kyle Wong @ewveggies
Finally a nice eval to expose all the SWE benchmaxxing. The scores never fully made sense to me: Models that somehow one shot 80+% on SWE-bench Verified, yet struggle to simply fetch and parse logs, even when given proper skills and hints. I swear I can’t be the only one feeling this way.
Cognition @cognition

Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?

4
0
58
6.4K
efe @extliqprovider
@ewveggies i dont know, there might be a problem with the benchmark. this is also not what you would expect
[image]
1
0
2
27
Kyle Wong @ewveggies
@extliqprovider Yeah doesn’t feel like 4.8 is 2.5x better. But the diamond subset is only 50 tasks, so this means it only solves 3-4 more tasks.
1
0
2
57
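
The small-task-set point in the thread above (Diamond at 50 tasks vs SWE-bench Verified at 500) can be put in rough numbers. This is only a sketch: it assumes tasks are independent pass/fail trials, and the 10% pass rate is a hypothetical placeholder, not a figure from the benchmark. The function names are made up for illustration.

```python
import math


def pass_rate_std_error(pass_rate: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate measured on n_tasks independent tasks."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)


def points_per_task(n_tasks: int) -> float:
    """Percentage points the score moves when one more task is solved."""
    return 100.0 / n_tasks


if __name__ == "__main__":
    assumed_rate = 0.10  # hypothetical pass rate, NOT taken from the benchmark
    for name, n in [("Diamond subset", 50), ("SWE-bench Verified", 500)]:
        se_points = 100.0 * pass_rate_std_error(assumed_rate, n)
        print(
            f"{name}: n={n}, one task = {points_per_task(n):.1f} pts, "
            f"std error = {se_points:.1f} pts"
        )
```

Under these assumptions, a tenfold smaller task set makes each solved task worth ten times as many points and widens the standard error by a factor of about 3.2 (the square root of 10), which fits the "3-4 more tasks" reading of a large relative jump.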