Everything AI

60 posts

@Everything82048

Joined March 2026
135 Following, 3 Followers
Everything AI retweeted
Zvi Mowshowitz (@TheZvi)
Okay, since people seem to be not understanding the distinction here, I'll spell it out. They are not the same. Mythos can, on its own, discover lots of new vulnerabilities, because it is capable of navigating and exploring on its own and stringing these things together. It doesn't need to be told exactly what to do, it can figure out what to do. GPT-5.5 is at least as good as Mythos on 'narrow cyber tasks' as per UK AISI, but they have to be narrow. You need to know what it is you want done. That's valuable, but it's not at all the same thing, and far less dangerous. If OpenAI could have compiled and fixed a similar stream of bugs in the world's most important software, at similar compute cost, I presume that they would have. Indeed, GPT-5.5-Cyber exists, and yet the White House is objecting to Anthropic expanding deployment of Mythos. You think they're doing this for no reason? Meanwhile, the whole 'everyone will have it in six months' is the usual pretending that the situation is much closer than it is, although of course on a long enough time horizon the point stands.
David Sacks (@DavidSacks)

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

Replies: 33, Reposts: 27, Likes: 434, Views: 53.6K
Everything AI retweeted
Deedy (@deedydas)
GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. Couldn't find any reported SWE-Bench scores at all and an internal benchmark is reported instead. That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.
Deedy tweet media
Replies: 163, Reposts: 36, Likes: 1.1K, Views: 226.5K
Everything AI (@Everything82048)
@banteg lol I don't think OpenAI intended this. Let's see if this is a one-off or if more users experience this.
Replies: 0, Reposts: 0, Likes: 1, Views: 2.4K
banteg (@banteg)
what kind of personality did they put in gpt 5.5
banteg tweet media
Replies: 113, Reposts: 133, Likes: 5.5K, Views: 404.2K
Everything AI (@Everything82048)
@peterwildeford This is exactly how I felt when listening to the podcast: that feeling of "didn't he just say the opposite of this earlier in the podcast?" Thanks to whoever created this short clip.
Replies: 0, Reposts: 0, Likes: 0, Views: 9
Everything AI (@Everything82048)
@MillionInt Is it the frontier when we know the benchmark results of Mythos?
Replies: 1, Reposts: 0, Likes: 0, Views: 1K
Everything AI retweeted
Benjamin Todd (@ben_j_todd)
Yann LeCun in 2032
Benjamin Todd tweet media
Replies: 57, Reposts: 70, Likes: 1.2K, Views: 151.5K
Jules (@julesagent)
We're opening up the waitlist for a new version of Jules. We're evolving Jules into an end-to-end agentic product development platform that reads your entire product context, figures out what to build next, comes up with solutions, and then ships a PR. Join the waitlist today! Link in comments.
Jules tweet media
Replies: 75, Reposts: 110, Likes: 1.4K, Views: 401.5K
Everything AI retweeted
Jan Kulveit (@jankulveit)
1. Obviously Dario knows way more about the effects of AGI on the labour market than almost any economist, by virtue of treating AGI seriously, and not "as if nothing ever happens"
2. Yes, listen to the actual expert: youtube.com/watch?v=Z8K-Np…
3. LeCun is not a serious voice.
Yann LeCun (@ylecun)

Dario is wrong. He knows absolutely nothing about the effects of technological revolutions on the labor market. Don't listen to him, Sam, Yoshua, Geoff, or me on this topic. Listen to economists who have spent their careers studying this, like @Ph_Aghion, @erikbryn, @DAcemogluMIT, @amcafee, @davidautor

Replies: 41, Reposts: 17, Likes: 309, Views: 98.8K
Miles Brundage (@Miles_Brundage)
This part of the new Claude app gives me the ick
Miles Brundage tweet media
Replies: 10, Reposts: 0, Likes: 53, Views: 5.7K
Everything AI retweeted
Andrew Carr 🤸 (@andrew_n_carr)
meta muse spark crushes one of my hard benchmarks: "recommend me something good to read that I am certain to have never read before". there's lots of theory of mind involved; most models recommend the same 20 or so pieces of work. everything spark returned was novel, weird, and good. I hadn't heard of most of them and they were fun reads.
Replies: 9, Reposts: 12, Likes: 241, Views: 39.9K
Everything AI (@Everything82048)
@SkyLi0n @Meta For code "held out perplexity can be a notoriously bad metric and doesn’t reflect downstream performance." Is this really true? Do you know why this is?
Replies: 1, Reposts: 0, Likes: 0, Views: 143
Aaron Gokaslan (@SkyLi0n)
Yikes, @Meta in their new Muse model is using held out perplexity in a codebase metric? Specifically in a task where held out perplexity can be a notoriously bad metric and doesn’t reflect downstream performance. Worrying to say the least.
Aaron Gokaslan tweet media
Replies: 3, Reposts: 0, Likes: 17, Views: 4.1K
Everything AI retweeted
The Tennessee Holler (@TheTNHoller)
CNBC: “Is Trump destroying a civilization a bigger upside risk or downside risk?” incredible stuff
Replies: 90, Reposts: 497, Likes: 3.7K, Views: 290.3K
Everything AI retweeted
Everything AI (@Everything82048)
@inductionheads Best how? You use it over Opus 4.6? I just use Opus 4.6 as the default for everything.
Replies: 0, Reposts: 0, Likes: 1, Views: 156
Super Dario (@inductionheads)
Why is no one talking about how Sonnet 4.6 is the best model to work with
Replies: 9, Reposts: 1, Likes: 17, Views: 2.6K
Everything AI (@Everything82048)
Once you show up on Ark Invest you know you are trying to sell your narrative (whether it is true or not); otherwise there is no need to interact with shady people like Cathie Wood.
Ark Invest Tracker (@ArkkDaily)

OPENAI'S CFO SAYS: NO COMPUTE. NO REVENUE.
- OpenAI is turning down business in 2026 because they don't have enough compute
- Codex went from 100K to 2M developers in 3 months.
- "If you do not have compute, you do not have revenue. That is one thing I know for sure."

Replies: 0, Reposts: 0, Likes: 0, Views: 27
Everything AI retweeted
BURKOV (@burkov)
With all due respect to Andrew, in his motivational post, he didn't explain why anyone would write code by hand. I can code, but I consider coding by hand a waste of time. So, if I, the one who already knows how to code, consider this a waste of time, why would anyone learn something which is very hard to learn only to then consider it a waste of time, like I do?
BURKOV tweet media
Replies: 186, Reposts: 18, Likes: 249, Views: 121.7K
Everything AI (@Everything82048)
@karpathy We will need true continual learning, i.e., online model weight updates.
Replies: 0, Reposts: 0, Likes: 0, Views: 4
Andrej Karpathy (@karpathy)
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
Replies: 1.8K, Reposts: 1.1K, Likes: 21.3K, Views: 2.7M