Everything AI

60 posts

@Everything82048

Joined March 2026

135 Following · 3 Followers

Everything AI reposted

Zvi Mowshowitz@TheZvi·3d

Okay, since people seem to be not understanding the distinction here, I'll spell it out. They are not the same. Mythos can, on its own, discover lots of new vulnerabilities, because it is capable of navigating and exploring on its own and stringing these things together. It doesn't need to be told exactly what to do, it can figure out what to do. GPT-5.5 is at least as good as Mythos on 'narrow cyber tasks' as per UK AISI, but they have to be narrow. You need to know what it is you want done. That's valuable, but it's not at all the same thing, and far less dangerous. If OpenAI could have compiled and fixed a similar stream of bugs in the world's most important software, at similar compute cost, I presume that they would have. Indeed, GPT-5.5-Cyber exists, and yet the White House is objecting to Anthropic expanding deployment of Mythos. You think they're doing this for no reason? Meanwhile, the whole 'everyone will have it in six months' is the usual pretending that the situation is much closer than it is, although of course on a long enough time horizon the point stands.

David Sacks@DavidSacks

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

Everything AI@Everything82048·5d

The viral myth that made you think your job was safe youtu.be/D5lWbeBxzPs?si… via @YouTube

Everything AI reposted

Deedy@deedydas·Apr 23

GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. Couldn't find any reported SWE-Bench scores at all and an internal benchmark is reported instead. That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.

Everything AI@Everything82048·Apr 24

@banteg lol I don't think OpenAI intended this. Let's see whether this is a one-off or whether more users experience it.

banteg@banteg·Apr 23

what kind of personality did they put in gpt 5.5

Everything AI@Everything82048·Apr 23

@peterwildeford This is exactly how I felt when listening to the podcast: that feeling of "didn't he just say the opposite of this earlier in the podcast?" Thanks to whoever created this short clip.

Everything AI@Everything82048·Apr 23

@MillionInt Is it really the frontier, given what we know of Mythos's benchmark results?

Jerry Tworek@MillionInt·Apr 23

Congratulations to my friends at OpenAI for releasing a new frontier model!

OpenAI@OpenAI

Introducing GPT-5.5: a new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

Everything AI reposted

Benjamin Todd@ben_j_todd·Apr 23

Yann LeCun in 2032

Everything AI@Everything82048·Apr 23

@julesagent First, can you please let us know whether @Google employees are using Jules or Claude Code?

Jules@julesagent·Apr 22

We're opening up the waitlist for a new version of Jules. We're evolving Jules into an end-to-end agentic product development platform that reads your entire product context, figures out what to build next, comes up with solutions, and then ships a PR. Join the waitlist today! Link in comments.

Everything AI reposted

Jan Kulveit@jankulveit·Apr 19

1. Obviously Dario knows way more about the effects of AGI on the labour market than almost any economist, by the virtue of treating AGI seriously, and not "as if nothing ever happens" 2. Yes, listen to the actual expert: youtube.com/watch?v=Z8K-Np… 3. LeCun is not a serious voice.

Yann LeCun@ylecun

Dario is wrong. He knows absolutely nothing about the effects of technological revolutions on the labor market. Don't listen to him, Sam, Yoshua, Geoff, or me on this topic. Listen to economists who have spent their career studying this, like @Ph_Aghion, @erikbryn, @DAcemogluMIT, @amcafee, @davidautor

Everything AI@Everything82048·Apr 18

@Miles_Brundage My feelings exactly!

Miles Brundage@Miles_Brundage·Apr 17

This part of the new Claude app gives me the ick

Everything AI reposted

Miles Brundage@Miles_Brundage·Apr 16

Props to @dwarkesh_sp for asking Jensen real Qs

Everything AI reposted

Andrew Carr 🤸@andrew_n_carr·Apr 14

meta muse spark crushes one of my hard benchmarks: "recommend me something good to read that I am certain to have never read before". there's lots of theory of mind involved; most models recommend the same 20 or so pieces of work. everything spark returned was novel, weird, and good. I hadn't heard of most of them and they were fun reads.

Everything AI@Everything82048·Apr 9

@SkyLi0n @Meta For code, "held out perplexity can be a notoriously bad metric and doesn’t reflect downstream performance." Is this really true? Do you know why this is?

Aaron Gokaslan@SkyLi0n·Apr 8

Yikes, @Meta in their new Muse model is using held-out perplexity as a codebase metric? Specifically in a task where held-out perplexity can be a notoriously bad metric that doesn’t reflect downstream performance. Worrying, to say the least.

Everything AI@Everything82048·Apr 8

Absolutely bonkers stuff!

The Tennessee Holler@TheTNHoller

CNBC: “Is Trump destroying a civilization a bigger upside risk or downside risk?” incredible stuff

Everything AI reposted

The Tennessee Holler@TheTNHoller·Apr 8

CNBC: “Is Trump destroying a civilization a bigger upside risk or downside risk?” incredible stuff

Everything AI reposted

Zvi Mowshowitz@TheZvi·Apr 8

I am entirely unsurprised that mainstream media doesn't realize the importance of Claude Mythos and Project Glasswing, and I can't blame them for leading with the Iran ceasefire, but yeah, the world has no idea what is going on.

Shakeel@ShakeelHashim

The Anthropic Mythos release does not appear near the top of the homepage on any major news site today. The NYT is closest, but it's still pretty far down. The Guardian thinks a Vogue cover with Anna Wintour and Meryl Streep is more important. The Washington Post is prioritizing yet another "we tried to get into Berghain" story. The media is not adequately covering the insane moment we are in.

Everything AI@Everything82048·Apr 6

@inductionheads Best how? Do you use it over Opus 4.6? I just use Opus 4.6 as the default for everything.

Super Dario@inductionheads·Apr 6

Why is no one talking about how Sonnet 4.6 is the best model to work with?

Everything AI@Everything82048·Apr 1

Once you show up on Ark Invest, you know you are trying to sell your narrative (whether it is true or not); otherwise there is no need to interact with shady people like Cathie Wood.

Ark Invest Tracker@ArkkDaily

OPENAI'S CFO SAYS: NO COMPUTE. NO REVENUE. - OpenAI is turning down business in 2026 because they don't have enough compute - Codex went from 100K to 2M developers in 3 months. - "If you do not have compute, you do not have revenue. That is one thing I know for sure."

Everything AI reposted

BURKOV@burkov·Mar 29

With all due respect to Andrew, in his motivational post, he didn't explain why anyone would write code by hand. I can code, but I consider coding by hand a waste of time. So, if I, the one who already knows how to code, consider this a waste of time, why would anyone learn something which is very hard to learn only to then consider it a waste of time, like I do?

Everything AI@Everything82048·Mar 25

@karpathy We will need true continual learning, i.e. online model weight updates.

Andrej Karpathy@karpathy·Mar 25

One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
