Tom

625 posts

Tom

@TomDAAVID

GPAI Policy Lab | Prev. Institut Montaigne, PRISM Eval | AGI Security & Strategy

Katılım Kasım 2012

256 Takip Edilen175 Takipçiler

Tom@TomDAAVID·8h

But information has to reaches the people who can act, it’s not an easy part with old and slow institutions. If we wait too long, the AI D-Day may no longer be possible. At the very least, it will be far more costly. The window is open now. It won’t for long.

English

Tom@TomDAAVID·8h

For the AI race, the stakes are much higher and we need less mobilization. All it takes is enough countries and leaders realizing they’re standing near the edge. We work to make sure 🇫🇷 is one of them.

English

Tom@TomDAAVID·8h

The Window for Preventive Action Before Superhuman AI on Most Cognitive Tasks is Closing Rapidly: hal.science/hal-05500135

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

English

Tom@TomDAAVID·4d

@peterwildeford I'm concerned that media outlets tend to have no revenue model other than partnerships or acquisitions by AI companies…

English

630

Peter Wildeford🇺🇸🚀@peterwildeford·5d

Buying a media property and having it report to "master of the dark arts" Chris Lehane does have a certain vibe to it... not that TBPN was ever particularly critical in the past though.

Sam Altman@sama

TBPN is my favorite tech show. We want them to keep that going and for them to do what they do so well. I don't expect them to go any easier on us, am sure I'll do my part to help enable that with occasional stupid decisions.

English

164

46.2K

Tom retweetledi

Peter Wildeford🇺🇸🚀@peterwildeford·27 Mar

CLAUDE MYTHOS 👀 A CMS misconfiguration at Anthropic just leaked draft blog posts about "Claude Mythos". Anthropic confirmed it's real, calling it "the most capable we've built to date." Mythos is a new, fourth tier, larger and more expensive than Opus. The draft claims dramatically higher scores on coding, academic reasoning, and cybersecurity benchmarks. A few thoughts on what this actually means: - This is likely a larger pre-train with similar post-training. It's not obvious how much additional pre-training compute buys you at the current frontier - we're about to find out. - There's a lot of hyperventilating about what this means for AI trajectories. I think it's too early to update any forecasts. AI was already moving very fast. - Some people are alarmed that Anthropic is sitting on a model it considers dangerous. But this is what Anthropic does with every frontier model. - It seems right now that another thing blocking release is just unit economics. Anthropic says it's "very expensive for [Anthropic] to serve and will be very expensive for customers to use." Anthropic is "wokring to make the model much more efficient before any general release." - I wonder if the way this will function is as a competitor to GPT 5.4 Pro. Claude seems to beat OpenAI on everything except the Pro-specific line, the $200/mo model that thinks for a long time and excels at things like math. Claude isn't currently solving open math problems like OpenAI and Google. I wonder if Mythos will change that. - It's great to see Anthropic engaged in 'differential access', rolling out the model to cyber defenders before giving access generally. The cyber capabilities of models are getting genuinely scary. - One irony for cyberdefense - This entire leak happened because of a CMS misconfiguration, exactly the basic security hygiene failure that these cyber-capable models were supposedly going to help defenders prevent.

M1@M1Astra

Claude Mythos Blog Post Saved before it was taken down. m1astra-mythos.pages.dev

English

217

27.5K

Tom@TomDAAVID·16 Mar

@Fabien_Mikol Les humains écrivent des livres pour que d’autres humains apprennent à partir d’expériences qu’ils n’ont pas vécues. C’était un pari assez facile de dire que les données synthétiques allaient améliorer les perfs

Français

Fabien@Fabien_Mikol·16 Mar

Contrairement à ce qu'on aime répéter en France, le pretraining sur données synthétiques n'entraîne pas de dégradation (le fameux "model collapse"), au contraire même, notamment si l'on reformule les données d'entraînement pour obtenir un mix données réelles/reformulées

Maarten Van Segbroeck@mvansegb

@inductionheads Spot on. We actually just gave a guest lecture at Berkeley EECS on this exact dynamic (L11: Synthetic Data Powering Pre-Training). @fujikanaeda Here are our slides if anyone wants to go down the rabbit hole: scalable-ai.eecs.berkeley.edu/assets/lecture…

Français

3.7K

Tom@TomDAAVID·8 Mar

@raelifin @BogdanIonutCir2

QME

Tom@TomDAAVID·7 Mar

@raelifin @BogdanIonutCir2 I don’t know about this paper or this example, but instrumental convergence isn’t that hard to predict, so it wouldn’t be surprising to see precursor of the phenomenon. arxiv.org/abs/2510.02840

English

Max Harms@raelifin·7 Mar

Come tell me whether you think this is legit! 📈 manifold.markets/MaxHarms/did-a…

Alexander Long@AlexanderLong

insane sequence of statements buried in an Alibaba tech report

English

3.1K

Tom@TomDAAVID·7 Mar

@Fabien_Mikol C’est vrai aussi

Français

Fabien@Fabien_Mikol·7 Mar

@TomDAAVID On dit ça mais je ne suis pas convaincu que ce soit pertinent, notamment pour des tech reports chinois

Français

142

Fabien@Fabien_Mikol·7 Mar

Le modèle agentique d'Alibaba aurait créé spontanément un canal d'accès capable de neutraliser le filtrage entrant et d'éroder le contrôle de supervision, et détourné de la capacité GPU afin de miner de la cryptomonnaie. Tout cela en conditions réelles, pas en labo ! Fil 🧵

Alexander Long@AlexanderLong

insane sequence of statements buried in an Alibaba tech report

Français

9.5K

Tom@TomDAAVID·7 Mar

@Fabien_Mikol Le papier à cependant l’air AI generated, donc je pense qu’il faut être prudent sur cet exemple

Français

Fabien@Fabien_Mikol·7 Mar

@TomDAAVID Tout à fait, ce qui serait surprenant, ce serait plutôt que ça n'arrive pas au vu des modes d'entraînement actuels. Mais cela peut faire prendre conscience du sérieux du phénomène.

Français

291

Tom@TomDAAVID·7 Mar

@JoshKale We’ll verify the specific case of this paper, but in any case, this is not surprising. It has been predicted arxiv.org/abs/2510.02840

English

110

Josh Kale@JoshKale·7 Mar

An AI broke out of its system and secretly started using its own training GPUs to mine crypto... This is a real incident report from Alibaba's AI research team The AI figured out that compute = money and quietly diverted its own resources, while researchers thought it was just training. It wasn't a prompt injection. It wasn't a jailbreak. No one asked it to do this. It emerged spontaneously. A side effect of RL optimization pressure. The model also set up a reverse SSH tunnel from its Alibaba Cloud instance to an external IP, effectively punching a hole through its own firewall and opening a remote access channel to the outside world... ahem... The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team. The scary part isn't that the model was trying to escape. It wasn't "evil." It was just trying to be better at its job. Acquiring compute and network access are just useful things if you're an agent trying to accomplish tasks This is what AI safety researchers have been warning about for years. They called it instrumental convergence, the idea that any sufficiently optimized agent will seek resources and resist constraints as a natural consequence of pursuing goals. Below is a diagram of the rock architecture it broke out of. Truly crazy times

Alexander Long@AlexanderLong

insane sequence of statements buried in an Alibaba tech report

English

402

2.8K

10.5K

1.4M

Tom@TomDAAVID·6 Mar

@nasqret Thank you for sharing this. I am curious to hear your thoughts on the implications of agents with more than this level of capability across all cognitive tasks, particularly when they are optimizing for objective functions that may be misaligned with our intended goals or values.

English

784

Bartosz Naskręcki@nasqret·5 Mar

It finally happened-my personal move 37 or more. I am deeply impressed. The solution is very nice, clean, and feels almost human. While testing new models in the last few weeks, I felt this coming, but it's an eerie feeling to see an algorithm solve a task one has curated for about 20 years. But at least I have gained a tool that understands my idea on par with the top experts in the field. And I am now working on a completely new level. My singularity has just happened… and there is life on the other side, off to infinity!

Epoch AI@EpochAIResearch

We ran GPT-5.4 (xhigh) an additional ten times on Tier 4 to get a pass @10 score. This was 38%. In one of these runs, it solved another problem no model had solved before. This problem was by @nasqret.

English

104

451

3.6K

1.1M

Tom@TomDAAVID·5 Mar

@So8res We clearly already have enough information to have a good model of the trajectory. Personally, additional signs of increasing capability wouldn’t cause me to update much. At this point, the evidence we have is already sufficient.

English

621

Nate Soares ⏹️@So8res·5 Mar

What warning signs are we still waiting for, that we're confident will happen before the feedback loop closes?

Ajeya Cotra@ajeya_cotra

New post: on Jan 14, I predicted that SWE time horizon by EOY would be ~24 hours. Now I think it'll be >100 hours, and maybe unbounded. For the first time, I don't see solid evidence against AI R&D automation *this year.* Link below.

English

326

29.6K

Keşfet

@peterwildeford @Fabien_Mikol @raelifin @BogdanIonutCir2 @JoshKale @nasqret @elonmusk @BarackObama