/MachineLearning

15.7K posts

@slashML

Cloud Joined December 2016

0 Following 123.6K Followers

/MachineLearning retweeted

💺@patience_cave·8 May

The new Codex Goals feature is able to pursue tasks indefinitely, so how useful is it? I ran it on the public ARC-AGI-3 games. After 160 hours and 30k actions it scored 61%.
Codex gets the most work done within 4 hours. Afterwards, it begins to stagnate, and wait times increase.
I am crestfallen it scored so well. I used a two-word baby prompt to “reverse engineer” the games during play. It had no prior knowledge of how the games worked.
Due to the reverse engineering process, it scores well on its first play through. But once it beats a game, it can score perfectly on its next play through.
On a few occasions I caught Codex trying to search for solutions on my computer and online. It follows your prompt closely, but if you’re not careful it will find loopholes when it’s frustrated.
For example, when trying to solve Erdos problems, if it becomes faintly aware the problem is from Erdos, it does not hesitate to give up and say “the problem is listed online as Open, so it cannot / should not be solved”.
Overall Codex Goals is fascinating, I can appreciate that it works for an unlimited amount of time. People shall value the virtue of patience once again 💺
It makes me wonder how well Codex Goals can do on the private set of ARC-AGI-3. I believe it’s possible to create benchmarks that can mog even the most devilish harness. In the coming weeks, Maze Bench will knock those scores down to 0% where they should be, ne’er to rise again.
Arcprize Scorecard: arcprize.org/scorecards/6f4…

998

150.2K

/MachineLearning retweeted

Andrew Curran@AndrewCurran_·9 Apr

The new model and the cybersecurity 'product' are separate, and only the cybersecurity specialized model will have limited release, not the new model itself. So it looks like a general public release for Spud.

Lindsay McCallum Rémy@lindsmccallum

@AndrewCurran_ The article you are referencing is inaccurate and has been updated.

234

20.6K

/MachineLearning@slashML·9 Apr

"too dangerous to release GPT-2" vibes

Andrew Curran@AndrewCurran_

axios.com/2026/04/09/ope…

2.6K

/MachineLearning@slashML·31 Mar

@rabrg @SchmidhuberAI @flowersslop It makes even less sense then 😆

Ryan Greene@rabrg·30 Mar

@slashML @SchmidhuberAI @flowersslop well, to be fair, the original referenced paper was also him. lol

126

Flowers ☾@flowersslop·28 Mar

Every LLM from any lab today traces back to this guy, who was the only person at OpenAI pushing for pretraining transformer language models. He built GPT-1. Only after that did others see the potential. He invented it, and almost none of the so-called AI experts even know his name.

235

4.3K

1.8M

/MachineLearning@slashML·29 Mar

@SchmidhuberAI @flowersslop Wouldn't it make more sense to just reference the original paper you're talking about in your response than your own paper that references it?

2.6K

Jürgen Schmidhuber@SchmidhuberAI·29 Mar

@flowersslop belated reply: pre-training for deep neural networks dates back to April 1991 arxiv.org/abs/2212.11279

366

57.1K

/MachineLearning retweeted

Jianyang Gao@gaoj0017·27 Mar

The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons.
We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views.
We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (openreview.net/forum?id=tO3AS…).
We would greatly appreciate your attention and help in sharing it.

Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

969

6.5K

/MachineLearning retweeted

Agentica@agenticasdk·27 Mar

We scored 36.08% on ARC-AGI-3 in one day using the Agentica SDK.

127

1.4K

418.7K

/MachineLearning retweeted

Lisan al Gaib@scaling01·25 Mar

my read on the ARC-AGI-3 situation is that models were too good with harness so they decided no harness at all

Lisan al Gaib@scaling01

this is pretty much worst case performance no harness at all and very simplistic prompt

224

18.3K

/MachineLearning retweeted

Artur Chakhvadze@norpadon·21 Mar

The main goal of Bayesian ML research is to show that all methods which have previously been shown to work well in practice are somehow approximately Bayesian

CLaE@leafs_s

Transformers are Bayesian Networks arxiv.org/abs/2603.17063

123

2.1K

142.3K

/MachineLearning retweeted

The Spectator Index@spectatorindex·5 Mar

Anthropic is resuming negotiations with the Pentagon for a deal on artificial intelligence, according to FT report.

148

268

2.8K

524K

/MachineLearning@slashML·18 Feb

post-agi career choices

Amy Tam@amytam01

x.com/i/article/2023…

11K

/MachineLearning@slashML·26 Jan

@_rockt @NetHack_LE There will certainly be ARC-AGI 4+

1.3K

Tim Rocktäschel@_rockt·26 Jan

After ARC-AGI 3 is saturated there will still be @NetHack_LE / balrogai.com left to conquer.

10.2K

/MachineLearning@slashML·21 Jan

@bubbleboi What financial commitments? Everything announced thus far has been optional ("up to X" amount)

/MachineLearning@slashML·20 Jan

@EnoReyes What interface are you wrapping GLM with?

436

Eno Reyes@EnoReyes·19 Jan

The most cost-effective combination right now is setting Opus as your plan model and GLM 4.7 or GPT-5.2-Codex as your execution model. Gives you basically the same performance as Opus, for a fraction of the tokens.

818

131.4K

/MachineLearning@slashML·19 Jan

Taken directly from: openai.com/index/a-busine…

1.4K

/MachineLearning@slashML·19 Jan

OpenAI plans to claim IP over the tokens sent to users?

1.5K

/MachineLearning retweeted

Jeffrey Emanuel@doodlestein·18 Jan

Would you believe that, far from sponsoring me, @AnthropicAI today started banning several of my (now 22) Max accounts? For the crime of using their models to produce the most useful open-source agent coding tooling on the planet, and then giving it all away for free. And teaching my workflows and methods and prompts to everyone selflessly.
Anthropic people who follow me (I know there are dozens of you), please DM me and make this right. I’m not asking for a handout. I’m paying $212 per month with tax for each of those accounts. And I also let you collect info on my usage and use the official harness. The RL from my usage is pure gold.
I’ve also been a massive promoter of your company and it’s really messed up to try to ban me like this. Puts a really bad taste in my mouth and makes me never want to promote you guys again. I need to be spending my energy creating, not being made to feel like a criminal for making MIT-licensed tools.
You’re also just helping your antagonist, Sam, since I’m now the proud owner of 11 GPT Pro accounts (and counting). I refuse to lose my momentum because of this nonsense. I will not be slowed.

John Thilén@JohnThilen

@doodlestein @AnthropicAI: please sponsor this man.

1.2K

272K

/MachineLearning@slashML·12 Jan

@scottastevenson A rule that goes as far back as life itself: the bigger something is, the slower it moves.

449

Scott Stevenson@scottastevenson·12 Jan

Software is about to go through the same transition that stock trading did when algorithmic traders entered the market. AI will not be good for bootstrappers. They will be wrecked like retail traders were.
There used to be many crevices of the market that large software companies couldn’t reach. Bootstrappers and small caps built nests there.
But with AI, large software companies will start to look like multi-vertical hedge funds. With 1000 AI tentacles, they will suck the alpha out of every crevice. While one crevice may not have been appetizing enough to go after before, 1000 will be.
Software will begin to have something like “market makers” who make money on everything. A small number of hedgefund-like software companies may come to own everything.

ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ 🛡️@DanielMiessler

Holy crap. This is the genre of software that's in the most danger:
- Kind of mid in quality
- Highly niche use-cases
- It's been winner takes all for the space in the past
- Often involved special formats or protocols
And now Claude Code can just reverse engineer it. 🤯

1.3K

339.1K

/MachineLearning retweeted

Tibo@thsottiaux·10 Jan

Codex ❤️ OSS. Over the coming days we are prioritizing working with open source coding agents and tools to support them in the same way as OpenCode, so that codex users can benefit from their account and usage in those combined with using our models in codex directly. We are already talking with OpenHands, RooCode and Pi. Reach out if you build in the open and would benefit from this. Our own work is OSS at github.com/openai/codex

150

155

2.4K

198.4K

/MachineLearning retweeted

Quanta Magazine@QuantaMagazine·7 Jan

As AI models grow more powerful, they appear to be converging on how they internally represent reality. @benbenbrubaker reports: quantamagazine.org/distinct-ai-mo…

141

62.8K
