/MachineLearning

15.7K posts

@slashML

Cloud, Joined December 2016
0 Following, 123.6K Followers
/MachineLearning retweeted
💺@patience_cave·
The new Codex Goals feature is able to pursue tasks indefinitely, so how useful is it? I ran it on the public ARC-AGI-3 games. After 160 hours and 30k actions it scored 61%.
Codex gets the most work done within 4 hours. Afterwards, it begins to stagnate, and wait times increase.
I am crestfallen it scored so well. I used a two-word baby prompt to “reverse engineer” the games during play. It had no prior knowledge of how the games worked.
Due to the reverse engineering process, it scores well on its first play through. But once it beats a game, it can score perfectly on its next play through.
On a few occasions I caught Codex trying to search for solutions on my computer and online. It follows your prompt closely, but if you’re not careful it will find loopholes when it’s frustrated.
For example, when trying to solve Erdos problems, if it becomes faintly aware the problem is from Erdos, it does not hesitate to give up and say “the problem is listed online as Open, so it cannot / should not be solved”.
Overall Codex Goals is fascinating, I can appreciate that it works for an unlimited amount of time. People shall value the virtue of patience once again 💺
It makes me wonder how well Codex Goals can do on the private set of ARC-AGI-3. I believe it’s possible to create benchmarks that can mog even the most devilish harness. In the coming weeks, Maze Bench will knock those scores down to 0% where they should be, ne’er to rise again.
Arcprize Scorecard: arcprize.org/scorecards/6f4…
💺 tweet media
English
53
72
998
150.2K
/MachineLearning retweeted
Andrew Curran@AndrewCurran_·
The new model and the cybersecurity 'product' are separate, and only the cybersecurity specialized model will have limited release, not the new model itself. So it looks like a general public release for Spud.
Andrew Curran tweet media
Lindsay McCallum Rémy@lindsmccallum

@AndrewCurran_ The article you are referencing is inaccurate and has been updated.

English
11
22
234
20.6K
Flowers ☾@flowersslop·
Every LLM from any lab today traces back to this guy, who was the only person at OpenAI pushing for pretraining transformer language models. He built GPT-1. Only after that did others see the potential. He invented it, and almost none of the so-called AI experts even know his name.
Flowers ☾ tweet media
English
85
235
4.3K
1.8M
/MachineLearning@slashML·
@SchmidhuberAI @flowersslop Wouldn't it make more sense to just reference the original paper you're talking about in your response than your own paper that references it?
English
1
0
30
2.6K
/MachineLearning retweeted
Jianyang Gao@gaoj0017·
The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views. We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (openreview.net/forum?id=tO3AS…). We would greatly appreciate your attention and help in sharing it.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
98
969
6.5K
1M
/MachineLearning retweeted
Agentica@agenticasdk·
We scored 36.08% on ARC-AGI-3 in one day using the Agentica SDK.
English
71
127
1.4K
418.7K
/MachineLearning retweeted
The Spectator Index@spectatorindex·
Anthropic is resuming negotiations with the Pentagon for a deal on artificial intelligence, according to FT report.
English
148
268
2.8K
524K
/MachineLearning@slashML·
@bubbleboi What financial commitments? Everything announced thus far has been optional ("up to X" amount).
English
0
0
0
38
Eno Reyes@EnoReyes·
The most cost effective combination right now is setting Opus as your plan model and GLM 4.7 or GPT-5.2-Codex as your execution model. Gives you basically the same performance as opus, for a fraction of the tokens.
English
63
38
818
131.4K
/MachineLearning@slashML·
OpenAI plans to claim IP over the tokens sent to users?
/MachineLearning tweet media
English
1
0
3
1.5K
/MachineLearning retweeted
Jeffrey Emanuel@doodlestein·
Would you believe that, far from sponsoring me, @AnthropicAI today started banning several of my (now 22) Max accounts? For the crime of using their models to produce the most useful open-source agent coding tooling on the planet, and then giving it all away for free. And teaching my workflows and methods and prompts to everyone selflessly. Anthropic people who follow me (I know there are dozens of you), please DM me and make this right. I’m not asking for a handout. I’m paying $212 per month with tax for each of those accounts. And I also let you collect info on my usage and use the official harness. The RL from my usage is pure gold. I’ve also been a massive promoter of your company and it’s really messed up to try to ban me like this. Puts a really bad taste in my mouth and makes me never want to promote you guys again. I need to be spending my energy creating, not being made to feel like a criminal for making MIT-licensed tools. You’re also just helping your antagonist, Sam, since I’m now the proud owner of 11 GPT Pro accounts (and counting). I refuse to lose my momentum because of this nonsense. I will not be slowed.
John Thilén@JohnThilen

@doodlestein @AnthropicAI: please sponsor this man.

English
78
43
1.2K
272K
Scott Stevenson@scottastevenson·
Software is about to go through the same transition that stock trading did when algorithmic traders entered the market. AI will not be good for bootstrappers. They will be wrecked like retail traders were. There used to be many crevices of the market that large software companies couldn’t reach. Bootstrappers and small caps built nests there. But with AI, large software companies will start to look like multi-vertical hedge funds. With 1000 AI tentacles, they will suck the alpha out of every crevice. While one crevice may not have been appetizing enough to go after before, 1000 will be. Software will begin to have something like “market makers” who make money on everything. A small number of hedgefund-like software companies may come to own everything.
ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ 🛡️@DanielMiessler

Holy crap. This is the genre of software that's in the most danger:
- Kind of mid in quality
- Highly niche use-cases
- It's been winner takes all for the space in the past
- Often involved special formats or protocols
And now Claude Code can just reverse engineer it. 🤯

English
83
73
1.3K
339.1K
/MachineLearning retweeted
Tibo@thsottiaux·
Codex ❤️ OSS. Over the coming days we are prioritizing working with open source coding agents and tools to support them in the same way as OpenCode, so that codex users can benefit from their account and usage in those combined with using our models in codex directly. We are already talking with OpenHands, RooCode and Pi. Reach out if you build in the open and would benefit from this. Our own work is OSS at github.com/openai/codex
English
150
155
2.4K
198.4K