Aman

5.6K posts

@arcaman07

Incoming MS CS @GeorgiaTech Scaling RL + Continual Learning @lossfunk

Joined August 2020
224 Following · 975 Followers
Pinned Tweet
Aman@arcaman07·
Can frontier AI models actually read a painting? I tested 4 frontier AI models on 15 artworks worth $1.46B, first from the image alone and then with basic metadata. What I found was not just a performance gap, but a recognition vs commitment gap. Three of the four models could identify the correct artist from pixels alone on essentially every painting. But that did not mean they would commit to the valuation implied by what they saw. Gemini 3.1 Pro was strongest in both settings. GPT-5.4 improved sharply once metadata was added. Blog: arcaman07.github.io/blog/can-llms-…
Aman@arcaman07

x.com/i/article/2044…

0 replies · 1 repost · 7 likes · 737 views
Aman reposted
Matt Mullin@matthewwmullin·
NASA HAS RELEASED OVER 12,000 IMAGES OF THE ARTEMIS II MISSION. Unbelievable perspectives captured by the Crew! The aurora during the eclipse is incredible.
273 replies · 8.7K reposts · 60.4K likes · 1.9M views
Aman reposted
𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰
Stanford's latest seminar is a deep dive into the evolution of world modeling in AI, focusing on the shift from traditional reconstruction methods toward latent-space prediction. Covers topics like:
- Introduction to JEPA & World Models
- Causal JEPA
- LOWER Model
- Practical Applications & Planning
- Future Outlook
21 replies · 164 reposts · 1.5K likes · 198.6K views
Aman reposted
Lawrence Chan@justanotherlaw·
A recent viral paper claims to reverse-engineer the parameter counts of frontier models: GPT-5.5 = 9.7T, Opus 4.7 = 4.0T, o1 = 3.5T, etc. @ben_sturgeon and I investigated and found serious issues in the paper; fixing them gives GPT-5.5 as ~1.5T (90% CI: 256B-8.3T).
29 replies · 96 reposts · 950 likes · 204.5K views
himanshu@retr0sushi_·
always a beginner :) ps : if you have resources or roadmaps don't be shy to share them with me pls!
6 replies · 1 repost · 42 likes · 2.9K views
Aman@arcaman07·
@carlagriffs it's happening quite often. It left me a bit confused: what is the point of rebuttals if these issues still persist?
1 reply · 0 reposts · 2 likes · 518 views
Carla Griffiths@carlagriffs·
@arcaman07 sorry man, i def relate, there's this new persistent trend of goal post moving in tier A conferences
1 reply · 0 reposts · 2 likes · 621 views
Aman@arcaman07·
ML conference timeline:
1) Submit the paper you have been working on for several months.
2) Reviewers ask for additional experiments and clarifications.
3) As the primary author, you run all of those experiments and report the findings.
4) Reviewers are satisfied, but don't increase their scores and ignore you.
5) The PC says those experiments cleared all clarifications, but please add them to an updated paper (you can't revise during rebuttals) and submit to another venue.
3 replies · 6 reposts · 163 likes · 15.1K views
Aman reposted
Vincent Sitzmann@vincesitzmann·
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…
43 replies · 164 reposts · 1K likes · 380.6K views
Aman reposted
Dwarkesh Patel@dwarkesh_sp·
Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It's a bit technical, but I encourage you to hang in there; it's really worth it. There are fewer than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, "As we now know, pipelining is not wise."
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
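The batch-size/token-cost relationship from the first segment can be sketched with a simple roofline-style model. This is my own illustration, not from the lecture: the hardware numbers, the fp16 model size, the per-sequence KV size, and the assumption of purely memory-bandwidth-bound decoding are all hypothetical.

```python
# Toy model of decode cost vs. batch size for a memory-bandwidth-bound LLM.
# Each decode step must stream all weights once (shared across the batch)
# plus each sequence's KV cache, so larger batches amortize the weight reads.

def cost_per_million_tokens(batch, param_bytes=140e9, kv_bytes_per_seq=2e9,
                            bandwidth=3.35e12, gpu_dollars_per_hour=3.0):
    """All parameters are illustrative: a ~70B-param model in fp16, 2 GB of
    KV cache per sequence, and H100-like 3.35 TB/s memory bandwidth."""
    step_seconds = (param_bytes + kv_bytes_per_seq * batch) / bandwidth
    tokens_per_second = batch / step_seconds
    dollars_per_second = gpu_dollars_per_hour / 3600
    return dollars_per_second / tokens_per_second * 1e6

for b in (1, 8, 64):
    print(b, round(cost_per_million_tokens(b), 2))
```

The qualitative takeaway matches the lecture's framing: at batch 1 you pay the full cost of streaming the weights for every token, while at large batch the per-sequence KV reads start to dominate and cost per token flattens out.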
146 replies · 595 reposts · 6.5K likes · 1.2M views
Aman reposted
David Duvenaud@DavidDuvenaud·
Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵
200 replies · 454 reposts · 3.6K likes · 1.4M views
Aman reposted
Hater Report@HaterReport·
LeBron and Bronny in 2006
LeBron and Bronny in 2026
300 replies · 13K reposts · 110.7K likes · 4.1M views
Aman reposted
Imbue@imbue_ai·
Deep learning works extraordinarily well. And we still largely don't know why. A new paper from @learning_mech, @KuninDaniel, and 12 co-authors argues that a scientific theory of deep learning is emerging, and coins a name for the emerging field: learning mechanics. We sat down with Jamie and Dan on Generally Intelligent to talk about what a physics of deep learning would actually look like, why now, and what's left to figure out.
3:05 Learning mechanics as the physics to mechanistic interpretability's biology
4:13 Why deep learning needs a theory
7:07 Why deep learning is uniquely hard to engineer
12:11 How a week in the woods became a paper
25:59 The barrier to theory isn't opacity, but complexity
36:26 Deep learning's first gas law
47:22 Why more particles makes the problem easier
56:22 The discretization hypothesis
1:01:50 The strongest signal that a compact theory exists
1:05:07 The Platonic Representation Hypothesis
1:15:41 Why learning mechanics and mech interp need each other
1:25:29 Theory as safety infrastructure
1 reply · 30 reposts · 138 likes · 17.6K views
Aman reposted
Luke Bailey@LukeBailey181·
Self-play led to superhuman Go performance; why hasn't it for LLMs? In practice, long-run self-play plateaus like RL. We study why this happens and build a self-play algorithm that scales better. It solves as many problems with a 7B model as the pass@4 of a model 100x bigger.
29 replies · 150 reposts · 998 likes · 135.7K views
Aman reposted
Sakana AI@SakanaAILabs·
What happens when you put competing neural networks in a Petri dish and start changing the rules while they adapt? Last year we released Petri Dish NCA, where neural nets are the organisms that learn during simulation. Today we're releasing Digital Ecosystems: a browser-based platform for interactive artificial life research.
The setup: several small CNNs share a 2D grid, each seeing only a 3x3 neighborhood. No global plan. They compete for territory by attacking neighbours and defending against incoming attacks, learning via gradient descent online while the simulation runs.
What we didn't expect was the role of the learning itself. Gradient descent isn't just optimising each species' strategy; it also acts to stabilize the whole system during simulation. Species that overextend get pushed back by the loss. Species that stagnate get nudged to grow. This means you can push parameters toward edge-of-chaos regimes: a zone characterised by emergent complexity. Letting the neural networks learn holds the complex system together while you explore and interact.
The platform lets you steer all of this interactively. You can draw walls to create niches, erase parts of the system online, and tune 40+ system parameters to explore the most interesting configurations. We find it mesmerizing to watch species carve out territories and reorganise when you perturb them. Everything runs client-side in your browser, no install needed.
Blog: pub.sakana.ai/digital-ecosys…
Code: github.com/SakanaAI/digit…
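The local 3x3 territory competition described above can be caricatured in a few lines. This is not Sakana's actual system: the contest rule, the scalar per-species `strength` (a stand-in for each species' CNN), and the toroidal wrap-around are all my simplifications to show how purely local neighborhood interactions drive a global territory map.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 32
n_species = 3
# owner[i, j] = which species currently holds cell (i, j)
owner = rng.integers(0, n_species, size=(H, W))
# Per-species "attack strength" (stand-in for a learned CNN policy).
strength = np.ones(n_species)

def step(owner, strength):
    """One contest round: each cell goes to whichever species has the
    highest summed presence-times-strength in its 3x3 neighborhood."""
    votes = np.zeros((n_species, H, W))
    for s in range(n_species):
        mask = (owner == s).astype(float) * strength[s]
        acc = np.zeros((H, W))
        # Sum over the 3x3 neighborhood with toroidal wrap-around.
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                acc += np.roll(np.roll(mask, di, axis=0), dj, axis=1)
        votes[s] = acc
    return votes.argmax(axis=0)

for _ in range(10):
    owner = step(owner, strength)
```

In the real platform each species' moves come from a CNN updated by gradient descent while the simulation runs; here the dynamics reduce to a majority-vote rule that coarsens the initial random territories into contiguous regions.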
38 replies · 199 reposts · 1.1K likes · 235.6K views
Aman reposted
Michael Y. Li@michaelyli__·
Can a language model learn, end-to-end, what to keep in its own KV cache and what to throw away? Can it learn to forget while it learns to reason? Deep learning's central lesson: capability emerges from end-to-end optimization, not heuristics/strong inductive biases. But for efficiency, we rely heavily on hand-designed approaches. 🗑️ Introducing Neural Garbage Collection (NGC): we train a language model to jointly reason and manage its own KV cache, using reinforcement learning with outcome-based task reward alone. No SFT, no proxy objectives, no summarization in natural language. New paper with @jubayer_hamid, Emily Fox, and @noahdgoodman!
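The cache-management decision NGC learns end-to-end can be sketched as a score-and-evict step. Everything here is illustrative: the `evict` helper, the standalone `scores` array, and the budget are my stand-ins; in the paper the retention signal is produced by the model itself and trained with RL on task reward rather than supplied by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def evict(keys, values, scores, budget):
    """Keep only the `budget` highest-scoring KV-cache entries.
    `scores` stands in for a learned retention signal; hand-designed
    heuristics (recency, attention mass) would slot in the same way."""
    keep = np.argsort(scores)[-budget:]
    keep.sort()  # preserve the positional order of the survivors
    return keys[keep], values[keep], scores[keep]

# Toy cache: 10 cached tokens, head dimension 4.
keys = rng.standard_normal((10, 4))
values = rng.standard_normal((10, 4))
scores = rng.standard_normal(10)

keys, values, scores = evict(keys, values, scores, budget=6)
print(keys.shape)  # (6, 4)
```

The point of the paper is that the scoring policy is not a fixed function like this one but part of the same end-to-end optimization as the reasoning itself.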
25 replies · 135 reposts · 901 likes · 159.9K views
Aman reposted
Rosinality@rosinality·
The problem generator in self-play tends to hack the reward by producing complex but non-useful problems. This work incorporates a guide model that picks useful problems by how well they relate to the still-unsolved ones.
4 replies · 37 reposts · 266 likes · 15.7K views
Aman reposted
Percy Liang@percyliang·
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
65 replies · 224 reposts · 1.2K likes · 199.6K views
Aman reposted
Mihir Prabhudesai@mihirp98·
What if AI learned physics the way Newton did – by experiencing it? We built Sim2Reason: train LLMs inside virtual worlds governed by real physics laws, zero human annotation. Result: +5–10% improvement on International Physics Olympiad, zero-shot. 🧵
38 replies · 214 reposts · 1.6K likes · 192.9K views
Aman reposted
Jean Kaddour @ ICLR 2026@jeankaddour·
Introducing Target Policy Optimization (TPO): TPO turns GRPO into supervised learning: build a target distribution over sampled completions, then fit with cross-entropy. The gradient vanishes once the target is matched, making multi-epoch training smooth. 🧵(1/4)
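The TPO recipe as summarized can be sketched in a few lines. The softmax-of-rewards target and its temperature are my assumptions for illustration (the thread presumably specifies the actual target construction), and the "policy" here is a bare logit vector over the K samples rather than a language model:

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# K = 4 sampled completions with toy scalar rewards.
rewards = np.array([1.0, 0.2, -0.5, 0.8])
# Hypothetical target distribution: softmax of rewards at temperature 0.5.
target = softmax(rewards / 0.5)

# Policy logits over the K completions (stand-in for sequence log-probs).
logits = np.zeros(4)
for _ in range(500):
    p = softmax(logits)
    # Gradient of cross-entropy H(target, p) w.r.t. the logits is (p - target):
    # it vanishes exactly when the policy matches the target, which is the
    # property the thread credits for smooth multi-epoch training.
    logits -= 1.0 * (p - target)

print(np.abs(softmax(logits) - target).max())
```

This is the contrast with a plain policy-gradient objective, whose gradient does not go to zero just because the sampled completions are weighted correctly; fitting a fixed target with cross-entropy gives a natural stopping point.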
11 replies · 66 reposts · 494 likes · 37.1K views