Antonio J. Dominguez

4.9K posts

@antferdom

Efficient AI @verdacloud, ML sys. Inference. Programming language theory. PhD LLM & efficient AI @unisevilla

Los Palacios y Villafranca, ES · Joined October 2016
2.7K Following · 579 Followers
Antonio J. Dominguez retweeted
Horace He @cHHillee
In modern ML accelerators, FLOPS have absolutely exploded. Often though, the bottleneck is not FLOPS but memory bandwidth. Similarly, model intelligence has exploded, causing the bottleneck to be human<->AI bandwidth. At Thinky, we think that it’s important to solve this. 1/4
Thinking Machines@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Antonio J. Dominguez retweeted
David Turturean @DavidTurturean
I never thought I would say this, but here I go: I solved my first Erdős problem! I did so using ChatGPT-5.5-Pro. 🧵1/n
Antonio J. Dominguez retweeted
mike64_t @mike64_t
I think at this point I can confidently say there exists a repeatable procedure to train an IDM to recover input and physics gamestate for essentially any game, so long as there is a visual reflection of that input on screen, with just single-digit hours of gameplay data.

These models not only know how fast the player is going at all times, they can estimate acceleration and thus have a model of gravity* (perhaps the more precise statement is that they have a model of "how gravity is in effect," because you just recover the parabola that's arguably already "there" in the data). And because individual wheel speed is just another layer of inverse dynamics behind motion, it turns out you can also recover that somewhat accurately.

Now, under heavy understeer you somewhat expectedly can't recover the performed input anymore, because your car is going straight despite your intended input. But this failure mode is essentially what you would expect from any IDM setting, because it's precisely the irrecoverability boundary. If I'm not imagining things, the prediction still *looks* like something close to the remaining grip limit. (Although it also doesn't matter, because the car stopped being responsive to your input anyway, and any exit from understeering has an observable retro-causal effect...)

Also, because rear-wheel speed serves as the intermediate variable needed to get "handbrake-active" correct, it actually learns this sparse label somewhat accurately too. This seems to be a repeatable trick: don't predict the sparse label alone, additionally predict the variable from which the sparse label follows (i.e. jump follows from y-motion in Minecraft, etc.).

And all of this is just a laughably simple conv, effectively computing noisy localized motion vectors, processed by an LSTM. It also appears the aperture problem is a complete non-issue in this setting.

Now, I don't believe any of this is specific to Minecraft or CMR2. I would argue most other games have less inertia-heavy player controllers, so if anything the unmixing objective could be easier. The good thing about games is that you can essentially dump any value, so long as you have reason to believe its presence as a supervision target has synergistic effects.
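The "parabola already there in the data" point can be sketched with a toy example. This is my own construction, not the author's IDM: render a ball under projectile motion as pixels, locate it per frame, and take finite differences of the recovered positions; the second difference recovers gravity.

```python
import numpy as np

# Toy sketch (my construction, not the author's model): a "game" renders a
# ball under gravity; recovering positions from pixels alone and taking
# finite differences yields the gravity constant, i.e. the parabola is
# already "there" in the frames.
g, dt, n = -9.8, 0.5, 12
t = np.arange(n) * dt
y = 100 + 20 * t + 0.5 * g * t**2            # vertical position, world units

frames = np.zeros((n, 128, 128))
rows = np.round(y).astype(int)
frames[np.arange(n), rows, 64] = 1.0         # ball rendered as one bright pixel

# "Inverse dynamics" from pixels: locate the ball, then finite-difference.
est_rows = frames.reshape(n, -1).argmax(axis=1) // 128

vel = np.diff(est_rows) / dt                 # first difference: velocity
acc = np.diff(vel) / dt                      # second difference: acceleration
print(acc.mean())                            # close to g despite pixel rounding
```

Pixel quantization adds noise to each sample, but it mostly cancels in the mean of the second differences, which is the same reason a model can read acceleration out of noisy localized motion.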
Antonio J. Dominguez retweeted
Probability and Statistics
One theorem every ML engineer should know: the Johnson–Lindenstrauss lemma. It states that high-dimensional data can be projected into a much lower-dimensional space while approximately preserving pairwise distances.

Why it matters:
• Explains why random projections work
• Enables scalable learning in high dimensions
• Used in embeddings, compressed learning, and ANN search
• Helps fight the curse of dimensionality

The surprising part: you can reduce dimensions dramatically without destroying the geometry of the data. That's why many ML systems can operate efficiently even with massive feature spaces.

Modern representation learning is deeply connected to this idea: good embeddings preserve structure while compressing information. In ML, compression is often not loss of intelligence — it's removal of redundancy.
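The lemma is easy to see empirically. A minimal sketch with hypothetical sizes, using the classic Gaussian random projection: project 50 points from 2,000 dimensions down to 200 and compare pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 200                      # 50 points, 2000 dims -> 200 dims

X = rng.normal(size=(n, d))

# Random Gaussian projection, scaled so squared distances are preserved
# in expectation (the standard Johnson-Lindenstrauss construction).
P = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ P

def pdist(A):
    # All pairwise Euclidean distances between rows of A.
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

mask = ~np.eye(n, dtype=bool)
ratios = pdist(Y)[mask] / pdist(X)[mask]
print(ratios.min(), ratios.max())            # distances typically stay within ~20% of the originals
```

Note the projection is data-independent: no training, no PCA, just random directions, which is exactly why the lemma explains random projections.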
Antonio J. Dominguez retweeted
Jiayi Weng @Trinkle23897
Codex iterated a pure NumPy + cv2 closed-loop heuristic policy for VizDoom D3 Battle. No neural network training, no map, no object coordinates, no seed-specific routes. Just screen pixels plus public game variables, roughly the same signals a human player gets. It works surprisingly well. Notes and videos are now in the blog: trinkle23897.github.io/learning-beyon…
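For flavor, a closed-loop pixel heuristic of this general shape can be tiny. The sketch below is my own toy, not the blog's policy, and the action names are made up: score each pixel by redness and steer toward the most salient column.

```python
import numpy as np

def act(frame):
    """Toy closed-loop heuristic over raw pixels (hypothetical action names)."""
    # Redness score per pixel: red channel minus the mean of green/blue.
    red = frame[..., 0].astype(float) - frame[..., 1:].mean(axis=-1)
    if red.max() < 20:                       # nothing salient on screen: scan
        return "TURN_RIGHT"
    col = red.argmax() % frame.shape[1]      # column of the reddest pixel
    center = frame.shape[1] / 2
    if abs(col - center) < 0.05 * frame.shape[1]:
        return "ATTACK"                      # target roughly centered: shoot
    return "TURN_LEFT" if col < center else "TURN_RIGHT"

# Fake 60x80 RGB frame with a bright red blob left of center.
frame = np.zeros((60, 80, 3), dtype=np.uint8)
frame[30, 20, 0] = 255
print(act(frame))                            # TURN_LEFT
```

The policy is closed-loop because each action depends only on the current screen, which is the property that makes this kind of heuristic cheap to iterate on.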
Jiayi Weng@Trinkle23897

Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm. trinkle23897.github.io/learning-beyon…

Antonio J. Dominguez retweeted
Pushmeet Kohli @pushmeet
The future of Math is mathematicians and AI agents working together. Very pleased to introduce @GoogleDeepMind's AI co-mathematician: a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics. Mathematicians testing the agent across areas as diverse as group theory, Hamiltonian systems, and algebraic combinatorics have reported impressive results. In autonomous mode evaluation on the rigorous FrontierMath Tier 4 problems, AI co-mathematician scored an unprecedented 48% — a new high score among all AI systems evaluated.
Antonio J. Dominguez retweeted
Jiayi Weng @Trinkle23897
Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm. trinkle23897.github.io/learning-beyon…
Antonio J. Dominguez retweeted
the tiny corp @__tinygrad__
The tinygrad spec is now merged in tinygrad/spec. Unlike every other ML compiler, all optimization is done in this IR all the way up to instruction selection.
Antonio J. Dominguez retweeted
Yiping Wang @ypwang61
We improve a 32-year-old lower bound in a challenging open problem, Ramsey numbers, through simply scaling autoresearch. ⭕ Proves R(3,17) >= 93. The previous bound of 92 was obtained in 1994. Google's AlphaEvolve (2026) matched the previous result but did not beat it. All could be done with Claude Code / Codex + a CPU server. Graphs and evolving history are available at github.com/ypwang61/Scale… [1/n]
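The shape of such a result is worth spelling out: a triangle-free graph on 92 vertices with no independent set of size 17 certifies R(3,17) > 92, i.e. R(3,17) >= 93. A brute-force checker for this kind of certificate, shown here on the 5-cycle (the classic witness for R(3,3) = 6), looks like:

```python
from itertools import combinations

def is_witness(adj, s):
    """True if adj is triangle-free with no independent set of size s,
    certifying R(3, s) > n for n = len(adj). Brute force: tiny n only."""
    n = len(adj)
    for a, b, c in combinations(range(n), 3):          # no triangle
        if adj[a][b] and adj[b][c] and adj[a][c]:
            return False
    for S in combinations(range(n), s):                # no independent s-set
        if not any(adj[u][v] for u, v in combinations(S, 2)):
            return False
    return True

# C5, the 5-cycle: triangle-free with independence number 2, so R(3,3) > 5.
n = 5
adj = [[abs(i - j) % n in (1, n - 1) for j in range(n)] for i in range(n)]
print(is_witness(adj, 3))                              # True
```

Verifying the actual 92-vertex certificate needs a real maximum-independent-set solver, since enumerating all 17-subsets of 92 vertices is infeasible; the loop above only shows what the certificate asserts.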
Antonio J. Dominguez retweeted
comma @comma_ai
We ran a month-long challenge to compress a 37 MB driving video, with the constraint that two frozen neural networks (a SegNet and a PoseNet) must produce similar outputs on the compressed video as they would on the original. (We heard you guys like these @karpathy style figures)
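The constraint can be phrased as: minimize bits subject to the frozen networks' outputs staying close. A minimal sketch of the checking side, with fixed random linear maps standing in for the frozen SegNet/PoseNet (my stand-ins, not comma's actual models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two frozen networks: fixed random linear maps over a
# flattened frame. (Assumption for the sketch; the real challenge used a
# frozen SegNet and a frozen PoseNet.)
frozen_nets = [rng.normal(size=(8, 64)) for _ in range(2)]

def output_drift(orig, compressed):
    """Worst relative output change across the frozen nets; the challenge
    requires this to stay small on every frame of the compressed video."""
    return max(
        np.linalg.norm(W @ compressed - W @ orig) / np.linalg.norm(W @ orig)
        for W in frozen_nets
    )

frame = rng.normal(size=64)
compressed = np.round(frame * 4) / 4         # crude uniform quantizer "codec"
print(output_drift(frame, compressed))       # small: quantization barely moves the outputs
```

The interesting part of the challenge is that the metric is network-perceptual rather than pixel-wise, so a codec is free to throw away anything the frozen networks do not attend to.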
Antonio J. Dominguez retweeted
Pranjal @pranjalssh
This is a very common misconception. The biggest wins are from overlapping. If you have two memory-bound kernels back to back, each reading and writing 1 GB, you pay 4 GB of HBM traffic. If you overlap them while streaming a 128 MB intermediate state (which fits in L2 cache), you only pay 2 GB of HBM traffic. For communication overlapping, the benefits are even higher.
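The arithmetic, spelled out with the same hypothetical 1 GB tensors as above:

```python
GB = 1024 ** 3
tensor = 1 * GB

# Two back-to-back memory-bound kernels, unfused: kernel A reads the input
# and writes its output to HBM, then kernel B reads that output and writes
# its own result.
unfused_hbm = 4 * tensor                     # read + write, twice

# Overlapped: A's output is streamed through a 128 MB L2-resident
# intermediate that B consumes directly, so HBM only sees the initial
# read and the final write.
fused_hbm = 2 * tensor

print(unfused_hbm // GB, fused_hbm // GB)    # 4 2
```

For memory-bound kernels, runtime is roughly proportional to HBM bytes moved, so halving the traffic is close to halving the time.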
henry tsang@henrylhtsang

so how much perf gain does a megakernel give, if the baseline already uses CUDA graphs

Antonio J. Dominguez retweeted
Zyphra @ZyphraAI
Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵
RadixArk @radixark
Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital.

RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas.

RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale.

RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI.

We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix, among others.

Thanks to @MeghanBobrowsky at @WSJ for the exclusive interview about our vision.
Yassine Yousfi @yassineyousfi_
ok found the magic weight decay value, y'all doubted me😭
Yassine Yousfi @yassineyousfi_
muon... you flew too close to the sun
Antonio J. Dominguez retweeted
Harald Schäfer @___Harald___
At any given time openpilot is controlling about 150 cars on real roads. When I joined comma 9 years ago, it was 0 at most times of the day. I'm excited to see how many machines openpilot will control in another 9 years! Here are 42 cases of openpilot driving at exactly noon on May 1st:
Antonio J. Dominguez retweeted
Anne Ouyang @anneouyang
TIL the Huawei Ascend linear algebra kernels library is called "CATLASS" 🐱
Antonio J. Dominguez retweeted
apaz @apaz_cli
In the past few days I've learned a whole lot about dueling bandits, active learning, and optimal experimental design (statistics). Planning in belief-space. It's relevant to autoresearch and idea generation RL. It lets you save a lot of judge model compute.
Antonio J. Dominguez retweeted
Taelin @VictorTaelin
GPT 5.5 + Codex with /goal is the first setup that turned my work 100% administrative, and I'm getting good at it. There are two outcomes:

→ A: my plan is good → GPT nails it
→ B: my plan is bad → GPT tries anyway → it fails but minimizes it

And this is the catch: once you're used to detecting the apologetic "everything is fine" tone of case B, you can just revert and pivot to another idea. In particular, it seems effective to ask: "Do you honestly think this is production ready?" If there is anything smelly, GPT 5.5 will say so, allowing you to evolve towards a better design.

Concrete example:
> Asked it to implement an in-kernel GC for Bend2.
> It did exactly what I asked, and declared success.
> Code grew too much, so I got suspicious and asked:
> "Would you honestly merge that?"
> It then replied along the lines of:
> "Ehh... actually, this kinda sucks due to "

I then investigated and, yes, my idea was bad. Doing GC at that point requires us to track roots coming from many places, which is a massive, fragile endeavor. It is just a bad approach. So, I reverted it, chatted a bit more, then chose another path: use linearity information (which we already have) to free pattern-matched constructors, with small freelists. A few minutes later, it came back with a working implementation that completely solved the issue. I asked again:

> "Would you honestly merge that?"

To which it replied (verbatim):

> Yes, I’m confident enough to call it production-ready for merge, subject to normal review. Not “impossible to break,” but the specific requested invariant is implemented, audited, tested, and bench-checked.

When GPT uses that kind of tone, it is very, very likely that whatever it did is actually solid, robust and trustworthy, and it knows it. I still inspected the code manually and, indeed, it is solid. Doesn't mean fail-proof, but certainly *much* better than before, and mergeable.

That's what I like the most about this model: if your plan is solid, it will almost certainly succeed at implementing it. It only fails if the plan itself is poor, which will cause it to desperately attempt to fit a circle into a square, fail, and be too embarrassed to admit it. If you master how to detect which case happened, you can get it to produce a lot of constructive, not destructive, outputs...