Antonio J. Dominguez

4.9K posts

@antferdom

Efficient AI @verdacloud, ML sys. Inference. Programming language theory. PhD LLM & efficient AI @unisevilla

Los Palacios y Villafranca, ES · Joined October 2016
2.7K Following · 579 Followers
Antonio J. Dominguez retweeted
Horace He @cHHillee
In modern ML accelerators, FLOPS have absolutely exploded. Often though, the bottleneck is not FLOPS but memory bandwidth. Similarly, model intelligence has exploded, causing the bottleneck to be human<->AI bandwidth. At Thinky, we think that it’s important to solve this. 1/4
Thinking Machines@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Antonio J. Dominguez retweeted
David Turturean @DavidTurturean
I never thought I would say this, but here I go: I solved my first Erdős problem! I did so using ChatGPT-5.5-Pro. 🧵1/n
Antonio J. Dominguez retweeted
mike64_t @mike64_t
I think at this point I can confidently say there exists a repeatable procedure to train an IDM to recover input and physics gamestate for essentially any game, so long as there is a visual reflection of that input on screen, with just single-digit hours of gameplay data.

These models not only know how fast the player is going at all times, they can estimate acceleration and thus have a model of gravity* (perhaps the more precise statement is that they have a model of "how gravity is in effect," because you just recover the parabola that's arguably already "there" in the data). And because individual wheel speed is just another layer of inverse dynamics behind motion, it turns out you can also recover that somewhat accurately.

Now, under heavy understeer you somewhat expectedly can't recover the performed input anymore, because your car is going straight despite your intended input. But this failure mode is essentially what you would expect from any IDM setting, because it's precisely the irrecoverability boundary. If I'm not imagining things, the prediction still *looks* like something close to the remaining grip limit. (Although it also doesn't matter, because the car stopped being responsive to your input anyway, and any exit from understeering has an observable retro-causal effect...)

Also, because rear-wheel speed serves as the intermediate variable needed to get "handbrake-active" correct, it actually learns this sparse label somewhat accurately too. This seems to be a repeatable trick: don't predict the sparse label alone, additionally predict the variable from which the sparse label follows (i.e. jump follows from y-motion in Minecraft, etc.).

And all of this is just a laughably simple conv, effectively computing noisy localized motion vectors, processed by an LSTM. It also appears the aperture problem is a complete non-issue in this setting.

Now, I don't believe any of this is specific to Minecraft or CMR2. I would argue most other games have less inertia-heavy player controllers, so if anything the unmixing objective could be easier. The good thing about games is that you can essentially dump any value, so long as you have reason to believe its presence as a supervision target has synergistic effects.
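The "parabola already there in the data" point can be sketched with a toy example. This is my own construction, not the author's IDM: render a ball under projectile motion as pixels, locate it per frame, and take finite differences of the recovered positions; the second difference recovers gravity.

```python
import numpy as np

# Toy sketch (my construction, not the author's model): a "game" renders a
# ball under gravity; recovering positions from pixels alone and taking
# finite differences yields the gravity constant, i.e. the parabola is
# already "there" in the frames.
g, dt, n = -9.8, 0.5, 12
t = np.arange(n) * dt
y = 100 + 20 * t + 0.5 * g * t**2            # vertical position, world units

frames = np.zeros((n, 128, 128))
rows = np.round(y).astype(int)
frames[np.arange(n), rows, 64] = 1.0         # ball rendered as one bright pixel

# "Inverse dynamics" from pixels: locate the ball, then finite-difference.
est_rows = frames.reshape(n, -1).argmax(axis=1) // 128

vel = np.diff(est_rows) / dt                 # first difference: velocity
acc = np.diff(vel) / dt                      # second difference: acceleration
print(acc.mean())                            # close to g despite pixel rounding
```

Pixel quantization adds noise to each sample, but it mostly cancels in the mean of the second differences, which is the same reason a model can read acceleration out of noisy localized motion.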
Antonio J. Dominguez retweeted
Probability and Statistics
One theorem every ML engineer should know: the Johnson–Lindenstrauss lemma. It states that high-dimensional data can be projected into a much lower-dimensional space while approximately preserving pairwise distances.

Why it matters:
• Explains why random projections work
• Enables scalable learning in high dimensions
• Used in embeddings, compressed learning, and ANN search
• Helps fight the curse of dimensionality

The surprising part: you can reduce dimensions dramatically without destroying the geometry of the data. That's why many ML systems can operate efficiently even with massive feature spaces.

Modern representation learning is deeply connected to this idea: good embeddings preserve structure while compressing information. In ML, compression is often not loss of intelligence — it's removal of redundancy.
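The lemma is easy to see empirically. A minimal sketch with hypothetical sizes, using the classic Gaussian random projection: project 50 points from 2,000 dimensions down to 200 and compare pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 200                      # 50 points, 2000 dims -> 200 dims

X = rng.normal(size=(n, d))

# Random Gaussian projection, scaled so squared distances are preserved
# in expectation (the standard Johnson-Lindenstrauss construction).
P = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ P

def pdist(A):
    # All pairwise Euclidean distances between rows of A.
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

mask = ~np.eye(n, dtype=bool)
ratios = pdist(Y)[mask] / pdist(X)[mask]
print(ratios.min(), ratios.max())            # distances typically stay within ~20% of the originals
```

Note the projection is data-independent: no training, no PCA, just random directions, which is exactly why the lemma explains random projections.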
Antonio J. Dominguez retweeted
Jiayi Weng @Trinkle23897
Codex iterated a pure NumPy + cv2 closed-loop heuristic policy for VizDoom D3 Battle. No neural network training, no map, no object coordinates, no seed-specific routes. Just screen pixels plus public game variables, roughly the same signals a human player gets. It works surprisingly well. Notes and videos are now in the blog: trinkle23897.github.io/learning-beyon…
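For flavor, a closed-loop pixel heuristic of this general shape can be tiny. The sketch below is my own toy, not the blog's policy, and the action names are made up: score each pixel by redness and steer toward the most salient column.

```python
import numpy as np

def act(frame):
    """Toy closed-loop heuristic over raw pixels (hypothetical action names)."""
    # Redness score per pixel: red channel minus the mean of green/blue.
    red = frame[..., 0].astype(float) - frame[..., 1:].mean(axis=-1)
    if red.max() < 20:                       # nothing salient on screen: scan
        return "TURN_RIGHT"
    col = red.argmax() % frame.shape[1]      # column of the reddest pixel
    center = frame.shape[1] / 2
    if abs(col - center) < 0.05 * frame.shape[1]:
        return "ATTACK"                      # target roughly centered: shoot
    return "TURN_LEFT" if col < center else "TURN_RIGHT"

# Fake 60x80 RGB frame with a bright red blob left of center.
frame = np.zeros((60, 80, 3), dtype=np.uint8)
frame[30, 20, 0] = 255
print(act(frame))                            # TURN_LEFT
```

The policy is closed-loop because each action depends only on the current screen, which is the property that makes this kind of heuristic cheap to iterate on.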
Jiayi Weng@Trinkle23897

Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm. trinkle23897.github.io/learning-beyon…

Antonio J. Dominguez retweeted
Pushmeet Kohli @pushmeet
The future of Math is mathematicians and AI agents working together. Very pleased to introduce @GoogleDeepMind's AI co-mathematician: a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics. Mathematicians testing the agent across areas as diverse as group theory, Hamiltonian systems, and algebraic combinatorics have reported impressive results. In autonomous mode evaluation on the rigorous FrontierMath Tier 4 problems, AI co-mathematician scored an unprecedented 48% — a new high score among all AI systems evaluated.
Antonio J. Dominguez retweeted
Jiayi Weng @Trinkle23897
Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm. trinkle23897.github.io/learning-beyon…
Antonio J. Dominguez retweeted
the tiny corp @__tinygrad__
The tinygrad spec is now merged in tinygrad/spec. Unlike every other ML compiler, all optimization is done in this IR all the way up to instruction selection.
Antonio J. Dominguez retweeted
Yiping Wang @ypwang61
We improve a 32-year-old lower bound in a challenging open problem, Ramsey numbers, through simply scaling autoresearch. ⭕ Proves R(3,17) >= 93. The previous bound of 92 was obtained in 1994. Google's AlphaEvolve (2026) matched the previous result but did not beat it. All could be done with Claude Code / Codex + a CPU server. Graphs and evolving history are available at github.com/ypwang61/Scale… [1/n]
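The shape of such a result is worth spelling out: a triangle-free graph on 92 vertices with no independent set of size 17 certifies R(3,17) > 92, i.e. R(3,17) >= 93. A brute-force checker for this kind of certificate, shown here on the 5-cycle (the classic witness for R(3,3) = 6), looks like:

```python
from itertools import combinations

def is_witness(adj, s):
    """True if adj is triangle-free with no independent set of size s,
    certifying R(3, s) > n for n = len(adj). Brute force: tiny n only."""
    n = len(adj)
    for a, b, c in combinations(range(n), 3):          # no triangle
        if adj[a][b] and adj[b][c] and adj[a][c]:
            return False
    for S in combinations(range(n), s):                # no independent s-set
        if not any(adj[u][v] for u, v in combinations(S, 2)):
            return False
    return True

# C5, the 5-cycle: triangle-free with independence number 2, so R(3,3) > 5.
n = 5
adj = [[abs(i - j) % n in (1, n - 1) for j in range(n)] for i in range(n)]
print(is_witness(adj, 3))                              # True
```

Verifying the actual 92-vertex certificate needs a real maximum-independent-set solver, since enumerating all 17-subsets of 92 vertices is infeasible; the loop above only shows what the certificate asserts.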
Antonio J. Dominguez retweeted
comma @comma_ai
We ran a month-long challenge to compress a 37 MB driving video, with the constraint that two frozen neural networks (a SegNet and a PoseNet) must produce similar outputs on the compressed video as they would on the original. (We heard you guys like these @karpathy style figures)
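The constraint can be phrased as: minimize bits subject to the frozen networks' outputs staying close. A minimal sketch of the checking side, with fixed random linear maps standing in for the frozen SegNet/PoseNet (my stand-ins, not comma's actual models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two frozen networks: fixed random linear maps over a
# flattened frame. (Assumption for the sketch; the real challenge used a
# frozen SegNet and a frozen PoseNet.)
frozen_nets = [rng.normal(size=(8, 64)) for _ in range(2)]

def output_drift(orig, compressed):
    """Worst relative output change across the frozen nets; the challenge
    requires this to stay small on every frame of the compressed video."""
    return max(
        np.linalg.norm(W @ compressed - W @ orig) / np.linalg.norm(W @ orig)
        for W in frozen_nets
    )

frame = rng.normal(size=64)
compressed = np.round(frame * 4) / 4         # crude uniform quantizer "codec"
print(output_drift(frame, compressed))       # small: quantization barely moves the outputs
```

The interesting part of the challenge is that the metric is network-perceptual rather than pixel-wise, so a codec is free to throw away anything the frozen networks do not attend to.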
Antonio J. Dominguez retweeted
Pranjal @pranjalssh
This is a very common misconception. The biggest wins are from overlapping. If you have two memory-bound kernels back to back, each reading and writing 1 GB, you pay 4 GB of HBM traffic. If you overlap them while streaming a 128 MB intermediate state (which fits in L2 cache), you only pay 2 GB of HBM traffic. For communication overlapping, the benefits are even higher.
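The arithmetic, spelled out with the same hypothetical 1 GB tensors as above:

```python
GB = 1024 ** 3
tensor = 1 * GB

# Two back-to-back memory-bound kernels, unfused: kernel A reads the input
# and writes its output to HBM, then kernel B reads that output and writes
# its own result.
unfused_hbm = 4 * tensor                     # read + write, twice

# Overlapped: A's output is streamed through a 128 MB L2-resident
# intermediate that B consumes directly, so HBM only sees the initial
# read and the final write.
fused_hbm = 2 * tensor

print(unfused_hbm // GB, fused_hbm // GB)    # 4 2
```

For memory-bound kernels, runtime is roughly proportional to HBM bytes moved, so halving the traffic is close to halving the time.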
henry tsang@henrylhtsang

so how much perf gain does a megakernel give, if the baseline already uses CUDA graphs

Antonio J. Dominguez retweeted
Zyphra @ZyphraAI
Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵
RadixArk @radixark
Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital.

RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas.

RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale.

RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI.

We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix, among others.

Thanks to @MeghanBobrowsky at @WSJ for the exclusive interview about our vision.
Yassine Yousfi @yassineyousfi_
ok found the magic weight decay value, y'all doubted me😭
Yassine Yousfi @yassineyousfi_
muon... you flew too close to the sun
Antonio J. Dominguez retweeted
Harald Schäfer @___Harald___
At any given time openpilot is controlling about 150 cars on real roads. When I joined comma 9 years ago, it was 0 at most times of the day. I'm excited to see how many machines openpilot will control in another 9 years! Here are 42 cases of openpilot driving at exactly noon on May 1st:
Antonio J. Dominguez retweeted
Anne Ouyang @anneouyang
TIL the Huawei Ascend linear algebra kernels library is called "CATLASS" 🐱
Antonio J. Dominguez retweeted
apaz @apaz_cli
In the past few days I've learned a whole lot about dueling bandits, active learning, and optimal experimental design (statistics). Planning in belief-space. It's relevant to autoresearch and idea generation RL. It lets you save a lot of judge model compute.
Antonio J. Dominguez retweeted
Taelin @VictorTaelin
GPT 5.5 + Codex with /goal is the first setup that turned my work 100% administrative, and I'm getting good at it. There are two outcomes:

→ A: my plan is good → GPT nails it
→ B: my plan is bad → GPT tries anyway → it fails but minimizes it

And this is the catch: once you're used to detecting the apologetic "everything is fine" tone of case B, you can just revert and pivot to another idea. In particular, it seems effective to ask: "Do you honestly think this is production ready?" If there is anything smelly, GPT 5.5 will say so, allowing you to evolve towards a better design.

Concrete example:
> Asked it to implement an in-kernel GC for Bend2.
> It did exactly what I asked, and declared success.
> Code grew too much, so I got suspicious and asked:
> "Would you honestly merge that?"
> It then replied along the lines of:
> "Ehh... actually, this kinda sucks due to "

I then investigated and, yes, my idea was bad. Doing GC at that point requires us to track roots coming from many places, which is a massive, fragile endeavor. It is just a bad approach. So, I reverted it, chatted a bit more, then chose another path: use linearity information (which we already have) to free pattern-matched constructors, with small freelists. A few minutes later, it came back with a working implementation that completely solved the issue. I asked again:

> "Would you honestly merge that?"

To which it replied (verbatim):

> Yes, I’m confident enough to call it production-ready for merge, subject to normal review. Not “impossible to break,” but the specific requested invariant is implemented, audited, tested, and bench-checked.

When GPT uses that kind of tone, it is very, very likely that whatever it did is actually solid, robust and trustworthy, and it knows it. I still inspected the code manually and, indeed, it is solid. Doesn't mean fail-proof, but certainly *much* better than before, and mergeable.

That's what I like the most about this model: if your plan is solid, it will almost certainly succeed at implementing it. It only fails if the plan itself is poor, which will cause it to desperately attempt to fit a circle into a square, fail, and be too embarrassed to admit it. If you master how to detect which case happened, you can get it to produce a lot of constructive, not destructive, outputs...