Jonathan Balloch

2.4K posts

@JonathanBalloch

I mostly tweet about #ai, #robots, #science, @packers... Senior SWE at @Anduril | Ph.D. Robotics @GeorgiaTech | M.S. Robotics @Penn | Thoughts/opinions are mine

Atlanta, GA · Joined October 2012
1.1K Following · 382 Followers
Trae Stephens @traestephens
One of the best Bible passages: "So Peter and the other disciple [John, the author] started for the tomb. Both were running, but the other disciple outran Peter and reached the tomb first." John 20:3-4 Translated: "I cooked you and I want the world to remember that forever."
[image]
11 replies · 5 reposts · 105 likes · 5.8K views
Jonathan Balloch retweeted
Nathan Lambert @natolambert
This looks like a model that's competitive with GPT OSS 120B or similar Qwen3.5 models on intelligence & speed, while coming with tons of open data + training details. Is a huge contribution for the ecosystem. Congrats Nvidia on the Nemotron 3 Super release!
Bryan Catanzaro @ctnzr

Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/…
And yes, Ultra is coming!

9 replies · 39 reposts · 480 likes · 44.6K views
Jonathan Balloch retweeted
anton @abacaj
“Make the models cheap to use” “Great, they all forgot how to code” “Now 10x the price”
[image]
231 replies · 1.6K reposts · 27.5K likes · 671.2K views
Jonathan Balloch @JonathanBalloch
@HasanFaisall @natolambert @burkov OP is saying it's useless; Nathan is just saying it's not. Also, qwen 3.5-9b matched it, not really beat it, and it is a dope model; more a testament to Qwen than a knock against gpt-oss
0 replies · 0 reposts · 0 likes · 29 views
BURKOV @burkov
GPT-OSS-120B is a useless model. Nothing I tried to use it for worked as you would expect from a model of this size. It doesn't respect the constraints, performs poorly when the task description precedes the text the task is supposed to apply to, it stops the generation at random moments without finishing the sentence, and it generates repetitive expressions that don't ever stop. All these are properties of 7B-parameter models of late 2023. IMO, altman released this model to avoid being accused of lying after he made a drunk promise to release a competitive open-weight model.
41 replies · 3 reposts · 145 likes · 51.8K views
Jonathan Balloch @JonathanBalloch
@satyajitdas90 @natolambert @burkov Not anymore, but not far from it. Nemotron 3 Super for long context, qwen3.5:122b for general VLM reasoning, qwen-next-coder (80B) for coding, but it all depends on what you want. For example, the dedicated OCR models (almost all much smaller than any of the above) are better for OCR
1 reply · 0 reposts · 0 likes · 88 views
Nathan Lambert @natolambert
@burkov skill issue, lots of people LOVE this model ;)
7 replies · 2 reposts · 134 likes · 11.1K views
Jonathan Balloch @JonathanBalloch
@RelaxedPop @BrianRoemmele Def real. Always look for physics consistency: the sheet getting too flat, the turning of the sheet over perfectly while barely touching it. They are getting good though
0 replies · 0 reposts · 0 likes · 13 views
Charles Waters @RelaxedPop
@BrianRoemmele Do you think this video is real? I'm looking for AI artifacts, but it's fairly low res and I don't see any, but the decisions the robot is making, such as casually tossing the black piece of clothing aside, don't seem very AI-ish. I'm skeptical. We will get there though.
1 reply · 0 reposts · 1 like · 511 views
Brian Roemmele @BrianRoemmele
2025: “It will be a decade before Robots can do anything in the home” 2026: “Oh”
148 replies · 129 reposts · 1.3K likes · 127.3K views
Jonathan Balloch @JonathanBalloch
@sdflbb @BrianRoemmele Not even possible with teleop; AI-generated. Anyone who has ever tried to engineer a cloth-related task knows this is still far from possible. But we keep getting better every day!
0 replies · 0 reposts · 0 likes · 30 views
Jonathan Balloch @JonathanBalloch
@karpathy Small science nit: this does not seem to account for whether there is a relationship between the improvement methods. Only with independence testing and correlation analysis can you know whether one method undercut another. Build this into autoresearch for maximum results
0 replies · 0 reposts · 0 likes · 31 views
Jonathan Balloch retweeted
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I have done daily for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[image]
968 replies · 2.1K reposts · 19.5K likes · 3.6M views
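The QKnorm item in Karpathy's list comes down to softmax temperature: with unit-normalized queries and keys and no multiplier, the attention logits are cosine similarities bounded by ±1, so the softmax stays nearly uniform. A toy numpy sketch of the effect (hypothetical shapes and values, not nanochat's actual code):

```python
import numpy as np

def qk_norm_attention(q, k, v, scale=1.0):
    """Attention with unit-normalized queries/keys and an explicit
    sharpness multiplier. With scale=1.0 the logits live in [-1, 1] and
    the distribution is diffuse; a larger (in practice, learned) scale
    sharpens it. Toy sketch, not nanochat's implementation."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = scale * (q @ k.T)                    # cosine-similarity logits
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 4, 8))  # 4 tokens, head dim 8
_, w_flat = qk_norm_attention(q, k, v, scale=1.0)
_, w_sharp = qk_norm_attention(q, k, v, scale=12.0)

# Attention entropy drops as the multiplier sharpens the distribution.
ent = lambda w: float(-(w * np.log(w + 1e-12)).sum(axis=-1).mean())
print(f"entropy at scale=1:  {ent(w_flat):.3f}")
print(f"entropy at scale=12: {ent(w_sharp):.3f}")
```

A learnable per-head scale (or the fixed multipliers the agent searched for) recovers the sharpness that the missing parameter leaves on the table.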
Jonathan Balloch @JonathanBalloch
@DanutPralea @adxtyahq $5B is GENEROUS. I have heard that compute alone is $8B, other spend is ~$9B, and that doesn't fully cover the data center build outs
0 replies · 0 reposts · 3 likes · 132 views
Dani Pralea @DanutPralea
@adxtyahq spending $5B/year to make a product that keeps getting beaten by a company a tenth their size is a bold financial strategy
1 reply · 0 reposts · 81 likes · 2.6K views
aditya @adxtyahq
Lowkey feels like OpenAI might be the first AI giant to go bankrupt.
128 replies · 64 reposts · 4.1K likes · 88.6K views
Jonathan Balloch @JonathanBalloch
@tunguz Eh, true but minimax 2.5 passed my smell test. Still currently free on open code
0 replies · 0 reposts · 0 likes · 72 views
kache @yacineMTB
if you know what this is, dm me i will hire you
[image]
615 replies · 10 reposts · 916 likes · 115.3K views
Jonathan Balloch @JonathanBalloch
@BoWang87 Do it for ROCm and other competitors that don't have the monopoly, and you've got yourself a billion-dollar asset
0 replies · 0 reposts · 0 likes · 117 views
Bo Wang @BoWang87
ByteDance just published something I've been waiting for someone to build: CUDA Agent! It trained a model that writes fast CUDA kernels. Not just correct ones — actually optimized ones. It beats torch.compile by 2× on simple/medium kernels, ~92% on complex ones, and even outperforms Claude Opus 4.5 and Gemini 3 Pro by ~40% on the hardest setting. The key idea is simple but kind of brilliant: CUDA performance isn’t about correctness, it’s about hardware. Warps, memory bandwidth, bank conflicts — the stuff you only see in a profiler. So instead of rewarding “did it compile?”, they reward actual GPU speed. Real profiling numbers. RL trained directly on performance. That’s a big shift. Paper: arxiv.org/abs/2602.24286 Project: cuda-agent.github.io
[2 images]
52 replies · 365 reposts · 2.7K likes · 181.2K views
kache @yacineMTB
i need radio engineers, aerospace engineers, electrical engineers, machine learning researchers. ideally all four at the same time. how the FUCK do i do that? that's like trying to find a unicorn with diamond underwear and a golden horn
221 replies · 10 reposts · 788 likes · 134.9K views