eigenron

8.1K posts

@eigenron

solving intelligence @AntimLabs

San Francisco, CA · Joined March 2017
1.5K Following · 14.4K Followers
Pinned Tweet
eigenron @eigenron ·
Senior Year Undergrad Research on Phase Qubits FINALLY DONE.
> derived the hamiltonian for a Josephson-junction-based phase qubit
> mapped it to a spin-1/2 system hamiltonian under magnetic fields
> studied the quantum dynamics and evolution of the hamiltonian
> derived spin-flip probabilities
> explored qubit control via phase shifts and applied magnetic fields for high fidelity
16 replies · 23 reposts · 607 likes · 187.5K views
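The steps the thread lists follow the standard textbook treatment of a current-biased Josephson junction; a minimal sketch in conventional notation (the symbols below are the usual choices, not necessarily the thread's):

```latex
% Current-biased Josephson junction (phase qubit):
H = \frac{Q^{2}}{2C} \;-\; E_{J}\cos\varphi \;-\; \frac{\Phi_{0} I_{b}}{2\pi}\,\varphi

% Truncating to the two lowest levels of the tilted-washboard
% potential gives an effective spin-1/2 in a field:
H \;\approx\; -\frac{\hbar\omega_{10}}{2}\,\sigma_{z} \;-\; \frac{\hbar\Omega_{x}}{2}\,\sigma_{x}

% Driving at detuning \Delta from \omega_{10} gives the Rabi
% spin-flip probability:
P_{0\to 1}(t) \;=\; \frac{\Omega_{x}^{2}}{\Omega_{x}^{2}+\Delta^{2}}
\,\sin^{2}\!\Big(\tfrac{1}{2}\sqrt{\Omega_{x}^{2}+\Delta^{2}}\;t\Big)
```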
eigenron reposted
JJ @JosephJacks_ ·
One human brain has 86 billion neurons containing > 10 quintillion α-tubulins doing ~10²⁵ ops/sec on 20 watts of power. Compared to the largest XPU superclusters, we have 6x more compute elements, 100x more ops/sec and require 500M× less power. Silicon has a long way to go.
21 replies · 9 reposts · 131 likes · 11.3K views
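The comparison is ratio arithmetic; a quick sketch that takes the tweet's brain-side figures at face value and back-derives the supercluster numbers the stated ratios imply (the cluster values below are implications of the tweet, not independent measurements):

```python
# Brain-side figures as claimed in the tweet.
brain_elements = 10e18    # "> 10 quintillion" alpha-tubulins
brain_ops_per_s = 1e25    # "~10^25 ops/sec"
brain_watts = 20.0        # "20 watts of power"

# Supercluster figures implied by the stated ratios.
cluster_elements = brain_elements / 6       # "6x more compute elements"
cluster_ops_per_s = brain_ops_per_s / 100   # "100x more ops/sec"
cluster_watts = brain_watts * 500e6         # "500M x less power"

print(f"elements: {cluster_elements:.1e}")        # ~1.7e18
print(f"ops/sec:  {cluster_ops_per_s:.0e}")       # ~1e23
print(f"power:    {cluster_watts / 1e9:.0f} GW")  # 10 GW
```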
eigenron @eigenron ·
@creet_z i know a couple incredible steakhouses :D
0 replies · 1 repost · 1 like · 31 views
Christian @creet_z ·
Ended up w like 5 coffee chats in a week so from now on I’m only open to steak frite chats or tequila and tacos chats
0 replies · 0 reposts · 27 likes · 517 views
eigenron reposted
Physical Intelligence @physical_int ·
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
18 replies · 214 reposts · 1.7K likes · 217.2K views
roon @tszzl ·
rama duwaji? loved her work on game of thrones.
31 replies · 41 reposts · 935 likes · 47K views
eigenron reposted
Sergey Levine @svlevine ·
Back in Nov we developed Recap and trained π*-0.6 with RL. Now we've developed a fast *online* RL method that improves π-0.6 with as little as 15 min of robot data for precise tasks, using "RL tokens" exposed by our model that can be fed into a small actor-critic method.
6 replies · 42 reposts · 456 likes · 28.4K views
eigenron @eigenron ·
true; also exploration, not fully online RL, sparse real-world feedback, RL being hard on huge VLA models, dependence on a decent starting policy, etc. still a great step though: a self-improvement loop for VLAs that works with high-capacity flow policies and uses advantage conditioning to leverage off-policy data without PPO
1 reply · 0 reposts · 3 likes · 58 views
davinci @leothecurious ·
@eigenron nah it's still bottlenecked by human labor
1 reply · 0 reposts · 1 like · 262 views
davinci @leothecurious ·
scaling off-policy teleop data is boring. it's also an uphill climb, not a flywheel. i want to see on-policy self-improving robotic models work. i want to see robots that flail around, try to do things badly, learn from mistakes, do them better on the next try, and before u know it, achieve superhuman competence at a task. i want to see robots that are goal-conditioned. ones that explore optimal methods for satisfying task requirements, not just mimicking human ones. if the success of ur robotic model depends on perpetually scaling expert demonstrations, u're in for a rude awakening a few years down the line.
7 replies · 8 reposts · 76 likes · 14.2K views
eigenron @eigenron ·
very interesting safety evaluation / mech interp on the Sarvam models. what this audit really shows is that alignment is conditional rather than global: the model doesn't fail because it can't recognize harm, but because the act of refusing is tied to specific internal routes (language, reasoning patterns, personas). change the routes and you're not just asking the same thing differently, you're activating a different system with different behavior. this makes safety feel less like a constraint and more like a fragile outcome of how the model happens to think. also, as capability scales, that fragility shows up more, because the model gets better at following whatever path you nudge it onto, including the unsafe ones. would love to see work on 'path-invariant alignment': making safety decisions consistent across languages, framings, and reasoning styles rather than emerging from them
Ramakrishna kompella@jojokompella

1/ Today, we're publishing the first independent safety audit of @SarvamAI's models across 14 Indian languages. 24,000+ prompts. White-box mechanistic analysis. Black-box behavioral testing. Here's what we found:

1 reply · 1 repost · 16 likes · 1.7K views
eigenron @eigenron ·
@atulit_gaur apparently i independently arrived at the 'it from bit' idea and had no idea wheeler had already done that. working on formalizing it now.
0 replies · 0 reposts · 1 like · 92 views
eigenron @eigenron ·
@kalomaze this will not age well even by the end of this year imo
1 reply · 0 reposts · 6 likes · 1.5K views
eigenron reposted
Ramakrishna kompella @jojokompella ·
1/ Releasing Goedel-mHC-1B, the first open 1B+ LLM with multi-stream Hyperconnections. Weights on HuggingFace, Apache 2.0. Trained on 20B tokens of FineWeb-Edu. 3.8% better BPB, 15% fewer params. Just a toy run, for now.
1 reply · 11 reposts · 116 likes · 7.4K views
atharva ☆ @k7agar ·
whosoever engineers the easiest and most scalable data sponge for mapping raw sensory obs to robot action wins the lottery ticket of robot-learning sota.
1 reply · 0 reposts · 24 likes · 1.2K views
Charles Foster @CFGeek ·
Everyone tryna automate AI R&D now
2 replies · 0 reposts · 114 likes · 14.9K views
eigenron reposted
Yulu Gan @yule_gan ·
Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt.

To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs.

What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets.

Paper: arxiv.org/pdf/2603.12228
Code: github.com/sunrainyg/Rand…
Website: thickets.mit.edu
86 replies · 430 reposts · 3K likes · 667K views
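The recipe as described (one Gaussian step, no gradients, then ensembling) is easy to illustrate on a toy linear classifier. This is a toy stand-in for the idea, not the paper's code; the population size, noise scale, and top-k expert selection are my choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a "pretrained model": a linear scorer that is
# decent but not optimal on a small 2-class problem.
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = (X @ true_w > 0).astype(int)
w_pretrained = true_w + rng.normal(scale=0.8, size=5)  # imperfect

def accuracy(w):
    return float(np.mean((X @ w > 0).astype(int) == y))

# One Gaussian step, no gradients: sample perturbations of the
# pretrained weights, keep the best scorers ("task experts" found
# in the neighborhood), and ensemble them by majority vote.
candidates = [w_pretrained] + [
    w_pretrained + rng.normal(scale=0.3, size=5) for _ in range(64)
]
experts = sorted(candidates, key=accuracy, reverse=True)[:8]

votes = np.mean([(X @ w > 0) for w in experts], axis=0)
ensemble_acc = float(np.mean((votes > 0.5).astype(int) == y))

print(round(accuracy(w_pretrained), 3), round(ensemble_acc, 3))
```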
Ksenia_TuringPost @TheTuringPost ·
Is RL dead for post-training? Of course not, but there are other interesting options for fine-tuning

▪️ Evolution Strategies (ES) is a gradient-free optimization method that tests random parameter changes and moves the model toward the best-performing ones.

- It creates a small population of models by adding random perturbations to the parameters
- Perturbed models' outputs are scored with a reward function/verifier
- Model parameters are updated in the direction of perturbations that achieved the best rewards

The best thing is that ES can scale to billion-parameter models and shows clear gains over RL:

• On the Countdown benchmark: ES raised Qwen-2.5-3B to 60.5% (vs 32.5% GRPO) and Llama-3.1-8B to 61.2% (vs ~51% RL)
• ARC-AGI: 0.2% → 29.5%; Sudoku: 2.5% → 69.5%

And everything without computing gradients through backpropagation.

So don't overlook other approaches in favor of RL alone - there is a lot to explore. Here we've gathered the new fine-tuning stack for LLMs with ES and the most promising LoRAs -> turingpost.com/p/beyondrl
11 replies · 41 reposts · 248 likes · 22K views
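The three bullets above describe the canonical ES update; a minimal runnable sketch of exactly that loop on a toy reward (the reward function, dimensions, and hyperparameters are illustrative stand-ins, not the benchmarked setup):

```python
import numpy as np

rng = np.random.default_rng(0)

target = rng.normal(size=10)      # unknown optimum for the toy reward

def reward(theta):
    # stands in for the reward function / verifier; higher is better
    return float(-np.sum((theta - target) ** 2))

theta = np.zeros(10)              # "model parameters"
pop, sigma, lr = 32, 0.1, 0.05

for step in range(300):
    # 1) a small population of random perturbations of the parameters
    eps = rng.normal(size=(pop, 10))
    # 2) score each perturbed model with the reward function
    scores = np.array([reward(theta + sigma * e) for e in eps])
    # normalize so the update weighs perturbations by relative score
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    # 3) move the parameters toward the best-performing perturbations
    theta += lr / (pop * sigma) * eps.T @ scores

print(reward(np.zeros(10)), reward(theta))  # reward improves sharply
```

Note that nothing here backpropagates: the only information the update uses is the scalar score of each perturbed copy, which is what lets the same loop run on models where gradients are unavailable or expensive.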
Roberto @robertorobotics ·
@eigenron it’s the beginning of infinity fr. And even the laws of physics can be “broken” / improved upon
2 replies · 0 reposts · 2 likes · 129 views
eigenron @eigenron ·
the laws of physics and the laws of men should be the only things limiting you at this point.
3 replies · 2 reposts · 45 likes · 1.7K views
λux @novasarc01 ·
how much alpha can you pack into a single podcast? dwarkesh: hold my cup!
Dwarkesh Patel@dwarkesh_sp

.@dylan522p gives a deep dive on the 3 big bottlenecks to scaling AI compute: logic, memory, and power. And walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack.

0:00:00 – Why an H100 is worth more today than 3 years ago
0:24:52 – Nvidia secured TSMC allocation early; Google is getting squeezed
0:34:34 – ASML will be the #1 constraint for AI compute scaling by 2030
0:56:06 – Can’t we just use TSMC’s older fabs?
1:05:56 – When will China outscale the West in semis?
1:16:20 – The enormous incoming memory crunch
1:42:53 – Scaling power in the US will not be a problem
1:55:03 – Space GPUs aren't happening this decade
2:14:26 – Why aren’t more hedge funds making the AGI trade?
2:18:49 – Will TSMC kick Apple out from N2?
2:24:35 – Robots and Taiwan risk

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!

1 reply · 0 reposts · 14 likes · 2.8K views