
Valery Sibikovsky
@combdn
Human interface designer. Learning to build stuff. Believe that technology can make us better humans.

I like and bookmark so many interesting-sounding papers here, and don’t get back to most of them. Time to start making a dent: I’m going to try to at least skim one of the papers in my bookmarks each weekday for the rest of the month. #PaperADay

2025: Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning (Google)

I like their statement of the hierarchical goal problem as “how long does it take a twitching hand to win a game of chess?” @RichardSSutton is fond of the “options” framework in RL, but we don’t have a clear method to learn options from scratch.

Their Ant environment is designed to require two levels of planning: the standard MuJoCo Ant locomotion work to be able to move at all, and routing decisions to reach the colored squares in the correct order, which happen hundreds of frames apart.

Basically, this takes a pre-trained sequence-prediction model that predicts what separately trained (manually steered) expert models do, and inserts a metacontroller midway through it. The metacontroller can tweak the residual values to perform high-level “steering”, and can be RL’d at high-level switch points to much greater performance than the base pre-trained model.

A key claim here is that learning to predict actions in a supervised next-token manner from lots of existing expert examples, even without knowing the goals, results in inferring useful higher-level goals. This sounds plausible, but their experiment makes it rather easy for the model: the expert RL models that generated the training data were explicitly given one of four goals in each segment, so the option-learning model just classifies the sequences into one of four categories. That is a vastly simpler problem than free-form option discovery.

A State Space Model is used for the more complex Ant environments, while a transformer is used for the simpler grid-world environments. I didn’t see an explanation for the change.

The internal “walls” are more like “poison tiles”: they don’t block movement the way the map edges do, they just kill the ant when its center passes into them. The 3D renderings (with shadow errors that hurt my gamedev eyes) are somewhat misleading, since it is really a 2D world that the agent gets to fully observe in a low-dimensional one-hot format; there is no partially observed or pixel-based sensing. Everything is done with massively parallel environments, avoiding the harder online-learning challenges, and the success rates still aren’t great after a million episodes.

I would like to see this applied to Atari: basically doing GATO with less capable experts or lower episode quantities, then trying to identify free-form options that can be usefully used to RL to higher performance.
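To make the “metacontroller tweaks the residual values” idea concrete, here’s a toy sketch of what I understand the mechanism to be. This is my reconstruction, not the paper’s code: the layer sizes, the random stand-in weights, and the `act`/`steering` names are all my own assumptions. The frozen pre-trained model is split at some layer, and the metacontroller adds one of a few learned steering vectors to the residual activations at that point to select high-level behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACT, NUM_OPTIONS = 8, 16, 4, 4  # hypothetical sizes

# Frozen pre-trained halves of the sequence model (random stand-ins here;
# in the paper these come from supervised next-token training on expert data).
W_lower = rng.standard_normal((OBS, HID)) / np.sqrt(OBS)
W_upper = rng.standard_normal((HID, ACT)) / np.sqrt(HID)

# Metacontroller parameters: one learnable steering vector per high-level
# option, added to the residual stream at the insertion point. Only these
# (and the option-switching policy) would be trained with RL.
steering = rng.standard_normal((NUM_OPTIONS, HID)) * 0.5

def act(obs, option):
    h = np.tanh(obs @ W_lower)      # lower half of the frozen model
    h = h + steering[option]        # high-level "steering" tweak
    logits = h @ W_upper            # upper half of the frozen model
    return int(logits.argmax())     # greedy low-level action

obs = rng.standard_normal(OBS)
actions = [act(obs, o) for o in range(NUM_OPTIONS)]
print(actions)
```

The appeal of this setup is that the base model stays frozen: the RL problem shrinks to choosing which steering vector to apply at the (much rarer) switch points.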

ADHD is closely linked to circadian rhythm dysfunction. Growing evidence suggests that targeting circadian misalignment can meaningfully improve symptoms. Grateful to Dr. Matt Walker for sharing our new study! frontiersin.org/journals/psych…

New blog post w/ @pawtrammell: Capital in the 22nd Century, where we argue that while Piketty was wrong about the past, he’s probably right about the future.

Piketty argued that without strong redistribution of wealth, inequality will increase indefinitely. Historically, however, income inequality from capital accumulation has actually been self-correcting: labor and capital are complements, so if you build up lots of capital, you lower its returns and raise wages (since labor becomes the bottleneck).

But once AI/robotics fully substitute for labor, this correction mechanism breaks. For centuries, the share of GDP that goes to paying wages has been 2/3, and the share that’s income from owning stuff has been 1/3. With full automation, capital’s share of GDP goes to 100% (since datacenters, solar panels, and the robot factories that build all of the above plus more robot factories are all “capital”).

Inequality among capital holders will also skyrocket, in favor of larger and more sophisticated investors. A lot of AI wealth is being generated in private markets: you can’t get direct exposure to xAI from your 401(k), but the Sultan of Oman can. And a cheap house (the main form of wealth for many Americans) is a form of capital almost uniquely ill-suited to taking advantage of a leap in automation: it plays no part in the production, operation, or transportation of computers, robots, data, or energy.

International catch-up growth may also end. Poor countries historically grew faster by combining their cheap labor with imported capital and know-how; without labor as a bottleneck, their main value-add disappears.

Inequality seems especially hard to justify in this world. So if we don’t want inequality to just keep increasing forever, with the descendants of the most patient and sophisticated of today’s AI investors controlling all the galaxies, what can we do?

The obvious place to start is with Piketty’s headline recommendation: tax wealth highly and progressively. This might discourage saving, but it would no longer penalize those who have earned a lot through hard work and creativity: the wealth, and even the investment decisions, will be produced by the robots, and they will work just as hard and smart however much we tax their owners.

But taxing capital is pointless if people can just shift their future investment to lower-tax countries. And since capital stocks could grow really fast (robots building robots and all that), tax havens could quickly go from marginal outposts to the majority of global GDP. How do you get global coordination on taxing capital when the benefits of defecting are so high and so accessible?

Full automation will probably lead to ever-increasing inequality. We don’t see an obvious solution to this problem, and we think it’s weird how little thought has gone into what to do about it. Many more thoughts from re-reading Piketty with our AGI hats on at the post in the link below.
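The self-correcting mechanism above, and how full automation breaks it, can be illustrated with a textbook Cobb-Douglas toy model. This is my illustration, not from the post, and all the specific numbers are made up; the capital share of 1/3 matches the historical split the post describes. With complements, piling up capital drives its return down and wages up; in an “AK” robots-building-robots economy, the return on capital never falls.

```python
# Cobb-Douglas economy: output Y = K^alpha * L^(1-alpha), with capital's
# share alpha = 1/3 and labor's share 2/3 (the historical split).
ALPHA, L = 1 / 3, 1.0  # capital share, fixed labor supply

def cobb_douglas(K):
    Y = K**ALPHA * L**(1 - ALPHA)
    r = ALPHA * Y / K        # return on capital (its marginal product)
    w = (1 - ALPHA) * Y / L  # wage (labor's marginal product)
    return r, w

for K in (1.0, 8.0, 64.0):
    r, w = cobb_douglas(K)
    print(f"K={K:5.0f}  return on capital={r:.3f}  wage={w:.3f}")
# As K grows 64x, r falls and w rises: accumulation is self-correcting
# because labor is the complementary bottleneck.

# Full automation: robots substitute for labor, so output is linear in
# capital (an "AK" model). A is an illustrative productivity constant.
A = 0.2
def ak_return(K):
    return A  # the return on capital no longer falls as K accumulates

print(ak_return(1.0), ak_return(1e9))  # identical: self-correction is gone
```

The punchline is in the last line: in the Cobb-Douglas world, a 64x capital stock earns a much lower rate of return, while in the AK world the return is flat no matter how much capital accumulates, so capital income compounds without limit.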

