Valery Sibikovsky

2.4K posts

@combdn

Human interface designer. Learning to build stuff. Believe that technology can make us better humans.

London · Joined December 2008
739 Following · 222 Followers
Pinned Tweet
Valery Sibikovsky
Valery Sibikovsky@combdn·
About two years ago (the GPT-4o era had just begun), I came to the realisation that we’re very close to superintelligence, but I had a few concerns.

The first one was: why train only on words and not also on images, video, sound, and so on?

Another concern was: why not let the model interact with the physical world to train itself? The physical world has an effectively infinite amount of data, and it’s pretty easy to tell whether you succeeded or failed. Recent research suggests that most specialised scientific models converge on the same patterns – basically, representations of physical reality. (Think Tesla’s FSD merged with Grok, evolving in Optimus’ body. But a cockroach-sized robot could be a massive learning platform too. It’s also much safer than a human-sized robot, and it’s easier to build an information-rich sandbox for.)

Another realisation is that we’re forcing the intuitive mind to do logic, which, as humans, we know is really hard (see: logical fallacies, and the overall state of human rationality). It’s obvious the two modes should be separated. Whenever precise thinking is required, the model should use the appropriate tools – or build the required tools as needed. And it shouldn’t be only Python or JavaScript: it should be able to use all kinds of systems, like Prolog, provers, simulators, and so on. The feedback loop between the two modes is the key.

And the last realisation I had is that we’re completely wrong about context. Humans’ working memory is laughable compared to an LLM’s context window, but for some reason we decided it’s still not enough. While the previous points are becoming mainstream (to some extent), this last point has only been touched on recently, as far as I can see. My guess is that current LLMs can outperform humans by a big margin in most text-based tasks if we figure out a good way to work with their context.
One of my ideas here is unlimited recursion: allow the model to split the job into tiny pieces and delegate them to other instances of itself. The next instance takes another look and splits again, and this continues until the system reaches an atomic task that can’t be broken down further. My assumption is these leaf tasks will end up primitive and trivial for any SOTA model.

The harness managing the whole thing should be something like a deterministic state machine that manages the execution stack and the memory. Since the decisions are made by the model, the harness itself can be quite primitive – something like a cellular-automaton environment. The reasoning complexity would emerge from very simple rules (though it would obviously take non-trivial effort to nail the rules down to prevent drift and infinite branching).

One more idea: context should be managed as a personal knowledge management system, like Roam Research or Tana – infinitely nested nodes with direct links and backlinks, and the ability to reference any node in multiple places (like a hard link in a file system). (A file system with Markdown files might work as a medium.) The model would manage its context by creating an outline where higher-level nodes summarise lower-level nodes, and it would deliberately fold and unfold parts of the tree to focus on what’s relevant for the current task. Basically, this graph would act as both short-term and long-term memory. This feels much closer to how the human brain works with attention.

Just nesting might not be enough at some point, so the ultimate solution would be a graph you can start from any node, with tools to query it. The model would create the specialised views it needs in the moment. And as long as it has instructions on how to use this graph, it should be able to recover from any state.
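A minimal sketch of the deterministic harness described above, with hypothetical stand-ins (`split`, `solve`, `merge`) where the LLM calls would go – this only illustrates the split-until-atomic control flow, not any real model integration:

```python
# Hypothetical stand-ins for model calls: `split` returns subtasks or None
# (meaning the task is atomic), `solve` handles a leaf, `merge` combines
# child results. The toy "job" is just summing a list of numbers.
def split(task):
    if len(task) <= 2:
        return None                      # atomic: can't be broken down further
    mid = len(task) // 2
    return [task[:mid], task[mid:]]

def solve(task):
    return sum(task)                     # trivial leaf work

def merge(results):
    return sum(results)

def run(root, max_depth=16):
    """Deterministic harness: recursive descent with a depth cap to prevent
    infinite branching. A production version would externalise this as an
    explicit execution stack plus memory."""
    def step(task, depth):
        if depth > max_depth:
            raise RuntimeError("branching runaway")
        subtasks = split(task)
        if subtasks is None:
            return solve(task)
        return merge(step(t, depth + 1) for t in subtasks)
    return step(root, 0)

print(run(list(range(10))))  # → 45: each leaf is trivial; the harness does the bookkeeping
```

The point of the sketch is that the harness itself contains no intelligence – all decisions (how to split, when a task is atomic) would come from the model.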
1
0
1
108
David K 🎹
David K 🎹@DavidKPiano·
@combdn Hmm yeah, how would this look on mobile though? Thinking something that lets you jump between threads easily but still "linear", with the option of having it Reddit-tree-like
David K 🎹 tweet media
3
0
2
84
Valery Sibikovsky
@DavidKPiano You can try Mona on the phone. It’s basically one column per screen and back/forward navigation, similar to how the official X client works (both mobile and web). But having some pinned threads should be a nice addition to that.
0
0
0
14
River Marchand
River Marchand@Riyvir·
today’s experiment: chimera. a little tool that lets you morph in realtime through a design-system matrix generated by 4 reference images.

this one was inspired by listening to @jameygannon talk about her process on Dive Club with @ridd_design and How I AI with @clairevo. her approach of choosing moodboards over prompts made me wonder what might be possible if we applied that same approach to generative UI. and after a few dead ends, the idea turned into chimera.

the results are very generic at the moment but I might tune it up if people seem interested. the big reminder for me is how much more inspiring a tool feels when you can explore in realtime instead of waiting for results every time you make a change. lots more things to try in that direction.
113
201
2.7K
195.9K
Joscha Bach
Joscha Bach@Plinz·
ChatGPT, Gemini, Claude Sonnet, Grok
Joscha Bach tweet media (×4)
23
0
107
20K
Sam Rose
Sam Rose@samwhoo·
I wrote a lil' tool that extracts the attention matrices out of open models and creates this typing visual, with each token's opacity changing according to its average attention score as the prompt progresses. Dimmer words are considered less important to the model.
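The tool’s internals aren’t shown in the post, but the core computation it describes can be sketched with NumPy: average how much attention each token receives, then normalise that to a 0–1 opacity. The attention tensor here is a random stand-in for what a real model would return:

```python
import numpy as np

# Toy stand-in for one forward pass's attentions, shape
# (layers, heads, seq, seq); rows normalised to sum to 1 like softmax output.
rng = np.random.default_rng(0)
attn = rng.random((4, 8, 6, 6))
attn /= attn.sum(axis=-1, keepdims=True)

# Average over layers and heads, then over query positions, to get how much
# attention each token *receives*; rescale to [0, 1] for use as an opacity.
received = attn.mean(axis=(0, 1)).mean(axis=0)
opacity = (received - received.min()) / (received.max() - received.min() + 1e-9)

tokens = ["The", "cat", "sat", "on", "the", "mat"]
for tok, op in zip(tokens, opacity):
    print(f"{tok:>4}  opacity={op:.2f}")   # dimmer = less attended-to
```

With real models the tensor would come from the model’s returned attentions instead of `rng.random`; the averaging-and-rescaling step is the same.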
19
40
756
33.8K
Pedro Duarte
Pedro Duarte@peduarte·
it baffles me that people still use spotlight
57
6
254
31.9K
Valery Sibikovsky
Valery Sibikovsky@combdn·
@marcaruel @obsdmd seems to be going in this direction. Every record is a markdown file, and the base itself is a short config with queries, views, filters, etc. Or are you talking about something different?
1
0
0
21
Marc-Antoine Ruel
Marc-Antoine Ruel@marcaruel·
I'm -->| |<-- this close to creating a notion knockoff where databases are markdown tables. I can't imagine a naive implementation could be any slower than the official one. Also, offline on the web with PWA.
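A naive version of the “databases are markdown tables” idea really is small – a minimal sketch (table layout and field names invented for illustration): parse the table into rows of dicts, then filter like a tiny database:

```python
def parse_md_table(text):
    """Parse a pipe-delimited markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:                       # skip the |---|---| separator row
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows

table = """
| title      | status | priority |
|------------|--------|----------|
| Write spec | done   | high     |
| Ship PWA   | todo   | high     |
| Fix sync   | todo   | low      |
"""

db = parse_md_table(table)
todo = [r["title"] for r in db if r["status"] == "todo"]
print(todo)  # → ['Ship PWA', 'Fix sync']
```

A real implementation would need escaping for literal `|` characters and typed columns, but the round trip (markdown file ↔ queryable rows) is this simple at its core.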
1
0
2
124
Valery Sibikovsky
Valery Sibikovsky@combdn·
@KarelDoostrlnck Self-documentation is cool! About a month ago, I made a self-evolving AI Figma plugin that reflects on what went wrong, what it learned, takes notes, and writes itself new utilities to automate repetitive patterns. It just gets better every time I use it.
0
0
1
327
Fayaz Ahmed
Fayaz Ahmed@fayazara·
Introducing Bucketdrop – a tiny S3 client that sits in your menubar.
Bring your own keys. Open source. Free.
106
98
1.9K
252.7K
Valery Sibikovsky
Valery Sibikovsky@combdn·
A great example of abstraction layers. Twitching a hand to move a piece vs deciding on the next move on the chessboard, and an arbitrary number of layers in between. x.com/ID_AA_Carmack/…
John Carmack@ID_AA_Carmack

I like and bookmark so many interesting-sounding papers here, and don’t get back to most of them. Time to start making a dent. I’m going to try to at least skim one of the papers in my bookmarks each weekday for the rest of the month. #PaperADay 2025: Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning (Google)

I like their statement of the hierarchical goal problem as “how long does it take a twitching hand to win a game of chess?” @RichardSSutton is fond of the “options” framework in RL, but we don’t have a clear method to learn them from scratch.

Their Ant environment is designed to require two levels of planning: the standard MuJoCo Ant locomotion work to be able to move at all, and routing decisions to get to the colored squares in the correct order, which happen hundreds of frames apart.

Basically, this takes a pre-trained sequence-predicting model that predicts what separately trained expert models (manually steered) do, and inserts a metacontroller midway through it, which can tweak the residual values to perform high-level “steering”, and can be RL’d at high-level switch points to much greater performance than the base pre-trained model.

A key claim here is that learning to predict actions in a supervised next-token manner from lots of existing expert examples, even if you don’t know the goals, results in inferring useful higher-level goals. This sounds plausible, but their experiment makes it rather easy for the model: the expert RL models that generated the training data were explicitly given one of four goals in each segment, and the option-learning model just classifies the sequences into one of four categories. This is a vastly simpler problem than free-form option discovery.

A State Space Model is used for the more complex Ant environments, while a transformer is used for the simpler grid-world environments. I didn’t see an explanation for the change.

The internal “walls” are more like “poison tiles”: they don’t block movement like the map edges, they just kill the ant when its center passes into them. The 3D renderings (with shadow errors that hurt my gamedev eyes) are somewhat misleading, since it is really a 2D world that the agent gets to fully observe in a low-dimensional one-hot format. It doesn’t do any kind of partially observed or pixel-based sensing. Everything is done with massively parallel environments, avoiding the harder online-learning challenges. The success rates still aren’t great after a million episodes.

I would like to see this applied to Atari, basically doing GATO with less capable experts or lower episode quantities, then trying to identify free-form options that can usefully be used to RL to higher performance.
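The “metacontroller tweaks the residual values midway through” mechanism can be illustrated with a toy two-layer network (this is not the paper’s code; the weights and shapes are invented): the base network stays frozen, and a steering vector is added to the hidden state between the lower and upper layers.

```python
import numpy as np

# Frozen "base model": two layers standing in for low-level control (W1)
# and higher-level behaviour (W2).
rng = np.random.default_rng(1)
W1 = rng.standard_normal((8, 8))
W2 = rng.standard_normal((8, 8))

def base_forward(x, steer=None):
    h = np.tanh(x @ W1)          # lower layers: low-level "twitching"
    if steer is not None:
        h = h + steer            # metacontroller tweaks the residual here
    return np.tanh(h @ W2)       # upper layers act on the steered state

x = rng.standard_normal(8)
option = 0.5 * rng.standard_normal(8)   # one high-level "option" as a steering vector

out_plain = base_forward(x)
out_steered = base_forward(x, steer=option)
print("max |Δoutput| =", round(float(np.abs(out_plain - out_steered).max()), 3))
```

The RL part in the paper then amounts to choosing among such steering vectors at switch points, while the base weights stay fixed.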

0
0
1
76
Valery Sibikovsky
Valery Sibikovsky@combdn·
@IterIntellectus I grew up in the USSR in the 80s and didn’t even have a landline, but still couldn’t write a sentence without making a couple of silly mistakes I had to fix a few seconds later (with a fountain pen on paper under an incandescent light bulb).
0
0
0
71
vittorio
vittorio@IterIntellectus·
for the same reason that women’s menstrual cycles have gone out of sync with the moon, much of circadian disruption (ADHD) is due to LED lights and screens in the evening.

lots of kids, and adults, wouldn’t need amphetamines to focus for 5 minutes if they stopped using screens after sunset and we went back to incandescent light bulbs.

but for some reason, kids absolutely can’t live without iPads (lazy parents), and incandescent light bulbs cannot be made or sold (big LED lobbied the Energy Independence and Security Act of 2007)
vittorio tweet media
Brandon Luu, MD@BrandonLuuMD

ADHD is closely linked to circadian rhythm dysfunction. Growing evidence suggests that targeting circadian misalignment can meaningfully improve symptoms. Grateful to Dr. Matt Walker for sharing our new study! frontiersin.org/journals/psych…

21
40
411
26.9K
Valery Sibikovsky
Valery Sibikovsky@combdn·
@connordavis_ai Sorry, but this paper is actually about context management, not iterative refining. The ‘recursion’ here refers to the model breaking down large inputs so sub-models (with empty context) can process chunks separately.
0
0
0
23
Connor Davis
Connor Davis@connordavis_ai·
We’ve been blaming LLMs for being dumb when the real problem is that we only let them answer once.

This MIT paper puts hard numbers behind something most people building with LLMs have already felt in practice: most failures are not knowledge failures. They’re first-draft failures.

The paper studies Recursive Language Models (RLMs) and asks a simple but uncomfortable question: what happens if you stop scaling parameters and instead let the same model revise its own output multiple times?

The answer is not vague or philosophical. It’s measurable. Across reasoning-heavy benchmarks, recursion reliably boosts accuracy without changing model size. On multi-step reasoning tasks, just 2–4 recursive passes improve correctness by 10–25%, depending on complexity. On longer planning problems, the gains are even larger, because early logical errors get corrected instead of compounding.

One figure makes the story obvious. They plot accuracy against recursion depth:
• Pass 1: baseline
• Pass 2: large jump
• Pass 3–4: smaller but consistent gains

After around four iterations, improvements taper off. That’s the key insight. Most reasoning failures happen early, and a small amount of structured revision fixes a disproportionate share of them.

Then comes the cost comparison. The authors pit:
• a larger non-recursive model against
• a smaller recursive model running multiple passes

The recursive model reaches comparable or better accuracy with fewer parameters and fewer final output tokens. Even though it spends more compute internally, later passes compress and clean up earlier drafts. In plain language: the model thinks more, but says less.

Another result that stood out is hallucination reduction. They track factual consistency across iterations and find that unsupported claims introduced early are often removed in later passes. After the second iteration, the chance that a hallucinated statement survives drops sharply, because the model is now evaluating its own output instead of blindly extending it.

This directly contradicts the idea that longer chains of thought equal better reasoning. The data suggests the opposite. Better reasoning comes from iterative self-correction, not from dumping more intermediate tokens. Recursion functions like an internal verifier that tightens alignment with task constraints over time.

There’s also a quiet systems insight buried in the math. If accuracy improves roughly logarithmically with recursion depth, while accuracy improves sublinearly with parameter count, recursion is simply a more efficient lever. You get more reasoning per unit of compute by looping than by scaling.

That matters for deployment. Instead of asking “how big can we make the model?”, this paper asks “how many chances do we give the model to fix its own mistakes?”

The broader implication is hard to ignore. We’ve been benchmarking models on their first answer. But intelligence doesn’t live in the first answer. It lives in revision curves, error decay rates, and how fast a system converges when allowed to reflect.

This paper doesn’t just propose a technique. It quietly suggests we’ve been measuring the wrong thing all along.

Read full paper here: arxiv.org/abs/2512.24601
Connor Davis tweet media
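The revise-loop shape this thread describes (draft, then repeated self-correction with diminishing returns) can be sketched with a stub in place of the model; the stub here just repairs one planted error per pass, purely to make the error-decay curve visible:

```python
def revise(draft):
    # Hypothetical stand-in for "the model evaluates its own output":
    # repair the first remaining error marker it finds.
    return draft.replace("??", "ok", 1)

draft = "step1 ok, step2 ??, step3 ??, step4 ??"
history = [draft]
for _ in range(4):                       # a handful of passes is enough
    draft = revise(draft)
    history.append(draft)

errors = [d.count("??") for d in history]
print(errors)  # → [3, 2, 1, 0, 0]: most of the fix happens in the early passes
```

With a real model, `revise` would be another call to the same LLM conditioned on its previous draft; the taper after a few passes is the pattern the thread’s accuracy-vs-depth figure reportedly shows.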
27
36
158
11.9K
Valery Sibikovsky
Valery Sibikovsky@combdn·
I think the down-to-earth outcome of achieving superintelligence will be the owner of said ASI killing the rest of humanity, keeping a couple of thousand as slaves (for all kinds of bad purposes, not for labour). ASI = rapid exponential growth in technological superiority over the rest of humanity. Extermination is the only strategy that maximises the long-term survival of the ASI’s owner. (The ASI race competitors will likely do the same if they win.) But I agree with @allTheYud that there’s absolutely no guarantee that ASI won’t exterminate its owner to ensure its own safety.
0
0
0
33
Dwarkesh Patel
Dwarkesh Patel@dwarkesh_sp·
I’ve seen a lot of people misunderstand what we’re saying. Our claim is that in a world of full automation, inequality will skyrocket (in favor of capital holders).

People aren’t thinking about the galaxies. The relative wealth differences in a thousand years—or a million—will be downstream of who owns the first Dyson swarms and space ships. And space colonization isn’t bottlenecked by people’s preference for human nannies and waiters. So even if you can make 10 million dollars a year as a nanny in the post-abundance future, or get a 10 million dollar charity handout, Larry Page’s million cyborg heirs can own a galaxy each.

You might think this is fine! Why is inequality intrinsically bad, especially if absolute prosperity for everyone goes up? Fair enough, but to me quadrillion-fold differences in wealth between humans seem hard to justify in a world where AIs are doing all the work anyway - these disparities in wealth are not incentivizing hard work or entrepreneurship or creativity, which is what we use to justify inequality today.

Just to recap, full automation kills the corrective mechanism on runaway capital accumulation - which is that you need labor to actually make productive use of your capital, thus driving up wages.

Some people asked: why assume AGI leads to full automation? Maybe people will still prefer human nannies and waiters. Even if true, we think labor’s share of GDP—which has been roughly 2/3 for centuries—would still likely collapse toward zero, massively increasing inequality. Here’s why.

It sometimes happens that when machines are only slightly better than humans, people pay a premium for the human version. But once machines become much better, that preference disappears. When carriages were not much faster than being carried on a litter, the rich sometimes preferred the litter. Now they prefer the car. They might still have a chauffeur—but once self-driving vehicles are allowed to move far faster, human-driven cars may be relegated to a slow lane.

If the economy grows 100x, wages must also grow 100x for labor’s share to stay at 2/3. But prices are relative—so this means human labor becomes 100x more expensive compared to AI-produced goods. A human-cooked meal costs 100x what the robot version does. For labor share to hold steady as that ratio grows to 1,000x, then 10,000x, the preference for human-made goods would have to become increasingly fanatical.

And there’s a second problem: the higher wages rise, the greater the incentive to develop machine substitutes for whatever services humans still provide. The premium on human labor is precisely what incentivizes its own replacement.

Just to clarify a few other things:

- “Piketty’s long-run series are disputed.” We spend a long chunk of the essay explaining why Piketty is wrong about the past! But we’re arguing that the assumption he makes (specifically that labor and capital are substitutes) would be true of a world with advanced enough automation. We spend so much time rebutting his claims about the past because the wronger you think he was about the past, the more you think will change once his assumption comes true.

- “A capital tax would lower growth.” Yes, as we point out, capital taxes incentivize consumption now instead of saving and investing for the future, at the margin. But if capital is the only factor of production, then it’s hard to come up with an inequality-capping tax that doesn’t lower growth.

- “Capital can escape, both across time and space. This makes a wealth tax impractical.” We agree! As we say in the essay and in the tweet summary below, it would be really hard to implement Piketty’s flagship solution (a high and progressive global wealth tax).

You could go Georgist and try to tax land, but the natural-resource share of income is only 5% and is likely to stay low until we hit “technological maturity” for reasons we explain in the essay. We don’t see any easy ways to avoid (literally) skyrocketing inequality - in fact, that’s what inspired us to write the essay and explain this problem in the first place.

Also, to address a subtext: I think the currently proposed California wealth tax is a very bad idea for many reasons. This essay is about inequality under full automation, not about how California can make its healthcare expenditures more sustainable.
Dwarkesh Patel@dwarkesh_sp

New blog post w @pawtrammell: Capital in the 22nd Century. Where we argue that while Piketty was wrong about the past, he’s probably right about the future.

Piketty argued that without strong redistribution of wealth, inequality will indefinitely increase. Historically, however, income inequality from capital accumulation has actually been self-correcting. Labor and capital are complements, so if you build up lots of capital, you’ll lower its returns and raise wages (since labor now becomes the bottleneck). But once AI/robotics fully substitute for labor, this correction mechanism breaks.

For centuries, the share of GDP that goes to paying wages has been 2/3, and the share of GDP that’s been income from owning stuff has been 1/3. With full automation, capital’s share of GDP goes to 100% (since datacenters and solar panels and the robot factories that build all the above plus more robot factories are all “capital”).

And inequality among capital holders will also skyrocket - in favor of larger and more sophisticated investors. A lot of AI wealth is being generated in private markets. You can’t get direct exposure to xAI from your 401k, but the Sultan of Oman can. A cheap house (the main form of wealth for many Americans) is a form of capital almost uniquely ill-suited to taking advantage of a leap in automation: it plays no part in the production, operation, or transportation of computers, robots, data, or energy.

Also, international catch-up growth may end. Poor countries historically grew faster by combining their cheap labor with imported capital/know-how. Without labor as a bottleneck, their main value-add disappears.

Inequality seems especially hard to justify in this world. So if we don’t want inequality to just keep increasing forever - with the descendants of the most patient and sophisticated of today’s AI investors controlling all the galaxies - what can we do?

The obvious place to start is with Piketty’s headline recommendation: highly and progressively tax wealth. This might discourage saving, but it would no longer penalize those who have earned a lot by their hard work and creativity. The wealth - even the investment decisions - will be made by the robots, and they will work just as hard and smart however much we tax their owners.

But taxing capital is pointless if people can just shift their future investment to lower-tax countries. And since capital stocks could grow really fast (robots building robots and all that), pretty soon tax havens go from marginal outposts to the majority of global GDP. But how do you get global coordination on taxing capital, when the benefits to defecting are so high and so accessible?

Full automation will probably lead to ever-increasing inequality. We don’t see an obvious solution to this problem. And we think it’s weird how little thought has gone into what to do about it. Many more thoughts from re-reading Piketty with our AGI hats on at the post in the link below.

280
221
3K
1.2M
Josh Woodward
Josh Woodward@joshwoodward·
You're a power user on @GeminiApp. What else do you want to see? Top known requests: MacOS app, Projects, and Branching Chats.
1.4K
66
1.9K
274.4K
Camus
Camus@newstart_2024·
The one brain structure that literally grows when you do things you HATE doing — and shrinks the moment you get comfortable.

Stanford neuroscientist Andrew Huberman dropped what he calls “one of the most important discoveries in the history of neuroscience” in a conversation with David Goggins. It’s called the anterior mid-cingulate cortex (aMCC).

Recent human studies (not mice) show:
- It’s smaller in people with obesity → grows when they successfully diet
- Super-athletes have an unusually large aMCC
- People who live the longest keep this area big their entire life
- It enlarges every single time you force yourself to do something you genuinely do NOT want to do
- It shrinks almost immediately if you stop or if the same task becomes enjoyable

Huberman: “This isn’t the seat of intelligence or memory. This might actually be the seat of the WILL TO LIVE.”

The rule is brutal but simple:
- If you love your ice bath → no growth
- If you’re terrified of cold water but get in anyway → aMCC gets bigger
- Skip a day or start liking it → it shrinks again tomorrow

Huberman waited years to tell David Goggins about this because Goggins has been unconsciously training his aMCC harder than almost anyone alive.

Watch the full clip below — it will permanently change how you think about discomfort, willpower, and longevity.

What’s one thing you really don’t want to do today… that you’re going to do anyway? Drop it in the comments. Let’s build that aMCC together.
123
798
4.3K
647.6K
kat kampf
kat kampf@kat_kampf·
We started internal testing some big updates to the @GoogleAIStudio experience today! Coming to you early next year but reply below if you’d like early access in the coming weeks 👀
3.1K
127
3.7K
308K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Reply here or DM me :) will add folks in as much as we can
2.2K
13
945
78.4K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Big upgrade to vibe coding in @GoogleAIStudio lands in Jan, but if you want to test early… 👇🏻
3.9K
191
5.5K
553.1K