Dominik Lukes

15.4K posts

@techczech

Exploring schemas and propositions about language models of all kinds on https://t.co/GU07uzb7Ud and on https://t.co/UfdxBd7jvK.

UK · Joined April 2009
789 Following · 2K Followers
Dominik Lukes@techczech·
Do we think that whoever wrote this definition had anything approaching understanding of what LLMs are?
Dominik Lukes@techczech·
@perborgen Sorry, this makes no sense. No human can transfer their coding ability to Brainfuck from Python. Frankly, even assuming a straightforward transfer of skill from Python to Java is a big stretch.
Per Borgen@perborgen·
LLM coding benchmarks collapse from 90% to 4% the moment you give them a language they haven’t memorized 📉 This indicates that LLMs alone lack a fundamental part of intelligence.

If a human learns to write FizzBuzz in Python, they’ll easily be able to transfer that over to Java as soon as they’ve learned Java’s syntax. LLMs, not so much.

To reveal this, the authors picked obscure programming languages where they knew the training data was extremely limited, like Brainfuck and Shakespeare. The LLMs were given all the documentation they needed to understand the syntax of the languages. But even with full docs, they couldn’t “reason” their way to a solution. None of the models managed to solve any of the hard problems. These were SOTA models at the time of writing the paper: GPT 5.2, Gemini 3, etc.

This makes me think that LLMs lean just as much into memorization as reasoning when they help us solve problems. For many use cases, that’s not a problem in itself. But I don’t think we’re at AGI or ASI level yet.
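For reference, the FizzBuzz task the tweet uses as its transfer example can be sketched in a few lines of Python (the function name and range here are illustrative, not taken from the paper being discussed):

```python
def fizzbuzz(n: int) -> str:
    # Multiples of 3 -> "Fizz", of 5 -> "Buzz", of both -> "FizzBuzz",
    # everything else is returned as the number itself.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# First fifteen values of the classic exercise.
print([fizzbuzz(i) for i in range(1, 16)])
```

The point of the tweet is that the *logic* above (modulo tests and branching) is trivially portable to Java once the syntax is known, whereas the paper it cites reports that models fail to make the same transfer to low-resource languages.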
Dominik Lukes@techczech·
Everybody has the experience of feeling stupid with something outside their current level of ability or beyond the ceiling of their top potential level of ability. But most people by and large can't easily empathize with those whose ceiling of intelligence is much lower than their own, those who are truly "cognitively impoverished" with respect to the average baseline. This leads to a lot of misguided infantilization and condescension that masquerades as protectionism.

On the one hand, one would like to say that LLMs have great potential to level out cognitive inequality, but I'm not sure it will be that simple. Thinking about @patio11's podcast about debt collection, I'm wondering what social structures need to develop for AI to be genuinely useful to someone who is being taken advantage of by the system designed to protect them. I'm not sure "just ask ChatGPT" is the right way of thinking about it, because that alone presupposes the kind of cognition of the person who would do that and know what to do after.
Dominik Lukes@techczech·
The problem is that performance on ARC was never indicative of much more than performance on a general IQ test. ARC1 could have been IQ95 level and accessible to most humans, and it sat in that weird pocket of the LLM jagged frontier on non-verbal performance, so it seemed like a fundamental gap. But now that we know it isn't, it means nothing.

At the same time, the models' ability to generalise from their experience of vision or the 3D world in the way humans do did not actually improve. We just hacked our way to solving a certain class of problems. And it may be that this is perfectly sufficient for having models do what we want them to do. The fundamental element that keeps models from being more useful has always been the lack of out-of-context learning, meaning keeping traces of what they've "seen" before but that is no longer in their context window.
Patrick Senti@productaizery·
@techczech @fchollet That, and it also means that AI labs have finally managed to get enough training material to curve fit LLMs to do well on the test. Now the test is a lot less informative.
François Chollet@fchollet·
When the latest AI systems can't do something, there's a category of people who will immediately say, "well humans can't do it either!" Then they stop saying it when AI improves a bit.

Been hearing it for 4+ years: "humans can't reason either", "humans can't adapt to a task they haven't been prepared for", "humans can't follow instructions", "humans also suffer from hallucinations", etc.

Until 2025 I was frequently told "humans can't do ARC 1 tasks either" (in reality any normally smart human would do >95% on ARC 1 if properly incentivized). Now that AI saturates ARC 1 they've completely stopped saying this.
François Chollet@fchollet

In general I've been sensing a new current among deep learning maximalists recently, going from "our models can definitely reason" to "well our models can't reason, but neither can humans!"

Dominik Lukes@techczech·
@OfficialLoganK Since this is powered by Antigravity (which I don't fully understand), some sort of hand-off from Antigravity on the desktop to the studio, or vice versa, would be great.
Logan Kilpatrick@OfficialLoganK·
Our AI Studio vibe coding roadmap for the next few weeks:
- Design mode
- Figma integration
- Google Workspace integration
- Better GitHub support
- Planning mode
- Immersive UI
- Agents
- Multiple chats per app
- Simplified deploys
- G1 support
And more, should be fun : )
Dominik Lukes@techczech·
@OfficialLoganK Great. Pretty much everything on my wishlist. But please, can you somehow at least partially divorce it from dependence on GCP projects? This makes it very hard to use in a university environment.
Dominik Lukes@techczech·
Everybody is talking about AGENTS but most people don't have a clear idea of which human models of agents to apply. Let's compare these four, which all differ in both the level of agency and automation potential:

- travel agent - mostly disappeared, despite the role not being automatable as such, because the tools the agent used became available to the public and many of the information collation and evaluation tasks were automated and socially distributed
- customer service agent - partly automated through traditional automation and about to be fully automated through AI
- real estate agent - many tasks fully automatable (discovery, contracting, communication, etc.) but overall very resistant to automation because of the need for coordination within the physical world
- FBI agent - highly resistant to automation because of the need for physical action

However, many AI agents are best suited to replace roles that are not traditionally labeled as "agent":

- Web design agent
- Literature research agent
- Expense claim agent
- Diary management agent
Etc.

What determines whether an AI agent can be successful? The task is decomposable into steps, the steps can be executed autonomously and do not depend on a large amount of learning, and the tasks can be accomplished digitally through code.
Dominik Lukes@techczech·
@OfficialLoganK It's great but for existing users, there's no signal that anything is new. If I didn't read this I might not have noticed.
Dominik Lukes@techczech·
@OfficialLoganK @GoogleAIStudio All great features. But the one thing that is missing is planning mode, which is in Antigravity. That would make the experience much more like Lovable.
Dominik Lukes@techczech·
I think a very reasonable reading of that tweet is that it was Amodei who was on the hot mic. That's what I went looking for. I couldn't find it easily; I had to ask Claude to help me find it - tbh I couldn't bear reading that swill closely. So I can imagine how somebody going to the end of the article and finding the scammy reveal of the made-up interview with Amodei would be confused by that. And the tweet and headline are clickbait by any sensible definition of the term. So I'd say the note is 80% accurate - more than can be said for the article.
Andy Masley@AndyMasley·
First time I’ve disagreed with a community note! A little confused by the reactions to this.
1) There is a real hot mic moment that imo is nothing, but it is real
2) Everyone’s replying to Julia implying she thinks the AI Dario was on the hot mic, but that’s obviously not what she’s saying
3) I agree that the article is terrible overall
Julia Black@mjnblack

there's a truly bonkers hot mic moment at the end of this that may change the way you think about anthropic you're gonna want to read all the way through this one vanityfair.com/news/story/dar…

Dominik Lukes@techczech·
I do not trust any framework that appeals to "collective values" without specifying some mechanistic theory of how values are encoded in individuals and societies and drive actions.
Dominik Lukes@techczech·
In the history of learning and expertise, autodidacts have tended toward the crackpottish because they acquired knowledge out of its social context - they were never part of the loops of feedback and correction a learning community provides. With social media, YouTube, podcasts and now AI, we have the potential for a true autodidactic revolution, because these new tools allow us to simulate the epistemic communities we had relied on. Crackpots will still be a big feature of the epistemic landscape, but I think these new technologies will keep some of the greatest self-learning minds from developing into crackpots.
Dominik Lukes@techczech·
This is clearly wrong. The great agents of history depended on great powers of self-observation but were also able to strategically deploy a lot of selective introspective blindness, meaning that they tolerated great inconsistencies in their beliefs and actions. But they were very good at self-observation and at identifying how they could get the most out of their minds, bodies and environment - they were just able to target this introspective power in a directed way. Some of the great 'introspectors' such as Confucius or Plato were by and large the least agentic. But the opposite is not true. Alexander the Great, Napoleon and others were very introspective when it came to their own process. They were just converting this introspective power into action rather than self-doubt.
David Senra@davidsenra

Great men of history had little to no introspection. The personality that builds empires is not the same personality that sits around quietly questioning itself. @pmarca and I discuss what we both noticed but no one talks about:

David: You don't have any levels of introspection?

Marc: Yes, zero. As little as possible.

David: Why?

Marc: Move forward. Go! I found people who dwell in the past get stuck in the past. It's a real problem and it's a problem at work and it's a problem at home.

David: So I've read 400 biographies of history’s greatest entrepreneurs and someone asked me what the most surprising thing I’ve learned from this was [and I answered] they have little or zero introspection. Sam Walton didn't wake up thinking about his internal self. He just woke up and was like: I like building Walmart. I'm going to keep building Walmart. I'm going to make more Walmarts. And he just kept doing it over and over again.

Marc: If you go back 400 years ago it never would've occurred to anybody to be introspective. All of the modern conceptions around introspection and therapy, and all the things that result from that, are a kind of manufacture of the 1910s, 1920s. Great men of history didn't sit around doing this stuff. The individual runs and does all these things and builds things and builds empires and builds companies and builds technology. And then this kind of guilt-based whammy showed up from Europe. A lot of it from Vienna in the 1910s, 1920s, Freud and all that entire movement. And it kind of turned all that inward and basically said, okay, now we need to second-guess the individual. We need to criticize the individual. The individual needs to self-criticize. The individual needs to feel guilt, needs to look backwards, needs to dwell in the past. It never resonated with me.

Andy Masley@AndyMasley·
I've been very dissatisfied with a lot of the arguments against the stochastic parrot idea, because they seemed to just rely on surface level "Well the AI can obviously do things" observations. This is the best response I've read, with an especially compelling intro.
SE Gyges@segyges

@B_Jowett i do actually! open.substack.com/pub/verysane/p…

Rohan Varma@rohanvarma·
Some interesting data we pulled today showed that ~40% of Codex users use multiple surfaces, between the App, CLI, and IDE extensions. Everyone seems to have a primary preference, but a bigger-than-expected chunk of users launch codex agents outside of their primary interface. If you use multiple surfaces, I'm curious why? And what could we do to improve the experience?
Andy Masley@AndyMasley·
Yeah this point does confuse me a lot. Dune is "a cautionary tale about messianic leaders" and yet in later books it's revealed that Paul's actions were the only way to keep humanity free. The messianic leader was in fact smarter than everyone else and had to use cold utilitarian logic to do what was best. He literally is a messianic leader saving everyone. That doesn't seem like the message people often scold you into taking away!
itamar@ItamarLevyOr

@AndyMasley You need to understand that even though paul atreides can canonically see the future, and is following the only course of action that persists humanity, he is actually a bad man.

Dominik Lukes@techczech·
@AndyMasley This, on the other hand, is dead on. Clearly the latent space represents something inherently semantic no matter what it was learned on and the learning modality.
Dominik Lukes@techczech·
@AndyMasley I don't think this claim stands up to scrutiny. The way the images are tokenised does not actually give the models the grounding they claim. It's just more things mapped into the latent space. Doesn't make Bender et al. any less wrong, but this one is easy to refute.